@node Characters, Conses, Numbers (Numbers), Top
@chapter Characters

@menu
* Character Concepts::
* Characters Dictionary::
@end menu

@node Character Concepts, Characters Dictionary, Characters, Characters
@section Character Concepts

@c including concept-characters

@menu
* Introduction to Characters::
* Introduction to Scripts and Repertoires::
* Character Attributes::
* Character Categories::
* Identity of Characters::
* Ordering of Characters::
* Character Names::
* Treatment of Newline during Input and Output::
* Character Encodings::
* Documentation of Implementation-Defined Scripts::
@end menu

@node Introduction to Characters, Introduction to Scripts and Repertoires, Character Concepts, Character Concepts
@subsection Introduction to Characters

A @i{character} @IGindex{character} is an @i{object} that represents a unitary token (@i{e.g.}, a letter, a special symbol, or a ``control character'') in an aggregate quantity of text (@i{e.g.}, a @i{string} or a text @i{stream}).

@r{Common Lisp} allows an implementation to provide support for international language @i{characters} as well as @i{characters} used in specialized arenas (@i{e.g.}, mathematics).

The following figures contain lists of @i{defined names} applicable to @i{characters}.

Figure 13--1 lists some @i{defined names} relating to @i{character} @i{attributes} and @i{character} @i{predicates}.

@group
@noindent
@w{  alpha-char-p     char-not-equal     char>}
@w{  alphanumericp    char-not-greaterp  char>=}
@w{  both-case-p      char-not-lessp     digit-char-p}
@w{  char-code-limit  char/=             graphic-char-p}
@w{  char-equal       char<              lower-case-p}
@w{  char-greaterp    char<=             standard-char-p}
@w{  char-lessp       char=              upper-case-p}

@noindent
@w{  Figure 13--1: Character defined names -- 1}

@end group

Figure 13--2 lists some @i{character} construction and conversion @i{defined names}.
@group
@noindent
@w{  char-code      char-name    code-char}
@w{  char-downcase  char-upcase  digit-char}
@w{  char-int       character    name-char}

@noindent
@w{  Figure 13--2: Character defined names -- 2}

@end group

@node Introduction to Scripts and Repertoires, Character Attributes, Introduction to Characters, Character Concepts
@subsection Introduction to Scripts and Repertoires

@menu
* Character Scripts::
* Character Repertoires::
@end menu

@node Character Scripts, Character Repertoires, Introduction to Scripts and Repertoires, Introduction to Scripts and Repertoires
@subsubsection Character Scripts

A @i{script} is one of possibly several sets that form an @i{exhaustive partition} of the type @b{character}.

The number of such sets and the boundaries between them are @i{implementation-defined}. @r{Common Lisp} does not require these sets to be @i{types}, but an @i{implementation} is permitted to define such @i{types} as an extension. Since no @i{character} from one @i{script} can ever be a member of another @i{script}, it is generally more useful to speak about @i{character} @i{repertoires}.

Although the term ``@i{script}'' is chosen for definitional compatibility with ISO terminology, no @i{conforming implementation} is required to use any particular @i{scripts} standardized by ISO or by any other standards organization.

Whether and how the @i{script} or @i{scripts} used by any given @i{implementation} are named is @i{implementation-dependent}.

@node Character Repertoires, , Character Scripts, Introduction to Scripts and Repertoires
@subsubsection Character Repertoires

A @i{repertoire} @IGindex{repertoire} is a @i{type specifier} for a @i{subtype} of @i{type} @b{character}. This term is generally used when describing a collection of @i{characters} independent of their coding. @i{Characters} in @i{repertoires} are only identified by name, by @i{glyph}, or by character description.
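Since a @i{repertoire} is just a @i{type specifier}, whether a @i{character} belongs to one can be tested with @b{typep}. The sketch below is illustrative only: the type @t{vowel-char} and the function @t{repertoire-members} are hypothetical names invented for this example and are not defined by @r{Common Lisp}; @b{standard-char}, by contrast, is a standardized @i{type}.

@example
;; A hypothetical repertoire: a type specifier for a subtype of CHARACTER
;; whose members are identified only by their glyphs, not by their codes.
(deftype vowel-char ()
  '(member #\a #\e #\i #\o #\u))

;; Hypothetical helper: collect, in order, the characters of STRING
;; that belong to REPERTOIRE (any type specifier for a character subtype).
(defun repertoire-members (repertoire string)
  (remove-if-not (lambda (ch) (typep ch repertoire))
                 (coerce string 'list)))

(repertoire-members 'vowel-char "repertoire") ; @result{} (#\e #\e #\o #\i #\e)
(repertoire-members 'standard-char "abc")     ; @result{} (#\a #\b #\c)
@end example

Because @t{#\a} satisfies both @t{vowel-char} and @b{standard-char}, the sketch also shows a single @i{character} belonging to more than one @i{repertoire}.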
A @i{repertoire} can contain @i{characters} from several @i{scripts}, and a @i{character} can appear in more than one @i{repertoire}.

For some examples of @i{repertoires}, see the coded character standards ISO 8859/1, ISO 8859/2, and ISO 6937/2. Note, however, that although the term ``@i{repertoire}'' is chosen for definitional compatibility with ISO terminology, no @i{conforming implementation} is required to use @i{repertoires} standardized by ISO or any other standards organization.

@node Character Attributes, Character Categories, Introduction to Scripts and Repertoires, Character Concepts
@subsection Character Attributes

@i{Characters} have only one @i{standardized} @i{attribute}: a @i{code}. A @i{character}'s @i{code} is a non-negative @i{integer}. This @i{code} is composed from a character @i{script} and a character label in an @i{implementation-dependent} way. See the @i{functions} @b{char-code} and @b{code-char}.

Additional @i{implementation-defined} @i{attributes} of @i{characters} are also permitted so that, for example, two @i{characters} with the same @i{code} may differ in some other, @i{implementation-defined} way.

For any @i{implementation-defined} @i{attribute} there is a distinguished value called the @i{null} @IGindex{null} value for that @i{attribute}. A @i{character} for which each @i{implementation-defined} @i{attribute} has the null value for that @i{attribute} is called a @i{simple} @i{character}. If the @i{implementation} has no @i{implementation-defined} @i{attributes}, then all @i{characters} are @i{simple} @i{characters}.

@node Character Categories, Identity of Characters, Character Attributes, Character Concepts
@subsection Character Categories

There are several (overlapping) categories of @i{characters} that have no formally associated @i{type} but that are nevertheless useful to name.
They include @i{graphic} @i{characters}, @i{alphabetic}_1 @i{characters}, @i{characters} with @i{case} (@i{uppercase} and @i{lowercase} @i{characters}), @i{numeric} @i{characters}, @i{alphanumeric} @i{characters}, and @i{digits} (in a given @i{radix}).

For each @i{implementation-defined} @i{attribute} of a @i{character}, the documentation for that @i{implementation} must specify whether @i{characters} that differ only in that @i{attribute} are permitted to differ in whether or not they are members of one of the aforementioned categories.

Note that these terms are defined independently of any special syntax which might have been enabled in the @i{current readtable}.

@menu
* Graphic Characters::
* Alphabetic Characters::
* Characters With Case::
* Uppercase Characters::
* Lowercase Characters::
* Corresponding Characters in the Other Case::
* Case of Implementation-Defined Characters::
* Numeric Characters::
* Alphanumeric Characters::
* Digits in a Radix::
@end menu

@node Graphic Characters, Alphabetic Characters, Character Categories, Character Categories
@subsubsection Graphic Characters

@i{Characters} that are classified as @i{graphic} @IGindex{graphic}, or displayable, are each associated with a glyph, a visual representation of the @i{character}.

A @i{graphic} @i{character} is one that has a standard textual representation as a single @i{glyph}, such as @t{A} or @t{*} or @t{=}. @i{Space}, which effectively has a blank @i{glyph}, is defined to be a @i{graphic}.

Of the @i{standard characters}, @i{newline} is @i{non-graphic} and all others are @i{graphic}; see @ref{Standard Characters}.

@i{Characters} that are not @i{graphic} are called @i{non-graphic} @IGindex{non-graphic}. @i{Non-graphic} @i{characters} are sometimes informally called ``formatting characters'' or ``control characters.''

@t{#\Backspace}, @t{#\Tab}, @t{#\Rubout}, @t{#\Linefeed}, @t{#\Return}, and @t{#\Page}, if they are supported by the @i{implementation}, are @i{non-graphic}.
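The @i{function} @b{graphic-char-p} tests for this category. As an illustrative sketch of why the distinction matters for display, the hypothetical function @t{visible-string} (not part of @r{Common Lisp}) copies @i{graphic} @i{characters}, including @i{Space}, unchanged and replaces each @i{non-graphic} @i{character} with its @i{name} in angle brackets, falling back on its @i{code} if it has no @i{name}.

@example
;; Hypothetical helper, not part of Common Lisp: make the non-graphic
;; characters in STRING visible by writing their names in angle brackets.
(defun visible-string (string)
  (with-output-to-string (out)
    (loop for ch across string
          do (if (graphic-char-p ch)
                 (write-char ch out)          ; graphic, including #\Space
                 (format out "<~A>"
                         (or (char-name ch)   ; e.g. "Newline"
                             (char-code ch)))))))

(visible-string (format nil "a b~%c")) ; @result{} "a b<Newline>c"
@end example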
@node Alphabetic Characters, Characters With Case, Graphic Characters, Character Categories
@subsubsection Alphabetic Characters

The @i{alphabetic}_1 @i{characters} are a subset of the @i{graphic} @i{characters}. Of the @i{standard characters}, only these are the @i{alphabetic}_1 @i{characters}:

@t{A B C D E F G H I J K L M N O P Q R S T U V W X Y Z}

@t{a b c d e f g h i j k l m n o p q r s t u v w x y z}

Any @i{implementation-defined} @i{character} that has @i{case} must be @i{alphabetic}_1. For each @i{implementation-defined} @i{graphic} @i{character} that has no @i{case}, it is @i{implementation-defined} whether that @i{character} is @i{alphabetic}_1.

@node Characters With Case, Uppercase Characters, Alphabetic Characters, Character Categories
@subsubsection Characters With Case

The @i{characters} with @i{case} are a subset of the @i{alphabetic}_1 @i{characters}. A @i{character} with @i{case} has the property of being either @i{uppercase} or @i{lowercase}. Every @i{character} with @i{case} is in one-to-one correspondence with some other @i{character} with the opposite @i{case}.

@node Uppercase Characters, Lowercase Characters, Characters With Case, Character Categories
@subsubsection Uppercase Characters

An uppercase @i{character} is one that has a corresponding @i{lowercase} @i{character} that is @i{different} (and can be obtained using @b{char-downcase}).

Of the @i{standard characters}, only these are @i{uppercase} @i{characters}:

@t{A B C D E F G H I J K L M N O P Q R S T U V W X Y Z}

@node Lowercase Characters, Corresponding Characters in the Other Case, Uppercase Characters, Character Categories
@subsubsection Lowercase Characters

A lowercase @i{character} is one that has a corresponding @i{uppercase} @i{character} that is @i{different} (and can be obtained using @b{char-upcase}).
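The two definitions above can be restated directly in terms of @b{char-downcase} and @b{char-upcase}. This is a sketch for illustration only; @t{my-upper-case-p} and @t{my-lower-case-p} are hypothetical names, and portable code should call the standard predicates @b{upper-case-p} and @b{lower-case-p}.

@example
;; Sketch of the definitions above (hypothetical names):
;; a character is uppercase when CHAR-DOWNCASE maps it to a different
;; character, and lowercase when CHAR-UPCASE does.
(defun my-upper-case-p (ch)
  (char/= ch (char-downcase ch)))

(defun my-lower-case-p (ch)
  (char/= ch (char-upcase ch)))

(my-upper-case-p #\E) ; @result{} @i{true}
(my-lower-case-p #\E) ; @result{} @i{false}
(my-upper-case-p #\5) ; @result{} @i{false}, since #\5 has no case
@end example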
Of the @i{standard characters}, only these are @i{lowercase} @i{characters}:

@t{a b c d e f g h i j k l m n o p q r s t u v w x y z}

@node Corresponding Characters in the Other Case, Case of Implementation-Defined Characters, Lowercase Characters, Character Categories
@subsubsection Corresponding Characters in the Other Case

The @i{uppercase} @i{standard characters} @t{A} through @t{Z} mentioned above respectively correspond to the @i{lowercase} @i{standard characters} @t{a} through @t{z} mentioned above. For example, the @i{uppercase} @i{character} @t{E} corresponds to the @i{lowercase} @i{character} @t{e}, and vice versa.

@node Case of Implementation-Defined Characters, Numeric Characters, Corresponding Characters in the Other Case, Character Categories
@subsubsection Case of Implementation-Defined Characters

An @i{implementation} may define that other @i{implementation-defined} @i{graphic} @i{characters} have @i{case}. Such definitions must always be done in pairs---one @i{uppercase} @i{character} in one-to-one @i{correspondence} with one @i{lowercase} @i{character}.

@node Numeric Characters, Alphanumeric Characters, Case of Implementation-Defined Characters, Character Categories
@subsubsection Numeric Characters

The @i{numeric} @i{characters} are a subset of the @i{graphic} @i{characters}. Of the @i{standard characters}, only these are @i{numeric} @i{characters}:

@t{0 1 2 3 4 5 6 7 8 9}

For each @i{implementation-defined} @i{graphic} @i{character} that has no @i{case}, the @i{implementation} must define whether or not it is a @i{numeric} @i{character}.

@node Alphanumeric Characters, Digits in a Radix, Numeric Characters, Character Categories
@subsubsection Alphanumeric Characters

The set of @i{alphanumeric} @i{characters} is the union of the set of @i{alphabetic}_1 @i{characters} and the set of @i{numeric} @i{characters}.
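Assuming, as holds for the @i{standard characters}, that the @i{numeric} @i{characters} are exactly those for which @b{digit-char-p} with its default @i{radix} of @t{10} returns a weight, this union can be sketched as follows. The name @t{my-alphanumericp} is hypothetical; the standard predicate is @b{alphanumericp}.

@example
;; Sketch of the union described above (hypothetical name):
;; alphabetic characters, plus characters that are decimal digits.
(defun my-alphanumericp (ch)
  (or (alpha-char-p ch)
      (not (null (digit-char-p ch)))))  ; DIGIT-CHAR-P defaults to radix 10

(my-alphanumericp #\q) ; @result{} @i{true}
(my-alphanumericp #\7) ; @result{} @i{true}
(my-alphanumericp #\-) ; @result{} @i{false}
@end example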
@node Digits in a Radix, , Alphanumeric Characters, Character Categories
@subsubsection Digits in a Radix

What qualifies as a @i{digit} depends on the @i{radix} (an @i{integer} between @t{2} and @t{36}, inclusive). The potential @i{digits} are:

@t{0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z}

Their respective weights are @t{0}, @t{1}, @t{2}, @dots{} @t{35}. In any given radix @i{n}, only the first @i{n} potential @i{digits} are considered to be @i{digits}. For example, the digits in radix @t{2} are @t{0} and @t{1}, the digits in radix @t{10} are @t{0} through @t{9}, and the digits in radix @t{16} are @t{0} through @t{F}.

@i{Case} is not significant in @i{digits}; for example, in radix @t{16}, both @t{F} and @t{f} are @i{digits} with weight @t{15}.

@node Identity of Characters, Ordering of Characters, Character Categories, Character Concepts
@subsection Identity of Characters

Two @i{characters} that are @b{eql}, @b{char=}, or @b{char-equal} are not necessarily @b{eq}.

@node Ordering of Characters, Character Names, Identity of Characters, Character Concepts
@subsection Ordering of Characters

The total ordering on @i{characters} is guaranteed to have the following properties:

@table @asis

@item @t{*}
If two @i{characters} have the same @i{implementation-defined} @i{attributes}, then their ordering by @b{char<} is consistent with the numerical ordering by the predicate @b{<} on their code @i{attributes}.

@item @t{*}
If two @i{characters} differ in any @i{attribute}, then they are not @b{char=}.

[Reviewer Note by Barmar: I wonder if we should say that the ordering may be dependent on the @i{implementation-defined} @i{attributes}.]

@item @t{*}
The total ordering is not necessarily the same as the total ordering on the @i{integers} produced by applying @b{char-int} to the @i{characters}.
@item @t{*}
While @i{alphabetic}_1 @i{standard characters} of a given @i{case} must obey a partial ordering, they need not be contiguous; it is permissible for @i{uppercase} and @i{lowercase} @i{characters} to be interleaved. Thus @t{(char<= #\a x #\z)} is not a valid way of determining whether or not @t{x} is a @i{lowercase} @i{character}.
@end table

Of the @i{standard characters}, those which are @i{alphanumeric} obey the following partial ordering:

@example
 A<B<C<D<E<F<G<H<I<J<K<L<M<N<O<P<Q<R<S<T<U<V<W<X<Y<Z
 a<b<c<d<e<f<g<h<i<j<k<l<m<n<o<p<q<r<s<t<u<v<w<x<y<z
 0<1<2<3<4<5<6<7<8<9
 either 9<A or Z<0
 either 9<a or z<0
@end example

@subsection char=, char/=, char<, char>, char<=, char>=, char-equal, char-not-equal, char-lessp, char-greaterp, char-not-greaterp, char-not-lessp
@flushright
@i{[Function]}
@end flushright

@code{char=} @i{&rest characters^+} @result{} @i{generalized-boolean}

@code{char/=} @i{&rest characters^+} @result{} @i{generalized-boolean}

@code{char<} @i{&rest characters^+} @result{} @i{generalized-boolean}

@code{char>} @i{&rest characters^+} @result{} @i{generalized-boolean}

@code{char<=} @i{&rest characters^+} @result{} @i{generalized-boolean}

@code{char>=} @i{&rest characters^+} @result{} @i{generalized-boolean}

@code{char-equal} @i{&rest characters^+} @result{} @i{generalized-boolean}

@code{char-not-equal} @i{&rest characters^+} @result{} @i{generalized-boolean}

@code{char-lessp} @i{&rest characters^+} @result{} @i{generalized-boolean}

@code{char-greaterp} @i{&rest characters^+} @result{} @i{generalized-boolean}

@code{char-not-greaterp} @i{&rest characters^+} @result{} @i{generalized-boolean}

@code{char-not-lessp} @i{&rest characters^+} @result{} @i{generalized-boolean}

@subsubheading Arguments and Values::

@i{character}---a @i{character}.

@i{generalized-boolean}---a @i{generalized boolean}.

@subsubheading Description::

These predicates compare @i{characters}.

@b{char=} returns @i{true} if all @i{characters} are the @i{same}; otherwise, it returns @i{false}. If two @i{characters} differ in any @i{implementation-defined} @i{attributes}, then they are not @b{char=}.
@b{char/=} returns @i{true} if all @i{characters} are different; otherwise, it returns @i{false}.

@b{char<} returns @i{true} if the @i{characters} are monotonically increasing; otherwise, it returns @i{false}. If two @i{characters} have @i{identical} @i{implementation-defined} @i{attributes}, then their ordering by @b{char<} is consistent with the numerical ordering by the predicate @t{<} on their @i{codes}.

@b{char>} returns @i{true} if the @i{characters} are monotonically decreasing; otherwise, it returns @i{false}. If two @i{characters} have @i{identical} @i{implementation-defined} @i{attributes}, then their ordering by @b{char>} is consistent with the numerical ordering by the predicate @t{>} on their @i{codes}.

@b{char<=} returns @i{true} if the @i{characters} are monotonically nondecreasing; otherwise, it returns @i{false}. If two @i{characters} have @i{identical} @i{implementation-defined} @i{attributes}, then their ordering by @b{char<=} is consistent with the numerical ordering by the predicate @t{<=} on their @i{codes}.

@b{char>=} returns @i{true} if the @i{characters} are monotonically nonincreasing; otherwise, it returns @i{false}. If two @i{characters} have @i{identical} @i{implementation-defined} @i{attributes}, then their ordering by @b{char>=} is consistent with the numerical ordering by the predicate @t{>=} on their @i{codes}.

@b{char-equal}, @b{char-not-equal}, @b{char-lessp}, @b{char-greaterp}, @b{char-not-greaterp}, and @b{char-not-lessp} are similar to @b{char=}, @b{char/=}, @b{char<}, @b{char>}, @b{char<=}, and @b{char>=}, respectively, except that they ignore differences in @i{case} and might have an @i{implementation-defined} behavior for @i{non-simple} @i{characters}. For example, an @i{implementation} might define that @b{char-equal}, @i{etc.} ignore certain @i{implementation-defined} @i{attributes}.
The effect, if any, of each @i{implementation-defined} @i{attribute} upon these functions must be specified as part of the definition of that @i{attribute}.

@subsubheading Examples::

@example
 (char= #\d #\d) @result{} @i{true}
 (char= #\A #\a) @result{} @i{false}
 (char= #\d #\x) @result{} @i{false}
 (char= #\d #\D) @result{} @i{false}
 (char/= #\d #\d) @result{} @i{false}
 (char/= #\d #\x) @result{} @i{true}
 (char/= #\d #\D) @result{} @i{true}
 (char= #\d #\d #\d #\d) @result{} @i{true}
 (char/= #\d #\d #\d #\d) @result{} @i{false}
 (char= #\d #\d #\x #\d) @result{} @i{false}
 (char/= #\d #\d #\x #\d) @result{} @i{false}
 (char= #\d #\y #\x #\c) @result{} @i{false}
 (char/= #\d #\y #\x #\c) @result{} @i{true}
 (char= #\d #\c #\d) @result{} @i{false}
 (char/= #\d #\c #\d) @result{} @i{false}
 (char< #\d #\x) @result{} @i{true}
 (char<= #\d #\x) @result{} @i{true}
 (char< #\d #\d) @result{} @i{false}
 (char<= #\d #\d) @result{} @i{true}
 (char< #\a #\e #\y #\z) @result{} @i{true}
 (char<= #\a #\e #\y #\z) @result{} @i{true}
 (char< #\a #\e #\e #\y) @result{} @i{false}
 (char<= #\a #\e #\e #\y) @result{} @i{true}
 (char> #\e #\d) @result{} @i{true}
 (char>= #\e #\d) @result{} @i{true}
 (char> #\d #\c #\b #\a) @result{} @i{true}
 (char>= #\d #\c #\b #\a) @result{} @i{true}
 (char> #\d #\d #\c #\a) @result{} @i{false}
 (char>= #\d #\d #\c #\a) @result{} @i{true}
 (char> #\e #\d #\b #\c #\a) @result{} @i{false}
 (char>= #\e #\d #\b #\c #\a) @result{} @i{false}
 (char> #\z #\A) @result{} @i{implementation-dependent}
 (char> #\Z #\a) @result{} @i{implementation-dependent}
 (char-equal #\A #\a) @result{} @i{true}
 (stable-sort (list #\b #\A #\B #\a #\c #\C) #'char-lessp)
@result{} (#\A #\a #\b #\B #\c #\C)
 (stable-sort (list #\b #\A #\B #\a #\c #\C) #'char<)
@result{} (#\A #\B #\C #\a #\b #\c) ;Implementation A
@result{} (#\a #\b #\c #\A #\B #\C) ;Implementation B
@result{} (#\a #\A #\b #\B #\c #\C) ;Implementation C
@result{} (#\A #\a #\B #\b #\C #\c) ;Implementation D
@result{} (#\A #\B #\a #\b #\C #\c) ;Implementation E
@end example

@subsubheading Exceptional Situations::

Should signal an error of @i{type} @b{program-error} if at least one @i{character} is not supplied.

@subsubheading See Also::

@ref{Character Syntax}, @ref{Documentation of Implementation-Defined Scripts}

@subsubheading Notes::

If characters differ in their @i{code} @i{attribute} or any @i{implementation-defined} @i{attribute}, they are considered to be different by @b{char=}.

There is no requirement that @t{(eq c1 c2)} be true merely because @t{(char= c1 c2)} is @i{true}. While @b{eq} can distinguish two @i{characters} that @b{char=} does not, it is distinguishing them not as @i{characters}, but in some sense on the basis of a lower-level implementation characteristic. If @t{(eq c1 c2)} is @i{true}, then @t{(char= c1 c2)} is also true. @b{eql} and @b{equal} compare @i{characters} in the same way that @b{char=} does.

The manner in which @i{case} is used by @b{char-equal}, @b{char-not-equal}, @b{char-lessp}, @b{char-greaterp}, @b{char-not-greaterp}, and @b{char-not-lessp} implies an ordering for @i{standard characters} such that @t{A=a}, @t{B=b}, and so on, up to @t{Z=z}, and furthermore either @t{9<A} or @t{Z<0}.

@node char-name, name-char, char-code-limit, Characters Dictionary
@subsection char-name [Function]

@code{char-name} @i{character} @result{} @i{name}

@subsubheading Arguments and Values::

@i{character}---a @i{character}.

@i{name}---a @i{string} or @b{nil}.

@subsubheading Description::

Returns a @i{string} that is the @i{name} of the @i{character}, or @b{nil} if the @i{character} has no @i{name}.

All @i{non-graphic} @i{characters} are required to have @i{names} unless they have some @i{implementation-defined} @i{attribute} which is not @i{null}. Whether or not other @i{characters} have @i{names} is @i{implementation-dependent}.

The @i{standard characters} <@i{Newline}> and <@i{Space}> have the respective names @t{"Newline"} and @t{"Space"}. The @i{semi-standard} @i{characters} <@i{Tab}>, <@i{Page}>, <@i{Rubout}>, <@i{Linefeed}>, <@i{Return}>, and <@i{Backspace}> (if they are supported by the @i{implementation}) have the respective names @t{"Tab"}, @t{"Page"}, @t{"Rubout"}, @t{"Linefeed"}, @t{"Return"}, and @t{"Backspace"} (in the indicated case, even though name lookup by ``@t{#\}'' and by the @i{function} @b{name-char} is not case sensitive).
@subsubheading Examples::

@example
 (char-name #\ ) @result{} "Space"
 (char-name #\Space) @result{} "Space"
 (char-name #\Page) @result{} "Page"

 (char-name #\a)
@result{} NIL
@i{OR}@result{} "LOWERCASE-a"
@i{OR}@result{} "Small-A"
@i{OR}@result{} "LA01"

 (char-name #\A)
@result{} NIL
@i{OR}@result{} "UPPERCASE-A"
@i{OR}@result{} "Capital-A"
@i{OR}@result{} "LA02"

 ;; Even though its CHAR-NAME can vary, #\A prints as #\A
 (prin1-to-string (read-from-string (format nil "#\\~A" (or (char-name #\A) "A"))))
@result{} "#\\A"
@end example

@subsubheading Exceptional Situations::

Should signal an error of @i{type} @b{type-error} if @i{character} is not a @i{character}.

@subsubheading See Also::

@ref{name-char}, @ref{Printing Characters}

@subsubheading Notes::

@i{Non-graphic} @i{characters} having @i{names} are written by the @i{Lisp printer} as ``@t{#\}'' followed by their @i{name}; see @ref{Printing Characters}.

@node name-char, , char-name, Characters Dictionary
@subsection name-char [Function]

@code{name-char} @i{name} @result{} @i{char-p}

@subsubheading Arguments and Values::

@i{name}---a @i{string designator}.

@i{char-p}---a @i{character} or @b{nil}.

@subsubheading Description::

Returns the @i{character} @i{object} whose @i{name} is @i{name} (as determined by @b{string-equal}---@i{i.e.}, lookup is not case sensitive). If such a @i{character} does not exist, @b{nil} is returned.

@subsubheading Examples::

@example
 (name-char 'space) @result{} #\Space
 (name-char "space") @result{} #\Space
 (name-char "Space") @result{} #\Space
 (let ((x (char-name #\a)))
   (or (not x) (eql (name-char x) #\a))) @result{} @i{true}
@end example

@subsubheading Exceptional Situations::

Should signal an error of @i{type} @b{type-error} if @i{name} is not a @i{string designator}.

@subsubheading See Also::

@ref{char-name}

@c end of including dict-characters
@c %**end of chapter