Class Charstring

Unit

classwork

Declaration

type Charstring = class(TObject)

Description

This class acts as a repository for some string functions that are used by various classes. These routines will work for Unicode strings encoded with UTF-8 as well as ASCII-encoded strings.

Hierarchy

TObject
Charstring

Overview

Methods

	`class function census(const thisString: string): integer;`
	`class function character(const thisString: string; index: integer): string;`
	`class function characterLength(const thisChar: string): integer;`
	`class function hashValueOf(const thisString: string): int64;`
	`class function chomp(var s: string; const delimiter: string): string;`
	`class function quote(const s: string; beginQuote: string = ''; endQuote: string = ''): string;`
	`class function toUpperCase(const s: string; const language: string = ''): string;`
	`class function toLowerCase(const s: string; const language: string = ''): string;`
	`class function positionOf(const needle, haystack: string; startPos: integer = 1): integer;`
	`class function lastPositionOf(const needle, haystack: string; startPos: integer = 1): integer;`
	`class function endsWith(const s: string; const ending: string): boolean;`
	`class function replaceEnding(const s: string; const ending: string; const replacement: string): string;`
	`class function isUppercase(const s: string): boolean;`
	`class function isLowercase(const s: string): boolean;`
	`class function expandCamelCase(const s: string): string;`
	`class function ofCharacter(const thisCharacter: string; const count: longword): string;`

Description

Methods

class function census(const thisString: string): integer;

Retrieve the number of characters in the given string.

This method differs from System.length in that it counts the number of UTF-8 characters in the string, not the total number of bytes. As of this writing, System.length assumes that an AnsiString has exactly one byte per character; since UTF-8 strings may use several bytes per character, System.length is not an accurate way to determine the total number of characters in a string.

Internally, this method calls LazUTF8.utf8Length and returns the result.
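The difference between a byte count and a character count can be illustrated with a small standalone program. This is only a sketch that uses the Free Pascal RTL; countCodePoints is a hypothetical helper written for this example, not the implementation of Charstring.census or of LazUTF8.utf8Length, and it assumes the input is well-formed UTF-8.

```pascal
program CensusSketch;
{$mode objfpc}{$H+}

// Hypothetical helper: counts UTF-8 code points by skipping
// continuation bytes, which always have the bit pattern 10xxxxxx.
function countCodePoints(const s: string): integer;
var
  i: integer;
begin
  Result := 0;
  for i := 1 to Length(s) do
    if (Ord(s[i]) and $C0) <> $80 then
      Inc(Result);
end;

var
  sample: string;
begin
  // "cafe" with an accented final letter, spelled with explicit bytes so
  // the example does not depend on the encoding of the source file.
  // The accented letter occupies the two bytes C3 A9 in UTF-8.
  sample := 'caf' + #$C3#$A9;
  WriteLn('bytes:      ', Length(sample));           // 5
  WriteLn('characters: ', countCodePoints(sample));  // 4
end.
```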

class function character(const thisString: string; index: integer): string;

Retrieve the character at the given character position within the string.

This method should be used to obtain the character at the given character position within a string. It is potentially incorrect to simply index the string, as you would do with a typical Pascal string. As of this writing, the compiler assumes that a typical AnsiString uses exactly one byte per character; thus, indexing the string will give you a byte offset into the string – but because strings encoded with UTF-8 may use more than one byte per character, this may not be the actual offset of a given character.

Because UTF-8 strings may use more than one byte per character, this method returns a UTF-8 encoded string rather than a char. The string will contain the UTF-8 encoded character at the given index. If index specifies a positive value, it is taken to refer to the desired character position relative to the beginning of the string; in such cases, the first character is always located at index 1. If index specifies a negative value, it is taken to refer to the desired character position relative to the end of the string; in such cases, the last character is always located at index -1. If index specifies a value that is greater than the number of characters in thisString, then this routine returns an empty string.

Internally, this method calls LazUTF8.utf8Copy.
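The index rules described above can be sketched without the library. The program below is an illustration only: characterAt and codePointCount are hypothetical names, the input is assumed to be well-formed UTF-8, and the handling of index 0 and of negative values that reach past the start of the string (both return an empty string here) is an assumption, since the description does not cover those cases.

```pascal
program CharacterSketch;
{$mode objfpc}{$H+}

// Hypothetical helper: number of UTF-8 code points in s.
function codePointCount(const s: string): integer;
var
  i: integer;
begin
  Result := 0;
  for i := 1 to Length(s) do
    if (Ord(s[i]) and $C0) <> $80 then
      Inc(Result);
end;

// Hypothetical sketch of the documented index semantics.
function characterAt(const s: string; charIndex: integer): string;
var
  count, i, seen, startByte: integer;
begin
  Result := '';
  count := codePointCount(s);
  if charIndex < 0 then
    charIndex := count + charIndex + 1;  // -1 maps to the last character
  if (charIndex < 1) or (charIndex > count) then
    Exit;                                // out of range (partly an assumption)
  seen := 0;
  startByte := 0;
  for i := 1 to Length(s) do
    if (Ord(s[i]) and $C0) <> $80 then   // s[i] starts a new character
    begin
      Inc(seen);
      if seen = charIndex then
        startByte := i
      else if seen = charIndex + 1 then
      begin
        Result := Copy(s, startByte, i - startByte);
        Exit;
      end;
    end;
  // The requested character is the last one in the string.
  Result := Copy(s, startByte, Length(s) - startByte + 1);
end;

var
  sample: string;
begin
  sample := 'caf' + #$C3#$A9 + '!';            // five characters, six bytes
  WriteLn(Length(characterAt(sample, 4)));     // 2, the accented letter
  WriteLn(characterAt(sample, -1));            // !
  WriteLn(Length(characterAt(sample, 9)));     // 0, past the end
end.
```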

class function characterLength(const thisChar: string): integer;

Determine the number of bytes occupied by the given character.

This method simply calls LazUtf8.utf8CharacterLength and returns the result. If thisChar is an empty string, then this method returns zero (0). If thisChar contains more than one character, the remaining characters are ignored.
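In UTF-8, the first byte of a character encodes how many bytes the character occupies, which is why only the first character of thisChar matters. The sketch below shows that rule in isolation. leadByteSize is a hypothetical helper, not the LazUTF8 routine: it does not validate the continuation bytes, and returning 1 for an invalid lead byte is an assumption made for this example.

```pascal
program CharacterLengthSketch;
{$mode objfpc}{$H+}

// Hypothetical helper: byte length of the first UTF-8 character in ch,
// derived from the bit pattern of its lead byte.
function leadByteSize(const ch: string): integer;
var
  b: byte;
begin
  if ch = '' then
    Exit(0);
  b := Ord(ch[1]);
  if b < $80 then
    Result := 1                    // 0xxxxxxx: ASCII
  else if (b and $E0) = $C0 then
    Result := 2                    // 110xxxxx
  else if (b and $F0) = $E0 then
    Result := 3                    // 1110xxxx
  else if (b and $F8) = $F0 then
    Result := 4                    // 11110xxx
  else
    Result := 1;                   // invalid lead byte (assumption)
end;

begin
  WriteLn(leadByteSize('A'));                  // 1
  WriteLn(leadByteSize(#$C3#$A9));             // 2
  WriteLn(leadByteSize(#$E2#$82#$AC'uro'));    // 3, trailing text is ignored
  WriteLn(leadByteSize(''));                   // 0
end.
```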

class function hashValueOf(const thisString: string): int64;

Calculates the hash value of the given string.

This routine is useful for deriving sort keys for instances of ABinaryTree that accept strings as keys. It is therefore used heavily by AStringTree, ASymbolTable, and other classes.

The method makes use of the djb2 algorithm, originally written by Professor Daniel J. Bernstein and listed at http://www.cse.yorku.ca/~oz/hash.html.
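For reference, the classic form of djb2 starts from 5381 and, for each byte, multiplies the running hash by 33 and adds the byte value. The program below is a minimal sketch of that published algorithm. Whether Charstring.hashValueOf iterates over bytes or characters, and how it handles overflow, is not stated here, so the byte-wise loop and the wrapping int64 arithmetic are assumptions.

```pascal
program Djb2Sketch;
{$mode objfpc}{$H+}
{$Q-}{$R-}  // let the arithmetic wrap instead of raising on overflow

// Sketch of djb2: hash := hash * 33 + byte, starting from 5381.
function djb2(const s: string): int64;
var
  i: integer;
begin
  Result := 5381;
  for i := 1 to Length(s) do
    Result := Result * 33 + Ord(s[i]);
end;

begin
  WriteLn(djb2(''));    // 5381
  WriteLn(djb2('a'));   // 177670
  WriteLn(djb2('ab'));  // 5863208
end.
```

Because the arithmetic wraps, long inputs can produce negative int64 values in this sketch; that is harmless for use as a sort key as long as the same function is used for every key.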

class function chomp(var s: string; const delimiter: string): string;

Collects characters from the string until the specified delimiter is encountered.

This routine helps to tokenize a string which contains a list of values that are separated by an arbitrary delimiter. The method "bites off" each value from the string, using the value of delimiter to determine where to stop removing characters. s is truncated from the left side, removing both the delimiter and the value.

Successive calls to this routine using the same string will result in parsing every value from the string.

Returns

The next value removed from the string, or an empty string if there are no more values to remove. An empty string is also returned if either s or delimiter is an empty string. If no instance of delimiter is found in s, this routine will simply return s.
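The tokenizing loop this routine is meant for can be sketched as follows. chompSketch is a hypothetical standalone stand-in that follows the description above; it is not the library's code. One detail is an assumption: when no delimiter is found, the sketch also empties s, so that a loop over successive calls terminates.

```pascal
program ChompSketch;
{$mode objfpc}{$H+}

// Hypothetical stand-in for Charstring.chomp, based on its description.
function chompSketch(var s: string; const delimiter: string): string;
var
  p: SizeInt;
begin
  Result := '';
  if (s = '') or (delimiter = '') then
    Exit;
  p := Pos(delimiter, s);
  if p = 0 then
  begin
    Result := s;   // no delimiter left: the remainder is the last value
    s := '';       // assumption: the remainder is consumed
  end
  else
  begin
    Result := Copy(s, 1, p - 1);
    // Remove both the value and the delimiter from the left side of s.
    Delete(s, 1, p - 1 + Length(delimiter));
  end;
end;

var
  list: string;
begin
  list := 'alpha, beta, gamma';
  while list <> '' do
    WriteLn(chompSketch(list, ', '));  // alpha, then beta, then gamma
end.
```

A byte-wise search is sufficient here even for UTF-8 input, because a valid UTF-8 sequence can never match in the middle of another character.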

class function quote(const s: string; beginQuote: string = ''; endQuote: string = ''): string;

Return the quoted form of s.

This method adds beginQuote and endQuote to s and returns the result. If either beginQuote or endQuote is an empty string, the values in charsBeginQuoteCharacter and charsEndQuoteCharacter are used.

class function toUpperCase(const s: string; const language: string = ''): string;

Converts s to uppercase.

This method calls LazUtf8.utf8UpperCase. language should be specified using the two-character code given in ISO 639-1 (e.g., 'en' for English, 'tr' for Turkish, etc.), but it may also be omitted or left blank, in which case the locale is ignored.

If s is an empty string, or if s has no lowercase characters, this routine will return s unchanged.

class function toLowerCase(const s: string; const language: string = ''): string;

Converts s to lowercase.

This method calls LazUtf8.utf8LowerCase. language should be specified using the two-character code given in ISO 639-1 (e.g., 'en' for English, 'tr' for Turkish, etc.), but it may also be omitted or left blank, in which case the locale is ignored.

If s is an empty string, or if s has no uppercase characters, this routine will return s unchanged.

class function positionOf(const needle, haystack: string; startPos: integer = 1): integer;

Find the first occurrence of needle in haystack, optionally starting at the specified character offset.

This method calls LazUtf8.utf8Pos. If needle is found, the character offset of its location within haystack is returned. The first character of haystack will always be at position 1.

If needle is not found, then this routine returns zero (0).
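The distinction between a byte offset and a character offset can be seen in a short standalone program. This is a sketch under stated assumptions: charPositionOf and codePointCount are hypothetical helpers, the startPos parameter is left out for brevity, and the input is assumed to be well-formed UTF-8. It is not how LazUtf8.utf8Pos is implemented.

```pascal
program PositionSketch;
{$mode objfpc}{$H+}

// Hypothetical helper: number of UTF-8 code points in s.
function codePointCount(const s: string): integer;
var
  i: integer;
begin
  Result := 0;
  for i := 1 to Length(s) do
    if (Ord(s[i]) and $C0) <> $80 then
      Inc(Result);
end;

// Hypothetical sketch: find needle by bytes, then convert the byte
// offset into a character offset by counting the characters before it.
function charPositionOf(const needle, haystack: string): integer;
var
  bytePos: SizeInt;
begin
  bytePos := Pos(needle, haystack);
  if bytePos = 0 then
    Exit(0);
  Result := codePointCount(Copy(haystack, 1, bytePos - 1)) + 1;
end;

var
  text: string;
begin
  // "naive cafe" with two accented letters, written as explicit bytes.
  text := 'na' + #$C3#$AF + 've caf' + #$C3#$A9;
  WriteLn(Pos('caf', text));             // 8, a byte offset
  WriteLn(charPositionOf('caf', text));  // 7, a character offset
  WriteLn(charPositionOf('tea', text));  // 0, not found
end.
```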

class function lastPositionOf(const needle, haystack: string; startPos: integer = 1): integer;

Find the last occurrence of needle in haystack, optionally starting at the specified character offset.

This method calls LazUtf8.utf8Pos. If needle is found, the character offset of its location within haystack is returned. The last character of haystack will always be at the position given by Charstring.census(haystack).

If needle is not found, then this routine returns zero (0).

class function endsWith(const s: string; const ending: string): boolean;

Determines whether s ends with the string specified by ending.

This method checks whether the characters at the end of s all match those in ending; if so, it returns True; otherwise, it returns False. If either s or ending is an empty string, this routine returns False.

class function replaceEnding(const s: string; const ending: string; const replacement: string): string;

Determines whether s ends with the string specified by ending and, if so, replaces it with replacement.

If s does not end with ending, or if any of the strings passed are empty, this routine returns s unchanged.
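The documented behavior of endsWith and replaceEnding, including the empty-string cases, can be sketched with plain RTL calls. The two functions below are hypothetical stand-ins written from the descriptions above, not the library's code. Comparing bytes is sufficient for this purpose when both strings are valid UTF-8.

```pascal
program EndingSketch;
{$mode objfpc}{$H+}

// Hypothetical stand-in for Charstring.endsWith.
function endsWithSketch(const s, ending: string): boolean;
begin
  Result := (s <> '') and (ending <> '') and
            (Length(ending) <= Length(s)) and
            (Copy(s, Length(s) - Length(ending) + 1, Length(ending)) = ending);
end;

// Hypothetical stand-in for Charstring.replaceEnding.
function replaceEndingSketch(const s, ending, replacement: string): string;
begin
  Result := s;
  if replacement = '' then
    Exit;  // documented: any empty argument leaves s unchanged
  if endsWithSketch(s, ending) then
    Result := Copy(s, 1, Length(s) - Length(ending)) + replacement;
end;

begin
  WriteLn(replaceEndingSketch('report.txt', '.txt', '.md'));  // report.md
  WriteLn(replaceEndingSketch('report.txt', '.doc', '.md'));  // report.txt
  WriteLn(replaceEndingSketch('report.txt', '.txt', ''));     // report.txt
end.
```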

class function isUppercase(const s: string): boolean;

Determine if s consists entirely of uppercase characters.

This method might also be said to determine if s contains no lowercase characters. It compares every character in s against those specified by charsLowercaseLetters. If it finds a match, then the routine returns False; otherwise, it returns True.

class function isLowercase(const s: string): boolean;

Determine if s consists entirely of lowercase characters.

This method might also be said to determine if s contains no uppercase characters. It compares every character in s against those specified by charsUppercaseLetters. If it finds a match, then the routine returns False; otherwise, it returns True.

class function expandCamelCase(const s: string): string;

Expands a string that is written in CamelCase, adding spaces where a capital letter follows a lowercase letter.

This method processes each character in the string and determines whether that character is a capital letter by calling Charstring.isUppercase. If the character is uppercase and follows a lowercase letter, then a space is added to the output before it.
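The rule "insert a space where a capital follows a lowercase letter" can be sketched as below. expandCamelCaseSketch is a hypothetical, ASCII-only illustration; the library routine relies on Charstring.isUppercase, which presumably covers more than the letters A to Z.

```pascal
program CamelCaseSketch;
{$mode objfpc}{$H+}

// Hypothetical, ASCII-only sketch of the expansion rule.
function expandCamelCaseSketch(const s: string): string;
var
  i: integer;
begin
  Result := '';
  for i := 1 to Length(s) do
  begin
    // A space goes in front of a capital that follows a lowercase letter.
    if (i > 1) and (s[i] in ['A'..'Z']) and (s[i - 1] in ['a'..'z']) then
      Result := Result + ' ';
    Result := Result + s[i];
  end;
end;

begin
  WriteLn(expandCamelCaseSketch('expandCamelCase'));    // expand Camel Case
  WriteLn(expandCamelCaseSketch('parseHTMLDocument'));  // parse HTMLDocument
end.
```

The second call shows a consequence of the rule as stated: a run of capitals is not split, because no capital in the run follows a lowercase letter.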

class function ofCharacter(const thisCharacter: string; const count: longword): string;

Return a string that consists of the specified character repeated the specified number of times.

This method constructs a new string that contains count repetitions of the character specified by thisCharacter. thisCharacter may specify a UTF-8 or ASCII-encoded character. Only the first such character from thisCharacter is used; all others are ignored.

If thisCharacter is an empty string or count is zero (0), this method returns an empty string.
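The "first character only" rule matters for multi-byte characters, where the first character is more than one byte long. The program below is a sketch of that behavior using only the RTL. ofCharacterSketch is a hypothetical stand-in, not the library's code, and it assumes thisCharacter is well-formed UTF-8.

```pascal
program OfCharacterSketch;
{$mode objfpc}{$H+}

// Hypothetical stand-in for Charstring.ofCharacter.
function ofCharacterSketch(const thisCharacter: string;
  const count: longword): string;
var
  first: string;
  n: integer;
  i: longword;
begin
  Result := '';
  if (thisCharacter = '') or (count = 0) then
    Exit;
  // Extend over the continuation bytes (10xxxxxx) of the first character.
  n := 1;
  while (n < Length(thisCharacter)) and
        ((Ord(thisCharacter[n + 1]) and $C0) = $80) do
    Inc(n);
  first := Copy(thisCharacter, 1, n);
  for i := 1 to count do
    Result := Result + first;
end;

begin
  WriteLn(ofCharacterSketch('-', 5));                     // -----
  WriteLn(ofCharacterSketch('ab', 3));                    // aaa
  WriteLn(Length(ofCharacterSketch(#$E2#$82#$AC, 2)));    // 6 bytes, 2 characters
end.
```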

Generated by PasDoc 0.13.0 on 2015-01-10 17:13:18
