Class Charstring

Unit

Declaration

type Charstring = class(TObject)

Description

This class acts as a repository for some string functions that are used by various classes. These routines will work for Unicode strings encoded with UTF-8 as well as ASCII-encoded strings.

Hierarchy

  • TObject
  • Charstring

Overview

Methods

Public class function census(const thisString: string): integer;
Public class function character(const thisString: string; index: integer): string;
Public class function characterLength(const thisChar: string): integer;
Public class function hashValueOf(const thisString: string): int64;
Public class function chomp(var s: string; const delimiter: string): string;
Public class function quote(const s: string; beginQuote: string = ''; endQuote: string = ''): string;
Public class function toUpperCase(const s: string; const language: string = ''): string;
Public class function toLowerCase(const s: string; const language: string = ''): string;
Public class function positionOf(const needle, haystack: string; startPos: integer = 1): integer;
Public class function lastPositionOf(const needle, haystack: string; startPos: integer = 1): integer;
Public class function endsWith(const s: string; const ending: string): boolean;
Public class function replaceEnding(const s: string; const ending: string; const replacement: string): string;
Public class function isUppercase(const s: string): boolean;
Public class function isLowercase(const s: string): boolean;
Public class function expandCamelCase(const s: string): string;
Public class function ofCharacter(const thisCharacter: string; const count: longword): string;

Description

Methods

Public class function census(const thisString: string): integer;

Retrieve the number of characters in the given string.

This method differs from System.length in that it counts the number of UTF-8 characters in the string, not the total number of bytes. As of this writing, System.length assumes that an AnsiString has exactly one byte per character; since UTF-8 strings may use several bytes per character, System.length is not an accurate way to determine the total number of characters in a string.

Internally, this method calls LazUTF8.utf8Length and returns the result.
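
The difference between a byte count and a character count can be illustrated without LazUTF8. The following is a minimal sketch, not the implementation of census: countCodePoints is a hypothetical helper that assumes valid UTF-8 input and counts every byte that is not a continuation byte.

```pascal
program CensusSketch;
{$mode objfpc}{$H+}

// Hypothetical helper (not part of Charstring): counts UTF-8 code points
// by skipping continuation bytes, which always look like 10xxxxxx.
// Assumes the input is valid UTF-8.
function countCodePoints(const s: string): integer;
var
  i: integer;
begin
  result := 0;
  for i := 1 to System.length(s) do
    if (ord(s[i]) and $C0) <> $80 then
      inc(result);
end;

var
  sample: string;
begin
  // "café", with the final letter written as explicit UTF-8 bytes so that
  // the encoding of this source file does not matter
  sample := 'caf' + #$C3#$A9;
  writeln(System.length(sample));    // 5 (bytes)
  writeln(countCodePoints(sample));  // 4 (characters)
end.
```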

Public class function character(const thisString: string; index: integer): string;

Retrieve the character at the given character position within the string.

This method should be used to obtain the character at the given character position within a string. It is potentially incorrect to simply index the string, as you would do with a typical Pascal string. As of this writing, the compiler assumes that a typical AnsiString uses exactly one byte per character; thus, indexing the string will give you a byte offset into the string – but because strings encoded with UTF-8 may use more than one byte per character, this may not be the actual offset of a given character.

Because UTF-8 strings may use more than one byte per character, this method returns a UTF-8 encoded string rather than a char. The string will contain the UTF-8 encoded character at the given index.

If index specifies a positive value, it is taken to refer to the desired character position relative to the beginning of the string; in such cases, the first character is always located at index 1. If index specifies a negative value, it is taken to refer to the desired character position relative to the end of the string; in such cases, the last character is always located at index -1. If index specifies a value that is greater than the number of characters in thisString, then this routine returns an empty string.

Internally, this method calls LazUTF8.utf8Copy.
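
The indexing rules described above can be sketched in plain Pascal. This is an illustration under stated assumptions rather than the actual implementation: sequenceLength, countCodePoints and characterAt are hypothetical names, the input is assumed to be valid UTF-8, and returning an empty string for index 0 or for a negative index that reaches past the start of the string is an assumption, since the documentation does not cover those cases.

```pascal
program CharacterSketch;
{$mode objfpc}{$H+}

// Hypothetical helper: byte length of a UTF-8 sequence, judged from its lead byte
function sequenceLength(const lead: byte): integer;
begin
  if lead < $80 then result := 1
  else if (lead and $E0) = $C0 then result := 2
  else if (lead and $F0) = $E0 then result := 3
  else if (lead and $F8) = $F0 then result := 4
  else result := 1; // invalid lead byte: step over a single byte
end;

// Hypothetical helper: number of code points in a valid UTF-8 string
function countCodePoints(const s: string): integer;
var
  i: integer;
begin
  result := 0;
  for i := 1 to System.length(s) do
    if (ord(s[i]) and $C0) <> $80 then
      inc(result);
end;

// Hypothetical stand-in for the documented behaviour of character()
function characterAt(const s: string; index: integer): string;
var
  bytePos, charPos: integer;
begin
  result := '';
  // A negative index counts from the end: -1 is the last character
  if index < 0 then
    index := countCodePoints(s) + index + 1;
  // Assumption: index 0, or a negative index past the start, yields ''
  if index < 1 then
    exit;
  bytePos := 1;
  charPos := 1;
  while bytePos <= System.length(s) do
  begin
    if charPos = index then
    begin
      result := copy(s, bytePos, sequenceLength(ord(s[bytePos])));
      exit;
    end;
    inc(bytePos, sequenceLength(ord(s[bytePos])));
    inc(charPos);
  end;
  // Falling out of the loop means index exceeds the character count
end;

var
  sample: string;
begin
  sample := 'caf' + #$C3#$A9;                     // "café" as UTF-8 bytes
  writeln(characterAt(sample, 1));                 // c
  writeln(System.length(characterAt(sample, 4)));  // 2 (one character, two bytes)
  writeln(System.length(characterAt(sample, -1))); // 2 (same character)
  writeln(System.length(characterAt(sample, 9)));  // 0 (out of range)
end.
```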

Public class function characterLength(const thisChar: string): integer;

Determine the number of bytes occupied by the given character.

This method simply calls LazUtf8.utf8CharacterLength and returns the result. If thisChar is an empty string, then this method returns zero (0). If thisChar contains more than one character, the remaining characters are ignored.

Public class function hashValueOf(const thisString: string): int64;

Calculates the hash value of the given string.

This routine is useful for deriving sort keys for instances of ABinaryTree that accept strings as keys. It is therefore used heavily by AStringTree, ASymbolTable, and other classes.

The method makes use of the djb2 algorithm, originally written by Professor Daniel J. Bernstein and listed at http://www.cse.yorku.ca/~oz/hash.html.
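
For reference, the djb2 algorithm starts from 5381 and, for each input byte, multiplies the running hash by 33 and adds the byte. The following is a minimal sketch of that recurrence, not the code of hashValueOf itself: the function name djb2 is hypothetical, and hashing byte by byte with wrapping int64 arithmetic is an assumption.

```pascal
program Djb2Sketch;
{$mode objfpc}{$H+}
{$Q-}{$R-} // let the arithmetic wrap on overflow, as the C original does

// Hypothetical sketch of djb2: hash := hash * 33 + byte, seeded with 5381.
// Assumption: the string is hashed byte by byte.
function djb2(const s: string): int64;
var
  i: integer;
begin
  result := 5381;
  for i := 1 to System.length(s) do
    result := result * 33 + ord(s[i]);
end;

begin
  writeln(djb2(''));   // 5381
  writeln(djb2('a'));  // 177670
end.
```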

Public class function chomp(var s: string; const delimiter: string): string;

Collects characters from the string until the specified delimiter is encountered.

This routine helps to tokenize a string which contains a list of values that are separated by an arbitrary delimiter. The method "bites off" each value from the string, using the value of delimiter to determine where to stop removing characters. s is truncated from the left side, removing both the delimiter and the value.

Successive calls to this routine using the same string will result in parsing every value from the string.

Returns

The next value removed from the string, or an empty string if there are no more values to remove. An empty string is also returned if either s or delimiter is an empty string. If no instance of delimiter is found in s, this routine will simply return s.
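
The tokenizing loop described above might look like the following. This is a minimal sketch and not the actual implementation: chompSketch is a hypothetical stand-in, and emptying s when no delimiter remains is an assumption, made so that successive calls eventually return an empty string.

```pascal
program ChompSketch;
{$mode objfpc}{$H+}

// Hypothetical stand-in for the documented behaviour of chomp()
function chompSketch(var s: string; const delimiter: string): string;
var
  where: integer;
begin
  result := '';
  if (s = '') or (delimiter = '') then
    exit;
  where := pos(delimiter, s);
  if where = 0 then
  begin
    // No delimiter left: return the remainder.
    // Assumption: s is emptied so that the next call returns ''.
    result := s;
    s := '';
    exit;
  end;
  // Return the value and cut both the value and the delimiter from s
  result := copy(s, 1, where - 1);
  delete(s, 1, where - 1 + System.length(delimiter));
end;

var
  list: string;
begin
  list := 'red,green,blue';
  // Prints red, green and blue, one per line
  while list <> '' do
    writeln(chompSketch(list, ','));
end.
```

Searching by bytes with pos is safe for UTF-8 text because a valid UTF-8 delimiter cannot match in the middle of another character.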

Public class function quote(const s: string; beginQuote: string = ''; endQuote: string = ''): string;

Return the quoted form of s.

This method places beginQuote before s and endQuote after it, then returns the result. If either beginQuote or endQuote is an empty string, the value of charsBeginQuoteCharacter or charsEndQuoteCharacter, respectively, is used in its place.

Public class function toUpperCase(const s: string; const language: string = ''): string;

Converts s to uppercase.

This method calls LazUtf8.utf8UpperCase. language should be specified using the two-character code given in ISO 639-1 (e.g., 'en' for English, 'tr' for Turkish, etc.), but it may also be omitted or left blank, in which case the locale is ignored.

If s is an empty string, or if s has no lowercase characters, this routine will return s unchanged.

Public class function toLowerCase(const s: string; const language: string = ''): string;

Converts s to lowercase.

This method calls LazUtf8.utf8LowerCase. language should be specified using the two-character code given in ISO 639-1 (e.g., 'en' for English, 'tr' for Turkish, etc.), but it may also be omitted or left blank, in which case the locale is ignored.

If s is an empty string, or if s has no uppercase characters, this routine will return s unchanged.

Public class function positionOf(const needle, haystack: string; startPos: integer = 1): integer;

Find the first occurrence of needle in haystack, optionally starting at the specified character offset.

This method calls LazUtf8.utf8Pos. If needle is found, the character offset of its location within haystack is returned. The first character of haystack will always be at position 1.

If needle is not found, then this routine returns zero (0).

Public class function lastPositionOf(const needle, haystack: string; startPos: integer = 1): integer;

Find the last occurrence of needle in haystack, optionally starting at the specified character offset.

This method calls LazUtf8.utf8Pos. If needle is found, the character offset of its location within haystack is returned. The last character of haystack will always be at position Charstring.census(haystack).

If needle is not found, then this routine returns zero (0).

Public class function endsWith(const s: string; const ending: string): boolean;

Determines whether s ends with the string specified by ending.

This method checks whether the characters at the end of s all match those in ending; if so, it returns True; otherwise, it returns False. If either s or ending is an empty string, this routine returns False.

Public class function replaceEnding(const s: string; const ending: string; const replacement: string): string;

Determines whether s ends with the string specified by ending and, if so, replaces it with replacement.

If s does not end with ending, or if any of the strings passed are empty, this routine returns s unchanged.
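
The rules for endsWith and replaceEnding, including their handling of empty strings, can be sketched as follows. Both function names are hypothetical stand-ins, and comparing bytes with copy is an assumption about how the match might be done, not a description of the actual implementation.

```pascal
program EndingSketch;
{$mode objfpc}{$H+}

// Hypothetical stand-in for endsWith(): False when either string is empty
function endsWithSketch(const s, ending: string): boolean;
var
  offset: integer;
begin
  result := false;
  if (s = '') or (ending = '') then
    exit;
  offset := System.length(s) - System.length(ending);
  if offset < 0 then
    exit; // ending is longer than s, so it cannot match
  result := copy(s, offset + 1, System.length(ending)) = ending;
end;

// Hypothetical stand-in for replaceEnding(): s comes back unchanged when
// any argument is empty or when s does not end with ending
function replaceEndingSketch(const s, ending, replacement: string): string;
begin
  result := s;
  if replacement = '' then
    exit;
  if endsWithSketch(s, ending) then
    result := copy(s, 1, System.length(s) - System.length(ending)) + replacement;
end;

begin
  writeln(endsWithSketch('cities', 'ies'));            // TRUE
  writeln(replaceEndingSketch('cities', 'ies', 'y'));  // city
  writeln(replaceEndingSketch('cities', 'ies', ''));   // cities
  writeln(replaceEndingSketch('cities', 'xyz', 'y'));  // cities
end.
```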

Public class function isUppercase(const s: string): boolean;

Determine if s consists entirely of uppercase characters.

This method might also be said to determine if s contains no lowercase characters. It compares every character in s against those specified by charsLowercaseLetters. If it finds a match, then the routine returns False; otherwise, it returns True.

Public class function isLowercase(const s: string): boolean;

Determine if s consists entirely of lowercase characters.

This method might also be said to determine if s contains no uppercase characters. It compares every character in s against those specified by charsUppercaseLetters. If it finds a match, then the routine returns False; otherwise, it returns True.

Public class function expandCamelCase(const s: string): string;

Expands a string that is written in CamelCase, adding spaces where a capital letter follows a lowercase letter.

This method processes each character in the string and determines whether that character is a capital letter by calling Charstring.isUppercase. If the character is uppercase and follows a lowercase letter, then a space is added to the output.
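
The expansion rule can be illustrated with a sketch restricted to ASCII letters. expandCamelCaseSketch is a hypothetical name, and the restriction to 'A'..'Z' and 'a'..'z' is an assumption made to keep the example short; the documented method relies on Charstring.isUppercase instead.

```pascal
program CamelCaseSketch;
{$mode objfpc}{$H+}

// Hypothetical, ASCII-only stand-in for expandCamelCase(): a space is
// inserted before a capital letter that follows a lowercase letter
function expandCamelCaseSketch(const s: string): string;
var
  i: integer;
begin
  result := '';
  for i := 1 to System.length(s) do
  begin
    if (i > 1) and (s[i] in ['A'..'Z']) and (s[i - 1] in ['a'..'z']) then
      result := result + ' ';
    result := result + s[i];
  end;
end;

begin
  writeln(expandCamelCaseSketch('expandCamelCase'));  // expand Camel Case
  writeln(expandCamelCaseSketch('parseXMLFile'));     // parse XMLFile
end.
```

Because a space is added only after a lowercase letter, a run of capitals such as "XML" stays together.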

Public class function ofCharacter(const thisCharacter: string; const count: longword): string;

Return a string that consists of the specified character repeated the specified number of times.

This method constructs a new string that contains count repetitions of the character specified by thisCharacter. thisCharacter may specify a UTF-8 or ASCII-encoded character. Only the first such character from thisCharacter is used; all others are ignored.

If thisCharacter is an empty string or count is zero (0), this method does nothing and returns an empty string.
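
The behaviour described above, including the rule that only the first character of thisCharacter is used, can be sketched as follows. repeatCharacter and sequenceLength are hypothetical names, and the input is assumed to be valid UTF-8; this is an illustration, not the actual implementation.

```pascal
program OfCharacterSketch;
{$mode objfpc}{$H+}

// Hypothetical helper: byte length of a UTF-8 sequence, judged from its lead byte
function sequenceLength(const lead: byte): integer;
begin
  if lead < $80 then result := 1
  else if (lead and $E0) = $C0 then result := 2
  else if (lead and $F0) = $E0 then result := 3
  else if (lead and $F8) = $F0 then result := 4
  else result := 1; // invalid lead byte: take a single byte
end;

// Hypothetical stand-in for the documented behaviour of ofCharacter()
function repeatCharacter(const thisCharacter: string; const count: longword): string;
var
  first: string;
  i: longword;
begin
  result := '';
  if (thisCharacter = '') or (count = 0) then
    exit;
  // Keep only the first character, however many bytes it occupies
  first := copy(thisCharacter, 1, sequenceLength(ord(thisCharacter[1])));
  for i := 1 to count do
    result := result + first;
end;

begin
  writeln(repeatCharacter('-', 5));    // -----
  writeln(repeatCharacter('ab', 3));   // aaa
  // A two-byte character repeated twice occupies four bytes
  writeln(System.length(repeatCharacter(#$C3#$A9 + 'x', 2)));  // 4
end.
```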


Generated by PasDoc 0.13.0 on 2015-01-10 17:13:18