Class AParsedLanguage

Unit

parsing

Declaration

type AParsedLanguage = class(AnObject)

Description

This class serves as the basis for defining a language that will be parsed. It is designed to provide a single point of reference for a parser to use as a source stream is processed; as such, it provides routines to determine the category of a given character or token, and a way to look up rules.

To define a new language to parse, define a descendant of this class and implement the methods that are called on construction: AParsedLanguage.defineCharacterCategories, AParsedLanguage.defineOpcodes, and AParsedLanguage.defineRules.

Hierarchy

TObject
AnObject
AParsedLanguage

Overview

Fields

	`myCaseAwareness: boolean;`
	`myCharacterCategories: array[CHARCAT_SPECIAL..CHARCAT_EOS] of string;`
	`MyOpcodes: AnOpcodeDictionary;`
	`MySyntaxRules: ASyntaxRuleset;`

Methods

	`procedure defineCharacterCategories; virtual;`
	`procedure defineOpcodes; virtual; abstract;`
	`procedure defineRules; virtual;`
	`constructor new; override;`
	`function init: boolean; override;`
	`destructor destroy; override;`
	`function categoryOf(const ch: string): ACharacterCategory; virtual;`
	`function characterIn(const ch: string; const categoryList: array of ACharacterCategory): ACharacterCategory; virtual;`
	`function opcodeFor(const tokenString: string): TOpcode; virtual;`
	`function caseAware: boolean;`
	`function characterCategory(const category: ACharacterCategory): string; virtual;`
	`function setCharacterCategory(const category: ACharacterCategory; const newSet: string): string; virtual;`
	`function Opcodes: AnOpcodeDictionary;`
	`function SyntaxRule(const rule: TSortKey): ASyntaxRule;`
	`function SyntaxRules: ASyntaxRuleset;`

Description

Fields

	`myCaseAwareness: boolean;`
Specifies whether or not the language is case aware

	`myCharacterCategories: array[CHARCAT_SPECIAL..CHARCAT_EOS] of string;`
Stores the character categories recognized by the language

	`MyOpcodes: AnOpcodeDictionary;`
Used to manage the opcodes recognized by the language

	`MySyntaxRules: ASyntaxRuleset;`
Used to manage the syntax rules defined by the language

Methods

procedure defineCharacterCategories; virtual;

Define the character categories used by the language.

This method should be overridden by descendant classes. It is called automatically by the constructor. Descendant classes should first call the inherited routine, then use the method to define which characters belong in the various character categories; this allows a tokenizer that references the language to determine what type of token to create when a given character is encountered in the source.

The base implementation of this routine first sets all character category strings to empty strings. It then assigns the categories as follows:

CHARCAT_LETTER is set to plcsTypicalLetter
CHARCAT_WORD is set to plcsTypicalWord
CHARCAT_DIGIT is set to plcsTypicalDigit
CHARCAT_NUMERIC is set to plcsTypicalNumeric
CHARCAT_SPACE is set to plcsTypicalWhitespace
CHARCAT_EOL is set to plcsTypicalEndOfLine
CHARCAT_EOS is set to plcsTypicalEndOfStream

procedure defineOpcodes; virtual; abstract;

Define the opcodes used by the language.

This method should be implemented by descendant classes. It is called automatically by the constructor. Descendant classes should use the method to match token strings with opcodes, usually by calling AnOpcodeDictionary.bind or AnOpcodeDictionary.bindSeveral on Self.Opcodes. This allows AParsedLanguage.opcodeFor to match a given token string with its internal representation.

procedure defineRules; virtual;

Define the rules used by the language.

This method should be overridden by descendant classes. It is called automatically by the constructor. Descendant classes should use the method to establish syntax rules for the language. This allows AParsedLanguage.SyntaxRule to return the rule required by a parser that needs to determine if a specific token is allowed to be positioned where it is found in the source.

The base implementation of this routine creates two default, empty rules: RULE_BEGIN_STATEMENT and RULE_END_STATEMENT. If these two rules are desired in a given language implementation, then descendant classes should call the inherited routine before installing their own rules.

constructor new; override;

Construct a new instance of the language. You will likely need only one instance of a given language at any given time.

The constructor calls the inherited routine, after which it calls three initialization routines: AParsedLanguage.defineCharacterCategories, AParsedLanguage.defineOpcodes, and AParsedLanguage.defineRules. These three methods should be implemented by descendant classes to define the character categories, opcodes, and rules used by the language.
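
The call sequence described above can be sketched as a small program. This is a minimal sketch, not the library's implementation: ASketchLanguage is a hypothetical stand-in for AParsedLanguage that models only the order of the three calls, and ACalculatorLanguage is a hypothetical descendant. The real class also manages the opcode dictionary and the syntax ruleset, which are left out here.

```pascal
program LanguageSkeletonSketch;

{$mode objfpc}{$H+}

type
  { Hypothetical stand-in for AParsedLanguage: models only the
    constructor's call sequence, not the real fields or dictionaries }
  ASketchLanguage = class
  protected
    procedure defineCharacterCategories; virtual;
    procedure defineOpcodes; virtual; abstract;
    procedure defineRules; virtual;
  public
    constructor new; virtual;
  end;

  { Hypothetical descendant describing a small calculator language }
  ACalculatorLanguage = class(ASketchLanguage)
  protected
    procedure defineCharacterCategories; override;
    procedure defineOpcodes; override;
    procedure defineRules; override;
  end;

constructor ASketchLanguage.new;
begin
  inherited Create;
  { Virtual dispatch means the descendant's overrides run here }
  defineCharacterCategories;
  defineOpcodes;
  defineRules;
end;

procedure ASketchLanguage.defineCharacterCategories;
begin
  writeLn('base: default character categories');
end;

procedure ASketchLanguage.defineRules;
begin
  writeLn('base: default begin/end statement rules');
end;

procedure ACalculatorLanguage.defineCharacterCategories;
begin
  { Keep the defaults first, then adjust them }
  inherited defineCharacterCategories;
  writeLn('calculator: extra character categories');
end;

procedure ACalculatorLanguage.defineOpcodes;
begin
  { No inherited call: the base method is abstract }
  writeLn('calculator: opcodes');
end;

procedure ACalculatorLanguage.defineRules;
begin
  { Keep the two default rules, then add language-specific ones }
  inherited defineRules;
  writeLn('calculator: extra rules');
end;

var
  language: ASketchLanguage;

begin
  language := ACalculatorLanguage.new;
  try
    writeLn('language ready');
  finally
    language.free;
  end;
end.
```

Because the three methods are virtual, the overrides in the descendant run during construction even though the calls are made from the base constructor.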

	`function init: boolean; override;`
Initializer

destructor destroy; override;

Destroy the language instance.

This method is called automatically when TObject.free is called on the language instance. It frees the syntax ruleset and opcode dictionary used by the language before calling the inherited routine.

function categoryOf(const ch: string): ACharacterCategory; virtual;

Retrieve the category to which the given character belongs.

This method searches for an instance of ch in the various character category definitions provided by AParsedLanguage.myCharacterCategories. If a match is found, the appropriate character category is returned. If no match is found for the given character, then this method returns CHARCAT_ERROR.

ch is passed as a string so that the language definition can support both UTF-8 and ASCII characters.
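
The lookup described above can be sketched with a substring search over per-category strings. This is a sketch under stated assumptions: the integer constants SC_ERROR, SC_LETTER, SC_DIGIT, and SC_SPACE and the table CATEGORY_CHARS are hypothetical stand-ins for ACharacterCategory values and for AParsedLanguage.myCharacterCategories, and the use of pos is an assumption about how such a search could be written, not a statement of how the library does it.

```pascal
program CategoryLookupSketch;

{$mode objfpc}{$H+}

const
  SC_ERROR = 0;   { hypothetical stand-in for CHARCAT_ERROR }
  SC_LETTER = 1;
  SC_DIGIT = 2;
  SC_SPACE = 3;

  { Hypothetical category strings. The letter set ends with the two
    bytes that encode U+00E9 in UTF-8, to show a multi-byte character }
  CATEGORY_CHARS: array[SC_LETTER..SC_SPACE] of string = (
    'abcdefghijklmnopqrstuvwxyz'#$C3#$A9,
    '0123456789',
    ' '#9
  );

{ Return the first category whose string contains ch, or SC_ERROR }
function categoryOf(const ch: string): integer;
var
  category: integer;
begin
  result := SC_ERROR;
  if ch = '' then
    exit;
  for category := low(CATEGORY_CHARS) to high(CATEGORY_CHARS) do
    if pos(ch, CATEGORY_CHARS[category]) > 0 then
    begin
      result := category;
      exit;
    end;
end;

begin
  { Self-checks: the program halts with exit code 1 on a mismatch }
  if categoryOf('q') <> SC_LETTER then halt(1);
  if categoryOf(#$C3#$A9) <> SC_LETTER then halt(1);
  if categoryOf('7') <> SC_DIGIT then halt(1);
  if categoryOf('?') <> SC_ERROR then halt(1);
  writeLn('all lookups behaved as expected');
end.
```

Passing ch as a string lets a character that occupies two or more bytes in UTF-8 be matched in a single call, which a Char parameter could not do.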

function characterIn(const ch: string; const categoryList: array of ACharacterCategory): ACharacterCategory; virtual;

Determine whether the specified character is defined in one of the specified character categories.

This method searches for an instance of ch in the character category definitions specified by categoryList. The first category that has the specified character is returned. If none of the categories specified contains ch, then this method returns CHARCAT_DUMMY.

ch is passed as a string so that the language definition can support both UTF-8 and ASCII characters.
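
A restricted search of this kind can be sketched as follows. The constants and the CATEGORY_CHARS table are hypothetical stand-ins (SC_DUMMY plays the role of CHARCAT_DUMMY), and skipping out-of-range category values is an assumption of the sketch rather than documented behavior.

```pascal
program CharacterInSketch;

{$mode objfpc}{$H+}

const
  SC_DUMMY = 0;   { hypothetical stand-in for CHARCAT_DUMMY }
  SC_LETTER = 1;
  SC_DIGIT = 2;
  SC_SPACE = 3;

  { Hypothetical category strings }
  CATEGORY_CHARS: array[SC_LETTER..SC_SPACE] of string = (
    'abcdefghijklmnopqrstuvwxyz_',
    '0123456789',
    ' '#9
  );

{ Return the first category in categoryList that contains ch,
  or SC_DUMMY when none of the listed categories contains it }
function characterIn(const ch: string;
  const categoryList: array of integer): integer;
var
  i, category: integer;
begin
  result := SC_DUMMY;
  if ch = '' then
    exit;
  for i := low(categoryList) to high(categoryList) do
  begin
    category := categoryList[i];
    { Assumption: unknown category values are skipped }
    if (category < low(CATEGORY_CHARS)) or
       (category > high(CATEGORY_CHARS)) then
      continue;
    if pos(ch, CATEGORY_CHARS[category]) > 0 then
    begin
      result := category;
      exit;
    end;
  end;
end;

begin
  { Self-checks: the program halts with exit code 1 on a mismatch }
  if characterIn('7', [SC_LETTER, SC_DIGIT]) <> SC_DIGIT then halt(1);
  if characterIn('7', [SC_LETTER, SC_SPACE]) <> SC_DUMMY then halt(1);
  writeLn('restricted lookups behaved as expected');
end.
```

A tokenizer could use a call like this to decide whether the next character continues the current token, for example by listing only the letter and digit categories while it reads an identifier.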

function opcodeFor(const tokenString: string): TOpcode; virtual;

Determine whether the given token string has a matching opcode; that is, whether the given token string is a recognized keyword, operator, or special character.

This method searches the opcode dictionary defined for the language and, if tokenString is found, returns the matching opcode. Otherwise, it returns TOKCAT_DUMMY.

If AParsedLanguage.caseAware is False, then tokenString is converted to lower case before it is checked. Otherwise, the token string is checked exactly as provided by the calling routine.

function caseAware: boolean;

Determine whether or not the language is case aware.

This method checks the value of AParsedLanguage.myCaseAwareness, which is set by the language instance when it is constructed. If the language is case aware, then token strings passed to AParsedLanguage.opcodeFor must exactly match the case specified for each keyword, operator, or special character when the language was defined. Otherwise, the token strings are converted to lower case before they are checked.
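
The effect of case awareness on an opcode lookup can be sketched as follows. Everything here is a stand-in: the KEYWORDS and OPCODES arrays replace the real AnOpcodeDictionary, OP_DUMMY plays the role of TOKCAT_DUMMY, and the caseAware flag is passed as a parameter only to keep the sketch self-contained.

```pascal
program CaseAwarenessSketch;

{$mode objfpc}{$H+}

uses
  SysUtils;

const
  OP_DUMMY = 0;   { hypothetical stand-in for TOKCAT_DUMMY }
  OP_BEGIN = 1;
  OP_END = 2;

  { Hypothetical keyword table, defined in lower case }
  KEYWORDS: array[1..2] of string = ('begin', 'end');
  OPCODES: array[1..2] of integer = (OP_BEGIN, OP_END);

{ Look up tokenString; fold it to lower case first unless the
  language is case aware }
function opcodeFor(const tokenString: string;
  const caseAware: boolean): integer;
var
  key: string;
  i: integer;
begin
  result := OP_DUMMY;
  if caseAware then
    key := tokenString
  else
    key := lowerCase(tokenString);
  for i := low(KEYWORDS) to high(KEYWORDS) do
    if KEYWORDS[i] = key then
    begin
      result := OPCODES[i];
      exit;
    end;
end;

begin
  { Self-checks: the program halts with exit code 1 on a mismatch }
  if opcodeFor('BEGIN', false) <> OP_BEGIN then halt(1);
  if opcodeFor('BEGIN', true) <> OP_DUMMY then halt(1);
  if opcodeFor('end', true) <> OP_END then halt(1);
  writeLn('case handling behaved as expected');
end.
```

One consequence of folding to lower case is that a language that is not case aware should define its keywords in lower case, since an upper-case definition could never match the folded token string.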

	`function characterCategory(const category: ACharacterCategory): string; virtual;`
Retrieve all characters defined by the language for the given category.

function setCharacterCategory(const category: ACharacterCategory; const newSet: string): string; virtual;

Define the characters which belong in the given category.

This method overwrites any characters previously defined for the specified category with the newSet specified. Although character categories are normally defined when the language instance is created (by AParsedLanguage.defineCharacterCategories), this method makes it possible to amend or replace these definitions on the fly.

Returns

The previous characters defined for the given category, if any.
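
The replace-and-return behavior can be sketched as follows. The categories array and the SC_ constants are hypothetical stand-ins for AParsedLanguage.myCharacterCategories and its index range, and returning an empty string for an out-of-range category is an assumption of the sketch.

```pascal
program SetCategorySketch;

{$mode objfpc}{$H+}

const
  SC_SPACE = 1;   { hypothetical category values }
  SC_EOL = 2;

var
  { Hypothetical stand-in for myCharacterCategories }
  categories: array[SC_SPACE..SC_EOL] of string;

{ Replace the characters of a category and return the previous set }
function setCharacterCategory(const category: integer;
  const newSet: string): string;
begin
  result := '';
  { Assumption: an unknown category is ignored }
  if (category < low(categories)) or (category > high(categories)) then
    exit;
  result := categories[category];
  categories[category] := newSet;
end;

var
  previous: string;

begin
  categories[SC_SPACE] := ' ';
  { Treat the tab character as whitespace as well }
  previous := setCharacterCategory(SC_SPACE, ' '#9);
  { Self-checks: the program halts with exit code 1 on a mismatch }
  if previous <> ' ' then halt(1);
  if categories[SC_SPACE] <> ' '#9 then halt(1);
  writeLn('previous set had ', length(previous), ' character(s)');
end.
```

Returning the previous set lets a caller change a category temporarily and restore it afterwards with a second call.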

function Opcodes: AnOpcodeDictionary;

Retrieve a reference to the dictionary of opcodes maintained by the language. This dictionary is used to match token strings to their internal representations for faster parsing and processing. These token strings will represent reserved words, operators, and special characters recognized by the parser for the given language.

The caller may use the reference returned by this method to search the dictionary directly, but the reference should NOT be freed. That will be done automatically when the language instance is itself freed.

function SyntaxRule(const rule: TSortKey): ASyntaxRule;

Retrieve a reference to the given syntax rule, which specifies one or more opcodes that are to be used or ignored by the parser for the language in various circumstances.

The caller may use the reference returned by this method to search the rule directly, but the reference should NOT be freed. That will be done automatically when the language instance is itself freed.

function SyntaxRules: ASyntaxRuleset;

Retrieve a reference to the list of syntax rules maintained by the language definition. These rules specify lists of opcodes that the parser should process or ignore in various circumstances.

The caller may use the reference returned by this method to manipulate the list directly, but the reference should NOT be freed. That will be done automatically when the language instance is itself freed.

Generated by PasDoc 0.13.0 on 2015-01-10 17:13:18
