Class AParsedLanguage


Unit

Declaration

type AParsedLanguage = class(ASingleton)

Description

This class serves as the basis for defining a language that will be parsed. It is designed to provide a single point of reference for a parser to use as a source stream is processed; as such, it provides routines to determine the category of a given character or token, and a way to look up rules.

To define a new language to parse, define a descendant of this class and override the methods that are called during construction: AParsedLanguage.defineCharacterCategories, AParsedLanguage.defineOpcodes, and AParsedLanguage.defineRules.

Hierarchy

  • AParsedLanguage

Overview

Fields

Protected myCaseAwareness: boolean;
Protected myCharacterCategories: array[CHARCAT_SPECIAL..CHARCAT_EOS] of string;
Protected MyOpcodes: AnOpcodeDictionary;
Protected MySyntaxRules: ASyntaxRuleset;

Methods

Protected procedure defineCharacterCategories; virtual;
Protected procedure defineOpcodes; virtual; abstract;
Protected procedure defineRules; virtual;
Public constructor new; override;
Public function init: boolean; override;
Public destructor destroy; override;
Public function categoryOf(const ch: string): ACharacterCategory; virtual;
Public function characterIn(const ch: string; const categoryList: array of ACharacterCategory): ACharacterCategory; virtual;
Public function opcodeFor(const tokenString: string): TOpcode; virtual;
Public function caseAware: boolean;
Public function characterCategory(const category: ACharacterCategory): string; virtual;
Public function setCharacterCategory(const category: ACharacterCategory; const newSet: string): string; virtual;
Public function Opcodes: AnOpcodeDictionary;
Public function SyntaxRule(const rule: TSortKey): ASyntaxRule;
Public function SyntaxRules: ASyntaxRuleset;

Description

Fields

Protected myCaseAwareness: boolean;

Specifies whether or not the language is case aware

Protected myCharacterCategories: array[CHARCAT_SPECIAL..CHARCAT_EOS] of string;

Stores the character categories recognized by the language

Protected MyOpcodes: AnOpcodeDictionary;

Used to manage the opcodes recognized by the language

Protected MySyntaxRules: ASyntaxRuleset;

Used to manage the syntax rules defined by the language

Methods

Protected procedure defineCharacterCategories; virtual;

Define the character categories used by the language.

This method should be overridden by descendant classes. It is called automatically by the constructor. Descendant classes should first call the inherited routine, then use the method to define which characters belong in the various character categories; this allows a tokenizer that references the language to determine what type of token to create when it encounters a given character in the source.

The base implementation of this routine first sets all character category strings to empty strings. It then assigns the categories as follows:

Protected procedure defineOpcodes; virtual; abstract;

Define the opcodes used by the language.

This method should be implemented by descendant classes. It is called automatically by the constructor. Descendant classes should use the method to match token strings with opcodes, usually by calling AnOpcodeDictionary.bind or AnOpcodeDictionary.bindSeveral on Self.Opcodes. This allows AParsedLanguage.opcodeFor to match a given token string with its internal representation.

Protected procedure defineRules; virtual;

Define the rules used by the language.

This method should be overridden by descendant classes. It is called automatically by the constructor. Descendant classes should use the method to establish syntax rules for the language. This allows AParsedLanguage.SyntaxRule to return the rule required by a parser that needs to determine whether a specific token is allowed at the position where it appears in the source.

The base implementation of this routine creates two default, empty rules: RULE_BEGIN_STATEMENT and RULE_END_STATEMENT. If these two rules are desired in a given language implementation, then descendant classes should call the inherited routine before installing their own rules.

Public constructor new; override;

Construct a new instance of the language. You will likely need only one instance of a given language at any given time.

The constructor calls the inherited routine, after which it calls three initialization routines: AParsedLanguage.defineCharacterCategories, AParsedLanguage.defineOpcodes, and AParsedLanguage.defineRules. These three methods should be implemented by descendant classes to define the character categories, opcodes, and rules used by the language.
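
The construction sequence above follows the template-method pattern. The Free Pascal program below is a minimal sketch of that pattern, not the library's code: TMiniLanguage and TToyLanguage are hypothetical names, the constructor is called Create rather than new, ASingleton is not modelled, and TStringList stands in for AnOpcodeDictionary and ASyntaxRuleset. It shows a descendant calling the inherited defineCharacterCategories and defineRules first, as recommended for those methods, and implementing the abstract defineOpcodes without an inherited call.

```pascal
program TemplateSketch;

{$mode objfpc}{$H+}

uses
  Classes;

type
  { Hypothetical stand-in for AParsedLanguage. }
  TMiniLanguage = class
  protected
    FLetters: string;        // stands in for one character category
    FOpcodes: TStringList;   // stands in for AnOpcodeDictionary
    FRules: TStringList;     // stands in for ASyntaxRuleset
    procedure DefineCharacterCategories; virtual;
    procedure DefineOpcodes; virtual; abstract;
    procedure DefineRules; virtual;
  public
    constructor Create;
    destructor Destroy; override;
    property Letters: string read FLetters;
    property Opcodes: TStringList read FOpcodes;
    property Rules: TStringList read FRules;
  end;

  { Hypothetical descendant defining a tiny language. }
  TToyLanguage = class(TMiniLanguage)
  protected
    procedure DefineCharacterCategories; override;
    procedure DefineOpcodes; override;
    procedure DefineRules; override;
  end;

constructor TMiniLanguage.Create;
begin
  inherited Create;
  FOpcodes := TStringList.Create;
  FRules := TStringList.Create;
  // Virtual calls dispatch to the descendant's overrides, even here.
  DefineCharacterCategories;
  DefineOpcodes;
  DefineRules;
end;

destructor TMiniLanguage.Destroy;
begin
  FRules.Free;      // the language owns both containers
  FOpcodes.Free;
  inherited Destroy;
end;

procedure TMiniLanguage.DefineCharacterCategories;
begin
  FLetters := '';   // base behaviour: start from empty categories
end;

procedure TMiniLanguage.DefineRules;
begin
  // Base behaviour: two default, empty rules.
  FRules.Add('RULE_BEGIN_STATEMENT');
  FRules.Add('RULE_END_STATEMENT');
end;

procedure TToyLanguage.DefineCharacterCategories;
begin
  inherited DefineCharacterCategories;  // clear first, then fill in
  FLetters := 'abcdefghijklmnopqrstuvwxyz';
end;

procedure TToyLanguage.DefineOpcodes;
begin
  FOpcodes.Add('print');  // abstract in the base, so no inherited call
  FOpcodes.Add('let');
end;

procedure TToyLanguage.DefineRules;
begin
  inherited DefineRules;  // keep the two default rules
  FRules.Add('RULE_ASSIGNMENT');
end;

var
  Lang: TToyLanguage;
begin
  Lang := TToyLanguage.Create;
  try
    WriteLn(Lang.Rules.Count);    // 3: two defaults plus RULE_ASSIGNMENT
    WriteLn(Lang.Opcodes.Count);  // 2
  finally
    Lang.Free;  // also frees Rules and Opcodes
  end;
end.
```

Because Object Pascal dispatches virtual calls made inside a constructor to the most-derived override, the base constructor can invoke the three routines without knowing which language it is building.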

Public function init: boolean; override;

Initializer

Public destructor destroy; override;

Destroy the language instance.

This method is called automatically when TObject.free is called on the language instance. It frees the syntax ruleset and opcode dictionary used by the language before calling the inherited routine.

Public function categoryOf(const ch: string): ACharacterCategory; virtual;

Retrieve the category to which the given character belongs.

This method searches for an instance of ch in the various character category definitions provided by AParsedLanguage.myCharacterCategories. If a match is found, the appropriate character category is returned. If no match is found for the given character, then this method returns CHARCAT_ERROR.

ch is passed as a string rather than a single Char so that the language definition can support multi-byte UTF-8 characters as well as single-byte ASCII characters.
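
A minimal sketch of the lookup categoryOf describes, assuming a hypothetical four-value type TCharCat in place of ACharacterCategory (ccError plays the role of CHARCAT_ERROR) and a constant table in place of myCharacterCategories. Categories are searched in declaration order; that order is an assumption, since the description above does not specify one. The accented letter 'é' is written as its two UTF-8 bytes (#$C3#$A9) to show why ch is a string: Pos matches the whole multi-byte sequence.

```pascal
program CategoryOfSketch;

{$mode objfpc}{$H+}

type
  // Hypothetical subset of ACharacterCategory.
  TCharCat = (ccError, ccLetter, ccDigit, ccSpace);

const
  // Hypothetical contents; the letter set ends with 'é' as UTF-8 bytes.
  Categories: array[ccLetter..ccSpace] of string = (
    'abcdefghijklmnopqrstuvwxyz'#$C3#$A9,
    '0123456789',
    ' '#9#10#13
  );

{ Return the first category (in declaration order) whose string
  contains ch, or ccError when none does. }
function CategoryOf(const ch: string): TCharCat;
var
  cat: TCharCat;
begin
  if ch <> '' then
    for cat := Low(Categories) to High(Categories) do
      if Pos(ch, Categories[cat]) > 0 then
        Exit(cat);
  Result := ccError;
end;

begin
  WriteLn(CategoryOf('7'));        // returns ccDigit
  WriteLn(CategoryOf(#$C3#$A9));   // two-byte 'é': returns ccLetter
  WriteLn(CategoryOf('$'));        // in no category: returns ccError
end.
```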

Public function characterIn(const ch: string; const categoryList: array of ACharacterCategory): ACharacterCategory; virtual;

Determine whether the specified character is defined in one of the specified character categories.

This method searches for an instance of ch in the character category definitions specified by categoryList. The first category that has the specified character is returned. If none of the categories specified contains ch, then this method returns CHARCAT_DUMMY.

ch is passed as a string rather than a single Char so that the language definition can support multi-byte UTF-8 characters as well as single-byte ASCII characters.
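
Unlike categoryOf, characterIn consults only the categories the caller lists, in the caller's order, and falls back to CHARCAT_DUMMY rather than CHARCAT_ERROR. The sketch below assumes a hypothetical TCharCat type in which ccDummy plays the role of CHARCAT_DUMMY. Because a lowercase hexadecimal digit is also a letter, the result for 'a' depends on which category is listed first.

```pascal
program CharacterInSketch;

{$mode objfpc}{$H+}

type
  // Hypothetical subset of ACharacterCategory.
  TCharCat = (ccDummy, ccLetter, ccDigit, ccHexDigit);

const
  Categories: array[ccLetter..ccHexDigit] of string = (
    'abcdefghijklmnopqrstuvwxyz',
    '0123456789',
    '0123456789abcdef'
  );

{ Search only the categories named in CategoryList, in the order given,
  and return the first that contains ch; ccDummy if none does. }
function CharacterIn(const ch: string;
  const CategoryList: array of TCharCat): TCharCat;
var
  i: Integer;
begin
  for i := Low(CategoryList) to High(CategoryList) do
    // Skip ccDummy, which has no entry in the Categories table.
    if (ch <> '') and (CategoryList[i] <> ccDummy)
       and (Pos(ch, Categories[CategoryList[i]]) > 0) then
      Exit(CategoryList[i]);
  Result := ccDummy;
end;

begin
  WriteLn(CharacterIn('a', [ccHexDigit, ccLetter]));  // returns ccHexDigit
  WriteLn(CharacterIn('a', [ccLetter, ccHexDigit]));  // returns ccLetter
  WriteLn(CharacterIn('z', [ccDigit, ccHexDigit]));   // returns ccDummy
end.
```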

Public function opcodeFor(const tokenString: string): TOpcode; virtual;

Determine whether the given token string has a matching opcode; that is, whether the given token string is a recognized keyword, operator, or special character.

This method searches the opcode dictionary defined for the language and, if tokenString is found, returns the matching opcode. Otherwise, it returns TOKCAT_DUMMY.

If AParsedLanguage.caseAware is False, then tokenString is converted to lower case before it is checked. Otherwise, the token string is checked exactly as provided by the calling routine.
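
A minimal sketch of the lookup with case folding, following the behaviour described under AParsedLanguage.caseAware. It assumes that TOpcode is an integer, that TOKCAT_DUMMY is 0, and that the opcode dictionary is a small constant table (TBinding, Bindings); these names and the opcode values 101, 102, and 201 are hypothetical stand-ins, not the real AnOpcodeDictionary. Keywords are stored in lower case, so a language that is not case aware folds the token before comparing, while a case-aware language requires an exact match.

```pascal
program OpcodeForSketch;

{$mode objfpc}{$H+}

uses
  SysUtils;

type
  TOpcode = Integer;  // hypothetical; the real TOpcode is defined elsewhere

  TBinding = record
    Token: string;
    Op: TOpcode;
  end;

const
  TOKCAT_DUMMY = 0;  // hypothetical value meaning "no match"
  // Hypothetical bindings, with keywords stored in lower case.
  Bindings: array[0..2] of TBinding = (
    (Token: 'begin'; Op: 101),
    (Token: 'end';   Op: 102),
    (Token: ':=';    Op: 201)
  );

{ Fold tokenString to lower case unless the language is case aware,
  then return the bound opcode or TOKCAT_DUMMY. }
function OpcodeFor(const tokenString: string; caseAware: Boolean): TOpcode;
var
  key: string;
  i: Integer;
begin
  if caseAware then
    key := tokenString
  else
    key := LowerCase(tokenString);
  for i := Low(Bindings) to High(Bindings) do
    if Bindings[i].Token = key then
      Exit(Bindings[i].Op);
  Result := TOKCAT_DUMMY;
end;

begin
  WriteLn(OpcodeFor('BEGIN', False));  // 101: folded to 'begin'
  WriteLn(OpcodeFor('BEGIN', True));   // 0: exact match required
  WriteLn(OpcodeFor(':=', True));      // 201
end.
```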

Public function caseAware: boolean;

Determine whether or not the language is case aware.

This method checks the value of AParsedLanguage.myCaseAwareness, which is set by the language instance when it is constructed. If the language is case aware, then token strings passed to AParsedLanguage.opcodeFor must exactly match the case specified for each keyword, operator, or special character when the language was defined. Otherwise, the token strings are converted to lower case before they are checked.

Public function characterCategory(const category: ACharacterCategory): string; virtual;

Retrieve all characters defined by the language for the given category.

Public function setCharacterCategory(const category: ACharacterCategory; const newSet: string): string; virtual;

Define the characters which belong in the given category.

This method overwrites any characters previously defined for the specified category with newSet. Although character categories are normally defined when the language instance is created (by AParsedLanguage.defineCharacterCategories), this method makes it possible to amend or replace these definitions on the fly.

Returns

The previous characters defined for the given category, if any.
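
Returning the previous set makes a temporary change easy to undo. The sketch below is illustrative only: TCharCat and the global Categories array are hypothetical stand-ins for ACharacterCategory and myCharacterCategories. It widens the special-character category and then restores the saved definition.

```pascal
program SetCategorySketch;

{$mode objfpc}{$H+}

type
  // Hypothetical subset of ACharacterCategory.
  TCharCat = (ccLetter, ccDigit, ccSpecial);

var
  // Hypothetical stand-in for myCharacterCategories; starts as ''.
  Categories: array[TCharCat] of string;

{ Replace the characters of one category and hand back the old set,
  so a caller can restore it later. }
function SetCharacterCategory(const category: TCharCat;
  const newSet: string): string;
begin
  Result := Categories[category];
  Categories[category] := newSet;
end;

var
  Saved: string;
begin
  Categories[ccSpecial] := '+-*/';
  // Temporarily treat '@' and '#' as special characters too.
  Saved := SetCharacterCategory(ccSpecial, '+-*/@#');
  WriteLn(Categories[ccSpecial]);   // +-*/@#
  // Restore the original definition; the returned value is discarded.
  SetCharacterCategory(ccSpecial, Saved);
  WriteLn(Categories[ccSpecial]);   // +-*/
end.
```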

Public function Opcodes: AnOpcodeDictionary;

Retrieve a reference to the dictionary of opcodes maintained by the language. This dictionary is used to match token strings to their internal representations for faster parsing and processing. These token strings will represent reserved words, operators, and special characters recognized by the parser for the given language.

The caller may use the reference returned by this method to search the dictionary directly, but the reference should NOT be freed. That will be done automatically when the language instance is itself freed.

Public function SyntaxRule(const rule: TSortKey): ASyntaxRule;

Retrieve a reference to the given syntax rule, which specifies one or more opcodes that are to be used or ignored by the parser for the language in various circumstances.

The caller may use the reference returned by this method to search the rule directly, but the reference should NOT be freed. That will be done automatically when the language instance is itself freed.

Public function SyntaxRules: ASyntaxRuleset;

Retrieve a reference to the list of syntax rules maintained by the language definition. These rules specify lists of opcodes that the parser should process or ignore in various circumstances.

The caller may use the reference returned by this method to manipulate the list directly, but the reference should NOT be freed. That will be done automatically when the language instance is itself freed.


Generated by PasDoc 0.13.0 on 2015-06-25 11:12:03