API
generic_lexer package
Submodules
generic_lexer.errors module
generic_lexer.lexer module
- class generic_lexer.lexer.Lexer(rules, skip_whitespace=False, text_buffer='')
Bases: object
A simple pattern-based lexer/tokenizer. All the rule regexes are concatenated into a single regular expression with named groups, so the group names must be valid Python identifiers. Patterns that do not define groups of their own have names generated for them automatically. The matched groups are then mapped to token names.
- Parameters:
  - rules (Mapping[str, str]) – A mapping of rules. Each key is the type of the token to return when the rule matches, and each value is the regular expression used to recognize that token.
  - skip_whitespace (bool) – If True, whitespace (\s+) is skipped and not reported by the lexer. Otherwise, you must specify rules for whitespace yourself, or it will be flagged as an error.
  - text_buffer (str) – The string to generate the tokens from.
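As a rough illustration of the description above, the sketch below joins a rules mapping into one regular expression with a named group per token name, and reports the groups a rule defines itself as a dict value, the way the VARIABLE token in the Token example behaves. It is not the library's implementation: the helpers build_master_regex and match_at are hypothetical, and wrapping each rule as (?P<NAME>pattern) and joining the rules with | is an assumption.

```python
import re
from typing import Dict, Mapping, Optional, Tuple, Union

# A token value is either the matched text or a dict of the rule's own named groups.
Value = Union[str, Dict[str, Optional[str]]]


def build_master_regex(
    rules: Mapping[str, str],
) -> Tuple["re.Pattern[str]", Dict[str, Tuple[str, ...]]]:
    """Hypothetical helper: join all rules into one regex, one named group per token."""
    parts = []
    inner_groups: Dict[str, Tuple[str, ...]] = {}
    for name, pattern in rules.items():
        if not name.isidentifier():
            raise ValueError(f"token name {name!r} is not a valid Python identifier")
        # The outer group carries the token name, e.g. (?P<EQUALS>=).
        parts.append(f"(?P<{name}>{pattern})")
        # Record the groups the rule defines itself, e.g. var_name and var_type.
        inner_groups[name] = tuple(re.compile(pattern).groupindex)
    return re.compile("|".join(parts)), inner_groups


def match_at(
    regex: "re.Pattern[str]",
    inner_groups: Dict[str, Tuple[str, ...]],
    text: str,
    pos: int,
) -> Optional[Tuple[str, Value, int]]:
    """Hypothetical helper: return (token name, value, end offset) or None."""
    m = regex.match(text, pos)
    if m is None:
        return None
    # The outer token group closes last, so lastgroup names the token
    # even when the rule contains nested named groups.
    name = m.lastgroup
    groups = inner_groups[name]
    value: Value = {g: m.group(g) for g in groups} if groups else m.group(name)
    return name, value, m.end()


rules = {
    "VARIABLE": r"(?P<var_name>[a-z_]+):(?P<var_type>[A-Z]\w+)",
    "EQUALS": r"=",
}
regex, inner = build_master_regex(rules)
print(match_at(regex, inner, "first_word:String = x", 0))
# ('VARIABLE', {'var_name': 'first_word', 'var_type': 'String'}, 17)
print(match_at(regex, inner, "first_word:String = x", 18))
# ('EQUALS', '=', 19)
```

Because every rule lives in one regex, inner group names must be unique across all rules; Python's re module rejects a pattern that redefines a group name.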
- clear_text_buffer()
Set the text buffer to an empty string and reset the text pointer to 0.
- Return type:
- property current_char: str
- get_char_at(buffer_pointer)
- Return type:
- get_char_at_current_pointer()
- Return type:
- get_text_buffer()
Return the text currently loaded into the lexer.
- Return type:
str
- pattern_token(token_name, pattern)
- Return type:
- set_text_buffer(value)
Set the text to be tokenized by the lexer and reset the text pointer to 0.
- Return type:
- property text_buffer: str
Get, set, or clear the text buffer. Use del on this property (del lexer.text_buffer) to clear the text buffer.
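The get/set/del behaviour described for text_buffer, together with set_text_buffer() and clear_text_buffer() resetting the text pointer, maps naturally onto a Python property with a setter and a deleter. The class below is a hypothetical stand-in (TextBufferSketch, with an assumed public pointer attribute), not the library's code; it only shows how del on a property can empty a value instead of removing the attribute.

```python
class TextBufferSketch:
    """Hypothetical stand-in for the lexer's text buffer handling."""

    def __init__(self, text_buffer: str = "") -> None:
        self._text_buffer = text_buffer
        self.pointer = 0  # index of the next character to scan

    @property
    def text_buffer(self) -> str:
        return self._text_buffer

    @text_buffer.setter
    def text_buffer(self, value: str) -> None:
        # New text always starts scanning from the beginning.
        self._text_buffer = value
        self.pointer = 0

    @text_buffer.deleter
    def text_buffer(self) -> None:
        # `del obj.text_buffer` empties the buffer rather than deleting the attribute.
        self._text_buffer = ""
        self.pointer = 0


buf = TextBufferSketch("a = 1")
buf.pointer = 3
buf.text_buffer = "b = 2"  # setter: pointer goes back to 0
print(buf.text_buffer, buf.pointer)  # b = 2 0
del buf.text_buffer  # deleter: buffer becomes ""
print(repr(buf.text_buffer), buf.pointer)  # '' 0
```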
- tokens(skip_whitespace=False)
- Parameters:
skip_whitespace (bool) – Same meaning as the skip_whitespace argument passed through lexer.Lexer, but applies to this method call only.
- Raises:
generic_lexer.errors.LexerError – Raised with the position and character of the error when lexing fails, that is, when the current chunk of the buffer matches no rule.
- Yields:
The next token (a Token object) found in Lexer.text_buffer.
- Return type:
Iterator[Token]
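A minimal sketch of the kind of loop a method like tokens() might run: when skip_whitespace is set, consume a whitespace run first; otherwise match the combined rules at the current position, yield a token, and advance. The function scan, the SimpleToken tuple, and raising ValueError in place of LexerError are assumptions for illustration only, not the library's code.

```python
import re
from typing import Iterator, Mapping, NamedTuple

WHITESPACE = re.compile(r"\s+")


class SimpleToken(NamedTuple):
    """Hypothetical token: rule name, start offset, matched text."""

    name: str
    position: int
    val: str


def scan(rules: Mapping[str, str], text: str, skip_whitespace: bool = False) -> Iterator[SimpleToken]:
    """Yield tokens left to right; raise ValueError where no rule matches."""
    master = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in rules.items()))
    pos = 0
    while pos < len(text):
        if skip_whitespace:
            ws = WHITESPACE.match(text, pos)
            if ws:
                pos = ws.end()  # drop the whitespace run without yielding it
                continue
        m = master.match(text, pos)
        # A missing or empty match would stall the loop, so both count as errors.
        if m is None or m.end() == pos:
            raise ValueError(f"no rule matches {text[pos]!r} at position {pos}")
        yield SimpleToken(m.lastgroup, pos, m.group())
        pos = m.end()


rules = {"NUMBER": r"\d+", "PLUS": r"\+"}
print([tuple(t) for t in scan(rules, "1 + 22", skip_whitespace=True)])
# [('NUMBER', 0, '1'), ('PLUS', 2, '+'), ('NUMBER', 4, '22')]
try:
    list(scan(rules, "1 + 2"))  # no whitespace rule and no skipping
except ValueError as err:
    print(err)  # no rule matches ' ' at position 1
```

Because scan is a generator, the error surfaces only when iteration reaches the bad character; tokens before it have already been yielded.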
generic_lexer.logging module
generic_lexer.token module
- class generic_lexer.token.Token(name, position, val)
Bases: object
A simple token structure containing the token name, value, and position.
Unlike in the original gist, a single token rule can specify multiple named groups.
You can access the values of the tokens like this:
>>> from generic_lexer import Lexer
>>> rules = {
...     "VARIABLE": r"(?P<var_name>[a-z_]+):(?P<var_type>[A-Z]\w+)",
...     "EQUALS": r"=",
...     "STRING": r"\".*\"",
... }
>>> data = "first_word:String = \"Hello\""
>>> variable, equals, string = tuple(Lexer(rules, True, data))
>>> variable
VARIABLE({'var_name': 'first_word', 'var_type': 'String'}) at 0
>>> variable.val
{'var_name': 'first_word', 'var_type': 'String'}
>>> variable["var_name"]
'first_word'
>>> variable["var_type"]
'String'
>>> equals
EQUALS('=') at 18
>>> equals.val
'='
>>> string
STRING('"Hello"') at 20
>>> string.val
'"Hello"'
- Parameters:
  - lexer: Lexer
  - name: str
  - position: int
- property type: str
Provided for compatibility.
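The documented Token fields and the repr format shown in the example can be mimicked with a small dataclass. TokenSketch is hypothetical; in particular, that type simply returns name and that indexing a token looks up its val dict are assumptions inferred from the example, not confirmed by this reference.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class TokenSketch:
    """Hypothetical token with the documented name, position and val fields."""

    name: str
    position: int
    val: Union[str, Dict[str, str]]

    def __getitem__(self, group: str) -> str:
        # Named-group lookup only makes sense for multi-group tokens.
        if not isinstance(self.val, dict):
            raise TypeError(f"{self.name} token has no named groups")
        return self.val[group]

    @property
    def type(self) -> str:
        # Assumed alias for name, kept for compatibility with a gist-style Token.type.
        return self.name

    def __repr__(self) -> str:
        # Same shape as the doctest output, e.g. EQUALS('=') at 18.
        return f"{self.name}({self.val!r}) at {self.position}"


variable = TokenSketch("VARIABLE", 0, {"var_name": "first_word", "var_type": "String"})
print(variable)  # VARIABLE({'var_name': 'first_word', 'var_type': 'String'}) at 0
print(variable["var_type"])  # String
print(TokenSketch("EQUALS", 18, "=").type)  # EQUALS
```

Defining __repr__ in the class body keeps @dataclass from generating its own, so the output matches the NAME(value) at position format.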
Module contents
- class generic_lexer.Lexer(rules, skip_whitespace=False, text_buffer='')
Bases: object
A simple pattern-based lexer/tokenizer. All the rule regexes are concatenated into a single regular expression with named groups, so the group names must be valid Python identifiers. Patterns that do not define groups of their own have names generated for them automatically. The matched groups are then mapped to token names.
- Parameters:
  - rules (Mapping[str, str]) – A mapping of rules. Each key is the type of the token to return when the rule matches, and each value is the regular expression used to recognize that token.
  - skip_whitespace (bool) – If True, whitespace (\s+) is skipped and not reported by the lexer. Otherwise, you must specify rules for whitespace yourself, or it will be flagged as an error.
  - text_buffer (str) – The string to generate the tokens from.
- clear_text_buffer()
Set the text buffer to an empty string and reset the text pointer to 0.
- Return type:
- property current_char: str
- get_char_at(buffer_pointer)
- Return type:
- get_char_at_current_pointer()
- Return type:
- get_text_buffer()
Return the text currently loaded into the lexer.
- Return type:
str
- pattern_token(token_name, pattern)
- Return type:
- set_text_buffer(value)
Set the text to be tokenized by the lexer and reset the text pointer to 0.
- Return type:
- property text_buffer: str
Get, set, or clear the text buffer. Use del on this property (del lexer.text_buffer) to clear the text buffer.
- tokens(skip_whitespace=False)
- Parameters:
skip_whitespace (bool) – Same meaning as the skip_whitespace argument passed through lexer.Lexer, but applies to this method call only.
- Raises:
generic_lexer.errors.LexerError – Raised with the position and character of the error when lexing fails, that is, when the current chunk of the buffer matches no rule.
- Yields:
The next token (a Token object) found in Lexer.text_buffer.
- Return type:
Iterator[Token]
- exception generic_lexer.LexerError(char, text_buffer_pointer)
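Bases: Exception
Lexer error exception, raised when the current chunk of the text buffer matches no rule.
- Parameters:
  - char – the character at which lexing failed.
  - text_buffer_pointer – the position of that character in the text buffer.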
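Because the exception carries both the offending character and its position, a caller can point at the failure in the input. The class below is a hypothetical stand-in whose attributes are named after the constructor parameters (whether the real LexerError exposes them under these names is an assumption), plus a hypothetical describe() helper that draws a caret under the bad character.

```python
class LexerErrorSketch(Exception):
    """Hypothetical stand-in: stores the bad character and its buffer offset."""

    def __init__(self, char: str, text_buffer_pointer: int) -> None:
        super().__init__(f"unexpected {char!r} at position {text_buffer_pointer}")
        self.char = char
        self.text_buffer_pointer = text_buffer_pointer


def describe(text: str, err: LexerErrorSketch) -> str:
    """Render the input with a caret under the character that matched no rule."""
    return f"{text}\n{' ' * err.text_buffer_pointer}^ {err}"


source = "x = 1 $ 2"
try:
    raise LexerErrorSketch(source[6], 6)
except LexerErrorSketch as err:
    print(describe(source, err))
# x = 1 $ 2
#       ^ unexpected '$' at position 6
```

The caret offset assumes single-line input without tabs; for multi-line buffers the pointer would first need converting to a line and column.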
- class generic_lexer.Token(name, position, val)
Bases: object
A simple token structure containing the token name, value, and position.
Unlike in the original gist, a single token rule can specify multiple named groups.
You can access the values of the tokens like this:
>>> from generic_lexer import Lexer
>>> rules = {
...     "VARIABLE": r"(?P<var_name>[a-z_]+):(?P<var_type>[A-Z]\w+)",
...     "EQUALS": r"=",
...     "STRING": r"\".*\"",
... }
>>> data = "first_word:String = \"Hello\""
>>> variable, equals, string = tuple(Lexer(rules, True, data))
>>> variable
VARIABLE({'var_name': 'first_word', 'var_type': 'String'}) at 0
>>> variable.val
{'var_name': 'first_word', 'var_type': 'String'}
>>> variable["var_name"]
'first_word'
>>> variable["var_type"]
'String'
>>> equals
EQUALS('=') at 18
>>> equals.val
'='
>>> string
STRING('"Hello"') at 20
>>> string.val
'"Hello"'
- Parameters:
  - lexer: Lexer
  - name: str
  - position: int
- property type: str
Provided for compatibility.