aptk - module reference

class aptk.parser.Parser(grammar, actions=None)
Parser combines a grammar and parse-actions into a parser.
An object of this class combines an abstract grammar and parse-actions into a parser, which produces an abstract syntax tree.
If no actions are given, it defaults to a ParseActions object.
aptk.actions - Parse Actions
Parse Actions are used to create an abstract syntax tree from your parse tree.
Parse Actions are expected to be attributes of the parse-actions object passed to Parser. This can be an object of a class derived from ParseActions, but it can also be a module with a collection of functions.
Parse-Action Callables
A parse-action is called by the parser with two parameters:
- parser - current Parser object
- lex - current Lexem object
Whatever the parse-action returns is then written into the ast attribute of the Lexem object.
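The calling convention described above can be sketched in a few lines. This is only an illustration: `FakeLexem`, its `text` attribute, `run_action` and `to_int` are hypothetical names invented for this sketch and are not part of aptk.

```python
class FakeLexem:
    """Hypothetical stand-in for aptk's Lexem; only the parts used here."""

    def __init__(self, text):
        self.text = text   # assumed name for the matched text
        self.ast = None    # filled in with the parse-action's result


def run_action(action, parser, lex):
    """Call a parse-action and store whatever it returns in lex.ast."""
    lex.ast = action(parser, lex)
    return lex


def to_int(parser, lex):
    """Example parse-action: turn the matched text into an int."""
    return int(lex.text)


lexem = run_action(to_int, parser=None, lex=FakeLexem("42"))
print(lexem.ast)
```

The parse-action itself never touches the ast attribute; it only returns a value, and the caller stores it.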
Connecting Parse-Actions to Rules
The parser calls a parse-action for each captured match object, which is represented by a Lexem object:
- If a parse-action is defined in the matching rule, it is called. In the following rule, the parse-action “some_action” is called if you captured something using <some-rule>:
      some-rule some_action= "some text"
  You can map shortcuts to actions:
      :parse-action-map "$" => other_action
      other-rule $= "other text"
  In this case the parse-action “other_action” is called if you captured “other text” with <other-rule>.
- If no parse-action is defined in the matching rule, the parser tries to find the following parse-actions if <my_rule> was matched:
  - my_rule
  - make_my_rule
  - got_my_rule
- If no parse-action is found, nothing is done.
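The fallback lookup for rules without an explicit parse-action can be pictured as a simple attribute search. The sketch below assumes that the three names are tried in the order listed and that the first callable found wins; `find_action` is a hypothetical helper, not an aptk function.

```python
from types import SimpleNamespace


def find_action(actions, rule_name):
    """Return the first fallback parse-action found on `actions`, or None."""
    for name in (rule_name, "make_" + rule_name, "got_" + rule_name):
        candidate = getattr(actions, name, None)
        if callable(candidate):
            return candidate
    return None  # no parse-action found: nothing is done


class MyActions:
    def make_my_rule(self, parser, lex):
        return "made by method"


# A module of plain functions works the same way; SimpleNamespace mimics one.
module_like = SimpleNamespace(got_my_rule=lambda parser, lex: "made by function")

print(find_action(MyActions(), "my_rule")(None, None))
print(find_action(module_like, "my_rule")(None, None))
print(find_action(MyActions(), "unknown_rule"))
```

Because the lookup only uses `getattr`, it does not matter whether the actions container is a class instance or a module.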
Pairs
Setting an ast to a pair (name, result), where name is the rule’s name and result is the result of the parse-action, can be achieved with the following syntax:
paired action=> <some> <rule>
If you append a “>” to your operator and define an action for your rule, the ast of the capture of <paired> will be the pair (paired, «result of action()»).
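The effect of the “>” suffix can be modelled as a wrapper around the parse-action. This is a minimal sketch of the resulting value only; `paired` is a hypothetical helper and says nothing about how aptk implements the feature internally.

```python
def paired(rule_name, action):
    """Wrap a parse-action so that it returns (rule_name, result)."""
    def wrapper(parser, lex):
        return (rule_name, action(parser, lex))
    return wrapper


def action(parser, lex):
    """Example parse-action returning a fixed value."""
    return 42


print(paired("paired", action)(None, None))
```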
Example
>>> from aptk import *
>>>
>>> class DashArithmeticGrammar(Grammar):
...     r"""Simple grammar for addition and subtraction.
...
...     dash_op    <= <sum> | <difference> | <number>
...     sum        := <number> "+" <dash_op>
...     difference := <number> "-" <dash_op>
...     """
>>>
>>> class CalculatorActions(ParseActions):
...     r"""inherit number from ParseActions"""
...     def sum(self, p, lex):
...         return lex[0].ast + lex[1].ast
...     def difference(self, p, lex):
...         return lex[0].ast - lex[1].ast
>>>
>>> ast("1 + 3 - 2",
...     grammar = DashArithmeticGrammar,
...     actions = CalculatorActions())
2

class aptk.grammar.BaseGrammar(s=None, **kargs)
Most basic grammar class.
Usually you will derive your classes from Grammar rather than from this class. If you really need a blank grammar, you can derive your grammar from this class.
A Grammar class has the following attributes:
- __metaclass__
  GrammarType - the type of a grammar class
- _TOKENS_
  A dictionary of token-parsing regexes, which can be used with {name} for the smart value and {:name:} for the unchanged value.
  Smart value means that if you specify a token like:
      token = abcd
  you can still quantify the token without strange effects:
      a-rule := foo{token}+
  will be translated to:
      a-rule := foo(?:abcd)+
  The other way of access:
      b-rule := foo{:token:}+
  will be translated to:
      b-rule := fooabcd+
  You can use the second form, for example, to define character classes:
      word-chars = A-Za-z0-9_
      dash = \-
      ident = [{:word-chars:}{:dash:}]+
  The tokens are evaluated directly after a rule-part is read.
- _ACTIONS_
  This dictionary maps rule names to action names, which are methods of either the ParseActions object passed to the parser or of the Grammar. This map is created from implicit parse-action directives. Parse-actions are run when lexing a MatchObject and fill the ast attribute of the Lexem.
  Implicit parse-actions are specified by _PARSE_ACTION_MAP_.
- _START_RULE_
  Name of the start rule if no other is given.
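The two forms of token access described under _TOKENS_ can be reproduced with a small substitution function. This is a sketch of the documented translation, not aptk's code: `expand_tokens` is a hypothetical name, and leaving unknown `{...}` groups untouched (so that regex quantifiers such as `{2}` survive) is an assumption.

```python
import re

# {:name:} is the raw form, {name} the smart (grouped) form.
_TOKEN_REF = re.compile(r"\{:(?P<raw>[\w?-]+):\}|\{(?P<smart>[\w?-]+)\}")


def expand_tokens(rule_part, tokens):
    """Replace token references in a rule-part by their regex text."""
    def replace(match):
        raw_name = match.group("raw")
        if raw_name is not None:
            return tokens[raw_name]            # inserted unchanged
        name = match.group("smart")
        if name not in tokens:
            return match.group(0)              # assumption: not a token, keep as is
        return "(?:" + tokens[name] + ")"      # grouped, safe to quantify
    return _TOKEN_REF.sub(replace, rule_part)


tokens = {"token": "abcd", "word-chars": "A-Za-z0-9_", "dash": r"\-"}

print(expand_tokens("foo{token}+", tokens))
print(expand_tokens("foo{:token:}+", tokens))
print(expand_tokens("[{:word-chars:}{:dash:}]+", tokens))
```

The grouping matters as soon as a quantifier follows: `foo(?:abcd)+` matches `fooabcdabcd`, whereas `fooabcd+` only repeats the final `d`.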

class aptk.grammar.Grammar(s=None, **kargs)
Default grammar with basic tokens and rules.
This is the grammar you will usually derive your grammars from.
It provides the most common tokens:
SP = \x20
NL = \r?\n
LF = \n
CR = \r
CRLF = \r\n
ws = \s+
ws? = \s*
N = [^\n]
HWS = [\x20\t\v]
LINE = [^\n]*\n
And a general ActionMap, which lets you connect your grammar to the basic ParseActions:
:parse-action-map
    "$" make_string
    "@" make_list
    "%" make_dict
    "#" make_number
    "<" make_inherit
    ">" make_name
    "~" make_quoted
And the most common rules:
ident $= [A-Za-z_\-][\w\-]*
number #= [+-]?\d+(?:\.\d+)?
integer #= \d+
dq-string ~= "(?:\\\\|\\[^\\]|[^"\\])*"
sq-string ~= '(?:\\\\|\\[^\\]|[^'\\])*'
ws $= \b{ws}\b|{ws?}
line $= [^\n]*\n
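To get a feel for what the common rules accept, their patterns can be tried directly with Python's `re` module. The sketch assumes the patterns are used verbatim as Python regular expressions; the constant names and the `matches` helper are local to this example.

```python
import re

# Patterns copied from the rule list above.
IDENT = re.compile(r"[A-Za-z_\-][\w\-]*")
NUMBER = re.compile(r"[+-]?\d+(?:\.\d+)?")
DQ_STRING = re.compile(r'"(?:\\\\|\\[^\\]|[^"\\])*"')


def matches(pattern, text):
    """True if the whole text is matched by the pattern."""
    return pattern.fullmatch(text) is not None


print(matches(IDENT, "foo-bar"))         # dashes are allowed in identifiers
print(matches(IDENT, "1abc"))            # must not start with a digit
print(matches(NUMBER, "-3.14"))          # sign and fraction are optional
print(matches(NUMBER, "1."))             # a dot needs digits after it
print(matches(DQ_STRING, r'"a \" b"'))   # escaped quote inside the string
print(matches(DQ_STRING, '"open'))       # closing quote is required
```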
Making explicit the whitespace rule default from BaseGrammar:
:sigspace <.ws>
Define how args of BRANCH are parsed:
:args-of BRANCH string capturing non-capturing regex
Define operation precedence parser:
:args-of EXPR string capturing non-capturing raw => aptk.oprec.OperatorPrecedenceParser

BRANCH(P, s=None, start=None, end=None, args=None)
Look ahead and branch into some rule.
Example:
branched := <BRANCH{
    "a" <a-rule>
    [bcd] <bcd-rule>
    a|b <a-or-b-rule>
    <default-rule>
}>
If string to be matched startswith

ERROR(P, s=None, start=None, end=None, args=None)
Raise a syntax error.
Example:
foo := <x> | <ERROR{Expected "x"}>
Please note that whitespace will be collapsed to a single space.

aptk.grammar.compile(input, type=None, name=None, extends=None, grammar=None, filename=None)
Compile a grammar.
You can pass different inputs to this function, which influences the return value.
- input is a grammar class:
      class MyGrammar(Grammar):
          r"""This is my grammar class

          .. highlight:: aptk

          My grammar has following rule::

              <foo> = "bar"
          """
  This is how compile() is usually invoked with a grammar class, because compile() is invoked by GrammarType.
- Append whatever is defined in input to grammar:
      class MyGrammar(Grammar):
          r"""Here are rules defined"""
      ...
      compile("here are more rules", grammar=MyGrammar)
  input may be either a file object (something having a read() method) or a string.
- Create a new grammar named name, which extends the grammars passed in the iterable extends. If you do not pass extends, your grammar will extend Grammar, extracting the rules from input.
- Simply compile input to a list of grammars:
      list_of_grammars = compile("""
          :grammar first
          some := <rule>

          :grammar second
          another := <rule>
      """)
  input may be either a file object (something having a read() method) or a string.
Parameters
- input - Pass a grammar class, a string, or anything that has a read() method, e.g. a file object.
- type - Type of input, “sphinx” or “native”.
- name - Name of the grammar that shall be created to hold the rules given in input.
- extends - If you pass a name, you may pass extends as a list of names of grammars.
- grammar - If you pass a grammar class, the input is added to this grammar class.
- filename - For informative purposes.
Returns
- A grammar class or (if no specific grammar is given in some way) a list of grammar classes.
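Two aspects of the input handling described above are easy to picture in isolation: accepting either a string or a file-like object, and splitting a source with several `:grammar` directives into one chunk per grammar. The sketch below is not aptk's compiler; `read_source` and `split_grammars` are hypothetical helpers, and it returns plain (name, rules) tuples where compile() would return grammar classes.

```python
import io
import re

# A ":grammar NAME" directive on a line of its own.
_GRAMMAR_DIRECTIVE = re.compile(r"^[ \t]*:grammar[ \t]+(\S+)[ \t]*$", re.MULTILINE)


def read_source(source):
    """Accept a string or anything with a read() method."""
    if hasattr(source, "read"):
        return source.read()
    return str(source)


def split_grammars(source):
    """Return a list of (grammar_name, rules_text) tuples."""
    parts = _GRAMMAR_DIRECTIVE.split(read_source(source))
    # parts is [text before first directive, name1, body1, name2, body2, ...]
    return [(name, body.strip()) for name, body in zip(parts[1::2], parts[2::2])]


text = """
:grammar first
some := <rule>

:grammar second
another := <rule>
"""

print(split_grammars(text))
print(split_grammars(io.StringIO(text)))
```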
aptk.oprec - Operation Precedence Parser
Operation precedence parsers are intended to parse expressions in which two non-terminals never appear in sequence. Usually you will use one to parse (mathematical) expressions.
You can include OperatorPrecedenceParser in your grammar by using:
:args-of OPTABLE string capturing non-capturing raw
    => aptk.oprec.OperatorPrecedenceParser
Then you can create rules like this:
my_rule_name1 := <OPTABLE{
    :rule T <.term>
    ...
}>
my_rule_name2 := <OPTABLE{
    :rule T <.term2>
    :rule W ""
    :rule E
    ...
}>
Every OPTABLE invocation creates a new rule.
In any grammar descending from Grammar this is already done for you, and operation precedence is accessible via the rule EXPR:
:grammar operation-precedence-parser-tests
expr := <EXPR{
    :flags with-ops
    :op L E+E
}>
You have to define a <term>, such that a term, which is the only non-terminal rule in expressions, can be parsed:
term := <number> | <ident>
Expression above parses for example following expressions:
<expr> ~~ 5 + 5
    -> expr( E+E( number( '5' ), op( '+' ), number( '5' ) ) )
<expr> ~~ 1 + 2 + 3
    -> expr( E+E(
        E+E(
            number( '1' ),
            op( '+' ),
            number( '2' )
        ),
        op( '+' ),
        number( '3' )
    ) )
You see in the parse trees of the expressions above that the operator is also lexed (as “op”). This is triggered by the flag with-ops. If you leave out this flag, operators are not lexed, as you see in the further examples:
expr2 :- <EXPR{
    :op L E+E
    :op L E-E = E+E
    :op L E*E > E+E
    :op L E/E = E*E
    :op L E**E > E*E
    :op L E++ > E**E
    :op R ++E = E++
    :op R (E) > E++
}>
First example where operator precedence table is used:
<expr2> ~~ 5 + 5 * 4
    -> expr2( E+E(
        number( '5' ),
        E*E( number( '5' ), number( '4' ) )
    ) )
A more complex example:
<expr2> ~~ 5**2 + 4**2/3**1 * 2 + 1
    -> expr2( E+E(
        E+E(
            E**E( number( '5' ), number( '2' ) ),
            E*E(
                E/E(
                    E**E( number( '4' ), number( '2' ) ),
                    E**E( number( '3' ), number( '1' ) )
                ),
                number( '2' )
            )
        ),
        number( '1' )
    ) )
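The shape of these parse trees can be reproduced with a generic precedence-climbing parser. This is a different, textbook technique used here purely for illustration, not the algorithm of aptk.oprec. The sketch covers only the binary operators of the expr2 table, treats all of them as left-associative (as the `:op L` lines do), and represents nodes as plain tuples; `PRECEDENCE`, `tokenize` and `parse` are names local to this example.

```python
import re

# Higher number binds tighter; taken from the expr2 table (binary operators only).
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2, "**": 3}
TOKEN = re.compile(r"\s*(\d+|\*\*|[-+*/])")


def tokenize(text):
    """Split the input into number and operator tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError("unexpected input at position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens, min_prec=1):
    """Precedence climbing; consumes tokens from the front of the list."""
    if not tokens or not tokens[0].isdigit():
        raise SyntaxError("number expected")
    left = ("number", tokens.pop(0))
    while tokens and PRECEDENCE.get(tokens[0], 0) >= min_prec:
        op = tokens.pop(0)
        # Left-associative: the right operand may only contain tighter operators.
        right = parse(tokens, PRECEDENCE[op] + 1)
        left = ("E%sE" % op, left, right)
    return left


print(parse(tokenize("5 + 5 * 4")))
print(parse(tokenize("5**2 + 4**2/3**1 * 2 + 1")))
```

The first call yields an `E+E` node whose right child is the `E*E` node, mirroring the first expr2 example.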
Here you see how whitespace influences the tokenizer:
<expr2> ~~ 1*3+++++1
    -> expr2( E+E(
        E*E( number( '1' ), E++( E++( number( '3' ) ) ) ),
        number( '1' )
    ) )
<expr2> ~~ 1*3++ + ++1
    -> expr2( E+E(
        E*E( number( '1' ), E++( number( '3' ) ) ),
        ++E( number( '1' ) )
    ) )
<expr2> ~~ 1*3+++(++1)
    -> expr2( E+E(
        E*E( number( '1' ), E++( number( '3' ) ) ),
        (E)( ++E( number( '1' ) ) )
    ) )
<expr2> ~~ (1*3)++
    -> expr2( E++(
        (E)(
            E*E(
                number( '1' ),
                number( '3' )
            )
        )
    ) )
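The token sequences implied by the trees above are consistent with a longest-match ("maximal munch") tokenizer, in which `+++++` is read as `++ ++ +` unless whitespace or parentheses force a different split. The sketch below assumes this behaviour; it is not taken from aptk's source, and `OPERATORS` and `tokenize` are local names.

```python
import re

OPERATORS = ["**", "++", "+", "-", "*", "/", "(", ")"]
# Try longer operators first so that "++" wins over "+".
_ops = "|".join(re.escape(op) for op in sorted(OPERATORS, key=len, reverse=True))
TOKEN = re.compile(r"\s*(\d+|" + _ops + r")")


def tokenize(text):
    """Greedy tokenizer: always takes the longest operator that fits."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError("unexpected input at position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


print(tokenize("1*3+++++1"))
print(tokenize("1*3++ + ++1"))
print(tokenize("1*3+++(++1)"))
```

In the first input the trailing `1` is preceded by `++ ++ +`, which is why the tree shows two postfix increments and one addition; the spaces in the second input turn the last `++` into a prefix operator.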
Here you see how operator precedence influences the interpretation of the term ++1--:
prepostest1 := <EXPR{
    :op L ++E
    :op L E-- > ++E
}>
<prepostest1> ~~ ++1--
    -> prepostest1( ++E( E--( number( '1' ) ) ) )
prepostest2 := <EXPR{
    :op L ++E
    :op L E-- < ++E
}>
<prepostest2> ~~ ++1--
    -> prepostest2( E--( ++E( number( '1' ) ) ) )
postcirc1 :- <EXPR{
    :op R E(E)
    :op R E,E < E(E)
}>
<postcirc1> ~~ sum(1, 2)
    -> postcirc1( E(E)(
        E,E(
            number( '1' ),
            number( '2' )
        )
    ) )
<postcirc1> ~~ sum(1, 2, 3, 4)
    -> postcirc1( E(E)(
        E,E(
            number( '1' ),
            E,E(
                number( '2' ),
                E,E(
                    number( '3' ),
                    number( '4' )
                )
            )
        )
    ) )
Typical operator association you find here:

class aptk.grammar_tester.GrammarTest(name, op, pos, input, actions, expected, skip=None, debug=False)
Simple class to save test data.

class aptk.grammar_tester.GrammarTestCase(name, grammar_test, grammar)
A TestCase for Grammar.

class aptk.grammar_tester.RuleTest(name, op, pos, input, actions, expected, skip=None, debug=False)
name specifies a rule.

class aptk.grammar_tester.TokenTest(name, op, pos, input, actions, expected, skip=None, debug=False)
name specifies a token.

aptk.grammar_tester.generate_testsuite(grammar, suite=None, patterns=None)
Gets a grammar class and maybe a suite.

exception aptk.grammar_compiler.GrammarError(grammar_compiler, msg, **kargs)
Exception in grammar compilation.
This exception is raised if there is an error in grammar compilation.