aptk - module reference

class aptk.parser.Parser(grammar, actions=None)

Combines a grammar and parse-actions into a parser.

An object of this class combines an abstract grammar and parse-actions into a parser, which produces an abstract syntax tree.

If no actions are given, a ParseActions object is used by default.

aptk.actions - Parse Actions

Parse Actions are used to create an abstract syntax tree from your parse tree.

Parse Actions are expected to be attributes of the parse-actions object passed to Parser. This can be an object of a class derived from ParseActions, but it can also be a module containing a collection of functions.

Parse-Action Callables

The parser calls a parse-action with two parameters:

  • parser - current Parser object
  • lex - current Lexem object

Whatever the parse-action returns is then written into the ast attribute of the Lexem object.

Connecting Parse-Actions to Rules

The parser calls a parse-action for each captured match object, which is represented by a Lexem object:

  • If a parse-action is defined in the matching rule, it is called. In the following rule, the parse-action “some_action” is called whenever something is captured using <some-rule>:

    some-rule  some_action=  "some text"
    

    You can map shortcuts to actions:

    :parse-action-map
        "$" => other_action
    
    other-rule $= "other text"
    

    In this case the parse-action “other_action” is called when “other text” is captured with <other-rule>.

  • If no parse-action is defined in the matching rule, the parser tries to find one of the following parse-actions (here for a matched <my_rule>):

    • my_rule
    • make_my_rule
    • got_my_rule
  • If no parse-action is found, nothing is done.
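
The lookup order described above can be sketched in plain Python. This only illustrates the rules as stated here and is not aptk's implementation: the Lexem stand-in class and the helpers find_action and run_action are hypothetical names, and the explicit action name is assumed to arrive already resolved from any shortcut.

```python
from types import SimpleNamespace


class Lexem:
    """Hypothetical stand-in for aptk's Lexem: rule name, matched text, ast."""

    def __init__(self, rule, text):
        self.rule = rule
        self.text = text
        self.ast = None


def find_action(actions, rule, explicit=None):
    """Return the callable to use for `rule`, or None if there is none."""
    if explicit is not None:
        # An action named in the rule itself takes priority.
        return getattr(actions, explicit, None)
    # Otherwise try the conventional names, in the order listed above.
    for name in (rule, "make_" + rule, "got_" + rule):
        candidate = getattr(actions, name, None)
        if callable(candidate):
            return candidate
    return None


def run_action(parser, actions, lex, explicit=None):
    """Call the parse-action for `lex` and store its result in lex.ast."""
    action = find_action(actions, lex.rule, explicit)
    if action is not None:
        lex.ast = action(parser, lex)
    return lex


# Any object with suitable attributes works; a module would be looked up
# through getattr in the same way.
actions = SimpleNamespace(
    make_my_rule=lambda parser, lex: lex.text.upper(),
    other_action=lambda parser, lex: len(lex.text),
)

lex = run_action(None, actions, Lexem("my_rule", "abc"))
print(lex.ast)
```

Because the lookup relies only on getattr, the same code accepts a ParseActions-like instance or a module of functions.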

Pairs

Setting the ast to a pair (name, result), where name is the rule’s name and result is the result of the parse-action, can be achieved with the following syntax:

paired  action=>  <some> <rule>

If you append a “>” to your operator and define an action for your rule, the ast of the capture of <paired> will be the pair (paired, «result of action()»).
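
As a minimal sketch of what the paired form produces, assuming the rule name is available as a plain string: apply_paired and join_parts are hypothetical names used for illustration, and the list passed as lex merely stands in for a Lexem.

```python
def apply_paired(rule_name, action, parser, lex):
    """Return the (name, result) pair that becomes the ast of the capture."""
    return (rule_name, action(parser, lex))


def join_parts(parser, lex):
    # Hypothetical parse-action: concatenate the captured parts.
    return "".join(lex)


pair = apply_paired("paired", join_parts, None, ["some", "rule"])
print(pair)
```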

Example

>>> from aptk import *
>>> 
>>> class DashArithmeticGrammar(Grammar):
...    r"""Simple grammar for addition and subtraction.
... 
...    dash_op    <= <sum> | <difference> | <number>
...    sum        := <number> "+" <dash_op>
...    difference := <number> "-" <dash_op>
...    """
>>> 
>>> class CalculatorActions(ParseActions):
...    r"""inherit number from ParseActions"""
...    def sum(self, p, lex):
...        return lex[0].ast + lex[1].ast
...    def difference(self, p, lex):
...        return lex[0].ast - lex[1].ast
>>> 
>>> ast("1 + 3 - 2", 
...     grammar = DashArithmeticGrammar, 
...     actions = CalculatorActions())
2
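
Read literally, sum and difference are right-recursive: the right operand of each is again a dash_op. The following hand-written evaluator mimics that shape without using aptk, which makes the resulting grouping visible. It is a sketch only: evaluate and dash_op are illustrative names, signed numbers are left out for brevity, and characters other than digits, "+" and "-" are simply ignored by the tokenizer.

```python
import re


def dash_op(tokens):
    """dash_op <= <sum> | <difference> | <number>, evaluated right-recursively."""
    if not tokens or not tokens[0].isdigit():
        raise SyntaxError("expected a number")
    left, rest = int(tokens[0]), tokens[1:]
    if rest and rest[0] in ("+", "-"):
        # The right operand is again a dash_op, as in the grammar.
        right, remaining = dash_op(rest[1:])
        value = left + right if rest[0] == "+" else left - right
        return value, remaining
    return left, rest


def evaluate(text):
    tokens = re.findall(r"\d+|[+-]", text)
    value, rest = dash_op(tokens)
    if rest:
        raise SyntaxError("unexpected trailing input")
    return value


print(evaluate("1 + 3 - 2"))
```

In this sketch 1 + 3 - 2 is computed as 1 + (3 - 2), and 5 - 3 - 1 as 5 - (3 - 1); whether a grammar should group this way is a design choice worth keeping in mind for non-associative operators.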

class aptk.grammar.BaseGrammar(s=None, **kargs)

Most basic grammar class.

Usually you will derive your classes from Grammar rather than from this class. If you really need a blank grammar, you can derive your grammar from this class.

A Grammar class has the following attributes:

__metaclass__
GrammarType - the type of a grammar class
_TOKENS_

A dictionary of token-parsing regexes, which can be used with {name} for the smart value and {:name:} for the unchanged value.

Smart value means that if you specify a token like:

token = abcd

You can still quantify the token without unexpected effects:

a-rule := foo{token}+

Will be translated to:

a-rule := foo(?:abcd)+

The other way of access:

b-rule := foo{:token:}+

Will be translated to:

b-rule := fooabcd+

You can use the second form for example for defining character classes:

word-chars = A-Za-z0-9_
dash       = \-
ident      = [{:word-chars:}{:dash:}]+

The tokens are evaluated directly after a rule-part is read.
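
The two access forms can be illustrated with a small substitution function. This is a sketch of the behaviour described above, not aptk's code: expand, RAW and SMART are hypothetical names, and names that are not in the token dictionary (for example a regex quantifier such as {2}) are assumed to be left untouched.

```python
import re

TOKENS = {
    "token": "abcd",
    "word-chars": "A-Za-z0-9_",
    "dash": r"\-",
}

# {:name:} inserts the value unchanged; {name} wraps it in a non-capturing group.
RAW = re.compile(r"\{:([\w?-]+):\}")
SMART = re.compile(r"\{([\w?-]+)\}")


def expand(pattern, tokens=TOKENS):
    """Replace token references in `pattern` by their regex source."""

    def raw(match):
        return tokens.get(match.group(1), match.group(0))

    def smart(match):
        name = match.group(1)
        if name in tokens:
            return "(?:%s)" % tokens[name]
        return match.group(0)  # unknown name: leave the text as it is

    return SMART.sub(smart, RAW.sub(raw, pattern))


print(expand("foo{token}+"))
print(expand("foo{:token:}+"))
print(expand("[{:word-chars:}{:dash:}]+"))
```

The difference matters once a quantifier follows: the smart form repeats the whole token, while the unchanged form repeats only its last character.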

_ACTIONS_

This dictionary maps rule names to action names, which are methods of either the ParseActions object passed to the parser or of the grammar. The map is created from implicit parse-action directives. Parse-actions run when a MatchObject is lexed and populate the ast attribute of the Lexem.

Implicit parse-actions are specified by _PARSE_ACTION_MAP_.

_START_RULE_
Name of the start rule, used if no other is given.

class aptk.grammar.Grammar(s=None, **kargs)

Default grammar with basic tokens and rules.

This is the grammar you will usually derive your grammars from.

It provides the most common tokens:

SP   = \x20
NL   = \r?\n
LF   = \n
CR   = \r
CRLF = \r\n
ws   = \s+
ws?  = \s*
N    = [^\n]
HWS  = [\x20\t\v]
LINE = [^\n]*\n

And a general ActionMap, which lets you connect your grammar to basic ParseActions:

:parse-action-map
    "$" make_string
    "@" make_list
    "%" make_dict
    "#" make_number
    "<" make_inherit
    ">" make_name
    "~" make_quoted

And the most common rules:

ident     $= [A-Za-z_\-][\w\-]*
number    #= [+-]?\d+(?:\.\d+)?
integer   #= \d+
dq-string ~= "(?:\\\\|\\[^\\]|[^"\\])*"
sq-string ~= '(?:\\\\|\\[^\\]|[^'\\])*'
ws        $= \b{ws}\b|{ws?}
line      $= [^\n]*\n
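
To see what some of these rule patterns accept, they can be tried directly with Python's re module. This sketch tests only the regular expressions as printed above, outside of aptk; the constant names and the helper matches are illustrative.

```python
import re

# Patterns copied from the rule definitions above.
NUMBER = re.compile(r'[+-]?\d+(?:\.\d+)?')
INTEGER = re.compile(r'\d+')
IDENT = re.compile(r'[A-Za-z_\-][\w\-]*')
DQ_STRING = re.compile(r'"(?:\\\\|\\[^\\]|[^"\\])*"')


def matches(pattern, text):
    """True if `pattern` matches the whole of `text`."""
    return pattern.fullmatch(text) is not None


print(matches(NUMBER, "-3.14"))     # signed decimal
print(matches(NUMBER, "3."))        # a trailing dot needs digits after it
print(matches(INTEGER, "+42"))      # integer has no sign
print(matches(IDENT, "foo-bar"))    # dashes are allowed in identifiers
print(matches(DQ_STRING, r'"a \"quoted\" word"'))
```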

It makes the default whitespace rule from BaseGrammar explicit:

:sigspace <.ws>

It defines how the arguments of BRANCH are parsed:

:args-of BRANCH string capturing non-capturing regex

It defines the operation precedence parser:

:args-of  EXPR string capturing non-capturing raw 
            => aptk.oprec.OperatorPrecedenceParser

BRANCH(P, s=None, start=None, end=None, args=None)

Look ahead and branch into some rule.

Example:

branched := <BRANCH{
             "a"    <a-rule>
             [bcd]  <bcd-rule>
             a|b    <a-or-b-rule>
             <default-rule>
            }>

If string to be matched startswith

ERROR(P, s=None, start=None, end=None, args=None)

Raise a syntax error.

Example:

foo := <x> | <ERROR{Expected "x"}>

Note that whitespace is collapsed to a single space.
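
One common way to collapse whitespace like this is shown below. It is an illustration only and not taken from aptk: collapse_whitespace is a hypothetical helper, and stripping leading and trailing whitespace is an assumption of this sketch.

```python
def collapse_whitespace(text):
    """Replace every run of whitespace (including newlines) by one space."""
    return " ".join(text.split())


print(collapse_whitespace('Expected   "x"\n      here'))
```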

aptk.grammar.compile(input, type=None, name=None, extends=None, grammar=None, filename=None)

Compile a grammar.

You can pass different inputs to this function, which influences the return value.

# input is a grammar class:

class MyGrammar(Grammar):
    r"""This is my grammar class

    .. highlight:: aptk

    My grammar has following rule::

        <foo> = "bar"
    """

This is the way you usually invoke compile() with a grammar class, because compile() is invoked by GrammarType.

# Append whatever is defined in input to grammar:

class MyGrammar(Grammar):
    r"""Here are rules defined"""

...

compile("here are more rules", grammar=MyGrammar)

input may be either a file object (something having a read() method) or a string.

# Create a new grammar named name, which extends the grammars passed in the iterable extends. If you do not pass extends, your grammar will extend Grammar, extracting the rules from input.

# Simply compile input to a list of grammars.

list_of_grammars = compile("""
    :grammar first
    some := <rule>

    :grammar second
    another := <rule>
""")

input may be either a file object (something having a read() method) or a string.
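
Accepting either a file object or a string is usually done by checking for a read() method, as the description suggests. The helper below is a hypothetical sketch of that check and is not part of aptk; the parameter is called source here to avoid shadowing Python's built-in input.

```python
import io


def read_source(source):
    """Return grammar text from a string or from anything with read()."""
    if hasattr(source, "read"):
        return source.read()
    return source


print(read_source("some := <rule>"))
print(read_source(io.StringIO("another := <rule>")))
```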

Parameters
input
A grammar class, a string, or any object that has a read() method, e.g. a file object.
type
Type of input, “sphinx” or “native”.
name
Name of the grammar to be created, which holds the rules given in input.
extends
If you pass a name you may pass extends as a list of names of grammars.
grammar
If you pass a grammar class, the input is added to this grammar class.
filename
Used for informational purposes.
Returns
A grammar class or, if no specific grammar was given in any way, a list of grammar classes.

aptk.oprec - Operation Precedence Parser

Operation precedence parsers are intended for expressions in which two non-terminals never follow one another directly. They are typically used to parse (mathematical) expressions.

You can make the OperatorPrecedenceParser available in your grammar by using:

:args-of OPTABLE string capturing non-capturing raw
         => aptk.oprec.OperatorPrecedenceParser

Then you can create rules like this:

my_rule_name1 := <OPTABLE{
                    :rule T <.term>
                    ...
                    }>

my_rule_name2 := <OPTABLE{
                    :rule T <.term2>
                    :rule W ""
                    :rule E
                    ...
                    }>

Every OPTABLE invocation creates a new rule.

In any grammar derived from Grammar this is already done for you, and operation precedence parsing is accessible via the rule EXPR:

:grammar operation-precedence-parser-tests

expr  := <EXPR{
           :flags with-ops

           :op L E+E
       }>

You have to define a <term> rule so that terms, which are the only non-terminals in expressions, can be parsed:

term       := <number> | <ident>

The rule above parses, for example, the following expressions:

<expr> ~~ 5 + 5 
       -> expr( E+E( number( '5' ), op( '+' ), number( '5' ) ) )

<expr> ~~ 1 + 2 + 3 
       -> expr( E+E( 
            E+E(
              number( '1' ), 
              op( '+' ),
              number( '2' ) 
            ),
            op( '+' ),
            number( '3' )
          ) )

In the parse trees above, the operator is also lexed (as “op”). This is triggered by the flag with-ops. If you leave out this flag, operators are not lexed, as the following examples show:

expr2 :- <EXPR{
           :op L E+E
           :op L E-E  = E+E
           :op L E*E  > E+E
           :op L E/E  = E*E
           :op L E**E > E*E
           :op L E++  > E**E
           :op R ++E  = E++
           :op R (E)  > E++
       }>

A first example in which the operator precedence table is used:

<expr2> ~~ 5 + 5 * 4 
        -> expr2( E+E(
              number( '5' ), 
              E*E( number( '5' ), number( '4' ) ) 
           ) )

A more complex example:

<expr2> ~~ 5**2 + 4**2/3**1 * 2 + 1 
        ->  expr2( E+E(
               E+E(
                 E**E( number( '5' ), number( '2' ) ), 
                 E*E(
                   E/E(
                     E**E( number( '4' ), number( '2' ) ),
                     E**E( number( '3' ), number( '1' ) )
                   ),
                   number( '2' )
                 )
               ),
               number( '1' )
            ) )
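
The grouping in the two expr2 examples above can be reproduced with a small precedence-climbing parser. This is a different, hand-written technique chosen for illustration and makes no claim about how aptk.oprec works internally. It covers only the binary operators of the table, treats all of them as left-associative as declared there, and returns nested tuples instead of Lexem objects; all names are hypothetical.

```python
import re

# Binding strength per operator, following the relations in the expr2 table:
# "+" = "-"  <  "*" = "/"  <  "**".
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2, "**": 3}

# "**" is listed before the single-character operators so it wins over "*".
TOKEN = re.compile(r"\s*(\d+|\*\*|[-+*/])")


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError("unexpected input at position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_expr(tokens, i, min_prec):
    """Parse tokens[i:] and return (tree, index of the next unread token)."""
    if i >= len(tokens) or not tokens[i].isdigit():
        raise SyntaxError("expected a number")
    left, i = ("number", tokens[i]), i + 1
    while (i < len(tokens) and tokens[i] in PRECEDENCE
           and PRECEDENCE[tokens[i]] >= min_prec):
        op = tokens[i]
        # Left-associative: the right operand must bind strictly tighter.
        right, i = parse_expr(tokens, i + 1, PRECEDENCE[op] + 1)
        left = ("E%sE" % op, left, right)
    return left, i


def parse(text):
    tokens = tokenize(text)
    tree, end = parse_expr(tokens, 0, 1)
    if end != len(tokens):
        raise SyntaxError("unexpected token %r" % tokens[end])
    return tree


print(parse("5 + 5 * 4"))
```

For a right-associative operator the recursive call would pass PRECEDENCE[op] instead of PRECEDENCE[op] + 1.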

Here you see how whitespace influences the tokenizer:

<expr2> ~~ 1*3+++++1 
        -> expr2( E+E( 
             E*E( number( '1' ), E++( E++( number( '3' ) ) ) ), 
             number( '1' ) 
           ) )

<expr2> ~~ 1*3++ + ++1 
        -> expr2( E+E(
            E*E( number( '1' ), E++( number( '3' ) ) ),
            ++E( number( '1' ) )
        ) )

<expr2> ~~ 1*3+++(++1) 
        -> expr2( E+E( 
             E*E( number( '1' ), E++( number( '3' ) ) ), 
             (E)( ++E( number( '1' ) ) ) 
           ) )

<expr2> ~~ (1*3)++
        -> expr2( E++(
             (E)(
               E*E(
                 number( '1' ),
                 number( '3' )
               )
             )
           ) )

Here you see how operator precedence influences the interpretation of the term ++1--:

prepostest1 := <EXPR{
               :op L ++E
               :op L E-- > ++E
              }>

<prepostest1> ~~ ++1-- -> prepostest1( ++E( E--( number( '1' ) ) ) )

prepostest2 := <EXPR{
               :op L ++E
               :op L E-- < ++E
              }>

<prepostest2> ~~ ++1-- -> prepostest2( E--( ++E( number( '1' ) ) ) )

postcirc1   :- <EXPR{
                  :op R E(E)
                  :op R E,E < E(E)
               }>

<postcirc1> ~~  sum(1, 2) 
            -> postcirc1( E(E)(
                 E,E(
                   number( '1' ),
                   number( '2' ) 
                 )
               ) )

<postcirc1> ~~  sum(1, 2, 3, 4)
            -> postcirc1( E(E)(
                 E,E(
                   number( '1' ),
                   E,E(
                     number( '2' ),
                     E,E(
                       number( '3' ),
                       number( '4' )
                     )
                   )
                 )
               ) )

Typical operator association you find here:

class aptk.grammar_tester.GrammarTest(name, op, pos, input, actions, expected, skip=None, debug=False)

Simple class for storing test data.

class aptk.grammar_tester.GrammarTestCase(name, grammar_test, grammar)

A TestCase for Grammar

class aptk.grammar_tester.RuleTest(name, op, pos, input, actions, expected, skip=None, debug=False)

name specifies a rule

class aptk.grammar_tester.TokenTest(name, op, pos, input, actions, expected, skip=None, debug=False)

name specifies a token

aptk.grammar_tester.generate_testsuite(grammar, suite=None, patterns=None)

Takes a grammar class and, optionally, a suite.

exception aptk.grammar_compiler.GrammarError(grammar_compiler, msg, **kargs)

Exception in grammar compilation.

This exception is raised if an error occurs during grammar compilation.