The cli module
Main module for the piyacc command-line utility. Contains the main() function as required by setuptools, so that when the package is installed, setuptools can put a wrapper in /usr/bin or a similar place. The module can also be executed directly, in which case it calls main() itself.
ndcode.piyacc.cli.main()
Runs the following steps:
1. Processing the command line
2. Parsing and analyzing the YACC/Bison-compatible input file
3. Generating a parser definition and corresponding automaton
4. Writing C or Python code to implement the requested parser
Design of the module
Step 1 is fairly straightforward Python code using getopt.getopt().
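As a rough illustration of Step 1, the sketch below parses a command line with getopt.getopt(). The options shown (-o/--outfile and -d/--defines) and the helper parse_args() are hypothetical, chosen only to demonstrate the technique; they are not taken from piyacc's actual usage. A module like this would normally end with the usual if __name__ == '__main__': main() guard, which is left out here so that loading the sketch has no side effects.

```python
import getopt
import sys


def parse_args(argv):
    """Return (out_file, defines, in_file) parsed from argv.

    Hypothetical option set, for illustration only.
    """
    # 'o:' takes an argument, 'd' is a flag; long forms mirror them.
    opts, args = getopt.getopt(argv, 'do:', ['defines', 'outfile='])
    out_file = None
    defines = False
    for opt, arg in opts:
        if opt in ('-o', '--outfile'):
            out_file = arg
        elif opt in ('-d', '--defines'):
            defines = True
    if len(args) != 1:
        raise getopt.GetoptError('expected exactly one input file')
    return out_file, defines, args[0]


def main():
    try:
        out_file, defines, in_file = parse_args(sys.argv[1:])
    except getopt.GetoptError as err:
        sys.stderr.write('{0:s}\n'.format(str(err)))
        sys.exit(1)
    # Steps 2 to 4 (parsing the input, building the automaton and
    # writing the parser) would follow here.
```

For example, parse_args(['-o', 'out.py', '-d', 'in.y']) yields ('out.py', True, 'in.y').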
Step 2 is more complicated. It uses a πlex/πyacc/πtree-generated parser, which automatically creates an abstract syntax tree representing the input, and then performs several passes over the tree to extract the needed information, such as the lists of terminals and non-terminals, the rules, the alternatives of each rule, and the symbols, actions and other directives of each alternative. How it does this is somewhat beyond the scope of this documentation, since we would first have to refer you to tutorials on how to use πlex/πyacc/πtree.
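The passes themselves operate on the πtree-generated tree, which is not reproduced here. As a simplified stand-in, the sketch below shows the kind of information one such pass extracts, classifying symbols from a hypothetical flattened list of rules: any symbol defined by a rule is a non-terminal, and every other symbol is treated as a terminal. A real YACC/Bison front end would also take %token declarations and character literals into account.

```python
def classify_symbols(rules):
    """Split grammar symbols into (terminals, non-terminals).

    rules is a hypothetical flattened form: a list of (lhs, alternatives)
    pairs, where each alternative is a list of symbol names.
    """
    # Pass 1: every left-hand side is a non-terminal.
    nonterminals = []
    for lhs, _ in rules:
        if lhs not in nonterminals:
            nonterminals.append(lhs)
    # Pass 2: any other symbol used in an alternative is a terminal,
    # recorded in order of first appearance.
    terminals = []
    for _, alternatives in rules:
        for alternative in alternatives:
            for symbol in alternative:
                if symbol not in nonterminals and symbol not in terminals:
                    terminals.append(symbol)
    return terminals, nonterminals


# Example: expr : expr '+' term | term ;  term : NUM | '(' expr ')' ;
rules = [
    ('expr', [['expr', "'+'", 'term'], ['term']]),
    ('term', [['NUM'], ["'('", 'expr', "')'"]]),
]
terminals, nonterminals = classify_symbols(rules)
```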
Step 3 is the focus of this document. Internally it uses a Python class library that we have developed for dealing with LR(1) grammars and automata; see the classes ndcode.piyacc.lr1.LR1 and ndcode.piyacc.lr1dfa.LR1DFA. These classes are general-purpose, and there is nothing to stop you from pulling them out and using them in your own project (subject to the license, which is GPLv2). At the moment the classes are only used to generate automata for later use, but each class also contains commented functions that can perform parsing directly.
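The interface of the LR1 and LR1DFA classes is not reproduced here. To give a flavour of what an LR(1) library has to compute, the following is a generic textbook sketch (not piyacc's code) of the FIRST-set and LR(1) closure operations. It assumes a hypothetical representation: the grammar is a dict from each non-terminal to a list of right-hand-side tuples, an item is a tuple (lhs, rhs, dot, lookahead), and '' stands for the empty string in FIRST sets.

```python
def first_of_sequence(symbols, first, grammar):
    """FIRST of a symbol sequence; contains '' if it can derive empty."""
    result = set()
    for symbol in symbols:
        if symbol not in grammar:
            # A terminal begins the rest of the sequence, so stop here.
            result.add(symbol)
            return result
        result |= first[symbol] - {''}
        if '' not in first[symbol]:
            return result
    # Every symbol (or none at all) can derive the empty string.
    result.add('')
    return result


def first_sets(grammar):
    """Iterate FIRST(A) for every non-terminal A to a fixed point."""
    first = {nonterminal: set() for nonterminal in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, productions in grammar.items():
            for rhs in productions:
                new = first_of_sequence(rhs, first, grammar) - first[lhs]
                if new:
                    first[lhs] |= new
                    changed = True
    return first


def closure(items, grammar, first):
    """LR(1) closure of a set of items (lhs, rhs, dot, lookahead)."""
    result = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, lookahead = work.pop()
        if dot < len(rhs) and rhs[dot] in grammar:
            # For [A -> alpha . B beta, a], add [B -> . gamma, b] for
            # each production B -> gamma and each b in FIRST(beta a).
            lookaheads = first_of_sequence(
                rhs[dot + 1:] + (lookahead,), first, grammar
            )
            for gamma in grammar[rhs[dot]]:
                for b in lookaheads:
                    item = (rhs[dot], gamma, 0, b)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return result


# Classic example grammar: S' -> S, S -> C C, C -> c C | d.
grammar = {
    "S'": [('S',)],
    'S': [('C', 'C')],
    'C': [('c', 'C'), ('d',)],
}
first = first_sets(grammar)
start = closure({("S'", ('S',), 0, '$')}, grammar, first)
```

Here the closure of the start item [S' -> . S, $] contains six items: the start item, [S -> . C C, $], and the two C productions, each with lookaheads c and d. The states of an LR(1) automaton are built by repeatedly applying closure and goto to such item sets.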
Step 4 is fairly straightforward Python code using str.format(). Basically, it reads a skeleton file line by line, looking for special lines similar to:
# GENERATE SECTION1
and then expands these into a sequence like:
# GENERATE SECTION1 BEGIN
...
# GENERATE SECTION1 END
where the middle part (denoted ...) is printed with a str.format() format string containing the C or Python code to be inserted into the skeleton.
Often this C or Python code contains repetitive material (e.g. switch cases, action handler functions or similar), which is passed into the format string as a string argument; that argument is a str.join() of a list of items, each formatted in turn with str.format(). The items may themselves contain repetitive material embedded in the same way, up to several levels deep. To illustrate how this works, here's an example where we are given a list of strings in the variable lines and we generate C code to write them out with fwrite():
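import sys

# Example input (hypothetical); the strings must not contain characters
# such as " or \ or newlines, which would need escaping in a C literal.
lines = ['Hello, ', 'world!']

# fwrite() takes the buffer first and the stream last:
# fwrite(ptr, size, count, stream).
sys.stdout.write(
    '''#include <stdio.h>
int main(void) {{
{0:s}}}
'''.format(
        ''.join(
            [
                '  fwrite("{1:s}", 1, {0:d}, stdout);\n'.format(
                    len(line),
                    line
                )
                for line in lines
            ]
        )
    )
)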
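The example above copies each string verbatim into a C string literal, which only works if the strings contain no quotes, backslashes, newlines or other special characters; also, len() counts characters rather than the bytes that fwrite() expects. A hypothetical helper such as c_string_literal() below (not part of piyacc) deals with both issues by escaping the UTF-8 encoding byte by byte and returning the byte count. It also escapes '?', so that sequences like ??/ cannot be read as trigraphs by older C compilers.

```python
def c_string_literal(text):
    """Return (literal, byte_count) for writing text with fwrite().

    Hypothetical helper for illustration, not part of piyacc.
    """
    data = text.encode('utf-8')
    pieces = []
    for byte in data:
        if byte in b'\\"?':
            # Backslash, double quote and '?' get a backslash prefix.
            pieces.append('\\' + chr(byte))
        elif byte == 0x0a:
            pieces.append('\\n')
        elif 0x20 <= byte < 0x7f:
            # Printable ASCII is copied as-is.
            pieces.append(chr(byte))
        else:
            # Three-digit octal escapes never absorb a following digit,
            # unlike hex escapes, which have no length limit.
            pieces.append('\\{0:03o}'.format(byte))
    return '"{0:s}"'.format(''.join(pieces)), len(data)
```

In the example above, the inner format call could then become '  fwrite({0:s}, 1, {1:d}, stdout);\n'.format(*c_string_literal(line)), since the returned literal already includes its surrounding quotes.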