PLY (Python Lex-Yacc) - An Introduction
Last Updated: 16 Feb, 2022
Most of us have heard of lex, a tool that generates lexical analyzers for splitting an input stream into tokens, and yacc, a parser generator. PLY is a pure-Python implementation of these two tools, provided as separate modules in a single package.
These modules are named lex.py and yacc.py and work similarly to the original UNIX tools lex and yacc.
PLY differs from its UNIX counterparts in that it doesn't require a special input file; the lexer and parser specifications are written directly in a Python program. Generating parsing tables is relatively expensive, so PLY caches the generated tables and regenerates them only when needed.
lex.py
This is one of the key modules in the package, because yacc.py depends on it: lex.py breaks the input text into a collection of tokens, each of which is identified by a regular expression rule.
To import this module in your Python code, use import ply.lex as lex
Example:
Suppose you wrote a simple expression: y = a + 2 * b
When this is passed through lex.py, the following tokens are generated
'y','=', 'a', '+', '2', '*', 'b'
Each token is usually given a name that indicates what it is, and the list of token names is always required.
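# List of token names (each name appears once, and every rule below needs an entry here)
tokens = ('ID', 'EQUALS', 'PLUS', 'MINUS', 'TIMES', 'DIVIDE', 'NUMBER')
# Regular expression rules; each rule is named t_ followed by its token name
t_ID = r'[a-zA-Z_][a-zA-Z0-9_]*'
t_EQUALS = r'='
t_PLUS = r'\+'
t_MINUS = r'-'
t_TIMES = r'\*'
t_DIVIDE = r'/'
t_NUMBER = r'\d+'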
More specifically, the tokens of the example can be represented as tuples of token type and token value
('ID', 'y'), ('EQUALS', '='), ('ID', 'a'), ('PLUS', '+'),
('NUMBER', '2'), ('TIMES', '*'), ('ID', 'b')
This module also provides an external interface in the form of token(), which returns the next valid token from the input.
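As a minimal sketch of the lexer described above, the following script tokenizes the example expression and collects (type, value) pairs by calling token() until the input is exhausted. It assumes the ply package is installed; the ID and NUMBER patterns, the t_ignore setting and the t_error handler are illustrative choices rather than anything prescribed by this article.

```python
import ply.lex as lex

# Token names; PLY looks for a sequence called `tokens`.
tokens = ('ID', 'EQUALS', 'PLUS', 'TIMES', 'NUMBER')

# One regular expression per token, bound to a name of the form t_<TOKEN>.
t_ID = r'[a-zA-Z_][a-zA-Z0-9_]*'
t_EQUALS = r'='
t_PLUS = r'\+'
t_TIMES = r'\*'
t_NUMBER = r'\d+'

# Characters to skip between tokens (spaces and tabs).
t_ignore = ' \t'

def t_error(t):
    # Called for a character that matches no rule: report it and skip it.
    print(f"Illegal character {t.value[0]!r}")
    t.lexer.skip(1)

lexer = lex.lex()              # build the lexer from the rules in this module
lexer.input('y = a + 2 * b')   # feed it the example expression

result = []
while True:
    tok = lexer.token()        # returns None once the input is exhausted
    if tok is None:
        break
    result.append((tok.type, tok.value))

print(result)
```

Running it should print the seven pairs for the expression, starting with ('ID', 'y') and ending with ('ID', 'b').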
yacc.py
Another module of this package is yacc.py, where yacc stands for Yet Another Compiler Compiler. It can be used to implement one-pass compilers. It provides most of the features of UNIX yacc, plus some extras that give yacc.py advantages over traditional yacc.
To import yacc into your Python code, use import ply.yacc as yacc
Its features include:
- LALR(1) parsing
- Grammar Validation
- Support for empty productions
- Extensive error checking capability
- Ambiguity Resolution
yacc.py also relies on token(): the parser calls it repeatedly, on demand, to fetch the next token and match the token stream against the grammar rules. The output of yacc.py is often an Abstract Syntax Tree (AST), although what the parser produces is up to the user.
Advantage over UNIX yacc:
The Python implementation, yacc.py, doesn't involve a separate code-generation step. Instead, it uses reflection to build its lexers and parsers, which saves space because no extra compiler construction step or generated code file is required.
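To illustrate what "reflection" means here, the sketch below inspects a namespace for names that start with t_ and builds a tokenizer from them, using only the standard re module. This is a simplified illustration of the idea and not PLY's actual implementation; the CalcRules class and the collect_rules helper are hypothetical names introduced for this example.

```python
import re

class CalcRules:
    # Hypothetical rule definitions, written the way a PLY user would write them.
    t_NUMBER = r'\d+'
    t_PLUS = r'\+'
    t_TIMES = r'\*'

def collect_rules(namespace):
    """Return {token_name: regex} for every string attribute named t_<TOKEN>."""
    return {
        name[2:]: value
        for name, value in namespace.items()
        if name.startswith('t_') and isinstance(value, str)
    }

# Discover the rules by inspecting the class namespace (the "reflection" step).
rules = collect_rules(vars(CalcRules))

# Combine the discovered rules into one regex with a named group per token.
master = re.compile('|'.join(f'(?P<{name}>{pattern})' for name, pattern in rules.items()))

# Tokenize a small input; lastgroup gives the name of the rule that matched.
found = [(m.lastgroup, m.group()) for m in master.finditer('2+3*4')]
print(found)
```

No intermediate source file is generated at any point: the rules are read straight from live Python objects, which is the property the paragraph above describes.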
To import the tokens from your lex file, use from lex_file_name_here import tokens, where tokens is the list of token names specified in the lex file.
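To specify the grammar rules, we define functions in our yacc file. Each function takes a single argument p, states its grammar rule in the docstring, and computes the result in its body, where p[0] is the value of the left-hand symbol and p[1], p[2], ... are the values of the symbols on the right-hand side. The syntax is as follows:
def p_expression_plus(p):
    'expression : expression PLUS term'
    p[0] = p[1] + p[3]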
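Putting the pieces together, here is a minimal sketch of a complete lexer and parser that evaluates sums and products of integers. It assumes the ply package is installed; the grammar, the token set and the error handlers are illustrative choices for this example. In PLY 3.x, yacc.yacc() may also write a cached table file such as parsetab.py next to the script.

```python
import ply.lex as lex
import ply.yacc as yacc

# --- Lexer -----------------------------------------------------------
tokens = ('NUMBER', 'PLUS', 'TIMES')

t_PLUS = r'\+'
t_TIMES = r'\*'
t_ignore = ' \t'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)   # convert the matched text to an integer
    return t

def t_error(t):
    print(f"Illegal character {t.value[0]!r}")
    t.lexer.skip(1)

# --- Parser ----------------------------------------------------------
# The first rule defined is used as the start symbol (expression).
def p_expression_plus(p):
    'expression : expression PLUS term'
    p[0] = p[1] + p[3]

def p_expression_term(p):
    'expression : term'
    p[0] = p[1]

def p_term_times(p):
    'term : term TIMES NUMBER'
    p[0] = p[1] * p[3]

def p_term_number(p):
    'term : NUMBER'
    p[0] = p[1]

def p_error(p):
    # p is the offending token, or None at end of input.
    where = repr(p.value) if p else 'end of input'
    raise SyntaxError(f"Syntax error at {where}")

lexer = lex.lex()
parser = yacc.yacc(debug=False)   # debug=False skips the parser.out debug log

value = parser.parse('2 + 3 * 4', lexer=lexer)
print(value)
```

Because multiplication is handled in the term rules, one level below expression, it binds more tightly than addition, so the input above evaluates to 14.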
References:
https://www.dabeaz.com/ply/ply.html