Flex (Fast Lexical Analyzer Generator)
Last Updated :
17 Sep, 2024
Flex (Fast Lexical Analyzer Generator), or simply Flex, is a tool for generating lexical analyzers scanners or lexers. Written by Vern Paxson in C, circa 1987, Flex is designed to produce lexical analyzers that is faster than the original Lex program. Today it is often used along with Berkeley Yacc or GNU Bison parser generators. Both Flex and Bison are more flexible, and produce faster code, than their ancestors Lex and Yacc.
In this process, Bison creates a parser from the input file provided by the user. In turn, Flex creates the function yylex() from a .l file, which is a file containing rules of lexical analysis. The function yylex(), created by Flex automatically, retrieves tokens from the input stream and is invoked by the parser to analyze the text of the input.
yylex() function Note: This is the main function generated from Flex. It performs the rules in the rule section of the .l file to conduct the lexical analysis.
Installing Flex on Ubuntu:
sudo apt-get update
sudo apt-get install flex
Note: If Update command is not run on the machine for a while, it's better to run it first so that a newer version is installed as an older version might not work with the other packages installed or may not be present now.
Given image describes how the Flex is used:

Step 1: An input file describes the lexical analyzer to be generated named lex.l is written in lex language. The lex compiler transforms lex.l to C program, in a file that is always named lex.yy.c.
Step 2: The C compiler compile lex.yy.c file into an executable file called a.out.
Step 3: The output file a.out take a stream of input characters and produce a stream of tokens.
Program Structure:
In the input file, there are 3 sections:
1. Definition Section: The definition section contains the declaration of variables, regular definitions, manifest constants. In the definition section, text is enclosed in "%{ %}" brackets. Anything written in this brackets is copied directly to the file lex.yy.c
Syntax:
%{
// Definitions
%}
2. Rules Section: The rules section contains a series of rules in the form: pattern action and pattern must be unintended and action begin on the same line in {} brackets. The rule section is enclosed in "%% %%".
Syntax:
%%
pattern action
%%
Examples: Table below shows some of the pattern matches.
Pattern | It can match with |
---|
[0-9] | all the digits between 0 and 9 |
[0+9] | either 0, + or 9 |
[0, 9] | either 0, ', ' or 9 |
[0 9] | either 0, ' ' or 9 |
[-09] | either -, 0 or 9 |
[-0-9] | either - or all digit between 0 and 9 |
[0-9]+ | one or more digit between 0 and 9 |
[^a] | all the other characters except a |
[^A-Z] | all the other characters except the upper case letters |
a{2, 4} | either aa, aaa or aaaa |
a{2, } | two or more occurrences of a |
a{4} | exactly 4 a's i.e, aaaa |
. | any character except newline |
a* | 0 or more occurrences of a |
a+ | 1 or more occurrences of a |
[a-z] | all lower case letters |
[a-zA-Z] | any alphabetic letter |
w(x | y)z | wxz or wyz |
3. User Code Section: This section contains C statements and additional functions. We can also compile these functions separately and load with the lexical analyzer.
Basic Program Structure:
%{
// Definitions
%}
%%
Rules
%%
User code section
How to run the program:
To run the program, it should be first saved with the extension .l or .lex. Run the below commands on terminal in order to run the program file.
Step 1: flex filename.l or flex filename.lex depending on the extension file is saved with
Step 2: gcc lex.yy.c
Step 3: ./a.out
Step 4: Provide the input to program in case it is required
Note: Press Ctrl+D or use some rule to stop taking inputs from the user. Please see the output images of below programs to clear if in doubt to run the programs.
Example 1: Count the number of characters in a string
C
/*** Definition Section has one variable
which can be accessed inside yylex()
and main() ***/
%{
int count = 0;
%}
/*** Rule Section has three rules, first rule
matches with capital letters, second rule
matches with any character except newline and
third rule does not take input after the enter***/
%%
[A-Z] {printf("%s capital letter\n", yytext);
count++;}
. {printf("%s not a capital letter\n", yytext);}
\n {return 0;}
%%
/*** Code Section prints the number of
capital letter present in the given input***/
int yywrap(){}
int main(){
// Explanation:
// yywrap() - wraps the above rule section
/* yyin - takes the file pointer
which contains the input*/
/* yylex() - this is the main flex function
which runs the Rule Section*/
// yytext is the text in the buffer
// Uncomment the lines below
// to take input from file
// FILE *fp;
// char filename[50];
// printf("Enter the filename: \n");
// scanf("%s",filename);
// fp = fopen(filename,"r");
// yyin = fp;
yylex();
printf("\nNumber of Capital letters "
"in the given input - %d\n", count);
return 0;
}
Output:

Example 2: Count the number of characters and number of lines in the input
C
/* Declaring two counters one for number
of lines other for number of characters */
%{
int no_of_lines = 0;
int no_of_chars = 0;
%}
/***rule 1 counts the number of lines,
rule 2 counts the number of characters
and rule 3 specifies when to stop
taking input***/
%%
\n ++no_of_lines;
. ++no_of_chars;
end return 0;
%%
/*** User code section***/
int yywrap(){}
int main(int argc, char **argv)
{
yylex();
printf("number of lines = %d, number of chars = %d\n",
no_of_lines, no_of_chars );
return 0;
}
Output:

Advantages
- Efficiency: Flex-generated lexical analyzers are very fast and efficient, which can improve the overall performance of the programming language.
- Portability: Flex is available on many different platforms, making it easy to use on a wide range of systems.
- Flexibility: Flex is very flexible and can be customized to support many different types of programming languages and input formats.
- Easy to Use: Flex is relatively easy to learn and use, especially for programmers with experience in regular expressions.
Disadvantages
- Steep Learning Curve: While Flex is relatively easy to use, it can have a steep learning curve for programmers who are not familiar with regular expressions.
- Limited Error Detection: Flex-generated lexical analyzers can only detect certain types of errors, such as syntax errors and misspelled keywords.
- Limited Support for Unicode: Flex has limited support for Unicode, which can make it difficult to work with non-ASCII characters.
- Code Maintenance: Flex-generated code can be difficult to maintain over time, especially as the programming language evolves and changes. This can make it challenging to keep the lexical analyzer up to date with the latest version of the language.
Conclusion
Flex is powerful in generating fast and efficient lexical analyzers; that is why it is so essential when building compilers and interpreters. While it boasts flexibility, portability, and speed, mastering it is hard because of the regular expression-based syntax, and it has some limitations in error detection and Unicode support. Due to the steepness, it requires considerable effort from a user but still remains popular among users working with numerous programming languages.
Similar Reads
Introduction of Compiler Design A compiler is software that translates or converts a program written in a high-level language (Source Language) into a low-level language (Machine Language or Assembly Language). Compiler design is the process of developing a compiler.The development of compilers is closely tied to the evolution of
9 min read
Compiler Design Basics
Introduction of Compiler DesignA compiler is software that translates or converts a program written in a high-level language (Source Language) into a low-level language (Machine Language or Assembly Language). Compiler design is the process of developing a compiler.The development of compilers is closely tied to the evolution of
9 min read
Compiler construction toolsThe compiler writer can use some specialized tools that help in implementing various phases of a compiler. These tools assist in the creation of an entire compiler or its parts. Some commonly used compiler construction tools include: Parser Generator - It produces syntax analyzers (parsers) from the
4 min read
Phases of a CompilerA compiler is a software tool that converts high-level programming code into machine code that a computer can understand and execute. It acts as a bridge between human-readable code and machine-level instructions, enabling efficient program execution. The process of compilation is divided into six p
10 min read
Symbol Table in CompilerEvery compiler uses a symbol table to track all variables, functions, and identifiers in a program. It stores information such as the name, type, scope, and memory location of each identifier. Built during the early stages of compilation, the symbol table supports error checking, scope management, a
8 min read
Error Handling in Compiler DesignDuring the process of language translation, the compiler can encounter errors. While the compiler might not always know the exact cause of the error, it can detect and analyze the visible problems. The main purpose of error handling is to assist the programmer by pointing out issues in their code. E
5 min read
Language Processors: Assembler, Compiler and InterpreterComputer programs are generally written in high-level languages (like C++, Python, and Java). A language processor, or language translator, is a computer program that convert source code from one programming language to another language or to machine code (also known as object code). They also find
5 min read
Generation of Programming LanguagesProgramming languages have evolved significantly over time, moving from fundamental machine-specific code to complex languages that are simpler to write and understand. Each new generation of programming languages has improved, allowing developers to create more efficient, human-readable, and adapta
6 min read
Lexical Analysis
Introduction of Lexical AnalysisLexical analysis, also known as scanning is the first phase of a compiler which involves reading the source program character by character from left to right and organizing them into tokens. Tokens are meaningful sequences of characters. There are usually only a small number of tokens for a programm
6 min read
Flex (Fast Lexical Analyzer Generator)Flex (Fast Lexical Analyzer Generator), or simply Flex, is a tool for generating lexical analyzers scanners or lexers. Written by Vern Paxson in C, circa 1987, Flex is designed to produce lexical analyzers that is faster than the original Lex program. Today it is often used along with Berkeley Yacc
7 min read
Introduction of Finite AutomataFinite automata are abstract machines used to recognize patterns in input sequences, forming the basis for understanding regular languages in computer science. They consist of states, transitions, and input symbols, processing each symbol step-by-step. If the machine ends in an accepting state after
4 min read
Classification of Context Free GrammarsA Context-Free Grammar (CFG) is a formal rule system used to describe the syntax of programming languages in compiler design. It provides a set of production rules that specify how symbols (terminals and non-terminals) can be combined to form valid sentences in the language. CFGs are important in th
4 min read
Ambiguous GrammarContext-Free Grammars (CFGs) is a way to describe the structure of a language, such as the rules for building sentences in a language or programming code. These rules help define how different symbols can be combined to create valid strings (sequences of symbols).CFGs can be divided into two types b
7 min read
Syntax Analysis & Parsers
Syntax Directed Translation & Intermediate Code Generation
Syntax Directed Translation in Compiler DesignSyntax-Directed Translation (SDT) is a method used in compiler design to convert source code into another form while analyzing its structure. It integrates syntax analysis (parsing) with semantic rules to produce intermediate code, machine code, or optimized instructions.In SDT, each grammar rule is
8 min read
S - Attributed and L - Attributed SDTs in Syntax Directed TranslationIn Syntax-Directed Translation (SDT), the rules are those that are used to describe how the semantic information flows from one node to the other during the parsing phase. SDTs are derived from context-free grammars where referring semantic actions are connected to grammar productions. Such action c
4 min read
Parse Tree and Syntax TreeParse Tree and Syntax tree are tree structures that represent the structure of a given input according to a formal grammar. They play an important role in understanding and verifying whether an input string aligns with the language defined by a grammar. These terms are often used interchangeably but
4 min read
Intermediate Code Generation in Compiler DesignIn the analysis-synthesis model of a compiler, the front end of a compiler translates a source program into an independent intermediate code, then the back end of the compiler uses this intermediate code to generate the target code (which can be understood by the machine). The benefits of using mach
6 min read
Issues in the design of a code generatorA code generator is a crucial part of a compiler that converts the intermediate representation of source code into machine-readable instructions. Its main task is to produce the correct and efficient code that can be executed by a computer. The design of the code generator should ensure that it is e
7 min read
Three address code in CompilerTAC is an intermediate representation of three-address code utilized by compilers to ease the process of code generation. Complex expressions are, therefore, decomposed into simple steps comprising, at most, three addresses: two operands and one result using this code. The results from TAC are alway
6 min read
Data flow analysis in CompilerData flow is analysis that determines the information regarding the definition and use of data in program. With the help of this analysis, optimization can be done. In general, its process in which values are computed using data flow analysis. The data flow property represents information that can b
6 min read
Code Optimization & Runtime Environments
Practice Questions