CODING PART
Basics:
LEX = LEXical analyzer generator
It is a tool used to create lexical analyzers (scanners)
Basically, LEX generates programs that read input and break it into tokens, which
are meaningful sequences of characters (like keywords, numbers, or identifiers)
Alternatives to LEX – FLEX (Fast LEX) and JLEX (Java LEX)
Structure of a LEX specification (specification ≈ program):
There are three sections: Declarations, Translation rules and Auxiliary functions
%{
Declarations
%}
%%
Translation rules
%%
Auxiliary functions
Here, %% acts as a separator between two consecutive sections
Each translation rule has the form: pattern { action }
Some example programs
1. Pattern matching:
%{
#include <stdio.h>
%}
%%
[0-9]+ { printf("NUMBER: %s\n", yytext); }
[a-zA-Z]+ { printf("WORD: %s\n", yytext); }
[ \t\n]+ { /* ignore whitespace */ }
. { printf("UNKNOWN: %s\n", yytext); }
%%
int main() {
    yylex(); // Call the lexer
    return 0;
}
Here,
a. [0-9]+ is a regular expression that matches a sequence of one or more digits
(0 to 9) in the input, which the action reports as a number
b. [a-zA-Z]+ matches a sequence of one or more lowercase or uppercase letters
c. [ \t\n]+ matches a sequence of spaces, tabs and newline characters, which is
skipped
d. ‘.’ matches any single character other than a newline; since it is the last rule,
it catches whatever the earlier rules did not match and reports it as UNKNOWN
yytext is a built-in variable that holds the text matched by the current rule
For instance, in “April 1st 2025”, scanning from left to right:
“April” is matched by the second rule ⟶ “April” is stored in yytext temporarily
and printed as a WORD
After that, the space is ignored
Then “1” is matched by the first rule ⟶ “1” replaces “April” in yytext and is
printed as a NUMBER
Then “st” is matched by the second rule, and so on until the input ends
yylex() is the function that runs the lexical analysis over the input from left to
right; it must be called from the main function
2. Counting the number of words and numbers:
%{
#include <stdio.h>
int words = 0, numbers = 0;
%}
%%
[0-9]+ { numbers++; }
[a-zA-Z]+ { words++; }
[ \t\n]+ { /* do nothing & skip spaces */ }
%%
int main() {
    yylex();
    printf("Total Words: %d\n", words);
    printf("Total Numbers: %d\n", numbers);
    return 0;
}
We initialise the counters words and numbers to 0 in the declarations section
When the lexer scans from left to right and matches a sequence of digits (0 to
9), it is counted as a number and the variable ‘numbers’ is incremented
Similarly, when a sequence of letters is matched, it is counted as a word and the
variable ‘words’ is incremented
Here, we don’t need yytext, since the actions only count the matches
There is no ‘.’ rule here, so any other character (like punctuation) is not counted;
LEX’s default action copies it to the output unchanged
yylex() reads the input and, for every match, runs the action given in the
corresponding rule; the totals are printed once it returns at the end of the input
3. Breaking down the components of C code:
%{
#include <stdio.h>
%}
%%
"if" { printf("IF keyword\n"); }
"else" { printf("ELSE keyword\n"); }
"while" { printf("WHILE keyword\n"); }
"return" { printf("RETURN keyword\n"); }
[a-zA-Z_][a-zA-Z0-9_]* { printf("ID: %s\n", yytext); }
[0-9]+ { printf("NUMBER: %s\n", yytext); }
.|\n { /* ignore other characters, including newlines */ }
%%
int main() {
    yylex();
    return 0;
}
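Patterns within "" are literal strings: they match only if the input has exactly the
same characters, in the same order and in the same case (so “If” is not reported
as the IF keyword)
An input like “if” also matches the identifier rule; when two rules match the same
length of text, LEX picks the rule listed first, which is why the keyword rules come
before the identifier rule
LEX always prefers the longest match, so “iffy” is reported as an ID and not as the
IF keyword followed by “fy”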
Basic pattern matches:
Pattern               Matches
a                     Only the character ‘a’
a|b                   Either the character ‘a’ or ‘b’
.                     Any single character except a newline (as the last rule, it acts like ‘else’)
\n                    Newline character
\t                    Tab character
\r                    Carriage return
\\                    A single backslash ⟶ \
\"                    A single double quote ⟶ "
[abc]                 Any one of a, b, or c
[^abc]                Any character except a, b, c
[a-z]                 Any lowercase letter
[A-Z]                 Any uppercase letter
[0-9]                 Any digit
[a-zA-Z]              Any letter
[a-zA-Z0-9_]          Any letter, digit, or underscore
a*                    Zero or more a’s
a+                    One or more a’s
a?                    Zero or one a
a{3}                  Exactly three a’s
a{2,4}                Between 2 and 4 a’s
ab                    a followed by b
"something"           The exact text ‘something’
^                     Start of line (only as the first character of a pattern; inside brackets it negates the class)
$                     End of line (only as the last character of a pattern)
\                     Escapes the next character
[aA][a-zA-Z0-9_]*     Words starting with a or A
4. Counting the length of a string:
%{
#include <stdio.h>
#include <string.h>
%}
%%
[a-zA-Z0-9]+ {
    /* strlen returns size_t, so the matching format specifier is %zu */
    printf("Length of input: %zu\n", strlen(yytext));
}
.|\n { /* ignore everything else */ }
%%
int main() {
    yylex();
    return 0;
}
5. Counting the number of vowels and consonants:
%{
#include <stdio.h>
#include <ctype.h>
int v_count = 0;
int c_count = 0;
%}
%%
[aAeEiIoOuU] { v_count++; }
[b-df-hj-np-tv-zB-DF-HJ-NP-TV-Z] { c_count++; }
.|\n { /* ignore everything else */ }
%%
int main() {
    yylex(); // Start scanning input
    printf("Vowels: %d\n", v_count);
    printf("Consonants: %d\n", c_count);
    return 0;
}