Regular Expressions, Regular Grammar and Regular Languages
Last Updated :
23 Jul, 2025
To work with formal languages and string patterns, it is essential to understand regular expressions, regular grammar, and regular languages. These concepts form the foundation of automata theory, compiler design, and text processing.
Regular Expressions
Regular expressions are symbolic notations used to define search patterns in strings. They describe regular languages and are commonly used in tasks such as validation, searching, and parsing.
A regular expression represents a regular language if it follows these rules:
- ϵ (epsilon) is a regular expression representing the language {ϵ} (the empty string).
- Any symbol 'a' from the input alphabet Σ is a regular expression, representing the language {a}.
- Union (a + b) is a regular expression if a and b are regular expressions, representing the language L(a) ∪ L(b). For single symbols a and b this is {a, b}.
- Concatenation (ab) is a regular expression if a and b are regular expressions, representing every string formed by a string from L(a) followed by a string from L(b).
- Kleene star (a*) is a regular expression if a is a regular expression, meaning zero or more occurrences of 'a'.
| Description | Regular Expression | Regular Language |
|---|---|---|
| Set of vowels | `(a \| e \| i \| o \| u)` | {a, e, i, o, u} |
| 'a' followed by 0 or more 'b' | ab* | {a, ab, abb, abbb, abbbb, ...} |
| Any number of vowels followed by any number of consonants | [aeiou]*[bcdfghjklmnpqrstvwxyz]* | {ε, a, aou, aiou, b, abcd, ...} (ε represents the empty string) |
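The patterns in the table can be tried out with Python's `re` module. This is a minimal sketch, not part of the original examples: the helper name `in_language` is an assumption made for illustration, and `re.fullmatch` is used so that the whole string, not just a prefix, must belong to the language. Note that Python writes union as `|` rather than `+`.

```python
import re

def in_language(pattern: str, s: str) -> bool:
    """Return True if the entire string s matches the pattern."""
    return re.fullmatch(pattern, s) is not None

# 'a' followed by zero or more 'b'
print(in_language(r"ab*", "abbb"))  # True
print(in_language(r"ab*", "ba"))    # False

# Any number of vowels followed by any number of consonants
vowels_then_consonants = r"[aeiou]*[bcdfghjklmnpqrstvwxyz]*"
print(in_language(vowels_then_consonants, ""))      # True (empty string)
print(in_language(vowels_then_consonants, "abcd"))  # True
print(in_language(vowels_then_consonants, "ba"))    # False (vowel after consonant)
```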
Regular Grammar
A regular grammar is a formal grammar that generates regular languages. It consists of:
- Terminals: Symbols that form strings (e.g., a, b).
- Non-terminals: Variables used to derive strings (e.g., S, A).
- Production Rules: Rules for transforming non-terminals into terminals or other non-terminals.
- Start Symbol: The non-terminal from which derivations begin.
Types of Regular Grammar
- Right-linear Grammar: All production rules are of the form:
A -> aB or A -> a, where A and B are non-terminals and a is a terminal (a rule A -> ε is commonly allowed as well).
Example: S -> aS | bS | ε
- Left-linear Grammar: All production rules are of the form:
A -> Ba or A -> a.
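To see how a right-linear grammar derives strings, the sketch below enumerates every string of bounded length generated by S -> aS | bS | ε. The dictionary encoding of the grammar and the function name `generate` are assumptions made for this illustration. The sketch also assumes that every rule that continues with a non-terminal emits at least one terminal, which guarantees termination.

```python
from collections import deque

# Hypothetical encoding: each non-terminal maps to a list of
# (terminals, next_non_terminal_or_None) pairs.
GRAMMAR = {
    "S": [("a", "S"), ("b", "S"), ("", None)],  # S -> aS | bS | ε
}

def generate(grammar, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start."""
    results = set()
    # Each queue entry: (terminals produced so far, pending non-terminal)
    queue = deque([("", start)])
    while queue:
        prefix, nonterminal = queue.popleft()
        for terminals, nxt in grammar[nonterminal]:
            word = prefix + terminals
            if len(word) > max_len:
                continue  # prune derivations that are already too long
            if nxt is None:
                results.add(word)          # derivation finished
            else:
                queue.append((word, nxt))  # keep deriving
    return sorted(results, key=lambda w: (len(w), w))

print(generate(GRAMMAR, "S", 2))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

Every string over {a, b} of length at most 2 appears in the output, which is consistent with this grammar generating {a, b}*.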
Regular Languages
Regular languages are the class of languages that can be represented using finite automata, regular expressions, or regular grammar. Membership in a regular language can be decided in time linear in the length of the input by running a deterministic finite automaton over it.
Properties of Regular Languages
1. Closure Properties
Regular languages are closed under operations such as union, intersection, concatenation, Kleene star, and complement.
- Union: If L1 and L2 are two regular languages, their union L1 ∪ L2 will also be regular. For example, if L1 = {aⁿ | n ≥ 0} and L2 = {bⁿ | n ≥ 0}, then L3 = L1 ∪ L2 = {aⁿ | n ≥ 0} ∪ {bⁿ | n ≥ 0} is also regular.
- Intersection: If L1 and L2 are two regular languages, their intersection L1 ∩ L2 will also be regular. For example, if L1 = {aᵐbⁿ | n ≥ 0 and m ≥ 0} and L2 = {aᵐbⁿ | n ≥ 0 and m ≥ 0} ∪ {bⁿaᵐ | n ≥ 0 and m ≥ 0}, then L3 = L1 ∩ L2 = {aᵐbⁿ | n ≥ 0 and m ≥ 0} is also regular.
- Concatenation: If L1 and L2 are two regular languages, their concatenation L1.L2 will also be regular. For example, if L1 = {aⁿ | n ≥ 0} and L2 = {bⁿ | n ≥ 0}, then L3 = L1.L2 = {aᵐbⁿ | m ≥ 0 and n ≥ 0} is also regular.
- Kleene Closure: If L1 is a regular language, its Kleene closure L1* will also be regular. For example, if L1 = (a ∪ b), then L1* = (a ∪ b)*.
- Complement: If L(G) is a regular language, its complement L'(G) will also be regular. The complement of a language is obtained by removing the strings in L(G) from the set of all possible strings over the alphabet. For example, over the alphabet {a}, if L(G) = {aⁿ | n > 3}, then L'(G) = {aⁿ | n ≤ 3}.
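Closure under complement and intersection can be made concrete with DFAs. The sketch below is an illustration under stated assumptions: the dictionary representation of a DFA, the function names, and the two example automata (`EVEN_A`, accepting strings with an even number of a's, and `ENDS_B`, accepting strings ending in b) are all hypothetical and not taken from the examples above. Complement swaps accepting and non-accepting states of a complete DFA, and intersection uses the standard product construction.

```python
from itertools import product

def accepts(dfa, word):
    """Run a complete DFA on word and report whether it ends in an accepting state."""
    state = dfa["start"]
    for ch in word:
        state = dfa["delta"][(state, ch)]
    return state in dfa["accept"]

def complement(dfa):
    """Swap accepting and non-accepting states (requires a complete DFA)."""
    return {**dfa, "accept": dfa["states"] - dfa["accept"]}

def intersection(d1, d2):
    """Product construction: run both DFAs in lockstep on the same input."""
    states = set(product(d1["states"], d2["states"]))
    delta = {
        ((p, q), ch): (d1["delta"][(p, ch)], d2["delta"][(q, ch)])
        for (p, q) in states
        for ch in d1["alphabet"]
    }
    accept = {(p, q) for (p, q) in states
              if p in d1["accept"] and q in d2["accept"]}
    return {"states": states, "alphabet": d1["alphabet"], "delta": delta,
            "start": (d1["start"], d2["start"]), "accept": accept}

# Hypothetical example DFAs over the alphabet {a, b}.
EVEN_A = {
    "states": {0, 1}, "alphabet": {"a", "b"}, "start": 0, "accept": {0},
    "delta": {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1},
}
ENDS_B = {
    "states": {"x", "y"}, "alphabet": {"a", "b"}, "start": "x", "accept": {"y"},
    "delta": {("x", "a"): "x", ("x", "b"): "y", ("y", "a"): "x", ("y", "b"): "y"},
}

both = intersection(EVEN_A, ENDS_B)
print(accepts(both, "aab"))               # True: two a's and ends in b
print(accepts(both, "ab"))                # False: odd number of a's
print(accepts(complement(EVEN_A), "ab"))  # True: odd number of a's
```

Replacing `and` with `or` in the accepting-state condition of the product gives a DFA for the union instead.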
2. Finite Automata:
Every regular language can be recognized by a finite automaton (DFA or NFA).
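As an illustration, the language of the expression ab* can be recognized by a two-state NFA. The sketch below simulates that NFA by tracking the set of states it could be in after each symbol, which is the same idea that underlies converting an NFA to a DFA. The state names `q0` and `q1` and the function name `nfa_accepts` are assumptions made for this example.

```python
# Hypothetical NFA for ab*: q0 --a--> q1, q1 --b--> q1; q1 is accepting.
# Missing entries mean "no transition" (the empty set of next states).
NFA_DELTA = {
    ("q0", "a"): {"q1"},
    ("q1", "b"): {"q1"},
}
START = "q0"
ACCEPTING = {"q1"}

def nfa_accepts(word: str) -> bool:
    """Track every state the NFA could be in after reading each symbol."""
    current = {START}
    for ch in word:
        current = {
            nxt
            for state in current
            for nxt in NFA_DELTA.get((state, ch), set())
        }
    # Accept if at least one reachable state is accepting.
    return bool(current & ACCEPTING)

for w in ["a", "abbb", "", "ba", "aab"]:
    print(repr(w), nfa_accepts(w))
# 'a' True
# 'abbb' True
# '' False
# 'ba' False
# 'aab' False
```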
3. Decision Problems:
Problems such as membership, emptiness, and equivalence are decidable for regular languages.
Note: Two regular expressions are equivalent if the languages they generate are the same. For example, (a+b*)* and (a+b)* generate the same language: every string generated by (a+b*)* is also generated by (a+b)* and vice versa.
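A quick sanity check of such an equivalence is to compare two expressions on every string up to some length. The sketch below does this with Python's `re` module; the function name `agree_up_to` and the length bound are assumptions for illustration. Agreement on short strings is evidence rather than a proof. An exact decision procedure would compare the corresponding automata instead.

```python
import re
from itertools import product

def agree_up_to(p1: str, p2: str, alphabet: str, max_len: int) -> bool:
    """Check that p1 and p2 accept exactly the same strings of length <= max_len."""
    r1, r2 = re.compile(p1), re.compile(p2)
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if (r1.fullmatch(w) is None) != (r2.fullmatch(w) is None):
                return False  # w is in one language but not the other
    return True

# (a+b*)* versus (a+b)*, written with Python's | for union.
print(agree_up_to(r"(?:a|b*)*", r"(?:a|b)*", "ab", 6))  # True

# A pair that is not equivalent: ab* rejects the empty string, (ab)* accepts it.
print(agree_up_to(r"ab*", r"(?:ab)*", "ab", 6))         # False
```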
Comparison of Regular Expressions, Grammar, and Languages
| Aspect | Regular Expressions | Regular Grammar | Regular Languages |
|---|---|---|---|
| Definition | Pattern representation of strings | Rule-based generation of strings | Language class described by regex and grammar |
| Representation | Symbols and operators | Terminals, non-terminals, production rules | Finite automata, regex, or grammar |
| Use Case | Text processing, validation | Syntax generation for compilers | Language recognition |
How to solve problems on regular expression and regular languages?
Question 1
Which one of the following languages over the alphabet {0,1} is described by the regular expression below?
(0+1)*0(0+1)*0(0+1)*
(A) The set of all strings containing the substring 00.
(B) The set of all strings containing at most two 0s.
(C) The set of all strings containing at least two 0s.
(D) The set of all strings that begin and end with either 0 or 1.
Solution:
- Option A: This requires every string to contain the substring 00. However, the string 10101 is part of the language but does not contain 00. So, Option A is incorrect.
- Option B: This states that the language can have a maximum of two 0s. But the string 00000 is part of the language, which violates this condition. So, Option B is incorrect.
- Option C: This states that the language must contain at least two 0s. The regular expression has two explicit 0s with arbitrary strings before, between, and after them, so it generates exactly the strings with at least two 0s. Hence, Option C is correct.
- Option D: Every non-empty string over {0,1} begins and ends with either 0 or 1, so this set includes strings such as 1 and 01, which have fewer than two 0s and are not in the language. So, Option D is incorrect.
Correct Answer: (C)
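The reasoning for this question can be checked by brute force. The sketch below assumes the expression in the question is (0+1)*0(0+1)*0(0+1)* and compares it against the condition "at least two 0s" on every binary string up to length 10. The names `in_language` and `binary_strings` are hypothetical helpers introduced for this check.

```python
import re
from itertools import product

# Python writes union as | instead of +.
PATTERN = re.compile(r"(?:0|1)*0(?:0|1)*0(?:0|1)*")

def in_language(w: str) -> bool:
    """True if the whole string w is generated by the expression."""
    return PATTERN.fullmatch(w) is not None

def binary_strings(max_len: int):
    """Yield every string over {0, 1} of length 0..max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

# Option C: membership should coincide with "contains at least two 0s".
mismatches = [w for w in binary_strings(10)
              if in_language(w) != (w.count("0") >= 2)]
print(mismatches)            # []

# Counterexamples to the other options.
print(in_language("10101"))  # True, yet no substring 00 (rules out A)
print(in_language("00000"))  # True, yet more than two 0s (rules out B)
print(in_language("1"))      # False, yet it begins and ends with 1 (rules out D)
```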
Question 2
Which of the following languages is generated by the given grammar?
S -> aS | bS | ε
(A) {aⁿbᵐ | n,m ≥ 0}
(B) {w ∈ {a,b}* | w has an equal number of a's and b's}
(C) {aⁿ | n ≥ 0} ∪ {bⁿ | n ≥ 0} ∪ {aⁿbⁿ | n ≥ 0}
(D) {a,b}*
Solution:
- Option A: This describes strings with n a's followed by m b's. However, the grammar can produce strings like ba (S -> bS -> baS -> ba), which violates the pattern. So, Option A is incorrect.
- Option B: This states that strings have an equal number of a's and b's. But the string b (S -> bS -> b) does not satisfy this condition. So, Option B is incorrect.
- Option C: This describes strings with only a's, only b's, or n a's followed by n b's. However, the string ba does not fit this pattern. So, Option C is incorrect.
- Option D: This includes all strings with any number of a's and b's in any order. The grammar can generate all such strings, including ba. Hence, Option D is correct.
Correct Answer: (D)
Question 3
The regular expression 0*(10*)* denotes the same set as:
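(A) (1*0)*1*
(B) 0 + (0 + 10)*
(C) (0 + 1)*10(0 + 1)*
(D) None of these
Solution:
Two regular expressions are equivalent if the languages they generate are the same. The expression 0*(10*)* generates every string over {0,1}: a run of leading 0s followed by any number of blocks, each consisting of a 1 followed by 0s.
- Option A: (1*0)*1* also generates every string over {0,1}, since each 0 can be preceded by a run of 1s and the string can end with a run of 1s. The two expressions are therefore equivalent. So, Option A is correct.
- Option B: Every 1 generated by this option is immediately followed by a 0, so the string 1 cannot be generated, but 0*(10*)* can generate it. So, Option B is incorrect.
- Option C: This ensures that 10 is a substring, but 0*(10*)* also generates strings without 10 as a substring, such as the null string. So, Option C is incorrect.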
Correct Answer: (A)
Question 4
The regular expression for the language over the input alphabet {a, b} in which two a's never appear consecutively is:
(A) (b + ab)* + (b + ab)*a
(B) a(b + ba)* + (b + ba)*
(C) Both (A) and (B)
(D) None of the above
Solution:
The language can be expressed as:
L = {ε, a, b, bb, ab, aba, ba, bab, baba, abab, ...}
- Option A: Uses ab as the building block for strings where two a's are not adjacent. (b + ab)* covers the empty string and strings ending with b, while (b + ab)*a covers strings ending with a.
- Option B: Uses ba as the building block. a(b + ba)* covers strings starting with a, while (b + ba)* covers the empty string and strings starting with b.
Both expressions correctly describe the given language.
Correct Answer: (C)
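The claim that both options describe the same language can be checked exhaustively on short strings. The sketch below assumes the options read (b + ab)* + (b + ab)*a and a(b + ba)* + (b + ba)*, and compares each against the condition "does not contain aa" for every string over {a, b} up to length 8. The names `OPTION_A`, `OPTION_B`, `words`, and `disagreements` are hypothetical helpers for this check.

```python
import re
from itertools import product

# Python writes union as | instead of +.
OPTION_A = re.compile(r"(?:b|ab)*|(?:b|ab)*a")
OPTION_B = re.compile(r"a(?:b|ba)*|(?:b|ba)*")

def words(alphabet: str, max_len: int):
    """Yield every string over the alphabet of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

def disagreements(pattern, max_len: int = 8):
    """Strings on which the pattern differs from the 'no aa' condition."""
    return [w for w in words("ab", max_len)
            if (pattern.fullmatch(w) is not None) != ("aa" not in w)]

print(disagreements(OPTION_A))  # []
print(disagreements(OPTION_B))  # []
```

An empty list for both options means that, up to the tested length, each expression accepts exactly the strings in which two a's never appear together.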