Minor Project Report
PROJECT REPORT
ON
LEXICAL ANALYZER
Submitted to:
University of Rajasthan
INDEX
1. Project description
1.1 Objective of the project
2. Project contents
2.6 Feasibility
2.10 Testing
2.10.1 Testing methodology
2.10.2 Testing strategy
2.13 Evaluation
2.14 Code
4. Conclusion
5. References
ACKNOWLEDGEMENT
This is to certify that Ms. Ankita Verma, Ms. Harshi Yadav and Ms. Sameeksha
Chauhan have successfully completed the project entitled
“LEXICAL ANALYZER”
under the able guidance of Ms. Deepti Arora towards the partial
fulfillment of the Bachelor’s degree course in Information Technology.
• Token: a class of lexical units, such as a keyword, identifier, constant or
operator, that the lexical analyzer returns to the parser.
• Lexeme: the actual sequence of characters in the source program that
matches the pattern for a token.
KEYWORDS: int, char, float, double, if, for, while, else, switch, struct,
printf, scanf, case, break, return, typedef, void
OPERATORS: +, ++, -, --, ||, *, ?, /, >, >=, <, <=, =, ==, &, &&.
BRACKETS: [ ], { }, ( ).
The output file will consist of all the tokens present in our input file along
with their respective token values.
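For illustration, the token values written to the output file could be
represented by an enumeration such as the one below; the names and numbering
are assumptions made for this sketch, not the values actually used by the
project code.

/* Hypothetical token codes for the output file (illustrative sketch only). */
enum token_value {
    TOK_KEYWORD = 1,   /* int, char, if, for, ...      */
    TOK_IDENTIFIER,    /* user-defined names            */
    TOK_CONSTANT,      /* numeric literals              */
    TOK_OPERATOR,      /* +, ++, -, --, *, /, ...       */
    TOK_RELATIONAL,    /* <, <=, >, >=, ==              */
    TOK_ASSIGNMENT,    /* =                             */
    TOK_BRACKET        /* [ ], { }, ( )                 */
};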
SYSTEM DESIGN:
Process:
The lexical analyzer is the first phase of a compiler. Its main task is to
read the input characters and produce as output a sequence of tokens
that the parser uses for syntax analysis. This interaction is summarized
schematically in fig. a.
Upon receiving a “get next token” command from the parser, the lexical
analyzer reads the input characters until it can identify the next token.
The scanner is responsible for doing simple tasks, while the lexical
analyzer proper does the more complex operations.
The lexical analyzer which we have designed takes its input from an
input file. It reads one character at a time from the input file and
continues to read until the end of the file is reached. It recognizes the
valid identifiers and keywords and specifies the token values of the keywords.
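A minimal sketch of this character-by-character scanning loop is given below.
The file name "input.c" and the helper is_keyword are illustrative
assumptions, not the actual project routines.

/* Minimal sketch of the scanning loop (illustrative; not the project code). */
#include <stdio.h>
#include <ctype.h>
#include <string.h>

static const char *keywords[] = { "int", "char", "float", "if", "for", "while", "else" };

static int is_keyword(const char *w)
{
    size_t k;
    for (k = 0; k < sizeof(keywords) / sizeof(keywords[0]); k++)
        if (strcmp(w, keywords[k]) == 0)
            return 1;
    return 0;
}

int main(void)
{
    FILE *fp = fopen("input.c", "r");     /* assumed input file name          */
    char word[64];
    int c, n;

    if (fp == NULL)
        return 1;
    while ((c = fgetc(fp)) != EOF) {      /* read one character at a time     */
        if (isalpha(c) || c == '_') {     /* start of an identifier/keyword   */
            n = 0;
            do {
                if (n < 63) word[n++] = (char)c;
                c = fgetc(fp);
            } while (c != EOF && (isalnum(c) || c == '_'));
            word[n] = '\0';
            printf("%s : %s\n", word, is_keyword(word) ? "Keyword" : "Identifier");
        }
    }
    fclose(fp);
    return 0;
}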
GOALS
Overview
Phases
Initiation Phase
Planning Phase
Further define and refine the functional and data requirements and
document them in the Requirements Document.
Develop detailed data and process models (system inputs, outputs and
processes).
Design Phase
Development Phase
Implementation Phase
This phase is initiated after the system has been tested and accepted by
the user. In this phase, the system is installed to support the intended
business functions. System performance is compared to performance
objectives established during the planning phase. Implementation
includes user notification, user training, installation of hardware,
installation of software onto production computers, and integration of
the system into daily work processes.
Disposition Phase
Disposition activities ensure the orderly termination of the system and
preserve the vital information about the system so that some or all of the
information may be reactivated in the future if necessary. Particular
emphasis is given to proper preservation of the data processed by the
system, so that the data can be effectively migrated to another system or
archived for potential future access in accordance with applicable
records management regulations and policies. Each system should have
an interface control document defining inputs and outputs and data
exchange. Signatures should be required to verify that all dependent
users and impacted systems are aware of disposition.
Summary
Purpose
COMPILER
To define what a compiler is one must first define what a translator is. A
translator is a program that takes another program written in one
language, also known as the source language, and outputs a program
written in another language, known as the target language.
1.) Lexical Analysis
2.) Syntax Analysis
3.) Semantic Analysis
4.) Code Optimization
5.) Code Generation
Semantic Analysis is the act of determining whether or not the parse tree
is relevant and meaningful. The output is intermediate code, also known
as an intermediate representation (or IR). Most of the time, this IR is
closely related to assembly language but it is machine independent.
Intermediate code allows different code generators for different
machines and promotes abstraction and portability away from specific machine
types and languages. (I dare say the most famous example is Java's bytecode
and the JVM.) Semantic Analysis finds more meaningful errors such as
undeclared variables, type incompatibilities, and scope-resolution problems.
Code Generation is the final step in the compilation process. The input to
the Code Generator is the IR and the output is machine language code.
PLATFORM (TECHNOLOGY/TOOLS)
Characteristics
Like most imperative languages in the ALGOL tradition, C has facilities for
structured programming and allows lexical variable scope and recursion,
while a static type system prevents many unintended operations. In C, all
executable code is contained within functions. Function parameters are
always passed by value. Pass-by-reference is achieved in C by explicitly
passing pointer values. Heterogeneous aggregate data types (struct) allow
related data elements to be combined and manipulated as a unit. C
program source text is free-format, using the semicolon as a statement
terminator (not a delimiter).
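The following short snippet is an illustrative sketch of pass-by-value,
passing a pointer to achieve pass-by-reference, and handling a struct as a
unit; it is not taken from the project code.

/* Illustrative sketch: pass-by-value vs. passing a pointer (not project code). */
#include <stdio.h>

struct pair { int a; int b; };

static void set_to_ten(int x)      { x = 10; }   /* modifies only a copy       */
static void set_to_ten_ptr(int *x) { *x = 10; }  /* modifies the caller's int  */

int main(void)
{
    int n = 1;
    struct pair p = { 1, 2 };                    /* aggregate handled as a unit */
    set_to_ten(n);                               /* n is still 1                */
    set_to_ten_ptr(&n);                          /* n is now 10                 */
    printf("%d %d %d\n", n, p.a, p.b);
    return 0;
}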
C does not have some features that are available in some other
programming languages, such as object orientation, exception handling, or
automatic garbage collection.
Operators
• assignment (=, +=, -=, *=, /=, %=, &=, |=, ^=, <<=, >>=)
• increment and decrement (++, --)
• arithmetic (+, -, *, /, %)
• equality testing (==, !=)
• conditional evaluation (? :)
• type conversion (( ))
• sequencing (,)
• subexpression grouping (( ))
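A small illustrative snippet (not taken from the project) showing a few of
these operators in use:

/* Illustrative use of a few C operators (not part of the project code). */
#include <stdio.h>

int main(void)
{
    int a = 5, b = 2;
    a += 3;                               /* compound assignment: a is now 8   */
    b++;                                  /* increment: b is now 3             */
    int max = (a > b) ? a : b;            /* conditional evaluation            */
    double ratio = (double)a / b;         /* type conversion (cast)            */
    printf("max=%d ratio=%.2f equal=%d\n", max, ratio, a == b);
    return 0;
}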
Data structures
C has a static, weakly typed type system that shares some similarities with
that of other ALGOL descendants such as Pascal. There are built-in types
for integers of various sizes, both signed and unsigned, floating-point
numbers, characters, and enumerated types (enum). C99 added a Boolean
datatype. There are also derived types including arrays, pointers, records
(struct), and untagged unions (union).
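A brief sketch of these built-in and derived types follows; the declarations
are illustrative only and not part of the project code.

/* Illustrative declarations of C's built-in and derived types (not project code). */
#include <stdio.h>

enum colour { RED, GREEN, BLUE };         /* enumerated type                  */

struct point { int x; int y; };           /* record (struct)                  */

union value {                             /* untagged union: alternative      */
    int   i;                              /* representations sharing storage  */
    float f;
};

int main(void)
{
    unsigned long big = 100000UL;         /* integer of a larger size         */
    double pi = 3.14159;                  /* floating-point number            */
    char grades[3] = { 'A', 'B', 'C' };   /* array of characters              */
    struct point p = { 2, 3 };
    struct point *pp = &p;                /* pointer to a struct              */
    union value v;
    v.i = 42;
    printf("%lu %f %c %d %d %d\n", big, pi, grades[0], pp->x, v.i, (int)GREEN);
    return 0;
}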
Arrays
Deficiencies
It is intentional that C does not fix the size or endianness of its
types; for example, each compiler is free to choose the size of an int
as anything of at least 16 bits, according to what is efficient on the
current platform. Many programmers rely on assumptions about size and
endianness, leading to code that is not portable.
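One common way to avoid such assumptions, shown here purely as an
illustrative sketch, is to check type sizes explicitly or to use the
fixed-width types added in C99.

/* Illustrative sketch: avoiding size assumptions with C99 fixed-width types. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int32_t counter = 0;                  /* exactly 32 bits on any platform   */
    counter += 1;
    printf("sizeof(int) on this platform: %zu bytes\n", sizeof(int));
    printf("sizeof(int32_t): %zu bytes\n", sizeof(int32_t));
    printf("counter = %ld\n", (long)counter);
    return 0;
}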
Also intentional is that the C standard defines only a very limited set of
functionality, excluding anything related to network communications,
user interaction, or process/thread creation. The broader
POSIX standard includes such a wide array of functionality that no
operating system appears to support it exactly, and only UNIX systems
have even attempted to support substantial parts of it.
Windows XP
• Fast user switching, which allows a user to save the current state
and open applications of their desktop and allow another user to log
on without losing that information
Starting with version 3.0, Borland segmented their C++ compiler into two
distinct product-lines: "Turbo C++" and "Borland C++". Turbo C++ was
marketed toward the hobbyist and entry-level compiler market, while
Borland C++ targeted the professional application development market.
Borland C++ included additional tools, compiler code-optimization, and
documentation to address the needs of commercial developers. Turbo C++
3.0 could be upgraded with separate add-ons, such as Turbo Assembler
and Turbo Vision 1.0.
HARDWARE REQUIREMENT
RAM : 256 MB
Hard Disk : 40 GB
FDD : 4 GB
Monitor : LG
SOFTWARE REQUIREMENT
Languages :C
FEASIBILITY STUDY
• Resource feasibility
• Schedule feasibility
• Economic feasibility
• Operational feasibility
• Technical feasibility
The lexical-analyzer generator takes as its specification a list of patterns,
each with an associated action to be performed when that pattern matches:
p1 {action 1}
p2 {action 2}
............
pn {action n}
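In C, such a specification could be represented, purely as an illustrative
sketch and not as the generator's actual data structure, by a table of
pattern strings and action functions; all names below are assumptions.

/* Illustrative sketch of a pattern/action table (assumed names, not generator code). */
#include <stdio.h>

typedef void (*action_fn)(const char *lexeme);

static void on_keyword(const char *lexeme)    { printf("%s : Keyword\n", lexeme); }
static void on_identifier(const char *lexeme) { printf("%s : Identifier\n", lexeme); }
static void on_constant(const char *lexeme)   { printf("%s : Constant\n", lexeme); }

struct rule {
    const char *pattern;       /* regular expression for pattern pi           */
    action_fn   action;        /* action i, run when the pattern matches      */
};

static const struct rule rules[] = {
    { "if|for|while|else",      on_keyword    },
    { "[A-Za-z_][A-Za-z0-9_]*", on_identifier },
    { "[0-9]+",                 on_constant   },
};

int main(void)
{
    /* A real generator would compile these patterns into an NFA or DFA;
       here we simply invoke one action to show the pattern/action pairing. */
    rules[0].action("for");
    return 0;
}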
Using NFA
The transition table for the NFA N is constructed for the
composite pattern p1|p2|. . .|pn, The NFA recognizes the longest prefix of
the input that is matched by a pattern. In the final NFA, there is an
accepting state for each pattern pi. The sequence of sets of states the
combined NFA can be in after seeing each input character is constructed.
The NFA is simulated until it reaches termination, that is, until it reaches
a set of states from which there is no transition defined for the current
input symbol. The specification for the lexical-analyzer generator is written
so that a valid source program cannot entirely fill the input buffer without
the NFA reaching termination. To find the correct match two things are done. Firstly,
whenever an accepting state is added to the current set of states, the
current input position and the pattern pi is recorded corresponding to this
accepting state. If the current set of states already contains an accepting
state, then only the pattern that appears first in the specification is
recorded. Secondly, the transitions are recorded until termination is
reached. Upon termination, the forward pointer is retracted to the
position at which the last match occurred. The pattern making this match
identifies the token found, and the lexeme matched is the string between
the lexeme beginning and forward pointers. If no pattern matches, the
lexical analyzer should transfer control to some default recovery routine.
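The bookkeeping described above, recording the last accepting position and
pattern and then retracting the forward pointer, might look roughly like the
simplified, self-contained sketch below; the patterns are literal strings
here rather than regular expressions, and all names are assumptions.

/* Illustrative, self-contained sketch of longest-prefix-match bookkeeping. */
#include <stdio.h>
#include <string.h>

static const char *patterns[] = { "<", "<=", "=", "==" };   /* p1 .. p4 */
#define NPATTERNS (sizeof(patterns) / sizeof(patterns[0]))

int main(void)
{
    const char *input = "<=10;";
    int forward;                    /* forward pointer scanning ahead           */
    int last_accept_pos = -1;       /* input position of the last match seen    */
    int last_accept_pattern = -1;   /* index of the pattern that matched there  */
    size_t p;

    for (forward = 1; forward <= (int)strlen(input); forward++) {
        for (p = 0; p < NPATTERNS; p++) {
            /* Does pattern p accept the prefix input[0 .. forward-1]? */
            if ((int)strlen(patterns[p]) == forward &&
                strncmp(input, patterns[p], forward) == 0) {
                last_accept_pos = forward;     /* record the accepting state    */
                last_accept_pattern = (int)p;  /* first matching pattern wins   */
                break;
            }
        }
    }
    if (last_accept_pattern >= 0)
        printf("longest match: \"%.*s\" (pattern %d)\n",
               last_accept_pos, input, last_accept_pattern + 1);
    else
        printf("no pattern matches; invoke the default recovery routine\n");
    return 0;
}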
Using DFA
Alternatively, the NFA built from the composite pattern can be converted into
an equivalent DFA (by the subset construction) and the DFA simulated directly
on the input; the same record of the last accepting state and input position
is kept, so that the longest match can be recovered when the DFA terminates.
DATA FLOW DIAGRAM
A data flow diagram can also be used for the visualization of data
processing (structured design).
Context level (Level 0)
This level shows the overall context of the system and its operating
environment, and shows the whole system as just one process. It does not
usually show data stores unless they are "owned" by external systems,
e.g. accessed by but not maintained by this system; however, these are
often shown as external entities.
Level 1
This level shows all processes at the first level of numbering, data stores,
external entities and the data flows between them. The purpose of this
level is to show the major high level processes of the system and their
interrelation. A process model will have one, and only one, level 1
diagram. A level 1 diagram must be balanced with its parent context level
diagram, i.e. there must be the same external entities and the same data
flows, these can be broken down to more detail in the level 1, e.g. the
"enquiry" data flow could be split into "enquiry request" and "enquiry
results" and still be valid.
Level 2
A Level 2 Data flow diagram showing the "Process Enquiry" process for the
same system.
The first stage of information system design uses these models during the
requirements analysis to describe information needs or the type of
information that is to be stored in a database. The data modeling
technique can be used to describe any ontology (i.e. an overview and
classifications of used terms and their relationships) for a certain universe
of discourse (i.e. area of interest). In the case of the design of an
information system that is based on a database, the conceptual data
model is, at a later stage (usually called logical design), mapped to a
logical data model, such as the relational model; this in turn is mapped to
a physical model during physical design. Note that sometimes, both of
these phases are referred to as "physical design".
FLOW CHART
• Symbols
• Arrows
• Processing steps
• Input/Output
• Conditional or decision
TESTING
• Verification: Have we built the software right (i.e., does it match the
specification)? It is process based.
• Validation: Have we built the right software (i.e., is this what the
customer wants)? It is product based.
Testing methods
Software testing methods are traditionally divided into black box testing
and white box testing. These two approaches are used to describe the
point of view that a test engineer takes when designing test cases.
Black box testing treats the software as a black box without any
knowledge of internal implementation. Black box testing methods include
equivalence partitioning, boundary value analysis, all-pairs testing, fuzz
testing, model-based testing, traceability matrix, exploratory testing and
specification-based testing.
Specification-based testing
The black box tester has no "bonds" with the code, and a tester's
perception is very simple: a code MUST have bugs. Using the principle,
"Ask and you shall receive," black box testers find bugs where
programmers don't. BUT, on the other hand, black box testing is like a
walk in a dark labyrinth without a flashlight, because the tester doesn't
know how the back end was actually constructed.
That's why there are situations when:
1. A black box tester writes many test cases to check something that could
have been tested by only one test case, and/or
2. some parts of the back end are not tested at all.
White box testing, by contrast to black box testing, is when the tester has
access to the internal data structures and algorithms (and the code that
implements these).
White box testing methods can also be used to evaluate the completeness
of a test suite that was created with black box testing methods. This
allows the software team to examine parts of a system that are rarely
tested and ensures that the most important function points have been
tested.
Two common forms of code coverage are function coverage, which reports on the
functions executed, and statement coverage, which reports on the number of
lines executed to complete the test.
Unit Testing
Ideally, each test case is independent from the others; test doubles such as
stubs, mock or fake objects, as well as test harnesses, can be used to
assist in testing a module in isolation. Unit testing is typically done by
software developers to ensure that the code they have written meets software
requirements and behaves as the developer intended.
Benefits
The goal of unit testing is to isolate each part of the program and show
that the individual parts are correct. A unit test provides a strict, written
contract that the piece of code must satisfy. As a result, it affords several
benefits. Unit tests find problems early in the development cycle.
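As a hedged illustration (not part of the project), a unit test for a
keyword-recognition routine could be written with simple assertions; the
is_keyword helper below is an assumed unit under test.

/* Illustrative unit test for an assumed is_keyword() helper (not project code). */
#include <assert.h>
#include <string.h>
#include <stdio.h>

static int is_keyword(const char *w)        /* unit under test (assumed)       */
{
    static const char *kw[] = { "int", "if", "for", "while", "return" };
    size_t k;
    for (k = 0; k < sizeof(kw) / sizeof(kw[0]); k++)
        if (strcmp(w, kw[k]) == 0)
            return 1;
    return 0;
}

int main(void)
{
    assert(is_keyword("for") == 1);         /* keywords are recognised         */
    assert(is_keyword("while") == 1);
    assert(is_keyword("x1") == 0);          /* identifiers are not keywords    */
    assert(is_keyword("") == 0);            /* edge case: empty string         */
    printf("All unit tests passed.\n");
    return 0;
}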
Integration Testing
Integration testing takes as its input modules that have been unit tested,
groups them in larger aggregates, applies tests defined in an integration
test plan to those aggregates, and delivers as its output the integrated
system ready for system testing.
Purpose
Some different types of integration testing are big bang, top-down, and
bottom-up.
System Testing
Information security
Information security means protecting information and information
systems from unauthorized access, use, disclosure, disruption,
modification, or destruction.
Identification
Identification is an assertion of who someone is or what something is. If a
person makes the statement "Hello, my name is John Doe." they are
making a claim of who they are. However, their claim may or may not be
true. Before John Doe can be granted access to protected information it
will be necessary to verify that the person claiming to be John Doe really
is John Doe.
Authentication
Authentication is the act of verifying a claim of identity. When John Doe
goes into a bank to make a withdrawal, he tells the bank teller he is John
Doe (a claim of identity). The bank teller asks to see a photo ID, so he
hands the teller his driver's license. The bank teller checks the license to
make sure it has John Doe printed on it and compares the photograph on
the license against the person claiming to be John Doe. If the photo and
name match the person, then the teller has authenticated that John Doe
is who he claimed to be.
Authorization
Authorization to access information and other computing services begins
with administrative policies and procedures. The policies prescribe what
information and computing services can be accessed, by whom, and
under what conditions. The access control mechanisms are then
configured to enforce these policies.
Implementation
The final phase of the development process is the implementation of the new
system. This phase is the culmination of the previous phases and will be
performed only after each of the prior phases has been successfully
completed to the satisfaction of both the user and quality assurance. The
tasks that comprise the implementation phase include the installation of
hardware, the proper scheduling of the resources needed to put the system
into production, and a complete set of instructions that support both the
users and the IS environment.
Coding
This means that program construction with procedural specification has
finished and the coding of the program begins:
The main emphasis while coding was on style, so that the end result is
optimized, readable code.
The code has been written so that the definition and implementation of
each function are contained in one file.
Naming convention
As the project size grows, so does the complexity of recognizing the purpose
of each variable. Thus the variables were given meaningful names, which
help in understanding the context and the purpose of the variable.
The functions are also given meaningful names that can be easily
understood by the user.
Indentation
Judicious use of indentation can make the task of reading and
understanding a program much simpler. Indentation is an essential part of
a good program. If code is indented without thought, it will seriously
affect the readability of the program.
The higher-level statements, such as the definitions of variables, constants
and functions, are indented, with each nested block indented further, making
their purpose in the code clear.
A blank line is also left between function definitions to make the code
look neat.
A header comment for each source file, stating the purpose of the file, is
also provided.
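For illustration only, the kind of layout described above might look like the
following small example; the file name and function are assumptions, not part
of the project code.

/* word_count.c : counts whitespace-separated words in a line (layout illustration). */
#include <stdio.h>
#include <string.h>

/* Each nested block is indented one level further than the block containing it. */
int count_words(const char *line)
{
    int count = 0;
    int in_word = 0;
    size_t i;

    for (i = 0; i < strlen(line); i++) {
        if (line[i] != ' ') {
            if (!in_word) {
                in_word = 1;          /* a new word starts here */
                count++;
            }
        } else {
            in_word = 0;
        }
    }
    return count;
}

int main(void)
{
    printf("%d\n", count_words("int x = 0"));   /* prints 4 */
    return 0;
}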
Maintenance
Maintenance testing is the testing performed either to identify equipment
problems, diagnose equipment problems, or to confirm that repair measures
have been effective. It can be performed at the system level (e.g., the
HVAC system), the equipment level (e.g., the blower in an HVAC line), or
the component level (e.g., a control chip in the control box for the
blower in the HVAC line).
Preventive maintenance
To make it simple: preventive maintenance is carried out to keep equipment
working and/or to extend its life before any failure occurs.
Corrective maintenance
The idle time for production machines in a factory is mainly due to the
following reasons:
Lack of materials
Breakdowns
Taking into consideration only breakdown idle time, it can be split into
several components:
Maintenance dead time - time lost by the machine operator waiting for the
machine to be repaired by maintenance personnel, from the time they start
the repair until the moment they finish their task.
In the corrective environment, the system is conceived to reduce breakdown
detection and diagnosis times and to supply the information required to
perform the repair operations.
CODE
/* Program to make a lexical analyzer that generates tokens from a line of C code. */
#include <stdio.h>
#include <string.h>
#include <ctype.h>

#define MAX 100                     /* maximum length of the input line */

/* Keywords recognised by this analyzer. */
static const char *keywords[] = {
    "int", "char", "float", "double", "if", "for", "while", "else",
    "switch", "struct", "printf", "scanf", "case", "break", "return",
    "typedef", "void", "do"
};
#define NKEYWORDS (sizeof(keywords) / sizeof(keywords[0]))

/* Return 1 if word is a keyword, otherwise 0. */
static int is_keyword(const char *word)
{
    size_t k;
    for (k = 0; k < NKEYWORDS; k++)
        if (strcmp(word, keywords[k]) == 0)
            return 1;
    return 0;
}

int main(void)
{
    char str[MAX];                  /* the input statement                     */
    char lexeme[MAX];               /* the lexeme currently being collected    */
    int i = 0, j;

    printf("Enter a statement: ");
    if (fgets(str, sizeof(str), stdin) == NULL)
        return 1;
    str[strcspn(str, "\n")] = '\0'; /* strip the trailing newline              */

    printf("\n\nAnalysis:");

    while (str[i] != '\0') {
        if (isspace((unsigned char)str[i])) {               /* skip blanks     */
            i++;
        } else if (isalpha((unsigned char)str[i]) || str[i] == '_') {
            /* Identifier or keyword: a letter or '_' followed by letters,
               digits or '_'. */
            j = 0;
            while (isalnum((unsigned char)str[i]) || str[i] == '_')
                lexeme[j++] = str[i++];
            lexeme[j] = '\0';
            printf("\n\n%s : %s", lexeme,
                   is_keyword(lexeme) ? "Keyword" : "Identifier");
        } else if (isdigit((unsigned char)str[i])) {
            /* Constant: a run of digits. */
            j = 0;
            while (isdigit((unsigned char)str[i]))
                lexeme[j++] = str[i++];
            lexeme[j] = '\0';
            printf("\n\n%s : Constant", lexeme);
        } else if (str[i] == '<' || str[i] == '>' ||
                   str[i] == '=' || str[i] == '!') {
            /* Relational and assignment operators: <, <=, >, >=, ==, !=, =. */
            char first = str[i++];
            if (str[i] == '=') {
                printf("\n\n%c= : Relational operator", first);
                i++;
            } else if (first == '=') {
                printf("\n\n= : Assignment operator");
            } else {
                printf("\n\n%c : Relational operator", first);
            }
        } else if (strchr("+-*/%&|?", str[i]) != NULL) {
            /* Arithmetic and logical operators; ++ and -- are reported as
               two consecutive operators, as in the sample output below. */
            printf("\n\n%c : Operator", str[i]);
            i++;
        } else if (strchr("(){}[];,:", str[i]) != NULL) {
            /* Brackets and punctuation. */
            printf("\n\n%c : Special character", str[i]);
            i++;
        } else {
            /* Error state: an unrecognised symbol. */
            printf("\n\n%c : Unknown symbol", str[i]);
            i++;
        }
    }
    printf("\n\nEnd of program\n");
    return 0;
}
/* Output
Correct input
-------------
Analysis:
for : Keyword
( : Special character
x1 : Identifier
= : Assignment operator
0 : Constant
; : Special character
x1 : Identifier
<= : Relational operator
10 : Constant
; : Special character
x1 : Identifier
+ : Operator
+ : Operator
) : Special character
; : Special character
End of program
Wrong input
-----------
Analysis:
for : Keyword
( : Special character
x1 : Identifier
= : Assignment operator
0 : Constant
; : Special character
X1 : Identifier
*/
ADVANTAGES AND DISADVANTAGES OF LEXICAL ANALYZER
ADVANTAGES
• Easier and faster development.
• More efficient and compact.
• Very efficient and compact.
DISADVANTAGES
• Done by hand.
• Development is complicated.
CONCLUSION
Lexical analysis is a stage in the compilation of any program. In this phase
we generate tokens from the input stream of data. To perform this task we
need a lexical analyzer.
REFERENCES
• www.wikipedia.com