Computational Linguistics: Dr. Dina Khattab

The document provides an example of a finite state automaton with states Q={1,2,3}, initial state I={1}, final states F={3}, alphabet T={a,b}, and transitions E between states. It explains that

Uploaded by

Dalia Ahmed
Copyright
© All Rights Reserved

Computational Linguistics

Lecture 2

Dr. Dina Khattab


Representations for languages
We will discuss the two principal methods for defining languages: the generator and the recognizer.

In particular, we will focus on one class of generators (grammars) and one class of recognizers (automata).

Regular languages are the simplest formal languages:
• Their generators are the regular expressions
• Their recognizers are the finite state automata
Concepts and Notations
Set: An unordered collection of unique elements
• S1 = { a, b, c }, S2 = { 0, 1, …, 19 }
• empty set: ∅
• membership: x ∈ S
• union: S1 ∪ S2 = { a, b, c, 0, 1, …, 19 }
• universe of discourse: U
• subset: S1 ⊂ U
• complement: if U = { a, b, …, z }, then S1' = { d, e, …, z } = U - S1
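
The set notation above maps closely onto Python's built-in `set` type. The following is a minimal sketch, assuming Python as the illustration language (the slides name none); the variable names are chosen here for illustration only.

```python
import string

# Sets from the slide; U is the universe of discourse { a, b, ..., z }.
S1 = {"a", "b", "c"}
S2 = set(range(20))                  # { 0, 1, ..., 19 }
U = set(string.ascii_lowercase)
empty = set()                        # the empty set

print("a" in S1)                     # membership: is x an element of S?
print(len(S1 | S2))                  # size of the union S1 ∪ S2
print(S1 <= U)                       # subset test: is S1 contained in U?
print(sorted(U - S1))                # complement of S1 relative to U
```

Note that the complement only makes sense once a universe U is fixed, which is why it is written as the difference `U - S1`.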

Alphabet: A finite set of symbols
• Examples: S1 = { a, b }, S2 = { Spring, Summer, Autumn, Winter }

String/word: A sequence of zero or more symbols from an alphabet
• The empty string: ε
Concepts and Notations
Language: A set of strings over an alphabet
• Also known as a formal language; it may not bear any resemblance to a natural language, but could model a subset of one.
• The language comprising all strings over an alphabet Σ is written as: Σ*

Graph: A set of nodes (or vertices), some or all of which may be connected by edges.
• [Figure: an example graph and an example directed graph; labels 1, 2, 3 and a, b, c]
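
Because Σ* contains every string over Σ, it is infinite whenever Σ is nonempty, so a program can only enumerate a prefix of it. The sketch below lists strings shortest first; the function name `sigma_star` is a hypothetical helper introduced here, not something from the lecture.

```python
from itertools import count, islice, product

def sigma_star(alphabet):
    """Yield every string over `alphabet`, shortest first (the sequence never ends)."""
    symbols = sorted(alphabet)
    for length in count(0):                  # length 0 yields the empty string
        for combo in product(symbols, repeat=length):
            yield "".join(combo)

# The first seven strings over the alphabet { a, b }.
first = list(islice(sigma_star({"a", "b"}), 7))
print(first)
```

Any language over { a, b } is then some subset of the strings this generator produces.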
Finite State Automata (FSA)
Finite State Automata
Language Recognition Problem: Does a given word belong to a given language?

i.e. given a language description and a string, is there an algorithm which will answer yes or no correctly?
Finite State Automata
A finite state automaton is an abstract model of a simple machine (or computer), i.e. a computational device to solve the language recognition problem.

The machine can be in a finite number of states. It receives symbols as input, and receiving a particular input in a particular state moves the machine to a specified new state.

Certain states are finishing states, and if the machine is in one of those states when the input ends, it has ended successfully (or has accepted the input).
FSA: Formal Definition
A Finite State Automaton (FSA) is a 5-tuple (Q, I, F, T, E) where:
Q = states: a finite set;
I = initial states: a nonempty subset of Q;
F = final states: a subset of Q;
T = an alphabet;
E = edges: a subset of Q × T × Q.

An FSA can be represented by a labelled, directed graph
= a set of nodes (some final/initial) +
directed arcs (arrows) between nodes +
a label from the alphabet on each arc.

Example: formal definition of A1
[Figure: state diagram of A1, with nodes 1, 2, 3 and arcs labelled a and b]
Q = {1, 2, 3}
I = {1}
F = {3}
T = {a, b}
E = { (1,a,2), (1,b,3), (2,b,3), (3,a,2), (3,b,3) }
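
The 5-tuple for A1 can be written down almost literally in code. The sketch below is one possible encoding, not the lecture's own implementation: the function name `accepts` is an assumption made here, and because E is a relation (a subset of Q × T × Q) the sketch tracks the set of states the machine could be in rather than a single state.

```python
# A1 as the 5-tuple (Q, I, F, T, E) from the slide.
Q = {1, 2, 3}
I = {1}
F = {3}
T = {"a", "b"}
E = {(1, "a", 2), (1, "b", 3), (2, "b", 3), (3, "a", 2), (3, "b", 3)}

def accepts(word):
    """True if some path labelled `word` leads from an initial state to a final state."""
    current = set(I)                     # every state the machine could be in
    for symbol in word:
        # Follow all edges labelled `symbol` that leave a current state.
        current = {q2 for (q1, t, q2) in E if q1 in current and t == symbol}
    return bool(current & F)

print(accepts("abb"))   # follows 1 -a-> 2 -b-> 3 -b-> 3
print(accepts("aab"))   # gets stuck: no a-arc leaves state 2
```

If `current` becomes empty, no edge matched and the word is rejected whatever follows.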
What does it mean to accept a string/language?
If the FSA is in a final (or accepting) state after all input symbols have been consumed, then the string is accepted (or recognized); otherwise it is rejected.

e.g. String: abb
For abb, A1 moves through the states 1, 2, 3, 3 and ends in the final state 3, so abb is accepted.
Give other examples!
[Figure: state diagram of A1]
The language accepted by A1 is the set of strings of a's and b's which end in b, and in which no two a's are adjacent.
[Figure: state diagram of A1]
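
The description of A1's language can be checked mechanically on short strings. The sketch below compares A1 with the prose description on every string over {a, b} up to length 8; it is a brute-force check under that length bound, not a proof, and the helper names `a1_accepts` and `described` are introduced here for illustration.

```python
from itertools import product

# Edges, initial states and final states of A1, copied from its formal definition.
E = {(1, "a", 2), (1, "b", 3), (2, "b", 3), (3, "a", 2), (3, "b", 3)}
INITIAL, FINAL = {1}, {3}

def a1_accepts(word):
    """Simulate A1 on `word` by tracking the set of reachable states."""
    current = set(INITIAL)
    for symbol in word:
        current = {q2 for (q1, t, q2) in E if q1 in current and t == symbol}
    return bool(current & FINAL)

def described(word):
    """The prose description: ends in b and contains no two adjacent a's."""
    return word.endswith("b") and "aa" not in word

# Collect every string up to length 8 on which the two disagree.
mismatches = [
    "".join(w)
    for n in range(9)
    for w in product("ab", repeat=n)
    if a1_accepts("".join(w)) != described("".join(w))
]
print(mismatches)   # an empty list means the two agree on all tested strings
```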
Finite-state Automata
An FSA defines a regular language over an alphabet Σ:
• ∅ is a regular language:
  [diagram: a single state q0]

• Any symbol from Σ is a regular language:
  Σ = { a, b, c }    q0 --b--> q1

• The concatenation of two regular languages is a regular language:
  q0 --b--> q1    q0 --c--> q1
  Σ = { a, b, c }    q0 --b--> q1 --c--> q2
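
Concatenation can also be seen at the level of the languages themselves, independently of the automata. The sketch below works on finite sets of strings only; the function name `concatenate` and the variables `Lb` and `Lc` are assumptions made for this illustration.

```python
def concatenate(L1, L2):
    """Concatenation of two finite languages: every string of L1 followed by every string of L2."""
    return {x + y for x in L1 for y in L2}

Lb = {"b"}          # language of the automaton q0 --b--> q1
Lc = {"c"}          # language of the automaton q0 --c--> q1
print(concatenate(Lb, Lc))   # language of q0 --b--> q1 --c--> q2
```

The automaton on the slide does the same job structurally: the final state of the first machine is merged with the initial state of the second.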
FSA Example
Consider the following FSA
T: {0, 1}
Q: {S1, S2}
I: S1
F: S2
E:
        0     1
  S1    S1    S2
  S2    S2    S1
FSA Example
[Figure: state diagram of the FSA, with states S1 and S2 and arcs labelled 0 and 1]
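
A transition table like the one in this example translates directly into a nested lookup. The sketch below is one way to do this, assuming the table reads "row = current state, column = input symbol, cell = next state"; the names `TABLE`, `run` and `accepts` are introduced here and are not part of the lecture.

```python
# Transition table from the example: TABLE[state][symbol] gives the next state.
TABLE = {
    "S1": {"0": "S1", "1": "S2"},
    "S2": {"0": "S2", "1": "S1"},
}
INITIAL = "S1"
FINAL = {"S2"}

def run(word):
    """Return the list of states visited while reading `word`, starting at INITIAL."""
    state = INITIAL
    trace = [state]
    for symbol in word:
        state = TABLE[state][symbol]     # raises KeyError for symbols outside T = {0, 1}
        trace.append(state)
    return trace

def accepts(word):
    """A word is accepted if the machine ends in a final state."""
    return run(word)[-1] in FINAL

print(run("100"))
print(accepts("100"))
```

Reading the table, a 0 leaves the state unchanged and a 1 switches between S1 and S2, which makes traces easy to follow by hand.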
FSA Example
Determine which strings are accepted and which are rejected:
• 01101
• 011011
• 00000
• 11111
• 10101010
Assignment (due 14th Oct.)
Consider the following FSA
T: {a, b, c}
Q: {S1, S2, S3}
I: S1
F: S2, S3
E:
        a     b     c
  S1    S1    S2    S2
  S2    S1    S2    S3
  S3    S3    S1    S2

Determine which strings are accepted and which are rejected:
• abb
• abba
• bcbccc
• caaabbc
