Artificial Intelligence
Natural Language Processing
Lecture 10
(3 November, 1999) Tralvex (Rex) Yeap MAAI MSCS University of London
Content: Natural Language Processing
Quick Review of Lecture 9
Introduction to NLP
NL and Computer Language
Motivations for NLP
A Generic NL System Architecture
Language and Knowledge
Five Processing Stages in an NLP System
- (1) Phonological Analysis
- (2) Morphological Analysis
- (3) Syntactic Analysis
- (4) Semantic Analysis
- (5) Pragmatic Analysis
NLP History
Major NLP Accomplishments
Real World NLP Applications
- MT: Deluxe Universal Translator
- IR: Buzzcity
- IR: Altavista Search Engine
- IV: Cartia's Themescape
- Autonomous interacting bots: Eliza's grand-daughter - Lisa
- Grammar Checking Systems: MS Word Grammar Checker
Class Activity: Real-world Paper Reading
Students' Mini Research Presentation by Group E
What's in Store for Lecture 11
Quick Review on Lecture 9
Introduction to Planning
Examples of Planning Systems
Blocks World
Assumptions of the "Standard" AI Planning Paradigm
STRIPS - Linear Planner
STRIPS Example
State Space Searching
- Progression Planners
- Regression Planners
Plan Space Searching
Partial Ordered Planning
- Introduction
- An Example
- Interpretation
Partially ordered plans vs. Non-linear planning
Shortcomings of AI Planning in General
Class Activity: Real-world Paper Reading
Introduction to NLP
A natural language is a human spoken language, such as English.
One of the aims of AI is to build machines that can "understand" commands in natural language, written or spoken.
A computer that can do this requires very powerful hardware and sophisticated software.
At present, this technology is at an early stage of development.
Introduction to NLP (cont)
It is not an easy task to teach a person or a computer a natural language.
The main problems are syntax (the rules governing the way in which words are arranged) and understanding context to determine the meaning of a word.
Interpreting even simple phrases requires a vast amount of knowledge.
The basic goal of Natural Language Processing is to enable people to communicate with a computer in a language that they use in their everyday life.
Natural Language and Computer Language
Natural languages are those that we use for communicating with each other, e.g. English, French, Japanese, etc.
Natural languages are expressive and easy for us to use.
Computer languages are those that we use for controlling the operations of a computer, e.g. Prolog, C, C++, Java, etc.
Computer languages are easy for a computer to understand, but they are not expressive.
Motivations for NLP
Traditional
- Intellectual challenge for AI
- Easier human-computer dialogue
Recent
- Information Revolution - Knowledge-Based Economy
- Information Highway, World Wide Web
- Globalization
NLP History
1950-60: Machine Translation
- Georgetown University MT Experiment (1952-54): crude models of mainly word-by-word translation with minimal syntactic analysis.
1960-70: Semantics-less
- String analysis models (1965) based on Chomsky's Transformational Grammar
- Natural Language Dialogue - ELIZA (Weizenbaum 1966): keyword-based NL conversation
NLP History (cont)
1970-80: AI takes the lead
- Development of NL-mediated understanding systems
- Augmented Transition Networks (Woods 1971): a computationally feasible grammar formalism with the power of transformational grammar. Used for LUNAR (Woods 197?), a system able to answer NL questions about moon rock samples.
- SHRDLU (Winograd 1971): NL dialogue with a simulated robot operating in a simple "blocks world". The system is able to act and plan as well as answer questions.
- MARGIE (Schank 1973): NL understanding by making inferences based on conceptual knowledge.
1980-90: Grammar Formalisms
- Definite Clause Grammars (DCGs): parsing based on logic programming (Pereira and Warren 1980).
- Unification and Constraint Based Grammar
- Dialogue Systems: use of planning models for both understanding (Perrault and Cohen) and generation (Appelt 1985).
NLP History (cont)
1990-Present: Integrated Language Engineering
- Statistical Methods: (1) Performance models (2) Highly empirical evaluation criteria
- Multimodality: (1) Integration of Language and Speech (2) Large-scale language/speech projects
- Multilinguality: (1) Multi Lingual Information Society (2) Machine Translation (3) Internationalisation of software (4) European Dimension
- Language Resources: (1) Lexicons, Grammars (2) Text and speech corpora (3) Representation Standards
Major NLP Accomplishments
Chomsky (1957), Syntactic Structures
Weizenbaum (1966), ELIZA
Woods (1967), Procedural semantics
Thorne et al. and Woods (1968-70), ATNs
Winograd (1970), SHRDLU
Colby, Weber & Hilf (1971); Colby (1975), PARRY
Wilks (1972), Preference semantics
Woods et al. (1972), LSNLIS / Lunar
Charniak (1972), Frames and demons
Wilks (1973), Stanford machine translation project
Montague (1973), IL semantics (Montague Grammar) in PTQ
Grosz (1977), Focus in task-oriented dialogues
Marcus (1977), Deterministic parsing
Davey (1978)
Cohen, Phil (1979), Planning speech acts
Allen (1980), Understanding speech acts
McDonald (1980), MUMBLE
Heim/Kamp (1981), Discourse Representation Theory
McKeown (1982), TEXT
Appelt (1982), KAMP (Integration of Functional Grammar with Discourse Plans)
Shieber (1984), Noncontextfreeness of NL syntax proven
Pollack (1986), Plan inference
Mann & Thompson (1987), Rhetorical Structure Theory
Real World NLP Applications
Machine Translation
Information Retrieval / NL interface
Information Visualization
Autonomous interacting bots
Grammar Checking Systems
Speech Recognition Systems / Speech Synthesizers
Document Summary Systems
Real World NLP Applications
Machine Translation: Deluxe Universal Translator
Able to translate text across 33 languages [Link]
Real World NLP Applications
Information Retrieval: Buzzcity ([Link])
Automatically tracks and reports on user-specified interests over the Internet.
Real World NLP Applications
Information Retrieval: Altavista Search Engine
Search engine supporting natural language queries
Real World NLP Applications
Information Visualization: Cartia's Themescape
Unsupervised self-visualization of a lengthy document - Starr's report [Link] [Link]
Real World NLP Applications
Autonomous interacting bots: Eliza's grand-daughter - Lisa
A short conversation with Lisa. [Link] ([Link] is available too)
Real World NLP Applications
Grammar Checking Systems: MS Word Grammar Checker
A Generic NL System Architecture
Language and Knowledge
Five Processing Stages in an NLP System
(1) Phonological Analysis
(2) Morphological Analysis
(3) Syntactic Analysis
(4) Semantic Analysis
(5) Pragmatic Analysis
Five Processing Stages in an NLP System
(1) Phonological Analysis
Phonetics: deals with the physical building blocks of a language's sound system.
e.g. the sounds of k, t and e in "kite"
Phonology: the organisation of speech sounds within a language.
e.g. (1) the different k sounds in "kite" vs "coat" (2) the different t and p sounds in "top" vs "pot"
Five Processing Stages in an NLP System
(2) Morphological Analysis
Morphology is the structure of words.
It is concerned with inflection [1].
It is also concerned with the derivation of new words from existing ones, e.g. lighthouse (formed from light & house).
In NLP, words are also known as lexical items, and a set of words forms a lexicon.
[1] Inflection: the various forms of the same basic word, e.g. run-ran, dog-dogs, etc.
Five Processing Stages in an NLP System
(2) Morphological Analysis: Why is it important?
Any NL analysis system needs a lexicon: a module that tells what words there are and what properties they have.
The simplest model is a full form dictionary that lists every word explicitly.
Simply expanding the dictionary fails to take advantage of the regularities.
No dictionary contains all the words one is likely to encounter in real input.
- Languages with highly productive morphology (e.g. Finnish, where a verb can have many thousands of forms)
- Noun compounding
Five Processing Stages in an NLP System
(2) Morphological Analysis: The Lexicon
The black-box behaviour of the lexicon is to relate words to different kinds of information, as shown by the outer square.
The lexicon is a repository of all the exceptions in the language.
A lexicon is also known as the dictionary.
[Figures: full form lexicon; lexicon with morphological analysis]
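The contrast between a full form dictionary and a lexicon with morphological analysis can be sketched in a few lines of Python. This is only a minimal illustration: the stems, the suffix rules, the irregular forms and the function name analyse are all assumptions made up for the example, not part of any real NLP system.

```python
# Toy data (assumed for illustration only).
STEMS = {"dog": "noun", "house": "noun", "mouse": "noun",
         "run": "verb", "walk": "verb"}

# Exceptions that no regular rule can derive are listed explicitly.
IRREGULAR = {"ran": ("run", "past"), "mice": ("mouse", "plural")}

# Regular inflection: (suffix, category it attaches to, feature it adds).
SUFFIX_RULES = [
    ("s", "noun", "plural"),
    ("s", "verb", "3rd-person-singular"),
    ("ed", "verb", "past"),
    ("ing", "verb", "progressive"),
]

def analyse(word):
    """Return a list of (stem, category, feature) analyses for a word."""
    word = word.lower()
    analyses = []
    # 1. Exceptions are looked up directly.
    if word in IRREGULAR:
        stem, feature = IRREGULAR[word]
        analyses.append((stem, STEMS[stem], feature))
    # 2. The word may itself be a base form.
    if word in STEMS:
        analyses.append((word, STEMS[word], "base"))
    # 3. Otherwise try to strip a regular suffix and look up the stem.
    for suffix, category, feature in SUFFIX_RULES:
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if STEMS.get(stem) == category:
                analyses.append((stem, category, feature))
    return analyses

print(analyse("dogs"))
print(analyse("ran"))
print(analyse("walked"))
```

Five stems and four rules cover forms such as dogs, houses, runs, walked and walking without listing them, while ran and mice must be stored as exceptions. The sketch ignores spelling changes (e.g. running), which a real analyser would have to handle.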
Five Processing Stages in an NLP System
(3) Syntactic Analysis
Syntactic analysis is concerned with the construction of sentences.
Syntactic structure indicates how the words are related to each other.
A syntax tree is assigned by a grammar and a lexicon.
The lexicon indicates the syntactic category of words.
The grammar (typically a Context Free Grammar) specifies legitimate concatenations of constituents.
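How a grammar and a lexicon together assign a syntax tree can be sketched with a small top-down parser. The grammar rules, the lexicon entries and the function names below are assumptions chosen for the example. The parser is a simple backtracking one that only works for grammars without left recursion, and it is not meant as the method used by any particular system.

```python
# Toy context-free grammar: each non-terminal maps to its alternative
# right-hand sides (assumed for illustration only).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Name"]],
    "VP": [["V", "NP"], ["V"]],
}

# Toy lexicon: the syntactic category of each word.
LEXICON = {
    "the": "Det", "a": "Det",
    "dog": "N", "ball": "N",
    "john": "Name",
    "saw": "V", "chased": "V",
}

def parse(symbol, words, start):
    """Yield (tree, next_position) for every way symbol can start at words[start]."""
    if symbol not in GRAMMAR:
        # Lexical category: consume one word if the lexicon agrees.
        if start < len(words) and LEXICON.get(words[start]) == symbol:
            yield (symbol, words[start]), start + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand(symbol, rhs, words, start, [])

def expand(symbol, rhs, words, pos, children):
    """Match the remaining right-hand-side symbols one after another."""
    if not rhs:
        yield (symbol, *children), pos
        return
    for child, next_pos in parse(rhs[0], words, pos):
        yield from expand(symbol, rhs[1:], words, next_pos, children + [child])

def syntax_trees(sentence):
    """Return every tree rooted in S that covers the whole sentence."""
    words = sentence.lower().split()
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]

for tree in syntax_trees("the dog chased a ball"):
    print(tree)
```

A word sequence that the grammar does not license, such as "dog the chased", yields an empty list.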
Five Processing Stages in an NLP System
(4) Semantic Analysis
Semantic analysis is concerned with the meaning of the language.
This stage uses the meanings of the words to extend and perhaps disambiguate the result returned by the syntactic parse.
The first step in any semantic processing system is to look up the individual words in a dictionary (or lexicon) and extract their meanings.
Unfortunately, many words have several meanings. For example, the word "diamond" might have the following set of meanings:
(1) a geometrical shape with four equal sides
(2) a baseball field
(3) an extremely hard and valuable gemstone
To select the correct meaning for the word "diamond" in the sentence
Joan saw Susan's diamond shimmering from across the room.
it is necessary to know that neither geometrical shapes nor baseball fields shimmer, whereas gemstones do (process of elimination).
Five Processing Stages in an NLP System
(4) Semantic Analysis (cont)
The process of determining the correct meaning of an individual word is called word sense disambiguation or lexical disambiguation.
It is done by associating, with each word in the lexicon, information about the contexts in which each of the word's senses may appear.
Each of the words in a sentence can serve as part of the context in which the meanings of the other words must be determined.
e.g. The baseball field interpretation of "diamond" could be marked as a LOCATION. Then the meaning of "diamond" in the sentence "I'll meet you at the diamond" could easily be determined if the fact that "at" requires a TIME or a LOCATION as its object were recorded as part of the lexical entry for "at".
Five Processing Stages in an NLP System
(4) Semantic Analysis (cont)
Other useful semantic markers are PHYSICAL-OBJECT, ANIMATE-OBJECT and ABSTRACT-OBJECT.
Using these markers, the correct meaning of "diamond" in the sentence "I dropped my diamond" can be computed. As part of its lexical entry, the verb "drop" will specify that its object must be a PHYSICAL-OBJECT.
Unfortunately, to solve the lexical disambiguation problem completely, it becomes necessary to introduce more and more finely grained semantic markers.
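Disambiguation with semantic markers can be sketched as a lookup that keeps only the senses whose markers satisfy the restriction recorded in another word's lexical entry. The data structures and the function name disambiguate are assumptions made for this example. Marking the geometrical shape sense as ABSTRACT-OBJECT is also an assumption, since the lecture does not assign it a marker.

```python
# Each sense of a word carries a set of semantic markers (toy, assumed data).
SENSES = {
    "diamond": [
        ("geometrical shape", {"ABSTRACT-OBJECT"}),  # marker assumed
        ("baseball field", {"LOCATION"}),
        ("gemstone", {"PHYSICAL-OBJECT"}),
    ],
}

# Lexical entries for words that restrict what their object may be.
OBJECT_RESTRICTIONS = {
    "at": {"TIME", "LOCATION"},
    "drop": {"PHYSICAL-OBJECT"},
}

def disambiguate(governor, noun):
    """Return the senses of noun that governor accepts as its object."""
    allowed = OBJECT_RESTRICTIONS[governor]
    # A sense survives if it shares at least one marker with the restriction.
    return [gloss for gloss, markers in SENSES[noun] if markers & allowed]

print(disambiguate("at", "diamond"))    # "I'll meet you at the diamond"
print(disambiguate("drop", "diamond"))  # "I dropped my diamond"
```

With these three coarse markers the "shimmering" sentence could not be resolved, because telling gemstones apart from other physical objects would need a finer marker. This is the growth in marker granularity noted above.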
Five Processing Stages in an NLP System
(4) Semantic Analysis (cont)
Finally, we have to process the text at sentence level. There are four approaches to this.
First, semantic grammars, which combine syntactic, semantic and pragmatic knowledge into a single set of rules in the form of a grammar.
Second, case grammar, in which the structure that is built by the parser contains some semantic information, although further interpretation may also be necessary.
Third, conceptual parsing, in which syntactic and semantic knowledge are combined into a single interpretation system that is driven by the semantic knowledge.
Lastly, approximately compositional semantic interpretation, in which semantic processing is applied to the result of performing a syntactic parse.
For details on each method, refer to the Rich & Knight AI book, pg 400-414.
Five Processing Stages in an NLP System
(5) Pragmatic Analysis
This is an additional stage of analysis concerned with the pragmatic use of the language.
This is important in the understanding of texts and dialogues.
There are many important relationships that may hold between phrases and parts of their discourse context, as outlined below.
Identical entities. Consider the text
- Bill had a red balloon.
- John wanted it.
The word "it" should be identified as referring to the red balloon. References such as this are called anaphoric references or anaphora.
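A very naive way to resolve such a reference is to pick the most recent earlier mention that agrees with the pronoun. The sketch below assumes that the mentions have already been extracted and annotated by hand. The Mention class, the PRONOUNS table and the resolve function are hypothetical names for this example, and real anaphora resolution needs far more knowledge than this recency heuristic.

```python
from dataclasses import dataclass

@dataclass
class Mention:
    text: str
    animate: bool
    plural: bool

# What each pronoun requires of its antecedent: (animate, plural).
# None means "either". Toy table, assumed for illustration.
PRONOUNS = {
    "it": (False, False),
    "he": (True, False),
    "she": (True, False),
    "they": (None, True),
}

def resolve(pronoun, previous_mentions):
    """Return the most recent earlier mention compatible with the pronoun, or None."""
    animate, plural = PRONOUNS[pronoun.lower()]
    for mention in reversed(previous_mentions):
        if mention.plural != plural:
            continue
        if animate is not None and mention.animate != animate:
            continue
        return mention
    return None

# "Bill had a red balloon. John wanted it."
mentions = [
    Mention("Bill", animate=True, plural=False),
    Mention("a red balloon", animate=False, plural=False),
    Mention("John", animate=True, plural=False),
]
print(resolve("it", mentions).text)
```

The heuristic skips "John" because "it" requires an inanimate antecedent, and settles on "a red balloon". It cannot handle references to entities that were never mentioned explicitly, which is why world knowledge is needed in general.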
Five Processing Stages in an NLP System
(5) Pragmatic Analysis (cont)
Parts of entities. Consider the text
- Tracy opened the book she just bought.
- The title page was torn.
The phrase "the title page" should be recognized as being part of the book that was just bought.
Parts of actions. Consider the text
- Lynn went on a business trip to New York.
- She left on an early morning flight.
Taking a flight should be recognized as part of going on a trip.
Entities involved in actions. Consider the text
- Her house was broken into last week.
- They took the TV and the stereo.
The pronoun "they" should be recognized as referring to the burglars who broke into the house.
Five Processing Stages in an NLP System
(5) Pragmatic Analysis (cont)
Elements of sets. Consider the text
- The stickers we have in stock are stars, the moon, item and a flag.
- I'll take two moons.
The moons in the second sentence should be understood to be some of the moons mentioned in the first sentence. Notice that understanding the second sentence at all requires that we use the context of the first sentence to establish that the word "moons" means moon stickers.
Names of individuals. Consider the text
- Dan went to the movies.
Dan should be understood to be some person named Dan. Although there are many people with that name, the speaker had one particular person in mind, and the discourse context should tell us which.
Five Processing Stages in an NLP System
(5) Pragmatic Analysis (cont)
Causal chains. Consider the text
- There was a big snow storm yesterday.
- The schools were closed today.
The snow should be recognized as the reason that the schools were closed.
Planning sequences. Consider the text
- Margaret wanted a new car.
- She decided to get a job.
Margaret's sudden interest in a job should be recognized as arising out of her desire for a new car and thus for the money to buy one.
Illocutionary [1] force. Consider the sentence
- It sure is cold in here.
In many circumstances, this sentence should be recognized as having, as its intended effect, that the hearer should do something like close the window or turn up the thermostat.
[1] Illocutionary: relating to or being the communicative effect (as commanding or requesting) of an utterance, e.g. "There's a snake under you" may have the illocutionary force of a warning.
Five Processing Stages in an NLP System
(5) Pragmatic Analysis (cont)
Implicit presupposition. Consider the query
- Did Adam fail CS310?
The speaker's presuppositions, including the facts that CS310 is a valid course, that Adam is a student, and that Adam took CS310, should be recognized so that if any of them is not satisfied, the speaker can be informed.
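Checking the presuppositions before answering can be sketched against a tiny knowledge base. The course list, student list, grade table and the function name did_fail are all assumptions invented for this example. The query is taken as already parsed into a student and a course.

```python
# Toy knowledge base (assumed data, for illustration only).
COURSES = {"CS310", "CS101"}
STUDENTS = {"Adam", "Lynn"}
GRADES = {("Lynn", "CS310"): "A"}  # (student, course) -> grade; Adam has no entry

def did_fail(student, course):
    """Answer 'Did <student> fail <course>?', reporting any unmet presupposition."""
    # Presupposition 1: the course exists.
    if course not in COURSES:
        return f"{course} is not a valid course."
    # Presupposition 2: the person is a student.
    if student not in STUDENTS:
        return f"{student} is not a student."
    # Presupposition 3: the student took the course.
    if (student, course) not in GRADES:
        return f"{student} did not take {course}."
    # Only now does a plain yes/no answer make sense.
    return "Yes." if GRADES[(student, course)] == "F" else "No."

print(did_fail("Adam", "CS310"))
print(did_fail("Lynn", "CS310"))
```

A bare "No" for Adam would be misleading here. Reporting the unmet presupposition tells the speaker why the question cannot be answered as asked.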
In order to be able to recognize these kinds of relationships among sentences, a great deal of knowledge about the world being discussed is required.
Programs that do multiple-sentence understanding rely either on large knowledge bases or on strong constraints on the domain of discourse, so that only a limited knowledge base is necessary.
Class Activity: Real-world Paper Reading
Paper 1. Intelligent Text Processing, and Intelligent Tradecraft
- Introduction
- Paradigm Shift
- Patent Searching
- Competitive Intelligence
- Intelligence Tradecraft and Visualization
- Automated Technology Trend Spotting
- Intelligent Agents
- Breaking the Language Barrier
- Never Possible Before
Class Activity: Real-world Paper Reading
Paper 2. Comparing DIALOG, TARGET, and DR-LINK
- Introduction
- Three Search Systems
- NLP vs Boolean
- Changing Perspectives
- Implications
Class Activity: Real-world Paper Reading
Paper 3. Text Mining Technology: Turning Information into Knowledge
- Mining Text
- Text Analysis Functions: Language Identification, Feature Extraction, Clustering, Categorization
- Text Search Functions: Text Search Engine
- Scenarios
- A Portfolio of Technology: Clustering and visualization applications, Prompt Query Refinement, Lexical Navigation, Advanced Feature Extraction, Feature Extraction for non-English language, Text Classification Technologies
What's in Store for Lecture 11
Natural Language Processing II
- Context Free Grammar
- Chomsky's Grammar Hierarchy
- Semantics and λ-calculus
Students' Mini Research Presentation by Group E
End of Lecture 10
Good Night.