Li18 Lecture 1 Slides (2021)


Lecture 1: Introduction to Computational Linguistics

Nigel Collier
Faculty of Modern and Medieval Languages and Linguistics

Li18 Computational Linguistics

1
Summary

1. Course Admin.

2. What is Computational Linguistics?

3. Language Models?

4. Complexity of Language Tasks

2
Course Admin.

Course Supervisors for Undergraduates

• Should be in touch with you by the end of the first week of term; if
not, please contact me (nhc30)

3
Course Admin.

Course Supervisors

Carlos Balhana ceb81
Alan Ansell aja63
Andrew Caines apc38
Nigel Collier nhc30
Evgeniia Razumovska er563
Olga Majewska om304
Ulla Petti ump20
Fangyu Liu fl399

4
Course Admin.

Course Textbook

• In 2018 I updated the course to bring it up to speed with the latest
developments that are shaping the field

• The main course textbook remains Speech and Language
Processing by Daniel Jurafsky and James Martin (‘J&M’)

• Second Edition. International Edition. 2008. There are 4
copies in the library, with 2 on overnight loan. Ask your DoS to
get a copy for the college library

5
Course Admin.

Make sure you know which edition/version you’ve got

[Figure captions: "You’re ready to go"; "Be mindful"]

6
Course Admin.

The lecture notes and slides will be drawing on new material

• Third Edition. Online. In draft - available online at
https://web.stanford.edu/~jurafsky/slp3/

• Page references in the lecture notes will point to the Second
edition. Where necessary I will indicate chapter titles for new
material in the online Third edition

• Recommended reading and Going Further in the lecture notes
will point to published papers. Check Google Scholar!

• Feedback is very much encouraged!

7
Undergraduates – Part IIA

Mathieu Barrier mb2407
Tabassum Chowdhury tc592
Georgia Clothier gc629
Benjamin Conway bpc41
Jacob Davies jed74
Beatrice Greenhalgh beg31
Ben Hunt bsh35
Harriet Innes hi257
Madeleine Jaeger maj70
Eve James emj42
Mirriam Lay ml968
Rosa Millard ram204
Jasper Pennings jcp75
Alex Provost awbp2
Charis Saer cas240
Suchir Salhan sas245
Emily Shen eys23
Alice Sizer as2999
Jadd Virji jv427
Katherine Wu jw2134

8
Undergraduates – IIB, MML, Erasmus

Patrick Farnworth IIB prf24
Henrietta Manning IIB hejm3
Anastasia Karamzina MML aak62
Sandra Perosa MML sp924

9
Course Admin.

Other Undergraduates

If you haven’t seen your name, please email me (nhc30).

TAL MPhils

Welcome onboard!

Other Language Science Interdisciplinary Programme MPhils

Please email me (nhc30).

10
Study and Supervisions

• Lecture notes and slides will be available on Moodle before the
lecture

• Recommended Reading in lecture notes to extend your
understanding (focus on quality and accessibility)

• Going Further - not obligatory - to take your understanding of the
subject beyond the course (focus on quality and influence on the
field. Warning: no boundary on technical accessibility)

• Post-lecture exercises and pre-lecture exercises to help you study

• Starred exercises to be handed in for supervisions

• No mathematical or programming ability required

• Maths crib sheet available on Moodle

11
Michaelmas Overview

12
What Topics Changed in 2021?

• Lecture 2: a subsection on byte-pair encoding has been added;
• Lecture 6: named entity recognition now joins part-of-speech tagging
as an example of sequence labelling;
• Lecture 7: treebanking has been moved here to show how
human-labelled corpora can be useful;
• Updated Recommended and Going Further references;
• Python for Computational Linguistics is now integrated into the
course!

13
Undergraduates: marking and examinations in 2021/22

20% Python Lab practicals (assessed in Week 7 of Michaelmas and
Week 7 of Lent)

80% End-of-year written examination

14
Information about Programming

Python is one of the most popular and well-supported programming languages used
for Natural Language Processing. Python for Computational Linguists – self-paced
Jupyter notebooks:

https://github.com/cambridgeltl/python4cl

15
Information about Programming

Undergraduates: the Python Programming for Linguists course is now
a core part of Li18 – self-paced Jupyter notebooks supported by 6
hours of labs per term in Michaelmas and Lent. See Moodle for
how to get started. The first lab is on Tuesday October 12th from 1pm to
3pm. Assessments are in Week 7 of Michaelmas and Week 7 of Lent.

MPhils: the Python Programming for Linguists course is voluntary and
carries no credit, but is recommended. It can be pursued as a self-paced
activity online from home. Again, see Moodle for how to get started.

16
Python for Computational Linguists

• A self-paced course on Python specially designed for Li18 students
• Uses Jupyter Notebooks
• Trialled on a volunteer basis in 2019 and 2020, now an essential part of
the course
• Two sets of modules – one in Michaelmas and one in Lent
• Expect 7 to 12 hours of work per term
• Full instructions on Moodle

17
Summary

1. Course Admin.

2. What is Computational Linguistics?

3. Language Models?

4. Complexity of Language Tasks

18
What is Computational Linguistics?

• A language science that involves building computational models of
languages

• A computational model can refer to any formally (mathematically)
specified model that describes a language phenomenon

19
Computational Linguistics is a Multi-disciplinary Field

20
Computational Linguistics Splits into Two Broad Areas

Natural Language Processing

• Construction of language models for use in computational tasks and
applications, e.g. Machine Translation (MT), Question Answering (QA)
or Conversational Agents
• A branch of computer science (‘natural’ language as opposed to artificial
programming languages)
• Our focus in this course

Computational Cognitive Linguistics

• Construction of language models to further our understanding of the
cognition of language
• Includes computational psycholinguistics and computational neurolinguistics

21
Summary

1. Course Admin.

2. What is Computational Linguistics?

3. Language Models?

4. Complexity of Language Tasks

22
Natural Language Analysis

The models used in NLP are used to automatically analyse language to produce
the possible structures/annotations that you have been taught to think about.

Consider some linguistic ambiguities you have encountered ...

o Morphology
o Syntax
o Semantics
o Pragmatics
o ...

You have been learning to associate structure (or annotation) with linguistic units
and, in cases of ambiguity, to demonstrate that there is more than one possible
structure.

23
Combining Language Models

From a computational perspective language is full of ambiguity. A sentence may
have a high number of possible meanings if it contains a number of different
types of ambiguity.

I made her duck

24
Combining Language Models

I made her duck

1. I cooked waterfowl for her
2. I cooked waterfowl belonging to her
3. I created the (plaster?) duck she owns
4. I caused her to quickly lower her head
5. I turned her into a duck

Several types of ambiguity combine to cause many meanings:

• morphological (her can be a dative pronoun or possessive pronoun and
duck can be a noun or a verb)
• syntactic (make can behave both transitively and ditransitively; make can
select a direct object or a verb)
• semantic (make can mean create, cause, cook, ...)
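
To make the syntactic part of this concrete, here is a minimal sketch that counts the parses of "I made her duck" with a chart (CKY-style) parser. The toy lexicon, the rule set and category names such as NN and SC are assumptions invented for this illustration; they are not taken from J&M or the lecture notes.

```python
from collections import defaultdict

# Hypothetical toy lexicon: "her" is a pronoun (NP) or a possessive (Det);
# "duck" is a noun (N), a bare noun phrase (NP) or a bare verb (VB).
LEXICON = {
    "I": {"NP"},
    "made": {"V"},
    "her": {"NP", "Det"},
    "duck": {"N", "NP", "VB"},
}

# Hypothetical binary rules (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),   # transitive: made [her duck]
    ("VP", "V", "NN"),   # ditransitive: made [her] [duck]
    ("NN", "NP", "NP"),
    ("VP", "V", "SC"),   # verb complement: made [her duck (verb)]
    ("SC", "NP", "VB"),
    ("NP", "Det", "N"),  # possessive noun phrase: her duck
]

def count_parses(words):
    """Return the number of distinct S parses licensed by the toy grammar."""
    n = len(words)
    chart = defaultdict(int)  # (start, end, label) -> number of analyses
    for i, word in enumerate(words):
        for label in LEXICON[word]:
            chart[(i, i + 1, label)] += 1
    for span in range(2, n + 1):
        for start in range(0, n - span + 1):
            end = start + span
            for mid in range(start + 1, end):
                for parent, left, right in BINARY_RULES:
                    chart[(start, end, parent)] += (
                        chart[(start, mid, left)] * chart[(mid, end, right)]
                    )
    return chart[(0, n, "S")]

print(count_parses("I made her duck".split()))
```

Under these assumptions the grammar gives three structures. The five readings listed above arise once the different senses of make are combined with them.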

25
Combining Language Models

What types of ambiguity are we seeing in these examples?

At the party there were young men and women.

My neighbor’s hat was taken by the wind. He tried to catch it.

Thank you for not eating or playing music without earphones.

Doctor testifies in horse suit.

26
Summary

1. Course Admin.

2. What is Computational Linguistics?

3. Language Models?

4. Complexity of Language Tasks

27
Complexity of Language Tasks and Applications

* Thanks to Dan Jurafsky for this visual.

28
Complexity of Language Tasks

Sentiment Analysis – Classifying Product Reviews

"... Julie Delpy is far too good for this movie. She imbues Serafine with spirit,
spunk, and humanity. This isn’t necessarily a good thing, since it prevents us
from relaxing and enjoying AN AMERICAN WEREWOLF IN PARIS as a
completely mindless, campy entertainment experience. Delpy’s injection of class
into an otherwise classless production raises the spectre of what this film could
have been with a better script and a better cast ... She was radiant, charismatic,
and effective ..."

- "a good actor trapped in a bad movie" from Pang et al. (2002).

Pang, B., Lee, L., & Vaithyanathan, S. (2002, July). Thumbs up?: sentiment classification using
machine learning techniques. In Proceedings of the ACL-02 conference on Empirical methods in
natural language processing-Volume 10 (pp. 79-86).

29
Complexity of Language Tasks

What features of the text help to predict the number of stars? Are the features
hard to identify and disambiguate?

Sentiment Analysis – Classifying Product Reviews

"... Julie Delpy is far too good for this movie. She imbues Serafine with spirit,
spunk, and humanity. This isn’t necessarily a good thing, since it prevents us
from relaxing and enjoying AN AMERICAN WEREWOLF IN PARIS as a
completely mindless, campy entertainment experience. Delpy’s injection of class
into an otherwise classless production raises the spectre of what this film could
have been with a better script and a better cast ... She was radiant, charismatic,
and effective ..."
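
One way to explore the question above is to count opinion words. The sketch below is a naive word-counting baseline, not the machine learning method of Pang et al. (2002). The two word lists are hypothetical and hand-picked for this excerpt.

```python
import re

# Hypothetical, hand-picked opinion word lists (assumptions for illustration).
POSITIVE = {"good", "spirit", "humanity", "enjoying", "better",
            "radiant", "charismatic", "effective"}
NEGATIVE = {"mindless", "campy", "classless"}

REVIEW = (
    "Julie Delpy is far too good for this movie. She imbues Serafine with "
    "spirit, spunk, and humanity. This isn't necessarily a good thing, since "
    "it prevents us from relaxing and enjoying AN AMERICAN WEREWOLF IN PARIS "
    "as a completely mindless, campy entertainment experience. Delpy's "
    "injection of class into an otherwise classless production raises the "
    "spectre of what this film could have been with a better script and a "
    "better cast ... She was radiant, charismatic, and effective ..."
)

def count_opinion_words(text):
    """Return (positive, negative) counts of listed words in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positive = sum(token in POSITIVE for token in tokens)
    negative = sum(token in NEGATIVE for token in tokens)
    return positive, negative

positive, negative = count_opinion_words(REVIEW)
print(positive, negative)
```

With these lists, positive words far outnumber negative ones, even though the review describes a bad movie. Word counts alone miss cues such as "far too good for" and "could have been".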

30
Complexity of Language Tasks

Question Answering: Alexa, Amazon’s virtual assistant

32
Complexity of Language Tasks

Information Extraction about rocket launches

33
Adherence to Linguistic Theory

• The field of computational linguistics has not derived directly from
traditional linguistics.

• It is an interdisciplinary subject that is as closely related to mathematics
and computer science as it is to linguistics.

• The extent to which computational models of language draw directly from
linguistic theory is very varied.

34
What Types of Language Models Will We Look At?

• There are many different types of language model and ways of describing
them.

• The choice of the model will depend on the linguistic unit being
described, and often the task to which it is applied.

• In this course we will look at: rule-based models, finite state machines,
grammars (lexical and context-free), statistical models and neural models.
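
As a small illustration of the statistical models item, here is a minimal sketch of a bigram model estimated by relative frequency. The three-sentence corpus and the function names are invented for this example and are not course material.

```python
from collections import Counter

def train_bigram(sentences):
    """Count context words and word pairs, with sentence boundary markers."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs
    return contexts, bigrams

def bigram_prob(contexts, bigrams, prev, word):
    """Relative-frequency estimate of P(word | prev); 0.0 for unseen contexts."""
    if contexts[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / contexts[prev]

# Hypothetical toy corpus.
CORPUS = ["I made her duck", "I made her dinner", "She made her duck"]
contexts, bigrams = train_bigram(CORPUS)
print(bigram_prob(contexts, bigrams, "her", "duck"))
```

Here "her" occurs three times and is followed by "duck" twice, so the estimate for "duck" after "her" is 2/3. A model like this assigns zero probability to any unseen word pair.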

35
Exercises (see Lecture Notes for details)

Post-Lecture Exercise

1. Read J&M (2nd edition), Chapter 1.

Pre-Lecture Exercises

1. Read about Eliza in Weizenbaum (1966) and try it online. Think about the
process that Eliza uses to identify keywords and transform them into
responses. What linguistic knowledge would be necessary to make it more
proficient in its task?
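
As a starting point for this exercise, here is a minimal sketch of keyword matching and response transformation in the spirit of Eliza. The patterns, templates and pronoun table are invented for illustration; they are not Weizenbaum's original script.

```python
import re

# Hypothetical pronoun swaps applied to the text captured after a keyword.
REFLECTIONS = {"i": "you", "my": "your", "me": "you", "am": "are"}

# Hypothetical keyword patterns paired with response templates.
RULES = [
    (re.compile(r"\bi need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE),
     "Tell me more about your {0}."),
]
DEFAULT_RESPONSE = "Please go on."

def reflect(fragment):
    """Swap first-person words for second-person ones, word by word."""
    return " ".join(REFLECTIONS.get(word.lower(), word)
                    for word in fragment.split())

def respond(utterance):
    """Return the response for the first matching keyword pattern."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(reflect(match.group(1)))
    return DEFAULT_RESPONSE

print(respond("I need my coffee."))
```

The sketch has no knowledge of morphology, syntax or meaning. It only copies and swaps surface strings, which is a useful contrast when considering what linguistic knowledge a more proficient system would need.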

36
