Named Entity Recognition
INTRODUCTION TO NATURAL LANGUAGE PROCESSING IN PYTHON
Katharine Jarmul
Founder, kjamistan
What is Named Entity Recognition?
NLP task to identify important named entities in the text
People, places, organizations
Dates, states, works of art
... and other categories!
Can be used alongside topic identification
... or on its own!
Who? What? When? Where?
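NER output is often stored as one label per token. A common convention, assumed here rather than taken from the slides, is BIO tagging: B- starts an entity, I- continues it, and O marks tokens outside any entity. The minimal sketch below turns such tags back into (entity text, label) spans. The tokens and tags are a hypothetical hand-labelled example.

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, label) spans."""
    spans = []
    current_tokens, current_label = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_label):
            # A B- tag, or an I- tag whose label differs, starts a new entity
            if current_tokens:
                spans.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-"):
            # Same label as the open entity: extend it
            current_tokens.append(token)
        else:
            # "O": close any open entity
            if current_tokens:
                spans.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None
    if current_tokens:
        spans.append((" ".join(current_tokens), current_label))
    return spans

# Hypothetical hand-labelled example
tokens = ["Ruth", "Reichl", "rides", "the", "Metro", "in", "New", "York"]
tags = ["B-PER", "I-PER", "O", "O", "B-ORG", "O", "B-GPE", "I-GPE"]
print(bio_to_spans(tokens, tags))
# [('Ruth Reichl', 'PER'), ('Metro', 'ORG'), ('New York', 'GPE')]
```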
Example of NER
(Source: Europeana Newspapers (http://[Link]-[Link]))
nltk and the Stanford CoreNLP Library
The Stanford CoreNLP library:
Integrated into Python via nltk
Java-based
Support for NER as well as coreference resolution and dependency trees
Using nltk for Named Entity Recognition
import nltk  # assumes the tokenizer, tagger and NE-chunker data were fetched with nltk.download()
sentence = '''In New York, I like to ride the Metro to
visit MOMA and some restaurants rated
well by Ruth Reichl.'''
tokenized_sent = nltk.word_tokenize(sentence)
tagged_sent = nltk.pos_tag(tokenized_sent)
tagged_sent[:3]
[('In', 'IN'), ('New', 'NNP'), ('York', 'NNP')]
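The tagged tokens already hint at entities, because consecutive proper nouns (NNP) often form a name. The sketch below is a naive baseline that groups runs of NNP/NNPS tokens from a (word, tag) list in the format nltk.pos_tag returns. It is not what ne_chunk does internally, since ne_chunk relies on a trained chunker. The tagged list is hand-written for illustration.

```python
def proper_noun_runs(tagged_tokens):
    """Return runs of consecutive proper-noun tokens as candidate entity strings."""
    runs, current = [], []
    for word, tag in tagged_tokens:
        if tag in ("NNP", "NNPS"):
            current.append(word)          # extend the current proper-noun run
        elif current:
            runs.append(" ".join(current))  # a non-NNP token ends the run
            current = []
    if current:
        runs.append(" ".join(current))
    return runs

# Hand-written (word, tag) pairs in the format returned by nltk.pos_tag
tagged = [("In", "IN"), ("New", "NNP"), ("York", "NNP"), (",", ","),
          ("I", "PRP"), ("ride", "VBP"), ("the", "DT"), ("Metro", "NNP"),
          ("with", "IN"), ("Ruth", "NNP"), ("Reichl", "NNP"), (".", ".")]
print(proper_noun_runs(tagged))  # ['New York', 'Metro', 'Ruth Reichl']
```

Unlike ne_chunk, this heuristic cannot tell a PERSON from a GPE. It only finds candidate spans.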
print(nltk.ne_chunk(tagged_sent))
(S
In/IN
(GPE New/NNP York/NNP)
,/,
I/PRP
like/VBP
to/TO
ride/VB
the/DT
(ORGANIZATION Metro/NNP)
to/TO
visit/VB
(ORGANIZATION MOMA/NNP)
and/CC
some/DT
restaurants/NNS
rated/VBN
well/RB
by/IN
(PERSON Ruth/NNP Reichl/NNP)
./.)
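The printed result is a tree whose children are either plain (word, tag) pairs or labelled subtrees. With nltk you would check each child with isinstance(child, nltk.Tree) and then read child.label() and child.leaves(). The sketch below mimics that structure with plain tuples so the extraction logic runs without nltk's models. This tuple layout is an assumption and is not nltk's own type.

```python
from collections import Counter

def extract_entities(chunked):
    """Collect (label, entity text) pairs from a flat chunk structure.

    Each element is either a (word, tag) tuple for an ordinary token or a
    (label, [(word, tag), ...]) tuple standing in for an nltk subtree.
    """
    entities = []
    for label, payload in chunked:
        if isinstance(payload, list):  # a labelled chunk, like an nltk subtree
            text = " ".join(word for word, _tag in payload)
            entities.append((label, text))
    return entities

# Part of the ne_chunk output above, rewritten as tuples
chunked = [
    ("In", "IN"),
    ("GPE", [("New", "NNP"), ("York", "NNP")]),
    (",", ","), ("I", "PRP"), ("like", "VBP"), ("to", "TO"),
    ("ride", "VB"), ("the", "DT"),
    ("ORGANIZATION", [("Metro", "NNP")]),
    ("to", "TO"), ("visit", "VB"),
    ("ORGANIZATION", [("MOMA", "NNP")]),
    ("by", "IN"),
    ("PERSON", [("Ruth", "NNP"), ("Reichl", "NNP")]),
    (".", "."),
]
entities = extract_entities(chunked)
print(entities)
# [('GPE', 'New York'), ('ORGANIZATION', 'Metro'), ('ORGANIZATION', 'MOMA'), ('PERSON', 'Ruth Reichl')]

# How many entities of each type were found
label_counts = Counter(label for label, _text in entities)
```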
Introduction to spaCy
What is spaCy?
NLP library similar to gensim, with different implementations
Focus on creating NLP pipelines to generate models and corpora
Open-source, with extra libraries and tools
displaCy
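In a pipeline, a text passes through a sequence of components, and each component adds annotations to a shared document object. The toy sketch below illustrates only that idea. The component functions, make_pipeline, and the dict-based document are hypothetical and are not spaCy's API. In spaCy itself, the loaded components can be listed with nlp.pipe_names.

```python
import re

def tokenize(doc):
    """Hypothetical component: split the raw text into word and punctuation tokens."""
    doc["tokens"] = re.findall(r"\w+|[^\w\s]", doc["text"])
    return doc

def tag_capitalized(doc):
    """Hypothetical toy 'entity' component: keep tokens that start with a capital letter."""
    doc["candidates"] = [tok for tok in doc["tokens"] if tok[0].isupper()]
    return doc

def make_pipeline(*components):
    """Compose components into one callable that applies them left to right."""
    def run(text):
        doc = {"text": text}
        for component in components:
            doc = component(doc)  # each component reads and extends the same doc
        return doc
    return run

nlp_toy = make_pipeline(tokenize, tag_capitalized)
doc = nlp_toy("Berlin is the capital of Germany.")
print(doc["tokens"])      # ['Berlin', 'is', 'the', 'capital', 'of', 'Germany', '.']
print(doc["candidates"])  # ['Berlin', 'Germany']
```

Because every component shares the same signature, you can reorder, add, or drop steps without changing the others.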
displaCy entity recognition visualizer
(Source: https://[Link]/displacy-ent/)
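displaCy draws each entity as a highlighted span over the original text. In spaCy this is done with displacy.render(doc, style="ent"). You can sketch the underlying idea with character offsets, because spaCy entity spans expose start_char, end_char and label_. Here those offsets are written by hand so the snippet runs without a model. The [entity](LABEL) markup is a hypothetical stand-in for displaCy's HTML.

```python
def mark_entities(text, spans):
    """Wrap each (start_char, end_char, label) span as [entity](LABEL).

    Assumes the spans do not overlap.
    """
    pieces, cursor = [], 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])               # plain text before the entity
        pieces.append(f"[{text[start:end]}]({label})")  # the highlighted entity
        cursor = end
    pieces.append(text[cursor:])                        # text after the last entity
    return "".join(pieces)

text = "Berlin is the capital of Germany."
spans = [(0, 6, "GPE"), (25, 32, "GPE")]  # hand-written character offsets
print(mark_entities(text, spans))
# [Berlin](GPE) is the capital of [Germany](GPE).
```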
import spacy
nlp = [Link]('en_core_web_sm')
[Link]
<[Link] at 0x7f76b75e68b8>
doc = nlp("""Berlin is the capital of Germany;
and the residence of Chancellor Angela Merkel.""")
[Link]
(Berlin, Germany, Angela Merkel)
print([Link][0], [Link][0].label_)
Berlin GPE
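Each entity in doc.ents is a span with .text and .label_, so a common next step is to group the entity strings by label. The sketch below uses hand-written (text, label) pairs in place of [(ent.text, ent.label_) for ent in doc.ents], so it runs without a downloaded model. Only Berlin's GPE label appears on the slide. The other labels are illustrative assumptions.

```python
from collections import defaultdict

def group_by_label(entities):
    """Map each entity label to the list of entity strings that carry it."""
    grouped = defaultdict(list)
    for text, label in entities:
        grouped[label].append(text)
    return dict(grouped)

# Illustrative (text, label) pairs; only Berlin's GPE label is shown on the slide
entities = [("Berlin", "GPE"), ("Germany", "GPE"), ("Angela Merkel", "PERSON")]
print(group_by_label(entities))
# {'GPE': ['Berlin', 'Germany'], 'PERSON': ['Angela Merkel']}
```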
Why use spaCy for NER?
Easy pipeline creation
Different entity types compared to nltk
Informal language corpora
Easily find entities in Tweets and chat messages
Quickly growing!
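Because the two libraries use different label sets, you usually need to normalize labels before comparing their output. nltk's ne_chunk emits labels such as PERSON, ORGANIZATION, GPE and LOCATION, while spaCy's English models use PERSON, ORG, GPE and LOC. The mapping below is hypothetical and partial, chosen for illustration. The nltk pairs come from the ne_chunk output on the nltk slide. The spaCy pairs are invented for the example.

```python
# Hypothetical partial mapping from nltk ne_chunk labels to spaCy-style labels
NLTK_TO_SPACY = {
    "PERSON": "PERSON",
    "ORGANIZATION": "ORG",
    "GPE": "GPE",
    "LOCATION": "LOC",
    "FACILITY": "FAC",
}

def normalize(entities, mapping=NLTK_TO_SPACY):
    """Rename labels via mapping; labels without an entry are kept unchanged."""
    return [(text, mapping.get(label, label)) for text, label in entities]

def compare(nltk_ents, spacy_ents):
    """Split entities into those found by both systems and by only one."""
    a, b = set(normalize(nltk_ents)), set(spacy_ents)
    return {"both": a & b, "nltk_only": a - b, "spacy_only": b - a}

nltk_ents = [("New York", "GPE"), ("Metro", "ORGANIZATION"), ("Ruth Reichl", "PERSON")]
spacy_ents = [("New York", "GPE"), ("Ruth Reichl", "PERSON"), ("MOMA", "ORG")]  # hypothetical
result = compare(nltk_ents, spacy_ents)
print(sorted(result["both"]))        # [('New York', 'GPE'), ('Ruth Reichl', 'PERSON')]
print(sorted(result["nltk_only"]))   # [('Metro', 'ORG')]
print(sorted(result["spacy_only"]))  # [('MOMA', 'ORG')]
```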
Multilingual NER with polyglot
What is polyglot?
NLP library which uses word vectors
Why polyglot?
Vectors for many different languages
More than 130!
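Word vectors represent each word as a list of numbers, so that words used in similar contexts end up close together. Closeness is usually measured with cosine similarity. The minimal sketch below uses tiny, made-up 3-dimensional vectors. Real polyglot embeddings are learned from data and are much larger. These values are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors, not real polyglot embeddings
vectors = {
    "madrid": [0.9, 0.1, 0.0],
    "barcelona": [0.8, 0.2, 0.1],
    "presidente": [0.1, 0.9, 0.3],
}
sim_city = cosine_similarity(vectors["madrid"], vectors["barcelona"])
sim_other = cosine_similarity(vectors["madrid"], vectors["presidente"])
# With these toy values, the two city names are closer than city vs. "presidente"
print(round(sim_city, 3), round(sim_other, 3))
```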
Spanish NER with polyglot
from [Link] import Text
text = """El presidente de la Generalitat de Cataluña,
Carles Puigdemont, ha afirmado hoy a la alcaldesa
de Madrid, Manuela Carmena, que en su etapa de
alcalde de Girona (de julio de 2011 a enero de 2016)
hizo una gran promoción de Madrid."""
ptext = Text(text)
[Link]
[I-ORG(['Generalitat', 'de']),
I-LOC(['Generalitat', 'de', 'Cataluña']),
I-PER(['Carles', 'Puigdemont']),
I-LOC(['Madrid']),
I-PER(['Manuela', 'Carmena']),
I-LOC(['Girona']),
I-LOC(['Madrid'])]
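Each polyglot entity behaves like a list of tokens with a .tag such as I-PER or I-LOC. The output can contain overlapping or repeated chunks; here, "Generalitat de" appears inside two entities. A typical follow-up is therefore to join the tokens and count mentions. The sketch below copies the printed output into plain (tag, tokens) tuples, so it runs without polyglot and its downloaded models.

```python
from collections import Counter

# The entities printed above, rewritten as (tag, tokens) tuples
entities = [
    ("I-ORG", ["Generalitat", "de"]),
    ("I-LOC", ["Generalitat", "de", "Cataluña"]),
    ("I-PER", ["Carles", "Puigdemont"]),
    ("I-LOC", ["Madrid"]),
    ("I-PER", ["Manuela", "Carmena"]),
    ("I-LOC", ["Girona"]),
    ("I-LOC", ["Madrid"]),
]

# Join each entity's tokens into one string and drop the "I-" prefix from the tag
joined = [(tag[2:], " ".join(tokens)) for tag, tokens in entities]

mention_counts = Counter(text for _label, text in joined)
print(mention_counts["Madrid"])  # 2

# Share of entities that mention a given name (here "Madrid")
share = sum("Madrid" in text for _label, text in joined) / len(joined)
print(round(share, 3))  # 0.286
```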