NLP | Extracting Named Entities
Last Updated : 26 Nov, 2022
Named entity recognition is a specific kind of chunk extraction that uses entity tags along with chunk tags. Common entity tags include PERSON, LOCATION and ORGANIZATION. POS-tagged sentences are parsed into chunk trees as in normal chunking, but the tree labels can be entity tags in place of chunk phrase tags. NLTK ships a pre-trained named entity chunker, which is available through the ne_chunk() function in the nltk.chunk module. This function chunks a single tagged sentence into a Tree.
Code #1 : Using ne_chunk() on a tagged sentence of the treebank_chunk corpus
Python3
from nltk.corpus import treebank_chunk
from nltk.chunk import ne_chunk
ne_chunk(treebank_chunk.tagged_sents()[0])
Output :
Tree('S', [Tree('PERSON', [('Pierre', 'NNP')]), Tree('ORGANIZATION',
[('Vinken', 'NNP')]), (',', ','), ('61', 'CD'), ('years', 'NNS'),
('old', 'JJ'), (',', ','), ('will', 'MD'), ('join', 'VB'), ('the', 'DT'),
('board', 'NN'), ('as', 'IN'), ('a', 'DT'), ('nonexecutive', 'JJ'),
('director', 'NN'), ('Nov.', 'NNP'), ('29', 'CD'), ('.', '.')])
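To see how such a tree is organized without loading the pre-trained model, an abbreviated copy of the output above can be rebuilt by hand. This is only a sketch: the shortened `tree` below is an assumption for illustration, not something ne_chunk() returned. Entity chunks are nested Tree objects, while words outside any entity remain plain (word, tag) tuples.

```python
from nltk.tree import Tree

# Hand-built, abbreviated copy of the tree shown above (illustrative only).
tree = Tree('S', [
    Tree('PERSON', [('Pierre', 'NNP')]),
    Tree('ORGANIZATION', [('Vinken', 'NNP')]),
    ('will', 'MD'),
    ('join', 'VB'),
])

# Entity chunks are Tree children; everything else is a (word, tag) tuple.
entities = [(child.label(), child.leaves())
            for child in tree if isinstance(child, Tree)]
plain_tokens = [child for child in tree if not isinstance(child, Tree)]

print(entities)
print(plain_tokens)
```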
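Two entity tags are found: PERSON and ORGANIZATION. Each of these subtrees contains a list of the words that are recognized as a PERSON or ORGANIZATION. Note that the chunker labels 'Vinken' as an ORGANIZATION even though it is the person's surname, so the pre-trained model is not always accurate.
Code #2 : Method to extract named entities using the leaves of all the subtrees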
Python3
def sub_leaves(tree, label):
    # Return the leaves of every subtree whose label matches `label`.
    return [t.leaves()
            for t in tree.subtrees(
                lambda s: s.label() == label)]
Code #3 : Using the method to get all the PERSON or ORGANIZATION leaves from a tree
Python3
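from nltk.corpus import treebank_chunk
from nltk.chunk import ne_chunk
# Assumes sub_leaves() from Code #2 is saved in a module named chunkers.py
from chunkers import sub_leaves

tree = ne_chunk(treebank_chunk.tagged_sents()[0])
print("Named entities of PERSON : ",
      sub_leaves(tree, 'PERSON'))
print("\nNamed entities of ORGANIZATION : ",
      sub_leaves(tree, 'ORGANIZATION'))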
Output :
Named entities of PERSON : [[('Pierre', 'NNP')]]
Named entities of ORGANIZATION : [[('Vinken', 'NNP')]]
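Instead of calling sub_leaves() once per label, the same traversal can collect every entity label in a single pass. The following is a minimal sketch under stated assumptions: `leaves_by_label` is a hypothetical helper name, not part of NLTK, and the tree is a hand-built, abbreviated copy of the Code #1 output so that the example runs without the pre-trained model.

```python
from collections import defaultdict
from nltk.tree import Tree

def leaves_by_label(tree):
    """Group the leaves of every entity subtree under its label."""
    grouped = defaultdict(list)
    # subtrees() also yields the root itself, so skip it explicitly.
    for sub in tree.subtrees(lambda t: t is not tree):
        grouped[sub.label()].append(sub.leaves())
    return dict(grouped)

# Hand-built, abbreviated tree (illustrative; a real one comes from ne_chunk()).
tree = Tree('S', [
    Tree('PERSON', [('Pierre', 'NNP')]),
    Tree('ORGANIZATION', [('Vinken', 'NNP')]),
    ('will', 'MD'),
    ('join', 'VB'),
])

grouped = leaves_by_label(tree)
print(grouped)
```

Skipping the root by identity rather than by its 'S' label keeps the helper independent of how the root happens to be named.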
To process multiple sentences at a time, ne_chunk_sents() is used. In the code below, the first 10 sentences from treebank_chunk.tagged_sents() are processed to get the ORGANIZATION sub_leaves().
Code #4 : Let's understand ne_chunk_sents()
Python3
from nltk.chunk import ne_chunk_sents
from nltk.corpus import treebank_chunk
# Chunk the first 10 tagged sentences, one Tree per sentence
trees = ne_chunk_sents(treebank_chunk.tagged_sents()[:10])
[sub_leaves(t, 'ORGANIZATION') for t in trees]
Output :
[[[('Vinken', 'NNP')]], [[('Elsevier', 'NNP')]], [[('Consolidated', 'NNP'),
('Gold', 'NNP'), ('Fields', 'NNP')]], [], [], [[('Inc.', 'NNP')],
[('Micronite', 'NN')]], [[('New', 'NNP'), ('England', 'NNP'),
('Journal', 'NNP')]], [[('Lorillard', 'NNP')]], [], []]
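The nested lists above are awkward to read, so a common next step is to join the words of each entity into a single string. The sketch below works on plain Python lists copied from the output above and needs no NLTK import; `entity_strings` is a hypothetical helper name introduced only for illustration.

```python
# Per-sentence ORGANIZATION leaves, copied from the output above.
org_leaves = [
    [[('Vinken', 'NNP')]],
    [[('Elsevier', 'NNP')]],
    [[('Consolidated', 'NNP'), ('Gold', 'NNP'), ('Fields', 'NNP')]],
    [],
    [],
    [[('Inc.', 'NNP')], [('Micronite', 'NN')]],
    [[('New', 'NNP'), ('England', 'NNP'), ('Journal', 'NNP')]],
    [[('Lorillard', 'NNP')]],
    [],
    [],
]

def entity_strings(sentences):
    """Join the words of each entity and flatten across sentences."""
    return [' '.join(word for word, _tag in entity)
            for sentence in sentences
            for entity in sentence]

names = entity_strings(org_leaves)
print(names)
# ['Vinken', 'Elsevier', 'Consolidated Gold Fields', 'Inc.', 'Micronite',
#  'New England Journal', 'Lorillard']
```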