NLP | Location Tags Extraction

Last Updated : 26 Feb, 2019

A custom ChunkParserI subclass can be used to identify LOCATION chunks by looking words up in the gazetteers corpus. The gazetteers corpus is a WordListCorpusReader that contains the following location words:

- Country names
- U.S. states and abbreviations
- Mexican states
- Major U.S. cities
- Canadian provinces

The LocationChunker class iterates over a tagged sentence and looks for words that are found in the gazetteers corpus. When it finds one or more location words, it creates a LOCATION chunk using IOB tags. The IOB LOCATION tags are produced by the iob_locations() method, and the parse() method converts those IOB tags to a Tree.

Code #1 : LocationChunker class

```python
from nltk.chunk import ChunkParserI
from nltk.chunk.util import conlltags2tree
from nltk.corpus import gazetteers  # needs nltk.download('gazetteers') once


class LocationChunker(ChunkParserI):
    def __init__(self):
        # All location names from the gazetteers corpus, as a set for fast lookup
        self.locations = set(gazetteers.words())
        # Largest number of spaces found in any single location name
        self.lookahead = 0

        for loc in self.locations:
            nwords = loc.count(' ')

            if nwords > self.lookahead:
                self.lookahead = nwords
```

Code #2 : iob_locations() method

```python
    # These methods continue the body of LocationChunker from Code #1.
    def iob_locations(self, tagged_sent):
        i = 0
        l = len(tagged_sent)
        inside = False  # True while the previous word was part of a location

        while i < l:
            word, tag = tagged_sent[i]
            j = i + 1
            k = j + self.lookahead
            nextwords, nexttags = [], []
            loc = False

            # Grow the candidate phrase one word at a time and test it
            while j < k:
                if ' '.join([word] + nextwords) in self.locations:
                    if inside:
                        yield word, tag, 'I-LOCATION'
                    else:
                        yield word, tag, 'B-LOCATION'

                    for nword, ntag in zip(nextwords, nexttags):
                        yield nword, ntag, 'I-LOCATION'

                    loc, inside = True, True
                    i = j  # resume after the matched phrase
                    break

                if j < l:
                    nextword, nexttag = tagged_sent[j]
                    nextwords.append(nextword)
                    nexttags.append(nexttag)
                    j += 1
                else:
                    break

            # No location starts at this word: tag it as outside any chunk
            if not loc:
                inside = False
                i += 1
                yield word, tag, 'O'

    def parse(self, tagged_sent):
        iobs = self.iob_locations(tagged_sent)
        return conlltags2tree(iobs)
```

Code #3 : use the LocationChunker class to parse the sentence

```python
# `chunkers` is a local module that holds LocationChunker and a sub_leaves helper
from chunkers import sub_leaves
from chunkers import LocationChunker

loc = LocationChunker()

t = loc.parse([('San', 'NNP'), ('Francisco', 'NNP'),
               ('CA', 'NNP'), ('is', 'BE'), ('cold', 'JJ'),
               ('compared', 'VBD'), ('to', 'TO'), ('San', 'NNP'),
               ('Jose', 'NNP'), ('CA', 'NNP')])

print("Location : \n", sub_leaves(t, 'LOCATION'))
```

Output :

```
Location :
[[('San', 'NNP'), ('Francisco', 'NNP'), ('CA', 'NNP')],
 [('San', 'NNP'), ('Jose', 'NNP'), ('CA', 'NNP')]]
```

Author : mohit gupta_omg :)
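The lookahead value computed in the LocationChunker constructor can be illustrated without downloading the corpus. The following is only a sketch: the `locations` set here is a made-up stand-in for `set(gazetteers.words())`, not the real corpus contents. It counts the spaces in each entry and keeps the largest count, which is the same quantity the constructor's loop computes.

```python
# Hypothetical stand-in for set(gazetteers.words()); not the real corpus contents.
locations = {'CA', 'Mexico', 'San Francisco', 'New York City'}

# Same idea as the loop in LocationChunker.__init__:
# the largest number of spaces found in any single entry.
lookahead = max(loc.count(' ') for loc in locations)

print(lookahead)  # 'New York City' contains two spaces
```

A single-word entry such as 'CA' contributes 0, so the value is driven entirely by the multi-word names in the set.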
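The usage example imports `sub_leaves` from a local `chunkers` module that the article does not show. Below is a minimal sketch of what such a helper might look like, assuming it simply collects the leaves of every subtree with a given label. The hand-written IOB triples stand in for the output of iob_locations(), so the snippet runs without the gazetteers corpus.

```python
from nltk.chunk.util import conlltags2tree


def sub_leaves(tree, label):
    """Return the leaves of every subtree of `tree` whose label is `label`.

    Assumed implementation: the article uses this helper but does not define it.
    """
    return [t.leaves() for t in tree.subtrees(lambda s: s.label() == label)]


# Hand-written (word, POS tag, IOB tag) triples, in the shape iob_locations() yields.
iobs = [
    ('San', 'NNP', 'B-LOCATION'),
    ('Francisco', 'NNP', 'I-LOCATION'),
    ('CA', 'NNP', 'I-LOCATION'),
    ('is', 'BE', 'O'),
    ('cold', 'JJ', 'O'),
]

# conlltags2tree groups B-/I- runs into labelled subtrees under a root 'S' node.
tree = conlltags2tree(iobs)
locations = sub_leaves(tree, 'LOCATION')
print(locations)
```

Words tagged 'O' stay as plain leaves of the root, so only the three location words end up in the returned list.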