Named_Entity_Recognition(LAbsheet-07).ipynb (20221CSE0413)- Colab

The document outlines the installation and usage of the spaCy library for natural language processing in Python. It demonstrates how to extract named entities from text, including examples of entities like organizations, dates, and monetary values. Additionally, it includes a function to find unique named entities from a given document.

20221CSE0413-SHARATH PATIL

!pip install spacy

Requirement already satisfied: spacy in /usr/local/lib/python3.11/dist-packages (3.8.5)
Requirement already satisfied: spacy-legacy<3.1.0,>=3.0.11 in /usr/local/lib/python3.11/dist-packages (from spacy) (3.0.12)
Requirement already satisfied: spacy-loggers<2.0.0,>=1.0.0 in /usr/local/lib/python3.11/dist-packages (from spacy) (1.0.5)
Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /usr/local/lib/python3.11/dist-packages (from spacy) (1.0.12)
Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /usr/local/lib/python3.11/dist-packages (from spacy) (2.0.11)
Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /usr/local/lib/python3.11/dist-packages (from spacy) (3.0.9)
Requirement already satisfied: thinc<8.4.0,>=8.3.4 in /usr/local/lib/python3.11/dist-packages (from spacy) (8.3.4)
Requirement already satisfied: wasabi<1.2.0,>=0.9.1 in /usr/local/lib/python3.11/dist-packages (from spacy) (1.1.3)
Requirement already satisfied: srsly<3.0.0,>=2.4.3 in /usr/local/lib/python3.11/dist-packages (from spacy) (2.5.1)
Requirement already satisfied: catalogue<2.1.0,>=2.0.6 in /usr/local/lib/python3.11/dist-packages (from spacy) (2.0.10)
Requirement already satisfied: weasel<0.5.0,>=0.1.0 in /usr/local/lib/python3.11/dist-packages (from spacy) (0.4.1)
Requirement already satisfied: typer<1.0.0,>=0.3.0 in /usr/local/lib/python3.11/dist-packages (from spacy) (0.15.2)
Requirement already satisfied: tqdm<5.0.0,>=4.38.0 in /usr/local/lib/python3.11/dist-packages (from spacy) (4.67.1)
Requirement already satisfied: numpy>=1.19.0 in /usr/local/lib/python3.11/dist-packages (from spacy) (2.0.2)
Requirement already satisfied: requests<3.0.0,>=2.13.0 in /usr/local/lib/python3.11/dist-packages (from spacy) (2.32.3)
Requirement already satisfied: pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4 in /usr/local/lib/python3.11/dist-packages (from spacy) (2.11.2
Requirement already satisfied: jinja2 in /usr/local/lib/python3.11/dist-packages (from spacy) (3.1.6)
Requirement already satisfied: setuptools in /usr/local/lib/python3.11/dist-packages (from spacy) (75.2.0)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.11/dist-packages (from spacy) (24.2)
Requirement already satisfied: langcodes<4.0.0,>=3.2.0 in /usr/local/lib/python3.11/dist-packages (from spacy) (3.5.0)
Requirement already satisfied: language-data>=1.2 in /usr/local/lib/python3.11/dist-packages (from langcodes<4.0.0,>=3.2.0->spacy)
Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.11/dist-packages (from pydantic!=1.8,!=1.8.1,<3.0.0,
Requirement already satisfied: pydantic-core==2.33.1 in /usr/local/lib/python3.11/dist-packages (from pydantic!=1.8,!=1.8.1,<3.0.0,>
Requirement already satisfied: typing-extensions>=4.12.2 in /usr/local/lib/python3.11/dist-packages (from pydantic!=1.8,!=1.8.1,<3.0
Requirement already satisfied: typing-inspection>=0.4.0 in /usr/local/lib/python3.11/dist-packages (from pydantic!=1.8,!=1.8.1,<3.0
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.11/dist-packages (from requests<3.0.0,>=2.13.0->sp
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.11/dist-packages (from requests<3.0.0,>=2.13.0->spacy) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.11/dist-packages (from requests<3.0.0,>=2.13.0->spacy)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.11/dist-packages (from requests<3.0.0,>=2.13.0->spacy)
Requirement already satisfied: blis<1.3.0,>=1.2.0 in /usr/local/lib/python3.11/dist-packages (from thinc<8.4.0,>=8.3.4->spacy) (1.2
Requirement already satisfied: confection<1.0.0,>=0.0.1 in /usr/local/lib/python3.11/dist-packages (from thinc<8.4.0,>=8.3.4->spacy
Requirement already satisfied: click>=8.0.0 in /usr/local/lib/python3.11/dist-packages (from typer<1.0.0,>=0.3.0->spacy) (8.1.8)
Requirement already satisfied: shellingham>=1.3.0 in /usr/local/lib/python3.11/dist-packages (from typer<1.0.0,>=0.3.0->spacy) (1.5
Requirement already satisfied: rich>=10.11.0 in /usr/local/lib/python3.11/dist-packages (from typer<1.0.0,>=0.3.0->spacy) (13.9.4)
Requirement already satisfied: cloudpathlib<1.0.0,>=0.7.0 in /usr/local/lib/python3.11/dist-packages (from weasel<0.5.0,>=0.1.0->spa
Requirement already satisfied: smart-open<8.0.0,>=5.2.1 in /usr/local/lib/python3.11/dist-packages (from weasel<0.5.0,>=0.1.0->spacy
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.11/dist-packages (from jinja2->spacy) (3.0.2)
Requirement already satisfied: marisa-trie>=1.1.0 in /usr/local/lib/python3.11/dist-packages (from language-data>=1.2->langcodes<4.0
Requirement already satisfied: markdown-it-py>=2.2.0 in /usr/local/lib/python3.11/dist-packages (from rich>=10.11.0->typer<1.0.0,>=0
Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /usr/local/lib/python3.11/dist-packages (from rich>=10.11.0->typer<1.0.0,>
Requirement already satisfied: wrapt in /usr/local/lib/python3.11/dist-packages (from smart-open<8.0.0,>=5.2.1->weasel<0.5.0,>=0.1.0
Requirement already satisfied: mdurl~=0.1 in /usr/local/lib/python3.11/dist-packages (from markdown-it-py>=2.2.0->rich>=10.11.0->typ

 

import spacy
#from spacy import displacy
from collections import Counter
nlp = spacy.load("en_core_web_sm")

sample = "European authorities fined Google a record $5.1 billion on Wednesday for abusing its power in the mobile phone market and order
doc = nlp(sample)

for entity in doc.ents:
    print(entity.text + ' - ' + entity.label_ + ' - ' + str(spacy.explain(entity.label_)))

European - NORP - Nationalities or religious or political groups
Google - ORG - Companies, agencies, institutions, etc.
$5.1 billion - MONEY - Monetary values, including unit
Wednesday - DATE - Absolute or relative dates or periods
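
The notebook imports Counter, but the visible cells never use it. One plausible use, offered here as an assumption rather than the author's method, is tallying how often each entity label occurs. To keep the sketch runnable without a spaCy model, it works on (text, label) pairs copied from the output above; the helper name count_labels is hypothetical.

```python
from collections import Counter

# (text, label) pairs copied from the output above. With spaCy loaded they
# would come from: [(ent.text, ent.label_) for ent in doc.ents]
pairs = [
    ("European", "NORP"),
    ("Google", "ORG"),
    ("$5.1 billion", "MONEY"),
    ("Wednesday", "DATE"),
]

def count_labels(pairs):  # hypothetical helper, not part of the notebook
    """Return a Counter mapping each entity label to its frequency."""
    return Counter(label for _, label in pairs)

label_counts = count_labels(pairs)
for label, n in label_counts.most_common():
    print(f"{label}: {n}")
```

On a longer document, most_common() puts the dominant labels first, which gives a quick view of what the model is finding.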

def getTextFromFile(fileName):
    # Read the whole file as UTF-8; the text contains non-ASCII characters.
    with open(fileName, 'r', encoding='utf-8') as fp:
        return fp.read()

fileName = "File17.txt"
text = getTextFromFile(fileName)
#print(len(text.split(" ")))

doc = nlp(text)
for entity in doc.ents:
    print(entity.text + " - " + entity.label_ + " - " + str(spacy.explain(entity.label_)))

the Oxford English Dictionary - ORG - Companies, agencies, institutions, etc.
2009 - DATE - Absolute or relative dates or periods
India - GPE - Countries, cities, states
the Classical Latin India - ORG - Companies, agencies, institutions, etc.
South Asia - LOC - Non-GPE locations, mountain ranges, bodies of water
India - GPE - Countries, cities, states
Hellenistic Greek India - ORG - Companies, agencies, institutions, etc.
Ancient Greek Indos - ORG - Companies, agencies, institutions, etc.
Ἰνδός - PERSON - People, including fictional
Persian Hindush - PERSON - People, including fictional
the Achaemenid Empire - GPE - Countries, cities, states
the Sanskrit Sindhu - ORG - Companies, agencies, institutions, etc.
the Indus River - LOC - Non-GPE locations, mountain ranges, bodies of water
The Ancient Greeks - WORK_OF_ART - Titles of books, songs, etc.
Indians - NORP - Nationalities or religious or political groups
Indoi - NORP - Nationalities or religious or political groups
Bharat - NORP - Nationalities or religious or political groups
Bhārat - NORP - Nationalities or religious or political groups
Indian - NORP - Nationalities or religious or political groups
the Constitution of India,[75][76 - LAW - Named documents made into laws.
Indian - NORP - Nationalities or religious or political groups
Bharatavarsha - GPE - Countries, cities, states
North India,[77][78 - GPE - Countries, cities, states
Bharat - ORG - Companies, agencies, institutions, etc.
the mid-19th century - DATE - Absolute or relative dates or periods
Middle Persian - NORP - Nationalities or religious or political groups
India - GPE - Countries, cities, states
13th - ORDINAL - "first", "second", etc.
the Mughal Empire - ORG - Companies, agencies, institutions, etc.
Hindustan - ORG - Companies, agencies, institutions, etc.
Indian - NORP - Nationalities or religious or political groups
India - GPE - Countries, cities, states
Pakistan - GPE - Countries, cities, states
India - GPE - Countries, cities, states
2000–500 - CARDINAL - Numerals that do not fall under another type
BCE - ORG - Companies, agencies, institutions, etc.
Chalcolithic - PERSON - People, including fictional
Hinduism,[88 - NORP - Nationalities or religious or political groups
Vedic - PRODUCT - Objects, vehicles, foods, etc. (not services)
Punjab - NORP - Nationalities or religious or political groups
Gangetic - NORP - Nationalities or religious or political groups
Indo-Aryan - ORG - Companies, agencies, institutions, etc.
the Deccan Plateau - LOC - Non-GPE locations, mountain ranges, bodies of water
South India - LOC - Non-GPE locations, mountain ranges, bodies of water
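
Several entities above carry leftover citation markers, for example "the Constitution of India,[75][76" and "Hinduism,[88", which suggests that File17.txt contains bracketed reference numbers. A minimal sketch, assuming the markers always take the form [digits], strips them before the text reaches nlp. The helper name strip_citation_markers and the sample string are hypothetical; the actual contents of File17.txt are not shown.

```python
import re

# Assumption: citation markers are square brackets containing only digits, e.g. [75].
CITATION_RE = re.compile(r"\[\d+\]")

def strip_citation_markers(text):  # hypothetical helper, not part of the notebook
    """Remove bracketed numeric citation markers such as [75] from text."""
    return CITATION_RE.sub("", text)

# Illustrative input only, not taken from File17.txt.
raw = "the Constitution of India,[75][76] is one example."
print(strip_citation_markers(raw))
```

The cleaned string would then be passed to nlp in place of text. Whether this changes entity boundaries or labels depends on the model, so the counts may differ after cleaning.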

print(len(doc.ents))

44

def findOutUniqueNamedEntities(doc):
    entities = set()
    for entity in doc.ents:
        element = entity.text.upper() + " - " + entity.label_
        entities.add(element)
    return entities

print(len(findOutUniqueNamedEntities(doc)))

38
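
The drop from 44 to 38 comes from entities that share both upper-cased text and label, such as the repeated "India - GPE" and "Indian - NORP" lines in the entity output. The sketch below applies the same keying to a few (text, label) pairs copied from that output, so it runs without a spaCy model; unique_entity_keys is a hypothetical stand-in for findOutUniqueNamedEntities.

```python
# A few (text, label) pairs copied from the entity output.
pairs = [
    ("India", "GPE"),
    ("India", "GPE"),     # exact repeat: collapses into one key
    ("Indian", "NORP"),
    ("Indian", "NORP"),   # exact repeat: collapses into one key
    ("Bharat", "NORP"),
    ("Bharat", "ORG"),    # same text, different label: stays distinct
    ("Bhārat", "NORP"),   # different spelling: stays distinct
]

def unique_entity_keys(pairs):  # hypothetical stand-in for the notebook function
    """Build the same 'TEXT - LABEL' keys and keep one of each."""
    return {text.upper() + " - " + label for text, label in pairs}

keys = unique_entity_keys(pairs)
print(len(pairs), "->", len(keys))
```

In the full output, "India - GPE" appears five times and "Indian - NORP" three times, which accounts for the six entries removed.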
