Natural Language Processing (NLP)

Natural Language Processing, or NLP, refers to the branch of Artificial Intelligence that gives machines the ability to read, understand, and derive meaning from human languages.

NLP combines the fields of linguistics and computer science to decipher language structure and guidelines, and to build models which can comprehend, break down, and separate significant details from text and speech.
Components of NLP:

1. NLU (Natural Language Understanding): NLU helps the machine to understand and analyse human language by extracting metadata from content, such as concepts, entities, keywords, emotion, relations, and semantic roles. NLU is the process of reading and interpreting language.

2. NLG (Natural Language Generation): NLG acts as a translator that converts computerized data into a natural-language representation. It involves text planning, sentence planning, and text realization. NLG is the process of writing or generating language.
Applications of NLP:

1. Question Answering        Ex: Alexa
2. Spam Detection            Ex: spam mail detection
3. Sentiment Analysis        Ex: "Delicious food" (+ve), "Unhappy with order" (-ve)
4. Machine Translation       Ex: Google Translate
5. Spelling Correction       Ex: Grammarly
6. Speech Recognition
7. Chatbot                   Ex: customer support
8. Information Extraction    Ex: resume ATS
NLP Pipeline

Steps involved:

1. Sentence Segmentation: sentence segmentation is used to break a paragraph into separate sentences.
   Ex: "A boy is playing cricket. Match started at 10 AM."
   -> Sentence 1: "A boy is playing cricket."
   -> Sentence 2: "Match started at 10 AM."

2. Word Tokenization: word tokenization is used to break a sentence into separate words, or tokens.
   Ex: "The stars are twinkling at night"
   The tokenizer generates the following result:
   "The", "stars", "are", "twinkling", "at", "night"
   Each word is called a token.
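As a quick illustration, here is a minimal sketch of both steps using NLTK (assuming the nltk package and its tokenizer data are installed; everything beyond the NLTK function names is illustrative):

    import nltk
    from nltk.tokenize import sent_tokenize, word_tokenize

    nltk.download("punkt")  # tokenizer models (newer NLTK may also need "punkt_tab")

    paragraph = "A boy is playing cricket. Match started at 10 AM."

    # Step 1: sentence segmentation
    print(sent_tokenize(paragraph))
    # ['A boy is playing cricket.', 'Match started at 10 AM.']

    # Step 2: word tokenization
    print(word_tokenize("The stars are twinkling at night"))
    # ['The', 'stars', 'are', 'twinkling', 'at', 'night']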

3. Removing Stop Words: in English, there are a lot of words that appear very frequently, like "is", "and", "the", and "a". NLP pipelines will flag these words as stop words. Stop words might be filtered out before doing any statistical analysis.
   Ex: "The stars are twinkling at night" -> "stars", "twinkling", "night"
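A minimal stop-word filtering sketch, again assuming NLTK and its stopwords corpus:

    import nltk
    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize

    nltk.download("stopwords")
    nltk.download("punkt")

    stop_words = set(stopwords.words("english"))
    tokens = word_tokenize("The stars are twinkling at night")

    # keep only the words that are not stop words
    filtered = [w for w in tokens if w.lower() not in stop_words]
    print(filtered)  # ['stars', 'twinkling', 'night']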
4. Stemming: stemming is used to normalize words into their base form or root form. For example, "celebrates", "celebrated" and "celebrating" are all reduced to a single root word, "celebrate". The big problem with stemming is that sometimes it produces a root word which may not have any meaning.
   Ex: intelligence  -> intelligen
       intelligent   -> intelligen
       intelligently -> intelligen
   Ex: skipping -> skip, skipped -> skip
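A small sketch with NLTK's PorterStemmer (assuming nltk is installed); note how the stems need not be real words:

    from nltk.stem import PorterStemmer

    stemmer = PorterStemmer()
    for word in ["celebrates", "celebrated", "celebrating",
                 "intelligence", "intelligent", "intelligently"]:
        print(word, "->", stemmer.stem(word))
    # all three "celebrat..." forms map to one stem, and the
    # "intellig..." stems are not real English words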

5. Lemmatization: lemmatization is quite similar to stemming. It is used to group the different inflected forms of a word, called the lemma. The main difference between stemming and lemmatization is that lemmatization produces a root word which has a meaning.
   Ex: intelligence  -> intelligent
       intelligent   -> intelligent
       intelligently -> intelligent
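A minimal sketch with NLTK's WordNetLemmatizer (assumes the wordnet corpus is downloaded; the pos argument tells WordNet which part of speech to lemmatize as):

    import nltk
    from nltk.stem import WordNetLemmatizer

    nltk.download("wordnet")

    lemmatizer = WordNetLemmatizer()
    # pos selects the part of speech: 'v' = verb, 'n' = noun
    print(lemmatizer.lemmatize("celebrating", pos="v"))  # celebrate
    print(lemmatizer.lemmatize("feet", pos="n"))         # foot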

6. Dependency Parsing: dependency parsing is used to find how all the words in the sentence are related to each other.
7. Part of Speech (POS) Tagging: now, we must explain the concept of nouns, verbs, articles, and other parts of speech to the machine by adding these tags to our words.

   Determiner | Noun  | Verb | Adjective | Preposition | Noun
   The        | stars | are  | twinkling | at          | night
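A minimal POS-tagging sketch with NLTK (its tagger uses Penn Treebank tags such as DT, NNS, VBG, rather than the coarse labels above):

    import nltk
    from nltk import pos_tag, word_tokenize

    nltk.download("punkt")
    nltk.download("averaged_perceptron_tagger")

    print(pos_tag(word_tokenize("The stars are twinkling at night")))
    # e.g. [('The', 'DT'), ('stars', 'NNS'), ('are', 'VBP'),
    #       ('twinkling', 'VBG'), ('at', 'IN'), ('night', 'NN')]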


8. Named Entity Recognition (NER): named entity recognition is the process of detecting named entities such as a person name, movie name, organization name, or location.
   Ex: "Steve Jobs introduced iPhone."
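A minimal NER sketch using spaCy, assuming the en_core_web_sm model has been downloaded; which entities are found depends on the model:

    # assumes: pip install spacy && python -m spacy download en_core_web_sm
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Steve Jobs introduced iPhone")

    for ent in doc.ents:
        print(ent.text, "->", ent.label_)
    # e.g. Steve Jobs -> PERSON (whether iPhone is tagged depends on the model)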
9. Chunking: chunking is used to collect the individual pieces of information and group them into bigger pieces (phrases or sentences).
Pipeline summary:
Dataset -> text (sentences, words) -> Preprocessing 1 (tokenization, lowering the case, stop words) -> Preprocessing 2 (stemming, lemmatization) -> Vectors (Bag of Words, TF-IDF, Word2Vec)
Basic Terminologies used in NLP:
- Corpus     -> paragraph
- Documents  -> sentences
- Vocabulary -> unique words
- Word       -> word
Bag of Words (BoW):

The Bag of Words (BoW) model is a representation that turns arbitrary text into fixed-length vectors by counting how many times each word appears. This process is often referred to as vectorization.

Bag of Words works on two things:
1. A known word vocabulary.
2. A measure of how many known words are present.

The model does not consider the order or structure of the words, or the information present in it; all of that is discarded. The model only deals with whether known words occur in the document, without even considering where in the document they occur.

Steps:

1. Data Collection: consider 3 lines of text, each treated as a separate document, which need to be vectorized:
   - "the dog sat"
   - "the dog sat in the hat"
   - "the dog with the hat"

2. Determine the Vocabulary: the vocabulary is defined as the set of all the words found in the documents. The words in the documents above are: the, dog, sat, in, hat, with.

3. Counting: the vectorization process involves counting the number of times each word appears.

   Document               | the | dog | sat | in | hat | with
   the dog sat            |  1  |  1  |  1  |  0 |  0  |  0
   the dog sat in the hat |  2  |  1  |  1  |  1 |  1  |  0
   the dog with the hat   |  2  |  1  |  0  |  0 |  1  |  1

   This generates a 6-length vector for each document. As you can see, the BoW vector only contains information about what words occur and how many times, without any contextual information about where they occur.
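The same counting can be reproduced with scikit-learn's CountVectorizer (assuming scikit-learn is installed; note it sorts the vocabulary alphabetically):

    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["the dog sat",
            "the dog sat in the hat",
            "the dog with the hat"]

    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(docs)   # sparse count matrix

    print(vectorizer.get_feature_names_out())  # ['dog' 'hat' 'in' 'sat' 'the' 'with']
    print(X.toarray())
    # [[1 0 0 1 1 0]
    #  [1 1 1 1 2 0]
    #  [1 1 0 0 2 1]]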

4. Managing Vocabulary: as we can see from the previous example, as the vocabulary grows, the vector representation of the documents also grows. This means that for very large documents, like books, the vector length can stretch up to thousands of positions. Since each document may contain only a few of the known words, this creates a lot of empty slots with zeros, called a sparse vector.

   We use data-cleaning methods to reduce the size of the vocabulary. This includes ignoring case and punctuation, fixing misspelt words, and ignoring stop words.

5. Scoring Words: scoring the words is simply attaching a numerical value to mark the occurrence of the words. In the simplest case, scoring is binary: the presence or absence of words. Other scoring methods include:
   - Counts: count every time the word appears in the document.
   - Frequencies: calculate the frequency of the word in a document in contrast to the total number of words in the document.

Advantages:
1. Simple and intuitive.

Disadvantages:
1. Sparsity.
2. Ordering of the words is lost.
3. Semantic meaning is not captured.
N-grams:

An N-gram is a sequence of N words, used in language modeling in NLP.

- Unigram (one-gram): a one-word sequence.
  Ex: "This is a sentence" -> "This", "is", "a", "sentence"
- Bigram (two-gram): a two-word sequence.
  Ex: "This is a sentence" -> "This is", "is a", "a sentence"
- Trigram (three-gram): a three-word sequence.
  Ex: "This is a sentence" -> "This is a", "is a sentence"

We can calculate N-grams for any N in the same way.

Applications: speech recognition, machine translation, etc.
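A tiny sketch of generating word n-grams in plain Python (the ngrams helper here is illustrative, not a library function):

    # generate word n-grams from a sentence
    def ngrams(text, n):
        words = text.split()
        return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

    sentence = "This is a sentence"
    print(ngrams(sentence, 1))  # ['This', 'is', 'a', 'sentence']
    print(ngrams(sentence, 2))  # ['This is', 'is a', 'a sentence']
    print(ngrams(sentence, 3))  # ['This is a', 'is a sentence']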
TF-IDF:

TF-IDF stands for Term Frequency-Inverse Document Frequency, and the tf-idf weight is a weight often used in information retrieval and text mining. This weight is a statistical measure used to evaluate how important a word is to a document in a collection or corpus. The importance increases proportionally to the number of times a word appears in the document, but is offset by the frequency of the word in the corpus.

Variations of the tf-idf weighting scheme are often used by search engines as a central tool in scoring and ranking a document's relevance given a user query. Tf-idf can also be successfully used for stop-word filtering in various subject fields, including text summarization and classification.

TF (Term Frequency): measures how frequently a term occurs in a document.

    TF(t) = (Number of times term t appears in the document) / (Total number of terms in the document)

IDF (Inverse Document Frequency): measures how important a term is. While computing TF, all terms are considered equally important. However, it is known that certain terms, such as "is", "of", and "that", may appear a lot of times but have little importance. Thus we need to weigh down the frequent terms while scaling up the rare ones.

    IDF(t) = log(Total number of documents / Number of documents with term t in it)
Example: let's consider 3 sentences (documents):
   Sent 1: "good boy"
   Sent 2: "good girl"
   Sent 3: "boy girl good"

Here "boy" and "girl" should be given more importance (weight) than "good", since "good" is frequent in the corpus.

Find the vocabulary in the sentences: good, boy, girl.

TF:
   word | Sent1 | Sent2 | Sent3
   good |  1/2  |  1/2  |  1/3
   boy  |  1/2  |   0   |  1/3
   girl |   0   |  1/2  |  1/3

IDF:
   word | IDF
   good | log(3/3) = 0
   boy  | log(3/2)
   girl | log(3/2)

Now multiply TF and IDF to obtain the TF-IDF weight:

          | good | boy            | girl
   Sent 1 |  0   | (1/2)·log(3/2) | 0
   Sent 2 |  0   | 0              | (1/2)·log(3/2)
   Sent 3 |  0   | (1/3)·log(3/2) | (1/3)·log(3/2)
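The same idea with scikit-learn's TfidfVectorizer (a sketch; sklearn uses a smoothed IDF formula, so the exact numbers differ from the hand computation above, but the intuition is the same):

    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = ["good boy", "good girl", "boy girl good"]

    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(docs)

    print(vectorizer.get_feature_names_out())  # ['boy' 'girl' 'good']
    print(X.toarray().round(2))                # rarer words get larger weights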
Advantages:
1. Intuitive.
2. Word importance is captured.

Disadvantages:
1. Sparsity.
2. Out-of-vocabulary words are not handled.
Word Embeddings

Word embeddings are a type of word representation that allows words with similar meanings to have a similar representation. Embeddings translate large sparse vectors into a lower-dimensional space that preserves semantic relationships.

Word2Vec:

Word2vec uses a neural network model to learn word associations from a large corpus of text. Once trained, such a model can detect synonymous words or suggest additional words for a partial sentence. As the name implies, word2vec converts words into vectors. The vectors are chosen carefully such that they capture the semantic and syntactic qualities of words; a simple mathematical function (cosine similarity) can indicate the level of semantic similarity between the words represented by those vectors.
Cosine Similarity:

Let's consider 2 points (vectors) and find the cosine of the angle θ between them:

    cos θ = (A · B) / (||A|| ||B||)

-> If θ = 90°, the points are dissimilar (cosine = 0).
-> If θ = 0°, the points are highly similar (cosine = 1).
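A minimal cosine-similarity sketch in NumPy (the cosine_similarity helper is illustrative):

    import numpy as np

    def cosine_similarity(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    a = np.array([1.0, 2.0, 3.0])
    b = np.array([2.0, 4.0, 6.0])   # same direction as a

    print(cosine_similarity(a, b))                            # 1.0 (angle = 0)
    print(cosine_similarity(a, np.array([-2.0, 1.0, 0.0])))   # 0.0 (orthogonal)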


Word2vec can utilize either of two model architectures to produce these distributed representations:
1. Continuous Bag-of-Words (CBOW)
2. Skip-gram

1. Continuous Bag-of-Words (CBOW):

This method takes the context of each word as the input and tries to predict the word corresponding to the context.
Example: let's consider the sentence "word2vec has a deep learning model working in the backend".

-> Let's consider a context window of size 2. We'll have pairs like:

    Context (input)        Target (output)
    [word2vec, a]          has
    [has, deep]            a
    [a, learning]          deep
    [deep, model]          learning
    [learning, working]    model

   Considering the words in the window range as inputs and the centre word as the output, we end up creating such pairs for every word in the sentence.

-> We will feed these input and output data into a deep learning model (neural network) to predict the target word based on the context words. As the input to the hidden layer of the neural network, we use one-hot (bag-of-words) encoding to convert each word into a vector:
deep"wod ’ to vet ’0001 D00 00Ot


leorning o0 00100066

hdden (ayer
byeoDl

outpet

windnwsi2e

-> First, the context words are passed as input to an embedding layer (initialized with some random weights).
-> The word embeddings are then passed to a lambda layer, where we average out the word embeddings.
-> We then pass these averaged embeddings to a dense softmax layer that predicts our target word. We match this with our target word, compute the loss, and then perform backpropagation in each epoch to update the embedding layer in the process.
-> We can extract the embeddings of the needed words from our embedding layer, once the training is completed.
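In practice, libraries such as gensim implement this training loop. A minimal sketch (assuming gensim is installed; real training needs a much larger corpus than this toy sentence, so the resulting vectors here are not meaningful):

    from gensim.models import Word2Vec

    sentences = [["word2vec", "has", "a", "deep", "learning",
                  "model", "working", "in", "the", "backend"]]

    # sg=0 selects the CBOW architecture; window=2 matches the example above
    model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)

    vector = model.wv["deep"]          # the learned embedding for "deep"
    print(vector.shape)                # (50,)
    print(model.wv.most_similar("deep", topn=3))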
2. Skip-gram:

Skip-gram is just the reverse process of CBOW: the model is given a target (centre) word as input, and the context words are predicted.

-> From the previous example:

    Input       Output
    has         [word2vec, a]
    learning    [deep, model]

-> Since the skip-gram model has to predict multiple words from a single word, this is done by creating positive input samples and negative input samples.
-> Positive input samples will have training data in the form (target, context, 1), where target is the centre word, context represents the context words, and label 1 denotes a relevant pair.
-> Negative input samples will have training data in the same form, with label 0 denoting an irrelevant pair.
-> We can use deep neural networks for training the hidden layer of the model in word2vec.
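A toy sketch of building such (target, context, label) pairs with crude random negative sampling; everything here is illustrative plain Python, not a library API:

    import random

    sentence = ["word2vec", "has", "a", "deep", "learning",
                "model", "working", "in", "the", "backend"]
    vocab = list(set(sentence))
    window = 2

    pairs = []  # (target, context, label)
    for i, target in enumerate(sentence):
        for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
            if i == j:
                continue
            pairs.append((target, sentence[j], 1))   # positive (relevant) pair
            negative = random.choice(vocab)          # crude negative sample
            if negative != sentence[j]:
                pairs.append((target, negative, 0))  # negative (irrelevant) pair

    print(pairs[:4])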

& Rut RNN 1s One of the best use Case foy


NLP model Also there are Sorne (hottations in RNN.
&HereComéS LsTM RNN to overcome thiKttatons
and helps the model to remermber thu previous
LDo1ds ohich helps to preohct the fu ture outpt
bosed on the Contot of previous so
Wotd

RNN, LSTM, both are Cove red in Deep learnin9.


please re fer there
*After Completing LSTH in deep-learing ,due to
Some daLobacks we, ane movng here atht
-directl onal LsTH.
* LSTM; weve failing whern to trys to get
Context fror the future woc.
EX Ravi likes to eat, n Hyberabad.
Here the blank Should be filled bastd on future
oord Context (Hyberabod ).So, Biryoni is famous
in bad· LSTH Carnlt pred|t this types.
Recurrent Neural Network (RNN)

An RNN works on the principle of saving the output of a particular layer and feeding it back to the input, in order to predict the output of the layer.

[Diagram: a fully connected recurrent neural network unrolled over time, with inputs x(t-1), x(t), x(t+1), hidden states h(t-1), h(t), h(t+1), and outputs y(t-1), y(t), y(t+1).]

Here 'x' is the input layer, 'h' is the hidden layer, and 'y' is the output layer. A, B, and C are the parameters of the network, used to improve the output of the model. At any given time t, the current input is a combination of the input x(t) and the previous state h(t-1). The output at any given time is fetched back to the network to improve the output.

    h(t) = f_C(h(t-1), x(t))

    where h(t)   -> new state
          h(t-1) -> old state
          x(t)   -> input vector at time t
          f_C    -> function with parameter C
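A single vanilla-RNN time step in NumPy, as a sketch of the recurrence h(t) = f_C(h(t-1), x(t)) with tanh as the usual choice of f (the weights here are random, not trained):

    import numpy as np

    rng = np.random.default_rng(0)
    hidden_size, input_size = 4, 3

    W_hh = rng.normal(size=(hidden_size, hidden_size))  # recurrent weights
    W_xh = rng.normal(size=(hidden_size, input_size))   # input weights
    b = np.zeros(hidden_size)

    def rnn_step(h_prev, x_t):
        # the new state combines the old state and the current input
        return np.tanh(W_hh @ h_prev + W_xh @ x_t + b)

    h = np.zeros(hidden_size)
    for x_t in rng.normal(size=(5, input_size)):  # a sequence of 5 inputs
        h = rnn_step(h, x_t)
    print(h)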
RNNs were created because there were a few issues in the feed-forward neural network (ANN):
-> Cannot handle sequential data.
-> Considers only the current input.
-> Cannot memorize previous inputs.

The solution to these issues is the RNN, which can handle sequential data, accepting the current input data and previously received inputs, and can also memorize previous inputs due to its internal memory.
* How does an RNN work?

In an RNN, the information cycles through a loop to the middle hidden layer.
-> The input layer 'x' takes in the input to the neural network, processes it, and passes it on to the middle layer.
-> The middle layer 'h' can consist of multiple hidden layers, each with its own activation functions, weights, and biases.
-> The RNN will standardize the different activation functions, weights, and biases so that each hidden layer has the same parameters. Then, instead of creating multiple hidden layers, it will create one and loop over it as many times as required.

Applications of RNN:
1. Image Captioning
2. Natural Language Processing
3. Time Series Prediction
Types of RNN:

1. One to One RNN: single input and single output.
   Ex: image classification.
2. One to Many RNN: single input and multiple outputs.
   Ex: image captioning.
3. Many to One RNN: takes a sequence of inputs and generates a single output.
   Ex: sentiment analysis (which takes many inputs, the words, and gives one output: whether the sentiment is positive (+ve) or negative (-ve)).
4. Many to Many RNN: takes a sequence of inputs and generates a sequence of outputs.
   Ex: machine translation.
Issues of Standard RNNs:

1. Vanishing Gradient Problem: RNNs suffer from the problem of vanishing gradients. The gradients carry the information used in the RNN, and when the gradient becomes too small, the parameter updates become insignificant. This makes the learning of long data sequences difficult: loss of information through time.

2. Exploding Gradient Problem: the opposite case, where the gradients become too large and the parameter updates become unstable.

Long-term dependencies:

-> Sometimes, we only need to look at recent information to perform the present task.
   NLP Ex: when we try to predict the last word in "the clouds are in the ___", the RNN is able to predict the word "sky", since the gap between the relevant information and the place that it's needed is small. The RNN can learn to use the past information.
-> But there are also cases where we need more context.
   NLP Ex: let's try to predict the last word in the text "I grew up in France. ... I speak fluent ___." The recent information suggests that the next word is probably the name of a language, but if we want to narrow down which language, then we need the context of "France" from the previous sentence. Here the gap between the relevant information and the place it is needed is very large.
-> Unfortunately, as that gap grows, RNNs become unable to learn to connect the information.

[Diagram: an unrolled RNN showing a very large gap between the early states h(0), h(1) and the state where their information is needed.]

-> Here comes the legend: LSTM-RNN.


LSTM-RNN

Long Short-Term Memory recurrent neural networks, usually called "LSTMs", are a special kind of RNN capable of learning long-term dependencies.
-> LSTMs have the ability of remembering information for long periods of time.
-> All RNNs have the form of a chain of repeating modules of neural network. In standard RNNs, this repeating module will be a single tanh layer.

[Diagram: standard RNN repeating module containing a single tanh layer.]

-> LSTMs also have this chain-like structure, but the repeating module has a different structure: instead of a single layer, there are four, interacting in a very special way.

[Diagram: LSTM repeating module with a memory cell and four interacting layers. Notation: neural network layer, pointwise vector operation, vector concatenate, copy.]

-> In the above diagram, each line carries an entire vector, from the output of one node to the inputs of others.
Step-by-step LSTM Walkthrough:

1. The first step in our LSTM is to decide what information we're going to throw away from the cell state. This decision is made by a sigmoid layer called the "forget gate layer". It looks at h(t-1) and x(t) and outputs a number between 0 and 1 (sigmoid) for each number in the cell state C(t-1): 1 means "completely keep this", 0 means "completely get rid of this".

    f(t) = σ(W_f · [h(t-1), x(t)] + b_f)

   NLP Ex: the cell state may include the gender of the present subject. When we see a new subject, we want to forget the gender of the old subject (forget gate).

2. The next step is to decide what new information we're going to store in the cell state. This has two parts. A sigmoid layer called the "input gate layer" decides which values we'll update. Next, a tanh layer creates a vector of new candidate values, C~(t), that could be added to the state. In the next step, we'll combine these two to create an update to the state.

    i(t) = σ(W_i · [h(t-1), x(t)] + b_i)
    C~(t) = tanh(W_C · [h(t-1), x(t)] + b_C)

   NLP Ex: we want to add the gender of the new subject to the cell state, to replace the old one we're forgetting.

3. Now, we will update the old cell state C(t-1) into the new cell state C(t). We multiply the old state by f(t), forgetting the things we decided to forget earlier. Then we add i(t) * C~(t): this is the new candidate values, scaled by how much we decided to update each state value.

    C(t) = f(t) * C(t-1) + i(t) * C~(t)

   NLP Ex: this is where we'd actually drop the information about the old subject's gender and add the new information, as we decided previously.

Ewhere we'd actually drop the into oH Subect(aen


and add the hew intomation as we dect ded prev.
UFiralty, we ned to decide chat we're qoing to Cutpat
This output wll be based on burcell state, bt, wll e
a fltered Ver on.
First, o Yun. a Sigmolcd layer which
decides what ports of the cell Stat
ere goirng ro
output. Then, we put the cell state tnch tanh
(to puth Values totuYen -l&)and
o fhe Sigmoid qate,so we multiply tt by
onls Olp the pavts we
decidedta

hy O,xtonh(Ce)
EX! NLP Tohn payed tremendbus ly tuell and
LOon fo his tean for hs Contributlon S, brae
was aoYde dplayer of the matth.
’Theve ould be many Choices for ernphy spae.
The Cuent i/p orave is adjective and adjechive
descrl bes a noun(John). So, Tohn' Could be fhe
best outpt af ter bravy.
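Putting the four gate equations together, here is a single LSTM time step sketched in NumPy (random, untrained weights; purely illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    n_h, n_x = 4, 3                      # hidden size, input size

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # each gate has weights over the concatenated [h(t-1), x(t)]
    W_f, W_i, W_C, W_o = (rng.normal(size=(n_h, n_h + n_x)) for _ in range(4))
    b_f = b_i = b_C = b_o = np.zeros(n_h)

    def lstm_step(h_prev, C_prev, x_t):
        z = np.concatenate([h_prev, x_t])
        f_t = sigmoid(W_f @ z + b_f)          # forget gate
        i_t = sigmoid(W_i @ z + b_i)          # input gate
        C_tilde = np.tanh(W_C @ z + b_C)      # candidate values
        C_t = f_t * C_prev + i_t * C_tilde    # new cell state
        o_t = sigmoid(W_o @ z + b_o)          # output gate
        h_t = o_t * np.tanh(C_t)              # new hidden state
        return h_t, C_t

    h, C = np.zeros(n_h), np.zeros(n_h)
    h, C = lstm_step(h, C, rng.normal(size=n_x))
    print(h)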
