
Lecture6 Live Notes

1. Naive Bayes classification uses event models such as the multivariate Bernoulli or multinomial model to classify documents; Laplace smoothing avoids zero probabilities.
2. Kernel methods transform data into a higher-dimensional space to allow nonlinear classification; common kernels include linear, polynomial, and Gaussian kernels.
3. Gradient descent optimizes kernel-method parameters by iteratively updating them in the direction of the negative gradient to minimize a loss function; representing the parameters as a linear combination of the training examples' feature vectors allows efficient computation.

Uploaded by

Mukul

Outline

Naive Bayes
Laplace smoothing
Event Models

Kernel Methods

Recap

Spam classification: each email is represented by a feature vector x ∈ {0,1}^d indexed by a dictionary (aardvark, ..., buy, ...), where x_j = 1 if word j appears in the email. We have n training examples (x^(i), y^(i)).

Generative Model

Model p(x|y) and p(y), where y = 1 means spam and y = 0 means not spam; Bayes' rule then gives p(y|x).
Naive Bayes assumption: p(x|y) = ∏_{j=1}^{d} p(x_j | y), i.e. the features are conditionally independent given y.

Parameters:
φ_{j|y=1} = P(x_j = 1 | y = 1)
φ_{j|y=0} = P(x_j = 1 | y = 0)
φ_y = P(y = 1)

Joint Likelihood:
L(φ_y, φ_{j|y}) = ∏_{i=1}^{n} p(x^(i), y^(i); φ_y, φ_{j|y})
MLE:
φ_{j|y=1} = Σ_i 1{x_j^(i) = 1, y^(i) = 1} / Σ_i 1{y^(i) = 1}
φ_{j|y=0} = Σ_i 1{x_j^(i) = 1, y^(i) = 0} / Σ_i 1{y^(i) = 0}
φ_y = Σ_i 1{y^(i) = 1} / n

Prediction:
p(y=1|x) = p(x|y=1) p(y=1) / [ p(x|y=1) p(y=1) + p(x|y=0) p(y=0) ]
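These estimates and the prediction rule are easy to sanity-check in code. A minimal NumPy sketch of the plain MLE version (function names and the toy data are my own, not from the notes):

```python
import numpy as np

def fit_bernoulli_nb(X, y):
    """MLE for Bernoulli Naive Bayes. X: (n, d) binary matrix, y: (n,) in {0,1}."""
    phi_y = y.mean()                       # phi_y = P(y = 1)
    phi_j_y1 = X[y == 1].mean(axis=0)      # phi_{j|y=1} = P(x_j = 1 | y = 1)
    phi_j_y0 = X[y == 0].mean(axis=0)      # phi_{j|y=0} = P(x_j = 1 | y = 0)
    return phi_y, phi_j_y1, phi_j_y0

def predict_proba(x, phi_y, phi_j_y1, phi_j_y0):
    """p(y=1|x) by Bayes' rule; x: (d,) binary vector."""
    px_y1 = np.prod(np.where(x == 1, phi_j_y1, 1 - phi_j_y1))
    px_y0 = np.prod(np.where(x == 1, phi_j_y0, 1 - phi_j_y0))
    num = px_y1 * phi_y
    return num / (num + px_y0 * (1 - phi_y))

# Toy data: 4 "emails" over a 3-word dictionary, first two labeled spam.
X = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]])
y = np.array([1, 1, 0, 0])
params = fit_bernoulli_nb(X, y)
print(predict_proba(np.array([1, 1, 0]), *params))
```

Note that with plain MLE, a word never seen in training makes both products zero, which is exactly the problem the next section addresses.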
Problem: suppose word j = 12731 ("covid") never appears in any training email. Then the MLE gives
φ_{12731|y=1} = 0 and φ_{12731|y=0} = 0,
so p(x|y=1) = ∏_j p(x_j | y=1) = 0 and likewise p(x|y=0) = 0, and the prediction
p(y=1|x) = 0 / (0 + 0)
is undefined. More generally, it is statistically a bad idea to estimate the probability of an event as 0 just because it has not yet been observed: a team that has lost every game so far (to Wake Forest, Arizona, Oklahoma, ...) is not guaranteed to lose its next one, yet the MLE of its win probability would be 0.

Laplace Smoothing

For a random variable x ∈ {1, ..., k}, estimate
P(x = j) = ( Σ_{i=1}^{n} 1{x^(i) = j} + 1 ) / ( n + k ),
i.e. add 1 to each count. Example: discretize house size (sq. ft.) into buckets <400, 400–800, 800–1200, >1200, so x ∈ {1, 2, 3, 4} and k = 4.

For Naive Bayes this gives, e.g.,
φ_{j|y=1} = ( Σ_i 1{x_j^(i) = 1, y^(i) = 1} + 1 ) / ( Σ_i 1{y^(i) = 1} + 2 ).
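A quick numeric check of the (count + 1)/(n + k) rule; the house-size data below is hypothetical:

```python
def laplace_estimate(counts, n, k):
    """Laplace-smoothed estimate: P(x = j) = (count_j + 1) / (n + k)."""
    return [(c + 1) / (n + k) for c in counts]

# Hypothetical: n = 3 houses observed, all in bucket 1 (< 400 sq ft), k = 4 buckets.
# Plain MLE would assign buckets 2-4 probability 0; smoothing does not.
probs = laplace_estimate([3, 0, 0, 0], n=3, k=4)
print(probs)  # i.e. 4/7, 1/7, 1/7, 1/7 -- still sums to 1
```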
Multivariate Bernoulli event model vs. Multinomial event model

Multivariate Bernoulli: an email is a binary vector x ∈ {0,1}^{10,000} over the dictionary (aardvark, ..., account, ..., bank, ..., beneficiary, ...), with x_j ∈ {0,1} indicating whether word j appears.

Multinomial: represent the email "bank account bank ..." by the sequence of dictionary indices of its words, e.g. x = (1600, 800, 1600, ...) if bank = 1600 and account = 800. So x_j ∈ {1, ..., |V|} with |V| = 10,000, and x ∈ {1, ..., |V|}^{d_i}, where d_i is the length of email i.

p(x, y) = p(y) ∏_{j=1}^{d} p(x_j | y), assuming the distribution p(x_j | y) is the same for every position j.
Parameters:
φ_y = P(y = 1)
φ_{k|y=0} = P(x_j = k | y = 0): the chance that a given word position is the k-th word in the dictionary when y = 0 (and similarly φ_{k|y=1}).

Laplace smoothing: add 1 to the numerator and |V| (= 10,000) to the denominator.
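The multinomial estimates above can be sketched directly; the corpus, vocabulary size, and function name below are illustrative:

```python
import numpy as np

def fit_multinomial_nb(emails, labels, V):
    """Multinomial-event-model Naive Bayes with Laplace smoothing.
    emails: list of word-index lists (indices in 0..V-1); labels: 0/1 per email."""
    phi_y = np.mean(labels)              # P(y = 1)
    counts = np.ones((2, V))             # +1 to every numerator (Laplace)
    totals = np.full(2, float(V))        # +V to each denominator
    for words, y in zip(emails, labels):
        for k in words:
            counts[y, k] += 1
        totals[y] += len(words)
    phi_k = counts / totals[:, None]     # phi_k[y, k] = P(word position = k | y)
    return phi_y, phi_k

# Toy corpus with a 3-word vocabulary (made-up data).
emails = [[0, 1, 0], [2, 1], [0]]
labels = [1, 0, 1]
phi_y, phi_k = fit_multinomial_nb(emails, labels, V=3)
print(phi_y, phi_k[1])
```

Each row of phi_k is a proper distribution over the vocabulary, and smoothing keeps every entry strictly positive.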

Practical notes:
- Map rare words to a special UNK token.
- Normalize variants of the same word (e.g. "Mortgage" vs. "mortgage").
- Other useful signals for spam: spoofed headers, fetching the URLs that appear in the message.

Kernel Methods

For x ∈ ℝ, fit a cubic model:
h_θ(x) = θ_3 x³ + θ_2 x² + θ_1 x + θ_0

Define the feature map φ : ℝ → ℝ⁴,
φ(x) = [1, x, x², x³]ᵀ,
so h_θ(x) = θᵀ φ(x), which is linear in θ and φ(x).

Given the dataset (x^(1), y^(1)), ..., (x^(n), y^(n)), map it to (φ(x^(1)), y^(1)), ..., (φ(x^(n)), y^(n)):
a cubic polynomial on the old dataset = a linear model on the new dataset.

LMS on the new dataset:
min_θ (1/2) Σ_{i=1}^{n} ( y^(i) − θᵀ φ(x^(i)) )²
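The feature-map trick can be sketched end to end: fit a cubic by running batch LMS gradient descent on φ(x) = [1, x, x², x³]. The toy data and step size below are my own choices:

```python
import numpy as np

def phi(x):
    """Feature map R -> R^4 for a cubic fit: phi(x) = [1, x, x^2, x^3]."""
    return np.array([1.0, x, x**2, x**3])

# Toy data drawn from y = x^3, so the cubic model can fit it exactly.
xs = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
ys = xs**3
Phi = np.stack([phi(x) for x in xs])   # design matrix on the "new" dataset

theta = np.zeros(4)
alpha = 0.1
for _ in range(5000):                  # batch gradient descent on the LMS objective
    theta += alpha * Phi.T @ (ys - Phi @ theta)
print(theta)                           # approaches [0, 0, 0, 1]
```

The loop is ordinary linear-regression gradient descent; all the nonlinearity lives in phi.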

Gradient Descent

Loop:
θ := θ + α Σ_{i=1}^{n} ( y^(i) − θᵀ φ(x^(i)) ) φ(x^(i)),   θ ∈ ℝᵖ, φ(x) ∈ ℝᵖ

Terminology:
φ : ℝᵈ → ℝᵖ is the feature map; x is the attributes; φ(x) is the features.

What to do if p is very large?

For a cubic polynomial in x ∈ ℝᵈ, the feature map contains all monomials up to degree 3:
φ(x) = [1, x_1, ..., x_d, x_1 x_2, ..., x_i x_j, ..., x_i x_j x_k, ...]ᵀ
so p = 1 + d + d² + d³ = O(d³). For d = 1000, p ≈ 10⁹.
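The blow-up in p is easy to verify, counting ordered monomials with repetition as the notes do:

```python
# Feature count for degree <= 3 monomials, counted with repetition: 1 + d + d^2 + d^3.
d = 1000
p = 1 + d + d**2 + d**3
print(p)  # 1001001001, roughly 10^9 features per example
```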
Gradient descent update (on the features):
θ := θ + α Σ_{i=1}^{n} ( y^(i) − θᵀ φ(x^(i)) ) φ(x^(i))
Runtime of one iteration of GD: O(np).
Key observation: if θ is initialized at 0, then at any time θ can be written as
θ = Σ_{i=1}^{n} β_i φ(x^(i))
for some β_1, ..., β_n ∈ ℝ. (θ ∈ ℝᵖ, but β ∈ ℝⁿ.)
Proof of observation: by induction on the number of iterations.

Base case (iteration 0): θ = 0 = Σ_{i=1}^{n} 0 · φ(x^(i)).

Inductive step: assume at iteration t that θ = Σ_i β_i φ(x^(i)). The next iteration gives
θ := θ + α Σ_i ( y^(i) − θᵀ φ(x^(i)) ) φ(x^(i))
  = Σ_i [ β_i + α ( y^(i) − θᵀ φ(x^(i)) ) ] φ(x^(i)),
which again has the required form.
Instead of θ ∈ ℝᵖ, represent the model by β ∈ ℝⁿ: n parameters instead of p. The update becomes
β_i := β_i + α ( y^(i) − Σ_{j=1}^{n} β_j φ(x^(j))ᵀ φ(x^(i)) ).
The inner products φ(x^(j))ᵀ φ(x^(i)) can be precomputed, and ⟨φ(x), φ(z)⟩ can often be computed much faster without explicitly computing φ(x).
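A sketch of the kernelized update, reusing a toy cubic-fit setup (data, names, and step size are illustrative): the Gram matrix of inner products is precomputed once, and the training loop only ever works with n numbers β, never with a p-dimensional θ.

```python
import numpy as np

def phi(x):
    """Explicit map [1, x, x^2, x^3]; used here only to build the Gram matrix."""
    return np.array([1.0, x, x**2, x**3])

xs = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
ys = xs**3
n = len(xs)

# Gram matrix K[i, j] = phi(x_i)^T phi(x_j), precomputed once.
# (For a polynomial map, K could instead be computed from the x's directly,
# without ever forming phi explicitly.)
Phi = np.stack([phi(x) for x in xs])
K = Phi @ Phi.T

beta = np.zeros(n)                     # n parameters instead of p
alpha = 0.1
for _ in range(5000):
    beta += alpha * (ys - K @ beta)    # beta_i += alpha * (y_i - sum_j beta_j K[j, i])

theta = Phi.T @ beta                   # recover theta = sum_i beta_i phi(x_i)
print(np.round(theta, 3))              # close to [0, 0, 0, 1], matching explicit GD
```

Since Kβ = Φθ when θ = Φᵀβ, each β step reproduces the ordinary gradient-descent step on θ exactly; the per-iteration cost depends on n, not p.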
