Bishop CH 3 Notes
Likelihood for a single target:
$$p(t \mid x, \mathbf{w}, \beta) = \mathcal{N}(t \mid y(x, \mathbf{w}), \beta^{-1})$$
Taking the log over the whole dataset,
$$\ln p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) = \frac{N}{2}\ln\beta - \frac{N}{2}\ln(2\pi) - \beta E_D(\mathbf{w})$$
where the sum-of-squares loss function is
$$E_D(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\{t_n - \mathbf{w}^\top\boldsymbol{\phi}(x_n)\}^2 .$$
Setting the gradient to zero for finding $\mathbf{w}_{\mathrm{ML}}$:
$$\nabla \ln p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) = \beta\sum_{n=1}^{N}\{t_n - \mathbf{w}^\top\boldsymbol{\phi}(x_n)\}\,\boldsymbol{\phi}(x_n)^\top = 0$$
Solving gives the maximum-likelihood parameters (the normal equations):
$$\mathbf{w}_{\mathrm{ML}} = (\boldsymbol{\Phi}^\top\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^\top\mathbf{t}$$
Here $\boldsymbol{\Phi}$ is the $N \times M$ design matrix with elements $\Phi_{nj} = \phi_j(x_n)$:
$$\boldsymbol{\Phi} = \begin{pmatrix} \phi_0(x_1) & \phi_1(x_1) & \cdots & \phi_{M-1}(x_1) \\ \vdots & \vdots & \ddots & \vdots \\ \phi_0(x_N) & \phi_1(x_N) & \cdots & \phi_{M-1}(x_N) \end{pmatrix}$$
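A minimal NumPy sketch of this fit (the sinusoidal data and polynomial basis are assumed toy choices, not from the notes); `np.linalg.lstsq` solves the normal equations without forming $(\boldsymbol{\Phi}^\top\boldsymbol{\Phi})^{-1}$ explicitly:

```python
import numpy as np

# Toy data: noisy samples of a sinusoid (an assumed example)
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=25)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=x.shape)

# N x M design matrix with polynomial basis phi_j(x) = x**j (assumed basis choice)
M = 4
Phi = np.vander(x, M, increasing=True)

# w_ML = (Phi^T Phi)^{-1} Phi^T t, computed via least squares for stability
w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)
```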
Making the bias parameter $w_0$ explicit and setting its derivative to zero gives
$$w_0 = \bar{t} - \sum_{j=1}^{M-1} w_j\,\overline{\phi_j}, \qquad \bar{t} = \frac{1}{N}\sum_{n=1}^{N} t_n, \qquad \overline{\phi_j} = \frac{1}{N}\sum_{n=1}^{N}\phi_j(x_n)$$
so $w_0$ compensates for the difference between the average target value and the weighted sum of the averages of the basis-function values.
Solving for $\beta$ we get
$$\frac{1}{\beta_{\mathrm{ML}}} = \frac{1}{N}\sum_{n=1}^{N}\{t_n - \mathbf{w}_{\mathrm{ML}}^\top\boldsymbol{\phi}(x_n)\}^2$$
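Continuing the sketch above (reusing `Phi`, `t`, and `w_ml` from the previous block), the noise estimate is one line:

```python
# 1/beta_ML is the mean squared residual at the ML solution
beta_ml = 1.0 / np.mean((t - Phi @ w_ml) ** 2)
```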
Extension to multiple target variables: the approach is the same, with the target vector $\mathbf{t}$ replaced by the matrix $\mathbf{T}$:
$$\mathbf{W}_{\mathrm{ML}} = (\boldsymbol{\Phi}^\top\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^\top\mathbf{T}$$
# Adding Regularization
After adding regularization with coefficient $\lambda$, the error becomes
$$\frac{1}{2}\sum_{n=1}^{N}\{t_n - \mathbf{w}^\top\boldsymbol{\phi}(x_n)\}^2 + \frac{\lambda}{2}\mathbf{w}^\top\mathbf{w}$$
with closed-form minimizer
$$\mathbf{w} = (\lambda\mathbf{I} + \boldsymbol{\Phi}^\top\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^\top\mathbf{t} .$$
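A self-contained sketch of the regularized solution, with the same assumed toy data and an assumed value for $\lambda$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=25)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=x.shape)
Phi = np.vander(x, 4, increasing=True)        # polynomial design matrix as before

lam = 1e-2                                    # regularization coefficient (assumed value)
M = Phi.shape[1]
# w = (lambda * I + Phi^T Phi)^{-1} Phi^T t
w_reg = np.linalg.solve(lam * np.eye(M) + Phi.T @ Phi, Phi.T @ t)
```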
# Bias-Variance Decomposition
Expected squared loss is
$$\mathbb{E}[L] = \int \{y(x) - h(x)\}^2\,p(x)\,dx + \iint \{h(x) - t\}^2\,p(x, t)\,dx\,dt$$
where $h(x) = \mathbb{E}[t \mid x]$ is the optimal prediction; the second term is irreducible noise.

For a model $y(x; \mathcal{D})$ trained on a dataset $\mathcal{D}$, consider the term $\{y(x;\mathcal{D}) - h(x)\}^2$.
NOTE: its expected value over datasets is
$$\mathbb{E}_{\mathcal{D}}\big[\{y(x;\mathcal{D}) - h(x)\}^2\big] = \{\mathbb{E}_{\mathcal{D}}[y(x;\mathcal{D})] - h(x)\}^2 + \mathrm{var}(y(x;\mathcal{D}))$$
$$= (\text{bias})^2 + \text{variance} .$$
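A small simulation sketch of the decomposition, under assumed toy choices (true function $h(x) = \sin 2\pi x$, 200 datasets, lightly regularized order-8 polynomial fits): averaging many fits estimates the bias² and variance terms.

```python
import numpy as np

rng = np.random.default_rng(1)

def h(x):
    """Assumed true function h(x)."""
    return np.sin(2 * np.pi * x)

x_grid = np.linspace(0, 1, 50)
Phi_grid = np.vander(x_grid, 8, increasing=True)

preds = []
for _ in range(200):                          # 200 independent datasets D
    x = rng.uniform(0, 1, size=25)
    t = h(x) + rng.normal(0, 0.3, size=x.shape)
    Phi = np.vander(x, 8, increasing=True)
    w = np.linalg.solve(1e-3 * np.eye(8) + Phi.T @ Phi, Phi.T @ t)
    preds.append(Phi_grid @ w)

preds = np.array(preds)                       # shape (200, 50): y(x; D)
y_bar = preds.mean(axis=0)                    # E_D[y(x; D)]
bias_sq = np.mean((y_bar - h(x_grid)) ** 2)   # (bias)^2, averaged over x
variance = np.mean(preds.var(axis=0))         # variance, averaged over x
print(bias_sq, variance)
```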
# Bayesian Linear Regression
Prior: $p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_0, \mathbf{S}_0)$, likelihood: $p(t \mid \mathbf{w}) = \mathcal{N}(t \mid \mathbf{w}^\top\boldsymbol{\phi}(x), \beta^{-1})$.
Now, $p(\mathbf{w} \mid \mathbf{t}) \propto p(\mathbf{t} \mid \mathbf{w})\,p(\mathbf{w})$.
$p(\mathbf{w} \mid \mathbf{t})$ will also be a normal distribution:
$$p(\mathbf{w} \mid \mathbf{t}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N)$$
$$\mathbf{m}_N = \mathbf{S}_N(\mathbf{S}_0^{-1}\mathbf{m}_0 + \beta\boldsymbol{\Phi}^\top\mathbf{t}), \qquad \mathbf{S}_N^{-1} = \mathbf{S}_0^{-1} + \beta\boldsymbol{\Phi}^\top\boldsymbol{\Phi} .$$
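A sketch of the posterior update in NumPy, assuming a zero-mean isotropic prior and toy values for $\alpha$ and $\beta$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=25)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=x.shape)
Phi = np.vander(x, 4, increasing=True)

alpha, beta = 2.0, 25.0                       # assumed prior/noise precisions
M = Phi.shape[1]
m0, S0 = np.zeros(M), np.eye(M) / alpha       # prior N(w | m0, S0)

# S_N^{-1} = S_0^{-1} + beta * Phi^T Phi
SN_inv = np.linalg.inv(S0) + beta * Phi.T @ Phi
SN = np.linalg.inv(SN_inv)
# m_N = S_N (S_0^{-1} m0 + beta * Phi^T t)
mN = SN @ (np.linalg.inv(S0) @ m0 + beta * Phi.T @ t)
```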
For a Gaussian the mode equals the mean. Thus $\mathbf{m}_N = \mathbf{w}_{\mathrm{MAP}}$. If the prior is made infinitely broad ($\mathbf{S}_0 = \alpha^{-1}\mathbf{I}$ with $\alpha \to 0$), $\mathbf{m}_N$ reduces to $\mathbf{w}_{\mathrm{ML}}$.
For the prior $p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1}\mathbf{I})$, the log of the posterior would be
$$\ln p(\mathbf{w} \mid \mathbf{t}) = -\frac{\beta}{2}\sum_{n=1}^{N}\{t_n - \mathbf{w}^\top\boldsymbol{\phi}(x_n)\}^2 - \frac{\alpha}{2}\mathbf{w}^\top\mathbf{w} + \text{const}$$
so maximizing the posterior is equivalent to minimizing the regularized sum-of-squares error with $\lambda = \alpha/\beta$.
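A quick numeric check of this equivalence (the toy setup and precision values are assumptions): the posterior mean under $(\alpha, \beta)$ matches the ridge solution with $\lambda = \alpha/\beta$.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=25)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=x.shape)
Phi = np.vander(x, 4, increasing=True)

alpha, beta = 2.0, 25.0
M = Phi.shape[1]
# Posterior mean with zero-mean prior: m_N = beta * S_N Phi^T t
SN = np.linalg.inv(alpha * np.eye(M) + beta * Phi.T @ Phi)
mN = beta * SN @ Phi.T @ t
# Ridge solution with lambda = alpha / beta
w_ridge = np.linalg.solve((alpha / beta) * np.eye(M) + Phi.T @ Phi, Phi.T @ t)
print(np.allclose(mN, w_ridge))               # True
```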
# Sequential Learning
Because the posterior is Gaussian, it can serve as the prior for the next observation, so learning can proceed one data point at a time. The predictive probability for a new target $t$, given the old target vector $\mathbf{t}$, the noise variance $\beta^{-1}$, and the prior weight variance $\alpha^{-1}$,
$$p(t \mid \mathbf{t}, \alpha, \beta)$$
takes a normal form:
$$p(t \mid \mathbf{t}, \alpha, \beta) = \mathcal{N}(t \mid \mathbf{m}_N^\top\boldsymbol{\phi}(x),\ \sigma_N^2(x)), \qquad \sigma_N^2(x) = \frac{1}{\beta} + \boldsymbol{\phi}(x)^\top\mathbf{S}_N\boldsymbol{\phi}(x)$$
with
$$\mathbf{m}_N = \mathbf{S}_N(\mathbf{S}_0^{-1}\mathbf{m}_0 + \beta\boldsymbol{\Phi}^\top\mathbf{t}), \qquad \mathbf{S}_N^{-1} = \mathbf{S}_0^{-1} + \beta\boldsymbol{\Phi}^\top\boldsymbol{\Phi} .$$
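A sketch of the predictive mean and variance under these formulas, with the same assumed toy basis and precisions:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=25)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=x.shape)
Phi = np.vander(x, 4, increasing=True)

alpha, beta = 2.0, 25.0
M = Phi.shape[1]
SN = np.linalg.inv(alpha * np.eye(M) + beta * Phi.T @ Phi)
mN = beta * SN @ Phi.T @ t

def predict(x_new):
    """Predictive mean m_N^T phi(x) and variance 1/beta + phi^T S_N phi."""
    phi = np.vander(np.atleast_1d(x_new), M, increasing=True)
    mean = phi @ mN
    var = 1.0 / beta + np.einsum('ij,jk,ik->i', phi, SN, phi)
    return mean, var

mean, var = predict(np.linspace(0, 1, 5))
```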
# Equivalent Kernel
$$y(x, \mathbf{m}_N) = \mathbf{m}_N^\top\boldsymbol{\phi}(x) = \beta\,\boldsymbol{\phi}(x)^\top\mathbf{S}_N\boldsymbol{\Phi}^\top\mathbf{t} = \sum_{n=1}^{N}\beta\,\boldsymbol{\phi}(x)^\top\mathbf{S}_N\boldsymbol{\phi}(x_n)\,t_n$$
Thus the predictive mean becomes
$$y(x, \mathbf{m}_N) = \sum_{n=1}^{N} k(x, x_n)\,t_n, \qquad k(x, x') = \beta\,\boldsymbol{\phi}(x)^\top\mathbf{S}_N\boldsymbol{\phi}(x')$$
Warning: the equivalent kernel depends on the training inputs, since they enter through $\mathbf{S}_N$.
NOTE: the predictive mean at $x$ is a weighted sum of the training targets, where more weight is given to targets whose $x_n$ lies close to $x$.
$$\mathrm{Cov}[y(x), y(x')] = \beta^{-1}\,k(x, x')$$
so the kernel also quantifies the closeness (correlation) among multiple predictions.
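A sketch that computes the equivalent kernel on the toy setup and checks that the predictive mean equals the kernel-weighted sum of the targets:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=25)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=x.shape)
Phi = np.vander(x, 4, increasing=True)

alpha, beta = 2.0, 25.0
M = Phi.shape[1]
SN = np.linalg.inv(alpha * np.eye(M) + beta * Phi.T @ Phi)
mN = beta * SN @ Phi.T @ t

x_star = 0.5
phi_star = np.vander([x_star], M, increasing=True)[0]
# k(x*, x_n) = beta * phi(x*)^T S_N phi(x_n) for every training point
k = beta * phi_star @ SN @ Phi.T              # shape (N,)
print(np.allclose(k @ t, mN @ phi_star))      # predictive mean two ways: True
```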
# Bayesian Model Comparison
Posterior over models: $p(M_i \mid \mathcal{D}) \propto p(M_i)\,p(\mathcal{D} \mid M_i)$. The predictive distribution mixes over models:
$$p(t \mid x, \mathcal{D}) = \sum_i p(t \mid x, M_i, \mathcal{D})\,p(M_i \mid \mathcal{D})$$
The model evidence is
$$p(\mathcal{D} \mid M_i) = \int p(\mathcal{D} \mid \mathbf{w}, M_i)\,p(\mathbf{w} \mid M_i)\,d\mathbf{w}$$
Approximating $\int p(\mathcal{D} \mid w)\,p(w)\,dw$ with a posterior sharply peaked at $w_{\mathrm{MAP}}$ with width $\Delta w_{\mathrm{posterior}}$ and a flat prior of width $\Delta w_{\mathrm{prior}}$, the integral becomes
$$p(\mathcal{D}) \simeq p(\mathcal{D} \mid w_{\mathrm{MAP}})\,\frac{\Delta w_{\mathrm{posterior}}}{\Delta w_{\mathrm{prior}}}$$
Then
$$\ln p(\mathcal{D}) \simeq \ln p(\mathcal{D} \mid w_{\mathrm{MAP}}) + \ln\!\left(\frac{\Delta w_{\mathrm{posterior}}}{\Delta w_{\mathrm{prior}}}\right)$$
If the weights have $M$ dimensions, this becomes
$$\ln p(\mathcal{D}) \simeq \ln p(\mathcal{D} \mid \mathbf{w}_{\mathrm{MAP}}) + M\ln\!\left(\frac{\Delta w_{\mathrm{posterior}}}{\Delta w_{\mathrm{prior}}}\right)$$
Increasing $M$ improves the fit (the first term, the likelihood) but makes the complexity penalty larger, since the second term is negative and grows linearly in $M$; the evidence therefore trades data fit against model complexity.
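A tiny numeric illustration of the trade-off; every number here is a made-up assumption, not from the text. With a fixed width ratio the penalty grows linearly in $M$, so an intermediate model size can maximize the approximate log evidence:

```python
import numpy as np

ratio = 0.1                                    # assumed dw_posterior / dw_prior
log_fit = {1: -60.0, 3: -35.0, 9: -33.0}       # hypothetical ln p(D | w_MAP) per model size
for M, fit in log_fit.items():
    penalty = M * np.log(ratio)                # M * ln(ratio), more negative as M grows
    print(M, fit + penalty)                    # approximate log evidence; M = 3 wins here
```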
# Evidence Approximation
True predictive distribution:
$$p(t \mid \mathbf{t}) = \iiint p(t \mid \mathbf{w}, \beta)\,p(\mathbf{w} \mid \mathbf{t}, \alpha, \beta)\,p(\alpha, \beta \mid \mathbf{t})\,d\mathbf{w}\,d\alpha\,d\beta$$
If the posterior $p(\alpha, \beta \mid \mathbf{t})$ is sharply peaked at $\hat{\alpha}, \hat{\beta}$, then
$$p(t \mid \mathbf{t}) \simeq p(t \mid \mathbf{t}, \hat{\alpha}, \hat{\beta}) = \int p(t \mid \mathbf{w}, \hat{\beta})\,p(\mathbf{w} \mid \mathbf{t}, \hat{\alpha}, \hat{\beta})\,d\mathbf{w}$$
The log evidence is
$$\ln p(\mathbf{t} \mid \alpha, \beta) = \frac{M}{2}\ln\alpha + \frac{N}{2}\ln\beta - E(\mathbf{m}_N) - \frac{1}{2}\ln|\mathbf{A}| - \frac{N}{2}\ln(2\pi)$$
where $\mathbf{A} = \alpha\mathbf{I} + \beta\boldsymbol{\Phi}^\top\boldsymbol{\Phi}$ and
$$E(\mathbf{m}_N) = \frac{\beta}{2}\lVert \mathbf{t} - \boldsymbol{\Phi}\mathbf{m}_N \rVert^2 + \frac{\alpha}{2}\mathbf{m}_N^\top\mathbf{m}_N .$$
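A sketch evaluating this log evidence on the toy setup (basis and data assumed as before):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=25)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=x.shape)
Phi = np.vander(x, 4, increasing=True)
N, M = Phi.shape

def log_evidence(alpha, beta):
    A = alpha * np.eye(M) + beta * Phi.T @ Phi
    mN = beta * np.linalg.solve(A, Phi.T @ t)
    E_mN = beta / 2 * np.sum((t - Phi @ mN) ** 2) + alpha / 2 * mN @ mN
    return (M / 2 * np.log(alpha) + N / 2 * np.log(beta) - E_mN
            - np.linalg.slogdet(A)[1] / 2 - N / 2 * np.log(2 * np.pi))

print(log_evidence(2.0, 25.0))
```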
Maximizing with respect to $\alpha$ and $\beta$ gives
$$\alpha = \frac{\gamma}{\mathbf{m}_N^\top\mathbf{m}_N}, \qquad \frac{1}{\beta} = \frac{1}{N - \gamma}\sum_{n=1}^{N}\{t_n - \mathbf{m}_N^\top\boldsymbol{\phi}(x_n)\}^2$$
where
$$\gamma = \sum_i \frac{\lambda_i}{\alpha + \lambda_i}$$
and $\lambda_i$ is an eigenvalue of $\beta\boldsymbol{\Phi}^\top\boldsymbol{\Phi}$.
$\gamma$ depends on both $\alpha$ and $\beta$. We first choose $\alpha$ and $\beta$, compute $\mathbf{m}_N$ and $\gamma$, re-estimate $\alpha$ and $\beta$, and iterate accordingly until convergence.
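A sketch of the resulting fixed-point iteration; the starting values and tolerance are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=25)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=x.shape)
Phi = np.vander(x, 4, increasing=True)
N, M = Phi.shape

alpha, beta = 1.0, 1.0                         # assumed starting point
eigs0 = np.linalg.eigvalsh(Phi.T @ Phi)        # eigenvalues of Phi^T Phi
for _ in range(100):
    lam = beta * eigs0                         # eigenvalues of beta * Phi^T Phi
    gamma = np.sum(lam / (alpha + lam))
    A = alpha * np.eye(M) + beta * Phi.T @ Phi
    mN = beta * np.linalg.solve(A, Phi.T @ t)
    alpha_new = gamma / (mN @ mN)
    beta_new = (N - gamma) / np.sum((t - Phi @ mN) ** 2)
    if abs(alpha_new - alpha) < 1e-8 and abs(beta_new - beta) < 1e-8:
        break
    alpha, beta = alpha_new, beta_new
print(alpha, beta, gamma)
```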