How To Use The Bayes Net Toolbox
This documentation was last updated on 29 October 2007.
Click here for a French version of this documentation (last updated in 2005).
Installation
Creating your first Bayes net
  Creating a model by hand
  Loading a model from a file
  Creating a model using a GUI
  Graph visualization
Inference
  Computing marginal distributions
  Computing joint distributions
  Soft/virtual evidence
  Most probable explanation
Conditional Probability Distributions
  Tabular (multinomial) nodes
  Noisy-or nodes
  Other (noisy) deterministic nodes
  Softmax (multinomial logit) nodes
  Neural network nodes
  Root nodes
  Gaussian nodes
  Generalized linear model nodes
  Classification/regression tree nodes
  Other continuous distributions
  Summary of CPD types
Example models
  Gaussian mixture models
  PCA, ICA, and all that
  Mixtures of experts
  Hierarchical mixtures of experts
  QMR
  Conditional Gaussian models
  Other hybrid models
Parameter learning
  Loading data from a file
  Maximum likelihood parameter estimation from complete data
  Parameter priors
  (Sequential) Bayesian parameter updating from complete data
  Maximum likelihood parameter estimation with missing values (EM)
  Parameter tying
Structure learning
  Exhaustive search
  K2
  Hill climbing
  MCMC
  Active learning
  Structural EM
  Visualizing the learned graph structure
  Constraint-based methods
Inference engines
  Junction tree
  Variable elimination
  Global inference methods
  Quickscore
  Belief propagation
  Sampling (Monte Carlo)
  Summary of inference engines
Influence diagrams/decision making
DBNs, HMMs, Kalman filters and all that
http://www.cs.ubc.ca/~murphyk/Software/BNT/usage.html#basics (retrieved 9/21/2016)
Creating your first Bayes net

To define a Bayes net, you must specify the graph structure and then the parameters. We look at each in turn, using a simple example (adapted from Russell and Norvig, "Artificial Intelligence: a Modern Approach", Prentice Hall, 1995, p. 454).

Graph structure

Consider the following network.

To specify this directed acyclic graph (dag), we create an adjacency matrix:

N = 4;
dag = zeros(N,N);
C = 1; S = 2; R = 3; W = 4;
dag(C,[R S]) = 1;
dag(R,W) = 1;
dag(S,W) = 1;

We have numbered the nodes as follows: Cloudy = 1, Sprinkler = 2, Rain = 3, WetGrass = 4. The nodes must always be numbered in topological order, i.e., ancestors before descendants. For a more complicated graph, this is a little inconvenient: we will see how to get around this below.
In Matlab 6, you can use logical arrays instead of double arrays, which are 4 times smaller:

dag = false(N,N);
dag(C,[R S]) = true;
...

However, some graph functions (e.g. acyclic) do not work on logical arrays!

You can visualize the resulting graph structure using the methods discussed below. For details on GUIs, click here.
Creating the Bayes net shell

In addition to specifying the graph structure, we must specify the size and type of each node. If a node is discrete, its size is the number of possible values each node can take on; if a node is continuous, it can be a vector, and its size is the length of this vector. In this case, we will assume all nodes are discrete and binary.

discrete_nodes = 1:N;
node_sizes = 2*ones(1,N);

If the nodes were not binary, you could type e.g.,
node_sizes = [4 2 3 5];

meaning that Cloudy has 4 possible values, Sprinkler has 2 possible values, etc. Note that these are cardinal values, not ordinal, i.e., they are not ordered in any way, like 'low', 'medium', 'high'.

We are now ready to make the Bayes net:

bnet = mk_bnet(dag, node_sizes, 'discrete', discrete_nodes);

By default, all nodes are assumed to be discrete, so we can also just write

bnet = mk_bnet(dag, node_sizes);

You may also specify which nodes will be observed. If you don't know, or if this is not fixed in advance, just use the empty list (the default).

onodes = [];
bnet = mk_bnet(dag, node_sizes, 'discrete', discrete_nodes, 'observed', onodes);

Note that optional arguments are specified using a name/value syntax. This is common for many BNT functions. In general, to find out more about a function (e.g., which optional arguments it takes), please see its documentation string by typing

help mk_bnet

See also other useful Matlab tips.

It is possible to associate names with nodes, as follows:

bnet = mk_bnet(dag, node_sizes, 'names', {'cloudy','S','R','W'}, 'discrete', 1:4);

You can then refer to a node by its name:

C = bnet.names('cloudy'); % bnet.names is an associative array
bnet.CPD{C} = tabular_CPD(bnet, C, [0.5 0.5]);

This feature uses my own associative array class.
Parameters

A model consists of the graph structure and the parameters. The parameters are represented by CPD objects (CPD = Conditional Probability Distribution), which define the probability distribution of a node given its parents. (We will use the terms "node" and "random variable" interchangeably.) The simplest kind of CPD is a table (multi-dimensional array), which is suitable when all the nodes are discrete-valued. Note that the discrete values are not assumed to be ordered in any way; that is, they represent categorical quantities, like male and female, rather than ordinal quantities, like low, medium and high. (We will discuss CPDs in more detail below.)

Tabular CPDs, also called CPTs (conditional probability tables), are stored as multidimensional arrays, where the dimensions are arranged in the same order as the nodes, e.g., the CPT for node 4 (WetGrass) is indexed by Sprinkler (2), Rain (3) and then WetGrass (4) itself. Hence the child is always the last dimension. If a node has no parents, its CPT is a column vector representing its prior. Note that in Matlab (unlike C), arrays are indexed from 1, and are laid out in memory such that the first index toggles fastest, e.g., the CPT for node 4 (WetGrass) is as follows

S R  P(W=F)  P(W=T)
F F  1.0     0.0
T F  0.1     0.9
F T  0.1     0.9
T T  0.01    0.99

where we have used the convention that false==1, true==2. We can create this CPT in Matlab as follows

CPT = zeros(2,2,2);
CPT(1,1,1) = 1.0;
CPT(2,1,1) = 0.1;
...
Here is an easier way:
CPT = reshape([1 0.1 0.1 0.01 0 0.9 0.9 0.99], [2 2 2]);

In fact, we don't need to reshape the array, since the CPD constructor will do that for us. So we can just write

bnet.CPD{W} = tabular_CPD(bnet, W, 'CPT', [1 0.1 0.1 0.01 0 0.9 0.9 0.99]);

The other nodes are created similarly (using the old syntax for optional parameters)

bnet.CPD{C} = tabular_CPD(bnet, C, [0.5 0.5]);
bnet.CPD{R} = tabular_CPD(bnet, R, [0.8 0.2 0.2 0.8]);
bnet.CPD{S} = tabular_CPD(bnet, S, [0.5 0.9 0.5 0.1]);
bnet.CPD{W} = tabular_CPD(bnet, W, [1 0.1 0.1 0.01 0 0.9 0.9 0.99]);
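With all the CPDs in place, a quick sanity check is to draw joint samples from the model. A minimal sketch, assuming BNT's sample_bnet function (which returns a cell array with one entry per node):

```matlab
% Draw a few joint samples from the sprinkler network.
% Each call returns an N x 1 cell array of node values (false==1, true==2).
nsamples = 5;
samples = cell(N, nsamples);
for s = 1:nsamples
  samples(:,s) = sample_bnet(bnet);
end
% e.g. samples{W,1} is the sampled value of WetGrass in the first sample
```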
Random Parameters

If we do not specify the CPT, random parameters will be created, i.e., each "row" of the CPT will be drawn from the uniform distribution. To ensure repeatable results, use

rand('state', seed);
randn('state', seed);

To control the degree of randomness (entropy), you can sample each row of the CPT from a Dirichlet(p,p,...) distribution. If p << 1, this encourages "deterministic" CPTs (one entry near 1, the rest near 0). If p = 1, each entry is drawn from U[0,1]. If p >> 1, the entries will all be near 1/k, where k is the arity of this node, i.e., each row will be nearly uniform. You can do this as follows, assuming this node is number i, and ns is the node_sizes.

k = ns(i);
ps = parents(dag, i);
psz = prod(ns(ps));
CPT = sample_dirichlet(p*ones(1,k), psz);
bnet.CPD{i} = tabular_CPD(bnet, i, 'CPT', CPT);
Loading a network from a file

If you already have a Bayes net represented in the XML-based Bayes Net Interchange Format (BNIF) (e.g., downloaded from the Bayes Net repository), you can convert it to BNT format using the BIF-BNT Java program written by Ken Shan. (This is not necessarily up to date.)

It is currently not possible to save/load a BNT matlab object to file, but this is easily fixed if you modify all the constructors for all the classes (see matlab documentation).

Creating a model using a GUI

Senthil Nachimuthu has started (Oct 07) an open source GUI for BNT called projeny using Java. This is a successor to BNJ.
Philippe LeRay has written (Sep 05) a BNT GUI in matlab.
LinkStrength, a package by Imme Ebert-Uphoff for visualizing the strength of dependencies between nodes.

Graph visualization

Click here for more information on graph visualization.
Inference

Having created the BN, we can now use it for inference. There are many different algorithms for doing inference in Bayes nets, which make different tradeoffs between speed, complexity, generality, and accuracy. BNT therefore offers a variety of different inference "engines". We will discuss these in more detail below. For now, we will use the junction tree engine, which is the mother of all exact inference algorithms. This can be created as follows.

engine = jtree_inf_engine(bnet);

The other engines have similar constructors, but might take additional, algorithm-specific parameters. All engines are used in the same way, once they have been created. We illustrate this in the following sections.
Computing marginal distributions
Suppose we want to compute the probability that the sprinkler was on given that the grass is wet. The evidence consists of the fact that W = 2. All the other nodes are hidden (unobserved). We can specify this as follows.

evidence = cell(1,N);
evidence{W} = 2;

We use a 1D cell array instead of a vector to cope with the fact that nodes can be vectors of different lengths. In addition, the value [] can be used to denote 'no evidence', instead of having to specify the observation pattern as a separate argument. (Click here for a quick tutorial on cell arrays in matlab.)

We are now ready to add the evidence to the engine.

[engine, loglik] = enter_evidence(engine, evidence);

The behavior of this function is algorithm-specific, and is discussed in more detail below. In the case of the jtree engine, enter_evidence implements a two-pass message-passing scheme. The first return argument contains the modified engine, which incorporates the evidence. The second return argument contains the log-likelihood of the evidence. (Not all engines are capable of computing the log-likelihood.)

Finally, we can compute p = P(S=2 | W=2) as follows.

marg = marginal_nodes(engine, S);
marg.T
ans =
    0.57024
    0.42976
p = marg.T(2);

We see that p = 0.4298.

Now let us add the evidence that it was raining, and see what difference it makes.

evidence{R} = 2;
[engine, loglik] = enter_evidence(engine, evidence);
marg = marginal_nodes(engine, S);
p = marg.T(2);

We find that p = P(S=2 | W=2, R=2) = 0.1945, which is lower than before, because the rain can "explain away" the fact that the grass is wet.
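Since the sprinkler network is tiny, you can sanity-check these numbers without any inference engine by brute-force enumeration. A minimal sketch in plain Matlab, using the CPTs defined above:

```matlab
% Enumerate all joint configurations and compute P(S=2|W=2) directly.
priorC = [0.5 0.5];
R_CPT = reshape([0.8 0.2 0.2 0.8], [2 2]);   % R_CPT(c,r) = P(R=r|C=c)
S_CPT = reshape([0.5 0.9 0.5 0.1], [2 2]);   % S_CPT(c,s) = P(S=s|C=c)
W_CPT = reshape([1 0.1 0.1 0.01 0 0.9 0.9 0.99], [2 2 2]); % W_CPT(s,r,w)
num = 0; den = 0;
for c=1:2, for s=1:2, for r=1:2
  p = priorC(c) * S_CPT(c,s) * R_CPT(c,r) * W_CPT(s,r,2);  % evidence W=2
  den = den + p;
  if s==2, num = num + p; end
end, end, end
num/den   % approx 0.4298, matching the jtree result
```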
You can plot a marginal distribution over a discrete variable as a bar chart using the built-in 'bar' function:

bar(marg.T)

This is what it looks like
Observed nodes

What happens if we ask for the marginal on an observed node, e.g. P(W | W=2)? An observed discrete node effectively only has 1 value (the observed one); all other values would result in 0 probability. For efficiency, BNT treats observed (discrete) nodes as if they were set to 1, as we see below:

evidence = cell(1,N);
evidence{W} = 2;
engine = enter_evidence(engine, evidence);
m = marginal_nodes(engine, W);
m.T
ans =
     1

This can get a little confusing, since we assigned W=2. So we can ask BNT to add the evidence back in by passing in an optional argument:

m = marginal_nodes(engine, W, 1);
m.T
ans =
     0
     1

This shows that P(W=1 | W=2) = 0 and P(W=2 | W=2) = 1.
Computing joint distributions

We can compute the joint probability on a set of nodes as in the following example.

evidence = cell(1,N);
[engine, ll] = enter_evidence(engine, evidence);
m = marginal_nodes(engine, [S R W]);

m is a structure. The 'T' field is a multidimensional array (in this case, 3-dimensional) that contains the joint probability distribution on the specified nodes.

>> m.T
ans(:,:,1) =
    0.2900    0.0410
    0.0210    0.0009
ans(:,:,2) =
         0    0.3690
    0.1890    0.0891

We see that P(S=1, R=1, W=2) = 0, since it is impossible for the grass to be wet if both the rain and sprinkler are off.

Let us now add some evidence to R.

evidence{R} = 2;
[engine, ll] = enter_evidence(engine, evidence);
m = marginal_nodes(engine, [S R W])
m =
    domain: [2 3 4]
         T: [2x1x2 double]
>> m.T
ans(:,:,1) =
    0.0820
    0.0018
ans(:,:,2) =
    0.7380
    0.1782

The joint T(i,j,k) = P(S=i, R=j, W=k | evidence) should have T(i,1,k) = 0 for all i,k, since R=1 is incompatible with the evidence that R=2. Instead of creating large tables with many 0s, BNT sets the effective size of observed (discrete) nodes to 1, as explained above. This is why m.T has size 2x1x2. To get a 2x2x2 table, type

m = marginal_nodes(engine, [S R W], 1)
m =
    domain: [2 3 4]
         T: [2x2x2 double]
>> m.T
ans(:,:,1) =
         0    0.0820
         0    0.0018
ans(:,:,2) =
         0    0.7380
         0    0.1782

Note: It is not always possible to compute the joint on arbitrary sets of nodes: it depends on which inference engine you use, as discussed in more detail below.
Soft/virtual evidence

Sometimes a node is not observed, but we have some distribution over its possible values; this is often called "soft" or "virtual" evidence. One can use this as follows

[engine, loglik] = enter_evidence(engine, evidence, 'soft', soft_evidence);

where soft_evidence{i} is either [] (if node i has no soft evidence) or is a vector representing the probability distribution over i's possible values. For example, if we don't know i's exact value, but we know its likelihood ratio is 60/40, we can write evidence{i} = [] and soft_evidence{i} = [0.6 0.4].

Currently only jtree_inf_engine supports this option. It assumes that all hidden nodes, and all nodes for which we have soft evidence, are discrete. For a longer example, see BNT/examples/static/softev1.m.
Most probable explanation

To compute the most probable explanation (MPE) of the evidence (i.e., the most probable assignment, or a mode of the joint), use

[mpe, ll] = calc_mpe(engine, evidence);

mpe{i} is the most likely value of node i. This calls enter_evidence with the 'maximize' flag set to 1, which causes the engine to do max-product instead of sum-product. The resulting max-marginals are then thresholded. If there is more than one maximum probability assignment, we must take care to break ties in a consistent manner (thresholding the max-marginals may give the wrong result). To force this behavior, type

[mpe, ll] = calc_mpe(engine, evidence, 1);

Note that computing the MPE is sometimes called abductive reasoning.

You can also use calc_mpe_bucket, written by Ron Zohar, which does a forwards max-product pass, and then a backwards traceback pass, which is how Viterbi is traditionally implemented.
Conditional Probability Distributions

A Conditional Probability Distribution (CPD) defines P(X(i) | X(Pa(i))), where X(i) is the i'th node, and X(Pa(i)) are the parents of node i. There are many ways to represent this distribution, which depend in part on whether X(i) and X(Pa(i)) are discrete, continuous, or a combination. We will discuss various representations below.

Tabular nodes

If the CPD is represented as a table (i.e., if it is a multinomial distribution), it has a number of parameters that is exponential in the number of parents. See the example above.
Noisy-or nodes

A noisy-OR node is like a regular logical OR gate except that sometimes the effects of parents that are on get inhibited. Let the prob. that parent i gets inhibited be q(i). Then a node, C, with 2 parents, A and B, has the following CPD, where we use F and T to represent off and on (1 and 2 in BNT).

A B  P(C=off)   P(C=on)
F F  1.0        0.0
T F  q(A)       1-q(A)
F T  q(B)       1-q(B)
T T  q(A)q(B)   1-q(A)q(B)

Thus we see that the causes get inhibited independently. It is common to associate a "leak" node with a noisy-or CPD, which is like a parent that is always on. This can account for all other unmodelled causes which might turn the node on.

The noisy-or distribution is similar to the logistic distribution. To see this, let the nodes, S(i), have values in {0,1}, and let q(i,j) be the prob. that j inhibits i. Then

Pr(S(i)=1 | parents(S(i))) = 1 - prod_{j} q(i,j)^S(j)

Now define w(i,j) = -ln q(i,j) and rho(x) = 1 - exp(-x). Then
Pr(S(i)=1 | parents(S(i))) = rho(sum_j w(i,j) S(j))

For a sigmoid node, we have

Pr(S(i)=1 | parents(S(i))) = sigma(sum_j w(i,j) S(j))

where sigma(x) = 1/(1+exp(-x)). Hence they differ in the choice of the activation function (although both are monotonically increasing). In addition, in the case of a noisy-or, the weights are constrained to be positive, since they derive from probabilities q(i,j). In both cases, the number of parameters is linear in the number of parents, unlike the case of a multinomial distribution, where the number of parameters is exponential in the number of parents. We will see an example of noisy-OR nodes below.
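The two-parent table above translates directly into a CPT. A minimal sketch in plain Matlab, with hypothetical inhibition probabilities qA and qB:

```matlab
% Build the 2x2x2 CPT for a noisy-OR node C with parents A and B,
% using the convention false==1, true==2.
qA = 0.1; qB = 0.2;            % hypothetical inhibition probabilities
CPT = zeros(2,2,2);
CPT(1,1,1) = 1;                % both parents off => C is off for sure
CPT(2,1,1) = qA;               % A on, but inhibited with prob qA
CPT(1,2,1) = qB;
CPT(2,2,1) = qA*qB;            % causes inhibited independently
CPT(:,:,2) = 1 - CPT(:,:,1);   % P(C=on) = 1 - P(C=off)
```

In BNT itself you would normally use the noisyor_CPD class (as in the QMR example below) rather than building the table by hand.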
Other (noisy) deterministic nodes

Deterministic CPDs for discrete random variables can be created using the deterministic_CPD class. It is also possible to 'flip' the output of the function with some probability, to simulate noise. The boolean_CPD class is just a special case of a deterministic CPD, where the parents and child are all binary.

Both of these classes are just "syntactic sugar" for the tabular_CPD class.
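Since these classes reduce to tables, a deterministic binary AND, for example, can be written out explicitly. A minimal sketch of the table such a CPD corresponds to, including the noise flip:

```matlab
% Tabular encoding of a deterministic AND node C with binary parents A, B
% (false==1, true==2).
CPT = zeros(2,2,2);
CPT(:,:,1) = [1 1; 1 0];  % P(C=off|A,B): off unless both parents are on
CPT(:,:,2) = [0 0; 0 1];  % P(C=on |A,B): on only if A and B are both on
% To simulate noise, 'flip' the output with some small probability:
noise = 0.01;
noisyCPT = (1-noise)*CPT + noise*CPT(:,:,[2 1]);  % swap on/off slices
```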
Softmax nodes

If we have a discrete node with a continuous parent, we can define its CPD using a softmax function (also known as the multinomial logit function). This acts like a soft thresholding operator, and is defined as follows:

Pr(Q=i | X=x) = exp(w(:,i)'*x + b(i)) / sum_j exp(w(:,j)'*x + b(j))

The parameters of a softmax node, w(:,i) and b(i), i=1..|Q|, have the following interpretation: w(:,i)-w(:,j) is the normal vector to the decision boundary between classes i and j, and b(i)-b(j) is its offset (bias). For example, suppose X is a 2-vector, and Q is binary. Then

w = [1 -1;
     0  0];
b = [0 0];

means class 1 are points in the 2D plane with positive x-coordinate, and class 2 are points in the 2D plane with negative x-coordinate. If w has large magnitude, the decision boundary is sharp, otherwise it is soft. In the special case that Q is binary (0/1), the softmax function reduces to the logistic (sigmoid) function.
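To make the decision-boundary interpretation concrete, we can evaluate the softmax by hand for the w and b above (the test point x is an arbitrary choice):

```matlab
w = [1 -1; 0 0]; b = [0 0];
x = [2; 0.5];                 % a point with positive x-coordinate
a = w'*x + b';                % per-class activations: [2; -2]
p = exp(a) / sum(exp(a));     % softmax probabilities
% p(1) > p(2): the point is assigned to class 1, as expected
```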
Fitting a softmax function can be done using the iteratively reweighted least squares (IRLS) algorithm. We use the implementation from Netlab. Note that since the softmax distribution is not in the exponential family, it does not have finite sufficient statistics, and hence we must store all the training data in uncompressed form. If this takes too much space, one should use online (stochastic) gradient descent (not implemented in BNT).

If a softmax node also has discrete parents, we use a different set of w/b parameters for each combination of parent values, as in the conditional linear Gaussian CPD. This feature was implemented by Pierpaolo Brutti. He is currently extending it so that discrete parents can be treated as if they were continuous, by adding indicator variables to the X vector.

We will see an example of softmax nodes below.
Neural network nodes

Pierpaolo Brutti has implemented the mlp_CPD class, which uses a multi-layer perceptron to implement a mapping from continuous parents to discrete children, similar to the softmax function. (If there are also discrete parents, it creates a mixture of MLPs.) It uses code from Netlab. This is work in progress.

Root nodes

A root node has no parents and no parameters; it can be used to model an observed, exogeneous input variable, i.e., one which is "outside" the model. This is useful for conditional density models. We will see an example of root nodes below.
Gaussian nodes
We now consider a distribution suitable for the continuous-valued nodes. Suppose the node is called Y, its continuous parents (if any) are called X, and its discrete parents (if any) are called Q. The distribution on Y is defined as follows:

no parents:                 Y ~ N(mu, Sigma)
cts parents:                Y|X=x ~ N(mu + W*x, Sigma)
discrete parents:           Y|Q=i ~ N(mu(:,i), Sigma(:,:,i))
cts and discrete parents:   Y|X=x,Q=i ~ N(mu(:,i) + W(:,:,i)*x, Sigma(:,:,i))

where N(mu, Sigma) denotes a Normal distribution with mean mu and covariance Sigma. Let |X|, |Y| and |Q| denote the sizes of X, Y and Q respectively. If there are no discrete parents, |Q|=1; if there is more than one, then |Q| = a vector of the sizes of each discrete parent. If there are no continuous parents, |X|=0; if there is more than one, then |X| = the sum of their sizes. Then mu is a |Y|*|Q| vector, Sigma is a |Y|*|Y|*|Q| positive semi-definite matrix, and W is a |Y|*|X|*|Q| regression (weight) matrix.
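As a small illustration of the conditional linear Gaussian case (the last line above), with hypothetical sizes |Y|=2, |X|=3 and a single discrete parent with |Q|=2:

```matlab
% Parameters for Y | X=x, Q=i ~ N(mu(:,i) + W(:,:,i)*x, Sigma(:,:,i))
mu = zeros(2,2);                 % one mean column per discrete state
W  = randn(2,3,2);               % one regression matrix per discrete state
Sigma = repmat(eye(2), [1 1 2]); % one covariance per discrete state
x = randn(3,1); i = 2;
m = mu(:,i) + W(:,:,i)*x;        % conditional mean of Y given X=x, Q=i
% Y is then distributed as N(m, Sigma(:,:,i))
```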
We can create a Gaussian node with random parameters as follows.

bnet.CPD{i} = gaussian_CPD(bnet, i);

We can specify the value of one or more of the parameters as in the following example, in which |Y|=2, and |Q|=1.

bnet.CPD{i} = gaussian_CPD(bnet, i, 'mean', [0; 0], 'weights', randn(Y,X), 'cov', eye(Y));

We will see an example of conditional linear Gaussian nodes below.

When learning Gaussians from data, it is helpful to ensure the data has a small magnitude (see e.g., KPMstats/standardize) to prevent numerical problems. Unless you have a lot of data, it is also a very good idea to use diagonal instead of full covariance matrices. (BNT does not currently support spherical covariances, although it would be easy to add, since KPMstats/clg_Mstep supports this option; you would just need to modify gaussian_CPD/update_ess to accumulate weighted inner products.)
Other continuous distributions

Currently BNT does not support any CPDs for continuous nodes other than the Gaussian. However, you can use a mixture of Gaussians to approximate other continuous distributions. We will see an example of this with the IFA model below.

Generalized linear model nodes

In the future, we may incorporate some of the functionality of glmlab into BNT.

Classification/regression tree nodes

We plan to add classification and regression trees to define CPDs for discrete and continuous nodes, respectively. Trees have many advantages: they are easy to interpret, they can do feature selection, they can handle discrete and continuous inputs, they do not make strong assumptions about the form of the distribution, the number of parameters can grow in a data-dependent way (i.e., they are semi-parametric), they can handle missing data, etc. However, they are not yet implemented.
Summary of CPD types

We list all the different types of CPDs supported by BNT. For each CPD, we specify if the child and parents can be discrete (D) or continuous (C) (Binary (B) nodes are a special case). We also specify which methods each class supports. If a method is inherited, the name of the parent class is mentioned. If a parent class calls a child method, this is mentioned.

The CPD_to_CPT method converts a CPD to a table; this requires that the child and all parents are discrete. The CPT might be exponentially big... convert_to_table evaluates a CPD with evidence, and represents the resulting potential as an array. This requires that the child is discrete, and any continuous parents are observed. convert_to_pot evaluates a CPD with evidence, and represents the resulting potential as a dpot, gpot, cgpot or upot, as requested. (d = discrete, g = Gaussian, cg = conditional Gaussian, u = utility).

When we sample a node, all the parents are observed. When we compute the (log) probability of a node, all the parents and the child are observed.

We also specify if the parameters are learnable. For learning with EM, we require the methods reset_ess, update_ess and maximize_params. For learning from fully observed data, we require the method learn_params. By default, all classes inherit this from generic_CPD, which simply calls update_ess N times, once for each data case, followed by maximize_params, i.e., it is like EM, without the E step. Some classes implement a batch formula, which is quicker.

Bayesian learning means computing a posterior over the parameters given fully observed data.
Pearl means we implement the methods compute_pi and compute_lambda_msg, used by pearl_inf_engine, which runs on directed graphs. belprop_inf_engine only needs convert_to_pot. The Pearl methods can exploit special properties of the CPDs for computing the messages efficiently, whereas belprop does not.

The only method implemented by generic_CPD is adjustable_CPD, which is not shown, since it is not very interesting.
[The summary table cannot be fully recovered from this copy. It lists each CPD class — boolean, deterministic, Gaussian, gmux (multiplexer), MLP (multi-layer perceptron), noisy-or, root, softmax, tabular, and the generic/virtual base class — together with the allowed child/parent types (C, D or C/D; root has no parents and no params), and, for each of the methods above (CPD_to_CPT, convert_to_table, sample, prob, learning), whether the class implements it (Y/N), inherits it (e.g. from discrete), or calls another method.]
Example models

Gaussian mixture models

Richard W. DeVaul has made a detailed tutorial on how to fit mixtures of Gaussians using BNT. Available here.

PCA, ICA, and all that

In Figure (a) below, we show how Factor Analysis can be thought of as a graphical model. Here, X has an N(0,I) prior, and Y|X=x ~ N(mu + Wx, Psi), where Psi is diagonal and W is called the "factor loading matrix". Since the noise on both X and Y is diagonal, the components of these vectors are uncorrelated, and hence can be represented as individual scalar nodes, as we show in (b). (This is useful if parts of the observations on the Y vector are occasionally missing.) We usually take k = |X| << |Y| = D, so the model tries to explain many observations using a low-dimensional subspace.

(a) (b) (c) (d)

We can create this model in BNT as follows.
ns = [k D];
dag = zeros(2,2);
dag(1,2) = 1;
bnet = mk_bnet(dag, ns, 'discrete', []);
bnet.CPD{1} = gaussian_CPD(bnet, 1, 'mean', zeros(k,1), 'cov', eye(k), ...
   'cov_type', 'diag', 'clamp_mean', 1, 'clamp_cov', 1);
bnet.CPD{2} = gaussian_CPD(bnet, 2, 'mean', zeros(D,1), 'cov', diag(Psi0), 'weights', W0, ...
   'cov_type', 'diag', 'clamp_mean', 1);

The root node is clamped to the N(0,I) distribution, so that we will not update these parameters during learning. The mean of the leaf node is clamped to 0, since we assume the data has been centered (had its mean subtracted off); this is just for simplicity. Finally, the covariance of the leaf node is constrained to be diagonal. W0 and Psi0 are the initial parameter guesses.

We can fit this model (i.e., estimate its parameters in a maximum likelihood (ML) sense) using EM, as we explain below. Not surprisingly, the ML estimates for mu and Psi turn out to be identical to the sample mean and variance, which can be computed directly as

mu_ML = mean(data);
Psi_ML = diag(cov(data));

Note that W can only be identified up to a rotation matrix, because of the spherical symmetry of the source.

If we restrict Psi to be spherical, i.e., Psi = sigma*I, there is a closed form solution for W as well, i.e., we do not need to use EM. In particular, W contains the first |X| eigenvectors of the sample covariance matrix, with scalings determined by the eigenvalues and sigma. Classical PCA can be obtained by taking the sigma -> 0 limit. For details, see

"EM algorithms for PCA and SPCA", Sam Roweis, NIPS 97. (Matlab software)
"Mixtures of probabilistic principal component analyzers", Tipping and Bishop, Neural Computation 11(2):443-482, 1999.
By adding a hidden discrete variable, we can create mixtures of FA models, as shown in (c). Now we can explain the data using a set of subspaces. We can create this model in BNT as follows.

ns = [M k D];
dag = zeros(3);
dag(1,3) = 1;
dag(2,3) = 1;
bnet = mk_bnet(dag, ns, 'discrete', 1);
bnet.CPD{1} = tabular_CPD(bnet, 1, Pi0);
bnet.CPD{2} = gaussian_CPD(bnet, 2, 'mean', zeros(k,1), 'cov', eye(k), 'cov_type', 'diag', ...
   'clamp_mean', 1, 'clamp_cov', 1);
bnet.CPD{3} = gaussian_CPD(bnet, 3, 'mean', Mu0', 'cov', repmat(diag(Psi0), [1 1 M]), ...
   'weights', W0, 'cov_type', 'diag', 'tied_cov', 1);

Notice how the covariance matrix for Y is the same for all values of Q; that is, the noise level in each subspace is assumed the same. However, we allow the offset, mu, to vary. For details, see

The EM Algorithm for Mixtures of Factor Analyzers, Ghahramani, Z. and Hinton, G.E. (1996), University of Toronto Technical Report CRG-TR-96-1. (Matlab software)
"Mixtures of probabilistic principal component analyzers", Tipping and Bishop, Neural Computation 11(2):443-482, 1999.

I have included Zoubin's specialized MFA code (with his permission) with the toolbox, so you can check that BNT gives the same results: see 'BNT/examples/static/mfa1.m'.

Independent Factor Analysis (IFA) generalizes FA by allowing a non-Gaussian prior on each component of X. (Note that we can approximate a non-Gaussian prior using a mixture of Gaussians.) This means that the likelihood function is no longer rotationally invariant, so we can uniquely identify W and the hidden sources X. IFA also allows a non-diagonal Psi (i.e. correlations between the components of Y). We recover classical Independent Components Analysis (ICA) in the Psi -> 0 limit, and by assuming that |X|=|Y|, so that the weight matrix W is square and invertible. For details, see

Independent Factor Analysis, H. Attias, Neural Computation 11:803-851, 1998.
Mixtures of experts

As an example of the use of the softmax function, we introduce the Mixture of Experts model. As before, circles denote continuous-valued nodes, squares denote discrete nodes, clear means hidden, and shaded means observed.
X is the observed input, Y is the output, and the Q nodes are hidden "gating" nodes, which select the appropriate set of parameters for Y. During training, Y is assumed observed, but for testing, the goal is to predict Y given X. Note that this is a conditional density model, so we don't associate any parameters with X. Hence X's CPD will be a root CPD, which is a way of modelling exogenous nodes. If the output is a continuous-valued quantity, we assume the "experts" are linear-regression units, and set Y's CPD to linear-Gaussian. If the output is discrete, we set Y's CPD to a softmax function. The Q CPDs will always be softmax functions.

As a concrete example, consider the mixture of experts model where X and Y are scalars, and Q is binary. This is just piecewise linear regression, where we have two line segments, i.e.,

We can create this model with random parameters as follows. (This code is bundled in BNT/examples/static/mixexp2.m.)

X = 1;
Q = 2;
Y = 3;
dag = zeros(3,3);
dag(X,[Q Y]) = 1;
dag(Q,Y) = 1;
ns = [1 2 1]; % make X and Y scalars, and have 2 experts
onodes = [1 3];
bnet = mk_bnet(dag, ns, 'discrete', 2, 'observed', onodes);

rand('state', 0);
randn('state', 0);
bnet.CPD{1} = root_CPD(bnet, 1);
bnet.CPD{2} = softmax_CPD(bnet, 2);
bnet.CPD{3} = gaussian_CPD(bnet, 3);

Now let us fit this model using EM. First we load the data (1000 training cases) and plot them.

data = load('/examples/static/Misc/mixexp_data.txt', '-ascii');
plot(data(:,1), data(:,2), '.');
This is what the model looks like before training. (Thanks to Thomas Hofman for writing this plotting routine.)

Now let's train the model, and plot the final performance. (We will discuss how to train models in more detail below.)

ncases = size(data, 1); % each row of data is a training case
cases = cell(3, ncases);
cases([1 3], :) = num2cell(data'); % each column of cases is a training case
engine = jtree_inf_engine(bnet);
max_iter = 20;
[bnet2, LLtrace] = learn_params_em(engine, cases, max_iter);

(We specify which nodes will be observed when we create the engine. Hence BNT knows that the hidden nodes are all discrete. For complex models, this can lead to a significant speedup.) Below we show what the model looks like after 16 iterations of EM (with 100 IRLS iterations per M step), when it converged using the default convergence tolerance (that the fractional change in the log-likelihood be less than 1e-3). Before learning, the log-likelihood was -322.927442; afterwards, it was -13.728778.
(See BNT/examples/static/mixexp2.m for details of the code.)

Hierarchical mixtures of experts

A hierarchical mixture of experts (HME) extends the mixture of experts model by having more than one hidden node. A two-level example is shown below, along with its more traditional representation as a neural network. This is like a (balanced) probabilistic decision tree of height 2.

Pierpaolo Brutti has written an extensive set of routines for HMEs, which are bundled with BNT: see the examples/static/HME directory. These routines allow you to choose the number of hidden (gating) layers, and the form of the experts (softmax or MLP). See the file hmemenu, which provides a demo. For example, the figure below shows the decision boundaries learned for a ternary classification problem, using a 2-level HME with softmax gates and softmax experts; the training set is on the left, the testing set on the right.
For more details, see the following:

Hierarchical mixtures of experts and the EM algorithm, M. I. Jordan and R. A. Jacobs. Neural Computation, 6, 181-214, 1994.
David Martin's matlab code for HME
Why the logistic function? A tutorial discussion on probabilities and neural networks. M. I. Jordan. MIT Computational Cognitive Science Report 9503, August 1995.
"Generalized Linear Models", McCullagh and Nelder, Chapman and Hall, 1983.
"Improved learning algorithms for mixtures of experts in multiclass classification". K. Chen, L. Xu, H. Chi. Neural Networks (1999) 12: 1229-1252.
Classification Using Hierarchical Mixtures of Experts, S. R. Waterhouse and A. J. Robinson. In Proc. IEEE Workshop on Neural Network for Signal Processing IV (1994), pp. 177-186.
Localized mixtures of experts, P. Moerland, 1998.
"Nonlinear gated experts for time series", A. S. Weigend and M. Mangeas, 1995.
QMR

Bayes nets originally arose out of an attempt to add probabilities to expert systems, and this is still the most common use for BNs. A famous example is QMR-DT, a decision-theoretic reformulation of the Quick Medical Reference (QMR) model.

Here, the top layer represents hidden disease nodes, and the bottom layer represents observed symptom nodes. The goal is to infer the posterior probability of each disease given all the symptoms (which can be present, absent or unknown). Each node in the top layer has a Bernoulli prior (with a low prior probability that the disease is present). Since each node in the bottom layer has a high fan-in, we use a noisy-OR parameterization; each disease has an independent chance of causing each symptom. The real QMR-DT model is copyright, but we can create a random QMR-like model as follows.

function bnet = mk_qmr_bnet(G, inhibit, leak, prior)
% MK_QMR_BNET Make a QMR model
% bnet = mk_qmr_bnet(G, inhibit, leak, prior)
%
% G(i,j) = 1 iff there is an arc from disease i to finding j
% inhibit(i,j) = inhibition probability on i->j arc
% leak(j) = inhibition prob. on leak->j arc
% prior(i) = prob. disease i is on

[Ndiseases Nfindings] = size(inhibit);
N=Ndiseases+Nfindings;
finding_node=Ndiseases+1:N;
ns=2*ones(1,N);
dag=zeros(N,N);
dag(1:Ndiseases,finding_node)=G;
bnet=mk_bnet(dag,ns,'observed',finding_node);
ford=1:Ndiseases
CPT=[1prior(d)prior(d)];
bnet.CPD{d}=tabular_CPD(bnet,d,CPT');
end
fori=1:Nfindings
fnode=finding_node(i);
ps=parents(G,i);
bnet.CPD{fnode}=noisyor_CPD(bnet,fnode,leak(i),inhibit(ps,i));
end
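The noisy-OR assumption used here is easy to state directly: a finding stays absent only if the leak and every active parent are all independently inhibited. A minimal Python sketch of this computation (the function and variable names are illustrative, not part of BNT):

```python
def noisy_or_off(leak_inhibit, inhibit, diseases_on):
    """P(finding = absent | disease states) under a noisy-OR CPD.

    leak_inhibit: inhibition probability on the leak arc.
    inhibit: dict mapping parent index -> inhibition probability.
    diseases_on: indices of the parents that are "on".
    The finding is absent iff every active cause is inhibited."""
    p_off = leak_inhibit
    for i in diseases_on:
        p_off *= inhibit[i]
    return p_off

# Two active diseases, each inhibited with prob 0.5, leak inhibited with prob 0.9:
# P(absent) = 0.9 * 0.5 * 0.5 = 0.225, so P(present) = 0.775.
```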
In the file BNT/examples/static/qmr1, we create a random bipartite graph G, with 5 diseases and 10 findings, and random parameters. (In general, to create a random dag, use 'mk_random_dag'.) We can visualize the resulting graph structure using the methods discussed below, with the following results:

Now let us put some random evidence on all the leaves except the very first and very last, and compute the disease posteriors.

pos = 2:floor(Nfindings/2);
neg = (pos(end)+1):(Nfindings-1);
onodes = myunion(pos, neg);
evidence = cell(1, N);
evidence(findings(pos)) = num2cell(repmat(2, 1, length(pos)));
evidence(findings(neg)) = num2cell(repmat(1, 1, length(neg)));

engine = jtree_inf_engine(bnet);
[engine, ll] = enter_evidence(engine, evidence);
post = zeros(1, Ndiseases);
for i=diseases(:)'
  m = marginal_nodes(engine, i);
  post(i) = m.T(2);
end

Junction tree can be quite slow on large QMR models. Fortunately, it is possible to exploit properties of the noisy-OR function to speed up exact inference using an algorithm called quickscore, discussed below.
Conditional Gaussian models

A conditional Gaussian model is one in which, conditioned on all the discrete nodes, the distribution over the remaining (continuous) nodes is multivariate Gaussian. This means we can have arcs from discrete (D) to continuous (C) nodes, but not vice versa. (We are allowed C->D arcs if the continuous nodes are observed, as in the mixture of experts model, since this distribution can be represented with a discrete potential.)

We now give an example of a CG model, from the paper "Propagation of Probabilities, Means and Variances in Mixed Graphical Association Models", Steffen Lauritzen, JASA 87(420):1098-1108, 1992 (reprinted in the book "Probabilistic Networks and Expert Systems", R. G. Cowell, A. P. Dawid, S. L. Lauritzen and D. J. Spiegelhalter, Springer, 1999.)
Specifying the graph

Consider the model of waste emissions from an incinerator plant shown below. We follow the standard convention that shaded nodes are observed, clear nodes are hidden. We also use the non-standard convention that square nodes are discrete (tabular) and round nodes are Gaussian.

We can create this model as follows.

F = 1; W = 2; E = 3; B = 4; C = 5; D = 6; Min = 7; Mout = 8; L = 9;
n = 9;
dag = zeros(n);
dag(F, E) = 1;
dag(W, [E Min D]) = 1;
dag(E, D) = 1;
dag(B, [C D]) = 1;
dag(D, [L Mout]) = 1;
dag(Min, Mout) = 1;

% node sizes - all cts nodes are scalar, all discrete nodes are binary
ns = ones(1, n);
dnodes = [F W B];
cnodes = mysetdiff(1:n, dnodes);
ns(dnodes) = 2;

bnet = mk_bnet(dag, ns, 'discrete', dnodes);

'dnodes' is a list of the discrete nodes; 'cnodes' is the continuous nodes. 'mysetdiff' is a faster version of the built-in 'setdiff'.
Specifying the parameters

The parameters of the discrete nodes can be specified as follows.

bnet.CPD{B} = tabular_CPD(bnet, B, 'CPT', [0.85 0.15]); % 1=stable, 2=unstable
bnet.CPD{F} = tabular_CPD(bnet, F, 'CPT', [0.95 0.05]); % 1=intact, 2=defect
bnet.CPD{W} = tabular_CPD(bnet, W, 'CPT', [2/7 5/7]);   % 1=industrial, 2=household

The parameters of the continuous nodes can be specified as follows.

bnet.CPD{E} = gaussian_CPD(bnet, E, 'mean', [-3.9 -0.4 -3.2 -0.5], ...
    'cov', [0.00002 0.0001 0.00002 0.0001]);
bnet.CPD{D} = gaussian_CPD(bnet, D, 'mean', [6.5 6.0 7.5 7.0], ...
    'cov', [0.03 0.04 0.1 0.1], 'weights', [1 1 1 1]);
bnet.CPD{C} = gaussian_CPD(bnet, C, 'mean', [-2 -1], 'cov', [0.1 0.3]);
bnet.CPD{L} = gaussian_CPD(bnet, L, 'mean', 3, 'cov', 0.25, 'weights', -0.5);
bnet.CPD{Min} = gaussian_CPD(bnet, Min, 'mean', [0.5 -0.5], 'cov', [0.01 0.005]);
bnet.CPD{Mout} = gaussian_CPD(bnet, Mout, 'mean', 0, 'cov', 0.002, 'weights', [1 1]);
Inference

First we compute the unconditional marginals.

engine = jtree_inf_engine(bnet);
evidence = cell(1, n);
[engine, ll] = enter_evidence(engine, evidence);
marg = marginal_nodes(engine, E);

'marg' is a structure that contains the fields 'mu' and 'Sigma', which contain the mean and (co)variance of the marginal on E. In this case, they are both scalars. Let us check they match the published figures (to 2 decimal places).

tol = 1e-2;
assert(approxeq(marg.mu, -3.25, tol));
assert(approxeq(sqrt(marg.Sigma), 0.709, tol));
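As a sanity check on the first assert, the unconditional mean of E is just a mixture mean over the four (F,W) configurations, weighting each conditional mean by the prior probability of that configuration. A quick Python sketch, assuming the parameter values listed above (with F varying fastest in the mean vector):

```python
def mixture_mean(weights, means):
    """Mean of a mixture: sum_k w_k * mu_k."""
    return sum(w * m for w, m in zip(weights, means))

# Priors: F = [0.95 0.05], W = [2/7 5/7]; E means indexed by (F,W).
w = [0.95 * 2/7, 0.05 * 2/7, 0.95 * 5/7, 0.05 * 5/7]
mu = [-3.9, -0.4, -3.2, -0.5]
print(round(mixture_mean(w, mu), 2))   # close to the published -3.25
```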
We can compute the other posteriors similarly. Now let us add some evidence.

evidence = cell(1, n);
evidence{W} = 1; % industrial
evidence{L} = 1.1;
evidence{C} = -0.9;
[engine, ll] = enter_evidence(engine, evidence);

Now we find

marg = marginal_nodes(engine, E);
assert(approxeq(marg.mu, -3.8983, tol));
assert(approxeq(sqrt(marg.Sigma), 0.0763, tol));
We can also compute the joint probability on a set of nodes. For example, P(D, Mout | evidence) is a 2D Gaussian:

marg = marginal_nodes(engine, [D Mout])
marg =
    domain: [6 8]
        mu: [2x1 double]
     Sigma: [2x2 double]
         T: 1.0000

The mean is

marg.mu
ans =
    3.6077
    4.1077

and the covariance matrix is

marg.Sigma
ans =
    0.1062    0.1062
    0.1062    0.1182

It is easy to visualize this posterior using standard Matlab plotting functions, e.g.,

gaussplot2d(marg.mu, marg.Sigma);

produces the following picture.

The T field indicates that the mixing weight of this Gaussian component is 1.0. If the joint contains discrete and continuous variables, the result will be a mixture of Gaussians, e.g.,

marg = marginal_nodes(engine, [F E])
    domain: [1 3]
        mu: [-3.9000 -0.4003]
     Sigma: [1x1x2 double]
         T: [0.9995 4.7373e-04]

The interpretation is Sigma(i,j,k) = Cov[E(i) E(j) | F=k]. In this case, E is a scalar, so i = j = 1; k specifies the mixture component.
We saw in the sprinkler network that BNT sets the effective size of observed discrete nodes to 1, since they only have one legal value. For continuous nodes, BNT sets their length to 0, since they have been reduced to a point. For example,

marg = marginal_nodes(engine, [B C])
    domain: [4 5]
        mu: []
     Sigma: []
         T: [0.0123 0.9877]

It is simple to post-process the output of marginal_nodes. For example, the file BNT/examples/static/cg1 sets the mu term of observed nodes to their observed value, and the Sigma term to 0 (since observed nodes have no variance).

Note that the implemented version of the junction tree is numerically unstable when using CG potentials (which is why, in the example above, we only required our answers to agree with the published ones to 2 decimal places.) This is why you might want to use stab_cond_gauss_inf_engine, implemented by Shan Huang. This is described in

"Stable Local Computation with Conditional Gaussian Distributions", S. Lauritzen and F. Jensen, Tech Report R-99-2014, Dept. Math. Sciences, Aalborg Univ., 1999.

However, even the numerically stable version can be computationally intractable if there are many hidden discrete nodes, because the number of mixture components grows exponentially, e.g., in a switching linear dynamical system. In general, one must resort to approximate inference techniques: see the discussion on inference engines below.
Other hybrid models

When we have C->D arcs, where C is hidden, we need to use approximate inference. One approach (not implemented in BNT) is described in

"A Variational Approximation for Bayesian Networks with Discrete and Continuous Latent Variables", K. Murphy, UAI 99.

Of course, one can always use sampling methods for approximate inference in such models.
Parameter Learning

The parameter estimation routines in BNT can be classified into 4 types, depending on whether the goal is to compute a full (Bayesian) posterior over the parameters or just a point estimate (e.g., Maximum Likelihood or Maximum A Posteriori), and whether all the variables are fully observed or there is missing data/hidden variables (partial observability).

        Full obs             Partial obs
Point   learn_params         learn_params_em
Bayes   bayes_update_params  not yet supported
Loading data from a file

To load numeric data from an ASCII text file called 'dat.txt', where each row is a case and columns are separated by white space, such as

011979 1626.5 0.0
021979 1367.0 0.0
...

you can use

data = load('dat.txt');

or

load dat.txt -ascii

In the latter case, the data is stored in a variable called 'dat' (the file name minus the extension). Alternatively, suppose the data is stored in a .csv file (has commas separating the columns, and contains a header line), such as

header info goes here
ORD,011979,1626.5,0.0
DSM,021979,1367.0,0.0
...
You can load this using

[a,b,c,d] = textread('dat.txt', '%s %d %f %f', 'delimiter', ',', 'headerlines', 1);

If your file is not in either of these formats, you can either use Perl to convert it to this format, or use the Matlab scanf command. Type 'help iofun' for more information on Matlab's file functions.

BNT learning routines require data to be stored in a cell array. data{i,m} is the value of node i in case (example) m, i.e., each column is a case. If node i is not observed in case m (missing value), set data{i,m} = []. (Not all the learning routines can cope with such missing values, however.) In the special case that all the nodes are observed and are scalar-valued (as opposed to vector-valued), the data can be stored in a matrix (as opposed to a cell array).

Suppose, as in the mixture of experts example, that we have 3 nodes in the graph: X(1) is the observed input, X(3) is the observed output, and X(2) is a hidden (gating) node. We can create the dataset as follows.

data = load('dat.txt');
ncases = size(data, 1);
cases = cell(3, ncases);
cases([1 3], :) = num2cell(data');

Notice how we transposed the data, to convert rows into columns. Also, cases{2,m} = [] for all m, since X(2) is always hidden.
Maximum likelihood parameter estimation from complete data

As an example, let's generate some data from the sprinkler network, randomize the parameters, and then try to recover the original model. First we create some training data using forwards sampling.

samples = cell(N, nsamples);
for i=1:nsamples
  samples(:,i) = sample_bnet(bnet);
end

samples{j,i} contains the value of the j'th node in case i. sample_bnet returns a cell array because, in general, each node might be a vector of different length. In this case, all nodes are discrete (and hence scalars), so we could have used a regular array instead (which can be quicker):

data = cell2num(samples);

Now we create a network with random parameters. (The initial values of bnet2 don't matter in this case, since we can find the globally optimal MLE independent of where we start.)

% Make a tabula rasa
bnet2 = mk_bnet(dag, node_sizes);
seed = 0;
rand('state', seed);
bnet2.CPD{C} = tabular_CPD(bnet2, C);
bnet2.CPD{R} = tabular_CPD(bnet2, R);
bnet2.CPD{S} = tabular_CPD(bnet2, S);
bnet2.CPD{W} = tabular_CPD(bnet2, W);

Finally, we find the maximum likelihood estimates of the parameters.

bnet3 = learn_params(bnet2, samples);

To view the learned parameters, we use a little Matlab hackery.

CPT3 = cell(1, N);
for i=1:N
  s = struct(bnet3.CPD{i});  % violate object privacy
  CPT3{i} = s.CPT;
end

Here are the parameters learned for node 4.

dispcpt(CPT3{4})
1 1 : 1.0000 0.0000
2 1 : 0.2000 0.8000
1 2 : 0.2273 0.7727
2 2 : 0.0000 1.0000

So we see that the learned parameters are fairly close to the "true" ones, which we display below.

dispcpt(CPT{4})
1 1 : 1.0000 0.0000
2 1 : 0.1000 0.9000
1 2 : 0.1000 0.9000
2 2 : 0.0100 0.9900

We can get better results by using a larger training set, or using informative priors (see below).
Parameter priors

Currently, only tabular CPDs can have priors on their parameters. The conjugate prior for a multinomial is the Dirichlet. (For binary random variables, the multinomial is the same as the Bernoulli, and the Dirichlet is the same as the Beta.)

The Dirichlet has a simple interpretation in terms of pseudo counts. If we let N_ijk = the num. times X_i=k and Pa_i=j occurs in the training set, where Pa_i are the parents of X_i, then the maximum likelihood (ML) estimate is T_ijk = N_ijk / N_ij (where N_ij = sum_k' N_ijk'), which will be 0 if N_ijk = 0. To prevent us from declaring that (X_i=k, Pa_i=j) is impossible just because this event was not seen in the training set, we can pretend we saw value k of X_i, for each value j of Pa_i, some number (alpha_ijk) of times in the past. The MAP (maximum a posteriori) estimate is then

T_ijk = (N_ijk + alpha_ijk) / (N_ij + alpha_ij)

and is never 0 if all alpha_ijk > 0. For example, consider the network A->B, where A is binary and B has 3 values. A uniform prior for B has the form

      B=1  B=2  B=3
A=1    1    1    1
A=2    1    1    1

which can be created using

tabular_CPD(bnet, i, 'prior_type', 'dirichlet', 'dirichlet_type', 'unif');

This prior does not satisfy the likelihood equivalence principle, which says that Markov equivalent models should have the same marginal likelihood. A prior that does satisfy this principle is shown below. Heckerman (1995) calls this the BDeu prior (likelihood equivalent uniform Bayesian Dirichlet).

      B=1  B=2  B=3
A=1   1/6  1/6  1/6
A=2   1/6  1/6  1/6

where we put N/(q*r) in each bin; N is the equivalent sample size, r = |A|, q = |B|. This can be created as follows

tabular_CPD(bnet, i, 'prior_type', 'dirichlet', 'dirichlet_type', 'BDeu');

Here, 1 is the equivalent sample size, and is the strength of the prior. You can change this using

tabular_CPD(bnet, i, 'prior_type', 'dirichlet', 'dirichlet_type', ...
    'BDeu', 'dirichlet_weight', 10);
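The pseudo-count arithmetic behind these priors can be sketched in a few lines of Python (illustrative names, not BNT code; the inputs are the per-state counts and Dirichlet hyperparameters for one parent configuration j):

```python
def map_cpt_row(counts, alpha):
    """MAP estimate for one row of a CPT:
    T_ijk = (N_ijk + alpha_ijk) / (N_ij + alpha_ij),
    where N_ij and alpha_ij are sums over the states k."""
    total = sum(counts) + sum(alpha)
    return [(n + a) / total for n, a in zip(counts, alpha)]

# With counts [0, 2, 2] and a uniform Dirichlet(1,1,1) prior,
# no state gets probability zero:
print(map_cpt_row([0, 2, 2], [1, 1, 1]))   # approximately [1/7, 3/7, 3/7]
```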
(Sequential) Bayesian parameter updating from complete data

If we use conjugate priors and have fully observed data, we can compute the posterior over the parameters in batch form as follows.

cases = sample_bnet(bnet, nsamples);
bnet = bayes_update_params(bnet, cases);
LL = log_marg_lik_complete(bnet, cases);

bnet.CPD{i}.prior contains the new Dirichlet pseudo counts, and bnet.CPD{i}.CPT is set to the mean of the posterior (the normalized counts). (Hence if the initial pseudo counts are 0, bayes_update_params and learn_params will give the same result.)

We can compute the same result sequentially (on-line) as follows.

LL = 0;
for m=1:nsamples
  LL = LL + log_marg_lik_complete(bnet, cases(:,m));
  bnet = bayes_update_params(bnet, cases(:,m));
end
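The batch/sequential equivalence is easiest to see in the scalar case: with a Beta prior on a Bernoulli parameter, each case contributes its predictive probability to the log marginal likelihood and one pseudo count to the posterior. A minimal Python sketch of one such update (illustrative, not BNT code):

```python
def bernoulli_update(alpha, beta, x):
    """One sequential Bayesian update for a Bernoulli parameter
    with a Beta(alpha, beta) prior. Returns the predictive
    probability of observation x and the updated pseudo counts."""
    p1 = alpha / (alpha + beta)          # predictive P(x = 1)
    pred = p1 if x == 1 else 1.0 - p1
    if x == 1:
        alpha += 1.0
    else:
        beta += 1.0
    return pred, alpha, beta

# Starting from a uniform Beta(1,1) prior and observing two heads:
p1, a, b = bernoulli_update(1.0, 1.0, 1)   # pred = 1/2
p2, a, b = bernoulli_update(a, b, 1)       # pred = 2/3
```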
The file BNT/examples/static/StructLearn/model_select1 has an example of sequential model selection which uses the same idea. We generate data from the model A->B and compute the posterior prob of all 3 dags on 2 nodes: (1) A  B, (2) A <- B, (3) A -> B. Models 2 and 3 are Markov equivalent, and therefore indistinguishable from observational data alone, so we expect their posteriors to be the same (assuming a prior which satisfies likelihood equivalence). If we use random parameters, the "true" model only gets a higher posterior after 2000 trials! However, if we make B a noisy NOT gate, the true model "wins" after 12 trials, as shown below (red = model 1, blue/green (superimposed) represents models 2/3).

The use of marginal likelihood for model selection is discussed in greater detail in the section on structure learning.
Maximum likelihood parameter estimation with missing values (EM)

Now we consider learning when some values are not observed. Let us randomly hide half the values generated from the water sprinkler example.

samples2 = samples;
hide = rand(N, nsamples) > 0.5;
[I,J] = find(hide);
for k=1:length(I)
  samples2{I(k), J(k)} = [];
end

samples2{i,l} is the value of node i in training case l, or [] if unobserved.

Now we will compute the MLEs using the EM algorithm. We need to use an inference algorithm to compute the expected sufficient statistics in the E step; the M (maximization) step is as above.

engine2 = jtree_inf_engine(bnet2);
max_iter = 10;
[bnet4, LLtrace] = learn_params_em(engine2, samples2, max_iter);

LLtrace(i) is the log-likelihood at iteration i. We can plot this as follows:

plot(LLtrace, 'x')

Let's display the results after 10 iterations of EM.

celldisp(CPT4)
CPT4{1} =
    0.6616
    0.3384
CPT4{2} =
    0.6510    0.3490
    0.8751    0.1249
CPT4{3} =
    0.8366    0.1634
    0.0197    0.9803
CPT4{4} =
(:,:,1) =
    0.8276    0.0546
    0.5452    0.1658
(:,:,2) =
    0.1724    0.9454
    0.4548    0.8342

We can get improved performance by using one or more of the following methods:

Increasing the size of the training set.
Decreasing the amount of hidden data.
Running EM for longer.
Using informative priors.
Initialising EM from multiple starting points.

Click here for a discussion of learning Gaussians, which can cause numerical problems.

For a more complete example of learning with EM, see the script BNT/examples/static/learn1.m.
Parameter tying

In networks with repeated structure (e.g., chains and grids), it is common to assume that the parameters are the same at every node. This is called parameter tying, and reduces the amount of data needed for learning.

When we have tied parameters, there is no longer a one-to-one correspondence between nodes and CPDs. Rather, each CPD specifies the parameters for a whole equivalence class of nodes. It is easiest to see this by example. Consider the following hidden Markov model (HMM).

When HMMs are used for semi-infinite processes like speech recognition, we assume the transition matrix P(H(t+1)|H(t)) is the same for all t; this is called a time-invariant or homogeneous Markov chain. Hence hidden nodes 2, 3, ..., T are all in the same equivalence class, say class Hclass. Similarly, the observation matrix P(O(t)|H(t)) is assumed to be the same for all t, so the observed nodes are all in the same equivalence class, say class Oclass. Finally, the prior term P(H(1)) is in a class all by itself, say class H1class. This is illustrated below, where we explicitly represent the parameters as random variables (dotted nodes).

In BNT, we cannot represent parameters as random variables (nodes). Instead, we "hide" the parameters inside one CPD for each equivalence class, and then specify that the other CPDs should share these parameters, as follows.

hnodes = 1:2:2*T;
onodes = 2:2:2*T;
H1class = 1; Hclass = 2; Oclass = 3;
eclass = ones(1, N);
eclass(hnodes(2:end)) = Hclass;
eclass(hnodes(1)) = H1class;
eclass(onodes) = Oclass;
% create dag and ns in the usual way
bnet = mk_bnet(dag, ns, 'discrete', dnodes, 'equiv_class', eclass);

Finally, we define the parameters for each equivalence class:

bnet.CPD{H1class} = tabular_CPD(bnet, hnodes(1)); % prior
bnet.CPD{Hclass}  = tabular_CPD(bnet, hnodes(2)); % transition matrix
if cts_obs
  bnet.CPD{Oclass} = gaussian_CPD(bnet, onodes(1));
else
  bnet.CPD{Oclass} = tabular_CPD(bnet, onodes(1));
end

In general, if bnet.CPD{e} = xxx_CPD(bnet, j), then j should be a member of e's equivalence class; that is, it is not always the case that e == j. You can use bnet.rep_of_eclass(e) to return the representative of equivalence class e. BNT will look up the parents of j to determine the size of the CPT to use. It assumes that this is the same for all members of the equivalence class. Click here for a more complex example of parameter tying.

Note: Normally one would define an HMM as a Dynamic Bayes Net (see the function BNT/examples/dynamic/mk_chmm.m). However, one can define an HMM as a static BN using the function BNT/examples/static/Models/mk_hmm_bnet.m.
Structure learning
Update (9/29/03): Philippe Leray is developing some additional structure learning code on top of BNT. Click here for details.

There are two very different approaches to structure learning: constraint-based and search-and-score. In the constraint-based approach, we start with a fully connected graph, and remove edges if certain conditional independencies are measured in the data. This has the disadvantage that repeated independence tests lose statistical power.

In the more popular search-and-score approach, we perform a search through the space of possible DAGs, and either return the best one found (a point estimate), or return a sample of the models found (an approximation to the Bayesian posterior).

The number of DAGs as a function of the number of nodes, G(n), is super-exponential in n, and is given by the following recurrence:

G(n) = sum_{k=1}^{n} (-1)^(k+1) * C(n,k) * 2^(k(n-k)) * G(n-k),  with G(0) = 1.
The first few values are shown below.

n    G(n)
1    1
2    3
3    25
4    543
5    29,281
6    3,781,503
7    1.1 x 10^9
8    7.8 x 10^11
9    1.2 x 10^15
10   4.2 x 10^18
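This table can be reproduced from the standard recurrence for the number of labeled DAGs, which conditions on the set of "root" nodes with no incoming edges. A short Python sketch (memoization keeps it fast):

```python
from math import comb

def num_dags(n, _memo={0: 1}):
    """Number of DAGs on n labeled nodes, via the recurrence
    G(n) = sum_{k=1}^{n} (-1)^(k+1) * C(n,k) * 2^(k*(n-k)) * G(n-k),
    with G(0) = 1."""
    if n not in _memo:
        _memo[n] = sum((-1)**(k + 1) * comb(n, k) * 2**(k*(n - k)) * num_dags(n - k)
                       for k in range(1, n + 1))
    return _memo[n]

print([num_dags(n) for n in range(1, 6)])   # [1, 3, 25, 543, 29281]
```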
Since the number of DAGs is super-exponential in the number of nodes, we cannot exhaustively search the space, so we either use a local search algorithm (e.g., greedy hill climbing, perhaps with multiple restarts) or a global search algorithm (e.g., Markov Chain Monte Carlo).

If we know a total ordering on the nodes, finding the best structure amounts to picking the best set of parents for each node independently. This is what the K2 algorithm does. If the ordering is unknown, we can search over orderings, which is more efficient than searching over DAGs (Koller and Friedman, 2000).
In addition to the search procedure, we must specify the scoring function. There are two popular choices. The Bayesian score integrates out the parameters, i.e., it is the marginal likelihood of the model. The BIC (Bayesian Information Criterion) is defined as log P(D | theta_hat) - 0.5*d*log(N), where D is the data, theta_hat is the ML estimate of the parameters, d is the number of parameters, and N is the number of data cases. The BIC method has the advantage of not requiring a prior.
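As a concrete illustration of the trade-off BIC encodes, here is the score as a one-line function, plus the complexity penalty at work (a sketch, not the BNT implementation):

```python
from math import log

def bic_score(loglik, num_params, num_cases):
    """BIC = log P(D | theta_hat) - 0.5 * d * log(N)."""
    return loglik - 0.5 * num_params * log(num_cases)

# A richer model must raise the log-likelihood by at least
# 0.5 * (extra params) * log(N) to win; here it does not:
simple = bic_score(-120.0, 4, 100)
rich   = bic_score(-119.0, 10, 100)
assert simple > rich
```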
BIC can be derived as a large sample approximation to the marginal likelihood. (It is also equal to the Minimum Description Length of a model.) However, in practice, the sample size does not need to be very large for the approximation to be good. For example, in the figure below, we plot the ratio between the log marginal likelihood and the BIC score against data-set size; we see that the ratio rapidly approaches 1, especially for non-informative priors. (This plot was generated by the file BNT/examples/static/bic1.m. It uses the water sprinkler BN with BDeu Dirichlet priors with different equivalent sample sizes.)

As with parameter learning, handling missing data/hidden variables is much harder than the fully observed case. The structure learning routines in BNT can therefore be classified into 4 types, analogously to the parameter learning case.

        Full obs           Partial obs
Point   learn_struct_K2    not yet supported
Bayes   learn_struct_mcmc  not yet supported
Markov equivalence

If two DAGs encode the same conditional independencies, they are called Markov equivalent. The set of all DAGs can be partitioned into Markov equivalence classes. Graphs within the same class can have the direction of some of their arcs reversed without changing any of the CI relationships. Each class can be represented by a PDAG (partially directed acyclic graph) called an essential graph or pattern. This specifies which edges must be oriented in a certain direction, and which may be reversed.

When learning graph structure from observational data, the best one can hope to do is to identify the model up to Markov equivalence. To distinguish amongst graphs within the same equivalence class, one needs interventional data: see the discussion on active learning below.
Exhaustive search

The brute-force approach to structure learning is to enumerate all possible DAGs, and score each one. This provides a "gold standard" with which to compare other algorithms. We can do this as follows.

dags = mk_all_dags(N);
score = score_dags(data, ns, dags);

where data(i,m) is the value of node i in case m, and ns(i) is the size of node i. If the DAGs have a lot of families in common, we can cache the sufficient statistics, making this potentially more efficient than scoring the DAGs one at a time. (Caching is not currently implemented, however.)

By default, we use the Bayesian scoring metric, and assume CPDs are represented by tables with BDeu(1) priors. We can override these defaults as follows. If we want to use uniform priors, we can say

params = cell(1, N);
for i=1:N
  params{i} = {'prior', 'unif'};
end
score = score_dags(data, ns, dags, 'params', params);

params{i} is a cell array, containing optional arguments that are passed to the constructor for CPD i.

Now suppose we want to use different node types, e.g., suppose nodes 1 and 2 are Gaussian, and nodes 3 and 4 softmax (both these CPDs can support discrete and continuous parents, which is necessary since all other nodes will be considered as parents). The Bayesian scoring metric currently only works for tabular CPDs, so we will use BIC:

score = score_dags(data, ns, dags, 'discrete', [3 4], 'params', [], ...
    'type', {'gaussian', 'gaussian', 'softmax', 'softmax'}, 'scoring_fn', 'bic')

In practice, one can't enumerate all possible DAGs for N > 5, but one can evaluate any reasonably sized set of hypotheses in this way (e.g., nearest neighbors of your current best guess). Think of this as "computer assisted model refinement" as opposed to de novo learning.
K2

The K2 algorithm (Cooper and Herskovits, 1992) is a greedy search algorithm that works as follows. Initially each node has no parents. It then adds incrementally that parent whose addition most increases the score of the resulting structure. When the addition of no single parent can increase the score, it stops adding parents to the node. Since we are using a fixed ordering, we do not need to check for cycles, and can choose the parents for each node independently.
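Because the ordering fixes each node's candidate parents, the greedy step K2 performs per node can be sketched generically in Python (the scoring function is passed in as an argument; names are illustrative, not BNT code):

```python
def k2_parents(node, predecessors, score, max_fan_in):
    """Greedy K2 parent selection for a single node: starting from
    no parents, repeatedly add the one predecessor that most
    increases score(node, parent_set); stop when nothing helps
    or the fan-in bound is reached."""
    parents = set()
    best = score(node, parents)
    while len(parents) < max_fan_in:
        candidates = [p for p in predecessors if p not in parents]
        if not candidates:
            break
        s, p = max((score(node, parents | {p}), p) for p in candidates)
        if s <= best:
            break               # no single parent improves the score
        parents.add(p)
        best = s
    return parents
```

With a toy score that rewards matching a known parent set, the greedy loop recovers it exactly.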
The original paper used the Bayesian scoring metric with tabular CPDs and Dirichlet priors. BNT generalizes this to allow any kind of CPD, and either the Bayesian scoring metric or BIC, as in the example above. In addition, you can specify an optional upper bound on the number of parents for each node. The file BNT/examples/static/k2demo1.m gives an example of how to use K2. We use the water sprinkler network and sample 100 cases from it as before. Then we see how much data it takes to recover the generating structure:

order = [C S R W];
max_fan_in = 2;
sz = 5:5:100;
for i=1:length(sz)
  dag2 = learn_struct_K2(data(:, 1:sz(i)), node_sizes, order, 'max_fan_in', max_fan_in);
  correct(i) = isequal(dag, dag2);
end

Here are the results.
correct =
  Columns 1 through 12
    0  0  0  0  0  0  0  1  0  1  1  1
  Columns 13 through 20
    1  1  1  1  1  1  1  1

So we see it takes about sz(10) = 50 cases. (BIC behaves similarly, showing that the prior doesn't matter too much.) In general, we cannot hope to recover the "true" generating structure, only one that is in its Markov equivalence class.
Hill climbing

Hill climbing starts at a specific point in space, considers all nearest neighbors, and moves to the neighbor that has the highest score; if no neighbors have higher score than the current point (i.e., we have reached a local maximum), the algorithm stops. One can then restart in another part of the space.

A common definition of "neighbor" is all graphs that can be generated from the current graph by adding, deleting or reversing a single arc, subject to the acyclicity constraint. Other neighborhoods are possible: see "Optimal Structure Identification with Greedy Search", Max Chickering, JMLR 2002.
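The loop itself is generic; only the neighborhood and the score are BN-specific. A minimal Python sketch (illustrative, not BNT's implementation):

```python
def hill_climb(start, neighbors, score):
    """Greedy local search: move to the best-scoring neighbor until
    no neighbor beats the current point (a local maximum)."""
    current, s = start, score(current := start)
    while True:
        nbrs = list(neighbors(current))
        if not nbrs:
            return current
        best = max(nbrs, key=score)
        if score(best) <= s:
            return current      # local maximum; restart elsewhere if desired
        current, s = best, score(best)

# Toy example: maximize -(x-3)^2 over the integers, moving +/-1 at a time.
top = hill_climb(0, lambda x: [x - 1, x + 1], lambda x: -(x - 3)**2)
print(top)   # 3
```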
MCMC

We can use a Markov Chain Monte Carlo (MCMC) algorithm called Metropolis-Hastings (MH) to search the space of all DAGs. The standard proposal distribution is to consider moving to all nearest neighbors in the sense defined above.

The function can be called as in the following example.

[sampled_graphs, accept_ratio] = learn_struct_mcmc(data, ns, 'nsamples', 100, 'burnin', 10);

We can convert our set of sampled graphs to a histogram (empirical posterior over all the DAGs) thus

all_dags = mk_all_dags(N);
mcmc_post = mcmc_sample_to_hist(sampled_graphs, all_dags);

To see how well this performs, let us compute the exact posterior exhaustively.

score = score_dags(data, ns, all_dags);
post = normalise(exp(score)); % assuming uniform structural prior

We plot the results below. (The data set was 100 samples drawn from a random 4 node bnet; see the file BNT/examples/static/mcmc1.)

subplot(2,1,1)
bar(post)
subplot(2,1,2)
bar(mcmc_post)
We can also plot the acceptance ratio versus number of MCMC steps, as a crude convergence diagnostic.

clf
plot(accept_ratio)

Even though the number of samples needed by MCMC is theoretically polynomial (not exponential) in the dimensionality of the search space, in practice it has been found that MCMC does not converge in reasonable time for graphs with more than about 10 nodes.
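The acceptance rule at the heart of MH is only a few lines. A simplified Python sketch, assuming a symmetric proposal for clarity (a real DAG sampler must also correct for unequal neighborhood sizes, which BNT handles internally):

```python
import math
import random

def mh_step(current, propose, log_score):
    """One Metropolis-Hastings step with a symmetric proposal:
    accept the candidate with probability min(1, exp(delta)),
    where delta is the log-score difference."""
    cand = propose(current)
    delta = log_score(cand) - log_score(current)
    if delta >= 0 or random.random() < math.exp(delta):
        return cand, True       # uphill (or lucky downhill) move
    return current, False       # rejected: stay put
```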
Active structure learning

As was mentioned above, one can only learn a DAG up to Markov equivalence, even given infinite data. If one is interested in learning the structure of a causal network, one needs interventional data. (By "intervention" we mean forcing a node to take on a specific value, thereby effectively severing its incoming arcs.)

Most of the scoring functions accept an optional argument that specifies whether a node was observed to have a certain value, or was forced to have that value: we set clamped(i,m) = 1 if node i was forced in training case m. e.g., see the file BNT/examples/static/cooper_yoo.
An interesting question is to decide which interventions to perform (c.f., design of experiments). For details, see the following tech report

"Active learning of causal Bayes net structure", Kevin Murphy, March 2001.

Structural EM

Computing the Bayesian score when there is partial observability is computationally challenging, because the parameter posterior becomes multimodal (the hidden nodes induce a mixture distribution). One therefore needs to use approximations such as BIC. Unfortunately, search algorithms are still expensive, because we need to run EM at each step to compute the MLE, which is needed to compute the score of each model. An alternative approach is to do the local search steps inside of the M step of EM, which is more efficient since the data has been "filled in"; this is called the structural EM algorithm (Friedman 1997), and provably converges to a local maximum of the BIC score.
Wei Hu has implemented SEM for discrete nodes. You can download his package from here. Please address all questions about this implementation of SEM to [email protected].
Visualizing the graph

Click here for more information on graph visualization.
Constraint-based methods

The IC algorithm (Pearl and Verma, 1991), and the faster, but otherwise equivalent, PC algorithm (Spirtes, Glymour, and Scheines 1993), computes many conditional independence tests, and combines these constraints into a PDAG to represent the whole Markov equivalence class.

IC*/FCI extend IC/PC to handle latent variables: see below. (IC stands for inductive causation; PC stands for Peter and Clark, the first names of Spirtes and Glymour; FCI stands for fast causal inference. What we, following Pearl (2000), call IC* was called IC in the original Pearl and Verma paper.) For details, see

"Causation, Prediction, and Search", Spirtes, Glymour and Scheines (SGS), 2001 (2nd edition), MIT Press.
"Causality: Models, Reasoning and Inference", J. Pearl, 2000, Cambridge University Press.

The PC algorithm takes as arguments a function f, the number of nodes N, the maximum fan in K, and additional arguments A which are passed to f. The function f(X,Y,S,A) returns 1 if X is conditionally independent of Y given S, and 0 otherwise. For example, suppose we cheat by passing in a CI "oracle" which has access to the true DAG; the oracle tests for d-separation in this DAG, i.e., f(X,Y,S) calls dsep(X,Y,S,dag). We can do this as follows.

pdag = learn_struct_pdag_pc('dsep', N, max_fan_in, dag);

pdag(i,j) = -1 if there is definitely an i->j arc, and pdag(i,j) = 1 if there is either an i->j or an i<-j arc.

Applied to the sprinkler network, this returns

pdag =
   0   1   1   0
   1   0   0  -1
   1   0   0  -1
   0   0   0   0
So as expected, we see that the V structure at the W node is uniquely identified, but the other arcs have ambiguous orientation.

We now give an example from p141 (1st edn) / p103 (2nd edn) of the SGS book. This example concerns the female orgasm. We are given a correlation matrix C between 7 measured factors (such as subjective experiences of coital and masturbatory experiences), derived from 281 samples, and want to learn a causal model of the data. We will not discuss the merits of this type of work here, but merely show how to reproduce the results in the SGS book. Their program, Tetrad, makes use of the Fisher Z test for conditional independence, so we do the same:

max_fan_in = 4;
nsamples = 281;
alpha = 0.05;
pdag = learn_struct_pdag_pc('cond_indep_fisher_z', n, max_fan_in, C, nsamples, alpha);

In this case, the CI test is
f(X,Y,S) = cond_indep_fisher_z(X, Y, S, C, nsamples, alpha)

The results match those of Fig 12a of SGS apart from two edge differences; presumably this is due to rounding error (although it could be a bug, either in BNT or in Tetrad). This example can be found in the file BNT/examples/static/pc2.m.

The IC* algorithm (Pearl and Verma, 1991), and the faster FCI algorithm (Spirtes, Glymour, and Scheines 1993), are like the IC/PC algorithm, except that they can detect the presence of latent variables. See the file learn_struct_pdag_ic_star written by Tamar Kushnir. The output is a matrix P, defined as follows (see Pearl (2000), p52 for details):

% P(i,j) = -1 if there is either a latent variable L such that i <-L-> j OR there is a directed edge from i->j.
% P(i,j) = -2 if there is a marked directed i-*>j edge.
% P(i,j) = P(j,i) = 1 if there is an undirected edge i--j.
% P(i,j) = P(j,i) = 2 if there is a latent variable L such that i <-L-> j.
Philippe Leray's structure learning package

Philippe Leray has written a structure learning package that uses BNT. It currently (June 2003) has the following features:

PC with Chi2 statistical test
MWST: Maximum Weighted Spanning Tree
Hill Climbing
Greedy Search
Structural EM
hist_ic: optimal Histogram based on IC information criterion
cpdag_to_dag
dag_to_cpdag
...
Inference engines

Up until now, we have used the junction tree algorithm for inference. However, sometimes this is too slow, or not even applicable. In general, there are many inference algorithms, each of which makes different tradeoffs between speed, accuracy, complexity and generality. Furthermore, there might be many implementations of the same algorithm; for instance, a general purpose, readable version, and a highly optimized, specialized one. To cope with this variety, we treat each inference algorithm as an object, which we call an inference engine.

An inference engine is an object that contains a bnet and supports the 'enter_evidence' and 'marginal_nodes' methods. The engine constructor takes the bnet as argument and may do some model-specific processing. When 'enter_evidence' is called, the engine may do some evidence-specific processing. Finally, when 'marginal_nodes' is called, the engine may do some query-specific processing.

The amount of work done when each stage is specified (structure, parameters, evidence, and query) depends on the engine. The cost of work done early in this sequence can be amortized. On the other hand, one can make better optimizations if one waits until later in the sequence. For example, the parameters might imply conditional independencies that are not evident in the graph structure, but can nevertheless be exploited; the evidence indicates which nodes are observed and hence can effectively be disconnected from the graph; and the query might indicate that large parts of the network are d-separated from the query nodes. (Since it is not the actual values of the evidence that matters, just which nodes are observed, many engines allow you to specify which nodes will be observed when they are constructed, i.e., before calling 'enter_evidence'. Some engines can still cope if the actual pattern of evidence is different, e.g., if there is missing data.)

Although being maximally lazy (i.e., only doing work when a query is issued) may seem desirable, this is not always the most efficient. For example, when learning using EM, we need to call marginal_nodes N times, where N is the number of nodes. Variable elimination would end up repeating a lot of work each time marginal_nodes is called, making it inefficient for learning. The junction tree algorithm, by contrast, uses dynamic programming to avoid this redundant computation; it calculates all marginals in two passes during 'enter_evidence', so calling 'marginal_nodes' takes constant time.

We will discuss some of the inference algorithms implemented in BNT below, and finish with a summary of all of them.
Variable elimination

The variable elimination algorithm, also known as bucket elimination or peeling, is one of the simplest inference algorithms. The basic idea is to "push sums inside of products"; this is explained in more detail here.
The principle of distributing sums over products can be generalized greatly to apply to any commutative semiring. This forms the basis of many common algorithms, such as Viterbi decoding and the Fast Fourier Transform. For details, see
R. McEliece and S. M. Aji, 2000. The Generalized Distributive Law. IEEE Trans. Inform. Theory, vol. 46, no. 2 (March 2000), pp. 325-343.
F. R. Kschischang, B. J. Frey and H. A. Loeliger, 2001. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, February 2001.
Choosing an order in which to sum out the variables so as to minimize computational cost is known to be NP-hard. The implementation of this algorithm in var_elim_inf_engine makes no attempt to optimize this ordering (in contrast, say, to jtree_inf_engine, which uses a greedy search procedure to find a good ordering).

Note: unlike most algorithms, var_elim does all its computational work inside of marginal_nodes, not inside of enter_evidence.
Global inference methods

The simplest inference algorithm of all is to explicitly construct the joint distribution over all the nodes, and then to marginalize it. This is implemented in global_joint_inf_engine. Since the size of the joint is exponential in the number of discrete (hidden) nodes, this is not a very practical algorithm. It is included merely for pedagogical and debugging purposes.
Three specialized versions of this algorithm have also been implemented, corresponding to the cases where all the nodes are discrete (D), all are Gaussian (G), and some are discrete and some Gaussian (CG). They are called enumerative_inf_engine, gaussian_inf_engine, and cond_gauss_inf_engine respectively.

Note: unlike most algorithms, these global inference algorithms do all their computational work inside of marginal_nodes, not inside of enter_evidence.
Quickscore

The junction tree algorithm is quite slow on the QMR network, since the cliques are so big. One simple trick we can use is to notice that hidden leaves do not affect the posteriors on the roots, and hence do not need to be included in the network. A second trick is to notice that the negative findings can be "absorbed" into the prior: see the file BNT/examples/static/mk_minimal_qmr_bnet for details.

A much more significant speedup is obtained by exploiting special properties of the noisy-or node, as done by the quickscore algorithm. For details, see

Heckerman, "A tractable inference algorithm for diagnosing multiple diseases", UAI 89.
Rish and Dechter, "On the impact of causal independence", UCI tech report, 1998.

This has been implemented in BNT as a special purpose inference engine, which can be created and used as follows:

engine = quickscore_inf_engine(inhibit, leak, prior);
engine = enter_evidence(engine, pos, neg);
m = marginal_nodes(engine, i);
Belief propagation

Even using quickscore, exact inference takes time that is exponential in the number of positive findings. Hence for large networks we need to resort to approximate inference techniques. See for example

T. Jaakkola and M. Jordan, "Variational probabilistic inference and the QMR-DT network", JAIR 10, 1999.
K. Murphy, Y. Weiss and M. Jordan, "Loopy belief propagation for approximate inference: an empirical study", UAI 99.

The latter approximation entails applying Pearl's belief propagation algorithm to a model even if it has loops (hence the name loopy belief propagation). Pearl's algorithm, implemented as pearl_inf_engine, gives exact results when applied to singly connected graphs (a.k.a. polytrees, since the underlying undirected topology is a tree, but a node may have multiple parents). To apply this algorithm to a graph with loops, use pearl_inf_engine. This can use a centralized or distributed message passing protocol. You can use it as in the following example.

engine = pearl_inf_engine(bnet, 'max_iter', 30);
engine = enter_evidence(engine, evidence);
m = marginal_nodes(engine, i);

We found that this algorithm often converges, and when it does, it is often very accurate, but this depends on the precise setting of the parameter values of the network. (See the file BNT/examples/static/qmr1 to repeat the experiment for yourself.) Understanding when and why belief propagation converges/works is a topic of ongoing research.
pearl_inf_engine can exploit special structure in noisy-or and gmux nodes to compute messages efficiently.
belprop_inf_engine is like pearl, but uses potentials to represent messages. Hence it is slower.

belprop_fg_inf_engine is like belprop, but is designed for factor graphs.
Sampling

BNT now (Mar '02) has two sampling (Monte Carlo) inference algorithms:

likelihood_weighting_inf_engine, which does importance sampling and can handle any node type.
gibbs_sampling_inf_engine, written by Bhaskara Marthi. Currently this can only handle tabular CPDs. For a much faster and more powerful Gibbs sampling program, see BUGS.

Note: to generate samples from a network (which is not the same as inference!), use sample_bnet.
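To give a feel for what likelihood weighting does, here is a plain Python sketch, not BNT's likelihood_weighting_inf_engine, on a hypothetical two-node discrete net A -> B with made-up CPTs: sample the unobserved node from its prior, then weight each sample by the likelihood of the evidence.

```python
# Toy sketch of likelihood weighting for a net A -> B with evidence B=1
# (not BNT code; the network and CPTs are hypothetical).
import random

random.seed(0)
prior = [0.8, 0.2]              # P(A)
cpt = [[0.9, 0.1], [0.3, 0.7]]  # P(B | A)

weights = [0.0, 0.0]
for _ in range(100_000):
    # Sample the unobserved node A from its prior.
    a = 0 if random.random() < prior[0] else 1
    # Instead of sampling the observed node B, weight by P(B=1 | A=a).
    weights[a] += cpt[a][1]

# Normalizing the accumulated weights estimates the posterior P(A | B=1).
z = sum(weights)
posterior = [w / z for w in weights]
print(posterior)
```

The exact posterior here is P(A=0 | B=1) = 0.08 / 0.22, and the Monte Carlo estimate converges to it as the number of samples grows; with only a few hundred samples the answer is noticeably noisy, which is why sampling is classed as approximate below.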
Summary of inference engines

The inference engines differ in many ways. Here are some of the major "axes":

Works for all topologies or makes restrictions?
Works for all node types or makes restrictions?
Exact or approximate inference?

In terms of topology, most engines handle any kind of DAG. belprop_fg does approximate inference on factor graphs (FG), which can be used to represent directed, undirected, and mixed (chain) graphs. (In the future, we plan to support exact inference on chain graphs.) quickscore only works on QMR-like models.

In terms of node types: algorithms that use potentials can handle discrete (D), Gaussian (G) or conditional Gaussian (CG) models. Sampling algorithms can essentially handle any kind of node (distribution). Other algorithms make more restrictive assumptions in exchange for speed.

Finally, most algorithms are designed to give the exact answer. The belief propagation algorithms are exact if applied to trees, and in some other cases. Sampling is considered approximate, even though, in the limit of an infinite number of samples, it gives the exact answer.
Here is a summary of the properties of all the engines in BNT which work on static networks.
Name                  Exact?   Node type?   Topology
belprop               approx   D            DAG
belprop_fg            approx   D            factor graph
cond_gauss            exact    CG           DAG
enumerative           exact    D            DAG
gaussian              exact    G            DAG
gibbs                 approx   D            DAG
global_joint          exact    D,G,CG       DAG
jtree                 exact    D,G,CG       DAG
likelihood_weighting  approx   any          DAG
pearl                 approx   D,G          DAG
pearl                 exact    D,G          polytree
quickscore            exact    noisy-or     QMR
stab_cond_gauss       exact    CG           DAG
var_elim              exact    D,G,CG       DAG
Influence diagrams / decision making

BNT implements an exact algorithm for solving LIMIDs (limited memory influence diagrams), described in
S. L. Lauritzen and D. Nilsson. Representing and solving decision problems with limited information. Management Science, 47, 1238-1251, September 2001.

LIMIDs explicitly show all information arcs, rather than implicitly assuming no-forgetting. This allows them to model forgetful controllers.

See the examples in BNT/examples/limids for details.
DBNs, HMMs, Kalman filters and all that

Click here for documentation about how to use BNT for dynamical systems and sequence data.