
How to use the Bayes Net Toolbox
This documentation was last updated on 29 October 2007.
Click here for a French version of this documentation (last updated in 2005).
Installation
Creating your first Bayes net
Creating a model by hand
Loading a model from a file
Creating a model using a GUI
Graph visualization
Inference
Computing marginal distributions
Computing joint distributions
Soft/virtual evidence
Most probable explanation
Conditional Probability Distributions
Tabular (multinomial) nodes
Noisy-or nodes
Other (noisy) deterministic nodes
Softmax (multinomial logit) nodes
Neural network nodes
Root nodes
Gaussian nodes
Generalized linear model nodes
Classification/regression tree nodes
Other continuous distributions
Summary of CPD types
Example models
Gaussian mixture models
PCA, ICA, and all that
Mixtures of experts
Hierarchical mixtures of experts
QMR
Conditional Gaussian models
Other hybrid models
Parameter learning
Loading data from a file
Maximum likelihood parameter estimation from complete data
Parameter priors
(Sequential) Bayesian parameter updating from complete data
Maximum likelihood parameter estimation with missing values (EM)
Parameter tying
Structure learning
Exhaustive search
K2
Hill-climbing
MCMC
Active learning
Structural EM
Visualizing the learned graph structure
Constraint-based methods
Inference engines
Junction tree
Variable elimination
Global inference methods
Quickscore
Belief propagation
Sampling (Monte Carlo)
Summary of inference engines
Influence diagrams/decision making
DBNs, HMMs, Kalman filters and all that
Creating your first Bayes net
To define a Bayes net, you must specify the graph structure and then the parameters. We look at each in turn, using a simple example (adapted from Russell and Norvig, "Artificial Intelligence: a Modern Approach", Prentice Hall, 1995, p. 454).

Graph structure
Consider the following network.

To specify this directed acyclic graph (dag), we create an adjacency matrix:
N = 4;
dag = zeros(N,N);
C = 1; S = 2; R = 3; W = 4;
dag(C,[R S]) = 1;
dag(R,W) = 1;
dag(S,W) = 1;

We have numbered the nodes as follows: Cloudy = 1, Sprinkler = 2, Rain = 3, WetGrass = 4. The nodes must always be numbered in topological order, i.e., ancestors before descendants. For a more complicated graph, this is a little inconvenient: we will see how to get around this below.
In Matlab 6, you can use logical arrays instead of double arrays, which are 4 times smaller:
dag = false(N,N);
dag(C,[R S]) = true;
...

However, some graph functions (e.g., acyclic) do not work on logical arrays!
You can visualize the resulting graph structure using the methods discussed below. For details on GUIs, click here.

Creating the Bayes net shell
In addition to specifying the graph structure, we must specify the size and type of each node. If a node is discrete, its size is the number of possible values each node can take on; if a node is continuous, it can be a vector, and its size is the length of this vector. In this case, we will assume all nodes are discrete and binary.
discrete_nodes = 1:N;
node_sizes = 2*ones(1,N);

If the nodes were not binary, you could type e.g.,
node_sizes = [4 2 3 5];

meaning that Cloudy has 4 possible values, Sprinkler has 2 possible values, etc. Note that these are cardinal values, not ordinal, i.e., they are not ordered in any way, like 'low', 'medium', 'high'.
We are now ready to make the Bayes net:
bnet = mk_bnet(dag, node_sizes, 'discrete', discrete_nodes);

By default, all nodes are assumed to be discrete, so we can also just write
bnet = mk_bnet(dag, node_sizes);

You may also specify which nodes will be observed. If you don't know, or if this is not fixed in advance, just use the empty list (the default).
onodes = [];
bnet = mk_bnet(dag, node_sizes, 'discrete', discrete_nodes, 'observed', onodes);

Note that optional arguments are specified using a name/value syntax. This is common for many BNT functions. In general, to find out more about a function (e.g., which optional arguments it takes), please see its documentation string by typing
help mk_bnet

See also other useful Matlab tips.
It is possible to associate names with nodes, as follows:
bnet = mk_bnet(dag, node_sizes, 'names', {'cloudy','S','R','W'}, 'discrete', 1:4);

You can then refer to a node by its name:
C = bnet.names('cloudy'); % bnet.names is an associative array
bnet.CPD{C} = tabular_CPD(bnet, C, [0.5 0.5]);

This feature uses my own associative array class.

Parameters
A model consists of the graph structure and the parameters. The parameters are represented by CPD objects (CPD = Conditional Probability Distribution), which define the probability distribution of a node given its parents. (We will use the terms "node" and "random variable" interchangeably.) The simplest kind of CPD is a table (multi-dimensional array), which is suitable when all the nodes are discrete-valued. Note that the discrete values are not assumed to be ordered in any way; that is, they represent categorical quantities, like male and female, rather than ordinal quantities, like low, medium and high. (We will discuss CPDs in more detail below.)
Tabular CPDs, also called CPTs (conditional probability tables), are stored as multi-dimensional arrays, where the dimensions are arranged in the same order as the nodes, e.g., the CPT for node 4 (WetGrass) is indexed by Sprinkler (2), Rain (3) and then WetGrass (4) itself. Hence the child is always the last dimension. If a node has no parents, its CPT is a column vector representing its prior. Note that in Matlab (unlike C), arrays are indexed from 1, and are laid out in memory such that the first index toggles fastest, e.g., the CPT for node 4 (WetGrass) is as follows

S R  P(W=F)  P(W=T)
F F  1.0     0.0
T F  0.1     0.9
F T  0.1     0.9
T T  0.01    0.99

where we have used the convention that false==1, true==2. We can create this CPT in Matlab as follows
CPT = zeros(2,2,2);
CPT(1,1,1) = 1.0;
CPT(2,1,1) = 0.1;
...

Here is an easier way:
CPT = reshape([1 0.1 0.1 0.01 0 0.9 0.9 0.99], [2 2 2]);

In fact, we don't need to reshape the array, since the CPD constructor will do that for us. So we can just write
bnet.CPD{W} = tabular_CPD(bnet, W, 'CPT', [1 0.1 0.1 0.01 0 0.9 0.9 0.99]);

The other nodes are created similarly (using the old syntax for optional parameters)
bnet.CPD{C} = tabular_CPD(bnet, C, [0.5 0.5]);
bnet.CPD{R} = tabular_CPD(bnet, R, [0.8 0.2 0.2 0.8]);
bnet.CPD{S} = tabular_CPD(bnet, S, [0.5 0.9 0.5 0.1]);
bnet.CPD{W} = tabular_CPD(bnet, W, [1 0.1 0.1 0.01 0 0.9 0.9 0.99]);

Random Parameters
If we do not specify the CPT, random parameters will be created, i.e., each "row" of the CPT will be drawn from the uniform distribution. To ensure repeatable results, use
rand('state', seed);
randn('state', seed);

To control the degree of randomness (entropy), you can sample each row of the CPT from a Dirichlet(p,p,...) distribution. If p << 1, this encourages "deterministic" CPTs (one entry near 1, the rest near 0). If p = 1, each entry is drawn from U[0,1]. If p >> 1, the entries will all be near 1/k, where k is the arity of this node, i.e., each row will be nearly uniform. You can do this as follows, assuming this node is number i, and ns is the node_sizes.
k = ns(i);
ps = parents(dag, i);
psz = prod(ns(ps));
CPT = sample_dirichlet(p*ones(1,k), psz);
bnet.CPD{i} = tabular_CPD(bnet, i, 'CPT', CPT);

Loading a network from a file
If you already have a Bayes net represented in the XML-based Bayes Net Interchange Format (BNIF) (e.g., downloaded from the Bayes Net repository), you can convert it to BNT format using the BIF-BNT Java program written by Ken Shan. (This is not necessarily up-to-date.)
It is currently not possible to save/load a BNT matlab object to file, but this is easily fixed if you modify all the constructors for all the classes (see matlab documentation).

Creating a model using a GUI
Senthil Nachimuthu has started (Oct 07) an open source GUI for BNT called projeny using Java. This is a successor to BNJ.
Philippe LeRay has written (Sep 05) a BNT GUI in matlab.
LinkStrength, a package by Imme Ebert-Uphoff for visualizing the strength of dependencies between nodes.

Graph visualization
Click here for more information on graph visualization.

Inference
Having created the BN, we can now use it for inference. There are many different algorithms for doing inference in Bayes nets, which make different tradeoffs between speed, complexity, generality, and accuracy. BNT therefore offers a variety of different inference "engines". We will discuss these in more detail below. For now, we will use the junction tree engine, which is the mother of all exact inference algorithms. This can be created as follows.
engine = jtree_inf_engine(bnet);

The other engines have similar constructors, but might take additional, algorithm-specific parameters. All engines are used in the same way, once they have been created. We illustrate this in the following sections.

Computing marginal distributions
Suppose we want to compute the probability that the sprinkler was on given that the grass is wet. The evidence consists of the fact that W=2. All the other nodes are hidden (unobserved). We can specify this as follows.
evidence = cell(1,N);
evidence{W} = 2;

We use a 1D cell array instead of a vector to cope with the fact that nodes can be vectors of different lengths. In addition, the value [] can be used to denote 'no evidence', instead of having to specify the observation pattern as a separate argument. (Click here for a quick tutorial on cell arrays in matlab.)
We are now ready to add the evidence to the engine.
[engine, loglik] = enter_evidence(engine, evidence);

The behavior of this function is algorithm-specific, and is discussed in more detail below. In the case of the jtree engine, enter_evidence implements a two-pass message-passing scheme. The first return argument contains the modified engine, which incorporates the evidence. The second return argument contains the log-likelihood of the evidence. (Not all engines are capable of computing the log-likelihood.)
Finally, we can compute p = P(S=2 | W=2) as follows.
marg = marginal_nodes(engine, S);
marg.T
ans =
    0.57024
    0.42976
p = marg.T(2);

We see that p = 0.4298.
Now let us add the evidence that it was raining, and see what difference it makes.
evidence{R} = 2;
[engine, loglik] = enter_evidence(engine, evidence);
marg = marginal_nodes(engine, S);
p = marg.T(2);

We find that p = P(S=2 | W=2, R=2) = 0.1945, which is lower than before, because the rain can "explain away" the fact that the grass is wet.
You can plot a marginal distribution over a discrete variable as a bar chart using the built-in 'bar' function:
bar(marg.T)

This is what it looks like

Observed nodes
What happens if we ask for the marginal on an observed node, e.g. P(W | W=2)? An observed discrete node effectively only has 1 value (the observed one); all other values would result in 0 probability. For efficiency, BNT treats observed (discrete) nodes as if they were set to size 1, as we see below:
evidence = cell(1,N);
evidence{W} = 2;
engine = enter_evidence(engine, evidence);
m = marginal_nodes(engine, W);
m.T
ans =
     1

This can get a little confusing, since we assigned W=2. So we can ask BNT to add the evidence back in by passing in an optional argument:
m = marginal_nodes(engine, W, 1);
m.T
ans =
     0
     1

This shows that P(W=1 | W=2) = 0 and P(W=2 | W=2) = 1.

Computing joint distributions
We can compute the joint probability on a set of nodes as in the following example.
evidence = cell(1,N);
[engine, ll] = enter_evidence(engine, evidence);
m = marginal_nodes(engine, [S R W]);

m is a structure. The 'T' field is a multi-dimensional array (in this case, 3-dimensional) that contains the joint probability distribution on the specified nodes.
>> m.T
ans(:,:,1) =
    0.2900    0.0410
    0.0210    0.0009
ans(:,:,2) =
         0    0.3690
    0.1890    0.0891

We see that P(S=1, R=1, W=2) = 0, since it is impossible for the grass to be wet if both the rain and sprinkler are off.
Let us now add some evidence to R.
evidence{R} = 2;
[engine, ll] = enter_evidence(engine, evidence);
m = marginal_nodes(engine, [S R W])
m =
    domain: [2 3 4]
         T: [2x1x2 double]
>> m.T
ans(:,:,1) =
    0.0820
    0.0018
ans(:,:,2) =
    0.7380
    0.1782

The joint T(i,j,k) = P(S=i, R=j, W=k | evidence) should have T(i,1,k) = 0 for all i,k, since R=1 is incompatible with the evidence that R=2. Instead of creating large tables with many 0s, BNT sets the effective size of observed (discrete) nodes to 1, as explained above. This is why m.T has size 2x1x2. To get a 2x2x2 table, type
m = marginal_nodes(engine, [S R W], 1)
m =
    domain: [2 3 4]
         T: [2x2x2 double]
>> m.T
ans(:,:,1) =
         0    0.0820
         0    0.0018
ans(:,:,2) =
         0    0.7380
         0    0.1782

Note: It is not always possible to compute the joint on arbitrary sets of nodes: it depends on which inference engine you use, as discussed in more detail below.

Soft/virtual evidence
Sometimes a node is not observed, but we have some distribution over its possible values; this is often called "soft" or "virtual" evidence. One can use this as follows
[engine, loglik] = enter_evidence(engine, evidence, 'soft', soft_evidence);

where soft_evidence{i} is either [] (if node i has no soft evidence) or is a vector representing the probability distribution over i's possible values. For example, if we don't know i's exact value, but we know its likelihood ratio is 60/40, we can write evidence{i} = [] and soft_evidence{i} = [0.6 0.4].
Currently only jtree_inf_engine supports this option. It assumes that all hidden nodes, and all nodes for which we have soft evidence, are discrete. For a longer example, see BNT/examples/static/softev1.m.
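For instance, on the sprinkler network from above we can attach this 60/40 likelihood to the WetGrass node instead of observing it exactly (a minimal sketch; the numbers are purely illustrative):
engine = jtree_inf_engine(bnet);
evidence = cell(1,N); % no hard evidence on any node
soft_evidence = cell(1,N);
soft_evidence{W} = [0.6 0.4]; % likelihoods of W=1 and W=2
[engine, loglik] = enter_evidence(engine, evidence, 'soft', soft_evidence);
marg = marginal_nodes(engine, S); % posterior on Sprinkler given the soft evidence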

Most probable explanation
To compute the most probable explanation (MPE) of the evidence (i.e., the most probable assignment, or a mode of the joint), use
[mpe, ll] = calc_mpe(engine, evidence);

mpe{i} is the most likely value of node i. This calls enter_evidence with the 'maximize' flag set to 1, which causes the engine to do max-product instead of sum-product. The resulting max-marginals are then thresholded. If there is more than one maximum probability assignment, we must take care to break ties in a consistent manner (thresholding the max-marginals may give the wrong result). To force this behavior, type
[mpe, ll] = calc_mpe(engine, evidence, 1);

Note that computing the MPE is sometimes called abductive reasoning.
You can also use calc_mpe_bucket written by Ron Zohar, that does a forwards max-product pass, and then a backwards traceback pass, which is how Viterbi is traditionally implemented.
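For instance, on the sprinkler network we can ask for the jointly most probable assignment given that the grass is wet (a small usage sketch):
evidence = cell(1,N);
evidence{W} = 2;
[mpe, ll] = calc_mpe(engine, evidence);
mpe{S} % most likely value of Sprinkler within the jointly most probable assignment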

Conditional Probability Distributions
A Conditional Probability Distribution (CPD) defines P(X(i) | X(Pa(i))), where X(i) is the i'th node, and X(Pa(i)) are the parents of node i. There are many ways to represent this distribution, which depend in part on whether X(i) and X(Pa(i)) are discrete, continuous, or a combination. We will discuss various representations below.

Tabular nodes
If the CPD is represented as a table (i.e., if it is a multinomial distribution), it has a number of parameters that is exponential in the number of parents. See the example above.

Noisy-or nodes
A noisy-OR node is like a regular logical OR gate except that sometimes the effects of parents that are on get inhibited. Let the prob. that parent i gets inhibited be q(i). Then a node, C, with 2 parents, A and B, has the following CPD, where we use F and T to represent off and on (1 and 2 in BNT).

A B  P(C=off)   P(C=on)
F F  1.0        0.0
T F  q(A)       1-q(A)
F T  q(B)       1-q(B)
T T  q(A)q(B)   1-q(A)q(B)

Thus we see that the causes get inhibited independently. It is common to associate a "leak" node with a noisy-or CPD, which is like a parent that is always on. This can account for all other unmodelled causes which might turn the node on.
The noisy-or distribution is similar to the logistic distribution. To see this, let the nodes, S(i), have values in {0,1}, and let q(i,j) be the prob. that j inhibits i. Then
Pr(S(i)=1 | parents(S(i))) = 1 - prod_{j} q(i,j)^S(j)

Now define w(i,j) = -ln q(i,j) and rho(x) = 1 - exp(-x). Then
Pr(S(i)=1 | parents(S(i))) = rho(sum_j w(i,j) S(j))

For a sigmoid node, we have
Pr(S(i)=1 | parents(S(i))) = sigma(sum_j w(i,j) S(j))

where sigma(x) = 1/(1+exp(-x)). Hence they differ in the choice of the activation function (although both are monotonically increasing). In addition, in the case of a noisy-or, the weights are constrained to be positive, since they derive from probabilities q(i,j). In both cases, the number of parameters is linear in the number of parents, unlike the case of a multinomial distribution, where the number of parameters is exponential in the number of parents. We will see an example of noisy-OR nodes below.
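For example, we can build the 3-node network from the CPD table above, with inhibition probabilities q(A)=0.1 and q(B)=0.2 (a sketch; the constructor arguments mirror the noisyor_CPD call in the QMR example below, and we set the leak inhibition probability to 1, i.e., no leak term):
A = 1; B = 2; C = 3;
dag = zeros(3,3);
dag([A B], C) = 1;
ns = 2*ones(1,3); % all nodes binary
bnet = mk_bnet(dag, ns);
bnet.CPD{A} = tabular_CPD(bnet, A, [0.5 0.5]);
bnet.CPD{B} = tabular_CPD(bnet, B, [0.5 0.5]);
q = [0.1 0.2]'; % inhibition probs for parents A and B
bnet.CPD{C} = noisyor_CPD(bnet, C, 1, q); % leak inhibition 1 = leak never fires
% e.g., P(C=off | A=on, B=on) = q(A)*q(B) = 0.02, matching the table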

Other (noisy) deterministic nodes
Deterministic CPDs for discrete random variables can be created using the deterministic_CPD class. It is also possible to 'flip' the output of the function with some probability, to simulate noise. The boolean_CPD class is just a special case of a deterministic CPD, where the parents and child are all binary.
Both of these classes are just "syntactic sugar" for the tabular_CPD class.

Softmax nodes
If we have a discrete node with a continuous parent, we can define its CPD using a softmax function (also known as the multinomial logit function). This acts like a soft thresholding operator, and is defined as follows:

Pr(Q=i | X=x) = exp(w(:,i)'*x + b(i)) / sum_j exp(w(:,j)'*x + b(j))

The parameters of a softmax node, w(:,i) and b(i), i=1..|Q|, have the following interpretation: w(:,i)-w(:,j) is the normal vector to the decision boundary between classes i and j, and b(i)-b(j) is its offset (bias). For example, suppose X is a 2-vector, and Q is binary. Then
w = [1 -1;
     0  0];
b = [0 0];

means class 1 are points in the 2D plane with positive x coordinate, and class 2 are points in the 2D plane with negative x coordinate. If w has large magnitude, the decision boundary is sharp, otherwise it is soft. In the special case that Q is binary (0/1), the softmax function reduces to the logistic (sigmoid) function.
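To make this concrete, here is a quick numeric check of the softmax formula in raw Matlab, independent of BNT, at a point with positive x coordinate:
w = [1 -1; 0 0];
b = [0 0];
x = [2; 0.5]; % positive x coordinate
p = exp(w'*x + b'); % unnormalized class scores
p = p / sum(p) % p = [0.982; 0.018]: class 1 is much more probable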
Fitting a softmax function can be done using the iteratively reweighted least squares (IRLS) algorithm. We use the implementation from Netlab. Note that since the softmax distribution is not in the exponential family, it does not have finite sufficient statistics, and hence we must store all the training data in uncompressed form. If this takes too much space, one should use online (stochastic) gradient descent (not implemented in BNT).
If a softmax node also has discrete parents, we use a different set of w/b parameters for each combination of parent values, as in the conditional linear Gaussian CPD. This feature was implemented by Pierpaolo Brutti. He is currently extending it so that discrete parents can be treated as if they were continuous, by adding indicator variables to the X vector.
We will see an example of softmax nodes below.

Neural network nodes
Pierpaolo Brutti has implemented the mlp_CPD class, which uses a multi-layer perceptron to implement a mapping from continuous parents to discrete children, similar to the softmax function. (If there are also discrete parents, it creates a mixture of MLPs.) It uses code from Netlab. This is work in progress.

Root nodes
A root node has no parents and no parameters; it can be used to model an observed, exogenous input variable, i.e., one which is "outside" the model. This is useful for conditional density models. We will see an example of root nodes below.

Gaussian nodes
We now consider a distribution suitable for the continuous-valued nodes. Suppose the node is called Y, its continuous parents (if any) are called X, and its discrete parents (if any) are called Q. The distribution on Y is defined as follows:
no parents: Y ~ N(mu, Sigma)
cts parents: Y|X=x ~ N(mu + W*x, Sigma)
discrete parents: Y|Q=i ~ N(mu(:,i), Sigma(:,:,i))
cts and discrete parents: Y|X=x,Q=i ~ N(mu(:,i) + W(:,:,i)*x, Sigma(:,:,i))

where N(mu, Sigma) denotes a Normal distribution with mean mu and covariance Sigma. Let |X|, |Y| and |Q| denote the sizes of X, Y and Q respectively. If there are no discrete parents, |Q|=1; if there is more than one, then |Q| = a vector of the sizes of each discrete parent. If there are no continuous parents, |X|=0; if there is more than one, then |X| = the sum of their sizes. Then mu is a |Y|*|Q| vector, Sigma is a |Y|*|Y|*|Q| positive semi-definite matrix, and W is a |Y|*|X|*|Q| regression (weight) matrix.
We can create a Gaussian node with random parameters as follows.
bnet.CPD{i} = gaussian_CPD(bnet, i);

We can specify the value of one or more of the parameters as in the following example, in which |Y|=2, and |Q|=1.
bnet.CPD{i} = gaussian_CPD(bnet, i, 'mean', [0; 0], 'weights', randn(Y,X), 'cov', eye(Y));

We will see an example of conditional linear Gaussian nodes below.
When learning Gaussians from data, it is helpful to ensure the data has a small magnitude (see e.g., KPMstats/standardize) to prevent numerical problems. Unless you have a lot of data, it is also a very good idea to use diagonal instead of full covariance matrices. (BNT does not currently support spherical covariances, although it would be easy to add, since KPMstats/clg_Mstep supports this option; you would just need to modify gaussian_CPD/update_ess to accumulate weighted inner products.)
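As a concrete illustration of these parameter shapes, consider a hypothetical node i of size |Y|=2 with a continuous parent of size |X|=3 and one binary discrete parent, so |Q|=2 (a sketch; the sizes are arbitrary, and we assume bnet contains a node with this family structure):
% mean is |Y| x |Q| = 2x2, cov is |Y| x |Y| x |Q| = 2x2x2,
% and weights is |Y| x |X| x |Q| = 2x3x2
bnet.CPD{i} = gaussian_CPD(bnet, i, 'mean', zeros(2,2), ...
    'cov', repmat(eye(2), [1 1 2]), 'weights', randn(2,3,2));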

Other continuous distributions
Currently BNT does not support any CPDs for continuous nodes other than the Gaussian. However, you can use a mixture of Gaussians to approximate other continuous distributions. We will see an example of this with the IFA model below.

Generalized linear model nodes
In the future, we may incorporate some of the functionality of glmlab into BNT.

Classification/regression tree nodes
We plan to add classification and regression trees to define CPDs for discrete and continuous nodes, respectively. Trees have many advantages: they are easy to interpret, they can do feature selection, they can handle discrete and continuous inputs, they do not make strong assumptions about the form of the distribution, the number of parameters can grow in a data-dependent way (i.e., they are semi-parametric), they can handle missing data, etc. However, they are not yet implemented.

Summary of CPD types
We list all the different types of CPDs supported by BNT. For each CPD, we specify if the child and parents can be discrete (D) or continuous (C) (Binary (B) nodes are a special case). We also specify which methods each class supports. If a method is inherited, the name of the parent class is mentioned. If a parent class calls a child method, this is mentioned.
The CPD_to_CPT method converts a CPD to a table; this requires that the child and all parents are discrete. The CPT might be exponentially big... convert_to_table evaluates a CPD with evidence, and represents the resulting potential as an array. This requires that the child is discrete, and any continuous parents are observed. convert_to_pot evaluates a CPD with evidence, and represents the resulting potential as a dpot, gpot, cgpot or upot, as requested. (d = discrete, g = Gaussian, cg = conditional Gaussian, u = utility).
When we sample a node, all the parents are observed. When we compute the (log) probability of a node, all the parents and the child are observed.
We also specify if the parameters are learnable. For learning with EM, we require the methods reset_ess, update_ess and maximize_params. For learning from fully observed data, we require the method learn_params. By default, all classes inherit this from generic_CPD, which simply calls update_ess N times, once for each data case, followed by maximize_params, i.e., it is like EM, without the E step. Some classes implement a batch formula, which is quicker.
Bayesian learning means computing a posterior over the parameters given fully observed data.
Pearl means we implement the methods compute_pi and compute_lambda_msg, used by pearl_inf_engine, which runs on directed graphs. belprop_inf_engine only needs convert_to_pot. The Pearl methods can exploit special properties of the CPDs for computing the messages efficiently, whereas belprop does not.
The only method implemented by generic_CPD is adjustable_CPD, which is not shown, since it is not very interesting.

Name           Child  Parents  Comments
boolean        B      B        Syntactic sugar for tabular
deterministic  D      D        Syntactic sugar for tabular
Discrete       D      C/D      Virtual class
Gaussian       C      C/D      -
generic        C/D    C/D      Virtual class
gmux           C      C/D      multiplexer
MLP            D      C/D      multilayer perceptron
noisy-or       B      B        -
root           C/D    none     no params
softmax        D      C/D      -
Tabular        D      D        -

Example models

Gaussian mixture models
Richard W. DeVaul has made a detailed tutorial on how to fit mixtures of Gaussians using BNT. Available here.

PCA, ICA, and all that
In Figure (a) below, we show how Factor Analysis can be thought of as a graphical model. Here, X has an N(0,I) prior, and Y|X=x ~ N(mu + W*x, Psi), where Psi is diagonal and W is called the "factor loading matrix". Since the noise on both X and Y is diagonal, the components of these vectors are uncorrelated, and hence can be represented as individual scalar nodes, as we show in (b). (This is useful if parts of the observations on the Y vector are occasionally missing.) We usually take k = |X| << |Y| = D, so the model tries to explain many observations using a low-dimensional subspace.

(a) (b) (c) (d)

We can create this model in BNT as follows.
ns = [k D];
dag = zeros(2,2);
dag(1,2) = 1;
bnet = mk_bnet(dag, ns, 'discrete', []);
bnet.CPD{1} = gaussian_CPD(bnet, 1, 'mean', zeros(k,1), 'cov', eye(k), ...
    'cov_type', 'diag', 'clamp_mean', 1, 'clamp_cov', 1);
bnet.CPD{2} = gaussian_CPD(bnet, 2, 'mean', zeros(D,1), 'cov', diag(Psi0), 'weights', W0, ...
    'cov_type', 'diag', 'clamp_mean', 1);

The root node is clamped to the N(0,I) distribution, so that we will not update these parameters during learning. The mean of the leaf node is clamped to 0, since we assume the data has been centered (had its mean subtracted off); this is just for simplicity. Finally, the covariance of the leaf node is constrained to be diagonal. W0 and Psi0 are the initial parameter guesses.
We can fit this model (i.e., estimate its parameters in a maximum likelihood (ML) sense) using EM, as we explain below. Not surprisingly, the ML estimates for mu and Psi turn out to be identical to the sample mean and variance, which can be computed directly as
mu_ML = mean(data);
Psi_ML = diag(cov(data));

Note that W can only be identified up to a rotation matrix, because of the spherical symmetry of the source.
If we restrict Psi to be spherical, i.e., Psi = sigma*I, there is a closed-form solution for W as well, i.e., we do not need to use EM. In particular, W contains the first |X| eigenvectors of the sample covariance matrix, with scalings determined by the eigenvalues and sigma. Classical PCA can be obtained by taking the sigma -> 0 limit. For details, see
"EM algorithms for PCA and SPCA", Sam Roweis, NIPS 97. (Matlab software)
"Mixtures of probabilistic principal component analyzers", Tipping and Bishop, Neural Computation 11(2):443-482, 1999.
By adding a hidden discrete variable, we can create mixtures of FA models, as shown in (c). Now we can explain the data using a set of subspaces. We can create this model in BNT as follows.
ns = [M k D];
dag = zeros(3);
dag(1,3) = 1;
dag(2,3) = 1;
bnet = mk_bnet(dag, ns, 'discrete', 1);
bnet.CPD{1} = tabular_CPD(bnet, 1, Pi0);
bnet.CPD{2} = gaussian_CPD(bnet, 2, 'mean', zeros(k,1), 'cov', eye(k), 'cov_type', 'diag', ...
    'clamp_mean', 1, 'clamp_cov', 1);
bnet.CPD{3} = gaussian_CPD(bnet, 3, 'mean', Mu0', 'cov', repmat(diag(Psi0), [1 1 M]), ...
    'weights', W0, 'cov_type', 'diag', 'tied_cov', 1);

Notice how the covariance matrix for Y is the same for all values of Q; that is, the noise level in each subspace is assumed the same. However, we allow the offset, mu, to vary. For details, see
The EM Algorithm for Mixtures of Factor Analyzers, Ghahramani, Z. and Hinton, G.E. (1996), University of Toronto Technical Report CRG-TR-96-1. (Matlab software)
"Mixtures of probabilistic principal component analyzers", Tipping and Bishop, Neural Computation 11(2):443-482, 1999.

I have included Zoubin's specialized MFA code (with his permission) with the toolbox, so you can check that BNT gives the same results: see 'BNT/examples/static/mfa1.m'.
Independent Factor Analysis (IFA) generalizes FA by allowing a non-Gaussian prior on each component of X. (Note that we can approximate a non-Gaussian prior using a mixture of Gaussians.) This means that the likelihood function is no longer rotationally invariant, so we can uniquely identify W and the hidden sources X. IFA also allows a non-diagonal Psi (i.e. correlations between the components of Y). We recover classical Independent Components Analysis (ICA) in the Psi -> 0 limit, and by assuming that |X| = |Y|, so that the weight matrix W is square and invertible. For details, see
Independent Factor Analysis, H. Attias, Neural Computation 11: 803-851, 1998.

Mixtures of experts
As an example of the use of the softmax function, we introduce the Mixture of Experts model. As before, circles denote continuous-valued nodes, squares denote discrete nodes, clear means hidden, and shaded means observed.

X is the observed input, Y is the output, and the Q nodes are hidden "gating" nodes, which select the appropriate set of parameters for Y. During training, Y is assumed observed, but for testing, the goal is to predict Y given X. Note that this is a conditional density model, so we don't associate any parameters with X. Hence X's CPD will be a root CPD, which is a way of modelling exogenous nodes. If the output is a continuous-valued quantity, we assume the "experts" are linear-regression units, and set Y's CPD to linear-Gaussian. If the output is discrete, we set Y's CPD to a softmax function. The Q CPDs will always be softmax functions.
As a concrete example, consider the mixture of experts model where X and Y are scalars, and Q is binary. This is just piecewise linear regression, where we have two line segments, i.e.,

We can create this model with random parameters as follows. (This code is bundled in BNT/examples/static/mixexp2.m.)
X = 1;
Q = 2;
Y = 3;
dag = zeros(3,3);
dag(X,[Q Y]) = 1;
dag(Q,Y) = 1;
ns = [1 2 1]; % make X and Y scalars, and have 2 experts
onodes = [1 3];
bnet = mk_bnet(dag, ns, 'discrete', 2, 'observed', onodes);
rand('state', 0);
randn('state', 0);
bnet.CPD{1} = root_CPD(bnet, 1);
bnet.CPD{2} = softmax_CPD(bnet, 2);
bnet.CPD{3} = gaussian_CPD(bnet, 3);

Now let us fit this model using EM. First we load the data (1000 training cases) and plot them.
data = load('/examples/static/Misc/mixexp_data.txt', '-ascii');
plot(data(:,1), data(:,2), '.');

This is what the model looks like before training. (Thanks to Thomas Hofman for writing this plotting routine.)

Now let's train the model, and plot the final performance. (We will discuss how to train models in more detail below.)
ncases = size(data, 1); % each row of data is a training case
cases = cell(3, ncases);
cases([1 3], :) = num2cell(data'); % each column of cases is a training case
engine = jtree_inf_engine(bnet);
max_iter = 20;
[bnet2, LLtrace] = learn_params_em(engine, cases, max_iter);

(We specify which nodes will be observed when we create the engine. Hence BNT knows that the hidden nodes are all discrete. For complex models, this can lead to a significant speedup.) Below we show what the model looks like after 16 iterations of EM (with 100 IRLS iterations per M step), when it converged using the default convergence tolerance (that the fractional change in the log-likelihood be less than 1e-3). Before learning, the log-likelihood was -322.927442; afterwards, it was -13.728778.

(See BNT/examples/static/mixexp2.m for details of the code.)

Hierarchical mixtures of experts
A hierarchical mixture of experts (HME) extends the mixture of experts model by having more than one hidden node. A two-level example is shown below, along with its more traditional representation as a neural network. This is like a (balanced) probabilistic decision tree of height 2.

Pierpaolo Brutti has written an extensive set of routines for HMEs, which are bundled with BNT: see the examples/static/HME directory. These routines allow you to choose the number of hidden (gating) layers, and the form of the experts (softmax or MLP). See the file hmemenu, which provides a demo. For example, the figure below shows the decision boundaries learned for a ternary classification problem, using a 2-level HME with softmax gates and softmax experts; the training set is on the left, the testing set on the right.

For more details, see the following:
Hierarchical mixtures of experts and the EM algorithm, M.I. Jordan and R.A. Jacobs. Neural Computation, 6, 181-214, 1994.
David Martin's matlab code for HME
Why the logistic function? A tutorial discussion on probabilities and neural networks. M.I. Jordan. MIT Computational Cognitive Science Report 9503, August 1995.
"Generalized Linear Models", McCullagh and Nelder, Chapman and Hall, 1983.
"Improved learning algorithms for mixtures of experts in multiclass classification". K. Chen, L. Xu, H. Chi. Neural Networks (1999) 12: 1229-1252.
Classification Using Hierarchical Mixtures of Experts, S.R. Waterhouse and A.J. Robinson. In Proc. IEEE Workshop on Neural Network for Signal Processing IV (1994), pp. 177-186.
Localized mixtures of experts, P. Moerland, 1998.
"Nonlinear gated experts for time series", A.S. Weigend and M. Mangeas, 1995.

QMR
Bayes nets originally arose out of an attempt to add probabilities to expert systems, and this is still the most common use for BNs. A famous example is QMR-DT, a decision-theoretic reformulation of the Quick Medical Reference (QMR) model.

Here, the top layer represents hidden disease nodes, and the bottom layer represents observed symptom nodes. The goal is to infer the posterior probability of each disease given all the symptoms (which can be present, absent or unknown). Each node in the top layer has a Bernoulli prior (with a low prior probability that the disease is present). Since each node in the bottom layer has a high fan-in, we use a noisy-OR parameterization; each disease has an independent chance of causing each symptom. The real QMR-DT model is copyright, but we can create a random QMR-like model as follows.
function bnet = mk_qmr_bnet(G, inhibit, leak, prior)
% MK_QMR_BNET Make a QMR model
% bnet = mk_qmr_bnet(G, inhibit, leak, prior)
%
% G(i,j) = 1 iff there is an arc from disease i to finding j
% inhibit(i,j) = inhibition probability on i->j arc
% leak(j) = inhibition prob. on leak->j arc
% prior(i) = prob. disease i is on

[Ndiseases Nfindings] = size(inhibit);
N = Ndiseases + Nfindings;
finding_node = Ndiseases+1:N;
ns = 2*ones(1,N);
dag = zeros(N,N);
dag(1:Ndiseases, finding_node) = G;
bnet = mk_bnet(dag, ns, 'observed', finding_node);

for d=1:Ndiseases
  CPT = [1-prior(d) prior(d)];
  bnet.CPD{d} = tabular_CPD(bnet, d, CPT');
end

for i=1:Nfindings
  fnode = finding_node(i);
  ps = parents(G, i);
  bnet.CPD{fnode} = noisyor_CPD(bnet, fnode, leak(i), inhibit(ps, i));
end

In the file BNT/examples/static/qmr1, we create a random bipartite graph G, with 5 diseases and 10 findings, and random parameters. (In general, to create a random dag, use 'mk_random_dag'.) We can visualize the resulting graph structure using the methods discussed below, with the following results:

Now let us put some random evidence on all the leaves except the very first and very last, and compute the disease posteriors.
pos = 2:floor(Nfindings/2);
neg = (pos(end)+1):(Nfindings-1);
onodes = myunion(pos, neg);
evidence = cell(1, N);
evidence(finding_node(pos)) = num2cell(repmat(2, 1, length(pos)));
evidence(finding_node(neg)) = num2cell(repmat(1, 1, length(neg)));

engine = jtree_inf_engine(bnet);
[engine, ll] = enter_evidence(engine, evidence);
post = zeros(1, Ndiseases);
for i=1:Ndiseases
  m = marginal_nodes(engine, i);
  post(i) = m.T(2);
end

Junction tree can be quite slow on large QMR models. Fortunately, it is possible to exploit properties of the noisy-OR function to speed up exact inference using an algorithm called quickscore, discussed below.

Conditional Gaussian models
A conditional Gaussian model is one in which, conditioned on all the discrete nodes, the distribution over the remaining (continuous) nodes is multivariate Gaussian. This means we can have arcs from discrete (D) to continuous (C) nodes, but not vice versa. (We are allowed C->D arcs if the continuous nodes are observed, as in the mixture of experts model, since this distribution can be represented with a discrete potential.)
We now give an example of a CG model, from the paper "Propagation of Probabilities, Means and Variances in Mixed Graphical Association Models", Steffen Lauritzen, JASA 87(420): 1098-1108, 1992 (reprinted in the book "Probabilistic Networks and Expert Systems", R.G. Cowell, A.P. Dawid, S.L. Lauritzen and D.J. Spiegelhalter, Springer, 1999.)
Specifying the graph
Consider the model of waste emissions from an incinerator plant shown below. We follow the standard convention that shaded nodes are observed, clear nodes are hidden. We also use the non-standard convention that square nodes are discrete (tabular) and round nodes are Gaussian.

We can create this model as follows.
F = 1; W = 2; E = 3; B = 4; C = 5; D = 6; Min = 7; Mout = 8; L = 9;
n = 9;
dag = zeros(n);
dag(F,E) = 1;
dag(W,[E Min D]) = 1;
dag(E,D) = 1;
dag(B,[C D]) = 1;
dag(D,[L Mout]) = 1;
dag(Min,Mout) = 1;

% node sizes - all cts nodes are scalar, all discrete nodes are binary
ns = ones(1, n);
dnodes = [F W B];
cnodes = mysetdiff(1:n, dnodes);
ns(dnodes) = 2;

bnet = mk_bnet(dag, ns, 'discrete', dnodes);

'dnodes' is a list of the discrete nodes; 'cnodes' is the continuous nodes. 'mysetdiff' is a faster version of the built-in 'setdiff'.

Specifyingtheparameters
Theparametersofthediscretenodescanbespecifiedasfollows.
bnet.CPD{B}=tabular_CPD(bnet,B,'CPT',[0.850.15]);%1=stable,2=unstable
bnet.CPD{F}=tabular_CPD(bnet,F,'CPT',[0.950.05]);%1=intact,2=defect
bnet.CPD{W}=tabular_CPD(bnet,W,'CPT',[2/75/7]);%1=industrial,2=household

Theparametersofthecontinuousnodescanbespecifiedasfollows.
bnet.CPD{E}=gaussian_CPD(bnet,E,'mean',[3.90.43.20.5],...

'cov',[0.000020.00010.000020.0001]);
bnet.CPD{D}=gaussian_CPD(bnet,D,'mean',[6.56.07.57.0],...

'cov',[0.030.040.10.1],'weights',[1111]);
bnet.CPD{C}=gaussian_CPD(bnet,C,'mean',[21],'cov',[0.10.3]);
bnet.CPD{L}=gaussian_CPD(bnet,L,'mean',3,'cov',0.25,'weights',0.5);
bnet.CPD{Min}=gaussian_CPD(bnet,Min,'mean',[0.50.5],'cov',[0.010.005]);
bnet.CPD{Mout}=gaussian_CPD(bnet,Mout,'mean',0,'cov',0.002,'weights',[11]);

Inference
First we compute the unconditional marginals.
engine = jtree_inf_engine(bnet);
evidence = cell(1,n);
[engine, ll] = enter_evidence(engine, evidence);
marg = marginal_nodes(engine, E);

'marg' is a structure that contains the fields 'mu' and 'Sigma', which contain the mean and (co)variance of the marginal on E. In this case, they are both scalars. Let us check they match the published figures (to 2 decimal places).
tol = 1e-2;
assert(approxeq(marg.mu, -3.25, tol));
assert(approxeq(sqrt(marg.Sigma), 0.709, tol));

We can compute the other posteriors similarly. Now let us add some evidence.
evidence = cell(1,n);
evidence{W} = 1; % industrial
evidence{L} = 1.1;
evidence{C} = -0.9;
[engine, ll] = enter_evidence(engine, evidence);

Now we find
marg = marginal_nodes(engine, E);
assert(approxeq(marg.mu, -3.8983, tol));
assert(approxeq(sqrt(marg.Sigma), 0.0763, tol));

We can also compute the joint probability on a set of nodes. For example, P(D, Mout | evidence) is a 2D Gaussian:
marg = marginal_nodes(engine, [D Mout])
marg =
    domain: [6 8]
        mu: [2x1 double]
     Sigma: [2x2 double]
         T: 1.0000

The mean is
marg.mu
ans =
    3.6077
    4.1077

and the covariance matrix is
marg.Sigma
ans =
    0.1062    0.1062
    0.1062    0.1182

It is easy to visualize this posterior using standard Matlab plotting functions, e.g.,
gaussplot2d(marg.mu, marg.Sigma);

produces the following picture.

The T field indicates that the mixing weight of this Gaussian component is 1.0. If the joint contains discrete and continuous variables, the result will be a mixture of Gaussians, e.g.,
marg = marginal_nodes(engine, [F E])
    domain: [1 3]
        mu: [-3.9000 -0.4003]
     Sigma: [1x1x2 double]
         T: [0.9995 4.7373e-04]

The interpretation is Sigma(i,j,k) = Cov[E(i) E(j) | F=k]. In this case, E is a scalar, so i=j=1; k specifies the mixture component.
We saw in the sprinkler network that BNT sets the effective size of observed discrete nodes to 1, since they only have one legal value. For continuous nodes, BNT sets their length to 0, since they have been reduced to a point. For example,
marg = marginal_nodes(engine, [B C])
    domain: [4 5]
        mu: []
     Sigma: []
         T: [0.0123 0.9877]

It is simple to post-process the output of marginal_nodes. For example, the file BNT/examples/static/cg1 sets the mu term of observed nodes to their observed value, and the Sigma term to 0 (since observed nodes have no variance).
Note that the implemented version of the junction tree is numerically unstable when using CG potentials (which is why, in the example above, we only required our answers to agree with the published ones to 2dp.) This is why you might want to use stab_cond_gauss_inf_engine, implemented by Shan Huang. This is described in
"Stable Local Computation with Conditional Gaussian Distributions", S. Lauritzen and F. Jensen, Tech Report R-99-2014, Dept. Math. Sciences, Aalborg Univ., 1999.

However, even the numerically stable version can be computationally intractable if there are many hidden discrete nodes, because the number of mixture components grows exponentially, e.g., in a switching linear dynamical system. In general, one must resort to approximate inference techniques: see the discussion on inference engines below.

Other hybrid models
When we have C->D arcs, where C is hidden, we need to use approximate inference. One approach (not implemented in BNT) is described in
A Variational Approximation for Bayesian Networks with Discrete and Continuous Latent Variables, K. Murphy, UAI 99.

Of course, one can always use sampling methods for approximate inference in such models.

Parameter Learning
The parameter estimation routines in BNT can be classified into 4 types, depending on whether the goal is to compute a full (Bayesian) posterior over the parameters or just a point estimate (e.g., Maximum Likelihood or Maximum A Posteriori), and whether all the variables are fully observed or there is missing data/hidden variables (partial observability).

        Full obs             Partial obs
Point   learn_params         learn_params_em
Bayes   bayes_update_params  not yet supported

Loading data from a file
To load numeric data from an ASCII text file called 'dat.txt', where each row is a case and columns are separated by white-space, such as
011979 1626.5 0.0
021979 1367.0 0.0
...

you can use
data = load('dat.txt');

or
load dat.txt -ascii

In the latter case, the data is stored in a variable called 'dat' (the file name minus the extension). Alternatively, suppose the data is stored in a .csv file (has commas separating the columns, and contains a header line), such as
header info goes here
ORD,011979,1626.5,0.0
DSM,021979,1367.0,0.0
...
You can load this using
[a,b,c,d] = textread('dat.txt', '%s %d %f %f', 'delimiter', ',', 'headerlines', 1);

If your file is not in either of these formats, you can either use Perl to convert it to this format, or use the Matlab scanf command. Type help iofun for more information on Matlab's file functions.
BNT learning routines require data to be stored in a cell array. data{i,m} is the value of node i in case (example) m, i.e., each column is a case. If node i is not observed in case m (missing value), set data{i,m} = []. (Not all the learning routines can cope with such missing values, however.) In the special case that all the nodes are observed and are scalar-valued (as opposed to vector-valued), the data can be stored in a matrix (as opposed to a cell-array).
Suppose, as in the mixture of experts example, that we have 3 nodes in the graph: X(1) is the observed input, X(3) is the observed output, and X(2) is a hidden (gating) node. We can create the dataset as follows.
data = load('dat.txt');
ncases = size(data, 1);
cases = cell(3, ncases);
cases([1 3], :) = num2cell(data');

Notice how we transposed the data, to convert rows into columns. Also, cases{2,m} = [] for all m, since X(2) is always hidden.

Maximum likelihood parameter estimation from complete data
As an example, let's generate some data from the sprinkler network, randomize the parameters, and then try to recover the original model. First we create some training data using forwards sampling.
samples = cell(N, nsamples);
for i=1:nsamples
  samples(:,i) = sample_bnet(bnet);
end

samples{j,i} contains the value of the j'th node in case i. sample_bnet returns a cell array because, in general, each node might be a vector of different length. In this case, all nodes are discrete (and hence scalars), so we could have used a regular array instead (which can be quicker):
data = cell2num(samples);

Now we create a network with random parameters. (The initial values of bnet2 don't matter in this case, since we can find the globally optimal MLE independent of where we start.)
% Make a tabula rasa
bnet2 = mk_bnet(dag, node_sizes);
seed = 0;
rand('state', seed);
bnet2.CPD{C} = tabular_CPD(bnet2, C);
bnet2.CPD{R} = tabular_CPD(bnet2, R);
bnet2.CPD{S} = tabular_CPD(bnet2, S);
bnet2.CPD{W} = tabular_CPD(bnet2, W);

Finally, we find the maximum likelihood estimates of the parameters.
bnet3 = learn_params(bnet2, samples);

To view the learned parameters, we use a little Matlab hackery.
CPT3 = cell(1,N);
for i=1:N
  s = struct(bnet3.CPD{i}); % violate object privacy
  CPT3{i} = s.CPT;
end

Here are the parameters learned for node 4.
dispcpt(CPT3{4})
1 1 : 1.0000 0.0000
2 1 : 0.2000 0.8000
1 2 : 0.2273 0.7727
2 2 : 0.0000 1.0000

So we see that the learned parameters are fairly close to the "true" ones, which we display below.
dispcpt(CPT{4})
1 1 : 1.0000 0.0000
2 1 : 0.1000 0.9000
1 2 : 0.1000 0.9000
2 2 : 0.0100 0.9900

We can get better results by using a larger training set, or using informative priors (see below).

Parameter priors
Currently, only tabular CPDs can have priors on their parameters. The conjugate prior for a multinomial is the Dirichlet. (For binary random variables, the multinomial is the same as the Bernoulli, and the Dirichlet is the same as the Beta.)
The Dirichlet has a simple interpretation in terms of pseudo counts. If we let N_ijk = the num. times X_i=k and Pa_i=j occurs in the training set, where Pa_i are the parents of X_i, then the maximum likelihood (ML) estimate is T_ijk = N_ijk / N_ij (where N_ij = sum_k' N_ijk'), which will be 0 if N_ijk = 0. To prevent us from declaring that (X_i=k, Pa_i=j) is impossible just because this event was not seen in the training set, we can pretend we saw value k of X_i, for each value j of Pa_i, some number (alpha_ijk) of times in the past. The MAP (maximum a posteriori) estimate is then

T_ijk = (N_ijk + alpha_ijk) / (N_ij + alpha_ij)

and is never 0 if all alpha_ijk > 0. For example, consider the network A->B, where A is binary and B has 3 values. A uniform prior for B has the form

     B=1  B=2  B=3
A=1  1    1    1
A=2  1    1    1

which can be created using
tabular_CPD(bnet, i, 'prior_type', 'dirichlet', 'dirichlet_type', 'unif');

This prior does not satisfy the likelihood equivalence principle, which says that Markov equivalent models should have the same marginal likelihood. A prior that does satisfy this principle is shown below. Heckerman (1995) calls this the BDeu prior (likelihood equivalent uniform Bayesian Dirichlet).

     B=1  B=2  B=3
A=1  1/6  1/6  1/6
A=2  1/6  1/6  1/6

where we put N/(q*r) in each bin; N is the equivalent sample size, r=|A|, q=|B|. This can be created as follows
tabular_CPD(bnet, i, 'prior_type', 'dirichlet', 'dirichlet_type', 'BDeu');

Here, 1 is the equivalent sample size, and is the strength of the prior. You can change this using
tabular_CPD(bnet, i, 'prior_type', 'dirichlet', 'dirichlet_type', ...
    'BDeu', 'dirichlet_weight', 10);
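To see the effect of the pseudo counts numerically, here is a tiny sketch of the ML and MAP formulas above, applied to one row of a CPT with hypothetical counts:
N_jk = [0 3 7]; % counts of X_i = 1,2,3 for one parent configuration j
alpha = [1 1 1]; % uniform Dirichlet pseudo counts
T_ML = N_jk / sum(N_jk) % [0 0.3 0.7]: the first entry is 0
T_MAP = (N_jk + alpha) / (sum(N_jk) + sum(alpha)) % [1 4 8]/13: never 0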

(Sequential) Bayesian parameter updating from complete data
If we use conjugate priors and have fully observed data, we can compute the posterior over the parameters in batch form as follows.
cases = sample_bnet(bnet, nsamples);
bnet = bayes_update_params(bnet, cases);
LL = log_marg_lik_complete(bnet, cases);

bnet.CPD{i}.prior contains the new Dirichlet pseudo counts, and bnet.CPD{i}.CPT is set to the mean of the posterior (the normalized counts). (Hence if the initial pseudo counts are 0, bayes_update_params and learn_params will give the same result.)
We can compute the same result sequentially (on-line) as follows.
LL = 0;
for m=1:nsamples
  LL = LL + log_marg_lik_complete(bnet, cases(:,m));
  bnet = bayes_update_params(bnet, cases(:,m));
end

The file BNT/examples/static/StructLearn/model_select1 has an example of sequential model selection which uses the same idea. We generate data from the model A -> B and compute the posterior prob of all 3 dags on 2 nodes: (1) A B, (2) A <- B, (3) A -> B. Models 2 and 3 are Markov equivalent, and therefore indistinguishable from observational data alone, so we expect their posteriors to be the same (assuming a prior which satisfies likelihood equivalence). If we use random parameters, the "true" model only gets a higher posterior after 2000 trials! However, if we make B a noisy NOT gate, the true model "wins" after 12 trials, as shown below (red = model 1, blue/green (superimposed) represents models 2/3).

The use of marginal likelihood for model selection is discussed in greater detail in the section on structure learning.

Maximum likelihood parameter estimation with missing values (EM)
Now we consider learning when some values are not observed. Let us randomly hide half the values generated from the water sprinkler example.
samples2 = samples;
hide = rand(N, nsamples) > 0.5;
[I,J] = find(hide);
for k=1:length(I)
  samples2{I(k), J(k)} = [];
end

samples2{i,l} is the value of node i in training case l, or [] if unobserved.
Now we will compute the MLEs using the EM algorithm. We need to use an inference algorithm to compute the expected sufficient statistics in the E step; the M (maximization) step is as above.
engine2 = jtree_inf_engine(bnet2);
max_iter = 10;
[bnet4, LLtrace] = learn_params_em(engine2, samples2, max_iter);

LLtrace(i) is the log-likelihood at iteration i. We can plot this as follows:
plot(LLtrace, 'x-')

Let's display the results after 10 iterations of EM. (We extract CPT4 from bnet4 just as we extracted CPT3 above.)
celldisp(CPT4)
CPT4{1} =
    0.6616
    0.3384
CPT4{2} =
    0.6510    0.3490
    0.8751    0.1249
CPT4{3} =
    0.8366    0.1634
    0.0197    0.9803
CPT4{4} =
(:,:,1) =
    0.8276    0.0546
    0.5452    0.1658
(:,:,2) =
    0.1724    0.9454
    0.4548    0.8342

We can get improved performance by using one or more of the following methods:
Increasing the size of the training set.
Decreasing the amount of hidden data.
Running EM for longer.
Using informative priors.
Initialising EM from multiple starting points.

Click here for a discussion of learning Gaussians, which can cause numerical problems.
For a more complete example of learning with EM, see the script BNT/examples/static/learn1.m.
Parameter tying
In networks with repeated structure (e.g., chains and grids), it is common to assume that the parameters are the same at every node. This is called parameter tying, and reduces the amount of data needed for learning.
When we have tied parameters, there is no longer a one-to-one correspondence between nodes and CPDs. Rather, each CPD specifies the parameters for a whole equivalence class of nodes. It is easiest to see this by example. Consider the following hidden Markov model (HMM)

When HMMs are used for semi-infinite processes like speech recognition, we assume the transition matrix P(H(t+1)|H(t)) is the same for all t; this is called a time-invariant or homogenous Markov chain. Hence hidden nodes 2, 3, ..., T are all in the same equivalence class, say class Hclass. Similarly, the observation matrix P(O(t)|H(t)) is assumed to be the same for all t, so the observed nodes are all in the same equivalence class, say class Oclass. Finally, the prior term P(H(1)) is in a class all by itself, say class H1class. This is illustrated below, where we explicitly represent the parameters as random variables (dotted nodes).

In BNT, we cannot represent parameters as random variables (nodes). Instead, we "hide" the parameters inside one CPD for each equivalence class, and then specify that the other CPDs should share these parameters, as follows.
hnodes = 1:2:2*T;
onodes = 2:2:2*T;
H1class = 1; Hclass = 2; Oclass = 3;
eclass = ones(1,N);
eclass(hnodes(2:end)) = Hclass;
eclass(hnodes(1)) = H1class;
eclass(onodes) = Oclass;
% create dag and ns in the usual way
bnet = mk_bnet(dag, ns, 'discrete', dnodes, 'equiv_class', eclass);

Finally, we define the parameters for each equivalence class:
bnet.CPD{H1class} = tabular_CPD(bnet, hnodes(1)); % prior
bnet.CPD{Hclass} = tabular_CPD(bnet, hnodes(2)); % transition matrix
if cts_obs
  bnet.CPD{Oclass} = gaussian_CPD(bnet, onodes(1));
else
  bnet.CPD{Oclass} = tabular_CPD(bnet, onodes(1));
end

In general, if bnet.CPD{e} = xxx_CPD(bnet, j), then j should be a member of e's equivalence class; that is, it is not always the case that e == j. You can use bnet.rep_of_eclass(e) to return the representative of equivalence class e. BNT will look up the parents of j to determine the size of the CPT to use. It assumes that this is the same for all members of the equivalence class. Click here for a more complex example of parameter tying.
Note: Normally one would define an HMM as a Dynamic Bayes Net (see the function BNT/examples/dynamic/mk_chmm.m). However, one can define an HMM as a static BN using the function BNT/examples/static/Models/mk_hmm_bnet.m.
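For concreteness, here is one way to fill in the "create dag and ns in the usual way" step above, for a static HMM with T=3 slices and binary discrete hidden and observed nodes (a sketch; T and the node sizes are arbitrary choices):
T = 3; N = 2*T;
hnodes = 1:2:2*T;
onodes = 2:2:2*T;
dag = zeros(N,N);
for t=1:T
  dag(hnodes(t), onodes(t)) = 1; % H(t) -> O(t)
end
for t=1:T-1
  dag(hnodes(t), hnodes(t+1)) = 1; % H(t) -> H(t+1)
end
ns = 2*ones(1,N);
dnodes = 1:N; % discrete observations, i.e., cts_obs = 0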

Structure learning
Update (9/29/03): Phillipe LeRay is developing some additional structure learning code on top of BNT. Click here for details.
There are two very different approaches to structure learning: constraint-based and search-and-score. In the constraint-based approach, we start with a fully connected graph, and remove edges if certain conditional independencies are measured in the data. This has the disadvantage that repeated independence tests lose statistical power.
In the more popular search-and-score approach, we perform a search through the space of possible DAGs, and either return the best one found (a point estimate), or return a sample of the models found (an approximation to the Bayesian posterior).
The number of DAGs as a function of the number of nodes, G(n), is super-exponential in n, and is given by the following recurrence:

G(n) = sum_{k=1}^{n} (-1)^(k+1) C(n,k) 2^(k(n-k)) G(n-k), with G(0) = G(1) = 1

The first few values are shown below.
n    G(n)
1    1
2    3
3    25
4    543
5    29,281
6    3,781,503
7    1.1 x 10^9
8    7.8 x 10^11
9    1.2 x 10^15
10   4.2 x 10^18

Since the number of DAGs is super-exponential in the number of nodes, we cannot exhaustively search the space, so we either use a local search algorithm (e.g., greedy hill climbing, perhaps with multiple restarts) or a global search algorithm (e.g., Markov Chain Monte Carlo).
If we know a total ordering on the nodes, finding the best structure amounts to picking the best set of parents for each node independently. This is what the K2 algorithm does. If the ordering is unknown, we can search over orderings, which is more efficient than searching over DAGs (Koller and Friedman, 2000).
In addition to the search procedure, we must specify the scoring function. There are two popular choices. The Bayesian score integrates out the parameters, i.e., it is the marginal likelihood of the model. The BIC (Bayesian Information Criterion) is defined as log P(D|theta_hat) - 0.5*d*log(N), where D is the data, theta_hat is the ML estimate of the parameters, d is the number of parameters, and N is the number of data cases. The BIC method has the advantage of not requiring a prior.
BIC can be derived as a large sample approximation to the marginal likelihood. (It is also equal to the Minimum Description Length of a model.) However, in practice, the sample size does not need to be very large for the approximation to be good. For example, in the figure below, we plot the ratio between the log marginal likelihood and the BIC score against data-set size; we see that the ratio rapidly approaches 1, especially for non-informative priors. (This plot was generated by the file BNT/examples/static/bic1.m. It uses the water sprinkler BN with BDeu Dirichlet priors with different equivalent sample sizes.)

As with parameter learning, handling missing data/hidden variables is much harder than the fully observed case. The structure learning routines in BNT can therefore be classified into 4 types, analogously to the parameter learning case.

        Full obs           Partial obs
Point   learn_struct_K2    not yet supported
Bayes   learn_struct_mcmc  not yet supported

Markov equivalence
If two DAGs encode the same conditional independencies, they are called Markov equivalent. The set of all DAGs can be partitioned into Markov equivalence classes. Graphs within the same class can have the direction of some of their arcs reversed without changing any of the CI relationships. Each class can be represented by a PDAG (partially directed acyclic graph) called an essential graph or pattern. This specifies which edges must be oriented in a certain direction, and which may be reversed.
When learning graph structure from observational data, the best one can hope to do is to identify the model up to Markov equivalence. To distinguish amongst graphs within the same equivalence class, one needs interventional data: see the discussion on active learning below.

Exhaustive search
The brute-force approach to structure learning is to enumerate all possible DAGs, and score each one. This provides a "gold standard" with which to compare other algorithms. We can do this as follows.
dags = mk_all_dags(N);
score = score_dags(data, ns, dags);

where data(i,m) is the value of node i in case m, and ns(i) is the size of node i. If the DAGs have a lot of families in common, we can cache the sufficient statistics, making this potentially more efficient than scoring the DAGs one at a time. (Caching is not currently implemented, however.)
By default, we use the Bayesian scoring metric, and assume CPDs are represented by tables with BDeu(1) priors. We can override these defaults as follows. If we want to use uniform priors, we can say
params = cell(1,N);
for i=1:N
  params{i} = {'prior', 'unif'};
end
score = score_dags(data, ns, dags, 'params', params);

params{i} is a cell-array, containing optional arguments that are passed to the constructor for CPD i.
Now suppose we want to use different node types, e.g., suppose nodes 1 and 2 are Gaussian, and nodes 3 and 4 softmax (both these CPDs can support discrete and continuous parents, which is necessary since all other nodes will be considered as parents). The Bayesian scoring metric currently only works for tabular CPDs, so we will use BIC:
score = score_dags(data, ns, dags, 'discrete', [3 4], 'params', [], ...
    'type', {'gaussian', 'gaussian', 'softmax', 'softmax'}, 'scoring_fn', 'bic')

In practice, one can't enumerate all possible DAGs for N > 5, but one can evaluate any reasonably-sized set of hypotheses in this way (e.g., nearest neighbors of your current best guess). Think of this as "computer assisted model refinement" as opposed to de novo learning.

K2
The K2 algorithm (Cooper and Herskovits, 1992) is a greedy search algorithm that works as follows. Initially each node has no parents. It then incrementally adds the parent whose addition most increases the score of the resulting structure. When the addition of no single parent can increase the score, it stops adding parents to the node. Since we are using a fixed ordering, we do not need to check for cycles, and can choose the parents for each node independently.
The original paper used the Bayesian scoring metric with tabular CPDs and Dirichlet priors. BNT generalizes this to allow any kind of CPD, and either the Bayesian scoring metric or BIC, as in the example above. In addition, you can specify an optional upper bound on the number of parents for each node. The file BNT/examples/static/k2demo1.m gives an example of how to use K2. We use the water sprinkler network and sample 100 cases from it as before. Then we see how much data it takes to recover the generating structure:

order = [C S R W];
max_fan_in = 2;
sz = 5:5:100;
for i=1:length(sz)
  dag2 = learn_struct_K2(data(:, 1:sz(i)), node_sizes, order, 'max_fan_in', max_fan_in);
  correct(i) = isequal(dag, dag2);
end
Here are the results.
correct =
  Columns 1 through 12
    0  0  0  0  0  0  0  1  0  1  1  1
  Columns 13 through 20
    1  1  1  1  1  1  1  1

So we see it takes about sz(10) = 50 cases. (BIC behaves similarly, showing that the prior doesn't matter too much.) In general, we cannot hope to recover the "true" generating structure, only one that is in its Markov equivalence class.

Hill-climbing
Hill-climbing starts at a specific point in space, considers all nearest neighbors, and moves to the neighbor that has the highest score; if no neighbors have higher score than the current point (i.e., we have reached a local maximum), the algorithm stops. One can then restart in another part of the space. A sketch is given below.
A common definition of "neighbor" is all graphs that can be generated from the current graph by adding, deleting or reversing a single arc, subject to the acyclicity constraint. Other neighborhoods are possible: see Optimal Structure Identification with Greedy Search, Max Chickering, JMLR 2002.
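BNT itself does not provide a one-call hill-climber for general DAGs (Philippe Leray's package, described below, does), but one is easy to sketch on top of score_dags. The helper mk_nbrs_of_dag used here is assumed to return a cell array of all acyclic single-arc changes; check your BNT version for its exact return values.

% A minimal greedy hill-climbing sketch over DAG space.
dag = zeros(N, N);                      % start from the empty graph
best_score = score_dags(data, ns, {dag});
improved = true;
while improved
  nbrs = mk_nbrs_of_dag(dag);           % all single-arc add/delete/reverse moves
  scores = score_dags(data, ns, nbrs);
  [best_nbr, idx] = max(scores);
  if best_nbr > best_score
    dag = nbrs{idx};                    % move uphill
    best_score = best_nbr;
  else
    improved = false;                   % local maximum reached
  end
end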

MCMC
We can use a Markov Chain Monte Carlo (MCMC) algorithm called Metropolis-Hastings (MH) to search the space of all DAGs. The standard proposal distribution is to consider moving to all nearest neighbors in the sense defined above.
The function can be called as in the following example.

[sampled_graphs, accept_ratio] = learn_struct_mcmc(data, ns, 'nsamples', 100, 'burnin', 10);

We can convert our set of sampled graphs to a histogram (empirical posterior over all the DAGs) thus

all_dags = mk_all_dags(N);
mcmc_post = mcmc_sample_to_hist(sampled_graphs, all_dags);

To see how well this performs, let us compute the exact posterior exhaustively.

score = score_dags(data, ns, all_dags);
post = normalise(exp(score)); % assuming uniform structural prior

We plot the results below. (The data set was 100 samples drawn from a random 4-node bnet; see the file BNT/examples/static/mcmc1.)

subplot(2,1,1)
bar(post)
subplot(2,1,2)
bar(mcmc_post)

We can also plot the acceptance ratio versus the number of MCMC steps, as a crude convergence diagnostic.

clf
plot(accept_ratio)

Even though the number of samples needed by MCMC is theoretically polynomial (not exponential) in the dimensionality of the search space, in practice it has been found that MCMC does not converge in reasonable time for graphs with more than about 10 nodes.

Active structure learning
As was mentioned above, one can only learn a DAG up to Markov equivalence, even given infinite data. If one is interested in learning the structure of a causal network, one needs interventional data. (By "intervention" we mean forcing a node to take on a specific value, thereby effectively severing its incoming arcs.)
Most of the scoring functions accept an optional argument that specifies whether a node was observed to have a certain value, or was forced to have that value: we set clamped(i,m) = 1 if node i was forced in training case m. See, e.g., the file BNT/examples/static/cooper_yoo.
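As a sketch (the 'clamped' name/value pair is how the scoring functions are assumed to take this matrix; check the header of score_dags for the exact interface):

% Suppose the sprinkler node S was forced on in the first 10 cases,
% and everything else was merely observed (ncases = number of cases).
clamped = zeros(N, ncases);
clamped(S, 1:10) = 1;
score = score_dags(data, ns, dags, 'clamped', clamped);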

An interesting question is to decide which interventions to perform (c.f., design of experiments). For details, see the following tech report:
Active learning of causal Bayes net structure, Kevin Murphy, March 2001.

Structural EM
Computing the Bayesian score when there is partial observability is computationally challenging, because the parameter posterior becomes multimodal (the hidden nodes induce a mixture distribution). One therefore needs to use approximations such as BIC. Unfortunately, search algorithms are still expensive, because we need to run EM at each step to compute the MLE, which is needed to compute the score of each model. An alternative approach is to do the local search steps inside of the M step of EM, which is more efficient since the data has been "filled in"; this is called the structural EM algorithm (Friedman 1997), and it provably converges to a local maximum of the BIC score.
Wei Hu has implemented SEM for discrete nodes. You can download his package from here. Please address all questions about Wei Hu's implementation of SEM to [email protected].

Visualizing the graph
Click here for more information on graph visualization.

Constraint-based methods
The IC algorithm (Pearl and Verma, 1991), and the faster, but otherwise equivalent, PC algorithm (Spirtes, Glymour, and Scheines 1993), perform many conditional independence tests, and combine these constraints into a PDAG to represent the whole Markov equivalence class.
IC*/FCI extend IC/PC to handle latent variables: see below. (IC stands for inductive causation; PC stands for Peter and Clark, the first names of Spirtes and Glymour; FCI stands for fast causal inference. What we, following Pearl (2000), call IC* was called IC in the original Pearl and Verma paper.) For details, see
Causation, Prediction, and Search, Spirtes, Glymour and Scheines (SGS), 2001 (2nd edition), MIT Press.
Causality: Models, Reasoning and Inference, J. Pearl, 2000, Cambridge University Press.
The PC algorithm takes as arguments a function f, the number of nodes N, the maximum fan-in K, and additional arguments A which are passed to f. The function f(X,Y,S,A) returns 1 if X is conditionally independent of Y given S, and 0 otherwise. For example, suppose we cheat by passing in a CI "oracle" which has access to the true DAG; the oracle tests for d-separation in this DAG, i.e., f(X,Y,S) calls dsep(X,Y,S,dag). We can do this as follows.

pdag = learn_struct_pdag_pc('dsep', N, max_fan_in, dag);
pdag(i,j) = 1 if there is definitely an i->j arc, and pdag(i,j) = -1 if there is either an i->j or an i<-j arc.
Applied to the sprinkler network, this returns

pdag =
     0    -1    -1     0
    -1     0     0     1
    -1     0     0     1
     0     0     0     0

So as expected, we see that the V-structure at the W node is uniquely identified, but the other arcs have ambiguous orientation.
We now give an example from p141 (1st edn) / p103 (2nd edn) of the SGS book. This example concerns the female orgasm. We are given a correlation matrix C between 7 measured factors (such as subjective experiences of coital and masturbatory experiences), derived from 281 samples, and want to learn a causal model of the data. We will not discuss the merits of this type of work here, but merely show how to reproduce the results in the SGS book. Their program, Tetrad, makes use of the Fisher Z-test for conditional independence, so we do the same:

max_fan_in = 4;
nsamples = 281;
alpha = 0.05;
pdag = learn_struct_pdag_pc('cond_indep_fisher_z', n, max_fan_in, C, nsamples, alpha);
In this case, the CI test is
f(X,Y,S) = cond_indep_fisher_z(X, Y, S, C, nsamples, alpha)

The results match those of Fig 12a of SGS apart from two edge differences; presumably this is due to rounding error (although it could be a bug, either in BNT or in Tetrad). This example can be found in the file BNT/examples/static/pc2.m.
The IC* algorithm (Pearl and Verma, 1991), and the faster FCI algorithm (Spirtes, Glymour, and Scheines 1993), are like the IC/PC algorithm, except that they can detect the presence of latent variables. See the file learn_struct_pdag_ic_star, written by Tamar Kushnir. The output is a matrix P, defined as follows (see Pearl (2000), p52 for details):

% P(i,j) = -1 if there is either a latent variable L such that i <-L-> j OR there is a directed edge from i->j.
% P(i,j) = -2 if there is a marked directed i-*>j edge.
% P(i,j) = P(j,i) = 1 if there is an undirected edge i--j.
% P(i,j) = P(j,i) = 2 if there is a latent variable L such that i <-L-> j.
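The call below is assumed by analogy with learn_struct_pdag_pc; check the header of learn_struct_pdag_ic_star for the exact signature:

% Hypothetical call on the SGS correlation data from above.
P = learn_struct_pdag_ic_star('cond_indep_fisher_z', n, max_fan_in, C, nsamples, alpha);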

Philippe Leray's structure learning package
Philippe Leray has written a structure learning package that uses BNT. It currently (June 2003) has the following features:
PC with Chi2 statistical test
MWST: Maximum Weighted Spanning Tree
Hill-Climbing
Greedy Search
Structural EM
hist_ic: optimal histogram based on the IC information criterion
cpdag_to_dag
dag_to_cpdag
...

Inference engines
Up until now, we have used the junction tree algorithm for inference. However, sometimes this is too slow, or not even applicable. In general, there are many inference algorithms, each of which makes different tradeoffs between speed, accuracy, complexity and generality. Furthermore, there might be many implementations of the same algorithm; for instance, a general purpose, readable version, and a highly optimized, specialized one. To cope with this variety, we treat each inference algorithm as an object, which we call an inference engine.
An inference engine is an object that contains a bnet and supports the 'enter_evidence' and 'marginal_nodes' methods. The engine constructor takes the bnet as argument and may do some model-specific processing. When 'enter_evidence' is called, the engine may do some evidence-specific processing. Finally, when 'marginal_nodes' is called, the engine may do some query-specific processing.
The amount of work done when each stage is specified (structure, parameters, evidence, and query) depends on the engine. The cost of work done early in this sequence can be amortized. On the other hand, one can make better optimizations if one waits until later in the sequence. For example, the parameters might imply conditional independencies that are not evident in the graph structure, but can nevertheless be exploited; the evidence indicates which nodes are observed and hence can effectively be disconnected from the graph; and the query might indicate that large parts of the network are d-separated from the query nodes. (Since it is not the actual values of the evidence that matters, just which nodes are observed, many engines allow you to specify which nodes will be observed when they are constructed, i.e., before calling 'enter_evidence'. Some engines can still cope if the actual pattern of evidence is different, e.g., if there is missing data.)
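For concreteness, every engine follows the same basic calling pattern; here is a sketch using the junction tree engine on the water sprinkler network (the choice of evidence and query node is arbitrary):

engine = jtree_inf_engine(bnet);           % model-specific processing
evidence = cell(1, N);
evidence{W} = 2;                           % observe WetGrass = true
engine = enter_evidence(engine, evidence); % evidence-specific processing
m = marginal_nodes(engine, S);             % query-specific processing
m.T                                        % posterior P(S | W = 2)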
Although being maximally lazy (i.e., only doing work when a query is issued) may seem desirable, this is not always the most efficient strategy. For example, when learning using EM, we need to call marginal_nodes N times, where N is the number of nodes. Variable elimination would end up repeating a lot of work each time marginal_nodes is called, making it inefficient for learning. The junction tree algorithm, by contrast, uses dynamic programming to avoid this redundant computation: it calculates all marginals in two passes during 'enter_evidence', so calling 'marginal_nodes' takes constant time.
We will discuss some of the inference algorithms implemented in BNT below, and finish with a summary of all of them.

Variable elimination
The variable elimination algorithm, also known as bucket elimination or peeling, is one of the simplest inference algorithms. The basic idea is to "push sums inside of products"; this is explained in more detail here.
The principle of distributing sums over products can be generalized greatly to apply to any commutative semiring. This forms the basis of many common algorithms, such as Viterbi decoding and the Fast Fourier Transform. For details, see
R. McEliece and S. M. Aji, 2000. The Generalized Distributive Law. IEEE Trans. Inform. Theory, vol. 46, no. 2 (March 2000), pp. 325-343.
F. R. Kschischang, B. J. Frey and H. A. Loeliger, 2001. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, February 2001.
Choosing an order in which to sum out the variables so as to minimize computational cost is known to be NP-hard. The implementation of this algorithm in var_elim_inf_engine makes no attempt to optimize this ordering (in contrast, say, to jtree_inf_engine, which uses a greedy search procedure to find a good ordering).
Note: unlike most algorithms, var_elim does all its computational work inside of marginal_nodes, not inside of enter_evidence.
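A minimal usage sketch, with evidence as in the junction tree example above (the engine follows the same calling pattern; here the work happens at the marginal_nodes call):

engine = var_elim_inf_engine(bnet);
engine = enter_evidence(engine, evidence); % cheap: just stores the evidence
m = marginal_nodes(engine, R);             % the elimination happens here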

Global inference methods
The simplest inference algorithm of all is to explicitly construct the joint distribution over all the nodes, and then to marginalize it. This is implemented in global_joint_inf_engine. Since the size of the joint is exponential in the number of discrete (hidden) nodes, this is not a very practical algorithm. It is included merely for pedagogical and debugging purposes.
Three specialized versions of this algorithm have also been implemented, corresponding to the cases where all the nodes are discrete (D), all are Gaussian (G), and some are discrete and some Gaussian (CG). They are called enumerative_inf_engine, gaussian_inf_engine, and cond_gauss_inf_engine respectively.
Note: unlike most algorithms, these global inference algorithms do all their computational work inside of marginal_nodes, not inside of enter_evidence.

Quickscore
The junction tree algorithm is quite slow on the QMR network, since the cliques are so big. One simple trick we can use is to notice that hidden leaves do not affect the posteriors on the roots, and hence do not need to be included in the network. A second trick is to notice that the negative findings can be "absorbed" into the prior: see the file BNT/examples/static/mk_minimal_qmr_bnet for details.
A much more significant speedup is obtained by exploiting special properties of the noisy-or node, as done by the quickscore algorithm. For details, see
Heckerman, "A tractable inference algorithm for diagnosing multiple diseases", UAI 89.
Rish and Dechter, "On the impact of causal independence", UCI tech report, 1998.
This has been implemented in BNT as a special-purpose inference engine, which can be created and used as follows:

engine = quickscore_inf_engine(inhibit, leak, prior);
engine = enter_evidence(engine, pos, neg);
m = marginal_nodes(engine, i);

Belief propagation
Even using quickscore, exact inference takes time that is exponential in the number of positive findings. Hence for large networks we need to resort to approximate inference techniques. See for example
T. Jaakkola and M. Jordan, "Variational probabilistic inference and the QMR-DT network", JAIR 10, 1999.
K. Murphy, Y. Weiss and M. Jordan, "Loopy belief propagation for approximate inference: an empirical study", UAI 99.
The latter approximation entails applying Pearl's belief propagation algorithm to a model even if it has loops (hence the name loopy belief propagation). Pearl's algorithm, implemented as pearl_inf_engine, gives exact results when applied to singly connected graphs (a.k.a. polytrees, since the underlying undirected topology is a tree, but a node may have multiple parents). To apply this algorithm to a graph with loops, use pearl_inf_engine. This can use a centralized or distributed message passing protocol. You can use it as in the following example.

engine = pearl_inf_engine(bnet, 'max_iter', 30);
engine = enter_evidence(engine, evidence);
m = marginal_nodes(engine, i);

We found that this algorithm often converges, and when it does, often is very accurate, but it depends on the precise setting of the parameter values of the network. (See the file BNT/examples/static/qmr1 to repeat the experiment for yourself.) Understanding when and why belief propagation converges/works is a topic of ongoing research.
pearl_inf_engine can exploit special structure in noisy-or and gmux nodes to compute messages efficiently.
belprop_inf_engine is like pearl, but uses potentials to represent messages. Hence this is slower.
belprop_fg_inf_engine is like belprop, but is designed for factor graphs.

Sampling
BNT now (Mar '02) has two sampling (Monte Carlo) inference algorithms:
likelihood_weighting_inf_engine, which does importance sampling and can handle any node type.
gibbs_sampling_inf_engine, written by Bhaskara Marthi. Currently this can only handle tabular CPDs. For a much faster and more powerful Gibbs sampling program, see BUGS.
Note: To generate samples from a network (which is not the same as inference!), use sample_bnet; a sketch follows.
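For instance, assuming sample_bnet(bnet) returns a single case as a cell array with one value per node:

% Draw 100 complete cases from the model by repeated ancestral sampling.
nsamples = 100;
samples = cell(N, nsamples);
for m=1:nsamples
  samples(:, m) = sample_bnet(bnet);
end
data = cell2num(samples);  % convert to a numeric array (each cell holds a scalar)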

Summary of inference engines
The inference engines differ in many ways. Here are some of the major "axes":
Works for all topologies or makes restrictions?
Works for all node types or makes restrictions?
Exact or approximate inference?
In terms of topology, most engines handle any kind of DAG. belprop_fg does approximate inference on factor graphs (FG), which can be used to represent directed, undirected, and mixed (chain) graphs. (In the future, we plan to support exact inference on chain graphs.) quickscore only works on QMR-like models.
In terms of node types: algorithms that use potentials can handle discrete (D), Gaussian (G) or conditional Gaussian (CG) models. Sampling algorithms can essentially handle any kind of node (distribution). Other algorithms make more restrictive assumptions in exchange for speed.
Finally, most algorithms are designed to give the exact answer. The belief propagation algorithms are exact if applied to trees, and in some other cases. Sampling is considered approximate, even though, in the limit of an infinite number of samples, it gives the exact answer.
Here is a summary of the properties of all the engines in BNT which work on static networks.

Name                   Exact?   Node type?   Topology
belprop                approx   D            DAG
belprop_fg             approx   D            factor graph
cond_gauss             exact    CG           DAG
enumerative            exact    D            DAG
gaussian               exact    G            DAG
gibbs                  approx   D            DAG
global_joint           exact    D,G,CG       DAG
jtree                  exact    D,G,CG       DAG
likelihood_weighting   approx   any          DAG
pearl                  approx   D,G          DAG
pearl                  exact    D,G          polytree
quickscore             exact    noisy-or     QMR
stab_cond_gauss        exact    CG           DAG
var_elim               exact    D,G,CG       DAG

Influence diagrams / decision making
BNT implements an exact algorithm for solving LIMIDs (limited memory influence diagrams), described in
S. L. Lauritzen and D. Nilsson. Representing and solving decision problems with limited information. Management Science, 47, 1238-1251, September 2001.
LIMIDs explicitly show all information arcs, rather than implicitly assuming no forgetting. This allows them to model forgetful controllers.
See the examples in BNT/examples/limids for details.

DBNs, HMMs, Kalman filters and all that
Click here for documentation about how to use BNT for dynamical systems and sequence data.
