Page 374 in Differential Geometrical Theory of Statistics

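Entropy 2016, 18, 98

Section 4.2 considers the problem of order selection for mixtures of Riemannian Laplace distributions. Precisely, this consists of finding the number $M$ of mixture components in Equation (27) that gives the best representation of a given set of data $Y_1, \ldots, Y_N$. This problem is solved by computing the BIC, which is derived here in explicit form for the case of mixtures of Riemannian Laplace distributions on $\mathcal{P}_m$.

4.1. Estimation of the Mixture Parameters

In this section, $Y_1, \ldots, Y_N$ are i.i.d. samples from Equation (27). Based on these observations, an EM algorithm is proposed to estimate $(\varpi_\mu, \bar{Y}_\mu, \sigma_\mu)_{1 \le \mu \le M}$. The derivation of this algorithm can be carried out similarly to [15]. To explain how this algorithm works, define, for all $\vartheta = \{(\varpi_\mu, \bar{Y}_\mu, \sigma_\mu)\}$,

$$\omega_\mu(Y_n, \vartheta) = \frac{\varpi_\mu \times p(Y_n \mid \bar{Y}_\mu, \sigma_\mu)}{\sum_{s=1}^{M} \varpi_s \times p(Y_n \mid \bar{Y}_s, \sigma_s)}, \qquad N_\mu(\vartheta) = \sum_{n=1}^{N} \omega_\mu(Y_n, \vartheta) \tag{28}$$

The algorithm iteratively updates $\hat{\vartheta} = \{(\hat{\varpi}_\mu, \hat{Y}_\mu, \hat{\sigma}_\mu)\}$, which is an approximation of the maximum likelihood estimate of the mixture parameters $\vartheta = (\varpi_\mu, \bar{Y}_\mu, \sigma_\mu)$, as follows.

• Update for $\hat{\varpi}_\mu$: based on the current value of $\hat{\vartheta}$, assign to $\hat{\varpi}_\mu$ the new value $\hat{\varpi}_\mu = N_\mu(\hat{\vartheta}) / N$.
• Update for $\hat{Y}_\mu$: based on the current value of $\hat{\vartheta}$, assign to $\hat{Y}_\mu$ the value
$$\hat{Y}_\mu = \operatorname*{argmin}_{Y} \sum_{n=1}^{N} \omega_\mu(Y_n, \hat{\vartheta})\, d(Y, Y_n) \tag{29}$$
• Update for $\hat{\sigma}_\mu$: based on the current value of $\hat{\vartheta}$, assign to $\hat{\sigma}_\mu$ the new value
$$\hat{\sigma}_\mu = \Phi\left( N_\mu^{-1}(\hat{\vartheta}) \times \sum_{n=1}^{N} \omega_\mu(Y_n, \hat{\vartheta})\, d(\hat{Y}_\mu, Y_n) \right) \tag{30}$$
where the function $\Phi$ is defined in Proposition 1.

These three update rules should be performed in the above order. Realization of the update rules for $\hat{\varpi}_\mu$ and $\hat{\sigma}_\mu$ is straightforward. The update rule for $\hat{Y}_\mu$ is realized using a slight modification of the sub-gradient descent algorithm described in Section 3.2. More precisely, the only change is that the factor $1/N$ appearing in Equation (22) is replaced with $\omega_\mu(Y_n, \hat{\vartheta})$ at each iteration.

In practice, the initial conditions $(\hat{\varpi}_{\mu 0}, \hat{Y}_{\mu 0}, \hat{\sigma}_{\mu 0})$ of this algorithm were chosen in the following way. The weights $(\hat{\varpi}_{\mu 0})$ are uniform and equal to $1/M$; the $(\hat{Y}_{\mu 0})$ are $M$ different observations chosen randomly from the set $\{Y_1, \ldots, Y_N\}$; and $(\hat{\sigma}_{\mu 0})$ is computed from $(\hat{\varpi}_{\mu 0})$ and $(\hat{Y}_{\mu 0})$ according to the rule in Equation (30).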
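To make the update rules concrete, here is a minimal Python sketch for the scalar case $m = 1$, where $\mathcal{P}_1$ is the set of positive reals. It assumes the distance $d(x, y) = |\log x - \log y|$ and the normalizing constant $\zeta(\sigma) = 2\sigma$, obtained by integrating against the volume element $dy/y$; under these assumptions the weighted maximum-likelihood $\sigma$ is the weighted mean distance, so $\Phi$ is taken to be the identity. Several choices are ours rather than the paper's: the responsibilities of Equation (28) are computed once per iteration; Equation (29) is solved exactly as a weighted median of $\log Y_n$, in place of the sub-gradient descent of Section 3.2; and the initial $\hat{\sigma}_{\mu 0}$ is obtained from Equation (30) by setting $\omega_\mu(Y_n)$ equal to the initial weight $1/M$. All function names are hypothetical.

```python
import math
import random


def log_density(y, center, sigma):
    """Log-density of the assumed Riemannian Laplace law on P_1 (positive reals).

    Assumes d(x, y) = |log x - log y| and zeta(sigma) = 2 * sigma, which holds
    for m = 1 only; the general normalizing constant of the paper is not used.
    """
    return -math.log(2.0 * sigma) - abs(math.log(y) - math.log(center)) / sigma


def weighted_median(values, weights):
    """Return a minimizer u of sum_n weights[n] * |u - values[n]| (lower weighted median)."""
    pairs = sorted(zip(values, weights))
    half = 0.5 * sum(weights)
    acc = 0.0
    for v, w in pairs:
        acc += w
        if acc >= half:
            return v
    return pairs[-1][0]


def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    top = max(xs)
    return top + math.log(sum(math.exp(x - top) for x in xs))


def em_laplace_p1(data, M, n_iter=100, seed=0, floor=1e-9):
    """One EM run for a mixture of M Riemannian Laplace laws on P_1 (hypothetical helper).

    Returns (weights, centers, sigmas, log_likelihood).
    """
    rng = random.Random(seed)
    N = len(data)
    logs = [math.log(y) for y in data]
    # Initialization: uniform weights 1/M and M distinct random observations as centers.
    weights = [1.0 / M] * M
    centers = [data[i] for i in rng.sample(range(N), M)]
    # Eq. (30) at start, reading omega_mu(Y_n) as the initial weight 1/M (assumption):
    # this reduces to the mean distance from each initial center to all observations.
    sigmas = [max(sum(abs(u - math.log(c)) for u in logs) / N, floor) for c in centers]

    for _ in range(n_iter):
        # Eq. (28): responsibilities omega_mu(Y_n, theta_hat), computed once per iteration.
        resp = []
        for y in data:
            terms = [math.log(weights[k]) + log_density(y, centers[k], sigmas[k])
                     for k in range(M)]
            norm = log_sum_exp(terms)
            resp.append([math.exp(t - norm) for t in terms])
        for k in range(M):
            w_k = [resp[n][k] for n in range(N)]
            n_k = max(sum(w_k), floor)  # N_mu(theta_hat), floored to avoid division by zero
            weights[k] = n_k / N  # weight update: N_mu / N
            # Eq. (29) in log coordinates: exact weighted median, not sub-gradient descent.
            u_k = weighted_median(logs, w_k)
            centers[k] = math.exp(u_k)
            # Eq. (30) with Phi = identity (consequence of zeta(sigma) = 2 * sigma for m = 1).
            sigmas[k] = max(sum(w * abs(u - u_k) for w, u in zip(w_k, logs)) / n_k, floor)

    loglik = sum(log_sum_exp([math.log(weights[k]) + log_density(y, centers[k], sigmas[k])
                              for k in range(M)]) for y in data)
    return weights, centers, sigmas, loglik
```

For example, `em_laplace_p1(data, 2, seed=0)` returns the fitted weights, centers, dispersions, and the final log-likelihood; the result depends on the seed through the random choice of initial centers.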
Since the convergence of the algorithm depends on the initial conditions, the EM algorithm is run several times, and the best result is retained, i.e., the one maximizing the log-likelihood function.

4.2. The Bayesian Information Criterion

The BIC was introduced by Schwarz to find the appropriate dimension of a model that will fit a given set of observations [16]. Since then, BIC has been used in many Bayesian modeling problems where priors are hard to set precisely. In large-sample settings, the fitted model favored by BIC ideally corresponds to the candidate model that is a posteriori most probable, i.e., the model rendered most plausible by the data at hand. One of the main features of the BIC is its ease of computation, since it is based only on the empirical log-likelihood function.
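As an illustration of how cheaply the criterion is evaluated once the maximized log-likelihoods are known, here is a minimal Python sketch of Schwarz's criterion in the "smaller is better" form $k \ln N - 2 \ln \hat{L}$. Schwarz's original statement maximizes $\ln \hat{L} - (k/2) \ln N$; multiplying by $-2$ gives the form used here, and both rank candidate models identically. The explicit form of the BIC for this model is not reproduced here, so the parameter count $k = M\,(m(m+1)/2 + 1) + (M - 1)$ is our assumption (real symmetric positive definite centers with $m(m+1)/2$ free entries, one $\sigma$ per component, weights summing to one), and the helper names `bic`, `mixture_param_count`, and `select_order` are hypothetical.

```python
import math


def bic(log_likelihood, n_params, n_samples):
    """Schwarz's criterion in the 'smaller is better' form: k ln N - 2 ln L."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


def mixture_param_count(M, m):
    """Free parameters of an M-component mixture on P_m (assumed count).

    Each center is a real m x m SPD matrix with m(m+1)/2 free entries, each
    component has one dispersion sigma, and the M weights sum to one.
    """
    return M * (m * (m + 1) // 2 + 1) + (M - 1)


def select_order(max_log_likelihoods, m, n_samples):
    """Return the M minimizing BIC, given a dict {M: maximized log-likelihood}."""
    return min(
        max_log_likelihoods,
        key=lambda M: bic(max_log_likelihoods[M], mixture_param_count(M, m), n_samples),
    )


# Usage with illustrative (made-up) log-likelihoods for M = 1, 2, 3 on P_2, N = 100.
print(select_order({1: -250.0, 2: -200.0, 3: -198.0}, m=2, n_samples=100))  # → 2
```

In practice, each maximized log-likelihood would come from the best of several EM runs for that $M$; the small gain from $M = 2$ to $M = 3$ in the example does not offset the five extra parameters.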

Title: Differential Geometrical Theory of Statistics
Authors: Frédéric Barbaresco; Frank Nielsen
Publisher: MDPI
Place: Basel
Date: 2017
Language: English
License: CC BY-NC-ND 4.0
ISBN: 978-3-03842-425-3
Dimensions: 17.0 x 24.4 cm
Pages: 476
Keywords: Entropy, Coding Theory, Maximum entropy, Information geometry, Computational Information Geometry, Hessian Geometry, Divergence Geometry, Information topology, Cohomology, Shape Space, Statistical physics, Thermodynamics
Categories: Natural Sciences, Physics