Page 374 of Differential Geometrical Theory of Statistics
Entropy 2016, 18, 98
Section 4.2 considers the problem of order selection for mixtures of Riemannian Laplace
distributions. Precisely, this consists of finding the number $M$ of mixture components in Equation (27)
that gives the best representation of a given set of data $Y_1, \ldots, Y_N$. This problem is solved by
computing the BIC criterion, which is found here in explicit form for the case of mixtures of Riemannian
Laplace distributions on $\mathcal{P}_m$.
4.1. Estimation of the Mixture Parameters
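In this section, $Y_1, \ldots, Y_N$ are i.i.d. samples from Equation (27). Based on these observations, an
EM algorithm is proposed to estimate $(\varpi_\mu, \bar{Y}_\mu, \sigma_\mu)_{1 \le \mu \le M}$, where $\varpi_\mu$ denotes the
mixture weights. The derivation of this algorithm can be carried out similarly to [15].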
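To explain how this algorithm works, define for all $\vartheta = \{(\varpi_\mu, \bar{Y}_\mu, \sigma_\mu)\}$,
$$\omega_\mu(Y_n, \vartheta) = \frac{\varpi_\mu \, p(Y_n \mid \bar{Y}_\mu, \sigma_\mu)}{\sum_{s=1}^{M} \varpi_s \, p(Y_n \mid \bar{Y}_s, \sigma_s)}, \qquad N_\mu(\vartheta) = \sum_{n=1}^{N} \omega_\mu(Y_n, \vartheta) \tag{28}$$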
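As an illustration of Equation (28), the following is a minimal sketch that computes the quantities $\omega_\mu(Y_n, \vartheta)$ and $N_\mu(\vartheta)$ from precomputed log-densities. The function name `responsibilities` and the input layout (one row of $M$ log-densities per observation) are assumptions made for this example; evaluating the Riemannian Laplace density itself is left to the caller.

```python
import math

def responsibilities(weights, log_dens):
    """Return (omega, n_mu) following Equation (28).

    weights  : list of M mixture weights (positive, summing to 1)
    log_dens : N rows of M values, log p(Y_n | Ybar_mu, sigma_mu)
               (hypothetical input layout, computed elsewhere)
    """
    omega = []
    for row in log_dens:
        # Log of each numerator: weight_mu * p(Y_n | Ybar_mu, sigma_mu)
        log_num = [math.log(w) + lp for w, lp in zip(weights, row)]
        # Subtracting the maximum avoids underflow and cancels in the ratio
        shift = max(log_num)
        num = [math.exp(v - shift) for v in log_num]
        total = sum(num)
        omega.append([v / total for v in num])
    # N_mu is the sum of omega_mu(Y_n) over all observations n
    n_mu = [sum(col) for col in zip(*omega)]
    return omega, n_mu

# Toy example with N = 2 observations and M = 2 components
omega, n_mu = responsibilities(
    [0.5, 0.5],
    [[math.log(0.2), math.log(0.6)], [math.log(0.3), math.log(0.3)]],
)
```

Working with log-densities is a design choice of this sketch: the normalizing factor of the density can be very small, and the ratio in Equation (28) is unchanged by the shift.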
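The algorithm iteratively updates $\hat{\vartheta} = \{(\hat{\varpi}_\mu, \hat{Y}_\mu, \hat{\sigma}_\mu)\}$, which is an approximation of the maximum
likelihood estimate of the mixture parameters $\vartheta = (\varpi_\mu, \bar{Y}_\mu, \sigma_\mu)$, as follows.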
• Update for $\hat{\varpi}_\mu$: Based on the current value of $\hat{\vartheta}$, assign to $\hat{\varpi}_\mu$ the new value $\hat{\varpi}_\mu = N_\mu(\hat{\vartheta}) / N$.
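• Update for $\hat{Y}_\mu$: Based on the current value of $\hat{\vartheta}$, assign to $\hat{Y}_\mu$ the value:
$$\hat{Y}_\mu = \operatorname{argmin}_{Y} \sum_{n=1}^{N} \omega_\mu(Y_n, \hat{\vartheta}) \, d(Y, Y_n) \tag{29}$$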
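• Update for $\hat{\sigma}_\mu$: Based on the current value of $\hat{\vartheta}$, assign to $\hat{\sigma}_\mu$ the new value:
$$\hat{\sigma}_\mu = \Phi\!\left( N_\mu^{-1}(\hat{\vartheta}) \times \sum_{n=1}^{N} \omega_\mu(Y_n, \hat{\vartheta}) \, d(\hat{Y}_\mu, Y_n) \right) \tag{30}$$
where the function $\Phi$ is defined in Proposition 1.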
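The weight and scale updates can be sketched as follows. This is only an illustration under stated assumptions: the function name `update_weights_and_scales` is hypothetical, the function $\Phi$ of Proposition 1 is not reproduced on this page and is therefore passed in as a callable `phi`, and the distances are assumed to have been computed from the already updated centers, in line with Equations (29) and (30).

```python
def update_weights_and_scales(omega, dists, phi):
    """Weight update and scale update of Equation (30) for all components.

    omega : N rows of M responsibilities omega_mu(Y_n, theta_hat)
    dists : N rows of M distances d(Yhat_mu, Y_n), computed from the
            already updated centers Yhat_mu
    phi   : callable standing in for the function Phi of Proposition 1
            (assumption: supplied by the caller)
    """
    n_obs = len(omega)
    # N_mu = sum over n of omega_mu(Y_n)
    n_mu = [sum(col) for col in zip(*omega)]
    # New weight of component mu: N_mu / N
    new_weights = [v / n_obs for v in n_mu]
    new_sigmas = []
    for mu, total in enumerate(n_mu):
        # Weighted mean distance to the center of component mu
        weighted = sum(omega[n][mu] * dists[n][mu] for n in range(n_obs))
        new_sigmas.append(phi(weighted / total))
    return new_weights, new_sigmas

# Toy call; the identity is only a placeholder for Phi
new_weights, new_sigmas = update_weights_and_scales(
    [[0.25, 0.75], [0.5, 0.5]],
    [[1.0, 2.0], [3.0, 4.0]],
    phi=lambda x: x,
)
```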
These three update rules should be performed in the above order. Realization of the update rules
for $\hat{\varpi}_\mu$ and $\hat{\sigma}_\mu$ is straightforward. The update rule for $\hat{Y}_\mu$ is realized using a slight modification of the
sub-gradient descent algorithm described in Section 3.2. More precisely, the only change is that the factor $1/N$
appearing in Equation (22) is replaced with $\omega_\mu(Y_n, \hat{\vartheta})$ at each iteration.
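The weighted minimization in Equation (29) can be sketched with a generic sub-gradient descent. This is not the algorithm of Section 3.2, which is not reproduced on this page: the step size rule `step0 / (k + 1)`, the tracking of the best iterate, and the callables `dist`, `log_map`, and `exp_map` (Riemannian distance, logarithm, and exponential) are assumptions of this example. The helper functions for the one-dimensional case of positive numbers with distance $|\log(y/x)|$ are likewise only illustrative.

```python
import math

def weighted_median(points, weights, dist, log_map, exp_map, start,
                    step0=1.0, n_iter=500):
    """Approximate argmin_Y sum_n weights[n] * dist(Y, points[n])."""
    def cost(y):
        return sum(w * dist(y, p) for w, p in zip(weights, points))

    y, best, best_cost = start, start, cost(start)
    for k in range(n_iter):
        # Weighted sum of unit tangent vectors pointing from y to each point
        direction = 0.0
        for w, p in zip(weights, points):
            d = dist(y, p)
            if d > 1e-12:  # the distance is not differentiable at y = p
                direction = direction + w * log_map(y, p) / d
        # Move along the geodesic with a diminishing step size
        y = exp_map(y, (step0 / (k + 1)) * direction)
        c = cost(y)
        if c < best_cost:  # sub-gradient steps need not decrease the cost
            best, best_cost = y, c
    return best

# Illustrative helpers for positive numbers with distance |log(y / x)|
def dist_p1(x, y):
    return abs(math.log(y / x))

def log_p1(x, y):
    return x * math.log(y / x)

def exp_p1(x, v):
    return x * math.exp(v / x)

median = weighted_median([1.0, 2.0, 8.0], [0.2, 0.5, 0.3],
                         dist_p1, log_p1, exp_p1, start=1.0)
```

In this toy case the weighted median is the point 2.0, since it carries half of the total weight and lies between the other two points. For matrix-valued data the same function could be called with matrix versions of the three callables.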
In practice, the initial conditions $(\hat{\varpi}_{\mu 0}, \hat{Y}_{\mu 0}, \hat{\sigma}_{\mu 0})$ in this algorithm were chosen in the following
way. The weights $(\hat{\varpi}_{\mu 0})$ are uniform and equal to $1/M$; $(\hat{Y}_{\mu 0})$ are $M$ different observations from
the set $\{Y_1, \ldots, Y_N\}$ chosen randomly; and $(\hat{\sigma}_{\mu 0})$ is computed from $(\hat{\varpi}_{\mu 0})$ and $(\hat{Y}_{\mu 0})$ according to
the rule in Equation (30). Since the convergence of the algorithm depends on the initial conditions,
the EM algorithm is run several times, and the best result is retained, i.e., the one maximizing the
log-likelihood function.
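The initialization and restart strategy described above can be sketched as follows. The names `best_of_restarts` and `run_em` are hypothetical: `run_em` stands for any EM routine that takes the data, the initial weights, and the initial centers, computes the initial scales from Equation (30) internally, and returns the fitted parameters together with the final log-likelihood.

```python
import random

def best_of_restarts(data, n_components, run_em, n_restarts=10, seed=0):
    """Run EM from several random initializations and keep the best fit.

    run_em : hypothetical callable (data, weights, centers) ->
             (params, log_likelihood)
    """
    rng = random.Random(seed)
    best = None
    for _ in range(n_restarts):
        # M different observations drawn without replacement
        centers = rng.sample(data, n_components)
        # Uniform initial weights equal to 1/M
        weights = [1.0 / n_components] * n_components
        params, loglik = run_em(data, weights, centers)
        # Retain the run that maximizes the log-likelihood
        if best is None or loglik > best[1]:
            best = (params, loglik)
    return best
```

The fixed `seed` only makes the sketch reproducible; the number of restarts is a tuning choice that the text does not specify.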
4.2. The Bayesian Information Criterion
The BIC was introduced by Schwarz to find the appropriate dimension of a model that will fit a
given set of observations [16]. Since then, BIC has been used in many Bayesian modeling problems
where priors are hard to set precisely. In large sample settings, the fitted model favored by BIC ideally
corresponds to the candidate model that is a posteriori most probable; i.e., the model that is rendered
most plausible by the data at hand. One of the main features of the BIC is its easy computation, since it
is only based on the empirical log-likelihood function.
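To make the computation concrete, here is a minimal sketch using the common Schwarz form $-2 \log L + k \log N$, where lower values are preferred. The explicit form derived in the paper is not shown on this page, and its sign and scaling convention may differ. The parameter count in `mixture_free_parameters` is an assumption obtained by counting $M - 1$ free weights plus, for each component, the $m(m+1)/2$ free entries of a symmetric $m \times m$ matrix and one scale parameter.

```python
import math

def mixture_free_parameters(n_components, m):
    """Free parameters of an M-component mixture on m x m SPD matrices.

    Assumed count: M - 1 weights, and per component m(m+1)/2 entries
    for the center plus one scale parameter.
    """
    return (n_components - 1) + n_components * (m * (m + 1) // 2 + 1)

def bic(log_likelihood, n_params, n_obs):
    """Common Schwarz form of the BIC (lower is better in this convention)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

# Toy comparison of two candidate orders for N = 1000 observations, m = 2;
# the log-likelihood values are made up for illustration only
score_m2 = bic(-1200.0, mixture_free_parameters(2, 2), 1000)
score_m3 = bic(-1195.0, mixture_free_parameters(3, 2), 1000)
```

In this toy comparison the small gain in log-likelihood from adding a third component does not offset the penalty for its extra parameters, so the two-component model has the lower score.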
- Title: Differential Geometrical Theory of Statistics
- Authors: Frédéric Barbaresco, Frank Nielsen
- Publisher: MDPI
- Place: Basel
- Date: 2017
- Language: English
- License: CC BY-NC-ND 4.0
- ISBN: 978-3-03842-425-3
- Dimensions: 17.0 x 24.4 cm
- Pages: 476
- Keywords: Entropy, Coding Theory, Maximum entropy, Information geometry, Computational Information Geometry, Hessian Geometry, Divergence Geometry, Information topology, Cohomology, Shape Space, Statistical physics, Thermodynamics
- Categories: Natural sciences, Physics