We require that the Jeffreys divergence between mixtures be finite in order to approximate the KL between mixtures by a Bregman divergence. We loosely derive this observation (careful derivations will be reported elsewhere) using two different approaches:
• First, continuous mixture distributions have smooth densities that can be arbitrarily closely approximated using a single distribution (potentially multi-modal) belonging to the Polynomial Exponential Families [53,54] (PEFs). A polynomial exponential family of order $D$ has log-likelihood $l(x;\theta)\propto\sum_{i=1}^{D}\theta_i x^i$: therefore, a PEF is an exponential family with polynomial sufficient statistics $t(x)=(x,x^2,\ldots,x^D)$. However, the log-normalizer $F_D(\theta)=\log\int\exp(\theta^\top t(x))\,\mathrm{d}x$ of a $D$-order PEF is not available in closed form: it is computationally intractable. Nevertheless, the KL between two mixtures $m(x)$ and $m'(x)$ can theoretically be closely approximated by a Bregman divergence between the two corresponding PEFs: $\mathrm{KL}(m(x):m'(x))\approx\mathrm{KL}(p(x;\theta):p(x;\theta'))=B_{F_D}(\theta':\theta)$, where $B_F(\theta:\theta')=F(\theta)-F(\theta')-(\theta-\theta')^\top\nabla F(\theta')$ denotes the Bregman divergence induced by a strictly convex and differentiable generator $F$, and $\theta$ and $\theta'$ are the natural parameters of the PEF family $\{p(x;\theta)\}$ approximating $m(x)$ and $m'(x)$, respectively (i.e., $m(x)\approx p(x;\theta)$ and $m'(x)\approx p(x;\theta')$). Notice that the Bregman divergence of PEFs necessarily has a finite value, but the KL of two smooth mixtures can potentially diverge (infinite value), hence the condition that the Jeffreys divergence be finite (see the first numerical sketch after this list).
• Second, consider two finite mixtures $m(x)=\sum_{i=1}^{k} w_i p_i(x)$ and $m'(x)=\sum_{j=1}^{k'} w'_j p'_j(x)$ of $k$ and $k'$ components (possibly with heterogeneous components $p_i(x)$'s and $p'_j(x)$'s), respectively. In statistics, a mixture is understood as a convex combination of parametric components, while in information geometry a mixture family is the set of convex combinations of fixed component densities. Let us consider the mixture family $\{g(x;(w,w'))\}$ generated by the $D=k+k'$ fixed components $p_1(x),\ldots,p_k(x),p'_1(x),\ldots,p'_{k'}(x)$:
$$\left\{ g(x;(w,w'))=\sum_{i=1}^{k} w_i p_i(x)+\sum_{j=1}^{k'} w'_j p'_j(x) \ : \ \sum_{i=1}^{k} w_i+\sum_{j=1}^{k'} w'_j=1 \right\}$$
We can approximate mixture $m(x)$ arbitrarily finely (with respect to total variation) for any $\epsilon>0$ by $g(x;\alpha)=(1-\epsilon)m(x)+\epsilon m'(x)$ with $\alpha=((1-\epsilon)w,\epsilon w')$ (so that $\sum_{i=1}^{k+k'}\alpha_i=1$), and $m'(x)$ by $g(x;\alpha')=\epsilon m(x)+(1-\epsilon)m'(x)$ with $\alpha'=(\epsilon w,(1-\epsilon)w')$ (and $\sum_{i=1}^{k+k'}\alpha'_i=1$). Therefore $\mathrm{KL}(m(x):m'(x))\approx\mathrm{KL}(g(x;\alpha):g(x;\alpha'))=B_{F^*}(\alpha:\alpha')$, where $F^*(\alpha)=\int g(x;\alpha)\log g(x;\alpha)\,\mathrm{d}x$ is the Shannon information (negative Shannon entropy) of the composite mixture family. Again, the Bregman divergence $B_{F^*}(\alpha:\alpha')$ is necessarily finite, but $\mathrm{KL}(m(x):m'(x))$ between mixtures may potentially be infinite when the KL integral diverges (hence, the condition on Jeffreys divergence finiteness). Interestingly, this Shannon information can be arbitrarily closely approximated when considering isotropic Gaussians [13]. Notice that the convex conjugate $F(\theta)$ of the continuous Shannon neg-entropy $F^*(\eta)$ is the log-sum-exp function on the inverse soft map (see the second sketch after this list).
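
To make the first approach concrete, here is a minimal Python sketch (our illustration, not part of the original derivation): since $F_D$ admits no closed form, the log-normalizer is approximated by numerical quadrature on a truncated support, its gradient by central finite differences, and the Bregman divergence $B_{F_D}(\theta':\theta)$ then yields $\mathrm{KL}(p(x;\theta):p(x;\theta'))$. The order-4 parameter values and the integration bounds are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import quad

def log_normalizer(theta, lo=-10.0, hi=10.0):
    """F_D(theta) = log int exp(sum_i theta_i x^i) dx, by quadrature on a
    truncated domain (no closed form is available for a PEF)."""
    D = len(theta)
    integrand = lambda x: np.exp(sum(theta[i] * x ** (i + 1) for i in range(D)))
    val, _ = quad(integrand, lo, hi)
    return np.log(val)

def grad_log_normalizer(theta, h=1e-5):
    """Central finite differences; grad F_D(theta) = E_theta[t(x)]."""
    g = np.zeros(len(theta))
    for i in range(len(theta)):
        e = np.zeros(len(theta)); e[i] = h
        g[i] = (log_normalizer(theta + e) - log_normalizer(theta - e)) / (2 * h)
    return g

def bregman_FD(theta_p, theta):
    """B_{F_D}(theta' : theta) = F(theta') - F(theta) - <theta' - theta, grad F(theta)>."""
    return (log_normalizer(theta_p) - log_normalizer(theta)
            - np.dot(theta_p - theta, grad_log_normalizer(theta)))

# Two hypothetical order-4 PEFs (negative leading coefficient ensures integrability):
theta   = np.array([0.0, -0.5, 0.00, -0.10])
theta_p = np.array([0.3, -0.6, 0.05, -0.12])
# KL(p(x; theta) : p(x; theta')) equals the Bregman divergence B_{F_D}(theta' : theta):
print(bregman_FD(theta_p, theta))
```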
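Similarly, a minimal sketch of the second approach, assuming two hypothetical univariate Gaussian mixtures and $\epsilon=10^{-2}$: the composite weights $\alpha$ and $\alpha'$ are formed as above, the neg-entropy generator $F^*$ is evaluated by quadrature (with the last weight eliminated so that the free parameters range over the simplex), and $B_{F^*}(\alpha:\alpha')$ approximates $\mathrm{KL}(m:m')$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Hypothetical fixed components: m uses the first k = 2, m' the last k' = 2.
comps = [norm(-2, 1), norm(2, 1), norm(0, 2), norm(4, 1)]
w, w_p = np.array([0.6, 0.4]), np.array([0.3, 0.7])

def g(x, alpha):
    """Composite mixture over the D = k + k' fixed components."""
    return sum(a * c.pdf(x) for a, c in zip(alpha, comps))

def F_star(free):
    """F*(alpha) = int g log g dx (Shannon neg-entropy); the last weight is
    1 - sum(free) so that alpha stays on the probability simplex."""
    alpha = np.append(free, 1.0 - free.sum())
    val, _ = quad(lambda x: g(x, alpha) * np.log(g(x, alpha)), -15, 15)
    return val

def grad_F_star(free, h=1e-6):
    """Central finite differences on the D - 1 free coordinates."""
    out = np.zeros(len(free))
    for i in range(len(free)):
        e = np.zeros(len(free)); e[i] = h
        out[i] = (F_star(free + e) - F_star(free - e)) / (2 * h)
    return out

eps = 1e-2
alpha   = np.concatenate(((1 - eps) * w, eps * w_p))    # g(.; alpha)  ~= m
alpha_p = np.concatenate((eps * w, (1 - eps) * w_p))    # g(.; alpha') ~= m'
a, a_p = alpha[:-1], alpha_p[:-1]                       # free simplex coordinates

# KL(m : m') ~= B_{F*}(alpha : alpha'):
print(F_star(a) - F_star(a_p) - np.dot(a - a_p, grad_F_star(a_p)))
```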