We require that the Jeffreys divergence between mixtures be finite in order to approximate the KL between mixtures by a Bregman divergence. We loosely derive this observation (careful derivations will be reported elsewhere) using two different approaches:
• First, continuous mixture distributions have smooth densities that can be arbitrarily closely approximated using a single distribution (potentially multi-modal) belonging to the Polynomial Exponential Families [53,54] (PEFs). A polynomial exponential family of order $D$ has log-likelihood $l(x;\theta)\propto\sum_{i=1}^{D}\theta_i x^i$. Therefore, a PEF is an exponential family with polynomial sufficient statistics $t(x)=(x,x^2,\ldots,x^D)$. However, the log-normalizer $F_D(\theta)=\log\int\exp(\theta^\top t(x))\,\mathrm{d}x$ of a $D$-order PEF is not available in closed form: it is computationally intractable. Nevertheless, the KL between two mixtures $m(x)$ and $m'(x)$ can theoretically be closely approximated by a Bregman divergence between the two corresponding PEFs: $\mathrm{KL}(m(x):m'(x))\approx\mathrm{KL}(p(x;\theta):p(x;\theta'))=B_{F_D}(\theta':\theta)$, where $\theta$ and $\theta'$ are the natural parameters of the PEF family $\{p(x;\theta)\}$ approximating $m(x)$ and $m'(x)$, respectively (i.e., $m(x)\approx p(x;\theta)$ and $m'(x)\approx p(x;\theta')$). Notice that the Bregman divergence of PEFs necessarily has a finite value, but the KL of two smooth mixtures can potentially diverge (infinite value); hence the condition that the Jeffreys divergence be finite. A numerical sketch of this KL–Bregman identity is given after this list.
• Second, consider two finite mixtures $m(x)=\sum_{i=1}^{k}w_i p_i(x)$ and $m'(x)=\sum_{j=1}^{k'}w'_j p'_j(x)$ of $k$ and $k'$ components (possibly with heterogeneous components $p_i(x)$'s and $p'_j(x)$'s), respectively. In statistics, a mixture is understood as a convex combination of parametric components, while in information geometry a mixture family is the set of convex combinations of fixed component densities. Let us consider the mixture family $\{g(x;(w,w'))\}$ generated by the $D=k+k'$ fixed components $p_1(x),\ldots,p_k(x),p'_1(x),\ldots,p'_{k'}(x)$:

$$\left\{ g(x;(w,w'))=\sum_{i=1}^{k}w_i p_i(x)+\sum_{j=1}^{k'}w'_j p'_j(x) \;:\; \sum_{i=1}^{k}w_i+\sum_{j=1}^{k'}w'_j=1 \right\}$$
We can approximate mixture $m(x)$ arbitrarily finely (with respect to total variation) for any $\epsilon>0$ by $g(x;\alpha)=(1-\epsilon)m(x)+\epsilon m'(x)$ with $\alpha=((1-\epsilon)w,\epsilon w')$ (so that $\sum_{i=1}^{k+k'}\alpha_i=1$), and $m'(x)$ by $g(x;\alpha')=\epsilon m(x)+(1-\epsilon)m'(x)$ with $\alpha'=(\epsilon w,(1-\epsilon)w')$ (and $\sum_{i=1}^{k+k'}\alpha'_i=1$). Therefore $\mathrm{KL}(m(x):m'(x))\approx\mathrm{KL}(g(x;\alpha):g(x;\alpha'))=B_{F^*}(\alpha:\alpha')$, where $F^*(\alpha)=\int g(x;\alpha)\log g(x;\alpha)\,\mathrm{d}x$ is the Shannon information (negative Shannon entropy) of the composite mixture family. Again, the Bregman divergence $B_{F^*}(\alpha:\alpha')$ is necessarily finite, but $\mathrm{KL}(m(x):m'(x))$ between mixtures may potentially be infinite when the KL integral diverges (hence, the condition on Jeffreys divergence finiteness). Interestingly, this Shannon information can be arbitrarily closely approximated when considering isotropic Gaussians [13]. Notice that the convex conjugate $F(\theta)$ of the continuous Shannon neg-entropy $F^*(\eta)$ is the log-sum-exp function on the inverse soft map. A second numerical sketch checking the identity $B_{F^*}(\alpha:\alpha')=\mathrm{KL}(g(x;\alpha):g(x;\alpha'))$ is given after this list.
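To make the first approach concrete, here is a minimal numerical sketch of the identity $\mathrm{KL}(p(x;\theta):p(x;\theta'))=B_{F_D}(\theta':\theta)$ in the one case where the log-normalizer is available in closed form: the order $D=2$ PEF, which is the Gaussian family in natural coordinates. All parameter values are illustrative choices, not taken from the paper; for $D>2$ the log-normalizer $F_D$ would have to be estimated numerically, as noted above.

```python
import numpy as np

def nat_params(mu, sigma2):
    # Natural parameters of N(mu, sigma2) viewed as an order-2 PEF:
    # p(x; theta) ∝ exp(theta_1 x + theta_2 x^2), with theta_2 < 0.
    return np.array([mu / sigma2, -1.0 / (2.0 * sigma2)])

def F(theta):
    # Log-normalizer F_2(theta); a closed form exists only for D = 2.
    t1, t2 = theta
    return -t1**2 / (4.0 * t2) + 0.5 * np.log(np.pi / -t2)

def grad_F(theta):
    # Gradient of F = moment parameters (E[x], E[x^2]).
    t1, t2 = theta
    return np.array([-t1 / (2.0 * t2), t1**2 / (4.0 * t2**2) - 1.0 / (2.0 * t2)])

def bregman_F(theta_p, theta):
    # Bregman divergence B_F(theta' : theta).
    return F(theta_p) - F(theta) - np.dot(theta_p - theta, grad_F(theta))

def kl_gauss(mu0, s0, mu1, s1):
    # Closed-form KL(N(mu0, s0) : N(mu1, s1)) for checking (s = variance).
    return 0.5 * (np.log(s1 / s0) + (s0 + (mu0 - mu1) ** 2) / s1 - 1.0)

theta, theta_p = nat_params(0.0, 1.0), nat_params(1.5, 2.0)
print(bregman_F(theta_p, theta))     # ≈ 0.6591
print(kl_gauss(0.0, 1.0, 1.5, 2.0))  # same value: KL = B_F with swapped arguments
```

Note the argument swap: the KL from $p(x;\theta)$ to $p(x;\theta')$ corresponds to the Bregman divergence $B_F(\theta':\theta)$.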
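The second approach can be checked numerically as well. The sketch below builds the composite mixture family of two hypothetical two-component Gaussian mixtures, forms the $\epsilon$-perturbed weight vectors $\alpha$ and $\alpha'$, and verifies by quadrature that $B_{F^*}(\alpha:\alpha')$ coincides with $\mathrm{KL}(g(x;\alpha):g(x;\alpha'))$, using the gradient $\partial F^*/\partial\alpha_i = 1+\int p_i(x)\log g(x;\alpha)\,\mathrm{d}x$. The component parameters, the value of $\epsilon$, and the quadrature range are assumptions made for illustration only.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# Two hypothetical two-component Gaussian mixtures (loc, scale = std. dev.).
comps   = [norm(-1.0, 0.5), norm(2.0, 1.0)]  # components p_1, p_2 of m
comps_p = [norm(0.0, 1.0), norm(3.0, 0.7)]   # components p'_1, p'_2 of m'
w, w_p  = np.array([0.3, 0.7]), np.array([0.6, 0.4])

# Composite mixture family over the D = k + k' = 4 fixed components.
all_comps = comps + comps_p

def g(x, alpha):
    return sum(a * c.pdf(x) for a, c in zip(alpha, all_comps))

def F_star(alpha):
    # Shannon information F*(alpha) = ∫ g log g dx, by numerical quadrature.
    return quad(lambda x: g(x, alpha) * np.log(g(x, alpha)), -10, 10)[0]

def grad_F_star(alpha):
    # dF*/dalpha_i = 1 + ∫ p_i(x) log g(x; alpha) dx.
    return np.array([1.0 + quad(lambda x: c.pdf(x) * np.log(g(x, alpha)),
                                -10, 10)[0] for c in all_comps])

def bregman_F_star(alpha, alpha_p):
    # B_{F*}(alpha : alpha'); equals KL(g_alpha : g_alpha') on the mixture family.
    return (F_star(alpha) - F_star(alpha_p)
            - np.dot(alpha - alpha_p, grad_F_star(alpha_p)))

eps = 1e-3  # illustrative choice of the perturbation parameter
alpha   = np.concatenate([(1 - eps) * w, eps * w_p])  # g(.; alpha)  ≈ m
alpha_p = np.concatenate([eps * w, (1 - eps) * w_p])  # g(.; alpha') ≈ m'

kl = quad(lambda x: g(x, alpha) * np.log(g(x, alpha) / g(x, alpha_p)), -10, 10)[0]
print(bregman_F_star(alpha, alpha_p), kl)  # agree up to quadrature error
```

Because both weight vectors sum to one, the constant "+1" terms in the gradient cancel in the Bregman difference, so the full-coordinate formula above is valid on the simplex.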