We require that the Jeffreys divergence between mixtures be finite in order to approximate the KL between mixtures by a Bregman divergence. We loosely derive this observation (careful derivations will be reported elsewhere) using two different approaches:
• First, continuous mixture distributions have smooth densities that can be arbitrarily closely approximated using a single distribution (potentially multi-modal) belonging to the Polynomial Exponential Families [53,54] (PEFs). A polynomial exponential family of order $D$ has log-likelihood $l(x;\theta)\propto\sum_{i=1}^{D}\theta_i x^i$: therefore, a PEF is an exponential family with polynomial sufficient statistics $t(x)=(x,x^2,\ldots,x^D)$. However, the log-normalizer $F_D(\theta)=\log\int\exp(\theta^\top t(x))\,\mathrm{d}x$ of a $D$-order PEF is not available in closed form: it is computationally intractable. Nevertheless, the KL between two mixtures $m(x)$ and $m'(x)$ can theoretically be closely approximated by a Bregman divergence between the two corresponding PEFs: $\mathrm{KL}(m(x):m'(x))\approx\mathrm{KL}(p(x;\theta):p(x;\theta'))=B_{F_D}(\theta':\theta)$, where $B_F(\theta:\theta')=F(\theta)-F(\theta')-(\theta-\theta')^\top\nabla F(\theta')$ denotes the Bregman divergence induced by a strictly convex and differentiable generator $F$, and $\theta$ and $\theta'$ are the natural parameters of the PEF family $\{p(x;\theta)\}$ approximating $m(x)$ and $m'(x)$, respectively (i.e., $m(x)\approx p(x;\theta)$ and $m'(x)\approx p(x;\theta')$). Notice that the Bregman divergence of PEFs necessarily has a finite value, but the KL of two smooth mixtures can potentially diverge (infinite value), hence the condition that the Jeffreys divergence be finite (see the first numerical sketch after this list).
• Second, consider two finite mixtures $m(x)=\sum_{i=1}^{k} w_i p_i(x)$ and $m'(x)=\sum_{j=1}^{k'} w'_j p'_j(x)$ of $k$ and $k'$ components (possibly with heterogeneous components $p_i(x)$'s and $p'_j(x)$'s), respectively. In statistics, a mixture is understood as a convex combination of parametric components, while in information geometry a mixture family is the set of convex combinations of fixed component densities. Let us consider the mixture family $\{g(x;(w,w'))\}$ generated by the $D=k+k'$ fixed components $p_1(x),\ldots,p_k(x),p'_1(x),\ldots,p'_{k'}(x)$:
$$\left\{ g(x;(w,w'))=\sum_{i=1}^{k} w_i p_i(x)+\sum_{j=1}^{k'} w'_j p'_j(x) \ : \ \sum_{i=1}^{k} w_i+\sum_{j=1}^{k'} w'_j=1 \right\}$$
We can approximate mixture $m(x)$ arbitrarily finely (with respect to total variation) for any $\epsilon>0$ by $g(x;\alpha)=(1-\epsilon)m(x)+\epsilon m'(x)$ with $\alpha=((1-\epsilon)w,\epsilon w')$ (so that $\sum_{i=1}^{k+k'}\alpha_i=1$), and $m'(x)$ by $g(x;\alpha')=\epsilon m(x)+(1-\epsilon)m'(x)$ with $\alpha'=(\epsilon w,(1-\epsilon)w')$ (and $\sum_{i=1}^{k+k'}\alpha'_i=1$). Therefore $\mathrm{KL}(m(x):m'(x))\approx\mathrm{KL}(g(x;\alpha):g(x;\alpha'))=B_{F^*}(\alpha:\alpha')$, where $F^*(\alpha)=\int g(x;\alpha)\log g(x;\alpha)\,\mathrm{d}x$ is the Shannon information (negative Shannon entropy) of the composite mixture family. Again, the Bregman divergence $B_{F^*}(\alpha:\alpha')$ is necessarily finite, but $\mathrm{KL}(m(x):m'(x))$ between mixtures may potentially be infinite when the KL integral diverges (hence, the condition on Jeffreys divergence finiteness). Interestingly, this Shannon information can be arbitrarily closely approximated when considering isotropic Gaussians [13]. Notice that the convex conjugate $F(\theta)$ of the continuous Shannon neg-entropy $F^*(\eta)$ is the log-sum-exp function on the inverse soft map (see the second sketch after this list).
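
To make the first approach concrete, here is a minimal Python sketch (our illustration, not part of the original derivation): since $F_D$ admits no closed form, the log-normalizer is approximated by numerical quadrature on a truncated support, its gradient by central finite differences, and the Bregman divergence $B_{F_D}(\theta':\theta)$ then yields $\mathrm{KL}(p(x;\theta):p(x;\theta'))$. The order-4 parameter values and the integration bounds are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import quad

def log_normalizer(theta, lo=-10.0, hi=10.0):
    """F_D(theta) = log int exp(sum_i theta_i x^i) dx, by quadrature on a
    truncated domain (no closed form is available for a PEF)."""
    D = len(theta)
    integrand = lambda x: np.exp(sum(theta[i] * x ** (i + 1) for i in range(D)))
    val, _ = quad(integrand, lo, hi)
    return np.log(val)

def grad_log_normalizer(theta, h=1e-5):
    """Central finite differences; grad F_D(theta) = E_theta[t(x)]."""
    g = np.zeros(len(theta))
    for i in range(len(theta)):
        e = np.zeros(len(theta)); e[i] = h
        g[i] = (log_normalizer(theta + e) - log_normalizer(theta - e)) / (2 * h)
    return g

def bregman_FD(theta_p, theta):
    """B_{F_D}(theta' : theta) = F(theta') - F(theta) - <theta' - theta, grad F(theta)>."""
    return (log_normalizer(theta_p) - log_normalizer(theta)
            - np.dot(theta_p - theta, grad_log_normalizer(theta)))

# Two hypothetical order-4 PEFs (negative leading coefficient ensures integrability):
theta   = np.array([0.0, -0.5, 0.00, -0.10])
theta_p = np.array([0.3, -0.6, 0.05, -0.12])
# KL(p(x; theta) : p(x; theta')) equals the Bregman divergence B_{F_D}(theta' : theta):
print(bregman_FD(theta_p, theta))
```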
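Similarly, a minimal sketch of the second approach, assuming two hypothetical univariate Gaussian mixtures and $\epsilon=10^{-2}$: the composite weights $\alpha$ and $\alpha'$ are formed as above, the neg-entropy generator $F^*$ is evaluated by quadrature (with the last weight eliminated so that the free parameters range over the simplex), and $B_{F^*}(\alpha:\alpha')$ approximates $\mathrm{KL}(m:m')$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Hypothetical fixed components: m uses the first k = 2, m' the last k' = 2.
comps = [norm(-2, 1), norm(2, 1), norm(0, 2), norm(4, 1)]
w, w_p = np.array([0.6, 0.4]), np.array([0.3, 0.7])

def g(x, alpha):
    """Composite mixture over the D = k + k' fixed components."""
    return sum(a * c.pdf(x) for a, c in zip(alpha, comps))

def F_star(free):
    """F*(alpha) = int g log g dx (Shannon neg-entropy); the last weight is
    1 - sum(free) so that alpha stays on the probability simplex."""
    alpha = np.append(free, 1.0 - free.sum())
    val, _ = quad(lambda x: g(x, alpha) * np.log(g(x, alpha)), -15, 15)
    return val

def grad_F_star(free, h=1e-6):
    """Central finite differences on the D - 1 free coordinates."""
    out = np.zeros(len(free))
    for i in range(len(free)):
        e = np.zeros(len(free)); e[i] = h
        out[i] = (F_star(free + e) - F_star(free - e)) / (2 * h)
    return out

eps = 1e-2
alpha   = np.concatenate(((1 - eps) * w, eps * w_p))    # g(.; alpha)  ~= m
alpha_p = np.concatenate((eps * w, (1 - eps) * w_p))    # g(.; alpha') ~= m'
a, a_p = alpha[:-1], alpha_p[:-1]                       # free simplex coordinates

# KL(m : m') ~= B_{F*}(alpha : alpha'):
print(F_star(a) - F_star(a_p) - np.dot(a - a_p, grad_F_star(a_p)))
```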