densities $m$ and $m'$, and denote by
$$H(m) = H^{\times}(m, m) = \int_{\mathcal{X}} m(x) \log \frac{1}{m(x)} \, \mathrm{d}x = -\int_{\mathcal{X}} m(x) \log m(x) \, \mathrm{d}x$$
the Shannon entropy [4]. Then the Kullback–Leibler divergence between $m$ and $m'$ is given by:
$$\mathrm{KL}(m : m') = H^{\times}(m, m') - H(m) = \int_{\mathcal{X}} m(x) \log \frac{m(x)}{m'(x)} \, \mathrm{d}x \geq 0. \tag{1}$$
The notation ":" is used instead of the usual comma "," notation to emphasize that the distance is not a metric distance: it is neither symmetric ($\mathrm{KL}(m : m') \neq \mathrm{KL}(m' : m)$), nor does it satisfy the triangular inequality [4] of metric distances ($\mathrm{KL}(m : m') + \mathrm{KL}(m' : m'') \geq \mathrm{KL}(m : m'')$ fails in general). When the natural base of the logarithm is chosen, we get a differential entropy measure expressed in nat units. Alternatively, we can also use the base-2 logarithm ($\log_2 x = \frac{\log x}{\log 2}$) and get the entropy expressed in bit units. Although the KL divergence is available in closed form for many distributions (in particular as equivalent Bregman divergences for exponential families [5], see Appendix C), it was proven that the Kullback–Leibler divergence between two (univariate) GMMs is not analytic [6] (see also the particular case of a GMM of two components with the same variance that was analyzed in [7]). See Appendix A for an analysis. Note that the differential entropy may be negative. For example, the differential entropy of a univariate Gaussian distribution is $\log(\sigma \sqrt{2\pi e})$, and is therefore negative when the standard deviation $\sigma < \frac{1}{\sqrt{2\pi e}} \approx 0.242$. We consider continuous distributions with well-defined entropies (entropy may be undefined for singular distributions like Cantor's distribution [8]).
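To make the sign of the differential entropy concrete, the following minimal Python sketch (illustrative, not part of the original text; the helper names are ours) compares the closed-form Gaussian entropy $\log(\sigma\sqrt{2\pi e})$ with a direct numerical integration of $-\int m(x) \log m(x) \, \mathrm{d}x$:

```python
import numpy as np
from scipy.integrate import quad

# Minimal sketch (assumed helpers, not from the paper): check that the
# differential entropy of a univariate Gaussian, log(sigma * sqrt(2*pi*e)),
# turns negative once sigma < 1/sqrt(2*pi*e) ~ 0.242 (entropy in nats).

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def differential_entropy(mu, sigma):
    # H(m) = -int m(x) log m(x) dx, integrated numerically over +/- 20 sigma.
    integrand = lambda x: -gaussian_pdf(x, mu, sigma) * np.log(gaussian_pdf(x, mu, sigma))
    value, _ = quad(integrand, mu - 20 * sigma, mu + 20 * sigma)
    return value

for sigma in (1.0, 0.242, 0.1):
    closed_form = np.log(sigma * np.sqrt(2 * np.pi * np.e))
    numeric = differential_entropy(0.0, sigma)
    print(f"sigma={sigma}: closed-form {closed_form:+.4f} nats, numeric {numeric:+.4f}")
```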
1.1. Prior Work
Many approximation techniques have been designed to beat the computationally intensive Monte Carlo (MC) stochastic estimation: $\widehat{\mathrm{KL}}_s(m : m') = \frac{1}{s} \sum_{i=1}^{s} \log \frac{m(x_i)}{m'(x_i)}$ with $x_1, \ldots, x_s \sim m(x)$ ($s$ independent and identically distributed (i.i.d.) samples). The MC estimator is asymptotically consistent, $\lim_{s \to \infty} \widehat{\mathrm{KL}}_s(m : m') = \mathrm{KL}(m : m')$, so that the "true value" of the KL of mixtures is estimated in practice by taking a very large sample (say, $s = 10^9$). However, we point out that the MC estimator outputs a stochastic approximation, and therefore does not guarantee deterministic bounds (confidence intervals may be used). Deterministic lower and upper bounds of the integral can be obtained by various numerical integration techniques using quadrature rules. We refer to [9–12] for the current state-of-the-art approximation techniques and bounds on the KL of GMMs. The latest work for computing the entropy of GMMs is [13]. It considers arbitrarily finely tuned bounds of the entropy of isotropic Gaussian mixtures (a case encountered when dealing with KDEs, kernel density estimators). However, there is a catch in the technique of [13]: it relies on solving for the unique roots of some log-sum-exp equations (see Theorem 1 of [13], p. 3342) that do not admit a closed-form solution. Thus it is a hybrid method that contrasts with our combinatorial approach. Bounds of the KL divergence between mixture models can be generalized to bounds of the likelihood function of mixture models [14], because the log-likelihood is just the KL between the empirical distribution and the mixture model, up to a constant shift.
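To illustrate this baseline, here is a minimal Python sketch (our own illustration; the function names are hypothetical) of the MC estimator $\widehat{\mathrm{KL}}_s$ for two univariate GMMs. It returns a stochastic estimate, not a deterministic bound:

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal sketch (illustrative) of the Monte Carlo estimator
# KL_s(m : m') = (1/s) * sum_i log(m(x_i) / m'(x_i)), with x_i ~ m,
# for univariate GMMs given as triples (weights, means, sigmas).

def gmm_pdf(x, weights, means, sigmas):
    # Density of a univariate GMM, evaluated at an array of points x.
    x = np.asarray(x)[..., None]
    comps = np.exp(-0.5 * ((x - means) / sigmas) ** 2) / (sigmas * np.sqrt(2 * np.pi))
    return comps @ weights

def gmm_sample(s, weights, means, sigmas):
    # Ancestral sampling: pick a component index, then draw from its Gaussian.
    idx = rng.choice(len(weights), size=s, p=weights)
    return rng.normal(means[idx], sigmas[idx])

def kl_mc(s, m, m_prime):
    # Consistent as s -> infinity, but only gives confidence intervals.
    x = gmm_sample(s, *m)
    return np.mean(np.log(gmm_pdf(x, *m) / gmm_pdf(x, *m_prime)))

m = (np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([0.5, 0.5]))
m_prime = (np.array([0.3, 0.7]), np.array([-0.5, 1.5]), np.array([0.7, 0.4]))
print(kl_mc(10**6, m, m_prime))  # a larger s tightens the stochastic estimate
```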
In information geometry [15], a mixture family of linearly independent probability distributions $p_1(x), \ldots, p_k(x)$ is defined by the convex combination of those non-parametric component distributions: $m(x; \eta) = \sum_{i=1}^{k} \eta_i p_i(x)$ with $\eta_i > 0$ and $\sum_{i=1}^{k} \eta_i = 1$. A mixture family induces a dually flat space where the Kullback–Leibler divergence is equivalent to a Bregman divergence [5,15] defined on the $\eta$-parameters. However, in that case, the Bregman convex generator $F(\eta) = \int m(x; \eta) \log m(x; \eta) \, \mathrm{d}x$ (the Shannon information) is not available in closed form, except for the family of multinomial distributions, which is both a mixture family (with closed-form $\mathrm{KL}(m : m') = \sum_{i=1}^{k} m_i \log \frac{m_i}{m'_i}$, the discrete KL [4]) and an exponential family [15].
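For the multinomial exception just mentioned, the closed-form discrete KL is immediate to evaluate; a minimal sketch, assuming strictly positive probability vectors:

```python
import numpy as np

# Minimal sketch (illustrative): for multinomials, the KL divergence is the
# closed-form discrete KL, sum_i m_i * log(m_i / m'_i), assuming m_i, m'_i > 0.

def discrete_kl(m, m_prime):
    m, m_prime = np.asarray(m, float), np.asarray(m_prime, float)
    return float(np.sum(m * np.log(m / m_prime)))

print(discrete_kl([0.2, 0.3, 0.5], [0.4, 0.4, 0.2]))  # >= 0, zero iff m == m'
```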
1.2. Contributions
In this work, we present a simple and efficient method that builds algorithmically a closed-form formula that guarantees both deterministic lower and upper bounds on the KL divergence within an