Page 288 in Differential Geometrical Theory of Statistics
Entropy 2016, 18, 442

… densities $m$ and $m'$, and denote by

$$H(m) = H^{\times}(m, m) = \int_{\mathcal{X}} m(x) \log\frac{1}{m(x)}\,\mathrm{d}x = -\int_{\mathcal{X}} m(x) \log m(x)\,\mathrm{d}x$$

the Shannon entropy [4]. Then the Kullback–Leibler divergence between $m$ and $m'$ is given by:

$$\mathrm{KL}(m : m') = H^{\times}(m, m') - H(m) = \int_{\mathcal{X}} m(x) \log\frac{m(x)}{m'(x)}\,\mathrm{d}x \geq 0. \qquad (1)$$

The notation ":" is used instead of the usual comma "," to emphasize that this distance is not a metric distance, since it is neither symmetric ($\mathrm{KL}(m : m') \neq \mathrm{KL}(m' : m)$) nor does it satisfy the triangle inequality [4] of metric distances ($\mathrm{KL}(m : m') + \mathrm{KL}(m' : m'') \geq \mathrm{KL}(m : m'')$ need not hold). When the natural base of the logarithm is chosen, we get a differential entropy measure expressed in nat units. Alternatively, we can also use the base-2 logarithm ($\log_2 x = \frac{\log x}{\log 2}$) and get the entropy expressed in bit units. Although the KL divergence is available in closed form for many distributions (in particular as equivalent Bregman divergences for exponential families [5], see Appendix C), it was proven that the Kullback–Leibler divergence between two (univariate) GMMs is not analytic [6] (see also the particular case of a GMM of two components with the same variance that was analyzed in [7]). See Appendix A for an analysis. Note that the differential entropy may be negative. For example, the differential entropy of a univariate Gaussian distribution is $\log(\sigma\sqrt{2\pi e})$, and is therefore negative when the standard deviation satisfies $\sigma < \frac{1}{\sqrt{2\pi e}} \approx 0.242$. We consider continuous distributions with well-defined entropies (entropy may be undefined for singular distributions like Cantor's distribution [8]).

1.1. Prior Work

Many approximation techniques have been designed to beat the computationally intensive Monte Carlo (MC) stochastic estimation

$$\widehat{\mathrm{KL}}_s(m : m') = \frac{1}{s} \sum_{i=1}^{s} \log\frac{m(x_i)}{m'(x_i)}, \qquad x_1, \ldots, x_s \sim m(x),$$

where $x_1, \ldots, x_s$ are $s$ independently and identically distributed (i.i.d.) samples. The MC estimator is asymptotically consistent, $\lim_{s \to \infty} \widehat{\mathrm{KL}}_s(m : m') = \mathrm{KL}(m : m')$, so that the "true value" of the KL of mixtures is estimated in practice by taking a very large sample (say, $s = 10^9$). However, we point out that the MC estimator outputs a stochastic approximation, and therefore does not guarantee deterministic bounds (confidence intervals may be used). Deterministic lower and upper bounds of the integral can be obtained by various numerical integration techniques using quadrature rules. We refer to [9–12] for the current state-of-the-art approximation techniques and bounds on the KL of GMMs. The latest work for computing the entropy of GMMs is [13]. It considers arbitrarily finely tuned bounds on the entropy of isotropic Gaussian mixtures (a case encountered when dealing with KDEs, kernel density estimators). However, there is a catch in the technique of [13]: it relies on solving for the unique roots of some log-sum-exp equations (see Theorem 1 of [13], p. 3342) that do not admit a closed-form solution. Thus it is a hybrid method that contrasts with our combinatorial approach. Bounds on the KL divergence between mixture models can be generalized to bounds on the likelihood function of mixture models [14], because the log-likelihood is just the KL between the empirical distribution and the mixture model up to a constant shift.
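To make the estimator $\widehat{\mathrm{KL}}_s$ concrete, here is a minimal Python sketch (not from the paper; it assumes NumPy and SciPy, and the helper names kl_mc, kl_gauss and sample are ours). The Gaussian closed form used as a sanity check is the standard exponential-family result alluded to above; for the two GMMs, only the stochastic estimate is available.

```python
import numpy as np
from scipy import stats

def kl_gauss(mu1, s1, mu2, s2):
    # Standard closed-form KL between two univariate Gaussians (in nats),
    # an instance of the exponential-family/Bregman closed forms cited above.
    return np.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2.0 * s2**2) - 0.5

def kl_mc(sampler, pdf_m, pdf_mp, s=1_000_000, seed=0):
    # \hat{KL}_s(m : m') = (1/s) * sum_i log(m(x_i) / m'(x_i)), with x_i ~ m(x)
    rng = np.random.default_rng(seed)
    x = sampler(s, rng)
    return np.mean(np.log(pdf_m(x) / pdf_mp(x)))

# Sanity check on two Gaussians, where the exact value is known:
est = kl_mc(lambda s, rng: rng.normal(0.0, 1.0, s),
            lambda x: stats.norm.pdf(x, 0.0, 1.0),
            lambda x: stats.norm.pdf(x, 1.0, 2.0))
print(est, kl_gauss(0.0, 1.0, 1.0, 2.0))   # estimate fluctuates around the exact KL

# The same estimator applied to two 2-component GMMs, for which no closed
# form can exist (the KL of GMMs is not analytic, as noted above):
w,  mu,  sd  = np.array([0.4, 0.6]), np.array([-1.0, 2.0]), np.array([0.5, 1.0])
wp, mup, sdp = np.array([0.7, 0.3]), np.array([ 0.0, 3.0]), np.array([1.0, 0.5])
pdf  = lambda x: w  @ stats.norm.pdf(x, mu[:, None],  sd[:, None])
pdfp = lambda x: wp @ stats.norm.pdf(x, mup[:, None], sdp[:, None])
def sample(s, rng):
    k = rng.choice(2, size=s, p=w)   # ancestral sampling: pick component, then draw
    return rng.normal(mu[k], sd[k])
print(kl_mc(sample, pdf, pdfp))      # stochastic estimate; no deterministic bounds
```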
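The quadrature route mentioned above also gives a quick deterministic cross-check of the introduction's claim that the differential entropy $\log(\sigma\sqrt{2\pi e})$ of a univariate Gaussian turns negative for $\sigma < 1/\sqrt{2\pi e} \approx 0.242$. A short sketch, again assuming SciPy:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def gauss_entropy_closed(sigma):
    # h(N(mu, sigma^2)) = log(sigma * sqrt(2*pi*e)), in nats
    return np.log(sigma * np.sqrt(2.0 * np.pi * np.e))

def entropy_quad(sigma):
    # h(m) = -∫ m(x) log m(x) dx, evaluated by adaptive quadrature
    f = lambda x: -norm.pdf(x, 0.0, sigma) * norm.logpdf(x, 0.0, sigma)
    return quad(f, -np.inf, np.inf)[0]

for sigma in (1.0, 0.242, 0.1):
    print(sigma, gauss_entropy_closed(sigma), entropy_quad(sigma))
# The entropy crosses zero near sigma = 1/sqrt(2*pi*e) ≈ 0.242 and is
# negative below it, as claimed in the introduction.
```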
In information geometry [15], a mixture family of linearly independent probability distributions $p_1(x), \ldots, p_k(x)$ is defined by the convex combination of those non-parametric component distributions: $m(x; \eta) = \sum_{i=1}^{k} \eta_i p_i(x)$ with $\eta_i > 0$ and $\sum_{i=1}^{k} \eta_i = 1$. A mixture family induces a dually flat space where the Kullback–Leibler divergence is equivalent to a Bregman divergence [5,15] defined on the $\eta$-parameters. However, in that case, the Bregman convex generator $F(\eta) = \int m(x; \eta) \log m(x; \eta)\,\mathrm{d}x$ (the Shannon information) is not available in closed form, except for the family of multinomial distributions, which is both a mixture family (with the closed-form discrete KL [4], $\mathrm{KL}(m : m') = \sum_{i=1}^{k} m_i \log\frac{m_i}{m'_i}$) and an exponential family [15] (a short numerical sketch of this discrete closed form follows below).

1.2. Contributions

In this work, we present a simple and efficient method that builds algorithmically a closed-form formula that guarantees both deterministic lower and upper bounds on the KL divergence within an …
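As referenced above, here is a tiny sketch of the closed-form discrete KL for the multinomial case (plain NumPy; the function name kl_discrete is ours). It also exhibits the asymmetry that motivates the colon notation of Equation (1):

```python
import numpy as np

def kl_discrete(m, mp):
    # KL(m : m') = sum_i m_i log(m_i / m'_i) (in nats); requires m'_i > 0
    # wherever m_i > 0, with the convention 0 * log(0/q) = 0.
    m, mp = np.asarray(m, float), np.asarray(mp, float)
    pos = m > 0
    return float(np.sum(m[pos] * np.log(m[pos] / mp[pos])))

m, mp = [0.5, 0.3, 0.2], [0.2, 0.5, 0.3]
print(kl_discrete(m, mp), kl_discrete(mp, m))  # the two values differ:
                                               # KL is not symmetric
```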
Title: Differential Geometrical Theory of Statistics
Editors: Frédéric Barbaresco, Frank Nielsen
Publisher: MDPI
Location: Basel
Date: 2017
Language: English
License: CC BY-NC-ND 4.0
ISBN: 978-3-03842-425-3
Size: 17.0 x 24.4 cm
Pages: 476
Keywords: Entropy, Coding Theory, Maximum entropy, Information geometry, Computational Information Geometry, Hessian Geometry, Divergence Geometry, Information topology, Cohomology, Shape Space, Statistical physics, Thermodynamics
Categories: Natural Sciences, Physics