additive factor of log k + log k′. We then further refine our technique to get improved adaptive bounds. For univariate GMMs, we get the non-adaptive bounds in O(k log k + k′ log k′) time, and the adaptive bounds in O(k² + k′²) time. To illustrate our generic technique, we demonstrate it on Exponential Mixture Models (EMMs), Gamma mixtures, RMMs and GMMs. We extend our preliminary results on the KL divergence [16] to other information-theoretic measures such as the differential entropy and α-divergences.
1.3. Paper Outline
The paper is organized as follows. Section 2 describes the algorithmic construction of the formula using piecewise log-sum-exp inequalities for the cross-entropy and the Kullback–Leibler divergence. Section 3 instantiates this algorithmic principle to the entropy and discusses related works. Section 4 extends the proposed bounds to the family of α-divergences. Section 5 discusses an extension of the lower bound to f-divergences. Section 6 reports our experimental results on several mixture families. Finally, Section 7 concludes this work by discussing extensions to other statistical distances. Appendix A proves that the Kullback–Leibler divergence of mixture models is not analytic [6]. Appendix B reports the closed-form formula for the KL divergence between scaled and truncated distributions of the same exponential family [17] (which includes the Rayleigh, Gaussian and Gamma distributions, among others). Appendix C shows that the KL divergence between two mixtures can be approximated by a Bregman divergence.
2. A Generic Combinatorial Bounding Algorithm Based on Density Envelopes
Let us bound the cross-entropy H×(m : m′) by deterministic lower and upper bounds, L×(m : m′) ≤ H×(m : m′) ≤ U×(m : m′), so that bounds on the Kullback–Leibler divergence KL(m : m′) = H×(m : m′) − H×(m : m) follow as:

\[
L^{\times}(m:m') - U^{\times}(m:m) \;\leq\; \mathrm{KL}(m:m') \;\leq\; U^{\times}(m:m') - L^{\times}(m:m). \tag{2}
\]
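To make the sandwich of Equation (2) concrete, here is a minimal Python sketch; lower_cross_entropy and upper_cross_entropy are hypothetical callables standing in for the deterministic bounds L× and U× constructed in the remainder of this section.

def kl_bounds(m, m_prime, lower_cross_entropy, upper_cross_entropy):
    # Deterministic sandwich on KL(m : m') from cross-entropy bounds (Equation (2)):
    #   L_x(m : m') - U_x(m : m)  <=  KL(m : m')  <=  U_x(m : m') - L_x(m : m).
    # The two bound functions are placeholders, not part of the original paper's code.
    kl_lower = lower_cross_entropy(m, m_prime) - upper_cross_entropy(m, m)
    kl_upper = upper_cross_entropy(m, m_prime) - lower_cross_entropy(m, m)
    return kl_lower, kl_upper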
Since the cross-entropy of two mixtures $m(x) = \sum_{i=1}^{k} w_i p_i(x)$ and $m'(x) = \sum_{j=1}^{k'} w'_j p'_j(x)$:

\[
H^{\times}(m:m') = -\int_{\mathcal{X}} \left( \sum_{i=1}^{k} w_i p_i(x) \right) \log \left( \sum_{j=1}^{k'} w'_j p'_j(x) \right) \mathrm{d}x \tag{3}
\]

has a log-sum term of positive arguments, we shall use bounds on the log-sum-exp (lse) function [18,19]:

\[
\mathrm{lse}\left(\{x_i\}_{i=1}^{l}\right) = \log \left( \sum_{i=1}^{l} e^{x_i} \right).
\]
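For intuition, the following self-contained Python sketch evaluates the cross-entropy of Equation (3) for two univariate Gaussian mixtures by a plain Riemann sum over a truncated domain; the mixture parameters are illustrative and not taken from the paper. The inner logarithm is exactly the log-sum term that the lse bounds below control.

import numpy as np

def gmm_pdf(x, weights, means, stds):
    # Mixture density m(x) = sum_i w_i * N(x; mu_i, sigma_i^2), vectorized over x.
    x = np.asarray(x)[:, None]
    comps = np.exp(-0.5 * ((x - means) / stds) ** 2) / (stds * np.sqrt(2.0 * np.pi))
    return comps @ weights

# Illustrative two-component mixtures m and m' (arbitrary parameters).
w,  mu,  sig  = np.array([0.4, 0.6]), np.array([-1.0, 2.0]), np.array([0.5, 1.0])
wp, mup, sigp = np.array([0.7, 0.3]), np.array([0.0, 3.0]), np.array([1.0, 0.5])

xs = np.linspace(-10.0, 15.0, 200_001)  # truncated integration domain
dx = xs[1] - xs[0]
m_vals  = gmm_pdf(xs, w,  mu,  sig)
mp_vals = gmm_pdf(xs, wp, mup, sigp)
# Riemann-sum approximation of Equation (3): H_x(m : m') = -int m(x) log m'(x) dx.
print(-np.sum(m_vals * np.log(mp_vals)) * dx)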
We have the following basic inequalities:

\[
\max\{x_i\}_{i=1}^{l} \;<\; \mathrm{lse}\left(\{x_i\}_{i=1}^{l}\right) \;\leq\; \log l + \max\{x_i\}_{i=1}^{l}. \tag{4}
\]
The left-hand-side (LHS) strict inequality holds because $\sum_{i=1}^{l} e^{x_i} > \max\{e^{x_i}\}_{i=1}^{l} = \exp\left(\max\{x_i\}_{i=1}^{l}\right)$ since $e^x > 0$, $\forall x \in \mathbb{R}$. The right-hand-side (RHS) inequality follows from the fact that $\sum_{i=1}^{l} e^{x_i} \leq l \max\{e^{x_i}\}_{i=1}^{l} = l \exp\left(\max\{x_i\}_{i=1}^{l}\right)$, and equality holds if and only if $x_1 = \cdots = x_l$.
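As a sanity check, inequality (4) can be verified numerically; the short Python sketch below also shows the standard overflow-safe way of evaluating lse, namely shifting the arguments by their maximum, which is an instance of the translation identity recalled next.

import numpy as np

def lse(xs):
    # Overflow-safe log-sum-exp: shift by c = max{x_i} (translation identity),
    # so that the largest exponentiated term is e^0 = 1.
    xs = np.asarray(xs, dtype=float)
    c = xs.max()
    return c + np.log(np.sum(np.exp(xs - c)))

# Check Equation (4) on random inputs: max < lse <= log(l) + max.
rng = np.random.default_rng(0)
x = rng.normal(size=10)
assert x.max() < lse(x) <= np.log(x.size) + x.max()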
The lse function is convex but not strictly convex; see Exercise 7.9 in [20]. It is known [21] that the conjugate of the lse function is the negative entropy restricted to the probability simplex. The lse function enjoys the following translation identity property: $\mathrm{lse}\left(\{x_i\}_{i=1}^{l}\right) = c + \mathrm{lse}\left(\{x_i - c\}_{i=1}^{l}\right)$, $\forall c \in \mathbb{R}$. Similarly, we