additive factor of log k + log k′. We then further refine our technique to get improved adaptive bounds. For univariate GMMs, we get the non-adaptive bounds in O(k log k + k′ log k′) time, and the adaptive bounds in O(k² + k′²) time. To illustrate our generic technique, we demonstrate it on Exponential Mixture Models (EMMs), Gamma mixtures, RMMs and GMMs. We extend our preliminary results on the KL divergence [16] to other information-theoretic measures such as the differential entropy and α-divergences.
1.3. Paper Outline
The paper is organized as follows. Section 2 describes the algorithmic construction of the formula using piecewise log-sum-exp inequalities for the cross-entropy and the Kullback–Leibler divergence. Section 3 instantiates this algorithmic principle to the entropy and discusses related works. Section 4 extends the proposed bounds to the family of α-divergences. Section 5 discusses an extension of the lower bound to f-divergences. Section 6 reports our experimental results on several mixture families. Finally, Section 7 concludes this work by discussing extensions to other statistical distances. Appendix A proves that the Kullback–Leibler divergence of mixture models is not analytic [6]. Appendix B reports the closed-form formula for the KL divergence between scaled and truncated distributions of the same exponential family [17] (which includes the Rayleigh, Gaussian and Gamma distributions, among others). Appendix C shows that the KL divergence between two mixtures can be approximated by a Bregman divergence.
2. A Generic Combinatorial Bounding Algorithm Based on Density Envelopes
Let us bound the cross-entropy H×(m : m′) by deterministic lower and upper bounds, L×(m : m′) ≤ H×(m : m′) ≤ U×(m : m′), so that bounds on the Kullback–Leibler divergence KL(m : m′) = H×(m : m′) − H×(m : m) follow as:

\[
L^{\times}(m:m') - U^{\times}(m:m) \;\leq\; \mathrm{KL}(m:m') \;\leq\; U^{\times}(m:m') - L^{\times}(m:m). \tag{2}
\]
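To make the sandwich of Equation (2) concrete, here is a minimal Python sketch; lower_cross_entropy and upper_cross_entropy are hypothetical callables standing in for the deterministic bounds L× and U× constructed in the remainder of this section.

def kl_bounds(m, m_prime, lower_cross_entropy, upper_cross_entropy):
    # Deterministic sandwich on KL(m : m') from cross-entropy bounds (Equation (2)):
    #   L_x(m : m') - U_x(m : m)  <=  KL(m : m')  <=  U_x(m : m') - L_x(m : m).
    # The two bound functions are placeholders, not part of the original paper's code.
    kl_lower = lower_cross_entropy(m, m_prime) - upper_cross_entropy(m, m)
    kl_upper = upper_cross_entropy(m, m_prime) - lower_cross_entropy(m, m)
    return kl_lower, kl_upper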
Since the cross-entropy of two mixtures $m(x) = \sum_{i=1}^{k} w_i p_i(x)$ and $m'(x) = \sum_{j=1}^{k'} w'_j p'_j(x)$:

\[
H^{\times}(m:m') = -\int_{\mathcal{X}} \left( \sum_{i=1}^{k} w_i p_i(x) \right) \log \left( \sum_{j=1}^{k'} w'_j p'_j(x) \right) \mathrm{d}x \tag{3}
\]

has a log-sum term of positive arguments, we shall use bounds on the log-sum-exp (lse) function [18,19]:

\[
\mathrm{lse}\left(\{x_i\}_{i=1}^{l}\right) = \log \left( \sum_{i=1}^{l} e^{x_i} \right).
\]
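For intuition, the following self-contained Python sketch evaluates the cross-entropy of Equation (3) for two univariate Gaussian mixtures by a plain Riemann sum over a truncated domain; the mixture parameters are illustrative and not taken from the paper. The inner logarithm is exactly the log-sum term that the lse bounds below control.

import numpy as np

def gmm_pdf(x, weights, means, stds):
    # Mixture density m(x) = sum_i w_i * N(x; mu_i, sigma_i^2), vectorized over x.
    x = np.asarray(x)[:, None]
    comps = np.exp(-0.5 * ((x - means) / stds) ** 2) / (stds * np.sqrt(2.0 * np.pi))
    return comps @ weights

# Illustrative two-component mixtures m and m' (arbitrary parameters).
w,  mu,  sig  = np.array([0.4, 0.6]), np.array([-1.0, 2.0]), np.array([0.5, 1.0])
wp, mup, sigp = np.array([0.7, 0.3]), np.array([0.0, 3.0]), np.array([1.0, 0.5])

xs = np.linspace(-10.0, 15.0, 200_001)  # truncated integration domain
dx = xs[1] - xs[0]
m_vals  = gmm_pdf(xs, w,  mu,  sig)
mp_vals = gmm_pdf(xs, wp, mup, sigp)
# Riemann-sum approximation of Equation (3): H_x(m : m') = -int m(x) log m'(x) dx.
print(-np.sum(m_vals * np.log(mp_vals)) * dx)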
We have the following basic inequalities:

\[
\max\{x_i\}_{i=1}^{l} \;<\; \mathrm{lse}\left(\{x_i\}_{i=1}^{l}\right) \;\leq\; \log l + \max\{x_i\}_{i=1}^{l}. \tag{4}
\]
The left-hand-side (LHS) strict inequality holds because $\sum_{i=1}^{l} e^{x_i} > \max\{e^{x_i}\}_{i=1}^{l} = \exp\left(\max\{x_i\}_{i=1}^{l}\right)$ since $e^x > 0$, $\forall x \in \mathbb{R}$. The right-hand-side (RHS) inequality follows from the fact that $\sum_{i=1}^{l} e^{x_i} \leq l \max\{e^{x_i}\}_{i=1}^{l} = l \exp\left(\max\{x_i\}_{i=1}^{l}\right)$, and equality holds if and only if $x_1 = \cdots = x_l$.
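As a sanity check, inequality (4) can be verified numerically; the short Python sketch below also shows the standard overflow-safe way of evaluating lse, namely shifting the arguments by their maximum, which is an instance of the translation identity recalled next.

import numpy as np

def lse(xs):
    # Overflow-safe log-sum-exp: shift by c = max{x_i} (translation identity),
    # so that the largest exponentiated term is e^0 = 1.
    xs = np.asarray(xs, dtype=float)
    c = xs.max()
    return c + np.log(np.sum(np.exp(xs - c)))

# Check Equation (4) on random inputs: max < lse <= log(l) + max.
rng = np.random.default_rng(0)
x = rng.normal(size=10)
assert x.max() < lse(x) <= np.log(x.size) + x.max()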
The lse function is convex but not strictly convex; see Exercise 7.9 in [20]. It is known [21] that the conjugate of the lse function is the negative entropy restricted to the probability simplex. The lse function enjoys the following translation identity property: $\mathrm{lse}\left(\{x_i\}_{i=1}^{l}\right) = c + \mathrm{lse}\left(\{x_i - c\}_{i=1}^{l}\right)$, $\forall c \in \mathbb{R}$. Similarly, we