densities $m$ and $m'$, and denote by
$$H(m) = H^{\times}(m, m) = \int_{\mathcal{X}} m(x) \log \frac{1}{m(x)} \, \mathrm{d}x = -\int_{\mathcal{X}} m(x) \log m(x) \, \mathrm{d}x$$
the Shannon entropy [4]. Then the Kullback–Leibler divergence between $m$ and $m'$ is given by:
$$\mathrm{KL}(m : m') = H^{\times}(m, m') - H(m) = \int_{\mathcal{X}} m(x) \log \frac{m(x)}{m'(x)} \, \mathrm{d}x \geq 0. \tag{1}$$
The notation ":" is used instead of the usual comma "," notation to emphasize that the distance is not a metric distance: it is neither symmetric ($\mathrm{KL}(m : m') \neq \mathrm{KL}(m' : m)$), nor does it satisfy the triangular inequality [4] of metric distances ($\mathrm{KL}(m : m') + \mathrm{KL}(m' : m'') \geq \mathrm{KL}(m : m'')$ fails in general). When the natural base of the logarithm is chosen, we get a differential entropy measure expressed in nat units. Alternatively, we can also use the base-2 logarithm ($\log_2 x = \frac{\log x}{\log 2}$) and get the entropy expressed in bit units. Although the KL divergence is available in closed form for many distributions (in particular as equivalent Bregman divergences for exponential families [5], see Appendix C), it was proven that the Kullback–Leibler divergence between two (univariate) GMMs is not analytic [6] (see also the particular case of a GMM of two components with the same variance that was analyzed in [7]). See Appendix A for an analysis. Note that the differential entropy may be negative. For example, the differential entropy of a univariate Gaussian distribution is $\log(\sigma \sqrt{2\pi e})$, and is therefore negative when the standard deviation $\sigma < \frac{1}{\sqrt{2\pi e}} \approx 0.242$. We consider continuous distributions with well-defined entropies (entropy may be undefined for singular distributions like Cantor's distribution [8]).
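To make the sign of the differential entropy concrete, the following minimal Python sketch (illustrative, not part of the original text; the helper names are ours) compares the closed-form Gaussian entropy $\log(\sigma\sqrt{2\pi e})$ with a direct numerical integration of $-\int m(x) \log m(x) \, \mathrm{d}x$:

```python
import numpy as np
from scipy.integrate import quad

# Minimal sketch (assumed helpers, not from the paper): check that the
# differential entropy of a univariate Gaussian, log(sigma * sqrt(2*pi*e)),
# turns negative once sigma < 1/sqrt(2*pi*e) ~ 0.242 (entropy in nats).

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def differential_entropy(mu, sigma):
    # H(m) = -int m(x) log m(x) dx, integrated numerically over +/- 20 sigma.
    integrand = lambda x: -gaussian_pdf(x, mu, sigma) * np.log(gaussian_pdf(x, mu, sigma))
    value, _ = quad(integrand, mu - 20 * sigma, mu + 20 * sigma)
    return value

for sigma in (1.0, 0.242, 0.1):
    closed_form = np.log(sigma * np.sqrt(2 * np.pi * np.e))
    numeric = differential_entropy(0.0, sigma)
    print(f"sigma={sigma}: closed-form {closed_form:+.4f} nats, numeric {numeric:+.4f}")
```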
1.1. Prior Work
Many approximation techniques have been designed to beat the computationally intensive Monte Carlo (MC) stochastic estimation: $\widehat{\mathrm{KL}}_s(m : m') = \frac{1}{s} \sum_{i=1}^{s} \log \frac{m(x_i)}{m'(x_i)}$ with $x_1, \ldots, x_s \sim m(x)$ ($s$ independent and identically distributed (i.i.d.) samples). The MC estimator is asymptotically consistent, $\lim_{s \to \infty} \widehat{\mathrm{KL}}_s(m : m') = \mathrm{KL}(m : m')$, so that the "true value" of the KL of mixtures is estimated in practice by taking a very large sample (say, $s = 10^9$). However, we point out that the MC estimator outputs a stochastic approximation, and therefore does not guarantee deterministic bounds (confidence intervals may be used). Deterministic lower and upper bounds of the integral can be obtained by various numerical integration techniques using quadrature rules. We refer to [9–12] for the current state-of-the-art approximation techniques and bounds on the KL of GMMs. The latest work for computing the entropy of GMMs is [13]. It considers arbitrarily finely tuned bounds of the entropy of isotropic Gaussian mixtures (a case encountered when dealing with KDEs, kernel density estimators). However, there is a catch in the technique of [13]: it relies on solving for the unique roots of some log-sum-exp equations (see Theorem 1 of [13], p. 3342) that do not admit a closed-form solution. Thus it is a hybrid method that contrasts with our combinatorial approach. Bounds of the KL divergence between mixture models can be generalized to bounds of the likelihood function of mixture models [14], because the log-likelihood is just the KL between the empirical distribution and the mixture model, up to a constant shift.
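To illustrate this baseline, here is a minimal Python sketch (our own illustration; the function names are hypothetical) of the MC estimator $\widehat{\mathrm{KL}}_s$ for two univariate GMMs. It returns a stochastic estimate, not a deterministic bound:

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal sketch (illustrative) of the Monte Carlo estimator
# KL_s(m : m') = (1/s) * sum_i log(m(x_i) / m'(x_i)), with x_i ~ m,
# for univariate GMMs given as triples (weights, means, sigmas).

def gmm_pdf(x, weights, means, sigmas):
    # Density of a univariate GMM, evaluated at an array of points x.
    x = np.asarray(x)[..., None]
    comps = np.exp(-0.5 * ((x - means) / sigmas) ** 2) / (sigmas * np.sqrt(2 * np.pi))
    return comps @ weights

def gmm_sample(s, weights, means, sigmas):
    # Ancestral sampling: pick a component index, then draw from its Gaussian.
    idx = rng.choice(len(weights), size=s, p=weights)
    return rng.normal(means[idx], sigmas[idx])

def kl_mc(s, m, m_prime):
    # Consistent as s -> infinity, but only gives confidence intervals.
    x = gmm_sample(s, *m)
    return np.mean(np.log(gmm_pdf(x, *m) / gmm_pdf(x, *m_prime)))

m = (np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([0.5, 0.5]))
m_prime = (np.array([0.3, 0.7]), np.array([-0.5, 1.5]), np.array([0.7, 0.4]))
print(kl_mc(10**6, m, m_prime))  # a larger s tightens the stochastic estimate
```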
In information geometry [15], a mixture family of linearly independent probability distributions $p_1(x), \ldots, p_k(x)$ is defined by the convex combination of those non-parametric component distributions: $m(x; \eta) = \sum_{i=1}^{k} \eta_i p_i(x)$ with $\eta_i > 0$ and $\sum_{i=1}^{k} \eta_i = 1$. A mixture family induces a dually flat space where the Kullback–Leibler divergence is equivalent to a Bregman divergence [5,15] defined on the $\eta$-parameters. However, in that case, the Bregman convex generator $F(\eta) = \int m(x; \eta) \log m(x; \eta) \, \mathrm{d}x$ (the Shannon information) is not available in closed form, except for the family of multinomial distributions, which is both a mixture family (with closed-form $\mathrm{KL}(m : m') = \sum_{i=1}^{k} m_i \log \frac{m_i}{m'_i}$, the discrete KL [4]) and an exponential family [15].
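For the multinomial exception just mentioned, the closed-form discrete KL is immediate to evaluate; a minimal sketch, assuming strictly positive probability vectors:

```python
import numpy as np

# Minimal sketch (illustrative): for multinomials, the KL divergence is the
# closed-form discrete KL, sum_i m_i * log(m_i / m'_i), assuming m_i, m'_i > 0.

def discrete_kl(m, m_prime):
    m, m_prime = np.asarray(m, float), np.asarray(m_prime, float)
    return float(np.sum(m * np.log(m / m_prime)))

print(discrete_kl([0.2, 0.3, 0.5], [0.4, 0.4, 0.2]))  # >= 0, zero iff m == m'
```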
1.2. Contributions
In this work, we present a simple and efficient method that builds algorithmically a closed-form formula that guarantees both deterministic lower and upper bounds on the KL divergence within an