Page 297 in Differential Geometrical Theory of Statistics
Entropy 2016, 18, 442
See Appendix B for a closed-form formula when dealing with exponential family components.
4. Bounding the α-Divergence
The α-divergence [15,32–34] between $m(x) = \sum_{i=1}^{k} w_i p_i(x)$ and $m'(x) = \sum_{i=1}^{k'} w'_i p'_i(x)$ is defined as

$$ D_\alpha(m : m') = \frac{1}{\alpha(1-\alpha)} \left( 1 - \int_{\mathcal{X}} m(x)^{\alpha} m'(x)^{1-\alpha} \, \mathrm{d}x \right), \qquad (29) $$
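As a concrete sanity check, Eq. (29) can be evaluated numerically for one-dimensional Gaussian mixtures; the following sketch (function names, the quadrature grid, and the example mixtures are illustrative choices, not from the paper) approximates the integral by the trapezoidal rule and lets one verify the skew-symmetry $D_\alpha(m : m') = D_{1-\alpha}(m' : m)$ stated below:

```python
import numpy as np

def gm_pdf(x, w, mu, s):
    """Density of the 1D Gaussian mixture sum_i w_i N(mu_i, s_i^2) at points x."""
    x = np.atleast_1d(x)[:, None]
    comps = np.exp(-0.5 * ((x - mu) / s) ** 2) / (np.sqrt(2.0 * np.pi) * np.asarray(s))
    return comps @ np.asarray(w)

def alpha_divergence(alpha, mix, mix2, lo=-30.0, hi=30.0, n=100001):
    """Trapezoidal-rule approximation of Eq. (29); alpha must avoid {0, 1}."""
    x = np.linspace(lo, hi, n)
    f = gm_pdf(x, *mix) ** alpha * gm_pdf(x, *mix2) ** (1.0 - alpha)  # integrand of the Hellinger integral
    hellinger = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x))           # quadrature of the integral term
    return (1.0 - hellinger) / (alpha * (1.0 - alpha))

# Illustrative mixtures: (weights, means, standard deviations)
m  = ([0.5, 0.5], [-1.0, 2.0], [1.0, 0.5])
m2 = ([1.0], [0.0], [1.5])
```

The truncation to $[-30, 30]$ is harmless here because the Gaussian tails decay super-exponentially; for heavier-tailed components the grid would need widening.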
which clearly satisfies $D_\alpha(m : m') = D_{1-\alpha}(m' : m)$. The α-divergence is a family of information divergences parametrized by $\alpha \in \mathbb{R} \setminus \{0, 1\}$. Letting $\alpha \to 1$, we get the KL divergence (see [35] for a proof):

$$ \lim_{\alpha \to 1} D_\alpha(m : m') = \mathrm{KL}(m : m') = \int_{\mathcal{X}} m(x) \log \frac{m(x)}{m'(x)} \, \mathrm{d}x, \qquad (30) $$
and $\alpha \to 0$ gives the reverse KL divergence:

$$ \lim_{\alpha \to 0} D_\alpha(m : m') = \mathrm{KL}(m' : m). $$
Other interesting values [33] include $\alpha = 1/2$ (squared Hellinger distance), $\alpha = 2$ (Pearson chi-square distance), $\alpha = -1$ (Neyman chi-square distance), etc. Notably, the Hellinger distance is a valid distance metric which satisfies non-negativity, symmetry, and the triangle inequality. In general, $D_\alpha(m : m')$ only satisfies non-negativity, so that $D_\alpha(m : m') \geq 0$ for any $m(x)$ and $m'(x)$; it is neither symmetric nor does it satisfy the triangle inequality. Minimization of α-divergences allows one to choose a trade-off between mode fitting and support fitting of the minimizer [36]. The minimizer of α-divergences, which includes the MLE as a special case, has interesting connections with transcendental number theory [37].
Computing $D_\alpha(m : m')$ for given $m(x)$ and $m'(x)$ reduces to evaluating the Hellinger integral [38,39]:

$$ H_\alpha(m : m') = \int_{\mathcal{X}} m(x)^{\alpha} m'(x)^{1-\alpha} \, \mathrm{d}x, \qquad (31) $$
which in general does not have a closed form, since the α-divergence of mixture models is known not to be analytic [6]. Moreover, $H_\alpha(m : m')$ may diverge, making the α-divergence unbounded. Once $H_\alpha(m : m')$ is computed, the Rényi and Tsallis divergences [35], and in general the Sharma–Mittal divergences [40], can easily be obtained. Therefore, the results presented here directly extend to those divergence families.
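For concreteness, under the standard conventions (see [35]), the Rényi and Tsallis divergences follow from the Hellinger integral by a direct rewriting:

```latex
R_\alpha(m : m') = \frac{1}{\alpha - 1} \log H_\alpha(m : m'),
\qquad
T_\alpha(m : m') = \frac{H_\alpha(m : m') - 1}{\alpha - 1},
```

so that $D_\alpha(m : m') = T_\alpha(m : m')/\alpha$, consistent with Eq. (29): indeed $\frac{1}{\alpha}\cdot\frac{H_\alpha - 1}{\alpha - 1} = \frac{1 - H_\alpha}{\alpha(1-\alpha)}$.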
Similar to the case of the KL divergence, the Monte Carlo stochastic estimation of $H_\alpha(m : m')$ can be computed either as

$$ \hat{H}^{n}_{\alpha}(m : m') = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{m'(x_i)}{m(x_i)} \right)^{1-\alpha}, $$

where $x_1, \ldots, x_n \sim m(x)$ are i.i.d. samples, or as

$$ \hat{H}^{n}_{\alpha}(m : m') = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{m(x_i)}{m'(x_i)} \right)^{\alpha}, $$

where $x_1, \ldots, x_n \sim m'(x)$ are i.i.d. samples. In either case, the estimator is consistent, so that $\lim_{n \to \infty} \hat{H}^{n}_{\alpha}(m : m') = H_\alpha(m : m')$.
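The first estimator (sampling from $m$) can be sketched as follows for 1D Gaussian mixture components; the helper names and example mixtures are illustrative, not from the paper. Note that when $m = m'$ the density ratio is identically one, so the estimate is exactly $H_\alpha = 1$ regardless of the sample:

```python
import numpy as np

def gm_pdf(x, w, mu, s):
    """Density of a 1D Gaussian mixture at points x."""
    x = np.atleast_1d(x)[:, None]
    comps = np.exp(-0.5 * ((x - mu) / s) ** 2) / (np.sqrt(2.0 * np.pi) * np.asarray(s))
    return comps @ np.asarray(w)

def gm_sample(n, w, mu, s, rng):
    """Draw n i.i.d. samples: pick a component by weight, then sample it."""
    idx = rng.choice(len(w), size=n, p=w)
    return rng.normal(np.asarray(mu)[idx], np.asarray(s)[idx])

def hellinger_mc(alpha, n, mix, mix2, seed=0):
    """MC estimate of H_alpha(m : m'): average (m'(x_i)/m(x_i))^(1-alpha), x_i ~ m."""
    rng = np.random.default_rng(seed)
    x = gm_sample(n, *mix, rng)
    return np.mean((gm_pdf(x, *mix2) / gm_pdf(x, *mix)) ** (1.0 - alpha))

# Illustrative mixtures: (weights, means, standard deviations)
m  = ([0.6, 0.4], [0.0, 3.0], [1.0, 2.0])
m2 = ([1.0], [0.0], [1.5])
```

The second estimator is obtained by swapping the roles of `mix` and `mix2` and replacing the exponent `1 - alpha` with `alpha`; as stated above, both are consistent but give only stochastic, not deterministic, guarantees.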
However, MC estimation requires a large sample and does not guarantee deterministic bounds. The techniques described in [41] work in practice only for very close distributions and do not apply to mixture models. We will therefore derive combinatorial bounds for $H_\alpha(m : m')$. The structure of this section parallels that of Section 2, with the necessary reformulations for a clear presentation.
Differential Geometrical Theory of Statistics
- Authors: Frédéric Barbaresco, Frank Nielsen
- Editor: MDPI
- Location: Basel
- Date: 2017
- Language: English
- License: CC BY-NC-ND 4.0
- ISBN: 978-3-03842-425-3
- Size: 17.0 x 24.4 cm
- Pages: 476
- Keywords: Entropy, Coding Theory, Maximum entropy, Information geometry, Computational Information Geometry, Hessian Geometry, Divergence Geometry, Information topology, Cohomology, Shape Space, Statistical physics, Thermodynamics
- Categories: Natural Sciences, Physics