Entropy 2016, 18, 442
See Appendix B for a closed-form formula when dealing with exponential family components.
4. Bounding the α-Divergence
The α-divergence [15,32–34] between $m(x) = \sum_{i=1}^{k} w_i p_i(x)$ and $m'(x) = \sum_{i=1}^{k'} w'_i p'_i(x)$ is defined as
$$
D_\alpha(m : m') = \frac{1}{\alpha(1-\alpha)} \left( 1 - \int_{\mathcal{X}} m(x)^{\alpha}\, m'(x)^{1-\alpha}\, \mathrm{d}x \right), \tag{29}
$$
which clearly satisfies $D_\alpha(m : m') = D_{1-\alpha}(m' : m)$. The α-divergence is a family of information divergences parametrized by $\alpha \in \mathbb{R} \setminus \{0, 1\}$. Letting $\alpha \to 1$, we get the KL divergence (see [35] for a proof):
$$
\lim_{\alpha \to 1} D_\alpha(m : m') = \mathrm{KL}(m : m') = \int_{\mathcal{X}} m(x) \log \frac{m(x)}{m'(x)}\, \mathrm{d}x, \tag{30}
$$
and $\alpha \to 0$ gives the reverse KL divergence:
$$
\lim_{\alpha \to 0} D_\alpha(m : m') = \mathrm{KL}(m' : m).
$$
Other interesting values [33] include $\alpha = 1/2$ (squared Hellinger distance), $\alpha = 2$ (Pearson chi-squared distance), $\alpha = -1$ (Neyman chi-squared distance), etc. Notably, the Hellinger distance is a valid distance metric, satisfying non-negativity, symmetry, and the triangle inequality. In general, $D_\alpha(m : m')$ satisfies only non-negativity, so that $D_\alpha(m : m') \geq 0$ for any $m(x)$ and $m'(x)$; it is neither symmetric nor does it obey the triangle inequality. Minimizing α-divergences allows one to choose a trade-off between mode fitting and support fitting of the minimizer [36]. The minimizer of α-divergences, which includes the MLE as a special case, has interesting connections with transcendental number theory [37].
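As a concrete illustration (not part of the paper), $D_\alpha$ in Equation (29) can be evaluated numerically for one-dimensional Gaussian mixtures by quadrature on a grid; the mixture parameters below are arbitrary, and the skew symmetry $D_\alpha(m : m') = D_{1-\alpha}(m' : m)$ serves as a sanity check.

```python
import numpy as np

def mixture_pdf(x, weights, means, stds):
    # Density of a 1D Gaussian mixture evaluated at the points in x.
    x = np.asarray(x)[..., None]
    comps = np.exp(-0.5 * ((x - means) / stds) ** 2) / (stds * np.sqrt(2.0 * np.pi))
    return comps @ weights

def alpha_divergence(alpha, m_vals, mp_vals, dx):
    # Eq. (29): D_alpha = (1 - H_alpha) / (alpha * (1 - alpha)),
    # with the Hellinger integral approximated on a uniform grid.
    h_alpha = np.sum(m_vals ** alpha * mp_vals ** (1.0 - alpha)) * dx
    return (1.0 - h_alpha) / (alpha * (1.0 - alpha))

# Arbitrary (hypothetical) mixture parameters chosen for illustration.
x = np.linspace(-15.0, 15.0, 60001)
dx = x[1] - x[0]
m  = mixture_pdf(x, np.array([0.4, 0.6]), np.array([-1.0, 2.0]), np.array([1.0, 0.5]))
mp = mixture_pdf(x, np.array([0.7, 0.3]), np.array([0.0, 3.0]), np.array([1.5, 0.8]))

d      = alpha_divergence(0.3, m, mp, dx)   # D_0.3(m : m')
d_swap = alpha_divergence(0.7, mp, m, dx)   # D_0.7(m' : m), equal by skew symmetry
```

Both calls integrate the same product $m^{0.3} m'^{0.7}$, so the two values agree up to floating-point rounding, illustrating the skew symmetry stated after Equation (29).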
Computing $D_\alpha(m : m')$ for given $m(x)$ and $m'(x)$ reduces to evaluating the Hellinger integral [38,39]:
$$
H_\alpha(m : m') = \int_{\mathcal{X}} m(x)^{\alpha}\, m'(x)^{1-\alpha}\, \mathrm{d}x, \tag{31}
$$
which in general does not have a closed form, as it is known that the α-divergence of mixture models is not analytic [6]. Moreover, $H_\alpha(m : m')$ may diverge, making the α-divergence unbounded. Once $H_\alpha(m : m')$ can be computed, the Rényi and Tsallis divergences [35], and in general the Sharma–Mittal divergences [40], can be easily computed. Therefore the results presented here directly extend to those divergence families.
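To make the last remark concrete, here is a minimal sketch of how the Rényi and Tsallis divergences follow once $H_\alpha$ is available. The sign conventions below are one common choice and should be checked against the cited references [35,40]:

```python
import numpy as np

def renyi_divergence(alpha, h_alpha):
    # One common convention (an assumption here, check [35]):
    # R_alpha(m : m') = log(H_alpha(m : m')) / (alpha - 1).
    return np.log(h_alpha) / (alpha - 1.0)

def tsallis_divergence(alpha, h_alpha):
    # Likewise: T_alpha(m : m') = (H_alpha(m : m') - 1) / (alpha - 1).
    return (h_alpha - 1.0) / (alpha - 1.0)

# Both vanish when m = m' (then H_alpha = 1) and are positive
# for H_alpha < 1 with alpha in (0, 1).
r = renyi_divergence(0.5, 0.9)
t = tsallis_divergence(0.5, 0.9)
```

In particular, any deterministic bounds on $H_\alpha(m : m')$ translate monotonically into bounds on these divergences.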
Similar to the case of the KL divergence, the Monte Carlo stochastic estimation of $H_\alpha(m : m')$ can be computed either as
$$
\hat{H}^n_\alpha(m : m') = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{m'(x_i)}{m(x_i)} \right)^{1-\alpha},
$$
where $x_1, \ldots, x_n \sim m(x)$ are i.i.d. samples, or as
$$
\hat{H}^n_\alpha(m : m') = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{m(x_i)}{m'(x_i)} \right)^{\alpha},
$$
where $x_1, \ldots, x_n \sim m'(x)$ are i.i.d. samples. In either case, the estimator is consistent, so that $\lim_{n \to \infty} \hat{H}^n_\alpha(m : m') = H_\alpha(m : m')$.
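A minimal sketch of the two estimators, assuming one-dimensional Gaussian mixtures with arbitrary illustrative parameters; both sample averages approach the same Hellinger integral $H_\alpha(m : m')$ as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def mixture_pdf(x, weights, means, stds):
    # Density of a 1D Gaussian mixture at the points in x.
    x = np.asarray(x)[..., None]
    comps = np.exp(-0.5 * ((x - means) / stds) ** 2) / (stds * np.sqrt(2.0 * np.pi))
    return comps @ weights

def sample_mixture(rng, n, weights, means, stds):
    # Draw n i.i.d. samples: pick a component index, then sample from it.
    idx = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(means[idx], stds[idx])

# Arbitrary (hypothetical) mixtures m and m' for illustration.
w,  mu,  s  = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([0.8, 0.8])
wp, mup, sp = np.array([0.3, 0.7]), np.array([0.0, 1.5]),  np.array([1.0, 0.6])
alpha, n = 0.5, 200_000

# First estimator: x_i ~ m, average of (m'(x_i) / m(x_i))^(1 - alpha).
xm = sample_mixture(rng, n, w, mu, s)
h1 = np.mean((mixture_pdf(xm, wp, mup, sp) / mixture_pdf(xm, w, mu, s)) ** (1.0 - alpha))

# Second estimator: x_i ~ m', average of (m(x_i) / m'(x_i))^alpha.
xmp = sample_mixture(rng, n, wp, mup, sp)
h2 = np.mean((mixture_pdf(xmp, w, mu, s) / mixture_pdf(xmp, wp, mup, sp)) ** alpha)
```

For $\alpha \in (0, 1)$, Hölder's inequality gives $H_\alpha \leq 1$, and the two consistent estimates agree up to Monte Carlo noise, which illustrates the next point: the agreement is only stochastic, not a deterministic bound.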
However, MC estimation requires a large sample and does not guarantee deterministic bounds. The techniques described in [41] work in practice only for very close distributions, and do not apply between mixture models. We will therefore derive combinatorial bounds for $H_\alpha(m : m')$. The structure of this section parallels Section 2, with the necessary reformulations for a clear presentation.