Page 297 in Differential Geometrical Theory of Statistics
Entropy 2016, 18, 442
See Appendix B for a closed-form formula when dealing with exponential family components.
4. Bounding the α-Divergence
The α-divergence [15,32–34] between $m(x) = \sum_{i=1}^{k} w_i p_i(x)$ and $m'(x) = \sum_{i=1}^{k'} w'_i p'_i(x)$ is defined as

$$ D_\alpha(m : m') = \frac{1}{\alpha(1-\alpha)} \left( 1 - \int_{\mathcal{X}} m(x)^{\alpha} m'(x)^{1-\alpha} \, \mathrm{d}x \right), \qquad (29) $$
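As a concrete sanity check, Eq. (29) can be evaluated numerically for one-dimensional Gaussian mixtures; the following sketch (function names, the quadrature grid, and the example mixtures are illustrative choices, not from the paper) approximates the integral by the trapezoidal rule and lets one verify the skew-symmetry $D_\alpha(m : m') = D_{1-\alpha}(m' : m)$ stated below:

```python
import numpy as np

def gm_pdf(x, w, mu, s):
    """Density of the 1D Gaussian mixture sum_i w_i N(mu_i, s_i^2) at points x."""
    x = np.atleast_1d(x)[:, None]
    comps = np.exp(-0.5 * ((x - mu) / s) ** 2) / (np.sqrt(2.0 * np.pi) * np.asarray(s))
    return comps @ np.asarray(w)

def alpha_divergence(alpha, mix, mix2, lo=-30.0, hi=30.0, n=100001):
    """Trapezoidal-rule approximation of Eq. (29); alpha must avoid {0, 1}."""
    x = np.linspace(lo, hi, n)
    f = gm_pdf(x, *mix) ** alpha * gm_pdf(x, *mix2) ** (1.0 - alpha)  # integrand of the Hellinger integral
    hellinger = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x))           # quadrature of the integral term
    return (1.0 - hellinger) / (alpha * (1.0 - alpha))

# Illustrative mixtures: (weights, means, standard deviations)
m  = ([0.5, 0.5], [-1.0, 2.0], [1.0, 0.5])
m2 = ([1.0], [0.0], [1.5])
```

The truncation to $[-30, 30]$ is harmless here because the Gaussian tails decay super-exponentially; for heavier-tailed components the grid would need widening.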
which clearly satisfies $D_\alpha(m : m') = D_{1-\alpha}(m' : m)$. The α-divergence is a family of information divergences parametrized by $\alpha \in \mathbb{R} \setminus \{0, 1\}$. Letting $\alpha \to 1$, we get the KL divergence (see [35] for a proof):

$$ \lim_{\alpha \to 1} D_\alpha(m : m') = \mathrm{KL}(m : m') = \int_{\mathcal{X}} m(x) \log \frac{m(x)}{m'(x)} \, \mathrm{d}x, \qquad (30) $$
and $\alpha \to 0$ gives the reverse KL divergence:

$$ \lim_{\alpha \to 0} D_\alpha(m : m') = \mathrm{KL}(m' : m). $$
Other interesting values [33] include $\alpha = 1/2$ (squared Hellinger distance), $\alpha = 2$ (Pearson chi-square distance), $\alpha = -1$ (Neyman chi-square distance), etc. Notably, the Hellinger distance is a valid distance metric which satisfies non-negativity, symmetry, and the triangle inequality. In general, $D_\alpha(m : m')$ only satisfies non-negativity, so that $D_\alpha(m : m') \geq 0$ for any $m(x)$ and $m'(x)$; it is neither symmetric nor does it satisfy the triangle inequality. Minimization of α-divergences allows one to choose a trade-off between mode fitting and support fitting of the minimizer [36]. The minimizer of α-divergences, which includes the MLE as a special case, has interesting connections with transcendental number theory [37].
Computing $D_\alpha(m : m')$ for given $m(x)$ and $m'(x)$ reduces to evaluating the Hellinger integral [38,39]:

$$ H_\alpha(m : m') = \int_{\mathcal{X}} m(x)^{\alpha} m'(x)^{1-\alpha} \, \mathrm{d}x, \qquad (31) $$
which in general does not have a closed form, since the α-divergence of mixture models is known not to be analytic [6]. Moreover, $H_\alpha(m : m')$ may diverge, making the α-divergence unbounded. Once $H_\alpha(m : m')$ is computed, the Rényi and Tsallis divergences [35], and in general the Sharma–Mittal divergences [40], can easily be obtained. Therefore, the results presented here directly extend to those divergence families.
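For concreteness, under the standard conventions (see [35]), the Rényi and Tsallis divergences follow from the Hellinger integral by a direct rewriting:

```latex
R_\alpha(m : m') = \frac{1}{\alpha - 1} \log H_\alpha(m : m'),
\qquad
T_\alpha(m : m') = \frac{H_\alpha(m : m') - 1}{\alpha - 1},
```

so that $D_\alpha(m : m') = T_\alpha(m : m')/\alpha$, consistent with Eq. (29): indeed $\frac{1}{\alpha}\cdot\frac{H_\alpha - 1}{\alpha - 1} = \frac{1 - H_\alpha}{\alpha(1-\alpha)}$.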
Similar to the case of the KL divergence, the Monte Carlo stochastic estimation of $H_\alpha(m : m')$ can be computed either as

$$ \hat{H}^{n}_{\alpha}(m : m') = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{m'(x_i)}{m(x_i)} \right)^{1-\alpha}, $$

where $x_1, \ldots, x_n \sim m(x)$ are i.i.d. samples, or as

$$ \hat{H}^{n}_{\alpha}(m : m') = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{m(x_i)}{m'(x_i)} \right)^{\alpha}, $$

where $x_1, \ldots, x_n \sim m'(x)$ are i.i.d. samples. In either case, the estimator is consistent, so that $\lim_{n \to \infty} \hat{H}^{n}_{\alpha}(m : m') = H_\alpha(m : m')$.
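The first estimator (sampling from $m$) can be sketched as follows for 1D Gaussian mixture components; the helper names and example mixtures are illustrative, not from the paper. Note that when $m = m'$ the density ratio is identically one, so the estimate is exactly $H_\alpha = 1$ regardless of the sample:

```python
import numpy as np

def gm_pdf(x, w, mu, s):
    """Density of a 1D Gaussian mixture at points x."""
    x = np.atleast_1d(x)[:, None]
    comps = np.exp(-0.5 * ((x - mu) / s) ** 2) / (np.sqrt(2.0 * np.pi) * np.asarray(s))
    return comps @ np.asarray(w)

def gm_sample(n, w, mu, s, rng):
    """Draw n i.i.d. samples: pick a component by weight, then sample it."""
    idx = rng.choice(len(w), size=n, p=w)
    return rng.normal(np.asarray(mu)[idx], np.asarray(s)[idx])

def hellinger_mc(alpha, n, mix, mix2, seed=0):
    """MC estimate of H_alpha(m : m'): average (m'(x_i)/m(x_i))^(1-alpha), x_i ~ m."""
    rng = np.random.default_rng(seed)
    x = gm_sample(n, *mix, rng)
    return np.mean((gm_pdf(x, *mix2) / gm_pdf(x, *mix)) ** (1.0 - alpha))

# Illustrative mixtures: (weights, means, standard deviations)
m  = ([0.6, 0.4], [0.0, 3.0], [1.0, 2.0])
m2 = ([1.0], [0.0], [1.5])
```

The second estimator is obtained by swapping the roles of `mix` and `mix2` and replacing the exponent `1 - alpha` with `alpha`; as stated above, both are consistent but give only stochastic, not deterministic, guarantees.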
However, MC estimation requires a large sample and does not guarantee deterministic bounds. The techniques described in [41] work in practice only for very close distributions and do not apply to mixture models. We will therefore derive combinatorial bounds for $H_\alpha(m : m')$. The structure of this section parallels that of Section 2, with the necessary reformulations for a clear presentation.
Differential Geometrical Theory of Statistics
- Authors: Frédéric Barbaresco, Frank Nielsen
- Editor: MDPI
- Location: Basel
- Date: 2017
- Language: English
- License: CC BY-NC-ND 4.0
- ISBN: 978-3-03842-425-3
- Size: 17.0 x 24.4 cm
- Pages: 476
- Keywords: Entropy, Coding Theory, Maximum entropy, Information geometry, Computational Information Geometry, Hessian Geometry, Divergence Geometry, Information topology, Cohomology, Shape Space, Statistical physics, Thermodynamics
- Categories: Natural Sciences, Physics