Differential Geometrical Theory of Statistics, p. 308
Entropy 2016, 18, 442
with $A_k = \int_{x_0}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x-k)^2}{2}\right) \mathrm{d}x$. When $k \to \infty$, we have $A_k \to 1$. Consider $k_0 \in \mathbb{N}$ such that $A_{k_0} > 0.9$. Then the radius of convergence $r$ is such that:
$$\frac{1}{r} \ge \lim_{k \to \infty} \left( \frac{0.9}{k\, k!} \exp\left(\frac{k^2}{8}\right) \right)^{\frac{1}{k}} = \infty.$$
Thus the radius of convergence is $r = 0$, and therefore the KL divergence is not an analytic function of the parameter $w$. The KL of mixtures is an example of a non-analytic smooth function. (Notice that the absolute value is not analytic at $0$.)
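These two facts can be checked numerically. The sketch below takes $x_0 = 0$ for illustration (the actual value of $x_0$ is fixed earlier in the paper; only the tail behavior matters here), evaluates $A_k$ via the error function, and computes the $k$-th root term in log-space to avoid overflow:

```python
import math

def A(k, x0=0.0):
    # A_k = ∫_{x0}^∞ (1/√(2π)) exp(-(x-k)²/2) dx = Φ(k - x0),
    # with Φ the standard normal CDF, expressed via erf.
    # x0 = 0 is an illustrative choice, not the paper's value.
    return 0.5 * (1.0 + math.erf((k - x0) / math.sqrt(2.0)))

def root_term(k):
    # k-th root of (0.9 / (k·k!)) · exp(k²/8), computed in log-space;
    # it lower-bounds 1/r and diverges as k grows.
    log_val = math.log(0.9) - math.log(k) - math.lgamma(k + 1) + k * k / 8.0
    return math.exp(log_val / k)

# A_k tends to 1, so some k0 with A_{k0} > 0.9 exists (k0 = 2 already works).
print(A(2))   # ≈ 0.977
# The k-th root term grows without bound, forcing 1/r = ∞, i.e., r = 0.
print(root_term(10), root_term(100), root_term(200))
```

The `exp(k²/8)` factor dominates the factorial in the $k$-th root: $(k!)^{1/k} \sim k/e$ while $\exp(k^2/8)^{1/k} = \exp(k/8)$.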
Appendix B. Closed-Form Formula for the Kullback–Leibler Divergence between Scaled and Truncated Exponential Families
When computing approximation bounds for the KL divergence between two mixtures $m(x)$ and $m'(x)$, we end up with the task of computing $\int_D w_a p_a(x) \log \frac{w_b' p_b'(x)}{w_c' p_c'(x)}\, \mathrm{d}x$, where $D \subseteq \mathcal{X}$ is a subset of the full support $\mathcal{X}$. We report a generic formula for computing these integrals when the (scaled and truncated) mixture components belong to the same exponential family [17]. An exponential family has canonical log-density written as $l(x;\theta) = \log p(x;\theta) = \theta^\top t(x) - F(\theta) + k(x)$, where $t(x)$ denotes the sufficient statistics, $F(\theta)$ the log-normalizer (also called the cumulant function or partition function), and $k(x)$ an auxiliary carrier term.
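As a concrete instance (an illustrative choice, not one tied to a particular mixture in the text), the exponential distribution $p(x;\lambda) = \lambda e^{-\lambda x}$ fits this canonical form with $\theta = -\lambda$, $t(x) = x$, $F(\theta) = -\log(-\theta)$, and $k(x) = 0$. A minimal sketch checking the identity $l(x;\theta) = \log p(x;\lambda)$:

```python
import math

# Exponential distribution Exp(λ) in canonical exponential-family form:
# θ = -λ, t(x) = x, F(θ) = -log(-θ), k(x) = 0.
def t(x): return x
def F(theta): return -math.log(-theta)
def carrier(x): return 0.0   # the auxiliary carrier term k(x)

def log_density(x, theta):
    # l(x;θ) = θ·t(x) - F(θ) + k(x)
    return theta * t(x) - F(theta) + carrier(x)

lam = 2.0
theta = -lam
x = 1.3
# Agrees with log(λ e^{-λx}) up to floating-point precision.
assert abs(log_density(x, theta) - math.log(lam * math.exp(-lam * x))) < 1e-12
```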
Let $\mathrm{KL}(w_1 p_1 : w_2 p_2 : w_3 p_3) = \int_{\mathcal{X}} w_1 p_1(x) \log \frac{w_2 p_2(x)}{w_3 p_3(x)}\, \mathrm{d}x = H^\times(w_1 p_1 : w_3 p_3) - H^\times(w_1 p_1 : w_2 p_2)$. Since it is a difference of two cross-entropies, we get for three distributions belonging to the same exponential family [26] the following formula:
$$\mathrm{KL}(w_1 p_1 : w_2 p_2 : w_3 p_3) = w_1 \log \frac{w_2}{w_3} + w_1 \left( F(\theta_3) - F(\theta_2) - (\theta_3 - \theta_2)^\top \nabla F(\theta_1) \right).$$
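This closed form can be verified against numerical quadrature. The sketch below (an illustrative check, using three exponential distributions $\mathrm{Exp}(\lambda_i)$ with $\theta_i = -\lambda_i$ and arbitrary weights) compares the formula with a Simpson-rule evaluation of the integral:

```python
import math

# Exponential family Exp(λ): θ = -λ, F(θ) = -log(-θ), ∇F(θ) = -1/θ.
def F(th): return -math.log(-th)
def gradF(th): return -1.0 / th
def pdf(x, th): return math.exp(th * x - F(th))   # λ e^{-λx}

w = [0.4, 0.6, 0.5]
th = [-1.0, -2.0, -3.0]

# Closed form: w1 log(w2/w3) + w1 (F(θ3) - F(θ2) - (θ3 - θ2) ∇F(θ1)).
closed = w[0] * math.log(w[1] / w[2]) + w[0] * (
    F(th[2]) - F(th[1]) - (th[2] - th[1]) * gradF(th[0]))

def simpson(f, a, b, n=20000):
    # Composite Simpson rule (n even).
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

# ∫ w1 p1 log(w2 p2 / (w3 p3)) dx over [0, 50] (the tail beyond is negligible).
integrand = lambda x: w[0] * pdf(x, th[0]) * math.log(
    w[1] * pdf(x, th[1]) / (w[2] * pdf(x, th[2])))
numeric = simpson(integrand, 0.0, 50.0)
print(closed, numeric)   # the two values agree
```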
Furthermore, when the support is restricted, say to a support range $D \subseteq \mathcal{X}$, let $m_D(\theta) = \int_D p(x;\theta)\, \mathrm{d}x$ denote the mass and $\tilde{p}(x;\theta) = \frac{p(x;\theta)}{m_D(\theta)}$ the normalized distribution. Then we have:
$$\int_D w_1 p_1(x) \log \frac{w_2 p_2(x)}{w_3 p_3(x)}\, \mathrm{d}x = m_D(\theta_1) \left( \mathrm{KL}(w_1 \tilde{p}_1 : w_2 \tilde{p}_2 : w_3 \tilde{p}_3) - w_1 \log \frac{m_D(\theta_3)}{m_D(\theta_2)} \right).$$
When $F_D(\theta) = F(\theta) + \log m_D(\theta)$ is strictly convex and differentiable, then $\tilde{p}(x;\theta)$ belongs to an exponential family (with log-normalizer $F_D$) and the closed-form formula follows straightforwardly. Otherwise, we still get a closed form but need more derivations. For univariate distributions, we write $D = (a, b)$ and $m_D(\theta) = \int_a^b p(x;\theta)\, \mathrm{d}x = P_\theta(b) - P_\theta(a)$, where $P_\theta(a) = \int_{-\infty}^{a} p(x;\theta)\, \mathrm{d}x$ denotes the cumulative distribution function.
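The restricted-support identity can be checked numerically. The sketch below (an illustrative check, with exponential distributions and an arbitrary interval $D = (a, b)$) computes the mass $m_D$ via the CDF difference $P_\theta(b) - P_\theta(a)$ and compares both sides:

```python
import math

# Check of the restricted-support identity: with p̃_i = p_i / m_D(θ_i),
#   ∫_D w1 p1 log(w2 p2 / (w3 p3)) dx
#     = m_D(θ1) ( KL(w1 p̃1 : w2 p̃2 : w3 p̃3) - w1 log(m_D(θ3)/m_D(θ2)) ),
# for exponential distributions Exp(λ_i), θ_i = -λ_i.
pdf = lambda x, lam: lam * math.exp(-lam * x)
cdf = lambda x, lam: 1.0 - math.exp(-lam * x)
mD = lambda lam, a, b: cdf(b, lam) - cdf(a, lam)   # mass P_θ(b) - P_θ(a)

def simpson(f, a, b, n=20000):
    # Composite Simpson rule (n even).
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

w = [0.4, 0.6, 0.5]
lam = [1.0, 2.0, 3.0]
a, b = 0.5, 2.0
m = [mD(l, a, b) for l in lam]

lhs = simpson(lambda x: w[0] * pdf(x, lam[0]) * math.log(
    w[1] * pdf(x, lam[1]) / (w[2] * pdf(x, lam[2]))), a, b)

# KL of the normalized (truncated) densities, integrated over D:
kl_tilde = simpson(lambda x: w[0] * (pdf(x, lam[0]) / m[0]) * math.log(
    (w[1] * pdf(x, lam[1]) / m[1]) / (w[2] * pdf(x, lam[2]) / m[2])), a, b)

rhs = m[0] * (kl_tilde - w[0] * math.log(m[2] / m[1]))
print(lhs, rhs)   # the two sides agree
```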
The usual formula for the truncated and scaled Kullback–Leibler divergence is:
$$\mathrm{KL}_D(w\, p(x;\theta) : w'\, p(x;\theta')) = w\, m_D(\theta) \left( \log \frac{w}{w'} + B_F(\theta' : \theta) \right) + w\, (\theta - \theta')^\top \nabla m_D(\theta), \tag{B1}$$
where $B_F(\theta' : \theta)$ is a Bregman divergence [5]:
$$B_F(\theta' : \theta) = F(\theta') - F(\theta) - (\theta' - \theta)^\top \nabla F(\theta).$$
This formula extends the classic formula [5] for full regular exponential families (obtained by setting $w = w' = 1$ and $m_D(\theta) = 1$ with $\nabla m_D(\theta) = 0$).
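Formula (B1) can likewise be verified by quadrature. In the sketch below (an illustrative check with exponential distributions, where $m_D$ and $\nabla m_D$ are available in closed form), the gradient term enters as $w(\theta - \theta')^\top \nabla m_D(\theta)$, the sign obtained when expanding the integral with $\int_D t(x) p(x;\theta)\, \mathrm{d}x = \nabla m_D(\theta) + m_D(\theta) \nabla F(\theta)$:

```python
import math

# Check of Formula (B1) for scaled, truncated exponential distributions
# Exp(λ), θ = -λ, F(θ) = -log(-θ), on D = (a, b):
#   m_D(θ) = e^{θa} - e^{θb},   ∇m_D(θ) = a e^{θa} - b e^{θb}.
def F(th): return -math.log(-th)
def gradF(th): return -1.0 / th
def pdf(x, th): return math.exp(th * x - F(th))
def mD(th, a, b): return math.exp(th * a) - math.exp(th * b)
def grad_mD(th, a, b): return a * math.exp(th * a) - b * math.exp(th * b)

def bregman(th2, th1):
    # B_F(θ' : θ) = F(θ') - F(θ) - (θ' - θ) ∇F(θ)
    return F(th2) - F(th1) - (th2 - th1) * gradF(th1)

def kl_trunc(w, th, w2, th2, a, b):
    # Formula (B1); the gradient term is w (θ - θ') ∇m_D(θ).
    return w * mD(th, a, b) * (math.log(w / w2) + bregman(th2, th)) \
        + w * (th - th2) * grad_mD(th, a, b)

def simpson(f, a, b, n=20000):
    # Composite Simpson rule (n even).
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

w, wp, th, thp, a, b = 0.4, 0.6, -1.0, -2.0, 0.5, 2.0
closed = kl_trunc(w, th, wp, thp, a, b)
numeric = simpson(lambda x: w * pdf(x, th) * math.log(
    w * pdf(x, th) / (wp * pdf(x, thp))), a, b)
print(closed, numeric)   # the two values agree
```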
Similar formulæ are available for the cross-entropy and entropy of exponential families [26].
Appendix C. On the Approximation of KL between Smooth Mixtures by a Bregman Divergence [5]
Clearly, since Bregman divergences are always finite while KL divergences may diverge, we need extra conditions to assert that the KL between mixtures can be approximated by Bregman divergences.
Differential Geometrical Theory of Statistics
- Authors: Frédéric Barbaresco, Frank Nielsen
- Editor: MDPI
- Location: Basel
- Date: 2017
- Language: English
- License: CC BY-NC-ND 4.0
- ISBN: 978-3-03842-425-3
- Size: 17.0 x 24.4 cm
- Pages: 476
- Keywords: Entropy, Coding Theory, Maximum entropy, Information geometry, Computational Information Geometry, Hessian Geometry, Divergence Geometry, Information topology, Cohomology, Shape Space, Statistical physics, Thermodynamics
- Categories: Natural Sciences, Physics