Differential Geometrical Theory of Statistics, p. 308
Entropy 2016, 18, 442
with $A_k = \int_{x_0}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x-k)^2}{2}\right) \mathrm{d}x$. When $k \to \infty$, we have $A_k \to 1$. Consider $k_0 \in \mathbb{N}$ such that $A_{k_0} > 0.9$. Then the radius of convergence $r$ is such that:
$$\frac{1}{r} \ge \lim_{k \to \infty} \left( \frac{0.9}{k\, k!} \exp\left(\frac{k^2}{8}\right) \right)^{\frac{1}{k}} = \infty.$$
Thus the radius of convergence is $r = 0$, and therefore the KL divergence is not an analytic function of the parameter $w$. The KL of mixtures is an example of a non-analytic smooth function. (Notice that the absolute value is not analytic at $0$.)
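These two facts can be checked numerically. The sketch below takes $x_0 = 0$ for illustration (the actual value of $x_0$ is fixed earlier in the paper; only the tail behavior matters here), evaluates $A_k$ via the error function, and computes the $k$-th root term in log-space to avoid overflow:

```python
import math

def A(k, x0=0.0):
    # A_k = ∫_{x0}^∞ (1/√(2π)) exp(-(x-k)²/2) dx = Φ(k - x0),
    # with Φ the standard normal CDF, expressed via erf.
    # x0 = 0 is an illustrative choice, not the paper's value.
    return 0.5 * (1.0 + math.erf((k - x0) / math.sqrt(2.0)))

def root_term(k):
    # k-th root of (0.9 / (k·k!)) · exp(k²/8), computed in log-space;
    # it lower-bounds 1/r and diverges as k grows.
    log_val = math.log(0.9) - math.log(k) - math.lgamma(k + 1) + k * k / 8.0
    return math.exp(log_val / k)

# A_k tends to 1, so some k0 with A_{k0} > 0.9 exists (k0 = 2 already works).
print(A(2))   # ≈ 0.977
# The k-th root term grows without bound, forcing 1/r = ∞, i.e., r = 0.
print(root_term(10), root_term(100), root_term(200))
```

The `exp(k²/8)` factor dominates the factorial in the $k$-th root: $(k!)^{1/k} \sim k/e$ while $\exp(k^2/8)^{1/k} = \exp(k/8)$.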
Appendix B. Closed-Form Formula for the Kullback–Leibler Divergence between Scaled and Truncated Exponential Families
When computing approximation bounds for the KL divergence between two mixtures $m(x)$ and $m'(x)$, we end up with the task of computing $\int_D w_a p_a(x) \log \frac{w_b' p_b'(x)}{w_c' p_c'(x)}\, \mathrm{d}x$, where $D \subseteq \mathcal{X}$ is a subset of the full support $\mathcal{X}$. We report a generic formula for computing these integrals when the (scaled and truncated) mixture components belong to the same exponential family [17]. An exponential family has canonical log-density written as $l(x;\theta) = \log p(x;\theta) = \theta^\top t(x) - F(\theta) + k(x)$, where $t(x)$ denotes the sufficient statistics, $F(\theta)$ the log-normalizer (also called the cumulant function or partition function), and $k(x)$ an auxiliary carrier term.
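As a concrete instance (an illustrative choice, not one tied to a particular mixture in the text), the exponential distribution $p(x;\lambda) = \lambda e^{-\lambda x}$ fits this canonical form with $\theta = -\lambda$, $t(x) = x$, $F(\theta) = -\log(-\theta)$, and $k(x) = 0$. A minimal sketch checking the identity $l(x;\theta) = \log p(x;\lambda)$:

```python
import math

# Exponential distribution Exp(λ) in canonical exponential-family form:
# θ = -λ, t(x) = x, F(θ) = -log(-θ), k(x) = 0.
def t(x): return x
def F(theta): return -math.log(-theta)
def carrier(x): return 0.0   # the auxiliary carrier term k(x)

def log_density(x, theta):
    # l(x;θ) = θ·t(x) - F(θ) + k(x)
    return theta * t(x) - F(theta) + carrier(x)

lam = 2.0
theta = -lam
x = 1.3
# Agrees with log(λ e^{-λx}) up to floating-point precision.
assert abs(log_density(x, theta) - math.log(lam * math.exp(-lam * x))) < 1e-12
```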
Let $\mathrm{KL}(w_1 p_1 : w_2 p_2 : w_3 p_3) = \int_{\mathcal{X}} w_1 p_1(x) \log \frac{w_2 p_2(x)}{w_3 p_3(x)}\, \mathrm{d}x = H^\times(w_1 p_1 : w_3 p_3) - H^\times(w_1 p_1 : w_2 p_2)$. Since it is a difference of two cross-entropies, we get for three distributions belonging to the same exponential family [26] the following formula:
$$\mathrm{KL}(w_1 p_1 : w_2 p_2 : w_3 p_3) = w_1 \log \frac{w_2}{w_3} + w_1 \left( F(\theta_3) - F(\theta_2) - (\theta_3 - \theta_2)^\top \nabla F(\theta_1) \right).$$
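This closed form can be verified against numerical quadrature. The sketch below (an illustrative check, using three exponential distributions $\mathrm{Exp}(\lambda_i)$ with $\theta_i = -\lambda_i$ and arbitrary weights) compares the formula with a Simpson-rule evaluation of the integral:

```python
import math

# Exponential family Exp(λ): θ = -λ, F(θ) = -log(-θ), ∇F(θ) = -1/θ.
def F(th): return -math.log(-th)
def gradF(th): return -1.0 / th
def pdf(x, th): return math.exp(th * x - F(th))   # λ e^{-λx}

w = [0.4, 0.6, 0.5]
th = [-1.0, -2.0, -3.0]

# Closed form: w1 log(w2/w3) + w1 (F(θ3) - F(θ2) - (θ3 - θ2) ∇F(θ1)).
closed = w[0] * math.log(w[1] / w[2]) + w[0] * (
    F(th[2]) - F(th[1]) - (th[2] - th[1]) * gradF(th[0]))

def simpson(f, a, b, n=20000):
    # Composite Simpson rule (n even).
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

# ∫ w1 p1 log(w2 p2 / (w3 p3)) dx over [0, 50] (the tail beyond is negligible).
integrand = lambda x: w[0] * pdf(x, th[0]) * math.log(
    w[1] * pdf(x, th[1]) / (w[2] * pdf(x, th[2])))
numeric = simpson(integrand, 0.0, 50.0)
print(closed, numeric)   # the two values agree
```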
Furthermore, when the support is restricted, say to a support range $D \subseteq \mathcal{X}$, let $m_D(\theta) = \int_D p(x;\theta)\, \mathrm{d}x$ denote the mass and $\tilde{p}(x;\theta) = \frac{p(x;\theta)}{m_D(\theta)}$ the normalized distribution. Then we have:
$$\int_D w_1 p_1(x) \log \frac{w_2 p_2(x)}{w_3 p_3(x)}\, \mathrm{d}x = m_D(\theta_1) \left( \mathrm{KL}(w_1 \tilde{p}_1 : w_2 \tilde{p}_2 : w_3 \tilde{p}_3) - w_1 \log \frac{m_D(\theta_3)}{m_D(\theta_2)} \right).$$
When $F_D(\theta) = F(\theta) + \log m_D(\theta)$ is strictly convex and differentiable, then $\tilde{p}(x;\theta)$ belongs to an exponential family (with log-normalizer $F_D$) and the closed-form formula follows straightforwardly. Otherwise, we still get a closed form but need more derivations. For univariate distributions, we write $D = (a, b)$ and $m_D(\theta) = \int_a^b p(x;\theta)\, \mathrm{d}x = P_\theta(b) - P_\theta(a)$, where $P_\theta(a) = \int_{-\infty}^{a} p(x;\theta)\, \mathrm{d}x$ denotes the cumulative distribution function.
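The restricted-support identity can be checked numerically. The sketch below (an illustrative check, with exponential distributions and an arbitrary interval $D = (a, b)$) computes the mass $m_D$ via the CDF difference $P_\theta(b) - P_\theta(a)$ and compares both sides:

```python
import math

# Check of the restricted-support identity: with p̃_i = p_i / m_D(θ_i),
#   ∫_D w1 p1 log(w2 p2 / (w3 p3)) dx
#     = m_D(θ1) ( KL(w1 p̃1 : w2 p̃2 : w3 p̃3) - w1 log(m_D(θ3)/m_D(θ2)) ),
# for exponential distributions Exp(λ_i), θ_i = -λ_i.
pdf = lambda x, lam: lam * math.exp(-lam * x)
cdf = lambda x, lam: 1.0 - math.exp(-lam * x)
mD = lambda lam, a, b: cdf(b, lam) - cdf(a, lam)   # mass P_θ(b) - P_θ(a)

def simpson(f, a, b, n=20000):
    # Composite Simpson rule (n even).
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

w = [0.4, 0.6, 0.5]
lam = [1.0, 2.0, 3.0]
a, b = 0.5, 2.0
m = [mD(l, a, b) for l in lam]

lhs = simpson(lambda x: w[0] * pdf(x, lam[0]) * math.log(
    w[1] * pdf(x, lam[1]) / (w[2] * pdf(x, lam[2]))), a, b)

# KL of the normalized (truncated) densities, integrated over D:
kl_tilde = simpson(lambda x: w[0] * (pdf(x, lam[0]) / m[0]) * math.log(
    (w[1] * pdf(x, lam[1]) / m[1]) / (w[2] * pdf(x, lam[2]) / m[2])), a, b)

rhs = m[0] * (kl_tilde - w[0] * math.log(m[2] / m[1]))
print(lhs, rhs)   # the two sides agree
```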
The usual formula for the truncated and scaled Kullback–Leibler divergence is:
$$\mathrm{KL}_D(w\, p(x;\theta) : w'\, p(x;\theta')) = w\, m_D(\theta) \left( \log \frac{w}{w'} + B_F(\theta' : \theta) \right) + w\, (\theta - \theta')^\top \nabla m_D(\theta), \tag{B1}$$
where $B_F(\theta' : \theta)$ is a Bregman divergence [5]:
$$B_F(\theta' : \theta) = F(\theta') - F(\theta) - (\theta' - \theta)^\top \nabla F(\theta).$$
This formula extends the classic formula [5] for full regular exponential families (obtained by setting $w = w' = 1$ and $m_D(\theta) = 1$ with $\nabla m_D(\theta) = 0$).
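Formula (B1) can likewise be verified by quadrature. In the sketch below (an illustrative check with exponential distributions, where $m_D$ and $\nabla m_D$ are available in closed form), the gradient term enters as $w(\theta - \theta')^\top \nabla m_D(\theta)$, the sign obtained when expanding the integral with $\int_D t(x) p(x;\theta)\, \mathrm{d}x = \nabla m_D(\theta) + m_D(\theta) \nabla F(\theta)$:

```python
import math

# Check of Formula (B1) for scaled, truncated exponential distributions
# Exp(λ), θ = -λ, F(θ) = -log(-θ), on D = (a, b):
#   m_D(θ) = e^{θa} - e^{θb},   ∇m_D(θ) = a e^{θa} - b e^{θb}.
def F(th): return -math.log(-th)
def gradF(th): return -1.0 / th
def pdf(x, th): return math.exp(th * x - F(th))
def mD(th, a, b): return math.exp(th * a) - math.exp(th * b)
def grad_mD(th, a, b): return a * math.exp(th * a) - b * math.exp(th * b)

def bregman(th2, th1):
    # B_F(θ' : θ) = F(θ') - F(θ) - (θ' - θ) ∇F(θ)
    return F(th2) - F(th1) - (th2 - th1) * gradF(th1)

def kl_trunc(w, th, w2, th2, a, b):
    # Formula (B1); the gradient term is w (θ - θ') ∇m_D(θ).
    return w * mD(th, a, b) * (math.log(w / w2) + bregman(th2, th)) \
        + w * (th - th2) * grad_mD(th, a, b)

def simpson(f, a, b, n=20000):
    # Composite Simpson rule (n even).
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

w, wp, th, thp, a, b = 0.4, 0.6, -1.0, -2.0, 0.5, 2.0
closed = kl_trunc(w, th, wp, thp, a, b)
numeric = simpson(lambda x: w * pdf(x, th) * math.log(
    w * pdf(x, th) / (wp * pdf(x, thp))), a, b)
print(closed, numeric)   # the two values agree
```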
Similar formulæ are available for the cross-entropy and entropy of exponential families [26].
Appendix C. On the Approximation of KL between Smooth Mixtures by a Bregman Divergence [5]
Clearly, since Bregman divergences are always finite while KL divergences may diverge, we need extra conditions to assert that the KL between mixtures can be approximated by Bregman divergences.
Differential Geometrical Theory of Statistics
- Authors: Frédéric Barbaresco, Frank Nielsen
- Editor: MDPI
- Location: Basel
- Date: 2017
- Language: English
- License: CC BY-NC-ND 4.0
- ISBN: 978-3-03842-425-3
- Size: 17.0 x 24.4 cm
- Pages: 476
- Keywords: Entropy, Coding Theory, Maximum entropy, Information geometry, Computational Information Geometry, Hessian Geometry, Divergence Geometry, Information topology, Cohomology, Shape Space, Statistical physics, Thermodynamics
- Categories: Natural Sciences, Physics