Page - 293 - in Differential Geometrical Theory of Statistics
Entropy 2016, 18, 442
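are given by $\{\rho_{i,\delta}(a_r), \rho_{i,\delta}(a_{r+1})\}$. In any case, let $\rho^{\min}_{i,\delta}(r)$ and $\rho^{\max}_{i,\delta}(r)$ represent the resulting lower and upper bounds of $\rho_{i,\delta}(x)$ in $I_r$. Then $t_\delta$ is bounded in the range $I_r$ by: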
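$$0 < \log\left(1 + \sum_{i \neq \delta} \rho^{\min}_{i,\delta}(r)\right) \leq t_\delta \leq \log\left(1 + \sum_{i \neq \delta} \rho^{\max}_{i,\delta}(r)\right) \leq \log k.$$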
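The slab bounds on $t_\delta$ can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation. It assumes that $\rho_{i,\delta}(x) = w_i p(x;\theta_i) / (w_\delta p(x;\theta_\delta))$ and $t_\delta(x) = \log(1 + \sum_{i \neq \delta} \rho_{i,\delta}(x))$, since those definitions precede this page. It also assumes Gaussian components sharing one standard deviation, so that each ratio is monotonic in $x$ and attains its extrema at the slab endpoints. All function names and numeric values are hypothetical.

```python
import math

def log_gauss(x, mu, sigma):
    """Log-density of the univariate Gaussian N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

def rho(x, i, d, w, mu, sigma):
    """Assumed ratio rho_{i,d}(x) = w_i p_i(x) / (w_d p_d(x))."""
    return math.exp(math.log(w[i]) + log_gauss(x, mu[i], sigma)
                    - math.log(w[d]) - log_gauss(x, mu[d], sigma))

def t_bounds(a, b, d, w, mu, sigma):
    """Lower and upper bounds of t_d on the slab [a, b].

    With a shared sigma, log rho is linear in x, so the minimum and maximum
    of each ratio over the slab are reached at the endpoints a and b.
    """
    lo = hi = 0.0
    for i in range(len(w)):
        if i == d:
            continue
        ends = (rho(a, i, d, w, mu, sigma), rho(b, i, d, w, mu, sigma))
        lo += min(ends)
        hi += max(ends)
    return math.log1p(lo), math.log1p(hi)

def t_exact(x, d, w, mu, sigma):
    """Exact value of t_d(x), used only to check the bounds."""
    return math.log1p(sum(rho(x, i, d, w, mu, sigma)
                          for i in range(len(w)) if i != d))

# Illustrative mixture; component d = 1 dominates on the slab [a, b].
w, mu, sigma = [0.3, 0.4, 0.3], [-2.0, 0.0, 3.0], 1.0
a, b, d = -0.5, 1.0, 1
t_lo, t_hi = t_bounds(a, b, d, w, mu, sigma)
print(f"{t_lo:.4f} <= t_delta <= {t_hi:.4f} <= log k = {math.log(len(w)):.4f}")
```

The final inequality $\leq \log k$ relies on component $\delta$ dominating the slab, so that every ratio is at most 1. With unequal variances the ratios need not be monotonic, and interior extrema would have to be considered as well.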
In practice, we always get better bounds using the shape-dependent technique at the expense of computing overall $O(k^2)$ intersection points of the pairwise densities. We call those bounds CEALB and CEAUB for Combinatorial Envelope Adaptive Lower Bound and Combinatorial Envelope Adaptive Upper Bound.
Let us illustrate one scenario where this adaptive technique yields very good approximations. Consider a GMM with all variances $\sigma^2$ tending to zero (a mixture of $k$ Diracs). Then in a combinatorial slab $I_r$, we have $\rho^{\max}_{i,\delta}(r) \to 0$ for all $i \neq \delta$, and therefore we get tight bounds.
As a related technique, we could also upper bound $\int_{a_r}^{a_{r+1}} \log m(x)\,\mathrm{d}x$ by $(a_{r+1}-a_r)\log m(a_r,a_{r+1})$, where $m(x,x')$ denotes the maximal value of the mixture density in the range $(x,x')$. This maximal value is either found at the slab extremities, or is a mode of the GMM. This in turn requires finding the modes of a GMM [29,30], for which no analytical solution is known in general.
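The following minimal sketch illustrates this alternative upper bound on a single slab. Because no closed form for the modes is available, it approximates the maximal density by a grid search that includes both slab extremities. This is an assumption made for illustration: it is not a mode-finding algorithm from [29,30], and a grid can slightly underestimate the true maximum. The function names and the mixture parameters are hypothetical.

```python
import math

def mixture_pdf(x, w, mu, sigma):
    """Density m(x) of a univariate Gaussian mixture."""
    return sum(wi * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
               for wi, m, s in zip(w, mu, sigma))

def slab_upper_bound(a, b, w, mu, sigma, n=2000):
    """(b - a) * log(max of m on [a, b]), with the max approximated on a grid."""
    grid = [a + (b - a) * j / n for j in range(n + 1)]  # includes a and b
    m_max = max(mixture_pdf(x, w, mu, sigma) for x in grid)
    return (b - a) * math.log(m_max)

def slab_integral(a, b, w, mu, sigma, n=2000):
    """Midpoint-rule estimate of the integral of log m(x) over [a, b]."""
    h = (b - a) / n
    return h * sum(math.log(mixture_pdf(a + (j + 0.5) * h, w, mu, sigma))
                   for j in range(n))

# Illustrative two-component mixture and slab.
w, mu, sigma = [0.5, 0.5], [0.0, 1.5], [0.5, 1.0]
a, b = -1.0, 2.0
integral = slab_integral(a, b, w, mu, sigma)
upper = slab_upper_bound(a, b, w, mu, sigma)
print(f"integral of log m = {integral:.4f} <= bound = {upper:.4f}")
```

The bound is coarse when the density varies a lot inside the slab, since a single constant replaces $\log m(x)$ over the whole interval.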
2.2. Another Derivation Using the Arithmetic-Geometric Mean Inequality
Let us start by considering the inequality of arithmetic and geometric weighted means
(AGI, Arithmetic-Geometric Inequality) applied to the mixture component distributions:
$$m(x) = \sum_{i=1}^{k} w_i\, p(x;\theta_i) \geq \prod_{i=1}^{k} p(x;\theta_i)^{w_i},$$
with equality if and only if $\theta_1 = \ldots = \theta_k$.
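A quick numerical check of this inequality, as a minimal sketch for Gaussian components. The weights and parameters are illustrative assumptions, and the function names are hypothetical.

```python
import math

def gauss_pdf(x, mu, sigma):
    """Density of the univariate Gaussian N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def arithmetic_mean(x, weights, params):
    """Mixture density m(x) = sum_i w_i p(x; theta_i)."""
    return sum(w * gauss_pdf(x, mu, s) for w, (mu, s) in zip(weights, params))

def geometric_mean(x, weights, params):
    """Weighted geometric mean prod_i p(x; theta_i)^{w_i}, via the log domain."""
    return math.exp(sum(w * math.log(gauss_pdf(x, mu, s))
                        for w, (mu, s) in zip(weights, params)))

weights = [0.2, 0.5, 0.3]                       # illustrative, sums to 1
params = [(-1.0, 0.5), (0.0, 1.0), (2.0, 1.5)]  # illustrative (mu, sigma) pairs

for x in (-2.0, 0.0, 1.0, 3.0):
    am = arithmetic_mean(x, weights, params)
    gm = geometric_mean(x, weights, params)
    print(f"x={x:+.1f}  m(x)={am:.6f}  geometric mean={gm:.6f}  gap={am - gm:.6f}")
```

When every component shares the same parameters, the two means coincide and the gap vanishes, which matches the equality condition.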
To get a tractable formula with a positive remainder of the log-sum term $\log m(x)$, we need to have the log argument greater than or equal to 1, and thus we shall write the positive remainder:
$$R(x) = \log\left(\frac{m(x)}{\prod_{i=1}^{k} p(x;\theta_i)^{w_i}}\right) \geq 0.$$
Therefore, we can decompose the log-sum into a tractable part and a remainder as:
$$\log m(x) = \sum_{i=1}^{k} w_i \log p(x;\theta_i) + \log\left(\frac{m(x)}{\prod_{i=1}^{k} p(x;\theta_i)^{w_i}}\right). \quad (15)$$
For exponential families, the first term can be integrated accurately. For the second term,
we notice that $\prod_{i=1}^{k} p(x;\theta_i)^{w_i}$ is a distribution (unnormalized, in general) in the same exponential family. We denote $p(x;\theta_0) = \prod_{i=1}^{k} p(x;\theta_i)^{w_i}$. Then
$$R(x) = \log\left(\sum_{i=1}^{k} w_i \frac{p(x;\theta_i)}{p(x;\theta_0)}\right).$$
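For univariate Gaussians, the reference term $p(x;\theta_0)$ can be made concrete. The sketch below assumes the usual natural parameters $(\mu/\sigma^2, -1/(2\sigma^2))$ of a Gaussian, so that averaging them with the weights $w_i$ gives a precision-weighted mean $\mu_0$ and variance $\sigma_0^2$. It then checks numerically that the weighted geometric mean equals a constant $c \leq 1$ times the density of $N(\mu_0, \sigma_0^2)$, and that the remainder $R(x)$ is non-negative. Function names and parameter values are hypothetical.

```python
import math

def log_gauss(x, mu, sigma):
    """Log-density of the univariate Gaussian N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

def reference_gaussian(weights, params):
    """(mu_0, sigma_0) obtained by averaging the natural parameters with the weights."""
    precision = sum(w / s ** 2 for w, (mu, s) in zip(weights, params))
    mean_over_var = sum(w * mu / s ** 2 for w, (mu, s) in zip(weights, params))
    return mean_over_var / precision, math.sqrt(1.0 / precision)

def log_geometric_mean(x, weights, params):
    """log of prod_i p(x; theta_i)^{w_i}, i.e. the tractable part of Equation (15)."""
    return sum(w * log_gauss(x, mu, s) for w, (mu, s) in zip(weights, params))

def remainder(x, weights, params):
    """R(x) = log m(x) - sum_i w_i log p(x; theta_i)."""
    m = sum(w * math.exp(log_gauss(x, mu, s)) for w, (mu, s) in zip(weights, params))
    return math.log(m) - log_geometric_mean(x, weights, params)

weights = [0.2, 0.5, 0.3]                       # illustrative, sums to 1
params = [(-1.0, 0.5), (0.0, 1.0), (2.0, 1.5)]  # illustrative (mu, sigma) pairs
mu0, sigma0 = reference_gaussian(weights, params)

# log c, where prod_i p(x; theta_i)^{w_i} = c * N(x; mu0, sigma0^2); constant in x.
log_c = [log_geometric_mean(x, weights, params) - log_gauss(x, mu0, sigma0)
         for x in (-3.0, 0.0, 2.5)]
print(f"mu0={mu0:.4f}  sigma0={sigma0:.4f}  log c={log_c[0]:.4f}")
print([round(remainder(x, weights, params), 4) for x in (-3.0, 0.0, 2.5)])
```

Working in the log domain avoids underflow in the tails, where the component densities become very small.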
As the ratio $p(x;\theta_i)/p(x;\theta_0)$ can be bounded above and below using the techniques in Section 2.1, $R(x)$ can be correspondingly bounded. Notice the similarity between Equations (14) and (15). The key difference from the adaptive bounds is that here we choose $p(x;\theta_0)$, instead of the dominating component in $m(x)$, as the "reference distribution" in the decomposition. This subtle difference is not presented in detail in our experimental studies but is discussed here for completeness. Essentially, the gap of the bounds is governed by the difference between the geometric average and the arithmetic average. In the extreme case that all mixture components are identical, this gap will reach zero. Therefore we
Differential Geometrical Theory of Statistics
- Title: Differential Geometrical Theory of Statistics
- Authors: Frédéric Barbaresco, Frank Nielsen
- Editor: MDPI
- Location: Basel
- Date: 2017
- Language: English
- License: CC BY-NC-ND 4.0
- ISBN: 978-3-03842-425-3
- Size: 17.0 x 24.4 cm
- Pages: 476
- Keywords: Entropy, Coding Theory, Maximum entropy, Information geometry, Computational Information Geometry, Hessian Geometry, Divergence Geometry, Information topology, Cohomology, Shape Space, Statistical physics, Thermodynamics
- Categories: Natural Sciences, Physics