Page - 293 - in Differential Geometrical Theory of Statistics


Entropy 2016, 18, 442

are given by $\{\rho_{i,\delta}(a_r), \rho_{i,\delta}(a_{r+1})\}$. In any case, let $\rho^{\min}_{i,\delta}(r)$ and $\rho^{\max}_{i,\delta}(r)$ denote the resulting lower and upper bounds of $\rho_{i,\delta}(x)$ in $I_r$. Then $t_\delta$ is bounded in the range $I_r$ by:

$$0 < \log\Big(1 + \sum_{i \neq \delta} \rho^{\min}_{i,\delta}(r)\Big) \leq t_\delta \leq \log\Big(1 + \sum_{i \neq \delta} \rho^{\max}_{i,\delta}(r)\Big) \leq \log k.$$

In practice, we always get better bounds using the shape-dependent technique, at the expense of computing all $O(k^2)$ intersection points of the pairwise densities. We call those bounds CEALB and CEAUB, for Combinatorial Envelope Adaptive Lower Bound and Combinatorial Envelope Adaptive Upper Bound.

Let us illustrate one scenario where this adaptive technique yields very good approximations. Consider a GMM whose variances $\sigma^2$ all tend to zero (a mixture of $k$ Diracs). Then in a combinatorial slab $I_r$ we have $\rho^{\max}_{i,\delta}(r) \to 0$ for all $i \neq \delta$, and therefore we get tight bounds.

As a related technique, we could also upper bound $\int_{a_r}^{a_{r+1}} \log m(x)\,\mathrm{d}x$ by $(a_{r+1} - a_r)\log m(a_r, a_{r+1})$, where $m(x, x')$ denotes the maximal value of the mixture density in the range $(x, x')$. This maximal value is either found at the slab extremities or is a mode of the GMM. This in turn requires finding the modes of a GMM [29,30], for which no analytical solution is known in general.

2.2. Another Derivation Using the Arithmetic-Geometric Mean Inequality

Let us start by considering the inequality of weighted arithmetic and geometric means (AGI, Arithmetic-Geometric Inequality) applied to the mixture component distributions:

$$m(x) = \sum_{i=1}^{k} w_i\, p(x; \theta_i) \geq \prod_{i=1}^{k} p(x; \theta_i)^{w_i},$$

with equality if and only if $\theta_1 = \cdots = \theta_k$. To get a tractable formula with a positive remainder of the log-sum term $\log m(x)$, we need the argument of the logarithm to be greater than or equal to 1, and thus we write the positive remainder:

$$R(x) = \log\left(\frac{m(x)}{\prod_{i=1}^{k} p(x; \theta_i)^{w_i}}\right) \geq 0.$$

Therefore, we can decompose the log-sum into a tractable part and a remainder as:

$$\log m(x) = \sum_{i=1}^{k} w_i \log p(x; \theta_i) + \log\left(\frac{m(x)}{\prod_{i=1}^{k} p(x; \theta_i)^{w_i}}\right). \qquad (15)$$

For exponential families, the first term can be integrated accurately. For the second term, we notice that $\prod_{i=1}^{k} p(x; \theta_i)^{w_i}$ is, up to a normalizing constant, a density of the same exponential family. We denote $p(x; \theta_0) = \prod_{i=1}^{k} p(x; \theta_i)^{w_i}$ (an unnormalized density in general).
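For a univariate GMM, the decomposition (15) can be turned into guaranteed bounds with very little machinery. The following is a minimal sketch, assuming Gaussian components, so that every $\log p(x;\theta_i)$ and the weighted log-geometric mean $\log p(x;\theta_0)$ are quadratics in $x$. Since $m(x) = \sum_i w_i p(x;\theta_i)$, the remainder equals $\log \sum_i w_i \exp(\log p(x;\theta_i) - \log p(x;\theta_0))$. Each exponent is a quadratic, so its exact range on a slab $[a, b]$ is found at the endpoints or at the vertex. The first term is integrated in closed form, and the lower bound on the remainder is clamped at 0, as the AGI allows. All function names are hypothetical. The uniform partition in `agi_bounds_partitioned` is a simplification and not the combinatorial slabs built from the density intersection points.

```python
import math

def logsumexp(vals):
    """Numerically stable log(sum(exp(v) for v in vals))."""
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))

def gauss_coeffs(mu, sigma):
    """Coefficients (A, B, C) such that log N(x; mu, sigma^2) = A*x^2 + B*x + C."""
    s2 = sigma * sigma
    return (-0.5 / s2, mu / s2,
            -0.5 * mu * mu / s2 - math.log(sigma * math.sqrt(2.0 * math.pi)))

def quad_eval(q, x):
    """Evaluate the quadratic q = (A, B, C) at x."""
    A, B, C = q
    return (A * x + B) * x + C

def quad_range(q, a, b):
    """Exact min and max of a quadratic on [a, b]: endpoints plus interior vertex."""
    A, B, _ = q
    xs = [a, b]
    if A != 0.0 and a < -B / (2.0 * A) < b:
        xs.append(-B / (2.0 * A))
    vals = [quad_eval(q, x) for x in xs]
    return min(vals), max(vals)

def quad_integral(q, a, b):
    """Closed-form integral of A*x^2 + B*x + C over [a, b]."""
    A, B, C = q
    return A * (b**3 - a**3) / 3.0 + B * (b**2 - a**2) / 2.0 + C * (b - a)

def log_mixture(x, ws, mus, sigmas):
    """log m(x) for the univariate GMM sum_i w_i N(mu_i, sigma_i^2)."""
    return logsumexp([math.log(w) + quad_eval(gauss_coeffs(m, s), x)
                      for w, m, s in zip(ws, mus, sigmas)])

def agi_slab_bounds(ws, mus, sigmas, a, b):
    """Lower/upper bounds on the integral of log m(x) over one slab [a, b]."""
    qs = [gauss_coeffs(m, s) for m, s in zip(mus, sigmas)]
    # log p(x; theta_0) = sum_i w_i log p(x; theta_i) is again a quadratic in x.
    q0 = tuple(sum(w * qi[j] for w, qi in zip(ws, qs)) for j in range(3))
    # First term of Eq. (15), integrated exactly.
    exact_part = quad_integral(q0, a, b)
    # R(x) = log sum_i w_i exp(log p_i(x) - log p_0(x)); bound each exponent on [a, b].
    lows, highs = [], []
    for w, q in zip(ws, qs):
        diff = tuple(c - c0 for c, c0 in zip(q, q0))
        dmin, dmax = quad_range(diff, a, b)
        lows.append(math.log(w) + dmin)
        highs.append(math.log(w) + dmax)
    r_lo = max(0.0, logsumexp(lows))  # the AGI guarantees R(x) >= 0
    r_hi = logsumexp(highs)
    return exact_part + (b - a) * r_lo, exact_part + (b - a) * r_hi

def agi_bounds_partitioned(ws, mus, sigmas, a, b, n_slabs):
    """Sum the slab bounds over a uniform partition of [a, b] (a simplification)."""
    edges = [a + (b - a) * k / n_slabs for k in range(n_slabs + 1)]
    pieces = [agi_slab_bounds(ws, mus, sigmas, left, right)
              for left, right in zip(edges, edges[1:])]
    return sum(p[0] for p in pieces), sum(p[1] for p in pieces)

def midpoint_integral(f, a, b, n=20000):
    """Midpoint-rule quadrature, used only as a numerical sanity check."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

if __name__ == "__main__":
    ws, mus, sigmas = [0.3, 0.7], [-1.0, 2.0], [1.0, 0.5]
    approx = midpoint_integral(lambda x: log_mixture(x, ws, mus, sigmas), 0.0, 1.0)
    for n in (1, 10, 50):
        lo, hi = agi_bounds_partitioned(ws, mus, sigmas, 0.0, 1.0, n)
        print(n, lo, approx, hi)
```

With a single slab, the bracket can be loose because each ratio is bounded independently over the whole slab. Refining the partition shrinks the range of every exponent and therefore tightens the bracket. The midpoint quadrature plays no role in the bounds and only serves to check them.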
With this notation, the remainder becomes

$$R(x) = \log\left(\sum_{i=1}^{k} w_i \frac{p(x; \theta_i)}{p(x; \theta_0)}\right).$$

As the ratio $p(x;\theta_i)/p(x;\theta_0)$ can be bounded above and below using the techniques in Section 2.1, $R(x)$ can be bounded correspondingly. Notice the similarity between Equations (14) and (15). The key difference from the adaptive bounds is that here we choose $p(x;\theta_0)$, instead of the dominating component in $m(x)$, as the “reference distribution” in the decomposition. This subtle difference is not examined in detail in our experimental studies but is discussed here for completeness. Essentially, the gap between the bounds is governed by the difference between the geometric average and the arithmetic average. In the extreme case where all mixture components are identical, this gap vanishes. Therefore we


Title: Differential Geometrical Theory of Statistics
Authors: Frédéric Barbaresco; Frank Nielsen
Editor: MDPI
Location: Basel
Date: 2017
Language: English
License: CC BY-NC-ND 4.0
ISBN: 978-3-03842-425-3
Size: 17.0 x 24.4 cm
Pages: 476
Keywords: Entropy, Coding Theory, Maximum entropy, Information geometry, Computational Information Geometry, Hessian Geometry, Divergence Geometry, Information topology, Cohomology, Shape Space, Statistical physics, Thermodynamics
Categories: Natural Sciences; Physics