For a more quantitative comparison, Table 1 shows the α-divergence estimated by MC, Basic, Adaptive, and VR. As $D_\alpha$ is defined on $\mathbb{R}\setminus\{0,1\}$, the KL bounds CE(A)LB and CE(A)UB are presented for α = 0 or 1. Overall, we observe the following order of gap size: Basic > Adaptive > VR, and VR is recommended in general for bounding α-divergences. In certain cases, the upper VR bound is looser than Adaptive. In practice, one can intersect these bounds, together with the trivial bound $D_\alpha(m : m') \geq 0$, to get the best estimate.
Table 1. The estimated $D_\alpha$ and its bounds; the 95% confidence interval is shown for MC. For α = 0 and α = 1, the Basic and Adaptive columns hold the KL bounds CELB/CEUB and CEALB/CEAUB, and the VR columns are left empty.

Mixtures      α     MC (10²)     MC (10³)     MC (10⁴)     Basic L   Basic U   Adaptive L  Adaptive U  VR L    VR U
GMM1 & GMM2   0     15.96±3.9    12.30±1.0    13.63±0.3    11.75     15.89     12.96       14.63
              0.01  13.36±2.9    10.63±0.8    11.66±0.3    −700.50   11.73     −77.33      11.73       11.40   12.27
              0.5   3.57±0.3     3.47±0.1     3.47±0.07    −0.60     3.42      3.01        3.42        3.17    3.51
              0.99  40.04±7.7    37.22±2.3    38.58±0.8    −333.90   39.04     5.36        38.98       38.28   38.96
              1     104.01±28    84.96±7.2    92.57±2.5    91.44     95.59     92.76       94.41
GMM3 & GMM4   0     0.71±0.2     0.63±0.07    0.62±0.02    0.00      1.76      0.00        1.16
              0.01  0.71±0.2     0.63±0.07    0.62±0.02    −179.13   7.63      −38.74      4.96        0.29    1.00
              0.5   0.82±0.3     0.57±0.1     0.62±0.04    −5.23     0.93      −0.71       0.85        −0.18   1.19
              0.99  0.79±0.3     0.76±0.1     0.80±0.03    −165.72   12.10     −59.76      9.11        0.37    1.28
              1     0.80±0.3     0.77±0.1     0.81±0.03    0.00      1.82      0.31        1.40
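As a quick illustration of the intersection rule stated above (the snippet and the helper name are ours, not part of the paper's implementation), here is a minimal Python sketch that intersects the three intervals with the trivial bound $D_\alpha \geq 0$, using the α = 0.5 row of GMM3 & GMM4 from Table 1 as input:

```python
# Each method returns a (lower, upper) interval for D_alpha(m : m');
# intersecting the intervals together with the trivial bound D_alpha >= 0
# keeps the tightest guaranteed estimate.

def combine_bounds(intervals):
    lower = max([0.0] + [lo for lo, _ in intervals])
    upper = min(hi for _, hi in intervals)
    return lower, upper

# Basic, Adaptive, and VR intervals for GMM3 & GMM4 at alpha = 0.5:
print(combine_bounds([(-5.23, 0.93), (-0.71, 0.85), (-0.18, 1.19)]))
# -> (0.0, 0.85): here the Adaptive upper bound beats the VR one.
```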
Note the similarity between the KL in Equation (30) and the expression in Equation (47). We state, without a formal analysis, that CEALB (resp. CEAUB) is equivalent to VR in the limit α → 0 or α → 1. Experimentally, as we slowly let α → 1, we can see that VR is consistent with CEALB (CEAUB).
7. Concluding Remarks and Perspectives
We have presented a fast, versatile method to compute bounds on the Kullback–Leibler divergence between mixtures by building algorithmic formulae. We reported on our experiments for various mixture models in the exponential family. For univariate GMMs, we get a guaranteed bound on the KL divergence of two mixtures m and m′ with k and k′ components within an additive approximation factor of $\log k + \log k'$ in $O((k+k')\log(k+k'))$-time. Therefore, the larger the KL divergence, the better the bound when considering a multiplicative $(1+\alpha)$-approximation factor, since $\alpha = \frac{\log k + \log k'}{\mathrm{KL}(m : m')}$.
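To make this scaling concrete, here is a small numerical illustration (the component counts and KL values below are our own examples, not results from the paper):

```python
from math import log

# The additive gap log k + log k' is fixed by the component counts,
# so the induced multiplicative (1 + alpha)-factor improves as
# KL(m : m') grows.
k, k_prime = 10, 10
gap = log(k) + log(k_prime)              # ~4.61 nats
for kl in (5.0, 50.0, 500.0):
    print(f"KL = {kl:6.1f} nats -> (1 + alpha) = {1 + gap / kl:.3f}")
# KL =    5.0 nats -> (1 + alpha) = 1.921
# KL =   50.0 nats -> (1 + alpha) = 1.092
# KL =  500.0 nats -> (1 + alpha) = 1.009
```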
The adaptive bounds are guaranteed to yield better bounds, at the expense of computing potentially $O(k^2 + (k')^2)$ intersection points of pairwise weighted components.
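For univariate Gaussian components, such pairwise intersections can be found in closed form by equating log-densities, which yields a quadratic in x. The following sketch (our helper, not the paper's code) shows this computation:

```python
from math import log, sqrt

# Intersection points of two weighted Gaussian components w*N(mu, s^2),
# obtained by equating their log-densities:
#   log w - log(s*sqrt(2*pi)) - (x - mu)^2 / (2 s^2)

def weighted_gaussian_intersections(w1, mu1, s1, w2, mu2, s2):
    a = 1.0 / (2 * s2 * s2) - 1.0 / (2 * s1 * s1)
    b = mu1 / (s1 * s1) - mu2 / (s2 * s2)
    c = (mu2 * mu2 / (2 * s2 * s2) - mu1 * mu1 / (2 * s1 * s1)
         + log(w1 / w2) + log(s2 / s1))
    if abs(a) < 1e-12:                 # equal variances: at most one crossing
        return [] if abs(b) < 1e-12 else [-c / b]
    disc = b * b - 4 * a * c
    if disc < 0:
        return []                      # the weighted densities never cross
    r = sqrt(disc)
    return sorted([(-b - r) / (2 * a), (-b + r) / (2 * a)])

print(weighted_gaussian_intersections(0.7, 0.0, 1.0, 0.3, 3.0, 2.0))
```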
Our technique also yields bounds for the Jeffreys divergence (the symmetrized KL divergence: $J(m, m') = \mathrm{KL}(m : m') + \mathrm{KL}(m' : m)$) and the Jensen–Shannon divergence [47] (JS):

$$\mathrm{JS}(m, m') = \frac{1}{2}\left(\mathrm{KL}\left(m : \frac{m+m'}{2}\right) + \mathrm{KL}\left(m' : \frac{m+m'}{2}\right)\right),$$
since $\frac{m+m'}{2}$ is a mixture model with $k + k'$ components. One advantage of this statistical distance is that it is symmetric, always bounded by $\log 2$, and its square root yields a metric distance [48].
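Since each term of JS is itself a KL between mixtures, interval bounds on the two terms combine by simple interval arithmetic. A minimal sketch, where kl1 and kl2 stand for hypothetical (lower, upper) bounds on $\mathrm{KL}(m : \frac{m+m'}{2})$ and $\mathrm{KL}(m' : \frac{m+m'}{2})$ produced by any of the methods above:

```python
from math import log

def js_bounds(kl1, kl2):
    lower = max(0.0, 0.5 * (kl1[0] + kl2[0]))
    upper = min(0.5 * (kl1[1] + kl2[1]), log(2))  # JS is capped by log 2
    return lower, upper

print(js_bounds((0.10, 0.25), (0.08, 0.30)))  # hypothetical inputs -> (0.09, 0.275)
```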
The log-sum-exp inequalities may also be used to compute some Rényi divergences [35]:

$$R_\alpha(m, p) = \frac{1}{\alpha - 1} \log\left(\int m(x)^\alpha p(x)^{1-\alpha}\,\mathrm{d}x\right),$$

when α is an integer, m(x) is a mixture, and p(x) is a single (component) distribution.
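To see why an integer α makes this integral tractable, here is a short sketch (our own expansion, not spelled out on this page, assuming the components and $p$ belong to the same exponential family): writing $m(x) = \sum_{i=1}^{k} w_i p_i(x)$, the multinomial theorem gives

$$\int m(x)^{\alpha}\, p(x)^{1-\alpha}\,\mathrm{d}x = \sum_{\substack{\beta_1+\cdots+\beta_k=\alpha\\ \beta_i \geq 0}} \binom{\alpha}{\beta_1,\ldots,\beta_k}\, \prod_{i=1}^{k} w_i^{\beta_i} \int \prod_{i=1}^{k} p_i(x)^{\beta_i}\, p(x)^{1-\alpha}\,\mathrm{d}x,$$

and each integral of a weighted product of exponential-family densities reduces in closed form to a difference of log-normalizers (provided the combined natural parameter stays in the domain), so the outer logarithm can then be bounded with the log-sum-exp inequalities.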
Getting fast, guaranteed, tight bounds on statistical distances between mixtures opens many avenues. For example, we may consider building hierarchical mixture models by iteratively merging two mixture components, where the pairs of components are chosen so that the KL divergence between the full mixture and the simplified mixture is minimized.
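A greedy version of this idea could look as follows; this is a sketch only, and the two callbacks are hypothetical stand-ins (`merge_pair` returning the mixture with components i and j merged, e.g., by moment matching, and `kl_upper_bound` being any guaranteed bound such as those of this paper):

```python
def simplify_mixture(mixture, target_size, merge_pair, kl_upper_bound):
    current = mixture
    while len(current) > target_size:
        # Try every pair and keep the merge that perturbs the mixture least,
        # as measured by the guaranteed KL upper bound.
        best_score, best_mix = float("inf"), None
        for i in range(len(current)):
            for j in range(i + 1, len(current)):
                candidate = merge_pair(current, i, j)
                score = kl_upper_bound(mixture, candidate)
                if score < best_score:
                    best_score, best_mix = score, candidate
        current = best_mix
    return current
```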
In order to be useful, our technique is unfortunately limited to univariate mixtures: indeed, in higher dimensions, we can still compute the maximization diagram of weighted components