Page 256 of Differential Geometrical Theory of Statistics
Entropy 2016, 18, 277
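$$
\begin{aligned}
\phi^{k+1} &= \arg\max_{\Phi} \sum_{i=1}^{n} \int_{\mathcal{X}} \log\bigl(h_i(x|\phi)\times p_\phi(y_i)\bigr)\, h_i(x|\phi^k)\,dx \\
&= \arg\max_{\Phi} \sum_{i=1}^{n} \int_{\mathcal{X}} \log\bigl(p_\phi(y_i)\bigr)\, h_i(x|\phi^k)\,dx + \sum_{i=1}^{n} \int_{\mathcal{X}} \log\bigl(h_i(x|\phi)\bigr)\, h_i(x|\phi^k)\,dx \\
&= \arg\max_{\Phi} \sum_{i=1}^{n} \log\bigl(p_\phi(y_i)\bigr) + \sum_{i=1}^{n} \int_{\mathcal{X}} \log\left(\frac{h_i(x|\phi)}{h_i(x|\phi^k)}\right) h_i(x|\phi^k)\,dx \\
&\qquad + \sum_{i=1}^{n} \int_{\mathcal{X}} \log\bigl(h_i(x|\phi^k)\bigr)\, h_i(x|\phi^k)\,dx.
\end{aligned}
$$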
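The final line is justified by the fact that $h_i(\cdot|\phi)$ is a density for every $\phi$ and therefore integrates to 1, so the integral in front of $\log p_\phi(y_i)$ disappears. The additional term (the last sum) does not depend on $\phi$ and, hence, can be omitted. We now have the following iterative procedure: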
$$
\phi^{k+1} = \arg\max_{\Phi} \sum_{i=1}^{n} \log\bigl(p_\phi(y_i)\bigr) + \sum_{i=1}^{n} \int_{\mathcal{X}} \log\left(\frac{h_i(x|\phi)}{h_i(x|\phi^k)}\right) h_i(x|\phi^k)\,dx.
$$
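As a hedged illustration of this iteration, the sketch below uses a two-component Gaussian mixture with unit variances. The model, the data `ys`, the starting point `phi_k` and all function names are assumptions made for this example and do not come from the text. Because the labels take only two values, the integral over $\mathcal{X}$ becomes a sum over the two labels, and the maximizer of the objective is the usual closed-form EM update.

```python
import math

def normal_pdf(y, mu):
    """Density of N(mu, 1) at y (unit variance is an assumption of this sketch)."""
    return math.exp(-0.5 * (y - mu) ** 2) / math.sqrt(2.0 * math.pi)

def label_posteriors(y, phi):
    """h_i(x | phi): conditional probabilities of the two labels given y."""
    lam, mu1, mu2 = phi
    w1 = lam * normal_pdf(y, mu1)
    w2 = (1.0 - lam) * normal_pdf(y, mu2)
    return (w1 / (w1 + w2), w2 / (w1 + w2))

def log_likelihood(ys, phi):
    """Sum over i of log p_phi(y_i) for the mixture."""
    lam, mu1, mu2 = phi
    return sum(
        math.log(lam * normal_pdf(y, mu1) + (1.0 - lam) * normal_pdf(y, mu2))
        for y in ys
    )

def proximal_objective(ys, phi, phi_k):
    """Log-likelihood plus sum_i sum_x log(h_i(x|phi) / h_i(x|phi_k)) h_i(x|phi_k)."""
    total = log_likelihood(ys, phi)
    for y in ys:
        h_new = label_posteriors(y, phi)
        h_old = label_posteriors(y, phi_k)
        total += sum(ho * math.log(hn / ho) for hn, ho in zip(h_new, h_old))
    return total

def em_step(ys, phi_k):
    """Closed-form maximizer of proximal_objective(ys, ., phi_k) for this model."""
    h1 = [label_posteriors(y, phi_k)[0] for y in ys]
    s1 = sum(h1)
    s2 = len(ys) - s1
    lam = s1 / len(ys)
    mu1 = sum(h * y for h, y in zip(h1, ys)) / s1
    mu2 = sum((1.0 - h) * y for h, y in zip(h1, ys)) / s2
    return (lam, mu1, mu2)

# Hypothetical data and starting point, chosen only for illustration.
ys = [-2.1, -1.7, -2.4, -1.9, 1.8, 2.2, 2.5, 1.6, 2.0, -2.0]
phi_k = (0.5, -1.0, 1.0)
phi_next = em_step(ys, phi_k)

print("phi_next:", phi_next)
print("log-likelihood before:", log_likelihood(ys, phi_k))
print("log-likelihood after: ", log_likelihood(ys, phi_next))
```

At `phi = phi_k` the perturbation term vanishes, so the objective equals the log-likelihood there. Since that term is never positive, any maximizer of the objective cannot decrease the log-likelihood.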
The previous iteration has the form of a proximal point maximization of the log-likelihood, i.e., a perturbation of the log-likelihood by a distance-like function defined on the conditional densities of the labels. Tseng [2] generalizes this iteration by allowing any nonnegative convex function $\psi$ to replace the $t \mapsto -\log(t)$ function. Tseng's recurrence is defined by:
$$
\phi^{k+1} = \arg\sup_{\phi}\; J(\phi) - D_\psi(\phi,\phi^k), \tag{7}
$$
where $J$ is the log-likelihood function and $D_\psi$ is given by:
$$
D_\psi(\phi,\phi^k) = \sum_{i=1}^{n} \int_{\mathcal{X}} \psi\left(\frac{h_i(x|\phi)}{h_i(x|\phi^k)}\right) h_i(x|\phi^k)\,dx, \tag{8}
$$
for any real nonnegative convex function $\psi$ such that $\psi(1) = \psi'(1) = 0$. $D_\psi(\phi^1,\phi^2)$ is nonnegative, and $D_\psi(\phi^1,\phi^2) = 0$ if and only if, for all $i$, $h_i(x|\phi^1) = h_i(x|\phi^2)$ $dx$-almost everywhere.
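A minimal sketch of (8) follows, under the same illustrative assumptions as any small mixture example: a two-component Gaussian mixture with unit variances and discrete labels, so the integral over $\mathcal{X}$ reduces to a sum over two labels. The model, the data and the names `psi_log`, `psi_quadratic` and `d_psi` are hypothetical choices for this example. Both functions satisfy $\psi(1) = \psi'(1) = 0$.

```python
import math

def normal_pdf(y, mu):
    """Density of N(mu, 1) at y (unit variance is an assumption of this sketch)."""
    return math.exp(-0.5 * (y - mu) ** 2) / math.sqrt(2.0 * math.pi)

def label_posteriors(y, phi):
    """h_i(x | phi): conditional probabilities of the two labels given y."""
    lam, mu1, mu2 = phi
    w1 = lam * normal_pdf(y, mu1)
    w2 = (1.0 - lam) * normal_pdf(y, mu2)
    return (w1 / (w1 + w2), w2 / (w1 + w2))

def psi_log(t):
    """psi(t) = -log(t) + t - 1, the choice that recovers the EM-type iteration."""
    return -math.log(t) + t - 1.0

def psi_quadratic(t):
    """psi(t) = (t - 1)^2 / 2, another nonnegative convex choice."""
    return 0.5 * (t - 1.0) ** 2

def d_psi(ys, phi, phi_k, psi):
    """Equation (8) with the integral over labels written as a finite sum."""
    total = 0.0
    for y in ys:
        h_new = label_posteriors(y, phi)
        h_old = label_posteriors(y, phi_k)
        total += sum(psi(hn / ho) * ho for hn, ho in zip(h_new, h_old))
    return total

# Hypothetical data and parameter values, chosen only for illustration.
ys = [-2.1, -1.7, -2.4, -1.9, 1.8, 2.2, 2.5, 1.6, 2.0, -2.0]
phi_a = (0.5, -1.0, 1.0)
phi_b = (0.4, -2.0, 2.0)

for name, psi in (("log", psi_log), ("quadratic", psi_quadratic)):
    print(name, d_psi(ys, phi_a, phi_a, psi), d_psi(ys, phi_b, phi_a, psi))
```

In this sketch the value is zero when both arguments coincide and positive when the conditional label probabilities differ, which matches the distance-like behaviour described above.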
2.3. Generalization of Tseng's Algorithm
We use the relationship between maximizing the log-likelihood and minimizing the Kullback–Leibler divergence to generalize the previous algorithm. We therefore replace the log-likelihood function by an estimate of a $\varphi$-divergence $D_\varphi$ between the true distribution and the model. We use the dual estimators of the divergence presented earlier in the introduction, (2) or (3), both of which we denote by $\hat{D}_\varphi$ unless mentioned otherwise. Our new algorithm is defined by:
$$
\phi^{k+1} = \arg\inf_{\phi}\; \hat{D}_\varphi(p_\phi, p_{\phi_T}) + \frac{1}{n} D_\psi(\phi,\phi^k), \tag{9}
$$
where $D_\psi(\phi,\phi^k)$ is defined by (8). When $\varphi(t) = -\log(t) + t - 1$, it is easy to see that we get recurrence (7). Indeed, for the case of (2) we have:
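$$
\hat{D}_\varphi(p_\phi, p_{\phi_T}) = \sup_{\alpha} \frac{1}{n} \sum_{i=1}^{n} \log\bigl(p_\alpha(y_i)\bigr) - \frac{1}{n} \sum_{i=1}^{n} \log\bigl(p_\phi(y_i)\bigr).
$$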
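The remaining step can be sketched as follows, assuming that $J$ in (7) denotes the unnormalized log-likelihood $J(\phi) = \sum_{i=1}^{n} \log p_\phi(y_i)$. The supremum over $\alpha$ does not depend on $\phi$, so it drops out of the minimization in (9), and multiplying by $-n$ turns the infimum into a supremum:

```latex
\begin{aligned}
\phi^{k+1}
  &= \arg\inf_{\phi}\; \sup_{\alpha} \frac{1}{n}\sum_{i=1}^{n} \log p_\alpha(y_i)
     - \frac{1}{n}\sum_{i=1}^{n} \log p_\phi(y_i)
     + \frac{1}{n} D_\psi(\phi,\phi^k) \\
  &= \arg\inf_{\phi}\; -\frac{1}{n} J(\phi) + \frac{1}{n} D_\psi(\phi,\phi^k) \\
  &= \arg\sup_{\phi}\; J(\phi) - D_\psi(\phi,\phi^k).
\end{aligned}
```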
Differential Geometrical Theory of Statistics
- Title: Differential Geometrical Theory of Statistics
- Authors: Frédéric Barbaresco, Frank Nielsen
- Publisher: MDPI
- Place: Basel
- Date: 2017
- Language: English
- License: CC BY-NC-ND 4.0
- ISBN: 978-3-03842-425-3
- Dimensions: 17.0 x 24.4 cm
- Pages: 476
- Keywords: Entropy, Coding Theory, Maximum entropy, Information geometry, Computational Information Geometry, Hessian Geometry, Divergence Geometry, Information topology, Cohomology, Shape Space, Statistical physics, Thermodynamics
- Categories: Natural sciences, Physics