Page - 256 - in Differential Geometrical Theory of Statistics

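Entropy 2016, 18, 277

$$
\begin{aligned}
\phi^{k+1} &= \operatorname*{arg\,max}_{\Phi} \sum_{i=1}^{n} \int_{\mathcal{X}} \log\big( h_i(x|\phi) \times p_\phi(y_i) \big)\, h_i(x|\phi^k)\,dx \\
&= \operatorname*{arg\,max}_{\Phi} \sum_{i=1}^{n} \int_{\mathcal{X}} \log\big( p_\phi(y_i) \big)\, h_i(x|\phi^k)\,dx + \sum_{i=1}^{n} \int_{\mathcal{X}} \log\big( h_i(x|\phi) \big)\, h_i(x|\phi^k)\,dx \\
&= \operatorname*{arg\,max}_{\Phi} \sum_{i=1}^{n} \log\big( p_\phi(y_i) \big) + \sum_{i=1}^{n} \int_{\mathcal{X}} \log\left( \frac{h_i(x|\phi)}{h_i(x|\phi^k)} \right) h_i(x|\phi^k)\,dx + \sum_{i=1}^{n} \int_{\mathcal{X}} \log\big( h_i(x|\phi^k) \big)\, h_i(x|\phi^k)\,dx.
\end{aligned}
$$

The final line is justified by the fact that $h_i(\cdot|\phi^k)$ is a density and therefore integrates to 1, so the first sum no longer involves an integral. The last term does not depend on $\phi$ and, hence, can be omitted. We now have the following iterative procedure:

$$
\phi^{k+1} = \operatorname*{arg\,max}_{\Phi} \sum_{i=1}^{n} \log\big( p_\phi(y_i) \big) + \sum_{i=1}^{n} \int_{\mathcal{X}} \log\left( \frac{h_i(x|\phi)}{h_i(x|\phi^k)} \right) h_i(x|\phi^k)\,dx.
$$

The previous iteration has the form of a proximal point maximization of the log-likelihood, i.e., a perturbation of the log-likelihood by a distance-like function defined on the conditional densities of the labels. Tseng [2] generalizes this iteration by allowing any nonnegative convex function $\psi$ to replace the function $t \mapsto -\log(t)$. Tseng's recurrence is defined by:

$$
\phi^{k+1} = \operatorname*{arg\,sup}_{\phi}\; J(\phi) - D_\psi(\phi, \phi^k), \tag{7}
$$

where $J$ is the log-likelihood function and $D_\psi$ is given by:

$$
D_\psi(\phi, \phi^k) = \sum_{i=1}^{n} \int_{\mathcal{X}} \psi\left( \frac{h_i(x|\phi)}{h_i(x|\phi^k)} \right) h_i(x|\phi^k)\,dx, \tag{8}
$$

for any real nonnegative convex function $\psi$ such that $\psi(1) = \psi'(1) = 0$. $D_\psi(\phi^1, \phi^2)$ is nonnegative, and $D_\psi(\phi^1, \phi^2) = 0$ if and only if, for all $i$, $h_i(x|\phi^1) = h_i(x|\phi^2)$ $dx$-almost everywhere.

2.3. Generalization of Tseng's Algorithm

We use the relationship between maximizing the log-likelihood and minimizing the Kullback–Leibler divergence to generalize the previous algorithm. We therefore replace the log-likelihood function by an estimate of a $\varphi$-divergence $D_\varphi$ between the true distribution and the model. We use the dual estimators of the divergence presented earlier in the introduction, (2) or (3), both of which we denote by $\hat{D}_\varphi$ unless mentioned otherwise. Our new algorithm is defined by:

$$
\phi^{k+1} = \operatorname*{arg\,inf}_{\phi}\; \hat{D}_\varphi(p_\phi, p_{\phi_T}) + \frac{1}{n} D_\psi(\phi, \phi^k), \tag{9}
$$

where $D_\psi(\phi, \phi^k)$ is defined by (8). When $\varphi(t) = -\log(t) + t - 1$, it is easy to see that we get recurrence (7). Indeed, for the case of (2) we have:

$$
\hat{D}_\varphi(p_\phi, p_{\phi_T}) = \sup_{\alpha} \frac{1}{n} \sum_{i=1}^{n} \log\big( p_\alpha(y_i) \big) - \frac{1}{n} \sum_{i=1}^{n} \log\big( p_\phi(y_i) \big).
$$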
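As a numerical illustration of the distance-like term in (8), the following minimal sketch evaluates D_ψ for a hypothetical two-component Gaussian mixture with unit variances. The labels x then take only two values, so the integral over the label space becomes a finite sum. The mixture model, the simulated sample, the parameter values, and the function names `psi`, `posteriors` and `d_psi` are all assumptions made for illustration; none of them come from the text.

```python
import numpy as np

def psi(t):
    # psi(t) = -log(t) + t - 1 is convex and nonnegative, with psi(1) = psi'(1) = 0.
    return -np.log(t) + t - 1.0

def posteriors(y, phi):
    """Conditional label probabilities h_i(x | phi), x in {1, 2}, for an
    assumed mixture lam * N(mu1, 1) + (1 - lam) * N(mu2, 1)."""
    lam, mu1, mu2 = phi
    w1 = lam * np.exp(-0.5 * (y - mu1) ** 2)
    w2 = (1.0 - lam) * np.exp(-0.5 * (y - mu2) ** 2)
    total = w1 + w2
    return np.stack([w1 / total, w2 / total], axis=1)  # shape (n, 2)

def d_psi(y, phi, phi_k):
    # Discrete-label version of (8): sum over observations i and labels x.
    h = posteriors(y, phi)
    h_k = posteriors(y, phi_k)
    return float(np.sum(psi(h / h_k) * h_k))

# Simulated sample (an assumption, used only to exercise the functions).
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(-2.0, 1.0, 50), rng.normal(2.0, 1.0, 50)])

phi_k = (0.5, -2.0, 2.0)   # current iterate (hypothetical values)
phi = (0.4, -1.5, 2.5)     # candidate parameter (hypothetical values)

same = d_psi(y, phi_k, phi_k)  # proximal term at the current iterate
diff = d_psi(y, phi, phi_k)    # proximal term away from the current iterate
print(same, diff)
```

With this choice of ψ, the `t - 1` part sums to zero against h_i(x|φ^k) because both conditional distributions sum to 1, so D_ψ reduces to the sum over i of the Kullback–Leibler divergences between the conditional label distributions at φ^k and at φ.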

Title: Differential Geometrical Theory of Statistics
Authors: Frédéric Barbaresco; Frank Nielsen
Editor: MDPI
Location: Basel
Date: 2017
Language: English
License: CC BY-NC-ND 4.0
ISBN: 978-3-03842-425-3
Size: 17.0 x 24.4 cm
Pages: 476
Keywords: Entropy, Coding Theory, Maximum entropy, Information geometry, Computational Information Geometry, Hessian Geometry, Divergence Geometry, Information topology, Cohomology, Shape Space, Statistical physics, Thermodynamics
Categories: Natural Sciences, Physics