Differential Geometrical Theory of Statistics

Page - 256 - in Differential Geometrical Theory of Statistics


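Entropy 2016, 18, 277

$$
\begin{aligned}
\phi^{k+1} &= \arg\max_{\Phi} \sum_{i=1}^{n} \int_{\mathcal{X}} \log\big( h_i(x|\phi) \times p_\phi(y_i) \big)\, h_i(x|\phi^k)\, dx \\
&= \arg\max_{\Phi} \sum_{i=1}^{n} \int_{\mathcal{X}} \log\big( p_\phi(y_i) \big)\, h_i(x|\phi^k)\, dx + \sum_{i=1}^{n} \int_{\mathcal{X}} \log\big( h_i(x|\phi) \big)\, h_i(x|\phi^k)\, dx \\
&= \arg\max_{\Phi} \sum_{i=1}^{n} \log\big( p_\phi(y_i) \big) + \sum_{i=1}^{n} \int_{\mathcal{X}} \log\left( \frac{h_i(x|\phi)}{h_i(x|\phi^k)} \right) h_i(x|\phi^k)\, dx + \sum_{i=1}^{n} \int_{\mathcal{X}} \log\big( h_i(x|\phi^k) \big)\, h_i(x|\phi^k)\, dx.
\end{aligned}
$$

The final line is justified by the fact that $h_i(\cdot|\phi^k)$ is a density and therefore integrates to 1, so the first sum reduces to $\sum_{i=1}^{n} \log(p_\phi(y_i))$. The last term does not depend on $\phi$ and, hence, can be omitted. We now have the following iterative procedure:

$$
\phi^{k+1} = \arg\max_{\Phi} \sum_{i=1}^{n} \log\big( p_\phi(y_i) \big) + \sum_{i=1}^{n} \int_{\mathcal{X}} \log\left( \frac{h_i(x|\phi)}{h_i(x|\phi^k)} \right) h_i(x|\phi^k)\, dx.
$$

The previous iteration has the form of a proximal point maximization of the log-likelihood, i.e., a perturbation of the log-likelihood by a distance-like function defined on the conditional densities of the labels. Tseng [2] generalizes this iteration by allowing any nonnegative convex function $\psi$ to replace the function $t \mapsto -\log(t)$. Tseng's recurrence is defined by:

$$
\phi^{k+1} = \arg\sup_{\phi}\; J(\phi) - D_\psi(\phi, \phi^k), \tag{7}
$$

where $J$ is the log-likelihood function and $D_\psi$ is given by:

$$
D_\psi(\phi, \phi^k) = \sum_{i=1}^{n} \int_{\mathcal{X}} \psi\left( \frac{h_i(x|\phi)}{h_i(x|\phi^k)} \right) h_i(x|\phi^k)\, dx, \tag{8}
$$

for any real nonnegative convex function $\psi$ such that $\psi(1) = \psi'(1) = 0$. $D_\psi(\phi^1, \phi^2)$ is nonnegative, and $D_\psi(\phi^1, \phi^2) = 0$ if and only if, for all $i$, $h_i(x|\phi^1) = h_i(x|\phi^2)$ $dx$-almost everywhere.

2.3. Generalization of Tseng's Algorithm

We use the relationship between maximizing the log-likelihood and minimizing the Kullback–Leibler divergence to generalize the previous algorithm. We, therefore, replace the log-likelihood function by an estimate of a $\varphi$-divergence $D_\varphi$ between the true distribution and the model. We use the dual estimators of the divergence presented earlier in the introduction, (2) or (3), both of which we denote by $\hat{D}_\varphi$ unless mentioned otherwise. Our new algorithm is defined by:

$$
\phi^{k+1} = \arg\inf_{\phi}\; \hat{D}_\varphi(p_\phi, p_{\phi_T}) + \frac{1}{n} D_\psi(\phi, \phi^k), \tag{9}
$$

where $D_\psi(\phi, \phi^k)$ is defined by (8). When $\varphi(t) = -\log(t) + t - 1$, it is easy to see that we get recurrence (7). Indeed, for the case of (2) we have:

$$
\hat{D}_\varphi(p_\phi, p_{\phi_T}) = \sup_{\alpha} \frac{1}{n} \sum_{i=1}^{n} \log\big( p_\alpha(y_i) \big) - \frac{1}{n} \sum_{i=1}^{n} \log\big( p_\phi(y_i) \big).
$$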
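Recurrence (7) and the proximal term (8) can be tried numerically on a toy problem. The following is a minimal sketch, not the authors' implementation. It assumes a two-component Gaussian mixture with known means and unit variances, so that the only unknown parameter is the mixing proportion `lam`, the labels take two values and the integral over labels in (8) becomes a finite sum. It also replaces the argsup in (7) by a search over a finite grid. All function and variable names are hypothetical.

```python
import numpy as np

def component_pdfs(y, means):
    """Unit-variance Gaussian density of each y_i under each component, shape (n, 2)."""
    return np.exp(-0.5 * (y[:, None] - means[None, :]) ** 2) / np.sqrt(2.0 * np.pi)

def label_posteriors(y, lam, means):
    """h_i(x | phi): conditional probability of label x in {0, 1} given y_i."""
    joint = component_pdfs(y, means) * np.array([lam, 1.0 - lam])
    return joint / joint.sum(axis=1, keepdims=True)

def log_likelihood(y, lam, means):
    """J(phi) = sum_i log p_phi(y_i) for the mixture density p_phi."""
    mixture = component_pdfs(y, means) @ np.array([lam, 1.0 - lam])
    return float(np.sum(np.log(mixture)))

def psi(t):
    """psi(t) = -log(t) + t - 1: convex, nonnegative, psi(1) = psi'(1) = 0."""
    return -np.log(t) + t - 1.0

def d_psi(y, lam, lam_k, means):
    """Proximal term (8); the integral over labels is a two-term sum here."""
    h = label_posteriors(y, lam, means)
    h_k = label_posteriors(y, lam_k, means)
    return float(np.sum(psi(h / h_k) * h_k))

def proximal_step(y, lam_k, means, grid):
    """One step of recurrence (7), with the argsup taken over a finite grid (assumption)."""
    objective = [log_likelihood(y, lam, means) - d_psi(y, lam, lam_k, means)
                 for lam in grid]
    return float(grid[int(np.argmax(objective))])

# Synthetic data: the true proportion 0.35 and the means are illustrative choices.
rng = np.random.default_rng(0)
means = np.array([-2.0, 2.0])
labels = rng.random(200) < 0.35
y = np.where(labels, means[0], means[1]) + rng.standard_normal(200)

# Start from a grid point so that every iterate stays on the grid.
grid = np.linspace(0.01, 0.99, 99)
iterates = [float(grid[49])]
for _ in range(30):
    iterates.append(proximal_step(y, iterates[-1], means, grid))

likelihoods = [log_likelihood(y, lam, means) for lam in iterates]
print(iterates[-1], likelihoods[0], likelihoods[-1])
```

With this choice of $\psi$, the extra $t - 1$ integrates to zero against $h_i(x|\phi^k)$, so the proximal term coincides with the log-ratio term of the EM-type iteration above. Because $D_\psi \geq 0$ and $D_\psi(\phi^k, \phi^k) = 0$, each step of the sketch should leave the log-likelihood no smaller than before, up to rounding.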
Title
Differential Geometrical Theory of Statistics
Authors
Frédéric Barbaresco
Frank Nielsen
Editor
MDPI
Location
Basel
Date
2017
Language
English
License
CC BY-NC-ND 4.0
ISBN
978-3-03842-425-3
Size
17.0 x 24.4 cm
Pages
476
Keywords
Entropy, Coding Theory, Maximum entropy, Information geometry, Computational Information Geometry, Hessian Geometry, Divergence Geometry, Information topology, Cohomology, Shape Space, Statistical physics, Thermodynamics
Categories
Natural Sciences, Physics