algorithm, and extends the EM algorithm. Our convergence proof demands some regularity (continuity and differentiability) of the estimated divergence with respect to the parameter vector φ, which is not simply checked using (2). Recent results in the book of Rockafellar and Wets [13] provide sufficient conditions to prove continuity and differentiability of supremal functions of the form of (2) with respect to φ. Differentiability with respect to φ still remains a very hard task; therefore, our results cover cases when the objective function is not differentiable.
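As an illustration of why this regularity is delicate (a sketch of ours, not part of the original argument, with a generic supremal form standing in for (2)): for
\[
\hat{D}(\phi) = \sup_{f \in \mathcal{F}} g(f, \phi),
\]
continuity of $\phi \mapsto \hat{D}(\phi)$ follows, for instance, when $\mathcal{F}$ is compact and $g$ is jointly continuous, whereas differentiability typically requires in addition a unique maximizer $f^{*}(\phi)$, in which case a Danskin-type envelope argument formally gives $\nabla \hat{D}(\phi) = \nabla_{\phi}\, g(f^{*}(\phi), \phi)$; when the maximizer is not unique, only directional derivatives (or subgradients) are available in general, which is why the non-differentiable case matters.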
The paper is organized as follows: in Section 2, we present the general context. We also present the derivation of our algorithm from the EM algorithm, passing through Tseng's generalization. In Section 3, we present some convergence properties. We discuss in Section 4 a variant of the algorithm with a theoretical global infimum, an example of the two-Gaussian mixture model, and a convergence proof of the EM algorithm in the spirit of our approach. Finally, Section 5 contains simulations confirming our claim about the efficiency and the robustness of our approach in comparison with the MLE. The algorithm is also applied to the so-called minimum density power divergence (MDPD) introduced by [14].
2. A Description of the Algorithm
2.1. General Context and Notations
Let (X, Y) be a couple of random variables with joint probability density function f(x, y|φ) parametrized by a vector of parameters φ ∈ Φ ⊂ R^d. Let (X_1, Y_1), ..., (X_n, Y_n) be n copies of (X, Y), independently and identically distributed. Finally, let (x_1, y_1), ..., (x_n, y_n) be n realizations of the n copies of (X, Y). The x_i are the unobserved data (labels) and the y_i are the observations. The vector of parameters φ is unknown and needs to be estimated. The observed data y_i are supposed to be real numbers, and the labels x_i belong to a space X, not necessarily finite unless mentioned otherwise. The marginal density of the observed data is given by p_φ(y) = ∫ f(x, y|φ) dx, where dx is a measure defined on the label space (for example, the counting measure if we work with mixture models).
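As a concrete illustration (a minimal sketch of ours, not taken from the paper; the function names and parameter choices are assumptions), the following Python fragment evaluates p_φ(y) for a two-component Gaussian mixture, where dx is the counting measure over the label space {0, 1}:

    import numpy as np
    from scipy.stats import norm

    def joint_density(x, y, phi):
        """f(x, y | phi) for a two-component Gaussian mixture with unit variances.
        phi = (lam, mu1, mu2); label x in {0, 1}, observation y real."""
        lam, mu1, mu2 = phi
        weights = (lam, 1.0 - lam)
        means = (mu1, mu2)
        return weights[x] * norm.pdf(y, loc=means[x], scale=1.0)

    def marginal_density(y, phi):
        """p_phi(y) = sum_x f(x, y | phi): the integral w.r.t. the counting measure."""
        return sum(joint_density(x, y, phi) for x in (0, 1))

    # Example: evaluate the marginal at y = 0.5 for phi = (0.3, -1.0, 2.0)
    print(marginal_density(0.5, (0.3, -1.0, 2.0)))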
For a parametrized function f with a parameter a, we write f(x|a). We use the notation φ^k for sequences with the index above. The derivatives of a real-valued function ψ defined on R are denoted ψ′, ψ′′, etc. We denote by ∇f the gradient of a real function f defined on R^d. For a generic function of two (vectorial) arguments D(φ|θ), ∇_1 D(φ|θ) denotes the gradient with respect to the first (vectorial) variable. Finally, for any set A, we use int(A) to denote the interior of A.
2.2. EM Algorithm and Tseng's Generalization
The EM algorithm estimates the unknown parameter vector by (see [15]):
\[
\phi^{k+1} = \arg\max_{\phi \in \Phi} \; \mathbb{E}\left[ \log f(X, Y \mid \phi) \,\Big|\, Y = y, \phi^{k} \right],
\]
where X = (X_1, ..., X_n), Y = (Y_1, ..., Y_n) and y = (y_1, ..., y_n). By independence between the couples (X_i, Y_i), the previous iteration may be written as:
\[
\phi^{k+1} = \arg\max_{\phi \in \Phi} \sum_{i=1}^{n} \mathbb{E}\left[ \log f(X_i, Y_i \mid \phi) \,\Big|\, Y_i = y_i, \phi^{k} \right]
           = \arg\max_{\phi \in \Phi} \sum_{i=1}^{n} \int_{\mathcal{X}} \log\big(f(x, y_i \mid \phi)\big)\, h_i(x \mid \phi^{k})\, dx, \qquad (6)
\]
where
\[
h_i(x \mid \phi^{k}) = \frac{f(x, y_i \mid \phi^{k})}{p_{\phi^{k}}(y_i)}
\]
is the conditional density of the labels (at step k) given y_i, which we suppose to be positive dx-almost everywhere.
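To make iteration (6) concrete, here is a minimal Python sketch (ours, not the paper's code; the function and variable names are assumptions) of one EM step for the two-component Gaussian mixture of the previous illustration, where the integral over X reduces to a sum over the two labels and the M-step for the weights and means is available in closed form:

    import numpy as np
    from scipy.stats import norm

    def em_step(y, phi):
        """One EM iteration (6) for a two-Gaussian mixture with unit variances.
        phi = (lam, mu1, mu2); y is the array of observations."""
        lam, mu1, mu2 = phi
        # E-step: h_i(x | phi^k) = f(x, y_i | phi^k) / p_{phi^k}(y_i), x in {0, 1}
        f0 = lam * norm.pdf(y, loc=mu1, scale=1.0)
        f1 = (1.0 - lam) * norm.pdf(y, loc=mu2, scale=1.0)
        p = f0 + f1                      # marginal p_{phi^k}(y_i), assumed positive
        h0, h1 = f0 / p, f1 / p          # conditional label densities
        # M-step: maximize sum_i sum_x log f(x, y_i | phi) h_i(x | phi^k) over phi
        lam_new = h0.mean()
        mu1_new = np.sum(h0 * y) / np.sum(h0)
        mu2_new = np.sum(h1 * y) / np.sum(h1)
        return (lam_new, mu1_new, mu2_new)

    # Example: a few iterations on simulated data
    rng = np.random.default_rng(0)
    y = np.concatenate([rng.normal(-1.0, 1.0, 300), rng.normal(2.0, 1.0, 700)])
    phi = (0.5, -0.5, 0.5)
    for _ in range(50):
        phi = em_step(y, phi)
    print(phi)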
It is well known that the EM iterations can be rewritten as a difference between the log-likelihood and a Kullback–Leibler distance-like function. Indeed,