algorithm, and extends the EM algorithm. Our convergence proof demands some regularity (continuity and differentiability) of the estimated divergence with respect to the parameter vector φ, which is not simply checked using (2). Recent results in the book of Rockafellar and Wets [13] provide sufficient conditions to prove continuity and differentiability of supremal functions of the form of (2) with respect to φ. Differentiability with respect to φ still remains a very hard task; therefore, our results cover cases when the objective function is not differentiable.
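As an illustration of why this regularity is delicate (a sketch of ours, not part of the original argument, with a generic supremal form standing in for (2)): for
\[
\hat{D}(\phi) = \sup_{f \in \mathcal{F}} g(f, \phi),
\]
continuity of $\phi \mapsto \hat{D}(\phi)$ follows, for instance, when $\mathcal{F}$ is compact and $g$ is jointly continuous, whereas differentiability typically requires in addition a unique maximizer $f^{*}(\phi)$, in which case a Danskin-type envelope argument formally gives $\nabla \hat{D}(\phi) = \nabla_{\phi}\, g(f^{*}(\phi), \phi)$; when the maximizer is not unique, only directional derivatives (or subgradients) are available in general, which is why the non-differentiable case matters.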
The paper is organized as follows: in Section 2, we present the general context. We also present the derivation of our algorithm from the EM algorithm, passing through Tseng's generalization. In Section 3, we present some convergence properties. We discuss in Section 4 a variant of the algorithm with a theoretical global infimum, an example of the two-Gaussian mixture model, and a convergence proof of the EM algorithm in the spirit of our approach. Finally, Section 5 contains simulations confirming our claim about the efficiency and the robustness of our approach in comparison with the MLE. The algorithm is also applied to the so-called minimum density power divergence (MDPD) introduced by [14].
2. A Description of the Algorithm
2.1. General Context and Notations
Let (X, Y) be a couple of random variables with joint probability density function f(x, y|φ) parametrized by a vector of parameters φ ∈ Φ ⊂ R^d. Let (X_1, Y_1), ..., (X_n, Y_n) be n copies of (X, Y), independently and identically distributed. Finally, let (x_1, y_1), ..., (x_n, y_n) be n realizations of the n copies of (X, Y). The x_i are the unobserved data (labels) and the y_i are the observations. The vector of parameters φ is unknown and needs to be estimated. The observed data y_i are supposed to be real numbers, and the labels x_i belong to a space X, not necessarily finite unless mentioned otherwise. The marginal density of the observed data is given by p_φ(y) = ∫ f(x, y|φ) dx, where dx is a measure defined on the label space (for example, the counting measure if we work with mixture models).
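As a concrete illustration (a minimal sketch of ours, not taken from the paper; the function names and parameter choices are assumptions), the following Python fragment evaluates p_φ(y) for a two-component Gaussian mixture, where dx is the counting measure over the label space {0, 1}:

    import numpy as np
    from scipy.stats import norm

    def joint_density(x, y, phi):
        """f(x, y | phi) for a two-component Gaussian mixture with unit variances.
        phi = (lam, mu1, mu2); label x in {0, 1}, observation y real."""
        lam, mu1, mu2 = phi
        weights = (lam, 1.0 - lam)
        means = (mu1, mu2)
        return weights[x] * norm.pdf(y, loc=means[x], scale=1.0)

    def marginal_density(y, phi):
        """p_phi(y) = sum_x f(x, y | phi): the integral w.r.t. the counting measure."""
        return sum(joint_density(x, y, phi) for x in (0, 1))

    # Example: evaluate the marginal at y = 0.5 for phi = (0.3, -1.0, 2.0)
    print(marginal_density(0.5, (0.3, -1.0, 2.0)))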
For a parametrized function f with a parameter a, we write f(x|a). We use the notation φ^k for sequences with the index above. The derivatives of a real-valued function ψ defined on R are denoted ψ′, ψ′′, etc. We denote by ∇f the gradient of a real function f defined on R^d. For a generic function of two (vectorial) arguments D(φ|θ), ∇_1 D(φ|θ) denotes the gradient with respect to the first (vectorial) variable. Finally, for any set A, we use int(A) to denote the interior of A.
2.2. EM Algorithm and Tseng's Generalization
The EM algorithm estimates the unknown parameter vector by (see [15]):
\[
\phi^{k+1} = \arg\max_{\phi \in \Phi} \; \mathbb{E}\left[ \log f(X, Y \mid \phi) \,\Big|\, Y = y, \phi^{k} \right],
\]
where X = (X_1, ..., X_n), Y = (Y_1, ..., Y_n) and y = (y_1, ..., y_n). By independence between the couples (X_i, Y_i), the previous iteration may be written as:
\[
\phi^{k+1} = \arg\max_{\phi \in \Phi} \sum_{i=1}^{n} \mathbb{E}\left[ \log f(X_i, Y_i \mid \phi) \,\Big|\, Y_i = y_i, \phi^{k} \right]
           = \arg\max_{\phi \in \Phi} \sum_{i=1}^{n} \int_{\mathcal{X}} \log\big(f(x, y_i \mid \phi)\big)\, h_i(x \mid \phi^{k})\, dx, \qquad (6)
\]
where
\[
h_i(x \mid \phi^{k}) = \frac{f(x, y_i \mid \phi^{k})}{p_{\phi^{k}}(y_i)}
\]
is the conditional density of the labels (at step k) given y_i, which we suppose to be positive dx-almost everywhere.
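To make iteration (6) concrete, here is a minimal Python sketch (ours, not the paper's code; the function and variable names are assumptions) of one EM step for the two-component Gaussian mixture of the previous illustration, where the integral over X reduces to a sum over the two labels and the M-step for the weights and means is available in closed form:

    import numpy as np
    from scipy.stats import norm

    def em_step(y, phi):
        """One EM iteration (6) for a two-Gaussian mixture with unit variances.
        phi = (lam, mu1, mu2); y is the array of observations."""
        lam, mu1, mu2 = phi
        # E-step: h_i(x | phi^k) = f(x, y_i | phi^k) / p_{phi^k}(y_i), x in {0, 1}
        f0 = lam * norm.pdf(y, loc=mu1, scale=1.0)
        f1 = (1.0 - lam) * norm.pdf(y, loc=mu2, scale=1.0)
        p = f0 + f1                      # marginal p_{phi^k}(y_i), assumed positive
        h0, h1 = f0 / p, f1 / p          # conditional label densities
        # M-step: maximize sum_i sum_x log f(x, y_i | phi) h_i(x | phi^k) over phi
        lam_new = h0.mean()
        mu1_new = np.sum(h0 * y) / np.sum(h0)
        mu2_new = np.sum(h1 * y) / np.sum(h1)
        return (lam_new, mu1_new, mu2_new)

    # Example: a few iterations on simulated data
    rng = np.random.default_rng(0)
    y = np.concatenate([rng.normal(-1.0, 1.0, 300), rng.normal(2.0, 1.0, 700)])
    phi = (0.5, -0.5, 0.5)
    for _ in range(50):
        phi = em_step(y, phi)
    print(phi)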
It is well known that the EM iterations can be rewritten as a difference between the log-likelihood and a Kullback–Leibler distance-like function. Indeed,