Page - 266 - in Differential Geometrical Theory of Statistics
Entropy 2016, 18, 277
amount and suitably adjusting $\lambda$, the value of $h_i(x|\phi)$ would be unchanged. We explore this more
thoroughly by writing the corresponding equations. Let us suppose, absurdly, that for distinct $\phi$ and $\phi'$,
we have $D_\psi(\phi|\phi') = 0$. By definition of $D_\psi$, it is given by a sum of nonnegative terms, which implies
that all terms need to be equal to zero. The following lines are equivalent for all $i \in \{1, \dots, n\}$:
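\[
h_i(0|\lambda,\mu_1,\mu_2) = h_i(0|\lambda',\mu'_1,\mu'_2),
\]
\[
\frac{\lambda e^{-\frac12 (y_i-\mu_1)^2}}{\lambda e^{-\frac12 (y_i-\mu_1)^2} + (1-\lambda) e^{-\frac12 (y_i-\mu_2)^2}}
= \frac{\lambda' e^{-\frac12 (y_i-\mu'_1)^2}}{\lambda' e^{-\frac12 (y_i-\mu'_1)^2} + (1-\lambda') e^{-\frac12 (y_i-\mu'_2)^2}},
\]
\[
\log\left(\frac{1-\lambda}{\lambda}\right) - \frac12 (y_i-\mu_2)^2 + \frac12 (y_i-\mu_1)^2
= \log\left(\frac{1-\lambda'}{\lambda'}\right) - \frac12 (y_i-\mu'_2)^2 + \frac12 (y_i-\mu'_1)^2.
\]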
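As a short worked step (using only the symbols already introduced), expanding the squares in the last identity shows that the $y_i^2$ terms cancel, so each side is an affine function of $y_i$:

\[
-\frac12 (y_i-\mu_2)^2 + \frac12 (y_i-\mu_1)^2 = (\mu_2-\mu_1)\, y_i + \frac12\mu_1^2 - \frac12\mu_2^2 .
\]

The same expansion holds for the primed parameters, so the last identity compares a constant plus a multiple of $y_i$ on each side.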
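Looking at this set of $n$ equations as an equality of two polynomials in $y$ of degree 1 at $n$ points,
we deduce that, since we have two distinct observations, say $y_1$ and $y_2$, the two polynomials must have
the same coefficients. Thus, the set of $n$ equations is equivalent to the
following two equations:
\[
\begin{cases}
\mu_1 - \mu_2 = \mu'_1 - \mu'_2, \\[4pt]
\log\left(\dfrac{1-\lambda}{\lambda}\right) + \dfrac12\mu_1^2 - \dfrac12\mu_2^2
= \log\left(\dfrac{1-\lambda'}{\lambda'}\right) + \dfrac12{\mu'_1}^2 - \dfrac12{\mu'_2}^2.
\end{cases}
\tag{21}
\]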
These two equations in three variables have an infinite number of solutions. Take, for example,
$\mu_1 = 0$, $\mu_2 = 1$, $\lambda = \frac23$, $\mu'_1 = \frac12$, $\mu'_2 = \frac32$, $\lambda' = \frac{2}{2+\sqrt{e}} \approx 0.548$.
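The family of solutions can be explored numerically. The following is a minimal sketch, assuming unit-variance Gaussian components as in the formulas above: it shifts both means by a common amount `delta`, solves the second equation of system (21) for $\lambda'$, and checks that the weights $h_i(0|\cdot)$ coincide. The function names `h0` and `shifted_solution`, the test points, and the shifts are illustrative choices, not part of the original text.

```python
import math


def h0(y, lam, mu1, mu2):
    """Posterior weight of the first component for an observation y
    (two Gaussian components with unit variance)."""
    a = lam * math.exp(-0.5 * (y - mu1) ** 2)
    b = (1.0 - lam) * math.exp(-0.5 * (y - mu2) ** 2)
    return a / (a + b)


def shifted_solution(lam, mu1, mu2, delta):
    """Return (lam', mu1', mu2') satisfying system (21) when both means
    are shifted by the same amount delta (hypothetical helper)."""
    mu1p, mu2p = mu1 + delta, mu2 + delta
    # Second equation of (21), solved for log((1 - lam') / lam').
    log_odds = (math.log((1.0 - lam) / lam)
                + 0.5 * mu1 ** 2 - 0.5 * mu2 ** 2
                - 0.5 * mu1p ** 2 + 0.5 * mu2p ** 2)
    lamp = 1.0 / (1.0 + math.exp(log_odds))
    return lamp, mu1p, mu2p


base = (2.0 / 3.0, 0.0, 1.0)       # (lambda, mu1, mu2)
ys = [-1.3, 0.0, 0.7, 2.5]         # arbitrary observation values
for delta in (-1.0, 0.25, 2.0):    # arbitrary shifts
    alt = shifted_solution(*base, delta)
    gap = max(abs(h0(y, *base) - h0(y, *alt)) for y in ys)
    print(delta, alt, gap)
```

Each shift yields a distinct parameter vector, and the printed gap between the two sets of weights should be at the level of floating-point rounding.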
Remark 2. The previous conclusion can be extended to any two-component mixture of exponential families
having the form:
\[
p_\phi(y) = \lambda\, e^{\sum_{i=1}^{m_1} \theta_{1,i}\, y^i - F(\theta_1)} + (1-\lambda)\, e^{\sum_{i=1}^{m_2} \theta_{2,i}\, y^i - F(\theta_2)}.
\]
One may write the corresponding $n$ equations. The polynomial in $y_i$ has a degree of at most $\max(m_1,m_2)$.
Thus, if one has $\max(m_1,m_2)+1$ distinct observations, the two polynomials will have the same set of
coefficients. Finally, if $(\theta_1,\theta_2) \in \mathbb{R}^{d-1}$ with $d > \max(m_1,m_2)$, then assumption A3 does not hold.
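Unfortunately, we have no information about the difference between consecutive terms
$\|\phi^{k+1}-\phi^k\|$ except for the case of $\psi(t) = \varphi(t) = -\log(t) + t - 1$, which corresponds to the classical
EM recurrence:
\[
\lambda^{k+1} = \frac{1}{n} \sum_{i=1}^{n} h_i(0|\phi^k), \qquad
\mu_1^{k+1} = \frac{\sum_{i=1}^{n} y_i\, h_i(0|\phi^k)}{\sum_{i=1}^{n} h_i(0|\phi^k)}, \qquad
\mu_2^{k+1} = \frac{\sum_{i=1}^{n} y_i\, h_i(1|\phi^k)}{\sum_{i=1}^{n} h_i(1|\phi^k)}.
\]
Tseng [2] has shown that we can prove directly that $\phi^{k+1}-\phi^k$ converges to 0.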
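The EM recurrence can be sketched in a few lines. This is only an illustration under stated assumptions: two Gaussian components with unit variance (as in the formulas of this section), $h_i(1|\phi) = 1 - h_i(0|\phi)$, a synthetic sample, and an arbitrary starting point and iteration count. The names `responsibilities`, `em_step`, and `step_norm` are hypothetical.

```python
import math
import random


def responsibilities(ys, lam, mu1, mu2):
    """h_i(0 | phi): posterior weight of the first component for each y_i."""
    out = []
    for y in ys:
        a = lam * math.exp(-0.5 * (y - mu1) ** 2)
        b = (1.0 - lam) * math.exp(-0.5 * (y - mu2) ** 2)
        out.append(a / (a + b))
    return out


def em_step(ys, phi):
    """One step of the classical EM recurrence for phi = (lambda, mu1, mu2)."""
    lam, mu1, mu2 = phi
    h0 = responsibilities(ys, lam, mu1, mu2)
    h1 = [1.0 - h for h in h0]  # h_i(1 | phi)
    new_lam = sum(h0) / len(ys)
    new_mu1 = sum(y * h for y, h in zip(ys, h0)) / sum(h0)
    new_mu2 = sum(y * h for y, h in zip(ys, h1)) / sum(h1)
    return (new_lam, new_mu1, new_mu2)


def step_norm(p, q):
    """Euclidean norm of the difference between two parameter vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Synthetic data (an assumption for illustration): 0.35 N(0,1) + 0.65 N(3,1).
random.seed(0)
ys = [random.gauss(0.0, 1.0) if random.random() < 0.35 else random.gauss(3.0, 1.0)
      for _ in range(100)]

phi = (0.5, -1.0, 1.0)  # arbitrary starting point
norms = []
for _ in range(200):
    new_phi = em_step(ys, phi)
    norms.append(step_norm(new_phi, phi))
    phi = new_phi

print(phi, norms[0], norms[-1])
```

On such a sample one would expect the recorded step norms to shrink toward zero, in line with the convergence of $\phi^{k+1}-\phi^k$ stated above.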
5. Simulation Study
We summarize the results of 100 experiments on 100 samples by giving the average of the
estimates and the error committed, and the corresponding standard deviation. The error criterion
is the total variation distance (TVD), which is calculated using the $L^1$ distance. Indeed, the Scheffé
Lemma (see [20] (Page 129)) states that:
\[
\sup_{A \in \mathcal{B}_n(\mathbb{R})} \left| P_\phi(A) - P_{\phi_T}(A) \right| = \frac12 \int_{\mathbb{R}} \left| p_\phi(y) - p_{\phi_T}(y) \right| dy.
\]
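The right-hand side of this identity is straightforward to approximate numerically. Below is a minimal sketch, assuming a two-component Gaussian mixture with unit variances parametrized by $(\lambda, \mu_1, \mu_2)$ and a plain trapezoidal rule on a truncated interval; the function names, the integration bounds, and the grid size are illustrative assumptions, not the procedure used in the experiments.

```python
import math


def mixture_pdf(y, lam, mu1, mu2):
    """Density of lam * N(mu1, 1) + (1 - lam) * N(mu2, 1) at y."""
    c = 1.0 / math.sqrt(2.0 * math.pi)
    return c * (lam * math.exp(-0.5 * (y - mu1) ** 2)
                + (1.0 - lam) * math.exp(-0.5 * (y - mu2) ** 2))


def tvd(phi, phi_true, lo=-15.0, hi=15.0, n=20001):
    """Approximate 0.5 * integral of |p_phi - p_phi_true| over [lo, hi]
    with the trapezoidal rule (bounds and grid size are arbitrary)."""
    step = (hi - lo) / (n - 1)
    total = 0.0
    for k in range(n):
        y = lo + k * step
        weight = 0.5 if k in (0, n - 1) else 1.0  # trapezoid end-point weights
        total += weight * abs(mixture_pdf(y, *phi) - mixture_pdf(y, *phi_true))
    return 0.5 * step * total


# Hypothetical estimated and true parameters, for illustration only.
print(tvd((0.40, 0.1, 2.9), (0.35, 0.0, 3.0)))
```

The truncation to a bounded interval is harmless only when both densities put negligible mass outside it, so the bounds should be widened for means far from the origin.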
The TVD gives a measure of the maximum error we may commit when we use the estimated
model in lieu of the true distribution. We consider the Hellinger divergence for estimators based on
$\varphi$-divergences, which corresponds to $\varphi(t) = \frac12(\sqrt{t}-1)^2$. We prefer the Hellinger divergence
because we hope to obtain robust estimators without loss of efficiency (see [21]). $D_\psi$ is calculated with
- Title
- Differential Geometrical Theory of Statistics
- Authors
- Frédéric Barbaresco
- Frank Nielsen
- Editor
- MDPI
- Location
- Basel
- Date
- 2017
- Language
- English
- License
- CC BY-NC-ND 4.0
- ISBN
- 978-3-03842-425-3
- Size
- 17.0 x 24.4 cm
- Pages
- 476
- Keywords
- Entropy, Coding Theory, Maximum entropy, Information geometry, Computational Information Geometry, Hessian Geometry, Divergence Geometry, Information topology, Cohomology, Shape Space, Statistical physics, Thermodynamics
- Categories
- Natural Sciences, Physics