Page - 266 - in Differential Geometrical Theory of Statistics

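Entropy 2016, 18, 277

amount and suitably adjusting $\lambda$, the value of $h_i(x|\phi)$ would be unchanged. We explore this more thoroughly by writing the corresponding equations. Suppose that, for distinct $\phi$ and $\phi'$, we have $D_\psi(\phi|\phi') = 0$. By definition, $D_\psi$ is a sum of nonnegative terms, so all of these terms must be equal to zero. The following lines are then equivalent for all $i \in \{1,\dots,n\}$:

$$h_i(0|\lambda,\mu_1,\mu_2) = h_i(0|\lambda',\mu_1',\mu_2'),$$

$$\frac{\lambda e^{-\frac{1}{2}(y_i-\mu_1)^2}}{\lambda e^{-\frac{1}{2}(y_i-\mu_1)^2} + (1-\lambda) e^{-\frac{1}{2}(y_i-\mu_2)^2}} = \frac{\lambda' e^{-\frac{1}{2}(y_i-\mu_1')^2}}{\lambda' e^{-\frac{1}{2}(y_i-\mu_1')^2} + (1-\lambda') e^{-\frac{1}{2}(y_i-\mu_2')^2}},$$

$$\log\left(\frac{1-\lambda}{\lambda}\right) - \frac{1}{2}(y_i-\mu_2)^2 + \frac{1}{2}(y_i-\mu_1)^2 = \log\left(\frac{1-\lambda'}{\lambda'}\right) - \frac{1}{2}(y_i-\mu_2')^2 + \frac{1}{2}(y_i-\mu_1')^2.$$

The quadratic terms in $y_i$ cancel on each side, so the left-hand side reduces to $(\mu_2-\mu_1)\,y_i + \log\left(\frac{1-\lambda}{\lambda}\right) + \frac{1}{2}\mu_1^2 - \frac{1}{2}\mu_2^2$, and similarly for the right-hand side. Viewing this set of $n$ equations as an equality of two polynomials in $y$ of degree 1 evaluated at $n$ points, we deduce that, as soon as there are two distinct observations, say $y_1$ and $y_2$, the two polynomials must have the same coefficients. Thus, the set of $n$ equations is equivalent to the following two equations:

$$\begin{cases} \mu_1 - \mu_2 = \mu_1' - \mu_2' \\ \log\left(\frac{1-\lambda}{\lambda}\right) + \frac{1}{2}\mu_1^2 - \frac{1}{2}\mu_2^2 = \log\left(\frac{1-\lambda'}{\lambda'}\right) + \frac{1}{2}\mu_1'^2 - \frac{1}{2}\mu_2'^2 \end{cases} \qquad (21)$$

For a given $\phi$, these two equations in the three unknowns $(\lambda', \mu_1', \mu_2')$ have infinitely many solutions. Take, for example, $\mu_1 = 0$, $\mu_2 = 1$, $\lambda = \frac{2}{3}$, $\mu_1' = \frac{1}{2}$, $\mu_2' = \frac{3}{2}$; the second equation of (21) then gives $\log\left(\frac{1-\lambda'}{\lambda'}\right) = \frac{1}{2} - \log 2$, that is, $\lambda' = \frac{2}{2+\sqrt{e}} \approx 0.548$.

Remark 2. The previous conclusion can be extended to any two-component mixture of exponential families having the form

$$p_\phi(y) = \lambda e^{\sum_{i=1}^{m_1}\theta_{1,i} y^i - F(\theta_1)} + (1-\lambda) e^{\sum_{i=1}^{m_2}\theta_{2,i} y^i - F(\theta_2)}.$$

One may write the corresponding $n$ equations; the resulting polynomial in $y_i$ has degree at most $\max(m_1,m_2)$. Thus, if one has $\max(m_1,m_2)+1$ distinct observations, the two polynomials must have the same set of coefficients. Finally, if $(\theta_1,\theta_2) \in \mathbb{R}^{d-1}$ with $d > \max(m_1,m_2)$, then assumption A3 does not hold.

Unfortunately, we have no information about the difference between consecutive terms $\|\phi^{k+1}-\phi^k\|$ except in the case $\psi(t) = \varphi(t) = -\log(t) + t - 1$, which corresponds to the classical EM recurrence:

$$\lambda^{k+1} = \frac{1}{n}\sum_{i=1}^{n} h_i(0|\phi^k), \qquad \mu_1^{k+1} = \frac{\sum_{i=1}^{n} y_i\, h_i(0|\phi^k)}{\sum_{i=1}^{n} h_i(0|\phi^k)}, \qquad \mu_2^{k+1} = \frac{\sum_{i=1}^{n} y_i\, h_i(1|\phi^k)}{\sum_{i=1}^{n} h_i(1|\phi^k)}.$$

Tseng [2] has shown that, in this case, one can prove directly that $\phi^{k+1}-\phi^k$ converges to 0.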
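As a numerical illustration, the following Python sketch assumes the unit-variance Gaussian mixture $\lambda\mathcal{N}(\mu_1,1) + (1-\lambda)\mathcal{N}(\mu_2,1)$ implicit in the equations above, together with $h_i(1|\phi) = 1 - h_i(0|\phi)$. The helper names `h0` and `em_step` are hypothetical. The sketch first checks that $\phi = (\frac{2}{3}, 0, 1)$ and $\phi' = (\frac{2}{2+\sqrt{e}}, \frac{1}{2}, \frac{3}{2})$, which both satisfy system (21), give identical weights $h_i(0|\cdot)$ at every tested point. It then iterates the classical EM recurrence on simulated data and records the last step size $\|\phi^{k+1}-\phi^k\|$. The simulated data, seed, starting point, and iteration count are arbitrary choices made only for illustration.

```python
import math
import random

# Hypothetical helper: posterior weight h_i(0|phi) of the first component,
# assuming the unit-variance mixture lam*N(mu1, 1) + (1 - lam)*N(mu2, 1).
def h0(y, lam, mu1, mu2):
    a = lam * math.exp(-0.5 * (y - mu1) ** 2)
    b = (1.0 - lam) * math.exp(-0.5 * (y - mu2) ** 2)
    return a / (a + b)

# Hypothetical helper: one classical EM update (lam, mu1, mu2) -> next iterate.
def em_step(ys, lam, mu1, mu2):
    w0 = [h0(y, lam, mu1, mu2) for y in ys]  # h_i(0|phi^k)
    w1 = [1.0 - w for w in w0]               # h_i(1|phi^k), assumed = 1 - h_i(0|phi^k)
    lam_new = sum(w0) / len(ys)
    mu1_new = sum(y * w for y, w in zip(ys, w0)) / sum(w0)
    mu2_new = sum(y * w for y, w in zip(ys, w1)) / sum(w1)
    return lam_new, mu1_new, mu2_new

# Two distinct parameter vectors satisfying system (21).
phi = (2.0 / 3.0, 0.0, 1.0)
phi_prime = (2.0 / (2.0 + math.sqrt(math.e)), 0.5, 1.5)
for y in (-2.0, 0.0, 0.7, 3.0):
    # Identical weights at every tested y, as system (21) predicts.
    assert abs(h0(y, *phi) - h0(y, *phi_prime)) < 1e-12

# Illustrative EM run on simulated data (arbitrary seed, weights, and start).
random.seed(0)
ys = [random.gauss(0.0, 1.0) if random.random() < 0.35 else random.gauss(3.0, 1.0)
      for _ in range(100)]
est = (0.5, -1.0, 1.0)
for _ in range(200):
    new = em_step(ys, *est)
    step = math.dist(new, est)  # ||phi^{k+1} - phi^k||
    est = new
print("estimate:", est, "last step:", step)
```

Both parameter vectors induce the same weights because each side of the third equivalence above reduces to $y_i - \frac{1}{2} - \log 2$ for these values; only the difference $\mu_1-\mu_2$ and the constant term of (21) are pinned down by the data.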
5. Simulation Study

We summarize the results of 100 experiments on 100 samples by reporting the average of the estimates and of the error committed, together with the corresponding standard deviations. The error criterion is the total variation distance (TVD), which is calculated using the $L^1$ distance. Indeed, the Scheffé Lemma (see [20], p. 129) states that

$$\sup_{A \in \mathcal{B}(\mathbb{R})} \left| P_\phi(A) - P_{\phi_T}(A) \right| = \frac{1}{2}\int_{\mathbb{R}} \left| p_\phi(y) - p_{\phi_T}(y) \right| dy.$$

The TVD measures the maximum error we may commit when we use the estimated model in lieu of the true distribution. For estimators based on $\varphi$-divergences, we consider the Hellinger divergence, which corresponds to $\varphi(t) = \frac{1}{2}(\sqrt{t}-1)^2$. We prefer the Hellinger divergence because we hope to obtain robust estimators without loss of efficiency (see [21]). $D_\psi$ is calculated with

Differential Geometrical Theory of Statistics

Title: Differential Geometrical Theory of Statistics
Authors: Frédéric Barbaresco; Frank Nielsen
Editor: MDPI
Location: Basel
Date: 2017
Language: English
License: CC BY-NC-ND 4.0
ISBN: 978-3-03842-425-3
Size: 17.0 x 24.4 cm
Pages: 476
Keywords: Entropy, Coding Theory, Maximum entropy, Information geometry, Computational Information Geometry, Hessian Geometry, Divergence Geometry, Information topology, Cohomology, Shape Space, Statistical physics, Thermodynamics
Categories: Natural Sciences, Physics