Page 257 of Differential Geometrical Theory of Statistics


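Entropy 2016, 18, 277

Since the first term in $\hat{D}_{\varphi}(p_{\phi}, p_{\phi_T})$ does not depend on $\phi$, it does not count in the arginf defining $\phi^{k+1}$, and we easily get (7). The same applies to the case of (3). For notational simplicity, from now on, we redefine $D_{\psi}$ with a normalization by $n$, i.e.,

\[
D_{\psi}(\phi, \phi^{k}) = \frac{1}{n} \sum_{i=1}^{n} \int_{\mathcal{X}} \psi\!\left( \frac{h_i(x|\phi)}{h_i(x|\phi^{k})} \right) h_i(x|\phi^{k}) \, dx. \tag{10}
\]

Hence, our set of algorithms is redefined by:

\[
\phi^{k+1} = \operatorname*{arg\,inf}_{\phi} \; \hat{D}_{\varphi}(p_{\phi}, p_{\phi_T}) + D_{\psi}(\phi, \phi^{k}). \tag{11}
\]

We will see later that this iteration forces the divergence to decrease and that, under suitable conditions, it converges to a (local) minimum of $\hat{D}_{\varphi}(p_{\phi}, p_{\phi_T})$. As a result, algorithm (11) is a way to calculate both the MD$\varphi$DE (4) and the kernel-based MD$\varphi$DE (5).

3. Some Convergence Properties of $\phi^{k}$

We show here how, depending on the situation, one may prove convergence of the algorithm defined by (11). Let $\phi^{0}$ be a given initialization, and define

\[
\Phi^{0} := \{ \phi \in \Phi : \hat{D}_{\varphi}(p_{\phi}, p_{\phi_T}) \le \hat{D}_{\varphi}(p_{\phi^{0}}, p_{\phi_T}) \},
\]

which we suppose to be a subset of $\mathrm{int}(\Phi)$. The idea of defining this set in this context is inherited from the paper of Wu [16], which provided the first correct proof of convergence for the EM algorithm. Before going any further, we recall the following definition of a (generalized) stationary point.

Definition 1. Let $f : \mathbb{R}^{d} \to \mathbb{R}$ be a real-valued function. If $f$ is differentiable at a point $\phi^{*}$ such that $\nabla f(\phi^{*}) = 0$, we then say that $\phi^{*}$ is a stationary point of $f$. If $f$ is not differentiable at $\phi^{*}$ but the subgradient of $f$ at $\phi^{*}$, say $\partial f(\phi^{*})$, exists such that $0 \in \partial f(\phi^{*})$, then $\phi^{*}$ is called a generalized stationary point of $f$.

Remark 1. Throughout the paper, the subgradient is defined for any function, not necessarily convex (see Definition 8.3 in [13] for more details).

We will be using the following assumptions:

A0. Functions $\phi \mapsto \hat{D}_{\varphi}(p_{\phi}, p_{\phi_T})$ and $D_{\psi}$ are lower semicontinuous;
A1. Functions $\phi \mapsto \hat{D}_{\varphi}(p_{\phi}, p_{\phi_T})$, $D_{\psi}$ and $\nabla_{1} D_{\psi}$ are defined and continuous on, respectively, $\Phi$, $\Phi \times \Phi$ and $\Phi \times \Phi$;
AC. Function $\phi \mapsto \nabla \hat{D}_{\varphi}(p_{\phi}, p_{\phi_T})$ is defined and continuous on $\Phi$;
A2. $\Phi^{0}$ is a compact subset of $\mathrm{int}(\Phi)$;
A3. $D_{\psi}(\phi, \bar{\phi}) > 0$ for all $\bar{\phi} \neq \phi \in \Phi$.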
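The decrease property announced for iteration (11) can be sketched in one chain of inequalities. This is only a sketch, under the assumptions that the infimum in (11) is attained at $\phi^{k+1}$ and that $\psi$ is nonnegative with $\psi(1) = 0$, so that $D_{\psi} \ge 0$ and $D_{\psi}(\phi^{k}, \phi^{k}) = 0$:

```latex
\[
\begin{aligned}
\hat{D}_{\varphi}(p_{\phi^{k+1}}, p_{\phi_T})
  &\le \hat{D}_{\varphi}(p_{\phi^{k+1}}, p_{\phi_T}) + D_{\psi}(\phi^{k+1}, \phi^{k}) \\
  &\le \hat{D}_{\varphi}(p_{\phi^{k}}, p_{\phi_T}) + D_{\psi}(\phi^{k}, \phi^{k})
   = \hat{D}_{\varphi}(p_{\phi^{k}}, p_{\phi_T}).
\end{aligned}
\]
```

The first inequality uses $D_{\psi} \ge 0$, the second uses the definition of $\phi^{k+1}$ as a minimizer in (11). By induction, $\hat{D}_{\varphi}(p_{\phi^{k}}, p_{\phi_T}) \le \hat{D}_{\varphi}(p_{\phi^{0}}, p_{\phi_T})$ for every $k$, so under these assumptions all iterates remain in $\Phi^{0}$.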
Recall also that we suppose that $h_i(x|\phi) > 0$, $dx$-a.e. We relax the convexity assumption on the function $\psi$. We only suppose that $\psi$ is nonnegative and that $\psi(t) = 0$ iff $t = 1$. In addition, $\psi'(t) = 0$ if $t = 1$.

Continuity and differentiability assumptions on the function $\phi \mapsto \hat{D}_{\varphi}(p_{\phi}, p_{\phi_T})$ for the case of (3) can be easily checked using Lebesgue theorems. The continuity assumption for the case of (2) can be checked using Theorem 1.17 or Corollary 10.14 in [13]. Differentiability can also be checked using Corollary 10.14 or Theorem 10.31 in the same book. As concerns $D_{\psi}$, continuity and differentiability can be obtained merely by fulfilling the conditions of Lebesgue theorems. When working with mixture models, we only need the continuity and differentiability of $\psi$ and of the functions $h_i$. The latter is easily deduced from regularity assumptions on the model. For assumption A2, there is no universal method; see Section 4.2 for an example. Assumption A3 can be checked using Lemma 2 in [2].
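To make iteration (11) and the proximal term (10) concrete, here is a minimal Python sketch on a toy problem. Everything specific in it is an assumption for illustration and not taken from the paper: the model is a two-component Gaussian mixture with known means 0 and 3, unit variances, and an unknown weight `lam` playing the role of $\phi$; the label space is $\{0, 1\}$, so the integral in (10) becomes a finite sum; $\psi(t) = (t-1)^2/2$ is one possible choice satisfying the conditions above; the average negative log-likelihood stands in for $\hat{D}_{\varphi}$; and the arginf is approximated by a grid search. All function names are hypothetical.

```python
import math
import random


def normal_pdf(y, mean):
    """Density of N(mean, 1) evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2) / math.sqrt(2.0 * math.pi)


def posterior(y, lam):
    """Toy h_i(x | lam): conditional probabilities of labels x = 0, 1 given y."""
    w0 = (1.0 - lam) * normal_pdf(y, 0.0)
    w1 = lam * normal_pdf(y, 3.0)
    total = w0 + w1
    return (w0 / total, w1 / total)


def psi(t):
    """Assumed choice: nonnegative, psi(1) = 0 and psi'(1) = 0."""
    return 0.5 * (t - 1.0) ** 2


def proximal_term(lam, lam_k, data):
    """D_psi(lam, lam_k) as in (10), with the integral replaced by a sum over labels."""
    total = 0.0
    for y in data:
        h_new = posterior(y, lam)
        h_old = posterior(y, lam_k)
        total += sum(psi(hn / ho) * ho for hn, ho in zip(h_new, h_old))
    return total / len(data)


def objective(lam, data):
    """Stand-in for D-hat (assumption): average negative log-likelihood."""
    log_lik = sum(
        math.log((1.0 - lam) * normal_pdf(y, 0.0) + lam * normal_pdf(y, 3.0))
        for y in data
    )
    return -log_lik / len(data)


def proximal_step(lam_k, data, grid):
    """One step of (11): minimize objective + proximal term over the grid."""
    return min(
        grid,
        key=lambda lam: objective(lam, data) + proximal_term(lam, lam_k, data),
    )


def run(n_iter=15, seed=0):
    """Run the iteration from a grid point and record (lam, objective) pairs."""
    rng = random.Random(seed)
    data = [
        rng.gauss(3.0, 1.0) if rng.random() < 0.3 else rng.gauss(0.0, 1.0)
        for _ in range(200)
    ]
    grid = [i / 100.0 for i in range(1, 100)]  # candidate weights 0.01 .. 0.99
    lam = grid[79]  # initialization lam^0 = 0.8, chosen on the grid
    history = [(lam, objective(lam, data))]
    for _ in range(n_iter):
        lam = proximal_step(lam, data, grid)
        history.append((lam, objective(lam, data)))
    return history


if __name__ == "__main__":
    for k, (lam, value) in enumerate(run()):
        print(k, lam, value)
```

Because the starting point lies on the grid, the current iterate is always among the candidates, so the recorded objective values cannot increase from one step to the next. The grid search is only a convenience for this one-dimensional toy case; a real implementation would use a proper optimizer over $\Phi$.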

Title: Differential Geometrical Theory of Statistics
Authors: Frédéric Barbaresco; Frank Nielsen
Publisher: MDPI
Place: Basel
Date: 2017
Language: English
License: CC BY-NC-ND 4.0
ISBN: 978-3-03842-425-3
Dimensions: 17.0 x 24.4 cm
Pages: 476
Keywords: Entropy, Coding Theory, Maximum entropy, Information geometry, Computational Information Geometry, Hessian Geometry, Divergence Geometry, Information topology, Cohomology, Shape Space, Statistical physics, Thermodynamics
Categories: Natural Sciences, Physics