Page 257 in Differential Geometrical Theory of Statistics
Entropy 2016, 18, 277
Using the fact that the first term in $\hat{D}_{\varphi}(p_{\phi}, p_{\phi_T})$ does not depend on $\phi$, and hence does not affect the arginf defining $\phi^{k+1}$, we easily obtain (7). The same applies to the case of (3). For notational simplicity, from now on we redefine $D_{\psi}$ with a normalization by $n$, i.e.,
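$$
D_{\psi}(\phi, \phi^k) = \frac{1}{n} \sum_{i=1}^{n} \int_{\mathcal{X}} \psi\left( \frac{h_i(x \mid \phi)}{h_i(x \mid \phi^k)} \right) h_i(x \mid \phi^k)\, dx. \qquad (10)
$$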
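To make the normalized term (10) concrete, the Python sketch below evaluates it when the latent variable $x$ takes finitely many values, so that $dx$ is the counting measure and the integral becomes a sum. The array layout (`h_phi[i][x]` for $h_i(x \mid \phi)$), the helper names `d_psi` and `psi_kl`, and the choice $\psi(t) = -\log t + t - 1$ are assumptions made for illustration only.

```python
import math
from typing import Callable, Sequence


def d_psi(h_phi: Sequence[Sequence[float]],
          h_phik: Sequence[Sequence[float]],
          psi: Callable[[float], float]) -> float:
    """Discrete analogue of Eq. (10).

    h_phi[i][x]  stands for h_i(x | phi)    (hypothetical layout)
    h_phik[i][x] stands for h_i(x | phi^k)
    The integral over x is replaced by a sum (counting measure).
    """
    n = len(h_phi)
    if n == 0 or n != len(h_phik):
        raise ValueError("h_phi and h_phik must have the same nonzero length")
    total = 0.0
    for row, row_k in zip(h_phi, h_phik):
        # h_i(x | .) > 0 is assumed, as in the text, so the ratio is defined
        total += sum(psi(a / b) * b for a, b in zip(row, row_k))
    return total / n  # normalization by n


def psi_kl(t: float) -> float:
    """psi(t) = -log t + t - 1: nonnegative, zero only at t = 1."""
    return -math.log(t) + t - 1.0


h_k = [[0.2, 0.8], [0.6, 0.4]]
h_new = [[0.5, 0.5], [0.3, 0.7]]
print(d_psi(h_k, h_k, psi_kl))    # identical arguments give 0.0
print(d_psi(h_new, h_k, psi_kl))  # strictly positive
```

When every row sums to one, this choice of $\psi$ turns each summand into $\sum_x h_i(x \mid \phi^k) \log\big(h_i(x \mid \phi^k) / h_i(x \mid \phi)\big)$, a Kullback-Leibler divergence, so the sketch returns their average over $i$.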
Hence, our set of algorithms is redefined by:
$$
\phi^{k+1} = \arg\inf_{\phi} \left\{ \hat{D}_{\varphi}(p_{\phi}, p_{\phi_T}) + D_{\psi}(\phi, \phi^k) \right\}. \qquad (11)
$$
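A generic version of iteration (11) can be sketched in Python as follows. This is a minimal illustration under strong assumptions: a scalar parameter, a hypothetical quadratic `div` standing in for $\hat{D}_{\varphi}(p_{\phi}, p_{\phi_T})$, a hypothetical quadratic `prox` standing in for $D_{\psi}(\phi, \phi^k)$, and a golden-section search (which requires a unimodal inner objective) as the inner solver. In practice the inner arginf would be computed with whatever optimizer suits the model.

```python
import math
from typing import Callable, List


def golden_section_min(f: Callable[[float], float], lo: float, hi: float,
                       tol: float = 1e-10) -> float:
    """Approximate minimizer of a unimodal f on [lo, hi]."""
    r = (math.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio, about 0.618
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:   # minimizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = f(c)
        else:         # minimizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = f(d)
    return (a + b) / 2.0


def proximal_point(div: Callable[[float], float],
                   prox: Callable[[float, float], float],
                   phi0: float, lo: float, hi: float,
                   n_iter: int = 50) -> List[float]:
    """Iterates phi^{k+1} = arginf_phi div(phi) + prox(phi, phi^k), cf. (11)."""
    path = [phi0]
    for _ in range(n_iter):
        phik = path[-1]
        # inner problem: minimize the penalized objective around the current iterate
        path.append(golden_section_min(lambda p, pk=phik: div(p) + prox(p, pk), lo, hi))
    return path


# Hypothetical stand-ins, chosen only so that the example is easy to check.
def div(p: float) -> float:
    return (p - 2.0) ** 2


def prox(p: float, pk: float) -> float:
    return (p - pk) ** 2


path = proximal_point(div, prox, phi0=-1.0, lo=-10.0, hi=10.0)
print(path[:4])  # approximately [-1.0, 0.5, 1.25, 1.625]
```

With these stand-ins the inner minimizer is $(2 + \phi^k)/2$, so each iterate halves its distance to the minimizer $\phi = 2$ of `div`, and `div(path[k])` never increases, in line with the decrease property stated for (11).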
We will see later that this iteration forces the divergence to decrease and that, under suitable conditions, it converges to a (local) minimum of $\hat{D}_{\varphi}(p_{\phi}, p_{\phi_T})$. It follows that algorithm (11) is a way to calculate both the MD$\varphi$DE (4) and the kernel-based MD$\varphi$DE (5).
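The decrease property can be previewed with a one-line argument, assuming the arginf in (11) is attained at $\phi^{k+1}$ and using only $h_i > 0$ together with the properties $\psi \ge 0$ and $\psi(1) = 0$ assumed for $\psi$ in this section. Since $\phi^k$ is itself a candidate in (11) and $D_{\psi}(\phi^k, \phi^k) = \frac{1}{n}\sum_{i=1}^{n} \int_{\mathcal{X}} \psi(1)\, h_i(x \mid \phi^k)\, dx = 0$,

$$
\hat{D}_{\varphi}(p_{\phi^{k+1}}, p_{\phi_T})
\le \hat{D}_{\varphi}(p_{\phi^{k+1}}, p_{\phi_T}) + D_{\psi}(\phi^{k+1}, \phi^k)
\le \hat{D}_{\varphi}(p_{\phi^{k}}, p_{\phi_T}) + D_{\psi}(\phi^{k}, \phi^k)
= \hat{D}_{\varphi}(p_{\phi^{k}}, p_{\phi_T}).
$$

The first inequality uses $D_{\psi} \ge 0$; the second is the definition of $\phi^{k+1}$ as the arginf in (11).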
3. Some Convergence Properties of $\phi^k$
We show here how, depending on the situation at hand, one may prove convergence of the algorithm defined by (11). Let $\phi^0$ be a given initialization, and define
$$
\Phi_0 := \left\{ \phi \in \Phi : \hat{D}_{\varphi}(p_{\phi}, p_{\phi_T}) \le \hat{D}_{\varphi}(p_{\phi^0}, p_{\phi_T}) \right\},
$$
which we suppose to be a subset of $\operatorname{int}(\Phi)$. The idea of defining this set in this context is inherited from the paper of Wu [16], which provided the first correct proof of convergence for the EM algorithm.
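For a scalar parameter, the set $\Phi_0$ and the requirement $\Phi_0 \subset \operatorname{int}(\Phi)$ can be explored numerically. The Python sketch below keeps the grid points whose objective value does not exceed the value at $\phi^0$, then tests whether they stay a given margin away from the boundary of $\Phi = (lo, hi)$. The names, the toy objective and the margin are hypothetical, and a grid can only provide evidence, not a proof, that $\Phi_0$ lies in the interior.

```python
from typing import Callable, List


def sublevel_set(div: Callable[[float], float], phi0: float,
                 grid: List[float]) -> List[float]:
    """Grid points of Phi_0 = {phi : div(phi) <= div(phi0)}."""
    level = div(phi0)
    return [p for p in grid if div(p) <= level]


def stays_interior(points: List[float], lo: float, hi: float,
                   margin: float) -> bool:
    """Heuristic: are the sampled points at least `margin` inside (lo, hi)?"""
    return bool(points) and lo + margin <= min(points) and max(points) <= hi - margin


grid = [i / 100 for i in range(1, 1000)]    # samples of Phi = (0, 10)
pts = sublevel_set(lambda p: (p - 2.0) ** 2, phi0=3.0, grid=grid)
print(min(pts), max(pts))                   # 1.0 3.0
print(stays_interior(pts, 0.0, 10.0, 0.5))  # True
```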
Before going any further, we recall the following definition of a (generalized) stationary point.
Definition 1. Let $f : \mathbb{R}^d \to \mathbb{R}$ be a real-valued function. If $f$ is differentiable at a point $\phi^*$ such that $\nabla f(\phi^*) = 0$, we then say that $\phi^*$ is a stationary point of $f$. If $f$ is not differentiable at $\phi^*$ but the subgradient of $f$ at $\phi^*$, say $\partial f(\phi^*)$, exists and satisfies $0 \in \partial f(\phi^*)$, then $\phi^*$ is called a generalized stationary point of $f$.
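A standard one-dimensional illustration of Definition 1 (with $d = 1$), chosen here for simplicity:

$$
f(\phi) = \phi^2: \quad \nabla f(0) = 0 \;\Rightarrow\; \phi^* = 0 \text{ is a stationary point;}
$$

$$
f(\phi) = |\phi|: \quad f \text{ is not differentiable at } 0, \quad \partial f(0) = [-1, 1] \ni 0 \;\Rightarrow\; \phi^* = 0 \text{ is a generalized stationary point.}
$$

Since $|\phi|$ is convex, its subgradient at $0$ coincides with the usual convex subdifferential.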
Remark 1. Throughout the paper, the subgradient is defined for any function, not necessarily convex (see Definition 8.3 in [13] for more details).
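We will be using the following assumptions:
A0. The functions $\phi \mapsto \hat{D}_{\varphi}(p_{\phi}, p_{\phi_T})$ and $D_{\psi}$ are lower semicontinuous;
A1. The functions $\phi \mapsto \hat{D}_{\varphi}(p_{\phi}, p_{\phi_T})$, $D_{\psi}$ and $\nabla_1 D_{\psi}$ (the gradient of $D_{\psi}$ with respect to its first argument) are defined and continuous on, respectively, $\Phi$, $\Phi \times \Phi$ and $\Phi \times \Phi$;
AC. The function $\phi \mapsto \nabla \hat{D}_{\varphi}(p_{\phi}, p_{\phi_T})$ is defined and continuous on $\Phi$;
A2. $\Phi_0$ is a compact subset of $\operatorname{int}(\Phi)$;
A3. $D_{\psi}(\phi, \bar{\phi}) > 0$ for all $\bar{\phi} \neq \phi \in \Phi$.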
Recall also that we suppose that $h_i(x \mid \phi) > 0$, $dx$-a.e. We relax the convexity assumption on the function $\psi$: we only suppose that $\psi$ is nonnegative and that $\psi(t) = 0$ iff $t = 1$. In addition, $\psi'(t) = 0$ if $t = 1$.
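These conditions on $\psi$ can be spot-checked numerically. The Python sketch below tests, on a finite grid of $t > 0$, that $\psi \ge 0$, that $\psi$ vanishes only at $t = 1$, and that a central-difference estimate of $\psi'(1)$ is close to zero; a separate helper tests discrete convexity. The candidate functions, grid, tolerances and helper names are illustrative assumptions, not choices taken from the paper. The nonconvex candidate $1 - e^{-(t-1)^2}$ passes all three required checks, which is exactly what relaxing convexity permits.

```python
import math
from typing import Callable, Dict, Sequence

GRID = [i / 100 for i in range(1, 501)]  # t in (0, 5]


def check_psi(psi: Callable[[float], float], grid: Sequence[float],
              h: float = 1e-6, tol: float = 1e-8) -> Dict[str, bool]:
    """Grid-based checks of: psi >= 0, psi(t) = 0 iff t = 1, psi'(1) = 0."""
    nonnegative = all(psi(t) >= 0.0 for t in grid)
    zero_only_at_1 = abs(psi(1.0)) <= tol and all(
        psi(t) > tol for t in grid if abs(t - 1.0) > 1e-3)
    slope_at_1 = (psi(1.0 + h) - psi(1.0 - h)) / (2.0 * h)  # central difference
    return {"nonnegative": nonnegative,
            "zero_only_at_1": zero_only_at_1,
            "flat_at_1": abs(slope_at_1) <= 1e-4}


def convex_on_grid(psi: Callable[[float], float], grid: Sequence[float]) -> bool:
    """Discrete convexity test via second differences on consecutive points."""
    return all(psi(grid[j - 1]) - 2.0 * psi(grid[j]) + psi(grid[j + 1]) >= -1e-12
               for j in range(1, len(grid) - 1))


candidates = {
    "-log t + t - 1": lambda t: -math.log(t) + t - 1.0,
    "(t - 1)^2 / 2": lambda t: 0.5 * (t - 1.0) ** 2,
    "1 - exp(-(t - 1)^2)": lambda t: 1.0 - math.exp(-(t - 1.0) ** 2),
    "t - 1": lambda t: t - 1.0,  # fails: takes negative values
}
for name, psi in candidates.items():
    print(name, check_psi(psi, GRID), "convex:", convex_on_grid(psi, GRID))
```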
The continuity and differentiability assumptions on the function $\phi \mapsto \hat{D}_{\varphi}(p_{\phi}, p_{\phi_T})$ for the case of (3) can easily be checked using Lebesgue's theorems. The continuity assumption for the case of (2) can be checked using Theorem 1.17 or Corollary 10.14 in [13]. Differentiability can also be checked using Corollary 10.14 or Theorem 10.31 in the same book. As for $D_{\psi}$, continuity and differentiability can be obtained merely by fulfilling the conditions of Lebesgue's theorems. When working with mixture models, we only need the continuity and differentiability of $\psi$ and of the functions $h_i$. The latter is easily deduced from regularity assumptions on the model. For assumption A2, there is no universal method; see Section 4.2 for an example. Assumption A3 can be checked using Lemma 2 in [2].
Differential Geometrical Theory of Statistics
- Title: Differential Geometrical Theory of Statistics
- Authors: Frédéric Barbaresco, Frank Nielsen
- Publisher: MDPI
- Place: Basel
- Date: 2017
- Language: English
- License: CC BY-NC-ND 4.0
- ISBN: 978-3-03842-425-3
- Dimensions: 17.0 x 24.4 cm
- Pages: 476
- Keywords: Entropy, Coding Theory, Maximum entropy, Information geometry, Computational Information Geometry, Hessian Geometry, Divergence Geometry, Information topology, Cohomology, Shape Space, Statistical physics, Thermodynamics
- Categories: Natural Sciences, Physics