Page 263 in Differential Geometrical Theory of Statistics
Entropy 2016, 18, 277
The assumptions of Theorem 3.2.4 from [18] are now verified. Thus, following the same lines as the proof of this theorem (and inverting all inequalities, since we are minimizing instead of maximizing), we may prove that $\phi^\infty$ is a global infimum of the estimated divergence, that is
$$\hat{D}_\varphi(p_{\phi^\infty},p_{\phi_T}) \le \hat{D}_\varphi(p_\phi,p_{\phi_T}), \quad \forall \phi\in\Phi.$$
The problem with this approach is that it depends heavily on the supremum at each step of the algorithm being calculated exactly. This does not happen in general unless the function $\hat{D}_\varphi(p_\phi,p_{\phi_T})+\beta_k D_\psi(\phi,\phi^k)$ is convex, or unless we have at our disposal an algorithm that can solve nonconvex optimization problems exactly. (In the latter case, there is no point in applying an iterative proximal algorithm; we would apply the optimization algorithm directly to the objective function $\hat{D}_\varphi(p_\phi,p_{\phi_T})$.) Although our approach uses a similar assumption to prove that $\hat{D}_\varphi(p_\phi,p_{\phi_T})$ decreases from one iteration to the next, we can replace the computation of the infimum in (11) by two requirements. At each step, we require a local infimum of $\hat{D}_\varphi(p_\phi,p_{\phi_T})+D_\psi(\phi,\phi^k)$ at which the value of $\phi\mapsto\hat{D}_\varphi(p_\phi,p_{\phi_T})$ is less than its value at the previous term $\phi^k$ of the sequence. If no local minimum satisfying this condition can be found, the procedure stops with $\phi^{k+1}=\phi^k$. All the proofs presented in this paper then remain valid without change.
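The relaxed step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the names `local_minimize`, `proximal_descent`, `toy_objective` and `squared_distance` are hypothetical, the proximity term is a plain squared Euclidean distance standing in for $D_\psi$, the weight `beta` is held constant, and the local search is a crude pattern search chosen only to keep the example self-contained.

```python
from typing import Callable, List

Vector = List[float]


def local_minimize(f: Callable[[Vector], float], x0: Vector,
                   step: float = 0.1, tol: float = 1e-8,
                   max_iter: int = 10_000) -> Vector:
    """Crude derivative-free pattern search; returns a local minimizer near x0."""
    x, fx, h = list(x0), f(x0), step
    for _ in range(max_iter):
        improved = False
        for i in range(len(x)):
            for delta in (h, -h):
                trial = list(x)
                trial[i] += delta
                ft = f(trial)
                if ft < fx:
                    x, fx, improved = trial, ft, True
        if not improved:
            h /= 2.0          # refine the mesh when no neighbor improves
            if h < tol:
                break
    return x


def proximal_descent(objective: Callable[[Vector], float],
                     proximity: Callable[[Vector, Vector], float],
                     phi0: Vector, beta: float = 1.0,
                     max_steps: int = 100, tol: float = 1e-12) -> List[Vector]:
    """Iterate local proximal steps, accepting only those that lower the objective."""
    phi = list(phi0)
    path = [phi]
    for _ in range(max_steps):
        def penalized(p: Vector, center: Vector = phi) -> float:
            return objective(p) + beta * proximity(p, center)

        candidate = local_minimize(penalized, phi)
        if objective(candidate) < objective(phi) - tol:
            phi = candidate   # accepted: the objective strictly decreased
            path.append(phi)
        else:
            break             # no admissible local minimum: phi^{k+1} = phi^k
    return path


def toy_objective(p: Vector) -> float:
    """Nonconvex toy function (assumption), standing in for the estimated divergence."""
    x = p[0]
    return (x * x - 1.0) ** 2 + 0.3 * x


def squared_distance(p: Vector, q: Vector) -> float:
    """Squared Euclidean distance (assumption), standing in for D_psi."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


if __name__ == "__main__":
    path = proximal_descent(toy_objective, squared_distance, [2.0])
    print(len(path), path[-1])
```

Starting from 2.0, the iterates settle in the nearest well of the toy function rather than at its global minimum, which is consistent with the text: only a local infimum with a decreasing objective value is required at each step.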
4.2. The Two-Component Gaussian Mixture
We suppose that the model $(p_\phi)_{\phi\in\Phi}$ is a mixture of two Gaussian densities, and that we are only interested in estimating the means $\mu=(\mu_1,\mu_2)\in\mathbb{R}^2$ and the proportion $\lambda\in[\eta,1-\eta]$. The role of $\eta$ is to prevent either of the two components from vanishing, and to keep the hypothesis $h_i(x|\phi)>0$ for $x=1,2$ verified. We also suppose that the component variances are equal to one ($\sigma_i=1$). The model takes the form
$$p_{\lambda,\mu}(x)=\frac{\lambda}{\sqrt{2\pi}}\,e^{-\frac{1}{2}(x-\mu_1)^2}+\frac{1-\lambda}{\sqrt{2\pi}}\,e^{-\frac{1}{2}(x-\mu_2)^2}. \tag{17}$$
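As a quick numerical reading of (17), the density can be evaluated directly. The sketch below is illustrative only; the function name `mixture_pdf` and its argument order are assumptions, not notation from the paper.

```python
import math


def mixture_pdf(x: float, lam: float, mu1: float, mu2: float) -> float:
    """Two-component Gaussian mixture density (17) with unit variances."""
    norm = 1.0 / math.sqrt(2.0 * math.pi)
    first = lam * norm * math.exp(-0.5 * (x - mu1) ** 2)
    second = (1.0 - lam) * norm * math.exp(-0.5 * (x - mu2) ** 2)
    return first + second


if __name__ == "__main__":
    # Riemann sum over [-15, 15]; it should be very close to 1 for moderate means.
    grid_step = 0.01
    total = sum(mixture_pdf(-15.0 + grid_step * i, 0.3, -1.0, 2.0)
                for i in range(3001)) * grid_step
    print(total)
```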
For this model, $\Phi=[\eta,1-\eta]\times\mathbb{R}^2$. The regularization term $D_\psi$ is defined by (8), where:
$$h_i(1|\phi)=\frac{\lambda e^{-\frac{1}{2}(y_i-\mu_1)^2}}{\lambda e^{-\frac{1}{2}(y_i-\mu_1)^2}+(1-\lambda)e^{-\frac{1}{2}(y_i-\mu_2)^2}},\qquad h_i(2|\phi)=1-h_i(1|\phi).$$
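The weights $h_i(1|\phi)$ and $h_i(2|\phi)$ can be computed for a single observation $y_i$ as follows. This is a minimal sketch: the name `component_weights` is hypothetical, and the log-domain evaluation is a numerical-stability choice made here, not something the text prescribes. It assumes $0<\lambda<1$, which the constraint $\lambda\in[\eta,1-\eta]$ with $\eta>0$ guarantees.

```python
import math
from typing import Tuple


def component_weights(y: float, lam: float,
                      mu1: float, mu2: float) -> Tuple[float, float]:
    """Return (h(1|phi), h(2|phi)) for one observation y, assuming 0 < lam < 1."""
    log_a = math.log(lam) - 0.5 * (y - mu1) ** 2        # log of the numerator
    log_b = math.log(1.0 - lam) - 0.5 * (y - mu2) ** 2  # log of the other term
    diff = log_b - log_a
    # h1 = a / (a + b) = 1 / (1 + exp(diff)), written to avoid overflow in exp.
    if diff > 0.0:
        e = math.exp(-diff)
        h1 = e / (1.0 + e)
    else:
        h1 = 1.0 / (1.0 + math.exp(diff))
    return h1, 1.0 - h1


if __name__ == "__main__":
    print(component_weights(0.0, 0.5, -1.0, 1.0))  # symmetric case
```

In exact arithmetic both weights are strictly positive; in floating point one of them can round to zero for observations very far from both means, so a practical implementation may need to guard the evaluation of $D_\psi$ against that case.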
The functions $h_i$ are clearly of class $C^1(\mathrm{int}(\Phi))$, and so is $D_\psi$. We prove that $\Phi^0$ is closed and bounded, which is sufficient to conclude that it is compact, since the space $[\eta,1-\eta]\times\mathbb{R}^2$ endowed with the Euclidean distance is complete.
If we are using the dual estimator of the $\varphi$-divergence given by (2), then assumption A0 can be verified using the maximum theorem of Berge [19]. There is still a great difficulty in studying the properties (closedness or compactness) of the set $\Phi^0$. Moreover, all convergence properties of the sequence $\phi^k$ require the continuity of the estimated $\varphi$-divergence $\hat{D}_\varphi(p_\phi,p_{\phi_T})$ with respect to $\phi$. In order to prove the continuity of the estimated divergence, we need to assume that $\Phi$ is compact, i.e., that the means lie in an interval of the form $[\mu_{\min},\mu_{\max}]$. Now, using Theorem 10.31 from [13], $\phi\mapsto\hat{D}_\varphi(p_\phi,p_{\phi_T})$ is continuous and differentiable almost everywhere with respect to $\phi$.
The compactness of $\Phi$ directly implies the compactness of $\Phi^0$. Indeed,
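$$\Phi^0=\left\{\phi\in\Phi,\ \hat{D}_\varphi(p_\phi,p_{\phi_T})\le\hat{D}_\varphi(p_{\phi^0},p_{\phi_T})\right\}=\hat{D}_\varphi(p_\phi,p_{\phi_T})^{-1}\left(\left(-\infty,\hat{D}_\varphi(p_{\phi^0},p_{\phi_T})\right]\right).$$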
$\Phi^0$ is then the inverse image of a closed set by a continuous function, so it is closed in $\Phi$. Being a closed subset of the compact set $\Phi$, it is compact.
Differential Geometrical Theory of Statistics
- Title
- Differential Geometrical Theory of Statistics
- Authors
- Frédéric Barbaresco
- Frank Nielsen
- Publisher
- MDPI
- Place
- Basel
- Date
- 2017
- Language
- English
- License
- CC BY-NC-ND 4.0
- ISBN
- 978-3-03842-425-3
- Dimensions
- 17.0 x 24.4 cm
- Pages
- 476
- Keywords
- Entropy, Coding Theory, Maximum entropy, Information geometry, Computational Information Geometry, Hessian Geometry, Divergence Geometry, Information topology, Cohomology, Shape Space, Statistical physics, Thermodynamics
- Categories
- Natural Sciences, Physics