Page 264 of Differential Geometrical Theory of Statistics
Entropy 2016, 18, 277
Conclusion 1. Using Propositions 4 and 1, if $\Phi = [\eta, 1-\eta] \times [\mu_{\min}, \mu_{\max}]^2$, the sequence $(\hat{D}_{\phi}(p_{\varphi^k}, p_{\varphi_T}))_k$ defined through Formula (2) converges, and there exists a subsequence $(\varphi^{N(k)})$ which converges to a stationary point of the estimated divergence. Moreover, every limit point of the sequence $(\varphi^k)_k$ is a stationary point of the estimated divergence.
If we use the kernel-based dual estimator given by (3) with a Gaussian kernel density estimator, then the function $\varphi \mapsto \hat{D}_{\phi}(p_{\varphi}, p_{\varphi_T})$ is continuously differentiable over $\Phi$ even if the means $\mu_1$ and $\mu_2$ are not bounded. For example, take $\phi = \phi_\gamma$ defined by (1). There is one condition, which relates the window of the kernel, say $w$, to the value of $\gamma$. Indeed, using Formula (3), we can write
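$$\hat{D}_{\phi}(p_{\varphi}, p_{\varphi_T}) = \frac{1}{\gamma-1} \int \frac{p_{\varphi}^{\gamma}}{K_{n,w}^{\gamma-1}}(y)\,dy \;-\; \frac{1}{\gamma n} \sum_{i=1}^{n} \frac{p_{\varphi}^{\gamma}}{K_{n,w}^{\gamma}}(y_i) \;-\; \frac{1}{\gamma(\gamma-1)}.$$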
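The expression above can be evaluated numerically. The following is a minimal sketch, not the authors' implementation: it assumes a two-component Gaussian mixture with unit variances, a Gaussian kernel with the usual $1/\sqrt{2\pi}$ normalization, and a trapezoidal rule on a finite grid in place of the integral over the real line. The names `mixture_pdf`, `kernel_density`, `dual_power_divergence`, the simulated sample and the grid bounds are all illustrative assumptions.

```python
import numpy as np

def mixture_pdf(y, lam, mu1, mu2):
    """Density of lam*N(mu1, 1) + (1 - lam)*N(mu2, 1) at the points y."""
    y = np.asarray(y, dtype=float)
    c = 1.0 / np.sqrt(2.0 * np.pi)
    return c * (lam * np.exp(-0.5 * (y - mu1) ** 2)
                + (1.0 - lam) * np.exp(-0.5 * (y - mu2) ** 2))

def kernel_density(y, sample, w):
    """Gaussian kernel density estimate K_{n,w} at the points y (window w)."""
    z = (np.asarray(y, dtype=float)[:, None] - sample[None, :]) / w
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (sample.size * w * np.sqrt(2.0 * np.pi))

def dual_power_divergence(params, sample, w, gamma, grid):
    """Kernel-based dual estimate for the power divergence with index gamma.

    Assumption: the integral is approximated by a trapezoidal rule on `grid`,
    which must cover the region where the integrand is non-negligible.
    """
    lam, mu1, mu2 = params
    # Integral term: p^gamma / K^(gamma - 1) integrated over the grid.
    ratio = (mixture_pdf(grid, lam, mu1, mu2) ** gamma
             / kernel_density(grid, sample, w) ** (gamma - 1.0))
    integral = np.sum(0.5 * (ratio[1:] + ratio[:-1]) * np.diff(grid))
    # Empirical term: average of (p / K)^gamma over the observations.
    obs = (mixture_pdf(sample, lam, mu1, mu2)
           / kernel_density(sample, sample, w)) ** gamma
    return (integral / (gamma - 1.0)
            - obs.mean() / gamma
            - 1.0 / (gamma * (gamma - 1.0)))

# Illustrative data: 500 draws from 0.35*N(-2, 1) + 0.65*N(1.5, 1).
rng = np.random.default_rng(0)
n = 500
from_first = rng.random(n) < 0.35
sample = np.where(from_first, rng.normal(-2.0, 1.0, n), rng.normal(1.5, 1.0, n))
grid = np.linspace(-10.0, 10.0, 4001)

# Hellinger case (gamma = 1/2): compare the generating parameters with a distant guess.
d_true = dual_power_divergence((0.35, -2.0, 1.5), sample, w=0.3, gamma=0.5, grid=grid)
d_far = dual_power_divergence((0.5, 3.0, 5.0), sample, w=0.3, gamma=0.5, grid=grid)
```

In this sketch the estimate at the generating parameters should come out well below the estimate at the distant guess. For $\gamma > 1$ the grid needs more care, because the kernel estimate in the denominator can underflow far from the data.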
In order to study the continuity and the differentiability of the estimated divergence with respect to $\varphi$, it suffices to study the integral term. We have
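$$\frac{p_{\varphi}^{\gamma}}{K_{n,w}^{\gamma-1}}(y) = \frac{\left(\dfrac{\lambda}{\sqrt{2\pi}} \exp\left[-\dfrac{1}{2}(y-\mu_1)^2\right] + \dfrac{1-\lambda}{\sqrt{2\pi}} \exp\left[-\dfrac{1}{2}(y-\mu_2)^2\right]\right)^{\gamma}}{\left(\dfrac{1}{nw} \displaystyle\sum_{i=1}^{n} \exp\left[-\dfrac{(y-y_i)^2}{2w^2}\right]\right)^{\gamma-1}}.$$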
The dominating term at infinity in the numerator is $\exp(-\gamma y^2/2)$, whereas it is $\exp(-(\gamma-1)y^2/(2w^2))$ in the denominator. For the integrand to be bounded by an integrable function independently of $\varphi = (\lambda, \mu)$, it suffices that $-\gamma + (\gamma-1)/w^2 < 0$. That is, $-\gamma w^2 + \gamma - 1 < 0$, which is equivalent to $\gamma(w^2-1) > -1$. This argument also holds if we differentiate the integrand with respect to $\lambda$ or either of the means $\mu_1$ or $\mu_2$. For $\gamma = 2$ (Pearson's $\chi^2$), we need $w^2 > 1/2$. For $\gamma = 1/2$ (the Hellinger divergence), there is no condition on $w$.
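The window condition is easy to check for a given pair $(\gamma, w)$. The helper below is a small sketch that simply evaluates the inequality $-\gamma + (\gamma-1)/w^2 < 0$ stated in the text; the function name `window_condition_holds` is a hypothetical choice made for illustration.

```python
def window_condition_holds(gamma: float, w: float) -> bool:
    """Return True when -gamma + (gamma - 1) / w**2 < 0 for window w > 0."""
    return -gamma + (gamma - 1.0) / w ** 2 < 0.0

# Pearson's chi-square (gamma = 2): the condition amounts to w**2 > 1/2.
print(window_condition_holds(2.0, 0.8))    # w**2 = 0.64 -> True
print(window_condition_holds(2.0, 0.6))    # w**2 = 0.36 -> False
# Hellinger (gamma = 1/2): the condition holds for any window, even a tiny one.
print(window_condition_holds(0.5, 0.05))   # True
```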
Closedness of $\Phi^0$ is proved similarly to the previous case. Boundedness, however, must be treated differently since $\Phi$ is not necessarily compact and is supposed to be $\Phi = [\eta, 1-\eta] \times \mathbb{R}^2$. For simplicity, take $\phi = \phi_\gamma$. The idea is to choose the initialization $\varphi^0$ of the proximal algorithm in such a way that $\Phi^0$ does not include unbounded values of the means. Continuity of $\varphi \mapsto \hat{D}_{\phi}(p_{\varphi}, p_{\varphi_T})$ permits calculation of the limits when either (or both) of the means tends to infinity. If both means go to infinity, then $p_{\varphi}(x) \to 0$ for all $x$. Thus, for $\gamma \in (0,\infty) \setminus \{1\}$, we have $\hat{D}_{\phi}(p_{\varphi}, p_{\varphi_T}) \to \frac{1}{\gamma(\gamma-1)}$. For $\gamma < 0$, the limit is infinity. If only one of the means tends to $\infty$, then the corresponding component vanishes from the mixture. Thus, if we choose $\varphi^0$ such that:
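$$\hat{D}_{\phi}(p_{\varphi^0}, p_{\varphi_T}) < \min\left(\frac{1}{\gamma(\gamma-1)},\ \inf_{\lambda,\mu} \hat{D}_{\phi}(p_{(\lambda,\infty,\mu)}, p_{\varphi_T})\right) \quad \text{if } \gamma \in (0,\infty) \setminus \{1\}, \tag{18}$$

$$\hat{D}_{\phi}(p_{\varphi^0}, p_{\varphi_T}) < \inf_{\lambda,\mu} \hat{D}_{\phi}(p_{(\lambda,\infty,\mu)}, p_{\varphi_T}) \quad \text{if } \gamma < 0, \tag{19}$$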
then the algorithm starts at a point of $\Phi$ whose function value is lower than the limits of $\hat{D}_{\phi}(p_{\varphi}, p_{\varphi_T})$ at infinity. By Proposition 1, the algorithm continues to decrease the value of $\hat{D}_{\phi}(p_{\varphi}, p_{\varphi_T})$ and never goes back to the limits at infinity. In addition, the definition of $\Phi^0$ permits us to conclude that if $\varphi^0$ is chosen according to conditions (18) and (19), then $\Phi^0$ is bounded. Thus, $\Phi^0$ becomes compact. Unfortunately, the value of $\inf_{\lambda,\mu} \hat{D}_{\phi}(p_{(\lambda,\infty,\mu)}, p_{\varphi_T})$ can only be calculated numerically. We will see next that in the case of the likelihood function, a similar condition will be imposed for the compactness of $\Phi^0$, and there will be no need for any numerical calculation.
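Since the infimum over $(\lambda, \mu)$ with one mean sent to infinity has to be found numerically, a crude grid search is one way to approximate it. The sketch below is an illustration under stated assumptions, not the procedure used by the authors: `d_hat` stands for any user-supplied function that evaluates the estimated divergence at a parameter triple and accepts `numpy.inf` for the first mean, and `toy_divergence` is a made-up stand-in used only to exercise the search. A finite grid only yields an upper bound on the true infimum, so in practice it would be refined or followed by a local optimizer.

```python
import numpy as np

def boundary_infimum(d_hat, lam_values, mu_values):
    """Grid approximation of inf over (lam, mu) of d_hat((lam, inf, mu)).

    d_hat is a hypothetical callable taking (lam, mu1, mu2); mu1 is set to
    +infinity so that the first component vanishes from the mixture.
    """
    best_value, best_point = np.inf, None
    for lam in lam_values:
        for mu in mu_values:
            value = d_hat((lam, np.inf, mu))
            if value < best_value:
                best_value, best_point = value, (lam, mu)
    return best_value, best_point

def toy_divergence(params):
    """Made-up smooth objective (NOT an estimated divergence), for illustration."""
    lam, mu1, mu2 = params
    return (lam - 0.3) ** 2 + 1.0 - np.exp(-0.5 * mu1 ** 2) + (mu2 - 1.0) ** 2

lam_values = np.linspace(0.1, 0.9, 9)    # lambda restricted to [eta, 1 - eta]
mu_values = np.linspace(-3.0, 3.0, 13)   # candidate values for the remaining mean
inf_value, inf_point = boundary_infimum(toy_divergence, lam_values, mu_values)
```

The resulting `inf_value` is the quantity against which the value of the estimated divergence at the initialization is compared in conditions (18) and (19).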
Conclusion 2. Using Propositions 4 and 1, under conditions (18) and (19), the sequence $(\hat{D}_{\phi}(p_{\varphi^k}, p_{\varphi_T}))_k$ defined through Formula (3) converges and there exists a subsequence $(\varphi^{N(k)})$ that converges to a stationary
Differential Geometrical Theory of Statistics
- Title
- Differential Geometrical Theory of Statistics
- Authors
- Frédéric Barbaresco
- Frank Nielsen
- Publisher
- MDPI
- Place
- Basel
- Date
- 2017
- Language
- English
- License
- CC BY-NC-ND 4.0
- ISBN
- 978-3-03842-425-3
- Dimensions
- 17.0 x 24.4 cm
- Pages
- 476
- Keywords
- Entropy, Coding Theory, Maximum entropy, Information geometry, Computational Information Geometry, Hessian Geometry, Divergence Geometry, Information topology, Cohomology, Shape Space, Statistical physics, Thermodynamics
- Categories
- Natural Sciences, Physics