log-likelihood function by an estimator of a ϕ-divergence between the true distribution of the data and the model. A ϕ-divergence in the sense of Csiszár [6] is defined in the same way as in [7] by:
$$
D_\varphi(Q, P) = \int \varphi\!\left(\frac{dQ}{dP}(y)\right) dP(y),
$$
where ϕ is a nonnegative strictly convex function. Examples of such divergences are the Kullback–Leibler (KL) divergence, the modified KL divergence, and the Hellinger distance, among others. All of these well-known divergences belong to the class of Cressie–Read functions [8], defined by
$$
\varphi_\gamma(x) = \frac{x^\gamma - \gamma x + \gamma - 1}{\gamma(\gamma - 1)} \quad \text{for } \gamma \in \mathbb{R} \setminus \{0, 1\}, \tag{1}
$$
and correspond to $\gamma = 1, 0, \tfrac{1}{2}$, respectively. For $\gamma \in \{0, 1\}$, the limit is calculated, and we denote $\varphi_0(x) = -\log x + x - 1$ for the case of the modified KL and $\varphi_1(x) = x \log x - x + 1$ for the KL.
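As a quick consistency check (this verification is ours, not part of the original text), substituting $\varphi_1$ into the definition of $D_\varphi$ recovers the standard form of the KL divergence:
$$
D_{\varphi_1}(Q, P) = \int \left[ \frac{dQ}{dP} \log\frac{dQ}{dP} - \frac{dQ}{dP} + 1 \right] dP = \int \log\!\left(\frac{dQ}{dP}\right) dQ = \mathrm{KL}(Q, P),
$$
since $\int dQ = \int dP = 1$, so the affine terms of $\varphi_1$ integrate to zero.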
Since the ϕ-divergence calculus uses the unknown true distribution, we need to estimate it. We consider the dual estimator of the divergence, introduced independently by [9,10]. The use of this estimator is motivated by several reasons: its minimum coincides with the MLE for $\varphi(t) = -\log(t) + t - 1$ (a short verification is given after Equation (2) below); in addition, it has the same form for discrete and continuous models, and does not require any partitioning or smoothing.
Let $(P_\phi)_{\phi \in \Phi}$ be a parametric model with $\Phi \subset \mathbb{R}^d$, and denote by $\phi_T$ the true set of parameters. Let $dy$ be the Lebesgue measure defined on $\mathbb{R}$. Suppose that, for all $\phi \in \Phi$, the probability measure $P_\phi$ is absolutely continuous with respect to $dy$, and denote by $p_\phi$ the corresponding probability density. The dual estimator of the ϕ-divergence, given an $n$-sample $y_1, \dots, y_n$, is given by:
$$
\hat{D}_\varphi(p_\phi, p_{\phi_T}) = \sup_{\alpha \in \Phi} \left\{ \int \varphi'\!\left(\frac{p_\phi}{p_\alpha}\right)\!(x)\, p_\phi(x)\, dx - \frac{1}{n} \sum_{i=1}^{n} \varphi^{\#}\!\left(\frac{p_\phi}{p_\alpha}\right)\!(y_i) \right\}, \tag{2}
$$
with $\varphi^{\#}(t) = t \varphi'(t) - \varphi(t)$.
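To substantiate the claim that the minimum coincides with the MLE (our verification, computed directly from (2)): for $\varphi(t) = -\log t + t - 1$, we have $\varphi'(t) = 1 - 1/t$ and $\varphi^{\#}(t) = t\varphi'(t) - \varphi(t) = \log t$. The integral term in (2) then vanishes identically, since
$$
\int \left(1 - \frac{p_\alpha(x)}{p_\phi(x)}\right) p_\phi(x)\, dx = \int p_\phi(x)\, dx - \int p_\alpha(x)\, dx = 0,
$$
so (2) reduces to
$$
\hat{D}_\varphi(p_\phi, p_{\phi_T}) = \sup_{\alpha \in \Phi} \frac{1}{n} \sum_{i=1}^{n} \log\frac{p_\alpha(y_i)}{p_\phi(y_i)} = \frac{1}{n} \sum_{i=1}^{n} \log p_{\hat{\alpha}}(y_i) - \frac{1}{n} \sum_{i=1}^{n} \log p_\phi(y_i),
$$
where $\hat{\alpha}$ is the MLE over $\Phi$. The first term is constant in $\phi$, so minimizing over $\phi$ (as in (4) below) amounts to maximizing the log-likelihood, i.e., the minimizer is the MLE.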
Al Mohamad [11] argues that this formula works well under the model; however, when the model is misspecified, this quantity largely underestimates the divergence between the true distribution and the model, and proposes the following modification:
$$
\tilde{D}_\varphi(p_\phi, p_{\phi_T}) = \int \varphi'\!\left(\frac{p_\phi}{K_{n,w}}\right)\!(x)\, p_\phi(x)\, dx - \frac{1}{n} \sum_{i=1}^{n} \varphi^{\#}\!\left(\frac{p_\phi}{K_{n,w}}\right)\!(y_i), \tag{3}
$$
where $K_{n,w}$ is the Rosenblatt–Parzen kernel estimate with window parameter $w$. Whether it is $\hat{D}_\varphi$ or $\tilde{D}_\varphi$, the minimum dual ϕ-divergence estimator (MDϕDE) is defined as the argument of the infimum of the dual approximation:
$$
\hat{\phi}_n = \arg\inf_{\phi \in \Phi} \hat{D}_\varphi(p_\phi, p_{\phi_T}), \tag{4}
$$
$$
\tilde{\phi}_n = \arg\inf_{\phi \in \Phi} \tilde{D}_\varphi(p_\phi, p_{\phi_T}). \tag{5}
$$
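To make the two estimators concrete, here is a minimal numerical sketch (ours, not the authors' code) of (4) and (5) for a one-dimensional Gaussian location model $p_\phi = \mathcal{N}(\phi, 1)$, using the Hellinger case $\gamma = 1/2$ of (1), for which $\varphi(x) = 2(\sqrt{x} - 1)^2$, $\varphi'(x) = 2 - 2/\sqrt{x}$ and $\varphi^{\#}(x) = 2\sqrt{x} - 2$. The model, the integration grid, and the optimization bounds are illustrative assumptions, not choices made in the paper.

```python
# Minimal sketch of the classical MDphiDE (4) and kernel-based MDphiDE (5)
# for a Gaussian location model p_phi = N(phi, 1), Hellinger case gamma = 1/2.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm, gaussian_kde

def phi_prime(t):              # phi'(t) for phi(t) = 2*(sqrt(t) - 1)^2
    return 2.0 - 2.0 / np.sqrt(t)

def phi_sharp(t):              # phi#(t) = t*phi'(t) - phi(t) = 2*sqrt(t) - 2
    return 2.0 * np.sqrt(t) - 2.0

XS = np.linspace(-10.0, 10.0, 4001)   # fixed grid for the integral term
DX = XS[1] - XS[0]

def dual_inner(phi, dens, y):
    """Common form of the bracket in (2) and of (3): integral term minus
    empirical mean, with dens(x) playing the role of p_alpha or K_{n,w}."""
    p = lambda x: norm.pdf(x, loc=phi)
    integral = np.sum(phi_prime(p(XS) / dens(XS)) * p(XS)) * DX
    return integral - np.mean(phi_sharp(p(y) / dens(y)))

def classical_mdphide(y):
    """Estimator (4): inner supremum over alpha, outer infimum over phi."""
    def d_hat(phi):
        res = minimize_scalar(
            lambda a: -dual_inner(phi, lambda x: norm.pdf(x, loc=a), y),
            bounds=(-5.0, 5.0), method="bounded")
        return -res.fun
    return minimize_scalar(d_hat, bounds=(-5.0, 5.0), method="bounded").x

def kernel_mdphide(y):
    """Estimator (5): p_alpha is replaced by a fixed Rosenblatt-Parzen
    estimate, so only the outer optimization over phi remains."""
    kde = gaussian_kde(y)               # Gaussian kernel, Scott's-rule window
    dens = lambda x: np.maximum(kde(x), 1e-300)  # guard against zero density
    return minimize_scalar(lambda f: dual_inner(f, dens, y),
                           bounds=(-5.0, 5.0), method="bounded").x

rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, size=200)        # data from the "true" N(1, 1)
print(classical_mdphide(y), kernel_mdphide(y))  # both should be near 1.0
```

Note the structural difference: the inner supremum over α makes the classical estimator (4) the more expensive of the two, while the kernel-based variant (5) trades that inner optimization for the choice of the window $w$.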
Asymptotic properties and consistency of these two estimators can be found in [7,11]. Robustness properties were also studied using the influence function approach in [11,12]. The kernel-based MDϕDE (5) seems to be a better estimator than the classical MDϕDE (4), in the sense that the former is robust whereas the latter generally is not. Under the model, however, the estimator given by (4) is more efficient, especially when the true density of the data is unbounded. More investigation is needed in the context of unbounded densities, since we may use asymmetric kernels in order to improve the efficiency of the kernel-based MDϕDE; see [11] for more details.
In this paper, we propose to calculate the MDϕDE using an iterative procedure based on the work of Tseng [2] on the log-likelihood function. This procedure has the form of a proximal point