log-likelihood function by an estimator of a ϕ-divergence between the true distribution of the data and the model. A ϕ-divergence in the sense of Csiszár [6] is defined in the same way as in [7] by:
$$
D_\varphi(Q, P) = \int \varphi\!\left(\frac{dQ}{dP}(y)\right) dP(y),
$$
where ϕ is a nonnegative strictly convex function. Examples of such divergences are the Kullback–Leibler (KL) divergence, the modified KL divergence, and the Hellinger distance, among others. All of these well-known divergences belong to the class of Cressie–Read functions [8], defined by
$$
\varphi_\gamma(x) = \frac{x^\gamma - \gamma x + \gamma - 1}{\gamma(\gamma - 1)} \quad \text{for } \gamma \in \mathbb{R} \setminus \{0, 1\}, \tag{1}
$$
and correspond to $\gamma = 1, 0, \tfrac{1}{2}$, respectively. For $\gamma \in \{0, 1\}$, the limit is calculated, and we denote $\varphi_0(x) = -\log x + x - 1$ for the case of the modified KL and $\varphi_1(x) = x \log x - x + 1$ for the KL.
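As a quick consistency check (this verification is ours, not part of the original text), substituting $\varphi_1$ into the definition of $D_\varphi$ recovers the standard form of the KL divergence:
$$
D_{\varphi_1}(Q, P) = \int \left[ \frac{dQ}{dP} \log\frac{dQ}{dP} - \frac{dQ}{dP} + 1 \right] dP = \int \log\!\left(\frac{dQ}{dP}\right) dQ = \mathrm{KL}(Q, P),
$$
since $\int dQ = \int dP = 1$, so the affine terms of $\varphi_1$ integrate to zero.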
Since the ϕ-divergence calculus uses the unknown true distribution, we need to estimate it. We consider the dual estimator of the divergence, introduced independently by [9,10]. The use of this estimator is motivated by several reasons: its minimum coincides with the MLE for $\varphi(t) = -\log(t) + t - 1$ (a short verification is given after Equation (2) below); in addition, it has the same form for discrete and continuous models, and does not require any partitioning or smoothing.
Let $(P_\phi)_{\phi \in \Phi}$ be a parametric model with $\Phi \subset \mathbb{R}^d$, and denote by $\phi_T$ the true set of parameters. Let $dy$ be the Lebesgue measure defined on $\mathbb{R}$. Suppose that, for all $\phi \in \Phi$, the probability measure $P_\phi$ is absolutely continuous with respect to $dy$, and denote by $p_\phi$ the corresponding probability density. The dual estimator of the ϕ-divergence, given an $n$-sample $y_1, \dots, y_n$, is given by:
$$
\hat{D}_\varphi(p_\phi, p_{\phi_T}) = \sup_{\alpha \in \Phi} \left\{ \int \varphi'\!\left(\frac{p_\phi}{p_\alpha}\right)\!(x)\, p_\phi(x)\, dx - \frac{1}{n} \sum_{i=1}^{n} \varphi^{\#}\!\left(\frac{p_\phi}{p_\alpha}\right)\!(y_i) \right\}, \tag{2}
$$
with $\varphi^{\#}(t) = t \varphi'(t) - \varphi(t)$.
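To substantiate the claim that the minimum coincides with the MLE (our verification, computed directly from (2)): for $\varphi(t) = -\log t + t - 1$, we have $\varphi'(t) = 1 - 1/t$ and $\varphi^{\#}(t) = t\varphi'(t) - \varphi(t) = \log t$. The integral term in (2) then vanishes identically, since
$$
\int \left(1 - \frac{p_\alpha(x)}{p_\phi(x)}\right) p_\phi(x)\, dx = \int p_\phi(x)\, dx - \int p_\alpha(x)\, dx = 0,
$$
so (2) reduces to
$$
\hat{D}_\varphi(p_\phi, p_{\phi_T}) = \sup_{\alpha \in \Phi} \frac{1}{n} \sum_{i=1}^{n} \log\frac{p_\alpha(y_i)}{p_\phi(y_i)} = \frac{1}{n} \sum_{i=1}^{n} \log p_{\hat{\alpha}}(y_i) - \frac{1}{n} \sum_{i=1}^{n} \log p_\phi(y_i),
$$
where $\hat{\alpha}$ is the MLE over $\Phi$. The first term is constant in $\phi$, so minimizing over $\phi$ (as in (4) below) amounts to maximizing the log-likelihood, i.e., the minimizer is the MLE.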
Al Mohamad [11] argues that this formula works well under the model; however, when the model is misspecified, this quantity largely underestimates the divergence between the true distribution and the model, and proposes the following modification:
$$
\tilde{D}_\varphi(p_\phi, p_{\phi_T}) = \int \varphi'\!\left(\frac{p_\phi}{K_{n,w}}\right)\!(x)\, p_\phi(x)\, dx - \frac{1}{n} \sum_{i=1}^{n} \varphi^{\#}\!\left(\frac{p_\phi}{K_{n,w}}\right)\!(y_i), \tag{3}
$$
where $K_{n,w}$ is the Rosenblatt–Parzen kernel estimate with window parameter $w$. Whether it is $\hat{D}_\varphi$ or $\tilde{D}_\varphi$, the minimum dual ϕ-divergence estimator (MDϕDE) is defined as the argument of the infimum of the dual approximation:
$$
\hat{\phi}_n = \arg\inf_{\phi \in \Phi} \hat{D}_\varphi(p_\phi, p_{\phi_T}), \tag{4}
$$
$$
\tilde{\phi}_n = \arg\inf_{\phi \in \Phi} \tilde{D}_\varphi(p_\phi, p_{\phi_T}). \tag{5}
$$
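To make the two estimators concrete, here is a minimal numerical sketch (ours, not the authors' code) of (4) and (5) for a one-dimensional Gaussian location model $p_\phi = \mathcal{N}(\phi, 1)$, using the Hellinger case $\gamma = 1/2$ of (1), for which $\varphi(x) = 2(\sqrt{x} - 1)^2$, $\varphi'(x) = 2 - 2/\sqrt{x}$ and $\varphi^{\#}(x) = 2\sqrt{x} - 2$. The model, the integration grid, and the optimization bounds are illustrative assumptions, not choices made in the paper.

```python
# Minimal sketch of the classical MDphiDE (4) and kernel-based MDphiDE (5)
# for a Gaussian location model p_phi = N(phi, 1), Hellinger case gamma = 1/2.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm, gaussian_kde

def phi_prime(t):              # phi'(t) for phi(t) = 2*(sqrt(t) - 1)^2
    return 2.0 - 2.0 / np.sqrt(t)

def phi_sharp(t):              # phi#(t) = t*phi'(t) - phi(t) = 2*sqrt(t) - 2
    return 2.0 * np.sqrt(t) - 2.0

XS = np.linspace(-10.0, 10.0, 4001)   # fixed grid for the integral term
DX = XS[1] - XS[0]

def dual_inner(phi, dens, y):
    """Common form of the bracket in (2) and of (3): integral term minus
    empirical mean, with dens(x) playing the role of p_alpha or K_{n,w}."""
    p = lambda x: norm.pdf(x, loc=phi)
    integral = np.sum(phi_prime(p(XS) / dens(XS)) * p(XS)) * DX
    return integral - np.mean(phi_sharp(p(y) / dens(y)))

def classical_mdphide(y):
    """Estimator (4): inner supremum over alpha, outer infimum over phi."""
    def d_hat(phi):
        res = minimize_scalar(
            lambda a: -dual_inner(phi, lambda x: norm.pdf(x, loc=a), y),
            bounds=(-5.0, 5.0), method="bounded")
        return -res.fun
    return minimize_scalar(d_hat, bounds=(-5.0, 5.0), method="bounded").x

def kernel_mdphide(y):
    """Estimator (5): p_alpha is replaced by a fixed Rosenblatt-Parzen
    estimate, so only the outer optimization over phi remains."""
    kde = gaussian_kde(y)               # Gaussian kernel, Scott's-rule window
    dens = lambda x: np.maximum(kde(x), 1e-300)  # guard against zero density
    return minimize_scalar(lambda f: dual_inner(f, dens, y),
                           bounds=(-5.0, 5.0), method="bounded").x

rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, size=200)        # data from the "true" N(1, 1)
print(classical_mdphide(y), kernel_mdphide(y))  # both should be near 1.0
```

Note the structural difference: the inner supremum over α makes the classical estimator (4) the more expensive of the two, while the kernel-based variant (5) trades that inner optimization for the choice of the window $w$.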
Asymptotic properties and consistency of these two estimators can be found in [7,11]. Robustness properties were also studied using the influence function approach in [11,12]. The kernel-based MDϕDE (5) seems to be a better estimator than the classical MDϕDE (4), in the sense that the former is robust whereas the latter generally is not. Under the model, however, the estimator given by (4) is more efficient, especially when the true density of the data is unbounded. More investigation is needed in the context of unbounded densities, since we may use asymmetric kernels in order to improve the efficiency of the kernel-based MDϕDE; see [11] for more details.
In this paper, we propose to calculate the MDϕDE using an iterative procedure based on the work of Tseng [2] on the log-likelihood function. This procedure has the form of a proximal point