A Proximal Point Algorithm for Minimum Divergence Estimators with Application to Mixture Models †
Diaa Al Mohamad * and Michel Broniatowski
Laboratoire de Statistique Théorique et Appliquée, Université Pierre et Marie Curie, 4 place Jussieu, 75005 Paris, France; michel.broniatowski@upmc.fr
* Correspondence: diaa.almohamad@gmail.com; Tel.: +33-7-62-59-17-73
† This paper is an extended version of our paper published in the 2nd Conference on Geometric Science of Information, Palaiseau, France, 28–30 October 2015.
Academic Editors: Frédéric Barbaresco and Frank Nielsen
Received: 11 June 2016; Accepted: 21 July 2016; Published: 27 July 2016
Abstract: Estimators derived from a divergence criterion such as ϕ-divergences are generally more robust than the maximum likelihood ones. We are interested in particular in the so-called minimum dual ϕ-divergence estimator (MDϕDE), an estimator built using a dual representation of ϕ-divergences. We present in this paper an iterative proximal point algorithm that permits the calculation of such an estimator. The algorithm contains by construction the well-known Expectation Maximization (EM) algorithm. Our work is based on the paper of Tseng on the likelihood function. We provide some convergence properties by adapting the ideas of Tseng. We improve Tseng's results by relaxing the identifiability condition on the proximal term, a condition which is not verified for most mixture models and is hard to verify for "non-mixture" ones. Convergence of the EM algorithm in a two-component Gaussian mixture is discussed in the spirit of our approach. Several experimental results on mixture models are provided to confirm the validity of the approach.
Keywords: ϕ-divergences; robust estimation; EM algorithm; proximal-point algorithms; mixture models
1. Introduction
The Expectation Maximization (EM) algorithm is a well-known method for calculating the maximum likelihood estimator of a model where incomplete data are considered. For example, when working with mixture models in the context of clustering, the labels or classes of the observations are unknown during the training phase. Several variants of the EM algorithm have been proposed (see [1]). Another way to look at the EM algorithm is as a proximal point problem (see [2,3]). Indeed, one may rewrite the conditional expectation of the complete log-likelihood as a sum of the log-likelihood function and a distance-like function over the conditional densities of the labels given an observation. Generally, the proximal term has a regularization effect in the sense that a proximal point algorithm is more stable and frequently outperforms classical optimization algorithms (see [4]). Chrétien and Hero [5] prove superlinear convergence of a proximal point algorithm derived from the EM algorithm. Notice that EM-type algorithms usually enjoy no more than linear convergence.
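To make the proximal point reading concrete, the EM recursion can be sketched in the spirit of [2,3] as follows; the notation here is illustrative, with ℓ denoting the incomplete-data log-likelihood and D a distance-like penalty built from the conditional densities of the labels:
\[
\theta^{(k+1)} \in \arg\max_{\theta}\Big\{\ell(\theta) - D\big(\theta,\theta^{(k)}\big)\Big\},
\qquad D(\theta,\bar{\theta}) \ge 0, \quad D(\bar{\theta},\bar{\theta}) = 0 .
\]
Since the maximizer satisfies \(\ell(\theta^{(k+1)}) - D(\theta^{(k+1)},\theta^{(k)}) \ge \ell(\theta^{(k)}) - D(\theta^{(k)},\theta^{(k)}) = \ell(\theta^{(k)})\) and \(D \ge 0\), each iteration cannot decrease the log-likelihood, which is one way to see the regularization and stability effect mentioned above.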
Taking into consideration the need for robust estimators, and the fact that the maximum likelihood estimator (MLE) is the least robust estimator among the class of divergence-type estimators that we present below, we generalize the EM algorithm (and the version of Tseng [2]) by replacing the