A Proximal Point Algorithm for Minimum Divergence Estimators with Application to Mixture Models †
Diaa Al Mohamad * and Michel Broniatowski
Laboratoire de Statistique Théorique et Appliquée, Université Pierre et Marie Curie, 4 place Jussieu, 75005 Paris, France; michel.broniatowski@upmc.fr
* Correspondence: diaa.almohamad@gmail.com; Tel.: +33-7-62-59-17-73
† This paper is an extended version of our paper published in the 2nd Conference on Geometric Science of Information, Palaiseau, France, 28–30 October 2015.
Academic Editors: Frédéric Barbaresco and Frank Nielsen
Received: 11 June 2016; Accepted: 21 July 2016; Published: 27 July 2016
Abstract: Estimators derived from a divergence criterion such as ϕ-divergences are generally more robust than the maximum likelihood ones. We are interested in particular in the so-called minimum dual ϕ-divergence estimator (MDϕDE), an estimator built using a dual representation of ϕ-divergences. We present in this paper an iterative proximal point algorithm that permits the calculation of such an estimator. The algorithm contains by construction the well-known Expectation Maximization (EM) algorithm. Our work is based on the paper of Tseng on the likelihood function. We provide some convergence properties by adapting the ideas of Tseng. We improve Tseng's results by relaxing the identifiability condition on the proximal term, a condition which is not verified for most mixture models and is hard to verify for "non-mixture" ones. Convergence of the EM algorithm in a two-component Gaussian mixture is discussed in the spirit of our approach. Several experimental results on mixture models are provided to confirm the validity of the approach.
Keywords: ϕ-divergences; robust estimation; EM algorithm; proximal-point algorithms; mixture models
1. Introduction
The Expectation Maximization (EM) algorithm is a well-known method for calculating the maximum likelihood estimator of a model where incomplete data are considered. For example, when working with mixture models in the context of clustering, the labels or classes of the observations are unknown during the training phase. Several variants of the EM algorithm have been proposed (see [1]). Another way to look at the EM algorithm is as a proximal point problem (see [2,3]). Indeed, one may rewrite the conditional expectation of the complete log-likelihood as a sum of the log-likelihood function and a distance-like function over the conditional densities of the labels given an observation. Generally, the proximal term has a regularization effect in the sense that a proximal point algorithm is more stable and frequently outperforms classical optimization algorithms (see [4]). Chrétien and Hero [5] prove superlinear convergence of a proximal point algorithm derived from the EM algorithm. Notice that EM-type algorithms usually enjoy no more than linear convergence.
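To make the proximal point reading concrete, the EM recursion can be sketched in the spirit of [2,3] as follows; the notation here is illustrative, with ℓ denoting the incomplete-data log-likelihood and D a distance-like penalty built from the conditional densities of the labels:
\[
\theta^{(k+1)} \in \arg\max_{\theta}\Big\{\ell(\theta) - D\big(\theta,\theta^{(k)}\big)\Big\},
\qquad D(\theta,\bar{\theta}) \ge 0, \quad D(\bar{\theta},\bar{\theta}) = 0 .
\]
Since the maximizer satisfies \(\ell(\theta^{(k+1)}) - D(\theta^{(k+1)},\theta^{(k)}) \ge \ell(\theta^{(k)}) - D(\theta^{(k)},\theta^{(k)}) = \ell(\theta^{(k)})\) and \(D \ge 0\), each iteration cannot decrease the log-likelihood, which is one way to see the regularization and stability effect mentioned above.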
Taking into consideration the need for robust estimators, and the fact that the maximum likelihood estimator (MLE) is the least robust estimator among the class of divergence-type estimators that we present below, we generalize the EM algorithm (and the version of Tseng [2]) by replacing the