Page 325 of Differential Geometrical Theory of Statistics
entropy
Article
The Information Geometry of Sparse
Goodness-of-Fit Testing
Paul Marriott 1,*, Radka Sabolová 2, Germain Van Bever 3 and Frank Critchley 2
1 Department of Statistics and Actuarial Science, University of Waterloo, 200 University Avenue West,
Waterloo, ON N2L 3G1, Canada
2 School of Mathematics and Statistics, The Open University, Walton Hall, Milton Keynes,
Buckinghamshire MK7 6AA, UK; radka.sabolova@open.ac.uk (R.S.); f.critchley@open.ac.uk (F.C.)
3 Department of Mathematics & ECARES, Université libre de Bruxelles, Avenue F.D. Roosevelt 42,
1050 Brussels, Belgium; gvbever@ulb.ac.be
* Correspondence: pmarriot@uwaterloo.ca; Tel.: +1-519-888-4567
Academic Editors: Frédéric Barbaresco and Frank Nielsen
Received: 31 August 2016; Accepted: 19 November 2016; Published: 24 November 2016
Abstract: This paper takes an information-geometric approach to the challenging issue of goodness-of-fit
testing in the high dimensional, low sample size context where, potentially, boundary effects dominate.
The main contributions of this paper are threefold: first, we present and prove two new theorems on
the behaviour of commonly used test statistics in this context; second, we investigate, in the novel
environment of the extended multinomial model, the links between information geometry-based
divergences and standard goodness-of-fit statistics, allowing us to formalise relationships which
have been missing in the literature; finally, we use simulation studies to validate and illustrate our
theoretical results and to explore currently open research questions about the way that discretisation
effects can dominate sampling distributions near the boundary. Novelly accommodating these
discretisation effects contrasts sharply with the essentially continuous approach of skewness and
other corrections flowing from standard higher-order asymptotic analysis.
Keywords: extended multinomial models; goodness-of-fit testing; information geometry
1. Introduction
We start by emphasising the threefold achievements of this paper, spelled out in detail in terms of
the paper's section structure below. First, we present and prove two new theorems on the behaviour
of some standard goodness-of-fit statistics in the high dimensional, low sample size context, focusing
on behaviour "near the boundary" of the extended multinomial family. We also comment on the
methods of proof which allow explicit calculations of higher order moments in this context. Second,
working again explicitly in the extended multinomial context, we fill a hole in the literature by linking
information-geometric-based divergences and standard goodness-of-fit statistics. Finally, we use
simulation studies to explore discretisation effects that can dominate sampling distributions "near
the boundary". Indeed, we illustrate and explore how, in the high dimensional, low sample size
context, all distributions are affected by boundary effects. We also use these simulation results to
explore currently open research questions. As can be seen, the overarching theme is the importance
of working in the geometry of the extended exponential family [1], rather than the traditional
manifold-based structure of information geometry.
In more detail, the paper extends and builds on the results of [2], and we use notation and
definitions consistently across these two papers. Both papers investigate the issue of goodness-of-fit
testing in the high dimensional sparse extended multinomial context, using the tools of Computational
Information Geometry (CIG) [1].
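As a rough illustration of the discretisation effects described in this introduction, the following Python sketch simulates Pearson's chi-squared statistic and the deviance for a sparse multinomial sample. It is not the authors' simulation study: the uniform null distribution, the values k = 200 and n = 20, the number of replications, and all function names are assumptions chosen purely for illustration.

```python
import math
import random
from collections import Counter


def pearson_chi2(counts, probs, n):
    """Pearson's statistic: sum over cells of (n_i - n*p_i)^2 / (n*p_i)."""
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))


def deviance(counts, probs, n):
    """Deviance: 2 * sum of n_i * log(n_i / (n*p_i)), with 0*log(0) taken as 0."""
    return 2.0 * sum(
        c * math.log(c / (n * p)) for c, p in zip(counts, probs) if c > 0
    )


def simulate(probs, n, reps, seed=0):
    """Draw `reps` multinomial samples of size n; return (chi2, deviance) pairs."""
    rng = random.Random(seed)
    cells = range(len(probs))
    results = []
    for _ in range(reps):
        tally = Counter(rng.choices(cells, weights=probs, k=n))
        counts = [tally.get(i, 0) for i in cells]
        results.append(
            (pearson_chi2(counts, probs, n), deviance(counts, probs, n))
        )
    return results


# Assumed sparse setting: many more cells (k) than observations (n).
k, n = 200, 20
probs = [1.0 / k] * k  # uniform null, chosen only to keep the sketch simple
stats = simulate(probs, n, reps=2000)

# Count how many distinct values each statistic takes across replications.
distinct_chi2 = len({round(x, 8) for x, _ in stats})
distinct_dev = len({round(d, 8) for _, d in stats})
print("distinct chi-squared values:", distinct_chi2)
print("distinct deviance values:", distinct_dev)
```

Under these assumptions most cells are empty in every sample, so each statistic depends only on how the n observations pile up in the few occupied cells. The sampling distributions therefore sit on a small set of atoms rather than resembling a continuous chi-squared law, which is the kind of behaviour the simulation studies in the paper set out to explore.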
Entropy 2016, 18, 421; www.mdpi.com/journal/entropy
Differential Geometrical Theory of Statistics
- Title: Differential Geometrical Theory of Statistics
- Authors: Frédéric Barbaresco, Frank Nielsen
- Publisher: MDPI
- Location: Basel
- Date: 2017
- Language: English
- License: CC BY-NC-ND 4.0
- ISBN: 978-3-03842-425-3
- Dimensions: 17.0 x 24.4 cm
- Pages: 476
- Keywords: Entropy, Coding Theory, Maximum entropy, Information geometry, Computational Information Geometry, Hessian Geometry, Divergence Geometry, Information topology, Cohomology, Shape Space, Statistical physics, Thermodynamics
- Categories: Natural Sciences, Physics