Page - 331 - in Differential Geometrical Theory of Statistics
Entropy 2016, 18, 421
and discrete graphical models. Testing is often used to check the consistency of a parametric model
with given data, and to check dependency assumptions such as independence between categorical
variables. However, we note an important caveat: as pointed out by [14,15], the fact that a parametric
model “passes” a goodness-of-fit test only weakly constrains the resulting inference. The essential
point here is that goodness-of-fit is a necessary, but not sufficient, condition for model choice,
since, in general, many models will be empirically supported. This issue has recently been explored
geometrically in [16] using CIG.
Many test statistics have been proposed for goodness-of-fit testing, and one of
the attractions of the Power-Divergence family, defined in (11), is that the most important ones are
included in the family and indexed by a single scalar λ. Of course, when there is a choice of test
statistic, different inferences can result from different choices. One of the main themes of [5] is to
give the analyst insight about selecting a particular λ. Key considerations for making the selection
of λ include the tractability of the sampling distribution, its power against important alternatives,
and interpretation when hypotheses are rejected.
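As a minimal sketch of how a single scalar λ indexes the family, the following Python function evaluates a power-divergence statistic for a vector of observed and expected counts. It assumes the standard Cressie-Read form 2/(λ(λ+1)) Σ O[(O/E)^λ − 1], which may differ in notation from Equation (11); the function name `power_divergence` and the example counts are illustrative only.

```python
import math

def power_divergence(observed, expected, lam):
    """Power-divergence statistic with index lam (assumed Cressie-Read form).

    Assumes sum(observed) == sum(expected); the cases lam = 0 and lam = -1
    are evaluated as limits of the general formula.
    """
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    pairs = list(zip(observed, expected))
    if lam == 0:
        # Limit lam -> 0: likelihood-ratio statistic, using 0 * log(0) = 0.
        return 2.0 * sum(o * math.log(o / e) for o, e in pairs if o > 0)
    if lam <= -1 and any(o == 0 for o in observed):
        # For lam <= -1 a zero observed count makes the statistic infinite.
        return math.inf
    if lam == -1:
        # Limit lam -> -1: modified likelihood-ratio statistic.
        return 2.0 * sum(e * math.log(e / o) for o, e in pairs)
    # O * (O/E)**lam is written as O**(lam+1) * E**(-lam), so that zero
    # observed counts are handled safely whenever lam > -1.
    total = sum(o ** (lam + 1) * e ** (-lam) - o for o, e in pairs)
    return 2.0 * total / (lam * (lam + 1))

# Illustrative (made-up) counts for four equiprobable cells.
observed = [18, 22, 30, 30]
expected = [25.0, 25.0, 25.0, 25.0]
for lam in (1, 2 / 3, 0, -0.5):
    print(lam, power_divergence(observed, expected, lam))
```

With λ = 1 the function reduces to Pearson's statistic Σ(O − E)²/E, and with λ = 0 to the likelihood-ratio statistic, so the loop shows how different members of the family can give different values on the same data.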
The first order, asymptotic in N, χ²-sampling distribution for all members of the Power-Divergence
family, which is appropriate when all observed counts are “large enough”, is the most commonly
used tool, and a very attractive feature of the family. However, this can fail badly in the “sparse” case
and when the model is close to the boundary. Elementary, moment based corrections, to improve
small sample performance, are discussed in [5] (Chapter 5). More formal asymptotic approaches to
these issues include the doubly asymptotic, in N and k, approach of [17], discussed in Section 2, and
similar normal approximation ideas in [18]. See also [19]. Extensive simulation experiments have been
undertaken to learn in practice what ‘large enough’ means, see [5,20,21].
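A small simulation of the kind referred to above can be sketched as follows. This is not the experimental design of [5,20,21]; it is an illustrative check, for Pearson's statistic only (the λ = 1 member of the family), of how often a test based on the χ² critical value rejects a true null hypothesis. The function names, the four equiprobable cells, and the sample sizes are all assumptions made for the example.

```python
import random

def pearson_statistic(observed, expected):
    """Pearson's chi-squared statistic, sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def rejection_rate(n, probs, critical_value, reps=2000, seed=0):
    """Monte Carlo estimate of the rejection probability under the null.

    Draws `reps` multinomial samples of size n with cell probabilities
    `probs` and counts how often the statistic exceeds `critical_value`.
    """
    rng = random.Random(seed)
    cells = list(range(len(probs)))
    expected = [n * p for p in probs]
    rejections = 0
    for _ in range(reps):
        counts = [0] * len(probs)
        for cell in rng.choices(cells, weights=probs, k=n):
            counts[cell] += 1
        if pearson_statistic(counts, expected) > critical_value:
            rejections += 1
    return rejections / reps

# 0.95 quantile of the chi-squared distribution with 3 degrees of freedom.
CHI2_95_DF3 = 7.815

probs = [0.25, 0.25, 0.25, 0.25]
print("N = 200:", rejection_rate(200, probs, CHI2_95_DF3))
print("N = 8:  ", rejection_rate(8, probs, CHI2_95_DF3))
```

If the χ² approximation is adequate, the estimated rate should be close to the nominal 0.05. Comparing the two sample sizes, or increasing the number of cells while holding N fixed, gives a rough sense of what “large enough” means in a particular setting.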
When there are nuisance parameters to be estimated (as is common), [22] points out that it is the
sampling distribution conditional upon these estimates which needs to be approximated, and proposes
higher order methods based on the Edgeworth expansion. Simulation approaches are often used
in the conditional context due to the common intractability of the conditional distribution [23,24],
and importance sampling methods play an important role; see [25–27]. Other approaches used to
investigate the sampling distribution include jackknifing [28], the Chen–Stein method [29], and detailed
asymptotic analysis in [30–32].
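To illustrate what simulating in the conditional context can look like, the sketch below tests independence in a two-way table, where the row and column margins play the role of the estimated nuisance parameters. It uses plain permutation sampling, which draws tables uniformly from the conditional distribution given both margins. It is not the importance sampling of [25–27], nor necessarily the method of [23,24]. The function names are hypothetical, and all margins are assumed to be positive.

```python
import random

def pearson_independence(table):
    """Pearson statistic for independence; assumes all margins are positive."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            e = r * c / n  # fitted count under independence
            stat += (table[i][j] - e) ** 2 / e
    return stat

def conditional_p_value(table, reps=2000, seed=0):
    """Monte Carlo p-value conditional on the observed margins.

    Each observation carries a row label and a column label; shuffling
    the column labels keeps both margins fixed and samples from the
    conditional null distribution under independence.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])
    row_labels = [i for i, row in enumerate(table) for _ in range(sum(row))]
    col_labels = [j for row in table for j, count in enumerate(row)
                  for _ in range(count)]
    observed = pearson_independence(table)
    hits = 0
    for _ in range(reps):
        rng.shuffle(col_labels)
        sim = [[0] * n_cols for _ in range(n_rows)]
        for i, j in zip(row_labels, col_labels):
            sim[i][j] += 1
        # Small tolerance so that ties with the observed value are counted.
        if pearson_independence(sim) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (reps + 1)

print(conditional_p_value([[12, 5], [4, 11]]))
```

Because every simulated table shares the observed margins, no unconditional reference distribution is needed. The cost is that larger or sparser tables may require many more replications, which is one reason importance sampling is attractive in this setting.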
In very high dimensional model spaces, considerations of the power of tests rarely generate
uniformly best procedures but, we feel, geometry can be an important tool in understanding the
choices that need to be made. Further, [5] states that the situation is “complicated”, showing this through
simulation experiments. One of the reasons for Read and Cressie’s preferred choice of λ = 2/3 is its
good power against some important types of alternative (the so-called bump or dip cases), as well as
the relative tractability of its sampling distribution under the null. Other considerations about power
can be found in [33], which looks specifically at mixture model based alternatives.
3.3. Links with Information Geometry
At the time that the Power-Divergence family was being examined, there was a parallel
development in Information Geometry; oddly, however, it seems to have taken some time before
the links between the two areas were fully recognised. A good treatment of these links can be
found in [6] (Chapter 9). Since it is important to understand the extreme values of divergence
functions, considerations of convexity can clearly play an important role. The general class of Bregman
divergences, [6,34] (page 240), and [35] (page 13), is very useful here. For each Bregman divergence,
there will exist affine parameters of the exponential family in which the divergence function is convex.
In the class of product Poisson models, which are the key building blocks of log-linear models, all
members of the Power-Divergence family have the Bregman property. These are then α-divergences,
capable of generating the complete Information Geometry of the model [35], with the link between α
and λ given in Table 1. The α-representation highlights the duality properties, which are a cornerstone
of Information Geometry, but which are rather hidden in the λ representation. The Bregman divergence
- Title: Differential Geometrical Theory of Statistics
- Authors: Frédéric Barbaresco, Frank Nielsen
- Publisher: MDPI
- Location: Basel
- Date: 2017
- Language: English
- License: CC BY-NC-ND 4.0
- ISBN: 978-3-03842-425-3
- Dimensions: 17.0 x 24.4 cm
- Pages: 476
- Keywords: Entropy, Coding Theory, Maximum entropy, Information geometry, Computational Information Geometry, Hessian Geometry, Divergence Geometry, Information topology, Cohomology, Shape Space, Statistical physics, Thermodynamics
- Categories: Natural Sciences, Physics