Page 394 in Differential Geometrical Theory of Statistics


Entropy 2016, 9, 337

as the cost of moving the smoothed density around $\gamma_1$ to the uniform distribution on the curve, then moving $\gamma_1$ to $\gamma_2$, keeping points with equal scaled arclength in correspondence, and finally, moving the uniform distribution on $\gamma_2$ to the smoothed density. Having the density at hand, the entropy of the system of curves $\gamma_1, \dots, \gamma_N$ is defined the usual way as:

$$E(\gamma_1, \dots, \gamma_N) = -\int_\Omega \tilde{d}(x) \log\big(\tilde{d}(x)\big)\, dx.$$

The entropy depends on the particular choice of the kernel $K$. As mentioned before, it is common practice in the field of non-parametric statistics to introduce a tuning parameter $\nu > 0$ in the kernel, called the bandwidth, so that the kernel is expressed as a scaled version $K = f_\nu$ of a given function $f : \mathbb{R}^+ \to \mathbb{R}^+$. The value of $\nu$ is the most influential parameter in the estimation of the density and must be selected carefully. For curve clustering applications, it is determined by the desired interaction length: if $\nu$ tends to zero, the curves behave as independent objects, while on the other end of the scale, a very high bandwidth tends to remove the influence of the curves themselves. For the moment, no automated means of finding an optimal $\nu$ has been used, although this will be part of future work.

2.4. Minimizing the Entropy

In order to fulfill the initial requirement of finding bundles of curve segments as straight as possible, one seeks the system of curves minimizing the entropy $E(\gamma_1, \dots, \gamma_N)$, or equivalently maximizing:

$$\int_\Omega \tilde{d}(x) \log\big(\tilde{d}(x)\big)\, dx.$$

The reason why this criterion gives the expected behavior will become more apparent after the derivation of its gradient at the end of this part. Nevertheless, when considering a single trajectory, it is intuitive that the most concentrated density distribution is obtained with a straight segment connecting the endpoints: this point will be made rigorous later.
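As a concrete illustration of the entropy and of the bandwidth effect, the density of a system of discretized curves can be approximated on a grid. The sketch below is not the paper's implementation: it assumes a Gaussian kernel as a stand-in for $K = f_\nu$, polyline curves, a regular grid over $[0,1]^2$, and a crude midpoint quadrature of the line integral; the names `smoothed_density` and `entropy` are illustrative. It exhibits the behavior described above: a straight segment yields a more concentrated density, hence a lower entropy, than a wiggly curve with the same endpoints.

```python
import numpy as np

def smoothed_density(curves, grid, cell, nu):
    """Approximate the density d~ on grid points (Gaussian kernel stand-in).

    Each polyline contributes kernel bumps at segment midpoints, weighted
    by segment arclength; the result is normalized to integral one.
    """
    d = np.zeros(len(grid))
    for gamma in curves:
        mids = 0.5 * (gamma[1:] + gamma[:-1])                  # segment midpoints
        lens = np.linalg.norm(np.diff(gamma, axis=0), axis=1)  # arclength weights
        for m, w in zip(mids, lens):
            d += w * np.exp(-np.sum((grid - m) ** 2, axis=1) / (2.0 * nu ** 2))
    return d / (d.sum() * cell)  # normalize: integral one over the domain

def entropy(d, cell):
    """E = -integral of d log d over the domain (empty cells skipped)."""
    m = d > 0
    return -np.sum(d[m] * np.log(d[m])) * cell

# Regular grid over the domain [0,1]^2.
n = 60
xs = np.linspace(0.0, 1.0, n)
X, Y = np.meshgrid(xs, xs)
grid = np.stack([X.ravel(), Y.ravel()], axis=1)
cell = (xs[1] - xs[0]) ** 2

# Two curves with the same endpoints: a straight segment and a wiggly one.
t = np.linspace(0.0, 1.0, 50)
straight = np.stack([t, np.full_like(t, 0.5)], axis=1)
wiggly = np.stack([t, 0.5 + 0.15 * np.sin(6 * np.pi * t)], axis=1)

E = {}
for nu in (0.02, 0.1):
    E[nu] = (entropy(smoothed_density([straight], grid, cell, nu), cell),
             entropy(smoothed_density([wiggly], grid, cell, nu), cell))
    print(f"nu={nu}: E(straight)={E[nu][0]:.3f}  E(wiggly)={E[nu][1]:.3f}")
```

Raising $\nu$ smooths both densities toward one another, in line with the remark that a very high bandwidth tends to remove the influence of the curves themselves.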
Letting $\epsilon$ be a perturbation of the curve $\gamma_j$ such that $\epsilon(0) = \epsilon(1) = 0$, the first-order expansion of $-E(\gamma_1, \dots, \gamma_N)$ will be computed in order to obtain a maximizing displacement field, analogous to a gradient ascent in the finite-dimensional setting (the choice has been made to maximize the opposite of the entropy, so that the algorithm is a gradient ascent). The notation $\frac{\partial F}{\partial \gamma_j}$ will be used in the sequel to denote the derivative of a function $F$ of the curve $\gamma_j$, in the sense that for a perturbation $\epsilon$:

$$F(\gamma_j + \epsilon) = F(\gamma_j) + \frac{\partial F}{\partial \gamma_j}(\epsilon) + o(\|\epsilon\|^2).$$

First of all, note that since $\tilde{d}$ has integral one over the domain $\Omega$:

$$\int_\Omega \frac{\partial \tilde{d}(x)}{\partial \gamma_j}(\epsilon)\, dx = 0,$$

so that:

$$-\frac{\partial}{\partial \gamma_j} E(\gamma_1, \dots, \gamma_N)(\epsilon) = \int_\Omega \frac{\partial \tilde{d}(x)}{\partial \gamma_j}(\epsilon) \log\big(\tilde{d}(x)\big)\, dx. \quad (14)$$

Starting from the expression of $\tilde{d}$ given in Equation (7), the first-order expansion of $\tilde{d}$ with respect to the perturbation $\epsilon$ of $\gamma_j$ is obtained as a sum of a term coming from the numerator:

$$\int_0^1 K\big(\|x - \gamma_j(t)\|\big)\, \|\gamma_j'(t)\|\, dt. \quad (15)$$
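A minimal numerical counterpart of this gradient ascent can be sketched with finite differences: perturb each interior vertex of a discretized curve (endpoints fixed, so that $\epsilon(0) = \epsilon(1) = 0$), estimate the gradient of $-E$, and take a small step along it. This is an illustrative stand-in, not the paper's method: it assumes a Gaussian kernel in place of $K = f_\nu$, a single polyline curve, and crude finite-difference gradients rather than the analytic expression derived from Equation (14).

```python
import numpy as np

def neg_entropy(gamma, grid, cell, nu):
    """Integral of d~ log d~ (i.e., -E) for one polyline, Gaussian kernel stand-in."""
    mids = 0.5 * (gamma[1:] + gamma[:-1])
    lens = np.linalg.norm(np.diff(gamma, axis=0), axis=1)
    d = np.zeros(len(grid))
    for m, w in zip(mids, lens):
        d += w * np.exp(-np.sum((grid - m) ** 2, axis=1) / (2.0 * nu ** 2))
    d /= d.sum() * cell  # d~ has integral one over the domain
    mask = d > 0
    return np.sum(d[mask] * np.log(d[mask])) * cell

# Grid over [0,1]^2 and an initial wiggly curve with fixed endpoints.
n = 40
xs = np.linspace(0.0, 1.0, n)
X, Y = np.meshgrid(xs, xs)
grid = np.stack([X.ravel(), Y.ravel()], axis=1)
cell = (xs[1] - xs[0]) ** 2
nu = 0.05

t = np.linspace(0.0, 1.0, 21)
gamma = np.stack([t, 0.5 + 0.1 * np.sin(4 * np.pi * t)], axis=1)

E0 = neg_entropy(gamma, grid, cell, nu)
h, step = 1e-4, 0.01
for _ in range(30):
    base = neg_entropy(gamma, grid, cell, nu)
    g = np.zeros_like(gamma)
    for i in range(1, len(gamma) - 1):  # endpoints fixed: eps(0) = eps(1) = 0
        for k in range(2):
            pert = gamma.copy()
            pert[i, k] += h
            g[i, k] = (neg_entropy(pert, grid, cell, nu) - base) / h
    gamma = gamma + step * g / (np.linalg.norm(g) + 1e-12)  # normalized ascent step
E1 = neg_entropy(gamma, grid, cell, nu)
print(f"-E before: {E0:.3f}, after: {E1:.3f}")
```

As the ascent proceeds, $-E$ increases and the curve concentrates toward the straight chord between its endpoints, consistent with the intuition stated above for a single trajectory.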
Title: Differential Geometrical Theory of Statistics
Authors: Frédéric Barbaresco, Frank Nielsen
Publisher: MDPI
Location: Basel
Date: 2017
Language: English
License: CC BY-NC-ND 4.0
ISBN: 978-3-03842-425-3
Size: 17.0 x 24.4 cm
Pages: 476
Keywords: Entropy, Coding Theory, Maximum entropy, Information geometry, Computational Information Geometry, Hessian Geometry, Divergence Geometry, Information topology, Cohomology, Shape Space, Statistical physics, Thermodynamics
Categories: Natural Sciences, Physics