Entropy 2016, 9, 337
as the cost of moving the smoothed density around γ1 to the uniform distribution on the curve, then moving γ1 to γ2, keeping points with equal scaled arclength in correspondence, and finally, moving the uniform distribution on γ2 to the smoothed density.
Having the density at hand, the entropy of the system of curves γ1, . . . , γN is defined the usual way as:

E(γ1, . . . , γN) = − ∫_Ω d̃(x) log(d̃(x)) dx.
The entropy is dependent on the particular choice of the kernel K. As mentioned before, it is a common practice in the field of non-parametric statistics to introduce a tuning parameter ν > 0 in the kernel, called bandwidth, so that it is expressed as a scaled version K = f_ν of a given function f : R+ → R+. The value of ν is the most influential parameter in the estimation of the density and must be selected carefully. For curve clustering applications, it is defined by the desired interaction length: if ν tends to zero, the curves will behave as independent objects, while on the other end of the scale, a very high bandwidth will tend to remove the influence of the curves themselves. For the moment, no automated means of finding an optimal ν has been used, although it will be part of future work.
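To make the definitions above concrete, the following is a minimal numerical sketch, not the paper's implementation: it assumes Ω = [0,1]², a Gaussian kernel (the paper only requires K = f_ν for some f : R+ → R+), approximates the line integral by summing the kernel at polyline segment midpoints weighted by segment lengths, and approximates the domain integral by a uniform grid.

```python
import numpy as np

def smoothed_density(points, grid, nu):
    """Kernel-smoothed density of a polyline, normalized over the grid.

    Assumed Gaussian kernel K(r) = exp(-r^2 / (2 nu^2)); the line integral
    of K(|x - gamma(t)|) |gamma'(t)| dt is approximated by the kernel at
    segment midpoints weighted by segment lengths.
    """
    mids = 0.5 * (points[:-1] + points[1:])
    lens = np.linalg.norm(np.diff(points, axis=0), axis=1)
    dists = np.linalg.norm(grid[:, None, :] - mids[None, :, :], axis=2)
    d = (np.exp(-0.5 * (dists / nu) ** 2) * lens).sum(axis=1)
    return d / d.sum()

def entropy(points, grid, nu):
    """Discrete surrogate of E = -integral of d~ log d~ over the grid."""
    d = smoothed_density(points, grid, nu)
    d = d[d > 0]
    return -(d * np.log(d)).sum()

# uniform grid on Omega = [0,1]^2 and a sample curve
g = np.linspace(0.0, 1.0, 40)
grid = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)
t = np.linspace(0.0, 1.0, 50)
curve = np.stack([t, 0.5 + 0.2 * np.sin(np.pi * t)], axis=1)

e_small = entropy(curve, grid, nu=0.05)
e_large = entropy(curve, grid, nu=0.3)
# a larger bandwidth flattens the density toward uniform, raising the entropy
print(e_small < e_large)
```

This also illustrates the bandwidth trade-off discussed above: as ν grows, the density approaches the uniform distribution on the grid and the entropy increases accordingly.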
2.4. Minimizing the Entropy
In order to fulfill the initial requirement of finding bundles of curve segments as straight as possible, one seeks after the system of curves minimizing the entropy E(γ1, . . . , γN), or equivalently maximizing:

∫_Ω d̃(x) log(d̃(x)) dx.
The reason why this criterion gives the expected behavior will become more apparent after derivation of its gradient at the end of this part. Nevertheless, when considering a single trajectory, it is intuitive that the most concentrated density distribution is obtained with a straight segment connecting the endpoints: this point will be made rigorous later.
Letting ε be a perturbation of the curve γj, such that ε(0) = ε(1) = 0, the first order expansion of −E(γ1, . . . , γN) will be computed in order to get a maximizing displacement field, analogous to a gradient ascent in the finite dimensional setting (the choice has been made to maximize the opposite of the entropy, so that the algorithm will be a gradient ascent one). The notation:

∂F/∂γj

will be used in the sequel to denote the derivative of a function F of the curve γj, in the sense that for a perturbation ε:

F(γj + ε) = F(γj) + (∂F/∂γj)(ε) + o(‖ε‖₂).
First of all, please note that since d̃ has integral one over the domain Ω:

∫_Ω (∂d̃(x)/∂γj)(ε) dx = 0

so that:

−(∂/∂γj) E(γ1, . . . , γN)(ε) = ∫_Ω (∂d̃(x)/∂γj)(ε) log(d̃(x)) dx. (14)
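Both identities above can be verified by finite differences in a discretized setting. The sketch below is a hedged numerical check, not the paper's method: it assumes a Gaussian kernel on Ω = [0,1]², a grid quadrature, and a central-difference approximation of the directional derivative along a perturbation ε that vanishes at the endpoints.

```python
import numpy as np

def density(points, grid, nu=0.05):
    """Kernel-smoothed density of a polyline, normalized over the grid
    (Gaussian kernel is an assumed stand-in for the generic K)."""
    mids = 0.5 * (points[:-1] + points[1:])
    lens = np.linalg.norm(np.diff(points, axis=0), axis=1)
    dists = np.linalg.norm(grid[:, None, :] - mids[None, :, :], axis=2)
    d = (np.exp(-0.5 * (dists / nu) ** 2) * lens).sum(axis=1)
    return d / d.sum()

g = np.linspace(0.0, 1.0, 30)
grid = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)
t = np.linspace(0.0, 1.0, 40)
curve = np.stack([t, 0.5 + 0.2 * np.sin(np.pi * t)], axis=1)

# perturbation vanishing at both endpoints, as the derivation requires
eps = np.stack([np.zeros_like(t), np.sin(np.pi * t)], axis=1)

h = 1e-5
d0 = density(curve, grid)
d_plus = density(curve + h * eps, grid)
d_minus = density(curve - h * eps, grid)
dd = (d_plus - d_minus) / (2 * h)   # directional derivative of d~ along eps

# d~ integrates to one for every curve, so its derivative sums to ~0
mass_drift = abs(dd.sum())

# Equation (14): the derivative of -E in the direction eps equals
# the integral of (derivative of d~) times log d~
lhs = ((d_plus * np.log(d_plus)).sum()
       - (d_minus * np.log(d_minus)).sum()) / (2 * h)
rhs = (dd * np.log(d0)).sum()
print(mass_drift, np.isclose(lhs, rhs, rtol=1e-3))
```

The mass-conservation term is what kills the "+1" that would otherwise appear when differentiating d̃ log d̃, which is exactly why Equation (14) contains only the log d̃ factor.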
Starting from the expression of d̃ given in Equation (7), the first order expansion of d̃ with respect to the perturbation ε of γj is obtained as a sum of a term coming from the numerator:

∫_0^1 K(‖x − γj(t)‖) ‖γ′j(t)‖ dt. (15)
- Title: Differential Geometrical Theory of Statistics
- Authors: Frédéric Barbaresco, Frank Nielsen
- Publisher: MDPI
- Place: Basel
- Date: 2017
- Language: English
- License: CC BY-NC-ND 4.0
- ISBN: 978-3-03842-425-3
- Dimensions: 17.0 x 24.4 cm
- Pages: 476
- Keywords: Entropy, Coding Theory, Maximum entropy, Information geometry, Computational Information Geometry, Hessian Geometry, Divergence Geometry, Information topology, Cohomology, Shape Space, Statistical physics, Thermodynamics
- Categories: Natural Sciences, Physics