Entropy 2016, 18, 110
To construct inductively $A_{m+1}$ and $B_{m+1}$, given $A_m$ and $B_m$, one takes $A_{m+1}$ to consist of the elements in the list
$$L_{m+1} = \{\, y \in f(X) : \nu_Y^{-1}(y) \le N_{m+1}, \ \exists x \in X \text{ with } y = f(x) \text{ and } n(x) = m+1 \,\}.$$
Here one invokes the oracle, which ensures that, if such an $x$ exists, then it must be contained in a finite list of points $x \in X$ with bounded complexity
$$K_{T_U}(x_m) \le c \cdot \nu_Y^{-1}(y_m) \log\big(\nu_Y^{-1}(y_m)\big).$$
One then takes $B_{m+1}$ to consist of the remaining elements in the list $L_{m+1}$. We refer the reader to [16] for a more detailed formulation.
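A toy rendering of this inductive step can make the bookkeeping concrete. Every ingredient below (`oracle_search`, `nu_rank`, the level sets `X[m+1]`, the thresholds `N`) is a hypothetical stand-in, since the oracle itself is not computable; the sketch only mirrors the logic of splitting $L_{m+1}$ into $A_{m+1}$ and $B_{m+1}$:

```python
# Sketch of one inductive step producing A_{m+1} and B_{m+1}.
# All names here are illustrative stand-ins: the true oracle of [16]
# is non-computable, so this only mirrors the structure of the argument.
import math

def step(A, B, m, f, X, nu_rank, N, oracle_search, c=1.0):
    """One inductive step: split the new list L_{m+1} into A_{m+1} and B_{m+1}."""
    # L_{m+1}: values y = f(x) with enumeration rank nu_rank(y) <= N[m+1],
    # reached by some x at level m+1 (modeled here by membership in X[m+1]).
    L_next = {f(x) for x in X[m + 1] if nu_rank(f(x)) <= N[m + 1]}
    A_next, B_next = set(A), set(B)
    for y in L_next:
        r = nu_rank(y)
        bound = c * r * math.log(r) if r > 1 else c
        # The oracle certifies whether some preimage x of y exists with
        # complexity K_TU(x) <= c * r * log(r); if so, y joins A_{m+1},
        # otherwise it joins B_{m+1}.
        if oracle_search(y, bound):
            A_next.add(y)
        else:
            B_next.add(y)
    return A_next, B_next
```

Here `oracle_search` would in practice be replaced by the (non-effective) oracle guaranteeing the finite search described above.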
More generally, the argument of [16] recalled above shows that, for a recursive function $f : \mathbb{Z}^+ \to \mathbb{Q}$, determining which values have infinite multiplicities is computable given an oracle that enumerates integers in order of Kolmogorov complexity.
As discussed in [16,24], the asymptotic bound can also be seen as the phase transition curve for a quantum statistical mechanical system constructed out of the space of codes, where the partition function of the system weights codes according to their Kolmogorov complexity. This is as close to a "statistical description" of the asymptotic bound as one can achieve.
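Schematically, such a partition function has a Gibbs form in which the Kolmogorov complexity of a code plays the role of the energy. The expression below is only an illustrative sketch under that reading; the precise state space, Hamiltonian, and convergence domain are those constructed in [16,24]:

```latex
% Illustrative Gibbs-type partition function over a space of codes
% \mathcal{C}, with Kolmogorov complexity K(C) as the energy.
% The exact construction is in [16,24]; this is only the general shape.
Z(\beta) \;=\; \sum_{C \in \mathcal{C}} e^{-\beta\, K(C)}
```

In this picture, as stated above, the asymptotic bound appears as the phase transition curve of the system.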
In comparison with the behavior of random codes (codes whose complexity is comparable to their size), which concentrate in the region bounded by the Gilbert–Varshamov line, when ordering codes by complexity, non-random codes of lower complexity populate the region above, with code points accumulating in the intermediate region bounded by the asymptotic bound. That intermediate region thus, in a sense, reflects the difference between Shannon entropy and complexity.
3.5. Entropy and Complexity Estimates for Language Families
On the basis of the considerations of the previous sections and of the results of [16,24] recalled above, we propose a way to assign a quantitative estimate of entropy and complexity to a given set of natural languages.
As before, let $C = \{L_1, \ldots, L_k\}$ be a binary (or ternary) code whose code words $L_i$ are the binary (ternary) strings of syntactic parameters of a set of languages $L_i$. We define the entropy of the language family $\{L_1, \ldots, L_k\}$ as the $q$-ary Shannon entropy $H_q(\delta(C))$, where $q$ is either 2 or 3 for binary or ternary codes, and $\delta(C)$ is the relative minimum distance parameter of the code $C$. We also define the entropy gap of the language family $\{L_1, \ldots, L_k\}$ as the value $H_q(\delta(C)) - 1 + R(C)$, which measures the distance of the code point $(R(C), \delta(C))$ from the Gilbert–Varshamov line, that is, from the behavior of a typical random code.
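These two quantities are directly computable from the code words. The sketch below evaluates the $q$-ary entropy $H_q(\delta(C))$ and the entropy gap $H_q(\delta(C)) - 1 + R(C)$ for a toy binary code; the code words are illustrative, not actual syntactic-parameter vectors, and `code_parameters` is a hypothetical helper:

```python
# Toy computation of the q-ary entropy H_q(delta(C)) and the entropy gap
# H_q(delta(C)) - 1 + R(C) for a small binary code.  The code words are
# made-up examples, not real syntactic-parameter strings.
import math
from itertools import combinations

def hq(x, q):
    """q-ary entropy function H_q(x); H_q(0) = 0, H_q(1) = log_q(q-1)."""
    if x == 0 or x == 1:
        return x * math.log(q - 1, q)
    return (x * math.log(q - 1, q)
            - x * math.log(x, q)
            - (1 - x) * math.log(1 - x, q))

def code_parameters(words, q):
    """Return (R, delta): transmission rate and relative minimum distance."""
    n = len(words[0])
    k = math.log(len(words), q)           # q-ary "dimension" of the code
    d = min(sum(a != b for a, b in zip(u, v))  # minimum Hamming distance
            for u, v in combinations(words, 2))
    return k / n, d / n

# Illustrative binary code: 4 code words of length 4.
words = ["0000", "0011", "1100", "1111"]
R, delta = code_parameters(words, q=2)     # R = 0.5, delta = 0.5
entropy = hq(delta, 2)                     # H_2(0.5) = 1.0
gap = entropy - 1 + R                      # distance from the GV line: 0.5
```

A positive gap, as here, places the code point above the Gilbert–Varshamov line, away from the behavior of a typical random code.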
As a source of estimates of the complexity of a language family $\{L_1, \ldots, L_k\}$, one can consider any upper bound on the Kolmogorov complexity of the code $C$. A possible approach, which contains more linguistic input, would be to provide estimates of complexity for each individual language in the family and then compare these. Estimates of complexity for individual languages have been considered in the literature, some of them based on the description of languages in terms of their syntactic parameters. For instance, following [18], for a syntactic parameter $\Pi$ with possible values $v \in \{\pm 1\}$, the Kolmogorov complexity of $\Pi$ set to the value $v$ is given by
$$K(\Pi = v) = \min_{\tau \text{ expressing } \Pi} K_{T_U}(\tau),$$
with the minimum taken over the complexities of all the parse trees $\tau$ that express the syntactic parameter $\Pi$ and require $\Pi = v$ to be grammatical in the language. Notice that, in this approach, the syntactic parameters are not just regarded as binary or ternary values, but one needs to consider actual parse trees of sentences in the language that express the parameter. Thus, such an approach to complexity
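Since $K_{T_U}$ is itself uncomputable, any practical estimate of $K(\Pi = v)$ must use a computable upper-bound proxy. The sketch below uses compressed length (a standard such proxy) and takes the minimum over a set of parse trees; the tree strings and the helper names are made-up illustrations, not data or methods from the text above:

```python
# Sketch of K(Pi = v) = min over parse trees tau expressing Pi of K_TU(tau),
# with zlib-compressed length as a computable upper-bound proxy for the
# (uncomputable) Kolmogorov complexity K_TU.  The "parse trees" are made up.
import zlib

def complexity_proxy(s):
    """Upper-bound proxy for Kolmogorov complexity: compressed length in bytes."""
    return len(zlib.compress(s.encode("utf-8")))

def parameter_complexity(parse_trees):
    """Minimum proxy complexity over the parse trees expressing the parameter."""
    return min(complexity_proxy(t) for t in parse_trees)

# Hypothetical parse trees that would express some parameter Pi = +1.
trees = [
    "(S (NP the cat) (VP sat))",
    "(S (NP the very old cat) (VP (V sat) (PP on (NP the mat))))",
]
k_est = parameter_complexity(trees)
```

Any such compressor gives only an upper bound on $K(\Pi = v)$, which is all that is needed for the complexity estimates discussed above.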
- Title: Differential Geometrical Theory of Statistics
- Authors: Frédéric Barbaresco, Frank Nielsen
- Editor: MDPI
- Location: Basel
- Date: 2017
- Language: English
- License: CC BY-NC-ND 4.0
- ISBN: 978-3-03842-425-3
- Size: 17.0 x 24.4 cm
- Pages: 476
- Keywords: Entropy, Coding Theory, Maximum entropy, Information geometry, Computational Information Geometry, Hessian Geometry, Divergence Geometry, Information topology, Cohomology, Shape Space, Statistical physics, Thermodynamics
- Categories: Naturwissenschaften (Natural Sciences), Physik (Physics)