Entropy 2016, 18, 110
To construct A_{m+1} and B_{m+1} inductively, given A_m and B_m, one takes A_{m+1} to consist of the elements in the list

L_{m+1} = { y ∈ f(X) : ν_Y^{-1}(y) ≤ N_{m+1}, ∃ x ∈ X with y = f(x) and n(x) = m+1 }.

Here one invokes the oracle, which ensures that, if such an x exists, then it must be contained in a finite list of points x ∈ X with bounded complexity

KT_U(x_m) ≤ c · ν_Y^{-1}(y)_m log(ν_Y^{-1}(y)_m).

One then takes B_{m+1} to consist of the remaining elements in the list L_{m+1}. We refer the reader to [16] for a more detailed formulation.
More generally, the argument of [16] recalled above shows that, for a recursive function f : Z+ → Q, determining which values have infinite multiplicities is computable given an oracle that enumerates integers in order of Kolmogorov complexity.
As discussed in [16,24], the asymptotic bound can also be seen as the phase transition curve for a quantum statistical mechanical system constructed out of the space of codes, where the partition function of the system weights codes according to their Kolmogorov complexity. This is as close to a "statistical description" of the asymptotic bound as one can achieve.
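As a toy numerical illustration only (not the actual construction of [16,24]), one can mimic such a partition function by weighting codes with a computable stand-in for Kolmogorov complexity, here the zlib-compressed length of the code, which gives an upper bound on complexity up to an additive constant. All function names below are our own choices:

```python
import zlib
from math import exp

def complexity_proxy(code):
    """Compressed length (in bytes) of the concatenated code words: a crude,
    computable upper-bound stand-in for the Kolmogorov complexity of the code."""
    return len(zlib.compress("".join(code).encode(), 9))

def partition_function(codes, beta):
    """Toy Gibbs sum Z(beta) = sum over codes of exp(-beta * K(C)), with K
    replaced by the compression proxy; low-complexity codes dominate for
    large beta, echoing how the system weights codes by complexity."""
    return sum(exp(-beta * complexity_proxy(c)) for c in codes)
```

For instance, a highly repetitive code compresses far better than a pseudo-random one of the same size, so it receives a much larger weight in the sum.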
In comparison with the behavior of random codes (codes whose complexity is comparable to their size), which concentrate in the region bounded by the Gilbert–Varshamov line, when ordering codes by complexity, non-random codes of lower complexity populate the region above, with code points accumulating in the intermediate region bounded by the asymptotic bound. That intermediate region thus, in a sense, reflects the difference between Shannon entropy and complexity.
3.5. Entropy and Complexity Estimates for Language Families
On the basis of the considerations of the previous sections and of the results of [16,24] recalled above, we propose a way to assign a quantitative estimate of entropy and complexity to a given set of natural languages.

As before, let C = {L_1, . . . , L_k} be a binary (or ternary) code where the code words L_i are the binary (ternary) strings of syntactic parameters of a set of languages L_i. We define the entropy of the language family {L_1, . . . , L_k} as the q-ary Shannon entropy H_q(δ(C)), where q is either 2 or 3 for binary or ternary codes, and δ(C) is the relative minimum distance parameter of the code C. We also define the entropy gap of the language family {L_1, . . . , L_k} as the value of H_q(δ(C)) − 1 + R(C), which measures the distance of the code point (R(C), δ(C)) from the Gilbert–Varshamov line, that is, from the behavior of a typical random code.
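These quantities are directly computable from a code. The following minimal sketch (function names are ours, and the four-word toy code stands in for the parameter strings of a hypothetical language family) evaluates H_q(δ(C)), R(C), and the entropy gap:

```python
from math import log

def Hq(x, q=2):
    """q-ary Shannon entropy H_q(x) = x log_q(q-1) - x log_q(x) - (1-x) log_q(1-x)."""
    if x == 0:
        return 0.0
    return x * log(q - 1, q) - x * log(x, q) - (1 - x) * log(1 - x, q)

def code_parameters(code, q=2):
    """Rate R(C) = log_q(#C)/n and relative minimum distance delta(C) = d(C)/n
    of a block code given as a list of equal-length q-ary strings."""
    n = len(code[0])
    d = min(sum(a != b for a, b in zip(u, v))
            for i, u in enumerate(code) for v in code[i + 1:])
    return log(len(code), q) / n, d / n

def entropy_gap(code, q=2):
    """H_q(delta(C)) - 1 + R(C): distance of the code point (R(C), delta(C))
    from the Gilbert-Varshamov line R = 1 - H_q(delta)."""
    R, delta = code_parameters(code, q)
    return Hq(delta, q) - 1 + R
```

For example, the toy binary code ["0000", "0011", "1100", "1111"] has R(C) = 0.5 and δ(C) = 0.5, so its entropy is H_2(0.5) = 1 and its entropy gap is 0.5.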
As a source of estimates of complexity of a language family {L_1, . . . , L_k}, one can consider any upper bound on the Kolmogorov complexity of the code C. A possible approach, which contains more linguistic input, would be to provide estimates of complexity for each individual language in the family and then compare these. Estimates of complexity for individual languages have been considered in the literature, some of them based on the description of languages in terms of their syntactic parameters. For instance, following [18], for a syntactic parameter Π with possible values v ∈ {±1}, the Kolmogorov complexity of Π set to value v is given by

K(Π = v) = min_{τ expressing Π} KT_U(τ),

with the minimum taken over the complexities of all the parse trees that express the syntactic parameter Π and require Π = v to be grammatical in the language. Notice that, in this approach, the syntactic parameters are not just regarded as binary or ternary values, but one needs to consider actual parse trees of sentences in the language that express the parameter. Thus, such an approach to complexity
Differential Geometrical Theory of Statistics
- Title: Differential Geometrical Theory of Statistics
- Authors: Frédéric Barbaresco, Frank Nielsen
- Publisher: MDPI
- Place: Basel
- Date: 2017
- Language: English
- License: CC BY-NC-ND 4.0
- ISBN: 978-3-03842-425-3
- Dimensions: 17.0 x 24.4 cm
- Pages: 476
- Keywords: Entropy, Coding Theory, Maximum entropy, Information geometry, Computational Information Geometry, Hessian Geometry, Divergence Geometry, Information topology, Cohomology, Shape Space, Statistical physics, Thermodynamics
- Categories: Natural Sciences, Physics