Differential Geometrical Theory of Statistics

Page - 445 - in Differential Geometrical Theory of Statistics


Entropy 2016, 18, 110

The Gilbert–Varshamov curve can be characterized in terms of the behavior of sufficiently random codes, in the sense of the Shannon Random Code Ensemble, see [26,27], while the asymptotic bound can be characterized in terms of Kolmogorov complexity, see [16].

2.5. Code Parameters of Language Families

From the coding theory viewpoint, it is natural to ask whether there are codes C, formed out of a choice of a collection of natural languages and their syntactic parameters, whose code parameters lie above the asymptotic bound curve R = α_2(δ).

For instance, a code C whose code parameters violate the Plotkin bound (5) must be an isolated code above the asymptotic bound. This means constructing a code C with δ ≥ 1/2, that is, such that any pair of codewords w ≠ w′ ∈ C differ by at least half of the parameters. A direct examination of the list of parameters in Table A of [3] and Figure 7 of [4] shows that it is very difficult to find, within the same historical linguistic family (e.g., the Indo-European family), pairs of languages L1, L2 with δ_H(L1, L2) ≥ 1/2. For example, among the syntactic relative distances listed in Figure 7 of [4] one finds only the pair (Farsi, Romanian) with a relative distance of 0.5. Other pairs come close to this value, for example Farsi and French have a relative distance of 0.483, but French and Romanian only differ by 0.162.

One has better chances of obtaining codes above the asymptotic bound if one compares languages that are not so closely related at the historical level.

Example 2. Consider the set C = {L1, L2, L3} with languages L1 = Arabic, L2 = Wolof, and L3 = Basque. We exclude from the list of Table A of [3] all those parameters that are entailed and made irrelevant by some other parameter in at least one of these three chosen languages.
This gives us a list of 25 remaining parameters, which are those numbered as 1–5, 7, 10, 20–21, 25, 27–29, 31–32, 34, 37, 42, 50–53, 55–57 in [3], and the following three codewords:

L1   1 1 1 1 1 1 0 1 0 1 0 1 0 1 1 1 1 1 1 0 1 0 0 0 0
L2   1 1 1 0 0 1 1 0 1 0 1 0 0 1 0 1 1 0 0 1 1 1 1 1 1
L3   1 1 0 1 0 0 1 0 0 0 1 1 1 0 1 1 0 1 1 1 1 1 1 0 0

This example, although very simple and quite artificial in the choice of languages, already suffices to produce a code C that lies above the asymptotic bound. In fact, we have d_H(L1, L2) = 16, d_H(L2, L3) = 13 and d_H(L1, L3) = 13, so that δ = 0.52. Since R > 0, the code point (δ, R) violates the Plotkin bound, hence it lies above the asymptotic bound.

It would be more interesting to find a code C consisting of languages belonging to the same historical-linguistic family (outside of the Indo-European group), that lies above the asymptotic bound. Such examples would correspond to linguistic families that exhibit a very strong variability of the syntactic parameters, in a way that is quantifiable through the properties of C as a code.

If one considers the 22 Indo-European languages in [3] with their parameters, one obtains a code C that is below the Gilbert–Varshamov line, hence below the asymptotic bound by Equation (8). A few other examples, taken from other non-Indo-European historical-linguistic families, computed using those parameters reported in the SSWL database (for example the set of Malayo–Polynesian languages currently recorded in SSWL) also give codes whose code parameters lie below the Gilbert–Varshamov curve.

One can conjecture that any code C constructed out of natural languages belonging to the same historical-linguistic family will be below the asymptotic bound (or perhaps below the GV bound), which would provide a quantitative bound on the possible spread of syntactic parameters within a historical family, given the size of the family. Examples like the simple one constructed above, using languages not belonging to the same historical family, show that, to the contrary, across different historical families one encounters a greater variability of syntactic parameters.
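The distances quoted in Example 2 can be re-derived directly from the three codewords. The following Python sketch is only an illustration: the names `hamming` and `code_parameters` are hypothetical, and the rate is computed with the usual binary convention R = log_2(#C)/n, which is assumed here because the definition does not appear on this page.

```python
from itertools import combinations
from math import log2

# Codewords of Example 2, copied row by row from the text
# (25 binary syntactic parameters per language).
CODEWORDS = {
    "Arabic": "1 1 1 1 1 1 0 1 0 1 0 1 0 1 1 1 1 1 1 0 1 0 0 0 0".split(),
    "Wolof":  "1 1 1 0 0 1 1 0 1 0 1 0 0 1 0 1 1 0 0 1 1 1 1 1 1".split(),
    "Basque": "1 1 0 1 0 0 1 0 0 0 1 1 1 0 1 1 0 1 1 1 1 1 1 0 0".split(),
}


def hamming(u, v):
    """Number of positions at which two equal-length codewords differ."""
    if len(u) != len(v):
        raise ValueError("codewords must have the same length")
    return sum(a != b for a, b in zip(u, v))


def code_parameters(codewords):
    """Return (pairwise distances, relative minimum distance, rate).

    Assumption: binary code, rate R = log2(#C) / n.
    """
    n = len(next(iter(codewords.values())))
    distances = {
        (a, b): hamming(codewords[a], codewords[b])
        for a, b in combinations(codewords, 2)
    }
    delta = min(distances.values()) / n
    rate = log2(len(codewords)) / n
    return distances, delta, rate


distances, delta, rate = code_parameters(CODEWORDS)
for pair, d in distances.items():
    print(pair, d)
print(f"delta = {delta:.2f}, R = {rate:.4f}")
```

Running it reproduces the distances 16, 13 and 13 stated in the example, with δ = 13/25 = 0.52. Under the assumed rate convention, R = log_2(3)/25 is positive, which is all the argument in the text requires.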
To our knowledge, no systematic study of parameter variability from this coding theory perspective has been implemented so far.
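As a rough illustration of how such a study might place a code point (δ, R) relative to the curves discussed in this section, the sketch below uses the standard binary forms of the two explicit bounds: the Gilbert–Varshamov curve R = 1 − H_2(δ) and the Plotkin bound R ≤ max(0, 1 − 2δ). These closed forms are assumptions for the binary case, since Equations (5) and (8) are not reproduced on this page, and the function names are hypothetical. The asymptotic bound itself has no explicit formula, so a point lying between the two curves is left undetermined.

```python
from math import log2


def binary_entropy(x):
    """Binary entropy H_2(x) in bits, with H_2(0) = H_2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)


def gv_rate(delta):
    """Gilbert-Varshamov curve for binary codes (assumed form)."""
    return 1 - binary_entropy(delta) if delta < 0.5 else 0.0


def plotkin_rate(delta):
    """Plotkin bound for binary codes (assumed form)."""
    return max(0.0, 1 - 2 * delta)


def classify(delta, rate):
    """Locate a code point relative to the two explicit curves."""
    if rate > plotkin_rate(delta):
        # Violating Plotkin forces the point above the asymptotic bound.
        return "above the Plotkin bound"
    if rate > gv_rate(delta):
        # The asymptotic bound lies somewhere in this strip.
        return "between the GV curve and the Plotkin bound"
    # At or under the GV curve, hence not above the asymptotic bound.
    return "on or below the GV curve"


# Code point of Example 2: delta = 13/25, rate assumed to be log2(3)/25.
print(classify(13 / 25, log2(3) / 25))

# Two made-up points, for illustration only (not taken from any data set).
print(classify(0.1, 0.2))
print(classify(0.3, 0.3))
```

The first call reports the Example 2 point as lying above the Plotkin bound, in agreement with the text. The other two points are invented to exercise the remaining branches.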
Title: Differential Geometrical Theory of Statistics
Authors: Frédéric Barbaresco, Frank Nielsen
Editor: MDPI
Location: Basel
Date: 2017
Language: English
License: CC BY-NC-ND 4.0
ISBN: 978-3-03842-425-3
Size: 17.0 x 24.4 cm
Pages: 476
Keywords: Entropy, Coding Theory, Maximum entropy, Information geometry, Computational Information Geometry, Hessian Geometry, Divergence Geometry, Information topology, Cohomology, Shape Space, Statistical physics, Thermodynamics
Categories: Natural Sciences, Physics