Page - 445 - in Differential Geometrical Theory of Statistics
Entropy 2016, 18, 110
The Gilbert–Varshamov curve can be characterized in terms of the behavior of sufficiently random
codes, in the sense of the Shannon Random Code Ensemble, see [26,27], while the asymptotic bound
can be characterized in terms of Kolmogorov complexity, see [16].
2.5. Code Parameters of Language Families
From the coding theory viewpoint, it is natural to ask whether there are codes C, formed out of a
choice of a collection of natural languages and their syntactic parameters, whose code parameters lie
above the asymptotic bound curve R = α_2(δ).
For instance, a code C whose code parameters violate the Plotkin bound (5) must be an isolated
code above the asymptotic bound. This means constructing a code C with δ ≥ 1/2, that is, such that
any pair of codewords w ≠ w′ ∈ C differ by at least half of the parameters. A direct examination of
the list of parameters in Table A of [3] and Figure 7 of [4] shows that it is very difficult to find, within
the same historical linguistic family (e.g., the Indo-European family), pairs of languages L1, L2 with
δ_H(L1, L2) ≥ 1/2. For example, among the syntactic relative distances listed in Figure 7 of [4] one
finds only the pair (Farsi, Romanian) with a relative distance of 0.5. Other pairs come close to this
value, for example Farsi and French have a relative distance of 0.483, but French and Romanian only
differ by 0.162.
One has better chances of obtaining codes above the asymptotic bound if one compares languages
that are not so closely related at the historical level.
Example 2. Consider the set C = {L1, L2, L3} with languages L1 = Arabic, L2 = Wolof, and
L3 = Basque. We exclude from the list of Table A of [3] all those parameters that are entailed and made
irrelevant by some other parameter in at least one of these three chosen languages. This gives us a list
of 25 remaining parameters, which are those numbered as 1–5, 7, 10, 20–21, 25, 27–29, 31–32, 34, 37, 42,
50–53, 55–57 in [3], and the following three codewords:
L1 1 1 1 1 1 1 0 1 0 1 0 1 0 1 1 1 1 1 1 0 1 0 0 0 0
L2 1 1 1 0 0 1 1 0 1 0 1 0 0 1 0 1 1 0 0 1 1 1 1 1 1
L3 1 1 0 1 0 0 1 0 0 0 1 1 1 0 1 1 0 1 1 1 1 1 1 0 0
This example, although very simple and quite artificial in the choice of languages, already
suffices to produce a code C that lies above the asymptotic bound. In fact, we have d_H(L1, L2) = 16,
d_H(L2, L3) = 13 and d_H(L1, L3) = 13, so that the minimum distance is 13 and δ = 13/25 = 0.52.
Since R > 0, the code point (δ, R) violates the Plotkin bound, hence it lies above the asymptotic bound.
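The arithmetic of Example 2 can be checked with a short script. The following is only a sketch: it assumes the standard definitions δ = d/n and R = log_q(#C)/n for a code of length n with minimum Hamming distance d, and the names `ROWS`, `hamming` and `code_parameters` are illustrative choices that do not come from the paper. The three rows are copied from the codewords listed above.

```python
from itertools import combinations
from math import log2

# Codewords copied from Example 2, one bit per retained syntactic parameter.
ROWS = {
    "L1": "1 1 1 1 1 1 0 1 0 1 0 1 0 1 1 1 1 1 1 0 1 0 0 0 0",  # Arabic
    "L2": "1 1 1 0 0 1 1 0 1 0 1 0 0 1 0 1 1 0 0 1 1 1 1 1 1",  # Wolof
    "L3": "1 1 0 1 0 0 1 0 0 0 1 1 1 0 1 1 0 1 1 1 1 1 1 0 0",  # Basque
}
CODE = {name: tuple(int(b) for b in row.split()) for name, row in ROWS.items()}


def hamming(u, v):
    """Number of positions in which two equal-length words differ."""
    if len(u) != len(v):
        raise ValueError("codewords must have the same length")
    return sum(a != b for a, b in zip(u, v))


def code_parameters(code, q=2):
    """Return (n, pairwise distances, delta, R) for a code given as {name: word}.

    Assumes the standard definitions delta = d/n and R = log_q(#C)/n.
    """
    n = len(next(iter(code.values())))
    pairwise = {(a, b): hamming(code[a], code[b]) for a, b in combinations(code, 2)}
    d = min(pairwise.values())  # minimum distance of the code
    delta = d / n
    rate = log2(len(code)) / (log2(q) * n)
    return n, pairwise, delta, rate


n, pairwise, delta, rate = code_parameters(CODE)
print("length n:", n)
print("pairwise Hamming distances:", pairwise)
print("relative minimum distance delta:", delta)
print("rate R:", round(rate, 4))
# For a binary code the Plotkin bound forces the asymptotic bound to vanish
# for delta >= 1/2, so any code point with delta >= 1/2 and R > 0 lies above it.
print("violates binary Plotkin bound:", delta >= 0.5 and rate > 0)
```

With only three codewords the rate is small (log_2(3)/25), but it is positive, which is all the argument in the text requires.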
It would be more interesting to find a code C consisting of languages belonging to the same
historical-linguistic family (outside of the Indo-European group), that lies above the asymptotic bound.
Such examples would correspond to linguistic families that exhibit a very strong variability of the
syntactic parameters, in a way that is quantifiable through the properties of C as a code.
If one considers the 22 Indo-European languages in [3] with their parameters, one obtains a code
C that is below the Gilbert–Varshamov line, hence below the asymptotic bound by Equation (8). A few
other examples, taken from other non-Indo-European historical-linguistic families, computed using
those parameters reported in the SSWL database (for example the set of Malayo–Polynesian languages
currently recorded in SSWL) also give codes whose code parameters lie below the Gilbert–Varshamov
curve. One can conjecture that any code C constructed out of natural languages belonging to the
same historical-linguistic family will be below the asymptotic bound (or perhaps below the GV
bound), which would provide a quantitative bound on the possible spread of syntactic parameters
within a historical family, given the size of the family. Examples like the simple one constructed
above, using languages not belonging to the same historical family, show that, to the contrary, across
different historical families one encounters a greater variability of syntactic parameters. To our
knowledge, no systematic study of parameter variability from this coding theory perspective has been
implemented so far.
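A test of whether a given code point lies below the Gilbert–Varshamov curve could be sketched as follows. Equation (8) is not reproduced on this page, so the sketch assumes it is the usual bound α_q(δ) ≥ 1 − H_q(δ) for 0 ≤ δ ≤ (q−1)/q, with H_q the q-ary entropy function. The function names are hypothetical, and the second test point is a made-up illustration, not data from [3] or from SSWL.

```python
from math import log


def q_ary_entropy(delta, q=2):
    """q-ary entropy H_q(delta) for 0 <= delta <= 1."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    if delta == 0.0:
        return 0.0
    if delta == 1.0:
        return log(q - 1, q)
    return (delta * log(q - 1, q)
            - delta * log(delta, q)
            - (1.0 - delta) * log(1.0 - delta, q))


def gv_rate(delta, q=2):
    """Assumed Gilbert-Varshamov curve: R = 1 - H_q(delta), and 0 for delta >= (q-1)/q."""
    if delta >= (q - 1) / q:
        return 0.0
    return 1.0 - q_ary_entropy(delta, q)


def lies_below_gv(delta, rate, q=2):
    """True if the code point (delta, rate) is strictly below the assumed GV curve."""
    return rate < gv_rate(delta, q)


# Code point of Example 2: delta = 13/25, R = log_2(3)/25.
print(lies_below_gv(13 / 25, log(3, 2) / 25))

# Hypothetical code point, chosen only to illustrate the test.
print(lies_below_gv(0.1, 0.2))
```

Applied to a family of languages, one would first compute (δ, R) from the binary parameter vectors and then pass that pair to `lies_below_gv`.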
Differential Geometrical Theory of Statistics
- Title
- Differential Geometrical Theory of Statistics
- Authors
- Frédéric Barbaresco
- Frank Nielsen
- Editor
- MDPI
- Location
- Basel
- Date
- 2017
- Language
- English
- License
- CC BY-NC-ND 4.0
- ISBN
- 978-3-03842-425-3
- Size
- 17.0 x 24.4 cm
- Pages
- 476
- Keywords
- Entropy, Coding Theory, Maximum entropy, Information geometry, Computational Information Geometry, Hessian Geometry, Divergence Geometry, Information topology, Cohomology, Shape Space, Statistical physics, Thermodynamics
- Categories
- Natural Sciences, Physics