Entropy 2016, 18, 110
structure of a given language. Their universality makes it possible to obtain comparisons, at the syntactic level, between arbitrary pairs of natural languages.
The PCM was introduced in [2] as a quantitative method in historical linguistics, for the comparison of languages within and across historical families at the syntactic, instead of the lexical, level. Evidence was given in [3,4] that the PCM gives reliable information on the phylogenetic tree of the Indo-European language family.
The PCM relies essentially on constructing a metric on a family of languages, based on the relative Hamming distance between their sets of parameters, as a measure of relatedness. The phylogenetic tree is then constructed on the basis of these relative distances; see [3].
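The distance computation described above can be sketched as follows. This is a minimal illustration, assuming each language is given as a list of binary parameter values of equal length; the language names and values are hypothetical toy data, not entries of Table A of [3], and the sketch does not model how the PCM treats parameters neutralized by entailment. The tree-building step that uses these distances is not shown.

```python
from itertools import combinations

def relative_hamming(u, v):
    """Fraction of the n parameters on which two languages take different values."""
    if len(u) != len(v):
        raise ValueError("parameter lists must have the same length")
    return sum(a != b for a, b in zip(u, v)) / len(u)

def distance_table(languages):
    """Symmetric table of relative Hamming distances, keyed by (name, name)."""
    table = {}
    for a, b in combinations(sorted(languages), 2):
        d = relative_hamming(languages[a], languages[b])
        table[(a, b)] = table[(b, a)] = d
    return table

# Hypothetical toy data: five binary parameters per language (not real values).
toy = {
    "LangA": [1, 0, 1, 1, 0],
    "LangB": [1, 0, 1, 0, 0],
    "LangC": [0, 1, 0, 0, 1],
}
dist = distance_table(toy)
print(dist[("LangA", "LangB")], dist[("LangA", "LangC")])  # 0.2 1.0
```

A distance of 0 means identical parameter settings and 1 means disagreement on every parameter, so the table is directly usable as input to a distance-based tree method.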
More work on syntactic phylogenetic reconstructions, involving a larger set of languages and parameters, is ongoing [5]. Syntactic parameters of world languages have also been used recently to investigate the topology and geometry of syntactic structures and in statistical physics models of language evolution [6–8].
Publicly available data on syntactic parameters of world languages can be obtained from databases such as Syntactic Structures of World Languages (SSWL) [9], TerraLing [10], or the World Atlas of Language Structures (WALS) [11]. The syntactic parameter data used in the present paper are taken from Table A of [3].
1.2. Syntactic Parameters, Codes and Code Parameters
Our purpose in this paper is to connect the PCM approach to the mathematical theory of error-correcting codes. We associate a code to any group of languages one wishes to analyze via the PCM, with one codeword for each language. If one uses a number $n$ of syntactic parameters, then the code $C$ sits in the space $\mathbb{F}_2^n$, where the elements of $\mathbb{F}_2 = \{0,1\}$ correspond to the two possible values of each parameter, and the codeword of a language is the string of values of its $n$ parameters. We also consider a version with codes over an alphabet $\mathbb{F}_3$ of three letters, which allows for the possibility that some of the parameters may be made irrelevant by entailment from other parameters. In this case we use the letter $0 \in \mathbb{F}_3$ for the irrelevant parameters and the nonzero values $\pm 1$ for the parameters that are set in the language.
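One possible way to turn parameter strings into codewords over $\mathbb{F}_2$ or $\mathbb{F}_3$ is sketched below. The symbols '+', '-' and '0', the assignment '+' → 1 and '-' → 0 in $\mathbb{F}_2$, the use of 2 for $-1$ in $\mathbb{F}_3 = \mathbb{Z}/3\mathbb{Z}$, and the function names are all assumptions made for illustration; the binary version simply rejects irrelevant parameters, since it has no letter for them.

```python
# Hypothetical symbol conventions (not from the paper): '+' and '-' are the two
# values of a parameter, '0' marks a parameter made irrelevant by entailment.
ALPHABETS = {
    2: {"+": 1, "-": 0},          # F_2 = {0, 1}
    3: {"0": 0, "+": 1, "-": 2},  # F_3 = Z/3Z, with 2 standing for -1
}

def codeword(values, q):
    """Map a string of parameter symbols to a tuple over F_q (q = 2 or 3)."""
    table = ALPHABETS.get(q)
    if table is None:
        raise ValueError("only q = 2 or q = 3 are supported")
    try:
        return tuple(table[v] for v in values)
    except KeyError as exc:
        # e.g. an irrelevant parameter '0' cannot be written in the binary code
        raise ValueError(f"symbol {exc.args[0]!r} not allowed for q = {q}") from None

def build_code(languages, q):
    """Return the code C in F_q^n: the set of codewords of the given languages."""
    words = {codeword(vals, q) for vals in languages.values()}
    if len({len(w) for w in words}) > 1:
        raise ValueError("all languages must use the same n parameters")
    return words

toy = {"LangA": "++-0", "LangB": "+--+", "LangC": "++-0"}
print(sorted(build_code(toy, 3)))  # [(1, 1, 2, 0), (1, 2, 2, 1)]
```

Because a code is a set, languages with identical parameter strings contribute a single codeword, so $\#C$ can be smaller than the number of languages, as with LangA and LangC here.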
In the theory of error-correcting codes, see [12], one assigns to a code $C \subset \mathbb{F}_q^n$ two code parameters: $R = \log_q(\#C)/n$, the transmission rate of the code, and $\delta = d/n$, the relative minimum distance of the code, where $d$ is the minimum Hamming distance between pairs of distinct codewords. It is well known in coding theory that “good codes” are those that maximize both parameters, compatibly with several constraints relating $R$ and $\delta$. Consider the function $f: \mathcal{C}_q \to [0,1]^2$ from the space $\mathcal{C}_q$ of $q$-ary codes to the unit square that assigns to a code $C$ its code parameters, $f(C) = (\delta(C), R(C))$. A point $(\delta, R)$ in the range of $f$ has finite (respectively, infinite) multiplicity if the preimage $f^{-1}(\delta, R)$ is a finite (respectively, infinite) set. It was proved in [13] that there is a curve $R = \alpha_q(\delta)$ in the space of code parameters, the asymptotic bound, that separates code points that fill a dense region and have infinite multiplicity from isolated code points that have only finite multiplicity. These isolated codes, which lie above the asymptotic bound, are better but more elusive; they are typically obtained through algebro-geometric constructions, see [13–15]. The asymptotic bound was related to Kolmogorov complexity in [16].
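The two code parameters of a finite code $C \subset \mathbb{F}_q^n$ can be computed directly from their definitions, $R = \log_q(\#C)/n$ and $\delta = d/n$. The sketch below does only that; the function names and the toy codes are illustrative assumptions, and it says nothing about where $(\delta, R)$ sits relative to the asymptotic bound, since $\alpha_q(\delta)$ is not known in closed form.

```python
import math
from itertools import combinations

def hamming(u, v):
    """Number of positions at which two equal-length words differ."""
    return sum(a != b for a, b in zip(u, v))

def code_parameters(code, q):
    """Return (delta, R) = (d/n, log_q(#C)/n) for a finite code C in F_q^n."""
    words = set(code)
    if len(words) < 2:
        raise ValueError("need at least two distinct codewords to define d")
    n = len(next(iter(words)))
    if any(len(w) != n for w in words):
        raise ValueError("all codewords must have the same length n")
    # d: minimum Hamming distance over all pairs of distinct codewords
    d = min(hamming(u, v) for u, v in combinations(words, 2))
    rate = math.log(len(words)) / (n * math.log(q))  # log_q(#C) / n
    return d / n, rate

# Toy binary code with n = 4, #C = 4, d = 2, so delta = 2/4 and R = log_2(4)/4.
toy = [(0, 0, 0, 0), (1, 1, 0, 0), (0, 0, 1, 1), (1, 1, 1, 1)]
delta, R = code_parameters(toy, q=2)
print(delta, R)
```

Computing $d$ by comparing all pairs costs $O(\#C^2 n)$ operations, which is negligible for codes built from a handful of languages.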
1.3. Position with Respect to the Asymptotic Bound
Given a collection of languages one wants to compare through their syntactic parameters, one can ask natural questions about the position of the resulting code in the space of code parameters and with respect to the asymptotic bound. The theory of error-correcting codes tells us that codes above the asymptotic bound are very rare. Indeed, we considered various sets of languages, and for each choice of a set of languages we considered an associated code, with a codeword for each language in the set, given by its list of syntactic parameters. When computing the code parameters of the resulting code, one finds that, in a range of cases we looked at, when the languages in the chosen set belong to the same historical-linguistic family the resulting code lies below the asymptotic bound (and in fact below
Differential Geometrical Theory of Statistics
- Title: Differential Geometrical Theory of Statistics
- Authors: Frédéric Barbaresco, Frank Nielsen
- Publisher: MDPI
- Place: Basel
- Date: 2017
- Language: English
- License: CC BY-NC-ND 4.0
- ISBN: 978-3-03842-425-3
- Dimensions: 17.0 x 24.4 cm
- Pages: 476
- Keywords: Entropy, Coding Theory, Maximum entropy, Information geometry, Computational Information Geometry, Hessian Geometry, Divergence Geometry, Information topology, Cohomology, Shape Space, Statistical physics, Thermodynamics
- Categories: Natural Sciences, Physics