Page 441 in Differential Geometrical Theory of Statistics
Entropy 2016, 18, 110
the Gilbert–Varshamov curve). This provides a precise quantitative bound on the possible spread of
syntactic parameters compared to the size of the family, in terms of the number of different languages
belonging to the same historical-linguistic group.
However, we also show that, if one considers sets of languages that do not belong to the same
historical-linguistic family, then one can obtain codes that lie above the asymptotic bound, a fact that
reflects, in code theoretic terms, the much greater variability of syntactic parameters. The result is
in itself not surprising, but the point we wish to make is that the theory of error-correcting codes
provides a natural setting where quantitative statements of this sort can be made using methods
already developed for the different purposes of coding theory. We conclude by listing some new
linguistic questions that arise by considering the parametric comparison method under this coding
theory perspective.
1.4. Complexity of Languages and Language Families
The study of natural languages from the point of view of complexity theory has been of significant
interest to linguists in recent years. The approaches typically followed focus on assigning a reasonable
measure of complexity to individual languages and comparing complexities across different languages.
For example, a notion of morphological complexity was studied in [17]. An approach to defining
Kolmogorov complexity of languages on the basis of syntactic parameters was developed in [18].
A notion of language complexity based on the production rules of a generative grammar was
considered in [19], in the setting of (finite) formal languages. For a more general computational
perspective on the complexity of natural languages, see [20]. The idea of distinguishing languages
by complexity is not without controversy in Linguistics. A very interesting general discussion of the
problem and its evolution in the field can be found in [21].
In the present paper, we argue in favor of a somewhat different perspective, where we assign
an estimate of complexity not to individual languages but to groups of languages, and in particular
to (historical) language families. Our version of complexity measures how "spread out" the syntactic
parameters can be across the languages that belong to the same family. As we outlined in the previous
subsections, this is measured by assigning to the language family a code, whose codewords record the
syntactic parameters of the individual languages in the family, then computing its code parameters
and evaluating the position of the resulting code points with respect to curves like the asymptotic
bound or the Gilbert–Varshamov line. The reason why this position carries complexity information
lies in the subtle relation between the asymptotic bound and Kolmogorov complexity, recently derived
by Manin and the author in [16], which we will review briefly in this paper.
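The procedure described above (compute the code parameters, then locate the code point relative to a reference curve) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the standard coding-theory conventions for a binary code $C$ of length $n$: $k = \log_2 \#C$, $d$ the minimum Hamming distance, rate $R = k/n$, and relative minimum distance $\delta = d/n$. Only the binary Gilbert–Varshamov curve $R = 1 - H_2(\delta)$ is evaluated, since the asymptotic bound has no known explicit expression. The function names and the toy codewords are hypothetical.

```python
from itertools import combinations
from math import log2


def hamming(u, v):
    """Number of positions in which two equal-length words differ."""
    return sum(a != b for a, b in zip(u, v))


def code_parameters(codewords):
    """Return (n, k, d) for a binary code given as an iterable of 0/1 tuples.

    Assumed conventions: k = log2(#C), d = minimum Hamming distance.
    Repeated codewords are collapsed, so the code is treated as a set.
    """
    words = sorted(set(map(tuple, codewords)))
    if len(words) < 2:
        raise ValueError("need at least two distinct codewords")
    n = len(words[0])
    if any(len(w) != n for w in words):
        raise ValueError("all codewords must have the same length")
    k = log2(len(words))
    d = min(hamming(u, v) for u, v in combinations(words, 2))
    return n, k, d


def binary_entropy(x):
    """Binary entropy H_2(x) in bits, with H_2(0) = H_2(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)


def gv_rate(delta):
    """Binary Gilbert-Varshamov curve: R = 1 - H_2(delta) for delta < 1/2, else 0."""
    if delta >= 0.5:
        return 0.0
    return 1.0 - binary_entropy(delta)


# Hypothetical toy code (made-up values, not real syntactic parameters).
toy_code = [
    (0, 0, 0, 0, 0, 0),
    (1, 1, 1, 0, 0, 0),
    (0, 0, 0, 1, 1, 1),
    (1, 1, 1, 1, 1, 1),
]
n, k, d = code_parameters(toy_code)
R, delta = k / n, d / n
above_gv = R > gv_rate(delta)
print(f"n={n}, k={k}, d={d}, R={R:.3f}, delta={delta:.3f}, above GV: {above_gv}")
```

Collapsing repeated codewords is a design choice made here for simplicity: two languages with identical parameter values would otherwise force $d = 0$.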
2. Language Families as Codes
The Principles and Parameters model of Linguistics assigns to every natural language $L$ a set of
binary valued parameters that describe properties of the syntactic structure of the language.
Let $F$ be a language family, by which we mean a finite collection $F = \{L_1, \ldots, L_m\}$ of languages.
This may coincide with a family in the historical sense, such as the Indo-European family, or a smaller
subset of languages related by historic origin and development (e.g., the Indo-Iranian or Balto-Slavic
languages), or simply any collection of languages one is interested in comparing at the parametric
level, even if they are spread across different historical families.
We denote by $n$ the number of parameters used in the parametric comparison method. We do
not fix, a priori, a value for $n$, and we consider it a variable of the model. We will discuss below how
one views, in our perspective, the issue of the independence of parameters.
After fixing an enumeration of the parameters, that is, a bijection between the set of parameters
and the set $\{1, \ldots, n\}$, we associate to a language family $F$ a code $C = C(F)$ in $\mathbb{F}_2^n$, with one codeword
for each language $L \in F$, with the codeword $w = w(L)$ given by the list of parameters $w = (x_1, \ldots, x_n)$,
$x_i \in \mathbb{F}_2$, of the language. For simplicity of notation, we just write $L$ for the word $w(L)$ in the following.
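As a rough illustration of this construction, the sketch below builds $C(F)$ from a table of parameter values once an enumeration of the parameters is fixed. The function name `build_code`, the language labels, the parameter names, and all values are hypothetical placeholders, not real linguistic data and not the authors' code.

```python
def build_code(family, parameter_order):
    """Map each language to its codeword in F_2^n.

    family:          dict mapping language name -> dict of parameter name -> 0 or 1
    parameter_order: list of parameter names; its order plays the role of the
                     fixed bijection between parameters and {1, ..., n}
    """
    code = {}
    for language, values in family.items():
        word = tuple(values[p] for p in parameter_order)
        if any(x not in (0, 1) for x in word):
            raise ValueError(f"non-binary parameter value for {language}")
        code[language] = word
    return code


# Hypothetical family with m = 3 languages and n = 4 parameters (made-up values).
parameter_order = ["p1", "p2", "p3", "p4"]
family = {
    "L1": {"p1": 1, "p2": 0, "p3": 1, "p4": 1},
    "L2": {"p1": 1, "p2": 1, "p3": 0, "p4": 1},
    "L3": {"p1": 0, "p2": 0, "p3": 1, "p4": 0},
}

code = build_code(family, parameter_order)
for language, word in code.items():
    print(language, word)
```

Changing `parameter_order` permutes the coordinates of every codeword in the same way, so Hamming distances between codewords do not depend on the chosen enumeration.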
Differential Geometrical Theory of Statistics
- Title
- Differential Geometrical Theory of Statistics
- Authors
- Frédéric Barbaresco
- Frank Nielsen
- Publisher
- MDPI
- Location
- Basel
- Date
- 2017
- Language
- English
- License
- CC BY-NC-ND 4.0
- ISBN
- 978-3-03842-425-3
- Dimensions
- 17.0 x 24.4 cm
- Pages
- 476
- Keywords
- Entropy, Coding Theory, Maximum entropy, Information geometry, Computational Information Geometry, Hessian Geometry, Divergence Geometry, Information topology, Cohomology, Shape Space, Statistical physics, Thermodynamics
- Categories
- Natural Sciences, Physics