Article
Syntactic Parameters and a Coding Theory Perspective on Entropy and Complexity of Language Families
Matilde Marcolli
Department of Mathematics, California Institute of Technology, Pasadena, CA 91125, USA; matilde@caltech.edu;
Tel.: +1-626-395-4326
Academic Editors: Frédéric Barbaresco, Frank Nielsen and Kevin H. Knuth
Received: 14 January 2016; Accepted: 18 March 2016; Published: 7 April 2016
Abstract: We present a simple computational approach to assigning a measure of complexity and information/entropy to families of natural languages, based on syntactic parameters and the theory of error correcting codes. We associate to each language a binary string of syntactic parameters and to a language family a binary code, with codewords the binary strings associated to each language. We then evaluate the code parameters (rate and relative minimum distance) and the position of the parameters with respect to the asymptotic bound of error correcting codes and the Gilbert–Varshamov bound. These bounds are, respectively, related to the Kolmogorov complexity and the Shannon entropy of the code, and this gives us a computationally simple way to obtain estimates on the complexity and information, not of individual languages but of language families. This notion of complexity is related, from the linguistic point of view, to the degree of variability of syntactic parameters across languages belonging to the same (historical) family.
Keywords: syntax; principles and parameters; error-correcting codes; asymptotic bound; Kolmogorov complexity; Gilbert–Varshamov bound; Shannon entropy
1. Introduction
We propose an approach, based on Longobardi's parametric comparison method (PCM) and the theory of error-correcting codes, to a quantitative evaluation of the "complexity" of a language family. One associates to a collection of languages to be analyzed with the PCM a binary (or ternary) code with one codeword for each language in the family and each word consisting of the binary values of the syntactic parameters of that language. The ternary case allows for an additional parameter state that takes into account certain phenomena of entailment of parameters. We then consider a different kind of parameters: the code parameters of the resulting code, which in coding theory account for the efficiency of the coding and decoding procedures. These can be compared with some classical bounds of coding theory: the asymptotic bound, the Gilbert–Varshamov (GV) bound, etc. The position of the code parameters with respect to some of these bounds provides quantitative information on the variability of syntactic parameters within and across historical-linguistic families. While computations carried out for languages belonging to the same historical family yield codes below the GV curve, comparisons across different historical families can give examples of isolated codes lying above the asymptotic bound.
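To make the quantities concrete, the following is a minimal sketch (not taken from the paper) of how the code parameters and the comparison with the Gilbert–Varshamov curve can be computed in Python. The function names and the four parameter strings of the toy "family" are purely hypothetical; real inputs would be the binary syntactic parameter values obtained with the PCM.

    from itertools import combinations
    from math import log

    def hamming(u, v):
        # Number of positions at which two parameter strings differ.
        return sum(a != b for a, b in zip(u, v))

    def code_parameters(codewords, q=2):
        # Rate R = log_q(#C) / n and relative minimum distance delta = d / n
        # for a (generally unstructured) code C of block length n.
        n = len(codewords[0])
        d = min(hamming(u, v) for u, v in combinations(codewords, 2))
        R = log(len(codewords), q) / n
        return R, d / n

    def gv_rate(delta, q=2):
        # Asymptotic Gilbert-Varshamov curve R_GV(delta) = 1 - H_q(delta),
        # where H_q is the q-ary entropy function.
        if delta <= 0:
            return 1.0
        if delta >= 1 - 1 / q:
            return 0.0
        h = (delta * log(q - 1, q)
             - delta * log(delta, q)
             - (1 - delta) * log(1 - delta, q))
        return 1 - h

    # Purely hypothetical parameter strings for four languages of one "family".
    family = ["1011001101", "1011001100", "1011101101", "0011001101"]

    R, delta = code_parameters(family)
    print(f"R = {R:.3f}, delta = {delta:.3f}, R_GV(delta) = {gv_rate(delta):.3f}")
    print("below GV curve" if R <= gv_rate(delta) else "above GV curve")

Here R is the transmission rate and delta the relative minimum distance of the code. For these toy strings the point (delta, R) happens to fall below the GV curve; whether the codes obtained from real language families do so is exactly the question examined in the paper.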
1.1. Principles and Parameters
The generative approach to linguistics relies on the notion of a Universal Grammar (UG) and a related universal list of syntactic parameters. In the Principles and Parameters model, developed since [1], these are thought of as binary valued parameters or "switches" that set the grammatical
Entropy 2016, 18, 110; www.mdpi.com/journal/entropy
Differential Geometrical Theory of Statistics
- Title: Differential Geometrical Theory of Statistics
- Authors: Frédéric Barbaresco, Frank Nielsen
- Editor: MDPI
- Location: Basel
- Date: 2017
- Language: English
- License: CC BY-NC-ND 4.0
- ISBN: 978-3-03842-425-3
- Size: 17.0 x 24.4 cm
- Pages: 476
- Keywords: Entropy, Coding Theory, Maximum entropy, Information geometry, Computational Information Geometry, Hessian Geometry, Divergence Geometry, Information topology, Cohomology, Shape Space, Statistical physics, Thermodynamics
- Categories: Natural Sciences, Physics