Page - 453 - in Differential Geometrical Theory of Statistics
Entropy 2016, 18, 110
has the advantage that it is very rich in linguistic information. However, it is at the same time
computationally very difficult to realize.
What we are proposing here is a much simpler way to obtain an estimate of complexity for a
language family $\{L_1, \ldots, L_k\}$, which is not based on estimating complexity of the individual languages
in the family, but which is aimed at detecting how spread out and diversified the syntactic parameters
are across the family, by estimating the position of the code point $(R(C), \delta(C))$ of the associated code
$C$ with respect to the asymptotic bound $R = \alpha_q(\delta)$. This can be estimated in terms of the distance to
other curves in the space of code parameters $(R, \delta)$ that constrain the asymptotic bound from above
and below, such as the Plotkin bound, Hamming bound, and Gilbert–Varshamov bound, as in the
examples discussed in the previous sections.
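As a rough illustration of this kind of estimate, the following Python sketch computes the code point (R, δ) of a small code and compares it with the asymptotic forms of the Gilbert–Varshamov, Hamming, and Plotkin bounds. The toy codewords, the function names, and the choice q = 2 are assumptions made for illustration only and are not PCM data. The asymptotic bound itself is not known explicitly, so the sketch only reports the position relative to these explicit curves.

```python
from itertools import combinations
from math import log


def code_point(codewords, q=2):
    """Return (R, delta) for a code with at least two distinct codewords."""
    words = {tuple(w) for w in codewords}  # keep distinct codewords only
    n = len(next(iter(words)))             # code length
    # Minimum Hamming distance over all pairs of distinct codewords.
    d = min(sum(a != b for a, b in zip(u, v)) for u, v in combinations(words, 2))
    return log(len(words), q) / n, d / n


def q_entropy(x, q=2):
    """q-ary entropy function H_q(x) for 0 <= x <= 1."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return log(q - 1, q)
    return x * log(q - 1, q) - x * log(x, q) - (1 - x) * log(1 - x, q)


def gv_bound(delta, q=2):
    """Asymptotic Gilbert-Varshamov curve R = 1 - H_q(delta)."""
    return 1 - q_entropy(delta, q) if delta < (q - 1) / q else 0.0


def hamming_bound(delta, q=2):
    """Asymptotic Hamming bound R <= 1 - H_q(delta / 2)."""
    return 1 - q_entropy(delta / 2, q)


def plotkin_bound(delta, q=2):
    """Asymptotic Plotkin bound R <= 1 - q * delta / (q - 1)."""
    return max(0.0, 1 - q * delta / (q - 1))


# Hypothetical family of four "languages" described by eight binary parameters.
toy_code = [
    (0, 0, 0, 0, 0, 0, 0, 0),
    (1, 1, 0, 0, 0, 0, 0, 0),
    (0, 0, 1, 1, 0, 0, 0, 0),
    (1, 1, 1, 1, 0, 0, 0, 0),
]
R, delta = code_point(toy_code)
print("code point:", (R, delta))
print("GV:", gv_bound(delta), "Hamming:", hamming_bound(delta),
      "Plotkin:", plotkin_bound(delta))
```

For this toy code R = δ = 0.25, so the code point lies above the GV curve and below the Hamming and Plotkin curves.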
4. Conclusions
We proposed an approach to estimating entropy and complexity of groups of natural languages
(language families), based on the linguistic parametric comparison method (PCM) of [2,22] via the
mathematical theory of error-correcting codes, by assigning a code to a family of languages to be
analyzed with the PCM, and investigating its position in the space of code parameters, with respect to
the asymptotic bound and the GV bound. We have shown that there are examples of languages not
belonging to the same historical-linguistic family that yield isolated codes above the asymptotic bound,
while languages belonging to the same historical-linguistic family appear to give rise to codes below
the bound, though a more systematic analysis would be needed to map code parameters of different
language groups. We have also shown that, from the coding theory perspective, it is preferable to
exclude from the PCM all those parameters that are entailed and made irrelevant by other parameters,
as those spoil the properties of the resulting code and produce code parameters that are artificially low
with respect to the asymptotic bound and the GV bound.
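The effect of entailed parameters on the code point can be illustrated with a minimal Python sketch. It assumes, purely for illustration, a ternary alphabet {+1, -1, 0} in which 0 marks a parameter neutralized by entailment, and a hypothetical list `entailed_columns` of positions to exclude. The toy family and all names are invented and are not PCM data.

```python
from itertools import combinations
from math import log


def code_point(codewords, q):
    """Return (R, delta) for a code with at least two distinct codewords."""
    words = {tuple(w) for w in codewords}
    n = len(next(iter(words)))
    d = min(sum(a != b for a, b in zip(u, v)) for u, v in combinations(words, 2))
    return log(len(words), q) / n, d / n


def drop_columns(codewords, columns):
    """Remove the given parameter positions from every codeword."""
    skip = set(columns)
    return [tuple(x for i, x in enumerate(w) if i not in skip) for w in codewords]


# Hypothetical family: three languages, six parameters, the last two
# neutralized (value 0) in every language.
toy_family = [
    (1, -1, 1, 1, 0, 0),
    (1, 1, -1, 1, 0, 0),
    (-1, -1, 1, -1, 0, 0),
]
entailed_columns = [4, 5]  # assumed positions of the entailed parameters

full = code_point(toy_family, q=3)
reduced = code_point(drop_columns(toy_family, entailed_columns), q=3)
print("with entailed parameters:   ", full)
print("without entailed parameters:", reduced)
```

In this toy case the two dropped columns are identical in every language, so they add length without adding distance, and the code point moves from (1/6, 1/3) to (1/4, 1/2). Keeping q = 3 in both calls is a choice made here to isolate the effect of the code length.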
The approach proposed here, based on the PCM and the theory of error-correcting codes, suggests
a few new linguistic questions that may be suitable for treatment with coding theory methods:
1. Do languages belonging to the same historical-linguistic family always yield codes below the
asymptotic bound or the GV bound? How often does the same happen across different linguistic
families? How much can code parameters be improved by eliminating spoiling effects caused by
dependencies and entailment of syntactic parameters?
2. Codes near the GV curve typically come from the Shannon Random Code Ensemble, where
code words and letters of code words behave like independent random variables, see [26,27].
Are there families of languages whose associated codes are located near the GV bound? Do their
syntactic parameters mimic the uniform Poisson distribution of random codes?
3. The asymptotic bound for error-correcting codes was related in [16] to Kolmogorov complexity,
and the measure of complexity for language families that we proposed here is estimated in terms
of the position of the code point with respect to the asymptotic bound. There are other notions of
complexity, most notably the type of organized complexities discussed in [33–35]. Can these be
related to loci in the space of code parameters? What do these represent when applied to codes
obtained from syntactic parameters of a set of natural languages?
4. Is there a more direct linguistic complexity measure associated to a family of natural languages
that would relate to the position of the resulting code above or below the asymptotic bound?
5. Codes and the asymptotic bound in the space of code parameters were recently studied using
methods from quantum statistical mechanics, operator algebra and fractal geometry [24,36].
Can some of these mathematical methods be employed in the linguistic parametric
comparison method?
The observational results reported here are still preliminary. The following topics should
be consolidated:
Differential Geometrical Theory of Statistics
- Title
- Differential Geometrical Theory of Statistics
- Authors
- Frédéric Barbaresco
- Frank Nielsen
- Editor
- MDPI
- Location
- Basel
- Date
- 2017
- Language
- English
- License
- CC BY-NC-ND 4.0
- ISBN
- 978-3-03842-425-3
- Size
- 17.0 x 24.4 cm
- Pages
- 476
- Keywords
- Entropy, Coding Theory, Maximum entropy, Information geometry, Computational Information Geometry, Hessian Geometry, Divergence Geometry, Information topology, Cohomology, Shape Space, Statistical physics, Thermodynamics
- Categories
- Natural Sciences, Physics