Languages that originate from a common ancestor are genetically related. Words are the core of any language, and cognates are words that share the same ancestor and etymology. The evolutionary history of languages may therefore be uncovered by cognate identification and estimated by phylogenetic inference. Using several techniques originally designed for biological sequence analysis, an orthographic learning system for measuring string similarity has been developed and successfully applied to these tasks. Using PAM-like matrices, the system has outperformed the best comparable phonetic and orthographic cognate identification models previously reported in the literature, with results that are statistically significant and remarkably stable regardless of the size of the training dataset. The method has also inferred high-quality Indo-European phylogenies, which are compatible with the benchmark tree and correctly reproduce all the established major language groups and subgroups present in the dataset. This book focuses on computational historical linguistics, but its contribution is also relevant to computational linguistics and natural language processing.
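As a rough illustration of how a substitution matrix borrowed from biological sequence analysis can score the orthographic similarity of two words, the sketch below runs a standard global alignment (Needleman-Wunsch) over letters. It is not the system described in the book: the function names, the gap penalty, the match and mismatch defaults, and the values in `TOY_MATRIX` are all hypothetical, hand-picked for illustration, whereas a PAM-like matrix would be learned from training data.

```python
# Minimal sketch: global alignment of two words with a letter substitution matrix.
# All names and numeric values here are illustrative assumptions, not the book's.

def substitution_score(a, b, matrix, match=2.0, mismatch=-1.0):
    """Score for aligning letter a with letter b."""
    if a == b:
        return match
    # Look the pair up in either order; fall back to a flat mismatch penalty.
    return matrix.get((a, b), matrix.get((b, a), mismatch))


def alignment_score(s, t, matrix, gap=-1.0):
    """Best global alignment score of s and t (Needleman-Wunsch, linear gaps)."""
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        curr = [i * gap]
        for j in range(1, len(t) + 1):
            curr.append(max(
                prev[j - 1] + substitution_score(s[i - 1], t[j - 1], matrix),
                prev[j] + gap,      # gap in t
                curr[j - 1] + gap,  # gap in s
            ))
        prev = curr
    return prev[-1]


def similarity(s, t, matrix):
    """Alignment score normalized by the mean self-alignment score."""
    denom = (alignment_score(s, s, matrix) + alignment_score(t, t, matrix)) / 2
    return alignment_score(s, t, matrix) / denom if denom else 0.0


# Hypothetical scores rewarding a few letter pairs that often correspond.
TOY_MATRIX = {("f", "p"): 1.0, ("d", "t"): 1.0, ("f", "v"): 1.0}

if __name__ == "__main__":
    print(similarity("pater", "fater", TOY_MATRIX))
    print(similarity("pater", "fater", {}))
```

With the toy matrix, the pair p/f is rewarded rather than penalized, so the two word forms score as more similar than they would under a flat mismatch penalty. A learned matrix plays the same role, with scores estimated from data rather than chosen by hand.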

More information




LAP LAMBERT Academic Publishing

Publication Date