Tracking Typological Traits of Uralic Languages in Distributed Language Representations

Johannes Bjerva, Isabelle Augenstein

1 Jan 2018


Abstract

English Abstract: Although linguistic typology has a long history, computational approaches have only recently gained popularity. The use of distributed representations in computational linguistics has also become increasingly popular. A recent development is to learn distributed representations of language, such that typologically similar languages are spatially close to one another. Although empirical successes have been shown for such language representations, they have not been subjected to much typological probing. In this paper, we first look at whether this type of language representation is empirically useful for model transfer between Uralic languages in deep neural networks. We then investigate which typological features are encoded in these representations by attempting to predict features in the World Atlas of Language Structures, at various stages of fine-tuning of the representations. We focus on Uralic languages and find that some typological traits can be automatically inferred with accuracies well above a strong baseline.

Finnish Abstract (English translation): Although linguistic typology has a long history, the computational methods associated with it have only recently gained popularity. The use of distributed representations in computational linguistics has also become increasingly popular. A recent development in the field is to learn a distributed representation of language that places similar languages close to one another. Although such representations have enjoyed empirical success, they have not been studied much from a typological perspective. This article investigates whether such language representations are empirically useful for model transfer between Uralic languages in deep neural networks. By attempting to predict features in the World Atlas of Language Structures database, we investigate which typological properties these representations contain. We focused on Uralic languages and found that some typological properties can be automatically inferred with an accuracy that clearly exceeds a strong baseline.
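The probing idea in the abstract (predict a WALS feature value for a language from its learned representation, then compare against a baseline) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the two-dimensional embeddings, the language names (`lang_a` to `lang_f`) and the feature values are invented toy data, the probe is a leave-one-out nearest-neighbour classifier, and the baseline is a plain majority-class predictor. None of these is claimed to be the classifier, data, or baseline used in the paper.

```python
import math
from collections import Counter

# Hypothetical toy data: 2-d "language embeddings" and a made-up
# word-order feature per language. NOT real embeddings or real WALS values.
EMBEDDINGS = {
    "lang_a": (0.90, 0.10),
    "lang_b": (0.80, 0.20),
    "lang_c": (0.85, 0.15),
    "lang_d": (0.10, 0.90),
    "lang_e": (0.20, 0.80),
    "lang_f": (0.75, 0.25),
}
FEATURE = {
    "lang_a": "SOV",
    "lang_b": "SOV",
    "lang_c": "SOV",
    "lang_d": "SVO",
    "lang_e": "SVO",
    "lang_f": "SOV",
}


def nearest_neighbour_predict(target, train):
    """Probe: copy the feature value of the closest training language (Euclidean)."""
    closest = min(train, key=lambda lang: math.dist(EMBEDDINGS[target], EMBEDDINGS[lang]))
    return FEATURE[closest]


def majority_predict(train):
    """Baseline: most frequent feature value among the training languages."""
    return Counter(FEATURE[lang] for lang in train).most_common(1)[0][0]


def leave_one_out_accuracy(predictor):
    """Hold out each language in turn and predict its feature from the rest."""
    langs = sorted(EMBEDDINGS)
    correct = 0
    for held_out in langs:
        train = [lang for lang in langs if lang != held_out]
        if predictor(held_out, train) == FEATURE[held_out]:
            correct += 1
    return correct / len(langs)


probe_acc = leave_one_out_accuracy(nearest_neighbour_predict)
baseline_acc = leave_one_out_accuracy(lambda _target, train: majority_predict(train))
print(f"probe: {probe_acc:.2f}, majority baseline: {baseline_acc:.2f}")
```

On this toy data the script prints `probe: 1.00, majority baseline: 0.67`: because languages sharing a feature value were placed close together, the embedding-based probe beats the baseline. If a real feature were not encoded in the representation, the probe would have no reason to outperform the baseline, which is what makes the comparison informative.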

Type

Conference paper

Publication

In Fourth International Workshop on Computational Linguistics for Uralic Languages.

Date

January 2018