Multilingual Learning

27 Apr 2017

Multilingual learning is concerned with training models that work well across multiple languages, including low-resource ones. We research methods for enabling information sharing between languages, and study how to utilise typological knowledge bases to this end. We are currently involved in three larger funded projects in this area.

Multi3Generation is a COST Action that funds collaboration among researchers in Europe and abroad. The project is coordinated by Isabelle Augenstein, and its goal is to study language generation using multi-task, multilingual and multi-modal signals.

We are also a partner in a research project funded by the Swedish Research Council and coordinated by Robert Östling. Its goal is to study structured multilinguality, i.e. the idea of using language representations and typological knowledge bases to guide which information to share between specific languages.

Lastly, Andrea Lekkas’ industrial PhD project with Ordbogen, supported by Innovation Fund Denmark, focuses on multilingual language modelling for developing writing assistants.


Publications

Quantifying Gender Bias Towards Politicians in Cross-Lingual Language Models

While the prevalence of large pre-trained language models has led to significant improvements in the performance of NLP systems, recent …

Karolina Stańczak, Sagnik Ray Choudhury, Tiago Pimentel, Ryan Cotterell, Isabelle Augenstein


Measuring Gender Bias in West Slavic Language Models

Pre-trained language models have been known to perpetuate biases from the underlying datasets to downstream tasks. However, these …

Sandra Martinková, Karolina Stańczak, Isabelle Augenstein


Probing Pre-Trained Language Models for Cross-Cultural Differences in Values

Language embeds information about social, cultural, and political values people hold. Prior work has explored social and potentially …

Arnav Arora, Lucie-Aimée Kaffee, Isabelle Augenstein


A Latent-Variable Model for Intrinsic Probing

The success of pre-trained contextualized representations has prompted researchers to analyze them for the presence of linguistic …

Karolina Stańczak, Lucas Torroba Hennigen, Adina Williams, Ryan Cotterell, Isabelle Augenstein


Same Neurons, Different Languages: Probing Morphosyntax in Multilingual Pre-trained Models

The success of multilingual pre-trained models is underpinned by their ability to learn representations shared by multiple languages …

Karolina Stańczak, Edoardo Ponti, Lucas Torroba Hennigen, Ryan Cotterell, Isabelle Augenstein


Multi-Sense Language Modelling

The effectiveness of a language model is influenced by its token representations, which must encode contextual information and handle …

Andrea Lekkas, Peter Schneider-Kamp, Isabelle Augenstein


Multi3Generation: Multi-task, Multilingual, Multi-Modal Language Generation

This paper presents the Multitask, Multilingual, Multimodal Language Generation COST Action – Multi3Generation (CA18231), an …

Anabela Barreiro, José G. C. de Souza, Albert Gatt, Mehul Bhatt, Elena Lloret, Aykut Erdem, Dimitra Gkatzia, Helena Moniz, Irene Russo, Fabio Kepler, Iacer Calixto, Marcin Paprzycki, François Portet, Isabelle Augenstein, Mirela Alhasani


Few-Shot Cross-Lingual Stance Detection with Sentiment-Based Pre-Training

The goal of stance detection is to determine the viewpoint expressed in a piece of text towards a target. These viewpoints or contexts …

Momchil Hardalov, Arnav Arora, Preslav Nakov, Isabelle Augenstein


Inducing Language-Agnostic Multilingual Representations

Cross-lingual representations have the potential to make NLP techniques available to the vast majority of languages in the world. …

Wei Zhao, Steffen Eger, Johannes Bjerva, Isabelle Augenstein


Does Typological Blinding Impede Cross-Lingual Sharing?

Bridging the performance gap between high- and low-resource languages has been the focus of much previous work. Typological features …

Johannes Bjerva, Isabelle Augenstein


Zero-Shot Cross-Lingual Transfer with Meta Learning

Learning what to share between tasks has been a topic of high importance recently, as strategic sharing of knowledge has been shown to …

Farhad Nooralahzadeh, Giannis Bekoulis, Johannes Bjerva, Isabelle Augenstein


SIGTYP 2020 Shared Task: Prediction of Typological Features

Typological knowledge bases (KBs) such as WALS contain information about linguistic properties of the world’s languages. They …

Johannes Bjerva, Elizabeth Salesky, Sabrina J. Mielke, Aditi Chaudhary, Giuseppe G. A. Celano, Edoardo M. Ponti, Ekaterina Vylomova, Ryan Cotterell, Isabelle Augenstein


2kenize: Tying Subword Sequences for Chinese Script Conversion

We propose a novel Chinese character conversion model that can disambiguate between mappings and convert between the two scripts. The …

Pranav A, Isabelle Augenstein


X-WikiRE: A Large, Multilingual Resource for Relation Extraction as Machine Comprehension

Although the vast majority of knowledge bases (KBs) are heavily biased towards English, Wikipedias do cover very different topics in …

Mostafa Abdou, Cezar Sas, Rahul Aralikatte, Isabelle Augenstein, Anders Søgaard


Uncovering Probabilistic Implications in Typological Knowledge Bases

The study of linguistic typology is rooted in the implications we find between linguistic features, such as the fact that languages …

Johannes Bjerva, Yova Kementchedjhieva, Ryan Cotterell, Isabelle Augenstein


A Probabilistic Generative Model of Linguistic Typology

In the Principles and Parameters framework, the structural features of languages depend on parameters that may be toggled on or off, …

Johannes Bjerva, Yova Kementchedjhieva, Ryan Cotterell, Isabelle Augenstein


What do Language Representations Really Represent?

A neural language model trained on a text corpus can be used to induce distributed representations of words, such that similar words …

Johannes Bjerva, Robert Östling, Maria Han Veiga, Jörg Tiedemann, Isabelle Augenstein


Parameter sharing between dependency parsers for related languages

Previous work has suggested that parameter sharing between transition-based neural dependency parsers for related languages can lead to …

Miryam de Lhoneux, Johannes Bjerva, Isabelle Augenstein, Anders Søgaard


Copenhagen at CoNLL–SIGMORPHON 2018: Multilingual Inflection in Context with Explicit Morphosyntactic Decoding

This paper documents the Team Copenhagen system which placed first in the CoNLL–SIGMORPHON 2018 shared task on universal …

Yova Kementchedjhieva, Johannes Bjerva, Isabelle Augenstein


From Phonology to Syntax: Unsupervised Linguistic Typology at Different Levels with Language Embeddings

A core part of linguistic typology is the classification of languages according to linguistic properties, such as those detailed in the …

Johannes Bjerva, Isabelle Augenstein


Tracking Typological Traits of Uralic Languages in Distributed Language Representations

Although linguistic typology has a long history, computational approaches have only recently gained popularity. The use of distributed …

Johannes Bjerva, Isabelle Augenstein
