Learning with Limited Labelled Data

Learning with limited labelled data is useful for low-resource domains and languages. The methods we research to mitigate the problems arising in these settings include multi-task learning, weakly supervised learning and zero-shot learning.
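To illustrate one of these methods, below is a minimal sketch of multi-task learning with hard parameter sharing: two tasks share a common representation layer, so gradients from both tasks update the shared weights. The data, layer sizes and learning rate are all hypothetical toy choices, not taken from any of the publications listed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: two regression tasks over the same inputs.
X = rng.normal(size=(32, 8))    # 32 examples, 8 features
y_a = rng.normal(size=(32, 1))  # targets for task A
y_b = rng.normal(size=(32, 1))  # targets for task B

# Hard parameter sharing: one shared layer, two task-specific heads.
W_shared = rng.normal(scale=0.1, size=(8, 16))
W_a = rng.normal(scale=0.1, size=(16, 1))
W_b = rng.normal(scale=0.1, size=(16, 1))

def forward(X):
    h = np.tanh(X @ W_shared)       # shared representation
    return h, h @ W_a, h @ W_b      # per-task predictions

def mse(pred, y):
    return float(np.mean((pred - y) ** 2))

h, p_a, p_b = forward(X)
loss_before = mse(p_a, y_a) + mse(p_b, y_b)

lr = 0.1
for _ in range(200):
    h, p_a, p_b = forward(X)
    # Gradients of the summed mean-squared-error losses.
    g_a = 2 * (p_a - y_a) / len(X)
    g_b = 2 * (p_b - y_b) / len(X)
    grad_Wa = h.T @ g_a
    grad_Wb = h.T @ g_b
    # Both tasks backpropagate through tanh into the shared layer.
    g_h = g_a @ W_a.T + g_b @ W_b.T
    grad_shared = X.T @ (g_h * (1 - h ** 2))
    W_a -= lr * grad_Wa
    W_b -= lr * grad_Wb
    W_shared -= lr * grad_shared

_, p_a, p_b = forward(X)
loss_after = mse(p_a, y_a) + mse(p_b, y_b)
```

The design choice to share only the lower layer is the classic hard-sharing setup; how much to share, and between which tasks, is itself a research question addressed in several of the papers below.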

This is a cross-cutting theme in most of our research. Two funded projects specifically addressing this are Multi3Generation and Andreas Nugaard Holm’s industrial PhD project with BASE Life Science, supported by Innovation Fund Denmark.

Multi3Generation is a COST Action that funds collaboration of researchers in Europe and abroad. The project is coordinated by Isabelle Augenstein, and its goals are to study language generation using multi-task, multilingual and multi-modal signals.

Andreas Nugaard Holm’s industrial PhD project focuses on transfer learning and domain adaptation for scientific text.

Publications

While state-of-the-art NLP explainability (XAI) methods focus on supervised, per-instance end or diagnostic probing task evaluation [4, …

A critical component of automatically combating misinformation is the detection of fact check-worthiness, i.e. determining if a piece …

Learning what to share between tasks has been a topic of high importance recently, as strategic sharing of knowledge has been shown to …

In this paper, we extend the task of semantic textual similarity to include sentences which contain emojis. Emojis are ubiquitous on …

Emotion lexica are commonly used resources to combat data poverty in automatic emotion detection. However, methodological issues emerge …

Language evolves over time in many ways relevant to natural language processing tasks. For example, recent occurrences of tokens …

Task-oriented dialogue systems rely heavily on specialized dialogue state tracking (DST) modules for dynamically predicting user intent …

Although the vast majority of knowledge bases (KBs) are heavily biased towards English, Wikipedias do cover very different topics in …

Multi-task learning and self-training are two common ways to improve a machine learning model’s performance in settings with …

Studying to what degree the language we use is gender-specific has long been an area of interest in sociolinguistics. Studies have …

When assigning quantitative labels to a dataset, different methodologies may rely on different scales. In particular, when assigning …

In online discussion fora, speakers often make arguments for or against something, say birth control, by highlighting certain aspects …

Multi-task learning (MTL) allows deep neural networks to learn from related tasks by sharing parameters with other networks. In …

Previous work has suggested that parameter sharing between transition-based neural dependency parsers for related languages can lead to …

This paper documents the Team Copenhagen system which placed first in the CoNLL–SIGMORPHON 2018 shared task on universal …

Neural part-of-speech (POS) taggers are known to not perform well with little training data. As a step towards overcoming this problem, …

We combine multi-task learning and semi-supervised learning by inducing a joint embedding space between disparate label spaces and …

We take a multi-task learning approach to the shared Task 1 at SemEval-2018. The general idea concerning the model structure is to use …

Keyphrase boundary classification (KBC) is the task of detecting keyphrases in scientific articles and labelling them with respect to …