Knowledge Base Population

Information extraction is concerned with extracting information about entities, phrases and the relations between them from text in order to populate knowledge bases, for example by extracting “employee-at” relations. Within this context, we have worked on automatic knowledge base completion, knowledge base cleansing and the detection of scientific keyphrases in text, as well as on the automatic completion of typological knowledge bases.

We are currently involved in one longer-term project in this area: a research project funded by the Swedish Research Council and coordinated by Robert Östling. Its goal is to study structured multilinguality, i.e. the idea of using language representations and typological knowledge bases to guide which information to share between specific languages.

Publications

Learning what to share between tasks has been a topic of high importance recently, as strategic sharing of knowledge has been shown to …

Although the vast majority of knowledge bases (KBs) are heavily biased towards English, Wikipedias do cover very different topics in …

The study of linguistic typology is rooted in the implications we find between linguistic features, such as the fact that languages …

In the Principles and Parameters framework, the structural features of languages depend on parameters that may be toggled on or off, …

Automatic summarisation is a popular approach to reduce a document to its main arguments. Recent research in the area has focused on …

Named Entity Recognition (NER) is a key NLP task, which is all the more challenging on Web and user-generated content with their …

Keyphrase boundary classification (KBC) is the task of detecting keyphrases in scientific articles and labelling them with respect to …

We describe the SemEval task of extracting keyphrases and relations between them from scientific documents, which is crucial for …

We propose a novel similarity measure able to cope with unbalanced population of schema elements, an unsupervised technique to …