Biomedical relation extraction with pre-trained language representations and minimal task-specific architecture

26 Sep 2019

EMNLP 2019

Authors: Ashok Thillaisundaram, Theodosia Togia

Abstract

This paper presents our participation in the AGAC Track from the 2019 BioNLP Open Shared Tasks. We provide a solution for Task 3, which aims to extract "gene - function change - disease" triples, where "gene" and "disease" are mentions of particular genes and diseases respectively and "function change" is one of four pre-defined relationship types. Our system extends BERT (Devlin et al., 2018), a state-of-the-art language model, which learns contextual language representations from a large unlabelled corpus and whose parameters can be fine-tuned to solve specific tasks with minimal additional architecture. We encode the pair of mentions and their textual context as two consecutive sequences in BERT, separated by a special symbol. We then use a single linear layer to classify their relationship into five classes (four pre-defined, as well as 'no relation'). Despite considerable class imbalance, our system significantly outperforms a random baseline while relying on an extremely simple setup with no specially engineered features.
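The input encoding and classification head described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the order of the two sequences, the whitespace tokenisation, the placeholder label names (`REL_1` to `REL_4`), the toy sentence, and the helper names `build_pair_input` and `linear_classify` are all assumptions made for illustration, and the vector passed to the linear layer stands in for BERT's pooled output.

```python
import math
from typing import List, Tuple

# Five output classes: 'no relation' plus four pre-defined types.
# The names REL_1..REL_4 are placeholders, not the task's actual labels.
LABELS = ["NO_RELATION", "REL_1", "REL_2", "REL_3", "REL_4"]


def build_pair_input(
    gene: str, disease: str, context_tokens: List[str]
) -> Tuple[List[str], List[int]]:
    """Lay out the mention pair and its context as two BERT-style sequences.

    Assumption: the mention pair comes first and the context second.
    """
    pair_tokens = [gene, disease]
    tokens = ["[CLS]"] + pair_tokens + ["[SEP]"] + context_tokens + ["[SEP]"]
    # Segment id 0 covers [CLS], the pair and the first [SEP];
    # segment id 1 covers the context and the final [SEP].
    segment_ids = [0] * (len(pair_tokens) + 2) + [1] * (len(context_tokens) + 1)
    return tokens, segment_ids


def linear_classify(
    pooled: List[float], weights: List[List[float]], bias: List[float]
) -> List[float]:
    """Apply a single linear layer followed by a softmax over the classes."""
    logits = [
        sum(w * x for w, x in zip(row, pooled)) + b
        for row, b in zip(weights, bias)
    ]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example with a made-up sentence and a made-up 3-dimensional "pooled" vector.
tokens, segment_ids = build_pair_input(
    "BRCA1", "cancer", ["mutations", "in", "BRCA1", "cause", "cancer"]
)
pooled = [0.2, -0.1, 0.4]
weights = [[0.1, 0.0, -0.2] for _ in LABELS]  # one weight row per class
bias = [0.0] * len(LABELS)
probs = linear_classify(pooled, weights, bias)
predicted = LABELS[probs.index(max(probs))]
```

In a real system the pooled vector would come from a fine-tuned BERT encoder and the weights would be learned jointly with it; the sketch only shows how little task-specific structure sits on top of the encoder.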
