Research: Biomedical relation extraction (BERT)

In natural language processing, words must be given numerical representations before they can be passed as input to a machine learning model.

A word is usually represented as a vector (a list of numbers). These word vectors must adequately capture the meaning of the words: semantically related words should have similar vectors. The better these representations, the stronger the performance of the machine learning model is likely to be.
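To make "similar vectors" concrete, one common way to compare two word vectors is cosine similarity. The sketch below uses hypothetical, hand-picked 3-dimensional vectors purely for illustration; real word vectors are learned from data and typically have hundreds of dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical vectors, chosen by hand for illustration only.
word_vectors = {
    "gene": [0.9, 0.1, 0.0],
    "protein": [0.8, 0.2, 0.1],
    "river": [0.0, 0.1, 0.9],
}

related = cosine_similarity(word_vectors["gene"], word_vectors["protein"])
unrelated = cosine_similarity(word_vectors["gene"], word_vectors["river"])

print(f"gene vs protein: {related:.3f}")
print(f"gene vs river:   {unrelated:.3f}")
```

With these toy values, the related pair scores close to 1 and the unrelated pair close to 0, which is the behaviour a good set of word vectors should show.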

Many methods of learning word vectors give a word the same vector regardless of the context in which it appears. However, it is not unusual for a word to have several different meanings. A classic example of this is the word “bank” in the following two sentences:

“I walked along the river bank”,
“I deposited some money into my bank account”.

If the same vector is used to represent “bank” in both instances, the downstream machine learning model which takes these word vectors as input cannot tell the two meanings apart from the vector alone, and its performance suffers.
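The problem can be seen in a minimal sketch of a context-independent lookup. The table `static_vectors` and the helper `embed_static` are hypothetical names with made-up values; they stand in for any method that stores one fixed vector per word.

```python
# Hypothetical static embedding table: one fixed vector per word.
static_vectors = {
    "bank": [0.5, 0.5],
    "river": [0.9, 0.1],
    "money": [0.1, 0.9],
}

def embed_static(sentence, target):
    """Return the stored vector for `target`; the sentence is only
    used to check that the word actually occurs in it."""
    tokens = sentence.lower().split()
    if target not in tokens:
        raise ValueError(f"{target!r} does not occur in the sentence")
    return static_vectors[target]

river_sentence = "I walked along the river bank"
money_sentence = "I deposited some money into my bank account"

v1 = embed_static(river_sentence, "bank")
v2 = embed_static(money_sentence, "bank")

# The two senses of "bank" receive exactly the same vector.
print(v1 == v2)  # prints True
```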

Over the last few years, a lot of research has been done on learning contextual word vectors: word vectors which vary depending on the context in which the word appears. This enables the same word to have different vectors depending on how it is used in a sentence. A recent paper (Devlin et al., 2018) introduced BERT (Bidirectional Encoder Representations from Transformers), a new way of learning contextual word vectors. BERT uses a powerful Transformer encoder architecture which is capable of modelling longer-range dependencies between words. The paper also proposed an innovative way of capturing the context both before and after the word in the sentence. With these better contextual vectors, BERT achieved state-of-the-art performance on several tasks within natural language processing.
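The idea of a context-dependent vector can be illustrated with a deliberately simple toy, which is not how BERT works: BERT learns its representations with stacked Transformer layers, whereas the sketch below just blends a word's fixed vector with the average of the known words on either side of it. The table `static_vectors`, the helper `embed_in_context` and the `mix` weight are all assumptions made for this example.

```python
# Hypothetical fixed vectors, chosen by hand for illustration only.
static_vectors = {
    "bank": [0.5, 0.5],
    "river": [0.9, 0.1],
    "money": [0.1, 0.9],
}

def embed_in_context(sentence, target, mix=0.5):
    """Blend the target's fixed vector with the mean vector of the
    other known words in the sentence, before and after the target."""
    tokens = sentence.lower().split()
    position = tokens.index(target)
    own = static_vectors[target]
    context = [
        static_vectors[tok]
        for i, tok in enumerate(tokens)
        if i != position and tok in static_vectors
    ]
    if not context:
        return list(own)  # no known neighbours: fall back to the fixed vector
    mean = [sum(vec[d] for vec in context) / len(context) for d in range(len(own))]
    return [(1 - mix) * own[d] + mix * mean[d] for d in range(len(own))]

river_bank = embed_in_context("I walked along the river bank", "bank")
money_bank = embed_in_context("I deposited some money into my bank account", "bank")

# The same word now has a different vector in each sentence.
print(river_bank)
print(money_bank)
```

Here the vector for “bank” is pulled towards “river” in the first sentence and towards “money” in the second, so a downstream model has a chance of telling the two senses apart.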

In a recent paper, we proposed a new relation extraction model built on top of BERT. Given any paragraph of text (for example, the abstract of a biomedical journal article), our model extracts all gene-disease pairs which exhibit a pre-specified relation. In our paper, the relations we were interested in concerned the change in function, caused by a gene mutation, which affects the progression of the disease. The word vectors supplied by BERT provide our model with a way of encoding the meaning expressed in the text with regard to our entities of interest. We then fine-tune this encoding so that it can more accurately identify when a paragraph of text contains a gene-disease relation of interest. Such relation extraction models are crucial in drug discovery: too many journal articles are published every day for a human to read and summarise. A machine learning model capable of automatically extracting relevant gene-disease pairs can greatly accelerate this process.
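The shape of the task (a paragraph goes in, a list of gene-disease pairs comes out) can be sketched as follows. This is not the model from the paper: the names `Mention`, `candidate_pairs`, `extract_relations` and `toy_scorer` are hypothetical, the example assumes gene and disease mentions have already been identified, and the scorer is a plain same-sentence check standing in for a fine-tuned BERT-based classifier.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Mention:
    text: str
    kind: str  # "gene" or "disease"

def candidate_pairs(mentions):
    """Pair every gene mention with every disease mention."""
    genes = [m for m in mentions if m.kind == "gene"]
    diseases = [m for m in mentions if m.kind == "disease"]
    return list(product(genes, diseases))

def extract_relations(paragraph, mentions, score_pair, threshold=0.5):
    """Keep the candidate pairs whose score reaches the threshold."""
    return [
        (gene.text, disease.text)
        for gene, disease in candidate_pairs(mentions)
        if score_pair(paragraph, gene, disease) >= threshold
    ]

def toy_scorer(paragraph, gene, disease):
    """Stand-in scorer (an assumption, not the paper's model):
    1.0 if both mentions occur in the same sentence, else 0.0."""
    for sentence in paragraph.split("."):
        if gene.text in sentence and disease.text in sentence:
            return 1.0
    return 0.0

# Made-up paragraph with placeholder entity names.
paragraph = "A mutation in GENE_A drives DISEASE_X. GENE_B was also measured."
mentions = [
    Mention("GENE_A", "gene"),
    Mention("GENE_B", "gene"),
    Mention("DISEASE_X", "disease"),
]

pairs = extract_relations(paragraph, mentions, toy_scorer)
print(pairs)  # prints [('GENE_A', 'DISEASE_X')]
```

In a real system, the scoring function is where a learned model would sit; passing it in as an argument keeps the pair enumeration separate from whichever classifier is used.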
