07 Dec 2022

EMNLP 2022

Authors: Maciej Wiatrak, Eirini Arvaniti, Angus Brayne, Jonas Vetterle, Aaron Sim

Abstract

A recent advancement in the domain of biomedical Entity Linking is the development of powerful two-stage algorithms, an initial candidate retrieval stage that generates a shortlist of entities for each mention, followed by a candidate ranking stage. However, the effectiveness of both stages are inextricably dependent on computationally expensive components. Specifically, in candidate retrieval via dense representation retrieval it is important to have hard negative samples, which require repeated forward passes and nearest neighbour searches across the entire entity label set throughout training. In this work, we show that pairing a proxy-based metric learning loss with an adversarial regularizer provides an efficient alternative to hard negative sampling in the candidate retrieval stage. In particular, we show competitive performance on the recall@1 metric, thereby providing the option to leave out the expensive candidate ranking step. Finally, we demonstrate how the model can be used in a zero-shot setting to discover out of knowledge base biomedical entities.


Back to publications

Latest publications

09 Oct 2023
FRONTIERS IN GENETICS
Learning the kernel for rare variant genetic association test
Read more
24 Aug 2023
ELSEVIER
Associating biological context with protein-protein interactions through text mining at PubMed scale
Read more
07 Dec 2022
NeurIPS 2022
sEHR-CE: Language modelling of structured EHR data for efficient and generalizable patient cohort expansion
Read more