26 Oct 2020


Authors: Saee Paliwal, Alex de Giorgio, Daniel Neil, Jean-Baptiste Michel, Alix M.B Lacoste


Incorrect drug target identification is a major obstacle in drug discovery. Only 15% of drugs advance from Phase II to approval, with ineffective targets accounting for over 50% of these failures. Advances in data fusion and in computational modeling have each made independent progress on this problem. Here, we capitalize on both approaches with Rosalind, a comprehensive gene prioritization method that combines heterogeneous knowledge graph construction with relational inference via tensor factorization to accurately predict disease-gene links. Rosalind demonstrates an increase in performance of 18% to 50% over five comparable state-of-the-art algorithms. On historical data, Rosalind prospectively identifies 1 in 4 therapeutic relationships eventually proven true. Beyond efficacy, Rosalind accurately predicts clinical trial successes (75% recall at rank 200) and distinguishes likely failures (74% recall at rank 200). Lastly, Rosalind predictions were experimentally tested in a patient-derived in vitro assay for rheumatoid arthritis (RA), which yielded 5 promising genes, one of which is unexplored in RA.
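
For readers unfamiliar with the "recall at rank 200" figures quoted above, the following is a minimal sketch of how recall at rank k is commonly computed for a ranked gene list. It assumes the standard definition (the fraction of known true genes that appear among the top k predictions). The evaluation protocol actually used for Rosalind is not described on this page, and the function name `recall_at_k` and the gene labels are hypothetical placeholders.

```python
from typing import Sequence, Set


def recall_at_k(ranked_genes: Sequence[str], true_genes: Set[str], k: int) -> float:
    """Fraction of known true genes found in the top-k ranked predictions.

    Assumes the standard definition of recall at rank k; not taken from
    the Rosalind paper itself.
    """
    if not true_genes:
        raise ValueError("true_genes must be non-empty")
    top_k = set(ranked_genes[:k])
    return len(top_k & true_genes) / len(true_genes)


# Hypothetical toy ranking for one disease (placeholder gene labels).
ranked = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
# Hypothetical set of genes later confirmed as true targets.
known = {"GENE_B", "GENE_E", "GENE_Z"}

recall_top3 = recall_at_k(ranked, known, k=3)
recall_top5 = recall_at_k(ranked, known, k=5)
```

In this toy case only one of the three known genes falls in the top 3, and two fall in the top 5. A gene that never appears in the ranking (here `GENE_Z`) still counts in the denominator, which is why recall can stay below 100% however large k is.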
