23 Jul 2022

ICML 2022

Authors: Adam Foster, Arpi Vezer, Craig A. Glastonbury, Paidi Creed, Sam Abujudeh, Aaron Sim

Abstract

Learning meaningful representations of data that can address challenges such as batch effect correction and counterfactual inference is a central problem in many domains including computational biology. Adopting a Conditional VAE framework, we show that marginal independence between the representation and a condition variable plays a key role in both of these challenges. We propose the Contrastive Mixture of Posteriors (CoMP) method that uses a novel misalignment penalty defined in terms of mixtures of the variational posteriors to enforce this independence in latent space. We show that CoMP has attractive theoretical properties compared to previous approaches, and we prove counterfactual identifiability of CoMP under additional assumptions. We demonstrate state-of-the-art performance on a set of challenging tasks including aligning human tumour samples with cancer cell-lines, predicting transcriptome-level perturbation responses, and batch correction on single-cell RNA sequencing data. We also find parallels to fair representation learning and demonstrate that CoMP is competitive on a common task in the field.
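To make the idea of a mixture-of-posteriors misalignment penalty concrete, here is a minimal NumPy sketch of one plausible form such a penalty could take. This is an illustration only, not the paper's exact objective: it assumes diagonal-Gaussian variational posteriors and contrasts, for each latent sample, the mixture density built from same-condition posterior components against the mixture built from other-condition components. All function names and the specific log-ratio form are hypothetical.

```python
import numpy as np

def log_gauss(z, mu, var):
    # log N(z; mu, diag(var)), summed over latent dimensions
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (z - mu) ** 2 / var, axis=-1)

def comp_style_penalty(z, mu, var, cond):
    """Illustrative misalignment penalty (hypothetical form, not the paper's).

    z:    (n, d) latent samples, one per data point
    mu:   (n, d) posterior means
    var:  (n, d) posterior variances (diagonal covariance assumed)
    cond: (n,)   integer condition labels

    The penalty is positive when a latent sample is denser under the
    mixture of posteriors from its own condition than under the mixture
    from other conditions, i.e. when conditions remain separated in
    latent space.
    """
    n = len(z)
    # pairwise component log densities: log q(z_i | x_j) for all i, j
    logp = log_gauss(z[:, None, :], mu[None, :, :], var[None, :, :])  # (n, n)
    eye = np.eye(n, dtype=bool)
    same = (cond[:, None] == cond[None, :]) & ~eye   # same condition, not self
    other = cond[:, None] != cond[None, :]           # different condition

    def logmeanexp(a, mask):
        # numerically stable log of the mean of exp(a) over masked entries
        a = np.where(mask, a, -np.inf)
        m = a.max(axis=1, keepdims=True)
        s = np.exp(a - m).sum(axis=1, keepdims=True)
        return (m + np.log(s / mask.sum(axis=1, keepdims=True))).ravel()

    return np.mean(logmeanexp(logp, same) - logmeanexp(logp, other))
```

If the per-condition posterior mixtures coincide (the marginal-independence goal the abstract describes), this penalty is zero; when conditions occupy separate regions of latent space, it is large and positive, so minimising it pushes the representation toward independence from the condition variable.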

