09 Oct 2023


Isak Falk, Millie Zhao, Juba Nait Saada, Qi Guo

Compared with Genome-Wide Association Studies (GWAS) of common variants, single-marker association analysis of rare variants is underpowered. Set-based association analyses for rare variants are powerful tools that capture some of the missing heritability in trait association studies. We extend the convex-optimized SKAT (cSKAT) test set procedure, which learns the optimal convex combination of kernels from data, to the full Generalised Linear Model (GLM) setting with arbitrary non-genetic covariates. We call this extension extended cSKAT (ecSKAT) and show that the resulting optimization problem is a quadratic programming problem that can be solved at no additional cost compared to cSKAT. ecSKAT enables correction for important confounders in association studies, such as age, sex or population structure, for both quantitative and binary traits. We show that a modified objective upper-bounds the p-value through a decreasing exponential term in the objective, indicating that optimizing this objective is a principled way of learning the combination of kernels. We evaluate the performance of the proposed method on continuous and binary traits using simulation studies, and illustrate its application with rare variant association analyses of hand grip strength and systemic lupus erythematosus in UK Biobank Whole Exome Sequencing (WES) data.
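To make the ingredients named in the abstract concrete, here is a minimal Python sketch of a SKAT-style quadratic-form statistic computed on covariate-adjusted residuals with a convex combination of kernels. It is not the ecSKAT procedure: the weights are fixed by hand rather than learned by the quadratic program, only a quantitative trait with a linear null model is covered, and no p-value is computed. The function names, the simulated data, and the choice of a linear and a quadratic kernel are all assumptions made for illustration.

```python
import numpy as np


def null_residuals(y, X):
    """Residuals of y after least-squares regression on covariates X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return y - X1 @ beta


def combined_kernel(kernels, weights):
    """Convex combination of kernel matrices: weights must be >= 0 and sum to 1."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * K for w, K in zip(weights, kernels))


def quadratic_form_statistic(residuals, K):
    """SKAT-style statistic Q = r^T K r for residuals r and kernel K."""
    return float(residuals @ K @ residuals)


# Simulated data (hypothetical): n samples, p rare variants, 2 covariates.
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, 2))                           # stand-ins for e.g. age, sex
G = rng.binomial(2, 0.02, size=(n, p)).astype(float)  # genotype dosages 0/1/2
y = X @ np.array([0.5, -0.3]) + rng.normal(size=n)    # trait under the null

r = null_residuals(y, X)

K_linear = G @ G.T                # linear kernel
K_quad = (1.0 + G @ G.T) ** 2     # quadratic kernel (elementwise square)

weights = [0.7, 0.3]              # fixed by hand here; cSKAT/ecSKAT learn these
K = combined_kernel([K_linear, K_quad], weights)
Q = quadratic_form_statistic(r, K)
print(f"Q = {Q:.3f}")
```

Because Q is linear in the kernel, the statistic for the combined kernel equals the same convex combination of the per-kernel statistics, which is one reason a search over the weights stays tractable.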
