Seminar Series: Dr. Osvaldo Espin-Garcia
Handling Uncertain Auxiliary Covariates in Two-Phase Study Design and Analysis
Osvaldo Espin-Garcia
Assistant Professor
Department of Epidemiology and Biostatistics
Western University
Assistant Professor
Dalla Lana School of Public Health,
Department of Statistical Sciences
University of Toronto
Principal Biostatistician
Department of Biostatistics
University Health Network
Short Biography:
Dr. Osvaldo Espin-Garcia is an Assistant Professor in the Department of Epidemiology and Biostatistics, University of Western Ontario (UWO). He holds a PhD in Biostatistics from the Dalla Lana School of Public Health, University of Toronto (UofT), an MMath in Statistics-Biostatistics from the University of Waterloo, and a BSc in Actuarial Sciences from the National Autonomous University of Mexico (UNAM). Prior to joining UWO, Dr. Espin-Garcia worked as Principal Biostatistician at the University Health Network. His research program focuses on developing statistical, machine learning, and computational methods for statistical genetics, genetic epidemiology, and deep phenotyping of complex traits such as cancer, Crohn's disease, or osteoarthritis. Dr. Espin-Garcia currently serves as a scientific member of the Ontario Cancer Research Ethics Board (OCREB) and is part of three training initiatives: first, the CANSSI Ontario Strategic Training for Advanced Genetic Epidemiology (STAGE); second, the Health Data Working Group at UofT; and third, the Collaborative Specialization in Machine Learning in Health and Biomedical Sciences at UWO.
Abstract:
The two-phase study is a cost-effective way to collect and analyze expensive predictor data by accruing data in two phases. In the first phase, observed outcomes and/or inexpensive (auxiliary) covariates for all subjects are used to identify a subset of informative subjects for expensive predictor measurement. In the second phase, all available data are analyzed by leveraging missing data methods.
The existing literature implicitly assumes that auxiliary covariates are certain, i.e., known and well-characterized (e.g., a single auxiliary covariate is assumed to relate linearly to the outcome and/or the expensive predictor), while uncertain auxiliary covariates have received little to no attention. Here, I present two approaches that challenge this assumption.
The first one, motivated by post-genome-wide association studies (post-GWAS) and polygenic risk score (PRS) construction, consists of integrating multiple PRS methods for two-phase re-sequencing study design. The proposal solves a convex combination problem that aims to identify the PRS combination minimizing the mean squared error. In non-edge cases, the resulting combination has the same residuals as a linear regression model with all PRS as covariates, i.e., it yields a residual-dependent sampling (RDS) design. The main advantage of the convex optimization approach is that the resulting PRS combination can be stratified to serve as a sole auxiliary covariate in maximum likelihood methods, whereas it remains unclear how to stratify in the model with all PRS as covariates. The optimization method is evaluated against alternative RDS designs using a single PRS method or both PRS methods via simulations and real data.
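As a rough illustration of the convex combination idea, the sketch below handles only the special case of two PRS and no intercept, where the MSE-minimizing weight has a closed form that is then clipped to [0, 1]. The toy data, the function names `convex_weight` and `rds_select`, and the rule of sampling both residual tails are assumptions made for illustration; they are not taken from the talk.

```python
def convex_weight(y, prs_a, prs_b):
    """Weight w in [0, 1] minimizing the MSE of w*prs_a + (1-w)*prs_b against y.

    Assumption: two scores only, no intercept. The unconstrained minimizer
    is projected onto [0, 1]; hitting 0 or 1 corresponds to an edge case.
    """
    diff = [a - b for a, b in zip(prs_a, prs_b)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        return 0.5  # identical scores: every weight gives the same fit
    w = sum((yi - b) * d for yi, b, d in zip(y, prs_b, diff)) / denom
    return min(1.0, max(0.0, w))


def rds_select(y, score, n_each_tail):
    """Indices of the subjects with the most extreme residuals y - score."""
    resid = [yi - si for yi, si in zip(y, score)]
    order = sorted(range(len(y)), key=lambda i: resid[i])
    return sorted(order[:n_each_tail] + order[-n_each_tail:])


# Hypothetical phase one data: outcome and two PRS for six subjects.
y = [0.5, 1.2, 2.4, 2.9, 4.1, 5.3]
prs_a = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
prs_b = [1.0, 1.0, 3.0, 2.0, 5.0, 4.0]

w = convex_weight(y, prs_a, prs_b)
combined = [w * a + (1.0 - w) * b for a, b in zip(prs_a, prs_b)]
selected = rds_select(y, combined, n_each_tail=1)
print("weight on prs_a:", round(w, 4))
print("subjects selected for phase two:", selected)
```

Because the combined score is a single variable, it could in principle be cut into strata, which is the property the abstract highlights; the stratification itself is not shown here.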
The second one, motivated by the potential of leveraging high-dimensional auxiliary covariates (HACs) in multi-omic studies, evaluates dimension reduction techniques applied to phase one HACs to identify informative subsamples for phase two analysis using RDS. We focus on four techniques: Principal Component Analysis, Uniform Manifold Approximation and Projection, Adaptive Lasso, and Meta-Visualization, to transform HACs into a univariable predictor with essential signals preserved. Altogether, this work aims to shed light on identifying appropriate dimension reduction approaches for handling HACs in two-phase studies. Comprehensive simulations and a real data application are presented.
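To make the pipeline in this paragraph concrete, the sketch below uses only the first of the four techniques (Principal Component Analysis) to reduce simulated HACs to one score, regresses the outcome on that score, and picks the subjects with the most extreme residuals. The simulated data, the power-iteration PCA, the helper names, and the tail-based selection rule are all assumptions for illustration, not the speaker's implementation.

```python
import random


def first_pc_scores(X, n_iter=200, seed=0):
    """Scores on the first principal component, computed by power iteration."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]  # centered data
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]  # random start direction
    start_norm = sum(x * x for x in v) ** 0.5
    v = [x / start_norm for x in v]
    for _ in range(n_iter):
        s = [sum(r[j] * v[j] for j in range(p)) for r in Xc]            # Xc v
        u = [sum(Xc[i][j] * s[i] for i in range(n)) for j in range(p)]  # Xc' Xc v
        norm = sum(x * x for x in u) ** 0.5
        if norm == 0.0:
            break  # no variation left to extract
        v = [x / norm for x in u]
    return [sum(r[j] * v[j] for j in range(p)) for r in Xc]


def ols_residuals(y, x):
    """Residuals from a simple linear regression of y on x (with intercept)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx > 0.0 else 0.0
    return [(yi - my) - slope * (xi - mx) for xi, yi in zip(x, y)]


def extreme_residuals(resid, n_each_tail):
    """Indices of the n_each_tail smallest and largest residuals."""
    order = sorted(range(len(resid)), key=lambda i: resid[i])
    return sorted(order[:n_each_tail] + order[-n_each_tail:])


# Simulated phase one data: 60 subjects, 8 HACs driven by one latent factor.
rng = random.Random(1)
n, p = 60, 8
z = [rng.gauss(0.0, 1.0) for _ in range(n)]
X = [[z[i] + 0.5 * rng.gauss(0.0, 1.0) for _ in range(p)] for i in range(n)]
y = [z[i] + 0.5 * rng.gauss(0.0, 1.0) for i in range(n)]

scores = first_pc_scores(X)          # univariable summary of the HACs
resid = ols_residuals(y, scores)     # phase one residuals
selected = extreme_residuals(resid, n_each_tail=5)
print("subjects selected for phase two:", selected)
```

The sign of a principal component is arbitrary, but the regression residuals do not depend on it, so the selected subsample is unaffected.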
Area of Research:
Biostatistics, Statistical genetics and genomics, Cost-efficient Study design, Longitudinal data, Latent variable modelling, Mixture modelling, Missing data
Date: Friday, September 12
Time: 1:30 pm - 2:30 pm
Location: PHFM 3015 (Western Centre for Public Health and Family Medicine) or Zoom (request the link by email at epibio@uwo.ca)