YSPH Biostatistics: Colin White Memorial Lecture: “Scalable Analysis of Large Biobanks and Whole Genome Sequencing Studies”

Speaker: Dr. Xihong Lin, Professor, Departments of Biostatistics and Statistics, Harvard University, and Broad Institute

Title: “Scalable Analysis of Large Biobanks and Whole Genome Sequencing Studies”

Abstract: Big data from the genome, exposome, and phenome are becoming available at a rapidly increasing rate. Examples include Whole Genome Sequencing (WGS) data, data from smartphones and wearable devices, and Electronic Health Records (EHRs). A rapidly increasing number of large-scale national and institutional biobanks have emerged worldwide. Biobanks integrate genotype, electronic health record, and lifestyle data, and are a growing trend in health science research. In this talk, I will discuss several analytic issues in the analysis of large-scale biobanks and population-based WGS studies of common and rare genetic variants and EHRs. I will discuss scalable mixed model analysis using a sparse genetic relatedness matrix to account for relatedness and population structure; estimation of the number of ancestry principal components using Bulk Eigenvalue Matching Analysis (BEMA); and geometric differences between PCA of multiple phenotypes and PCA of multiple genotypes. The discussions are illustrated using ongoing large-scale whole genome sequencing studies from the Genome Sequencing Program of the National Human Genome Research Institute and the Trans-Omics for Precision Medicine Program of the National Heart, Lung, and Blood Institute, as well as the UK Biobank and FinnGen.


