Featured Publications
SDPRX: A statistical method for cross-population prediction of complex traits
Zhou G, Chen T, Zhao H. SDPRX: A statistical method for cross-population prediction of complex traits. American Journal Of Human Genetics 2022, 110: 13-22. PMID: 36460009, PMCID: PMC9892700, DOI: 10.1016/j.ajhg.2022.11.007. Peer-Reviewed Original Research
Characterizing Spatiotemporal Transcriptome of the Human Brain Via Low-Rank Tensor Decomposition
Liu T, Yuan M, Zhao H. Characterizing Spatiotemporal Transcriptome of the Human Brain Via Low-Rank Tensor Decomposition. Statistics In Biosciences 2022, 14: 485-513. DOI: 10.1007/s12561-021-09331-5. Peer-Reviewed Original Research
Statistical Methods for Analyzing Tree-Structured Microbiome Data
Wang T, Zhao H. Statistical Methods for Analyzing Tree-Structured Microbiome Data. Frontiers In Probability And The Statistical Sciences 2021, 193-220. DOI: 10.1007/978-3-030-73351-3_8. Peer-Reviewed Original Research
2024
Statistical methods for assessing the effects of de novo variants on birth defects
Xie Y, Wu R, Li H, Dong W, Zhou G, Zhao H. Statistical methods for assessing the effects of de novo variants on birth defects. Human Genomics 2024, 18: 25. PMID: 38486307, PMCID: PMC10938830, DOI: 10.1186/s40246-024-00590-z. Peer-Reviewed Original Research
2020
A Set of Efficient Methods to Generate High-Dimensional Binary Data With Specified Correlation Structures
Jiang W, Song S, Hou L, Zhao H. A Set of Efficient Methods to Generate High-Dimensional Binary Data With Specified Correlation Structures. The American Statistician 2020, 75: 310-322. DOI: 10.1080/00031305.2020.1816213. Peer-Reviewed Original Research
Statistical Methods in Genome-Wide Association Studies
Sun N, Zhao H. Statistical Methods in Genome-Wide Association Studies. Annual Review Of Biomedical Data Science 2020, 3: 1-24. DOI: 10.1146/annurev-biodatasci-030320-041026. Peer-Reviewed Original Research
2019
Sparse principal component analysis with missing observations
Park S, Zhao H. Sparse principal component analysis with missing observations. The Annals Of Applied Statistics 2019, 13: 1016-1042. DOI: 10.1214/18-aoas1220. Peer-Reviewed Original Research
2015
Introduction to statistical methods in genome-wide association studies
Yang C, Li C, Chung D, Chen M, Gelernter J, Zhao H. Introduction to statistical methods in genome-wide association studies. 2015, 26-52. DOI: 10.1017/cbo9781107337459.005. Peer-Reviewed Original Research
2014
Statistical Methods for the Analysis of Next Generation Sequencing Data from Paired Tumor-Normal Samples
Chen M, Hou L, Zhao H. Statistical Methods for the Analysis of Next Generation Sequencing Data from Paired Tumor-Normal Samples. Frontiers In Probability And The Statistical Sciences 2014, 379-404. DOI: 10.1007/978-3-319-07212-8_19. Peer-Reviewed Original Research
2013
Application of Bayesian Sparse Factor Analysis Models in Bioinformatics
Ma H, Zhao H. Application of Bayesian Sparse Factor Analysis Models in Bioinformatics. 2013, 350-365. DOI: 10.1017/cbo9781139226448.018. Peer-Reviewed Original Research
2006
Statistical Methods in Proteomics
Yu W, Wu B, Huang T, Li X, Williams K, Zhao H. Statistical Methods in Proteomics. Springer Handbooks 2006, 623-638. DOI: 10.1007/978-1-84628-288-1_34. Peer-Reviewed Original Research
2001
Multipoint Genetic Mapping with Trisomy Data
Li J, Sherman S, Lamb N, Zhao H. Multipoint Genetic Mapping with Trisomy Data. American Journal Of Human Genetics 2001, 69: 1255-1265. PMID: 11704925, PMCID: PMC1235537, DOI: 10.1086/324578. Peer-Reviewed Original Research
Test of Association for Quantitative Traits in General Pedigrees: The Quantitative Pedigree Disequilibrium Test
Zhang S, Zhang K, Li J, Sun F, Zhao H. Test of Association for Quantitative Traits in General Pedigrees: The Quantitative Pedigree Disequilibrium Test. Genetic Epidemiology 2001, 21: s370-s375. PMID: 11793701, DOI: 10.1002/gepi.2001.21.s1.s370. Peer-Reviewed Original Research
2000
Transmission/Disequilibrium Tests Using Multiple Tightly Linked Markers
Zhao H, Zhang S, Merikangas K, Trixler M, Wildenauer D, Sun F, Kidd K. Transmission/Disequilibrium Tests Using Multiple Tightly Linked Markers. American Journal Of Human Genetics 2000, 67: 936-946. PMID: 10968775, PMCID: PMC1287895, DOI: 10.1086/303073. Peer-Reviewed Original Research
Multipoint Genetic Mapping with Uniparental Disomy Data
Zhao H, Li J, Robinson W. Multipoint Genetic Mapping with Uniparental Disomy Data. American Journal Of Human Genetics 2000, 67: 851-861. PMID: 10958760, PMCID: PMC1287890, DOI: 10.1086/303072. Peer-Reviewed Original Research