Featured Publications
Set‐based tests for genetic association in longitudinal studies
He Z, Zhang M, Lee S, Smith J, Guo X, Palmas W, Kardia S, Diez Roux A, Mukherjee B. Set‐based tests for genetic association in longitudinal studies. Biometrics 2015, 71: 606-615. PMID: 25854837, PMCID: PMC4601568, DOI: 10.1111/biom.12310.
Peer-Reviewed Original Research
Concepts: Multi-Ethnic Study of Atherosclerosis; Genome-wide association studies; Joint effect of multiple variants; Linkage disequilibrium; Association studies; Effects of multiple variants; Markers of chronic disease; Genetic variants; Set-based test; Gene-based tests; Longitudinal outcomes; Multi-Ethnic Study; Genetic association studies; Study of Atherosclerosis; Chronic diseases; Phenotypic variation; Genetic association; Observational study; Longitudinal analysis; Within-subject correlation; Multiple variants; Score type tests; Joint test; Joint effects; Marker tests
2024
Improving prediction of linear regression models by integrating external information from heterogeneous populations: James–Stein estimators
Han P, Li H, Park S, Mukherjee B, Taylor J. Improving prediction of linear regression models by integrating external information from heterogeneous populations: James–Stein estimators. Biometrics 2024, 80: ujae072. PMID: 39101548, PMCID: PMC11299067, DOI: 10.1093/biomtc/ujae072.
Peer-Reviewed Original Research
MeSH Keywords: Biometry; Computer Simulation; Data Interpretation, Statistical; Humans; Lead; Linear Models; Models, Statistical; Patella
Concepts: James-Stein estimator; Linear regression models; Individual-level data; Comprehensive simulation study; Regression models; Numerical performance; Simulation study; Shrinkage method; Coefficient estimates; Predictive mean; Reduced model; Study population heterogeneity; Internal model; Estimation; Study population; Blood lead levels; International studies; Covariates; Patella bone; Published literature; Lead levels; External studies; Summary information; Population; Subsets
2023
An inverse probability weighted regression method that accounts for right‐censoring for causal inference with multiple treatments and a binary outcome
Yu Y, Zhang M, Mukherjee B. An inverse probability weighted regression method that accounts for right‐censoring for causal inference with multiple treatments and a binary outcome. Statistics In Medicine 2023, 42: 3699-3715. PMID: 37392070, DOI: 10.1002/sim.9826.
Peer-Reviewed Original Research
MeSH Keywords: Computer Simulation; Humans; Male; Models, Statistical; Probability; Propensity Score; Prostatic Neoplasms; Regression Analysis; Treatment Outcome
Concepts: Right censoring; Weighted score function; Causal treatment effects; Average treatment effect; Asymptotic properties; Censored component; Pre-specified time window; Estimation consistency; Robustness properties; Simulation study; Binary outcomes; Presence of confounders; Censoring; Scoring function; Inverse probability; Treatment effects; Estimation; Sources of bias; Inference; Letter C; Comparative effectiveness research; Treatment switch; Regression method; Logistic regression models; Insurance claims database
2022
Methods for large‐scale single mediator hypothesis testing: Possible choices and comparisons
Du J, Zhou X, Clark‐Boucher D, Hao W, Liu Y, Smith J, Mukherjee B. Methods for large‐scale single mediator hypothesis testing: Possible choices and comparisons. Genetic Epidemiology 2022, 47: 167-184. PMID: 36465006, PMCID: PMC10329872, DOI: 10.1002/gepi.22510.
Peer-Reviewed Original Research
Concepts: Null hypothesis; Test statistics; Mediation hypothesis testing; Composite null hypothesis; Hypothesis testing; Classes of methods; False positive rate; Alternative hypothesis; Simulation study; Hypothesis testing method; Continuous mediator; Reference distribution; Sobel test statistics; Continuous outcomes; Exposure-mediator interaction; Multi-Ethnic Study of Atherosclerosis; DNA methylation sites; Class; CRAN; Methylation sites
Incorporating family disease history and controlling case–control imbalance for population-based genetic association studies
Zhuang Y, Wolford B, Nam K, Bi W, Zhou W, Willer C, Mukherjee B, Lee S. Incorporating family disease history and controlling case–control imbalance for population-based genetic association studies. Bioinformatics 2022, 38: 4337-4343. PMID: 35876838, PMCID: PMC9477535, DOI: 10.1093/bioinformatics/btac459.
Peer-Reviewed Original Research
MeSH Keywords: Case-Control Studies; Computer Simulation; Genome-Wide Association Study; Phenotype; Polymorphism, Single Nucleotide
Concepts: Empirical saddlepoint approximation; Family disease history; Case-control imbalance; Saddlepoint approximation; Genome-wide association analysis; Population-based genetic association studies; Genetic association tests; Variant-phenotype associations; Disease history; Genetic association studies; Low detection power; Type I error inflation; Correlation of phenotypes; White British sample; Supplementary data; Association studies; Population-based biobanks; Increased phenotypic correlations; Korean Genome; Simulation study; Phenotype distribution; Phenotype; Association Test; Bioinformatics; Phenotypic correlations
2021
A comparison of five epidemiological models for transmission of SARS-CoV-2 in India
Purkayastha S, Bhattacharyya R, Bhaduri R, Kundu R, Gu X, Salvatore M, Ray D, Mishra S, Mukherjee B. A comparison of five epidemiological models for transmission of SARS-CoV-2 in India. BMC Infectious Diseases 2021, 21: 533. PMID: 34098885, PMCID: PMC8181542, DOI: 10.1186/s12879-021-06077-9.
Peer-Reviewed Original Research
MeSH Keywords: Bayes Theorem; Communicable Disease Control; Computer Simulation; COVID-19; Forecasting; Humans; India; Models, Statistical; Pandemics
Efficient mixed model approach for large-scale genome-wide association studies of ordinal categorical phenotypes
Bi W, Zhou W, Dey R, Mukherjee B, Sampson J, Lee S. Efficient mixed model approach for large-scale genome-wide association studies of ordinal categorical phenotypes. American Journal Of Human Genetics 2021, 108: 825-839. PMID: 33836139, PMCID: PMC8206161, DOI: 10.1016/j.ajhg.2021.03.019.
Peer-Reviewed Original Research
MeSH Keywords: Biological Specimen Banks; Child; Computer Simulation; Female; Genome-Wide Association Study; Humans; Male; Models, Genetic; Phenotype; Research Design; United Kingdom
Concepts: Ordinal categorical phenotypes; Genome-wide association studies; Categorical phenotypes; Genome-wide significant variants; Rare variants; Phenotype distribution; Controlled type I error rates; Type I error rate; Mixed model approach; Array genotyping; Association studies; Common variants; Quantitative traits; Significant variants; Logistic mixed models; Lack of analysis tools; UK Biobank; Linear mixed model approach; Phenotype; Association Test; Variants; Mixed models; Significance level; MAF; Traits
A comparison of parametric propensity score‐based methods for causal inference with multiple treatments and a binary outcome
Yu Y, Zhang M, Shi X, Caram M, Little R, Mukherjee B. A comparison of parametric propensity score‐based methods for causal inference with multiple treatments and a binary outcome. Statistics In Medicine 2021, 40: 1653-1677. PMID: 33462862, DOI: 10.1002/sim.8862.
Peer-Reviewed Original Research
MeSH Keywords: Bias; Causality; Comparative Effectiveness Research; Computer Simulation; Humans; Male; Models, Statistical; Propensity Score
Concepts: Comparative effectiveness research; Estimation of causal effects; Propensity score-based methods; Binary outcomes; Insurance networks; Causal effects; Propensity score methods; Propensity-based methods; Confounding bias; Continuous outcomes; Pharmacy claims; Effectiveness research; Observational study; Simulation study; Adverse outcomes; Propensity score; Emergency room
Revisiting the genome-wide significance threshold for common variant GWAS
Chen Z, Boehnke M, Wen X, Mukherjee B. Revisiting the genome-wide significance threshold for common variant GWAS. G3: Genes, Genomes, Genetics 2021, 11: jkaa056. PMID: 33585870, PMCID: PMC8022962, DOI: 10.1093/g3journal/jkaa056.
Peer-Reviewed Original Research
Concepts: Genome-wide significance threshold; P-value threshold; GWAS meta-analyses; Meta-analysis consortium; Excessive false positive rates; Significance threshold; Gene set enrichment; Benjamini-Yekutieli procedure; Modest-sized studies; FDR-controlling procedures; Global lipids; Meta-analyses; Pathway analysis; GWAS; Replication study; P-value; Increased discovery; Multiple testing strategies; Sample size; Positive discoveries; Benjamini-Hochberg; Lipid levels; Testing strategies; Downstream work; FDR
2020
Interaction analysis under misspecification of main effects: Some common mistakes and simple solutions
Zhang M, Yu Y, Wang S, Salvatore M, Fritsche L, He Z, Mukherjee B. Interaction analysis under misspecification of main effects: Some common mistakes and simple solutions. Statistics In Medicine 2020, 39: 1675-1694. PMID: 32101638, DOI: 10.1002/sim.8505.
Peer-Reviewed Original Research
Concepts: Type I error rate; Type I error inflation; Independence assumption; Wald and score tests; Correct type I error rates; Sandwich variance estimator; Sandwich estimator; Score test; Variance estimation; Simulation study; Misspecification; Michigan Genomics Initiative; Statistical practice; Binary outcomes; Tested interactions; Empirical facts; Flexible model; Data model; Test of interaction; Biobank study; Inflation; Assumptions; Continuous outcomes; Epidemiological literature; Linear regression models
2019
Estimating Outcome-Exposure Associations when Exposure Biomarker Detection Limits vary Across Batches.
Boss J, Mukherjee B, Ferguson K, Aker A, Alshawabkeh A, Cordero J, Meeker J, Kim S. Estimating Outcome-Exposure Associations when Exposure Biomarker Detection Limits vary Across Batches. Epidemiology 2019, 30: 746-755. PMID: 31299670, PMCID: PMC6677587, DOI: 10.1097/ede.0000000000001052.
Peer-Reviewed Original Research
Concepts: Binary outcome data; Likelihood-based methods; Complete-case analysis; Distributional assumptions; Assignment of samples; Superior estimation properties; Simulation study; Complete-case; Multiple imputation strategy; Exposure data; Multiple batches; Batch assignment; Estimated properties; Limit-variables; Single imputation; Multiple imputation; Cohort study
2018
Selection of nonlinear interactions by a forward stepwise algorithm: Application to identifying environmental chemical mixtures affecting health outcomes
Narisetty N, Mukherjee B, Chen Y, Gonzalez R, Meeker J. Selection of nonlinear interactions by a forward stepwise algorithm: Application to identifying environmental chemical mixtures affecting health outcomes. Statistics In Medicine 2018, 38: 1582-1600. PMID: 30586682, PMCID: PMC7134269, DOI: 10.1002/sim.8059.
Peer-Reviewed Original Research
MeSH Keywords: Algorithms; Computer Simulation; Environmental Exposure; Environmental Pollutants; Hazardous Substances; Humans; Nonlinear Dynamics
Subset-Based Analysis Using Gene-Environment Interactions for Discovery of Genetic Associations across Multiple Studies or Phenotypes
Yu Y, Xia L, Lee S, Zhou X, Stringham H, Boehnke M, Mukherjee B. Subset-Based Analysis Using Gene-Environment Interactions for Discovery of Genetic Associations across Multiple Studies or Phenotypes. Human Heredity 2018, 83: 283-314. PMID: 31132756, PMCID: PMC7034441, DOI: 10.1159/000496867.
Peer-Reviewed Original Research
MeSH Keywords: Case-Control Studies; Cholesterol; Cohort Studies; Computer Simulation; C-Reactive Protein; Finland; Gene Frequency; Gene-Environment Interaction; Genetic Predisposition to Disease; Genome-Wide Association Study; Humans; Lipoproteins, LDL; Meta-Analysis as Topic; Models, Genetic; Phenotype; Polymorphism, Single Nucleotide
Concepts: Presence of G-E interactions; Genetic association; Heterogeneity of genetic effects; Discovery of genetic associations; Gene-environment (G-E); Marginal genetic effects; G-E interactions; Genome-wide association studies; Gene-environment interactions; Genetic effects; Data examples; Simulation study; Single nucleotide polymorphisms; Gene-environment; Association studies; Association analysis; Screening tool; Marginal association; Nucleotide polymorphisms; Presence of heterogeneity; Association; Environmental factors; Increased power; Multiple studies; G-E
2017
Robust distributed lag models using data adaptive shrinkage
Chen Y, Mukherjee B, Adar S, Berrocal V, Coull B. Robust distributed lag models using data adaptive shrinkage. Biostatistics 2017, 19: 461-478. PMID: 29040386, PMCID: PMC6454578, DOI: 10.1093/biostatistics/kxx041.
Peer-Reviewed Original Research
MeSH Keywords: Air Pollution; Bayes Theorem; Biostatistics; Computer Simulation; Environmental Exposure; Epidemiology; Health Surveys; Humans; Models, Statistical
Concepts: Distributed lag models; Distributed Lag; Lag model; Time series data; Effects of air pollution; Bias-variance trade-off; Generalized ridge regression; Shrinkage method; Air pollution studies; Hierarchical Bayes approach; Shrinkage approach; Time series; Dl function; Air pollution; Pollution studies; Effect estimates; Trade-offs; Extensive simulation study; Dependent variable; Shrinking coefficients; Mean square error; Lag; Simulation study; Bayes approach; Ridge regression
Meta‐analysis of gene‐environment interaction exploiting gene‐environment independence across multiple case‐control studies
Estes J, Rice J, Li S, Stringham H, Boehnke M, Mukherjee B. Meta‐analysis of gene‐environment interaction exploiting gene‐environment independence across multiple case‐control studies. Statistics In Medicine 2017, 36: 3895-3909. PMID: 28744888, PMCID: PMC5624850, DOI: 10.1002/sim.7398.
Peer-Reviewed Original Research
MeSH Keywords: Age Factors; Alpha-Ketoglutarate-Dependent Dioxygenase FTO; Bayes Theorem; Bias; Biometry; Body Mass Index; Case-Control Studies; Computer Simulation; Diabetes Mellitus, Type 2; Gene-Environment Interaction; Humans; Logistic Models; Meta-Analysis as Topic; Models, Genetic; Models, Statistical; Polymorphism, Single Nucleotide; Retrospective Studies
Concepts: Gene-environment independence; Gene-environment; Empirical Bayes estimators; Gene-environment interactions; Case-control study; Meta-analysis setting; Bayes estimators; Retrospective likelihood framework; Shrinkage estimators; Meta-analysis; Testing gene-environment interactions; Combination of estimates; Factors body mass index; Simulation study; Body mass index; Unconstrained model; Likelihood framework; Inverse variance; Meta-analysis framework; FTO gene; Mass index; Genetic markers; Estimation; Standard alternative; Chatterjee
Robust Tests for Additive Gene-Environment Interaction in Case-Control Studies Using Gene-Environment Independence
Liu G, Mukherjee B, Lee S, Lee AW, Wu AH, Bandera EV, Jensen A, Rossing MA, Moysich KB, Chang-Claude J, Doherty JA, Gentry-Maharaj A, Kiemeney L, Gayther SA, Modugno F, Massuger L, Goode EL, Fridley BL, Terry KL, Cramer DW, Ramus SJ, Anton-Culver H, Ziogas A, Tyrer JP, Schildkraut JM, Kjaer SK, Webb PM, Ness RB, Menon U, Berchuck A, Pharoah PD, Risch H, Pearce CL, Consortium F. Robust Tests for Additive Gene-Environment Interaction in Case-Control Studies Using Gene-Environment Independence. American Journal Of Epidemiology 2017, 187: 366-377. PMID: 28633381, PMCID: PMC5860584, DOI: 10.1093/aje/kwx243.
Peer-Reviewed Original Research
Exposure enriched outcome dependent designs for longitudinal studies of gene–environment interaction
Sun Z, Mukherjee B, Estes J, Vokonas P, Park S. Exposure enriched outcome dependent designs for longitudinal studies of gene–environment interaction. Statistics In Medicine 2017, 36: 2947-2960. PMID: 28497531, PMCID: PMC5523112, DOI: 10.1002/sim.7332.
Peer-Reviewed Original Research
Concepts: Longitudinal cohort study; Cohort study; Case-only design; Longitudinal study; G x E interaction; Normative Aging Study; Complete-case analysis; Gene-environment; Sampling design; Case-control; Veterans Administration; Complex human diseases; E interaction; Exposure information; Aging Study; Outcome trajectories; Stratified sampling; Retrospective genotyping; Individual exposure; Covariate data; Exposure effects; Joint effects; Outcomes; Time-varying outcome; Environmental factors
2016
Tests for Gene-Environment Interactions and Joint Effects With Exposure Misclassification
Boonstra P, Mukherjee B, Gruber S, Ahn J, Schmit S, Chatterjee N. Tests for Gene-Environment Interactions and Joint Effects With Exposure Misclassification. American Journal Of Epidemiology 2016, 183: 237-247. PMID: 26755675, PMCID: PMC4724093, DOI: 10.1093/aje/kwv198.
Peer-Reviewed Original Research
Concepts: G-E interactions; Presence of exposure misclassification; Exposure misclassification; Impact of exposure misclassification; Gene-environment (G-E); Gene-environment interactions; Genome-wide level; Genome-wide search; Genome-wide testing; Genetic susceptibility loci; Joint test; Disease-gene relationships; Gene-environment; Genetic risk factors; Type I error rate; Family-wise type I error rate; Susceptibility loci; G-E; Genetic association; Risk factors; Statistical power; Joint effects; Simulation study; Misclassification; Published simulation studies
2014
Latent variable models for gene–environment interactions in longitudinal studies with multiple correlated exposures
Tao Y, Sánchez B, Mukherjee B. Latent variable models for gene–environment interactions in longitudinal studies with multiple correlated exposures. Statistics In Medicine 2014, 34: 1227-1241. PMID: 25545894, PMCID: PMC4355187, DOI: 10.1002/sim.6401.
Peer-Reviewed Original Research
MeSH Keywords: Biostatistics; Child, Preschool; Computer Simulation; Environmental Exposure; Female; Gene-Environment Interaction; Hemochromatosis Protein; Histocompatibility Antigens Class I; Humans; Infant; Infant, Newborn; Lead Poisoning; Longitudinal Studies; Membrane Proteins; Mexico; Models, Genetic; Models, Statistical; Polymorphism, Single Nucleotide; Pregnancy; Prenatal Exposure Delayed Effects
Concepts: Gene-environment interactions; Outcome measures; Cohort study; Health effects of environmental exposures; Environmental exposures; Investigate health effects; Gene-environment associations; Effects of environmental exposures; Early life exposures; LV framework; G x E effects; Multivariate exposures; Genotyped single nucleotide polymorphisms; Effect modification; Shrinkage estimators; Life exposure; Exposure measurements; Single nucleotide polymorphisms; Data-adaptive way; Multiple testing; Outcome data; Longitudinal study; Longitudinal nature; Genetic factors; Nucleotide polymorphisms
Testing departure from additivity in Tukey's model using shrinkage: application to a longitudinal setting
Ko Y, Mukherjee B, Smith J, Park S, Kardia S, Allison M, Vokonas P, Chen J, Diez‐Roux A. Testing departure from additivity in Tukey's model using shrinkage: application to a longitudinal setting. Statistics In Medicine 2014, 33: 5177-5191. PMID: 25112650, PMCID: PMC4227925, DOI: 10.1002/sim.6281.
Peer-Reviewed Original Research
MeSH Keywords: Aged; Aged, 80 and over; Aging; Atherosclerosis; Bone and Bones; Computer Simulation; Environmental Exposure; Ethnicity; Female; Gene-Environment Interaction; Humans; Iron; Lead; Least-Squares Analysis; Likelihood Functions; Longitudinal Studies; Male; Middle Aged; Models, Genetic; United States; United States Department of Veterans Affairs
Concepts: Gene-environment interactions; Multi-Ethnic Study of Atherosclerosis; Model of gene-environment interaction; Multi-Ethnic Study; Tukey's model; Longitudinal setting; Study of Atherosclerosis; Normative Aging Study; Case-control study; Increasing categories; Aging Study; Tested interactions; Longitudinal study; Categorical variables; Robust to misspecification; Interaction terms; Test departures; Shrinkage estimators; Wald test; Interaction estimates; Increased power; One-degree-of-freedom model; Interaction effects; Sets; Environmental markers