Causal Inference
Baseball-QEs
Methods and analysis for the impact of rule changes on MLB and its players, using quasi-experimental methods and panel data.
Faculty: Lee Kennedy-Shaffer, PhD
Download: GitHub / Baseball-QEs Package
Platform: R, R Shiny
Reference: doi.org (Baseball-QEs)
BCOPS
We consider the multi-class classification problem when the training data and the out-of-sample test data may have different distributions and propose a method called BCOPS (balanced and conformal optimized prediction sets). BCOPS constructs a prediction set C(x) as a subset of class labels, possibly empty. It tries to optimize the out-of-sample performance, aiming to include the correct class as often as possible, but also detecting outliers x, for which the method returns no prediction (corresponding to C(x) equal to the empty set). The proposed method combines supervised-learning algorithms with the method of conformal prediction to minimize a misclassification loss averaged over the out-of-sample distribution. The constructed prediction sets have a finite-sample coverage guarantee without distributional assumptions. We also propose a method to estimate the outlier detection rate of a given method. We prove asymptotic consistency and optimality of our proposals under suitable assumptions and illustrate our methods on real data examples.
Faculty: Leying Guan, PhD
Download: GitHub / BCOPS package
Platform: R
Reference: doi.org (BCOPS)
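The prediction-set idea in the BCOPS entry can be illustrated with a minimal class-wise split conformal sketch. This is not the BCOPS package or its algorithm: the nearest-class-mean conformity score, the function names, and the toy data are all assumptions made for illustration, and the sketch omits the optimization over the out-of-sample distribution that BCOPS performs.

```python
import numpy as np

def fit_class_means(X, y):
    """Return {label: mean vector} estimated on the training split."""
    return {k: X[y == k].mean(axis=0) for k in np.unique(y).tolist()}

def score(x, mean):
    """Hypothetical conformity score: larger means x looks more typical of the class."""
    return -float(np.sum((x - mean) ** 2))

def calibrate(X_cal, y_cal, means):
    """Per-class sorted conformity scores from a held-out calibration split."""
    return {k: np.sort([score(x, means[k]) for x in X_cal[y_cal == k]])
            for k in means}

def prediction_set(x, means, cal_scores, alpha=0.05):
    """Labels whose conformal p-value exceeds alpha; the set may be empty."""
    labels = []
    for k, mean in means.items():
        s = score(x, mean)
        n = len(cal_scores[k])
        # p-value: share of calibration scores no larger than the test score
        p = (np.sum(cal_scores[k] <= s) + 1) / (n + 1)
        if p > alpha:
            labels.append(k)
    return labels

# Toy data (assumed): two Gaussian classes centred at (0, 0) and (4, 4)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
               rng.normal(4.0, 1.0, size=(200, 2))])
y = np.repeat([0, 1], 200)
idx = rng.permutation(len(y))
train, cal = idx[:200], idx[200:]

means = fit_class_means(X[train], y[train])
cal_scores = calibrate(X[cal], y[cal], means)

typical = prediction_set(np.array([0.0, 0.0]), means, cal_scores)
outlier = prediction_set(np.array([20.0, -20.0]), means, cal_scores)
print(typical, outlier)
```

On this toy data, a point at the centre of the first cluster should receive a set containing only class 0, while a point far from both clusters should receive the empty set, which corresponds to flagging it as an outlier.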
cdcsis
Conditional distance correlation <doi:10.1080/01621459.2014.993081> is a novel measure of the conditional dependence between two multivariate random variables given a confounding variable. This package computes the conditional distance correlation, performs the conditional distance correlation sure independence screening procedure for ultrahigh-dimensional data, and conducts the conditional distance covariance test of the conditional independence of two multivariate variables.
Faculty: Heping Zhang, PhD
Download: Cran R / cdcsis package
Platform: R
Reference: doi.org (cdcsis)
CrossNetworks
Network-based analysis for cross-platform communications.
Faculty: Shuangge Steven Ma, PhD
Download: CrossNetworks package
Platform: R
credence-v2
"Credence uses state-of-the-art deep generative models such as variational auto-encoders (VAEs) to approximate the universe of complex datasets. These generative models are trained and validated on a collection of observed data sets. Credence uses these trained deep generative models to generate data that has analogous complexity to the observed data. Credence’s procedure enables users to have perfect knowledge about ground truth treatment effects of the intervention in the generated data. This allows the users to evaluate their method in a principled fashion without compromising on the complexity or the realness of the data they are evaluating the method on.
Credence learns a generative model by anchoring the level of endogeneity or treatment effect or anchoring both simultaneously. Anchoring the treatment effect and/or endogeneity is analogous to constraining the search space of potential data generators. Our approach can be conceptualized as projecting the true data-generative process to a constrained space of data-generators and finding the closest data-generator that conserves the joint distribution of X,Y,Z as close as possible to that of the observed data under the constraints."
Faculty: Harsh Parikh, PhD
Download: credence-v2 package
Platform: Python
Reference: arxiv.org (credence-v2)
REML-mediation
REML-mediation is a restricted maximum likelihood (REML)-based mediation analysis framework that adjusts for genetic confounding effects.
Faculty: Hongyu Zhao, PhD
Download: REML-mediation package
Platform: R
Reference: www.nature.com (REML-mediation)
MALTS
We introduce a flexible framework that produces high-quality almost-exact matches for causal inference. Most prior work in matching uses ad-hoc distance metrics, often leading to poor quality matches, particularly when there are irrelevant covariates. In this work, we learn an interpretable distance metric for matching, which leads to substantially higher quality matches. The learned distance metric stretches the covariate space according to each covariate’s contribution to outcome prediction: this stretching means that mismatches on important covariates carry a larger penalty than mismatches on irrelevant covariates. Our ability to learn flexible distance metrics leads to matches that are interpretable and useful for the estimation of conditional average treatment effects.
Faculty: Harsh Parikh, PhD
Download: MALTS package
Platform: Python, R
Reference: jmlr.org (MALTS)
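The "stretching" described in the MALTS entry can be pictured with a small sketch of a per-covariate weighted distance. This is not the MALTS package API: the weights here are set by hand (MALTS learns them from data), and the function names and numbers are hypothetical.

```python
import numpy as np

def stretched_distance(a, b, weights):
    """Weighted Euclidean distance; a larger weight stretches that covariate."""
    return float(np.sqrt(np.sum(weights * (a - b) ** 2)))

def nearest_match(unit, pool, weights):
    """Index of the row in `pool` closest to `unit` under the weighted metric."""
    dists = [stretched_distance(unit, row, weights) for row in pool]
    return int(np.argmin(dists))

treated = np.array([1.0, 0.0])
controls = np.array([
    [1.1, 5.0],  # close on covariate 0, far on covariate 1
    [3.0, 0.1],  # far on covariate 0, close on covariate 1
])

# Ad hoc metric: both covariates count equally
equal_weights = np.array([1.0, 1.0])
# Hypothetical learned metric: covariate 0 matters, covariate 1 is irrelevant
learned_weights = np.array([10.0, 0.0])

match_equal = nearest_match(treated, controls, equal_weights)
match_learned = nearest_match(treated, controls, learned_weights)
print(match_equal, match_learned)
```

With equal weights the treated unit is matched to the second control, because the large gap on the irrelevant covariate dominates. Once that covariate is down-weighted, the match switches to the first control, which agrees on the covariate assumed to matter for the outcome.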
mediateP
Functions for calculating the point and interval estimates of the natural indirect effect (NIE), total effect (TE), and mediation proportion (MP), based on the product approach.
Faculty: Fan Li, PhD; Donna Spiegelman, ScD
Download: Cran R / mediateP package
Platform: R
Reference: doi.org (mediateP)
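For readers unfamiliar with the product approach named in the mediateP entry, the following sketch shows the point estimates in the simplest setting: a continuous mediator and a continuous outcome, both fitted by ordinary least squares. It is an illustration under those assumptions, not the mediateP interface. The simulated data, the helper `ols`, and the variable names are hypothetical, and interval estimates are not covered.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients with an intercept in position 0."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

# Simulated data (assumed): exposure A, mediator M, outcome Y
rng = np.random.default_rng(1)
n = 20000
A = rng.integers(0, 2, size=n).astype(float)
M = 0.5 * A + rng.normal(size=n)            # true exposure -> mediator effect: 0.5
Y = 0.3 * A + 0.8 * M + rng.normal(size=n)  # true direct effect 0.3, mediator effect 0.8

# Mediator model: M ~ A
a = ols(A, M)[1]
# Outcome model: Y ~ A + M
_, direct, b = ols(np.column_stack([A, M]), Y)

nie = a * b        # natural indirect effect (product of coefficients)
te = nie + direct  # total effect = indirect + direct
mp = nie / te      # mediation proportion

print(round(nie, 3), round(te, 3), round(mp, 3))
```

In this simulation the true values are NIE = 0.4, TE = 0.7, and MP of about 0.57, so the estimates should land close to those figures.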
msqm
An R package for the analysis of marginal structural quantile models. It contains inverse probability weighting, iterative conditional regression, and doubly robust estimators for marginal structural quantile models.
Faculty: Fan Li, PhD
Download: Cran R / msqm package
Platform: R
Reference: doi.org (msqm)
%par
“What % of the cases would be prevented if it were possible to eliminate one or more risk factors from a target population?” The %PAR SAS macro is designed to answer questions such as this by estimating the population attributable risk (PAR) and its 95% confidence interval. We calculate the full PAR and partial PAR, as defined below. The variance formulas implemented here apply only to cohort studies. Currently, the confidence intervals are not valid for case-control studies. Please write to us if you have a case-control study. Models with interaction terms are acceptable. Population prevalences can be considered fixed (e.g. for sensitivity analysis), estimated from the same cohort from which the relative risks were estimated, or estimated from a population survey such as NHANES.
• FULL PAR: All measured risk factors are considered eliminated. All members of the target population who are exposed switch to the lowest risk category of all measured risk factors.
• PARTIAL PAR: One or more risk factors are considered eliminated, while others are allowed to remain unchanged.
References: Bruzzi et al. (1985); Spiegelman, Hertzmark, and Wand (2006).
Faculty: Donna Spiegelman, ScD
Download: %par package
Platform: SAS
Reference: doi.org (%par)
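As a rough illustration of the full PAR point estimate described in the %par entry, the sketch below uses the standard form 1 - 1 / sum(p_s * RR_s), where p_s is the population prevalence of risk stratum s and RR_s is its relative risk against the lowest-risk stratum. It is written in Python rather than SAS, is not the %PAR macro, and does not compute the confidence interval or the partial PAR. The function name and the example numbers are hypothetical.

```python
def full_par(prevalences, relative_risks):
    """Full population attributable risk from stratum prevalences and relative risks.

    prevalences    -- population share of each risk stratum (must sum to 1)
    relative_risks -- relative risk of each stratum vs. the lowest-risk stratum
                      (the reference stratum itself has relative risk 1.0)
    """
    if len(prevalences) != len(relative_risks):
        raise ValueError("prevalences and relative_risks must have equal length")
    if abs(sum(prevalences) - 1.0) > 1e-9:
        raise ValueError("prevalences must sum to 1")
    # Average relative risk in the population as it currently stands
    mean_rr = sum(p * rr for p, rr in zip(prevalences, relative_risks))
    # Share of cases removed if everyone moved to the reference stratum
    return 1.0 - 1.0 / mean_rr

# Hypothetical example: 30% of the population exposed, relative risk 2.0
par = full_par([0.7, 0.3], [1.0, 2.0])
print(f"{par:.3f}")
```

For a single binary exposure this reduces to the familiar p(RR - 1) / (1 + p(RR - 1)), which gives 0.3 / 1.3 in the example.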
PSweight
PSweight
Faculty: Fan Li, PhD; Guangyu Tong, PhD
Download: Cran R / PSweight package
Platform: R
Reference: The R Journal (PSweight)