Causal Inference

Baseball-QEs

Methods and analyses for estimating the impact of rule changes on MLB and its players, using quasi-experimental methods and panel data.

Faculty: Lee Kennedy-Shaffer, PhD

Download: GitHub / Baseball-QEs Package

Platform: R, R Shiny

Reference: doi.org (Baseball-QEs)

BCOPS

We consider the multi-class classification problem when the training data and the out-of-sample test data may have different distributions and propose a method called BCOPS (balanced and conformal optimized prediction sets). BCOPS constructs a prediction set C(x) as a subset of class labels, possibly empty. It tries to optimize the out-of-sample performance, aiming to include the correct class as often as possible, but also detecting outliers x, for which the method returns no prediction (corresponding to C(x) equal to the empty set). The proposed method combines supervised-learning algorithms with the method of conformal prediction to minimize a misclassification loss averaged over the out-of-sample distribution. The constructed prediction sets have a finite-sample coverage guarantee without distributional assumptions. We also propose a method to estimate the outlier detection rate of a given method. We prove asymptotic consistency and optimality of our proposals under suitable assumptions and illustrate our methods on real data examples.

Faculty: Leying Guan, PhD

Download: GitHub / BCOPS package

Platform: R

Reference: doi.org (BCOPS)
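
The class-wise conformal construction that prediction sets of this kind build on can be sketched as follows. This is a minimal illustration, not the BCOPS package or its API: the function names, the conformity scores, and the value of alpha are all hypothetical, and the step where BCOPS learns its scores from both training and test data is not shown. A class enters the set when its rank-based p-value exceeds alpha, so a point that looks atypical for every class gets the empty set.

```python
def conformal_p_value(calib_scores, test_score):
    """Rank-based p-value with the usual +1 finite-sample correction."""
    n_at_or_below = sum(1 for s in calib_scores if s <= test_score)
    return (n_at_or_below + 1) / (len(calib_scores) + 1)


def prediction_set(calib_by_class, test_score_by_class, alpha):
    """Keep every class whose conformal p-value exceeds alpha; may be empty."""
    return {
        label
        for label, calib in calib_by_class.items()
        if conformal_p_value(calib, test_score_by_class[label]) > alpha
    }


# Hypothetical conformity scores (higher = more typical of the class),
# computed on held-out training points of each class.
calib_by_class = {
    "a": [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95],
    "b": [0.1, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.9],
}

# A point that looks like class "a" and a point that looks like neither class.
typical_of_a = prediction_set(calib_by_class, {"a": 0.55, "b": 0.05}, alpha=0.2)
outlier = prediction_set(calib_by_class, {"a": 0.05, "b": 0.05}, alpha=0.2)
print(typical_of_a)  # {'a'}
print(outlier)       # set()
```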

cdcsis

Conditional distance correlation <doi:10.1080/01621459.2014.993081> is a novel measure of conditional dependence between two multivariate random variables given a confounding variable. This package computes the conditional distance correlation, performs the conditional distance correlation sure independence screening procedure for ultrahigh dimensional data, and conducts the conditional distance covariance test for the conditional independence assumption of two multivariate variables.

Faculty: Heping Zhang, PhD

Download: Cran R / cdcsis package

Platform: R

Reference: doi.org (cdcsis)

CrossNetworks

Network-based analysis for cross-platform communications.

Faculty: Shuangge Steven Ma, PhD

Download: CrossNetworks package

Platform: R

credence-v2

"Credence uses state-of-the-art deep generative models such as variational auto-encoders (VAEs) to approximate the universe of complex datasets. These generative models are trained and validated on a collection of observed data sets. Credence uses these trained deep generative models to generate data that has analogous complexity to the observed data. Credence’s procedure enables users to have perfect knowledge about ground truth treatment effects of the intervention in the generated data. This allows the users to evaluate their method in a principled fashion without compromising on the complexity or the realness of the data they are evaluating the method on.

Credence learns a generative model by anchoring the level of endogeneity or treatment effect or anchoring both simultaneously. Anchoring the treatment effect and/or endogeneity is analogous to constraining the search space of potential data generators. Our approach can be conceptualized as projecting the true data-generative process to a constrained space of data-generators and finding the closest data-generator that conserves the joint distribution of X,Y,Z as close as possible to that of the observed data under the constraints."

Faculty: Harsh Parikh, PhD

Download: credence-v2 package

Platform: Python

Reference: arxiv.org (credence-v2)

REML-mediation

REML-mediation is a restricted-maximum-likelihood (REML)-based mediation analysis framework that adjusts for genetic confounding effects.

Faculty: Hongyu Zhao, PhD

Download: REML-mediation package

Platform: R

Reference: www.nature.com (REML-mediation)

MALTS

We introduce a flexible framework that produces high-quality almost-exact matches for causal inference. Most prior work in matching uses ad-hoc distance metrics, often leading to poor quality matches, particularly when there are irrelevant covariates. In this work, we learn an interpretable distance metric for matching, which leads to substantially higher quality matches. The learned distance metric stretches the covariate space according to each covariate’s contribution to outcome prediction: this stretching means that mismatches on important covariates carry a larger penalty than mismatches on irrelevant covariates. Our ability to learn flexible distance metrics leads to matches that are interpretable and useful for the estimation of conditional average treatment effects.

Faculty: Harsh Parikh, PhD

Download: MALTS package

Platform: Python, R

Reference: jmlr.org (MALTS)
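
The effect of stretching the covariate space can be illustrated with a small sketch. It is not the MALTS package: the function names are hypothetical, and the per-covariate weights are fixed by hand here, whereas MALTS learns its metric from the data. The example shows how a large weight on an important covariate changes which candidate counts as the closest match.

```python
import math


def stretched_distance(x, y, weights):
    """Euclidean distance after scaling each covariate by its weight."""
    return math.sqrt(sum((w * (a - b)) ** 2 for w, a, b in zip(weights, x, y)))


def nearest_match(unit, candidates, weights):
    """Name of the candidate closest to `unit` under the stretched distance."""
    return min(
        candidates,
        key=lambda name: stretched_distance(unit, candidates[name], weights),
    )


unit = (0.0, 0.0)
candidates = {
    "A": (1.0, 0.0),  # differs on the first covariate
    "B": (0.0, 2.0),  # differs on the second covariate
}

# Hand-picked weights (an assumption): covariate 1 matters, covariate 2 barely does.
stretched = nearest_match(unit, candidates, weights=(3.0, 0.1))
unweighted = nearest_match(unit, candidates, weights=(1.0, 1.0))
print(stretched, unweighted)  # B A
```

With equal weights the match is A, because its raw distance is smaller. Once a mismatch on the important first covariate is penalized more heavily, the match switches to B.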

mediateP

Functions for calculating the point and interval estimates of the natural indirect effect (NIE), total effect (TE), and mediation proportion (MP), based on the product approach.

Faculty: Fan Li, PhD; Donna Spiegelman, ScD

Download: Cran R / mediateP package

Platform: R

Reference: doi.org (mediateP)
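
For intuition, the product approach can be sketched in its simplest setting. The sketch assumes linear models for both the mediator and the outcome, no exposure-mediator interaction, and a one-unit change in exposure. The function and argument names are hypothetical and are not the mediateP interface, and the interval estimates the package provides are not shown.

```python
def product_method(a1, b1, b2):
    """Point estimates under two assumed linear models:

        mediator: M = a0 + a1 * X
        outcome:  Y = b0 + b1 * X + b2 * M

    Returns (NIE, TE, MP) for a one-unit change in the exposure X.
    """
    nie = a1 * b2   # natural indirect effect: exposure -> mediator -> outcome
    te = b1 + nie   # total effect: direct effect plus indirect effect
    if te == 0:
        raise ValueError("mediation proportion is undefined when TE is zero")
    mp = nie / te   # mediation proportion
    return nie, te, mp


# Hypothetical regression coefficients.
nie, te, mp = product_method(a1=0.5, b1=0.3, b2=0.4)
print(round(nie, 3), round(te, 3), round(mp, 3))  # 0.2 0.5 0.4
```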

msqm

An R package for the analysis of marginal structural quantile models. Contains inverse probability weighting, iterative conditional regression, and doubly robust estimation of marginal structural quantile models.

Faculty: Fan Li, PhD

Download: Cran R / msqm package

Platform: R

Reference: doi.org (msqm)

%par

“What % of the cases would be prevented if it were possible to eliminate one or more risk factors from a target population?” The %PAR SAS macro is designed to answer questions such as this by estimating the population attributable risk (PAR) and its 95% confidence interval. We calculate the full PAR and partial PAR, as defined below. The variance formulas implemented here apply only to cohort studies. Currently, the confidence intervals are not valid for case-control studies. Please write to us if you have a case-control study. Models with interaction terms are acceptable. Population prevalences can be considered fixed (e.g. for sensitivity analysis), estimated from the same cohort from which the relative risks were estimated, or estimated from a population survey such as NHANES.

• FULL PAR: All measured risk factors are considered eliminated. All members of the target population who are exposed switch to the lowest risk category of all measured risk factors.

• PARTIAL PAR: One or more risk factors are considered eliminated, while others are allowed to remain unchanged.

References: Bruzzi et al. (1985); Spiegelman, Hertzmark, and Wand (2006).

Faculty: Donna Spiegelman, ScD

Download: %par package

Platform: SAS

Reference: doi.org (%par)
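
The full and partial PAR point estimates can be sketched as below. This is not the %PAR macro: the function names, prevalences, and relative risks are hypothetical, the partial PAR version assumes the relative risks of the two factors multiply (no interaction), and the variance and confidence-interval calculations are omitted.

```python
def full_par(prevalence, relative_risk):
    """Full PAR = 1 - 1 / sum_s p_s * RR_s over all risk-factor strata s.

    RR_s is relative to the lowest-risk stratum, which has RR = 1.
    """
    return 1.0 - 1.0 / sum(p * relative_risk[s] for s, p in prevalence.items())


def partial_par(joint_prevalence, rr_modifiable, rr_background):
    """Partial PAR: eliminate the modifiable factor, leave the background factor.

    joint_prevalence maps (modifiable level s, background level t) to p_st.
    Assumes the joint relative risk is rr_modifiable[s] * rr_background[t].
    """
    after = sum(p * rr_background[t] for (s, t), p in joint_prevalence.items())
    before = sum(
        p * rr_modifiable[s] * rr_background[t]
        for (s, t), p in joint_prevalence.items()
    )
    return 1.0 - after / before


# One binary exposure: 30% exposed, relative risk 2.
single = full_par({0: 0.7, 1: 0.3}, {0: 1.0, 1: 2.0})

# Two binary factors with hypothetical joint prevalences.
joint = {(0, 0): 0.4, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.1}
rr1 = {0: 1.0, 1: 2.0}  # modifiable factor
rr2 = {0: 1.0, 1: 3.0}  # background factor, left unchanged
partial = partial_par(joint, rr1, rr2)
full = full_par(joint, {(s, t): rr1[s] * rr2[t] for (s, t) in joint})
print(round(single, 4), round(partial, 4), round(full, 4))  # 0.2308 0.2727 0.5455
```

In this example the partial PAR is smaller than the full PAR because only one of the two risk factors is removed.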

PSweight

Supports propensity score weighting analysis of observational studies and randomized trials. Enables the estimation and inference of average causal effects with binary and multiple treatments using overlap weights (ATO), inverse probability of treatment weights (ATE), average treatment effect among the treated weights (ATT), matching weights (ATM) and entropy weights (ATEN), with and without propensity score trimming. These weights are members of the family of balancing weights introduced in Li, Morgan and Zaslavsky (2018) <doi:10.1080/01621459.2016.1260466> and Li and Li (2019) <doi:10.1214/19-AOAS1282>.

Faculty: Fan Li, PhD; Guangyu Tong, PhD

Download: Cran R / PSweight package

Platform: R

Reference: The R Journal (PSweight)
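
The balancing weights named in the PSweight description share one form for a binary treatment: a tilting function h(e) of the propensity score e, divided by e for treated units and by 1 - e for control units. The sketch below computes the weights only. It is not the PSweight interface: the function name and dictionary keys are hypothetical labels borrowed from the estimand abbreviations above, and propensity score estimation, trimming, multiple treatments, and inference are not shown.

```python
import math

# Tilting functions h(e) for a propensity score e strictly between 0 and 1.
TILTING = {
    "ATE": lambda e: 1.0,                  # inverse probability of treatment weights
    "ATT": lambda e: e,                    # treated-population weights
    "ATO": lambda e: e * (1.0 - e),        # overlap weights
    "ATM": lambda e: min(e, 1.0 - e),      # matching weights
    "ATEN": lambda e: -(e * math.log(e) + (1.0 - e) * math.log(1.0 - e)),  # entropy
}


def balancing_weight(e, z, kind):
    """Weight for a unit with propensity score e and treatment indicator z (0 or 1)."""
    if not 0.0 < e < 1.0:
        raise ValueError("propensity score must lie strictly between 0 and 1")
    h = TILTING[kind](e)
    return h / e if z == 1 else h / (1.0 - e)


# A treated unit and a control unit, both with propensity score 0.25.
for kind in TILTING:
    treated = balancing_weight(0.25, 1, kind)
    control = balancing_weight(0.25, 0, kind)
    print(kind, round(treated, 4), round(control, 4))
```

For the overlap weights this reduces to 1 - e for treated units and e for control units, so the weights stay bounded even when propensity scores approach 0 or 1.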