The development of the software provided here has been supported by the following grants: NIH ES009411, CA050597, CA081345, and CA055075.
Software for measurement error and misclassification correction
  • %blinplus: Implementing Rosner B, Spiegelman D, Willett W. Correction of logistic regression relative risk estimates and confidence intervals for measurement error: the case of multiple covariates measured with error. American Journal of Epidemiology 1990; 132:734-745.
  • betacomp.f: Implementing Spiegelman D, Rosner B. Estimation and inference for binary data with covariate measurement error and misclassification for main study/validation study designs. Journal of the American Statistical Association, 2000; 95:51-61.
  • goodwin.f77: Implementing Crouch EAC, Spiegelman D. The evaluation of integrals of the form ∫ f(t) exp(-t²) dt: Application to logistic-normal models. Journal of the American Statistical Association 1990; 85:464-469.
  • Multsurr method: Implementing Weller E, Milton D, Eisen E, Spiegelman D. Regression calibration for logistic regression with multiple surrogates for one exposure. Journal of Statistical Planning and Inference 2007; 137:449-461. An S-PLUS version is also available.
  • %relibpls8: Implementing Rosner B, Spiegelman D, Willett W. Correction of logistic regression relative risk estimates and confidence intervals for random within-person measurement error. American Journal of Epidemiology 1992; 136:1400-1413.
  • %rrc: Implementing the method developed in Liao X, Zucker D, Li Y, Spiegelman D. Survival analysis with error-prone time-varying covariates: a risk set calibration approach. Biometrics 2011 Mar; 67(1):50-58.
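The integrals named in the goodwin.f77 entry above, of the form ∫ f(t) exp(-t²) dt, arise when a logistic function is averaged over a normally distributed quantity. The following is a minimal Python sketch of that idea only. It uses a plain equally spaced (trapezoidal) sum as a stand-in and is not the algorithm implemented in goodwin.f77; the function names, step size, and truncation width are illustrative assumptions.

```python
import math


def expit(x):
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gauss_weighted_integral(f, step=0.5, half_width=10.0):
    """Approximate the integral of f(t) * exp(-t^2) over the real line.

    Uses an equally spaced sum on [-half_width, half_width]; the terms at
    the two ends are negligible because exp(-t^2) is tiny there.
    The defaults are illustrative assumptions, not tuned values.
    """
    n = int(round(half_width / step))
    total = 0.0
    for k in range(-n, n + 1):
        t = k * step
        total += f(t) * math.exp(-t * t)
    return step * total


def logistic_normal_mean(mu, sigma):
    """E[expit(mu + sigma * Z)] for Z ~ N(0, 1).

    Substituting z = sqrt(2) * t turns the normal density into
    exp(-t^2) / sqrt(pi), which gives the integral form above.
    """
    integral = gauss_weighted_integral(
        lambda t: expit(mu + math.sqrt(2.0) * sigma * t)
    )
    return integral / math.sqrt(math.pi)


if __name__ == "__main__":
    print(logistic_normal_mean(1.0, 1.0))
```

With f(t) = 1 the sum should reproduce √π, and with mu = 0 the logistic-normal mean should be 0.5 by symmetry, which makes both cases convenient sanity checks.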
Software for study design/power calculation
  • ge_int.f: Implementing Foppa I and Spiegelman D. Power and sample size calculations for case-control studies of gene-environment interactions with a polytomous exposure variable. American Journal of Epidemiology 1997; 146:596-604.
  • ge_trend_v2: Implementing power and sample size calculations developed in Spiegelman D and Logan R. Power and sample size for case-control studies of gene-environment interactions: a new method with comparison to old. Submitted for publication, American Journal of Epidemiology, January, 2002.
  • holcroft.f77: Implementing Holcroft C, Spiegelman D. Design of validation studies for estimating the odds ratio of exposure-disease relationships when exposure is misclassified. Biometrics, 1999; 55:1193-1201.
  • OPTITXS.r: Implementing the sample size and power calculation methods for longitudinal (repeated measures) studies described in The Design of Observational Longitudinal Studies.
  • swdpwr: Implementing power calculations for stepped wedge cluster randomized trials. Binary and continuous outcomes, and cross-sectional and cohort designs are included. For further information, please refer to: 1) Zhou X, Liao X, Kunz LM, Normand ST, Wang M, Spiegelman D. A maximum likelihood approach to power calculations for stepped wedge designs of binary outcomes. Biostatistics. 2020;21(1):102‐121. 2) Li F, Turner EL, Preisser JS. Sample size determination for GEE analyses of stepped wedge cluster randomized trials. Biometrics. 2018;74(4):1450-1458.
Software for studies of disease heterogeneity
  • %contrasttest: The %contrasttest macro conducts a heterogeneity test comparing the exposure-disease associations obtained from separate subtype-specific analyses of cohort or nested case-control studies.
  • %meta_subtype_trend: The %meta_subtype_trend macro tests whether the exposure-subtype association has a trend across ordinal cancer subtypes. The user either runs a separate Cox model (for cohort studies) or conditional logistic model (for nested case-control studies) for each subtype and then tests the heterogeneity hypothesis using the outputs of those models, or takes the estimates (and standard errors) from the literature and tests the heterogeneity hypothesis with them. In the subtype-specific analyses, the confounder-disease associations are allowed to differ among the subtypes.
  • %stepmetareg: A meta-regression method that can use existing statistical software for mixed model analysis. It can be used to assess whether the exposure-subtype associations differ across subtypes defined by one marker while controlling for other markers, and to evaluate whether the difference in exposure-subtype association across subtypes defined by one marker depends on any other markers.
  • %subtype: Macro to examine whether the effects of an exposure vary by subtype of a disease. It can be applied to data from cohort studies, nested or matched case-control studies, unmatched case-control studies, and case-case studies.
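As a rough illustration of the kind of calculation described for %meta_subtype_trend above, the sketch below regresses subtype-specific estimates on ordinal subtype scores using inverse-variance weights. It is written in Python and is not the macro's code. It assumes the subtype-specific estimates are independent with known standard errors, and the function name and arguments are hypothetical.

```python
import math


def subtype_trend_test(estimates, std_errors, scores):
    """Inverse-variance weighted trend of subtype estimates on scores.

    estimates  : subtype-specific log relative risks (one per subtype)
    std_errors : their standard errors
    scores     : ordinal scores assigned to the subtypes (e.g. 1, 2, 3)
    Returns (slope, slope_se, z, two_sided_p).
    Assumption: the estimates are independent of one another.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    w_sum = sum(weights)
    # Weighted means of the scores and of the estimates.
    mean_score = sum(w * s for w, s in zip(weights, scores)) / w_sum
    mean_est = sum(w * b for w, b in zip(weights, estimates)) / w_sum
    # Weighted sums of squares and cross-products.
    sxx = sum(w * (s - mean_score) ** 2 for w, s in zip(weights, scores))
    sxy = sum(
        w * (s - mean_score) * (b - mean_est)
        for w, s, b in zip(weights, scores, estimates)
    )
    slope = sxy / sxx
    slope_se = math.sqrt(1.0 / sxx)
    z = slope / slope_se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return slope, slope_se, z, p_value


if __name__ == "__main__":
    print(subtype_trend_test([0.0, 1.0, 2.0], [1.0, 1.0, 1.0], [1, 2, 3]))
```

A slope near zero is consistent with no trend in the exposure-subtype association across the ordered subtypes. Estimates taken from the literature can be passed in the same way as estimates from the user's own models.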
Software for meta-analysis
  • %metaanal: Produces DerSimonian-Laird estimators for fixed and random effects models in meta- and pooled analysis.
  • %metadose: SAS macro for meta-analysis of dose-response. It is used when only limited data are available from research reports studying the same dose-response relationship with different exposure or treatment levels. It is a two-step macro. First, for each study, it uses the Greenland method (AJE, 1992) to get a single pooled estimate and its variance estimate across the different exposure or treatment levels. Second, it performs a meta-analysis of all relevant studies using the pooled estimates. Submitted for publication, 2010.
  • tcs: Implementing Takkouche B, Cadarso-Suárez C, Spiegelman D. An evaluation of old and new tests for heterogeneity in meta-analysis for epidemiologic research. American Journal of Epidemiology, 1999; 150:206-215.
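For readers unfamiliar with the estimators named in the %metaanal entry above, here is a minimal Python sketch of the standard fixed-effects (inverse-variance) and DerSimonian-Laird random-effects calculations. It is not the macro's code. The function name and the returned dictionary keys are illustrative assumptions.

```python
import math


def dersimonian_laird(estimates, variances):
    """Fixed-effects and DerSimonian-Laird random-effects pooled estimates.

    estimates : study-specific estimates (e.g. log relative risks)
    variances : their within-study variances
    """
    k = len(estimates)
    w = [1.0 / v for v in variances]
    w_sum = sum(w)
    # Fixed-effects (inverse-variance weighted) estimate.
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / w_sum
    # Cochran's Q statistic for between-study heterogeneity.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    # Method-of-moments between-study variance, truncated at zero.
    c = w_sum - sum(wi ** 2 for wi in w) / w_sum
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    w_star_sum = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / w_star_sum
    return {
        "fixed": fixed,
        "fixed_se": math.sqrt(1.0 / w_sum),
        "q": q,
        "tau2": tau2,
        "random": pooled,
        "random_se": math.sqrt(1.0 / w_star_sum),
    }


if __name__ == "__main__":
    print(dersimonian_laird([0.0, 4.0], [1.0, 1.0]))
```

When the studies agree closely, Q falls below its degrees of freedom, the between-study variance is truncated to zero, and the random-effects result coincides with the fixed-effects one.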
Software for analysis/graphics
  • %glmcurv9: The %GLMCURV9 macro uses SAS PROC GENMOD and restricted cubic splines to test whether there is a nonlinear relation between a continuous exposure and an outcome variable. The macro can automatically select spline variables for a model. It produces a publication quality graph of the relationship.
  • %kmplot9: Makes publication-quality Kaplan-Meier plots of survival data, following JAMA guidelines. Produces numerical output of the censoring summary, as well as of tests among subgroups (e.g., log-rank).
  • %lefttrunc: Makes publication-quality Kaplan-Meier-type curves using left-truncated data for a whole sample or for subgroups/strata.
  • %lgtphcurv9: Implementing Durrleman and Simon’s restricted cubic spline methodology to fit possibly non-linear exposure-response curves in Cox and logistic regression models. Publication-quality graphs are provided, and a stepwise knot selection procedure is available to enhance the flexibility of the method. Govindarajulu U, Spiegelman D, Thurston SW, Eisen EA. Comparing smoothing techniques for modeling exposure-response curves in Cox models. Statistics in Medicine, 2007; 26:3735-3752.
  • %mediate: Calculates the point and interval estimates of the percent of treatment (exposure) effect (PTE) explained by an intermediate variable.
  • %par: Computing full and partial population attributable risks and their confidence intervals, for cohort studies. Cancer Causes Control 2007 Jun;18(5):571-9
  • %relrisk9: Implementing log-binomial and log-Poisson models to get risk, prevalence and rate ratios and risk, prevalence and rate differences. Am J Epidemiol 2005;162:199–200.
  • %robreg9: Robust linear regression with empirical standard errors and p-values, for settings where it would otherwise be reasonable to use PROC REG. Point and interval estimates of effect on the (unitless) percent change scale.
  • %table1: Produces publication quality MS Word table with a breakdown of study/cohort characteristics, typically by categories of an exposure variable.
  • %yoll: Uses PROC PHREG to compute the time from a specific start time (or age) to an outcome (expected time after the start time to the outcome) or the time from the outcome to a specific time (or age) (expected time lost before the end time).
  • %icc9: Intraclass correlation coefficients (ICC) and their 95 percent confidence intervals. Hankinson SE, Manson JE, Spiegelman D, Willett WC, Longcope C, Speizer FE. Reproducibility of plasma hormone levels in postmenopausal women over a two to three year period. Cancer Epidemiology, Biomarkers and Prevention 1995; 4:649-654.
  • %makespl: Makes restricted cubic spline variables for a continuous variable that allows estimating a possibly non-linear relationship with the outcome.
  • mediateP: An R package for implementing the product method for mediation analysis.
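Several entries above (%glmcurv9, %lgtphcurv9, %makespl) rely on restricted cubic spline variables. The Python sketch below builds such variables with the commonly used truncated-power construction, in which the curve is linear below the first knot and above the last knot. It is a sketch only. The macros may scale or parameterize their spline variables differently, and the function name is hypothetical.

```python
def restricted_cubic_spline_terms(x, knots):
    """Nonlinear restricted cubic spline terms for a single value x.

    With k knots this returns k - 2 terms. Together with x itself they
    form a basis that is cubic between knots and linear in both tails.
    No rescaling is applied (an assumption; some software divides the
    terms by a power of the knot range).
    """
    def pos_cube(u):
        # Truncated cubic: u^3 when u is positive, otherwise 0.
        return u ** 3 if u > 0 else 0.0

    t = sorted(knots)
    if len(t) < 3:
        raise ValueError("at least three knots are required")
    t_last, t_prev = t[-1], t[-2]
    gap = t_last - t_prev
    terms = []
    for t_j in t[:-2]:
        terms.append(
            pos_cube(x - t_j)
            - pos_cube(x - t_prev) * (t_last - t_j) / gap
            + pos_cube(x - t_last) * (t_prev - t_j) / gap
        )
    return terms


if __name__ == "__main__":
    for value in (0.5, 2.5, 5.0):
        print(value, restricted_cubic_spline_terms(value, [1, 2, 3, 4]))
```

Entering x and these terms as covariates in a regression model allows a possibly non-linear exposure-response curve, and a joint test of the spline terms is one way to assess non-linearity.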
Miscellaneous/other software and tutorials
  • %int2way: Makes all the 2-way interaction variables from a list of variables.
  • %pctl9: This macro is intended to make any desired number of quantiles for a list of variables. It can also make quantile indicators and median-score trend variables. A subset of the data can be used to determine the quantile boundaries.
  • HIV-prevalence-screening-data: Code for implementing frequentist and Bayesian estimators.
  • Tutorial: How to control finely for confounding using continuous variables that may have a non-linear association with the outcome
  • Tutorial: How to run a longitudinal GEE model with very large datasets in a reasonable amount of CPU time
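The %pctl9 entry above describes quantile categories, quantile indicators, and median-score trend variables, with boundaries optionally taken from a subset of the data. The Python sketch below illustrates those three outputs. It is not the macro's code. The boundary convention is Python's default for `statistics.quantiles` and may differ from the macro's, and the function and argument names are hypothetical.

```python
import bisect
import statistics


def quantile_variables(values, n_groups, reference=None):
    """Quantile group, indicator variables, and median-score trend variable.

    values    : observations to categorize
    n_groups  : number of quantile groups (e.g. 4 for quartiles)
    reference : optional subset used to set the boundaries; defaults to
                all of `values`
    Returns (groups, indicators, trend), each aligned with `values`.
    """
    source = values if reference is None else reference
    # n_groups - 1 cut points, using Python's default quantile method.
    cuts = statistics.quantiles(source, n=n_groups)
    # Group index 0 .. n_groups - 1 for every observation.
    groups = [bisect.bisect_right(cuts, v) for v in values]
    # Median of the observed values within each non-empty group.
    medians = {}
    for g in set(groups):
        members = [v for v, gv in zip(values, groups) if gv == g]
        medians[g] = statistics.median(members)
    # One 0/1 indicator per group for every observation.
    indicators = [
        [1 if g == j else 0 for j in range(n_groups)] for g in groups
    ]
    # Trend variable: each observation gets its group's median.
    trend = [medians[g] for g in groups]
    return groups, indicators, trend


if __name__ == "__main__":
    print(quantile_variables([1, 2, 3, 4, 5, 6, 7, 8], 4))
```

Entering the trend variable as a single continuous covariate gives a test for trend across the quantile groups, while the indicators allow category-specific estimates.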
