Miscellaneous
ANNI
Aligned Deep Neural Network for Integrative Analysis with High-dimensional Input.
Faculty: Shuangge Steven Ma, PhD
Download: ANNI package
Platform: Python
Ball
Hypothesis tests and a sure independence screening (SIS) procedure based on ball statistics, including ball divergence <doi:10.1214/17-AOS1579>, ball covariance <doi:10.1080/01621459.2018.1543600>, and ball correlation <doi:10.1080/01621459.2018.1462709>, are developed to analyze complex data in metric spaces, e.g., shape, directional, compositional, and symmetric positive definite matrix data. The ball divergence and ball covariance based distribution-free tests are implemented to detect distribution differences and associations in metric spaces <doi:10.18637/jss.v097.i06>. Furthermore, several generic non-parametric feature selection procedures based on ball correlation, BCor-SIS and all of its variants, are implemented to tackle the challenges of ultrahigh-dimensional data. A fast implementation of large-scale multiple K-sample testing with ball divergence <doi:10.1002/gepi.22423> is supported, which is particularly helpful for genome-wide association studies.
Faculty: Heping Zhang, PhD
Download: CRAN / Ball package
Platform: R
Reference: doi.org (Ball)
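To make the BCor-SIS screening workflow described above concrete, here is a minimal Python sketch of sure independence screening: rank features by a marginal dependence measure with the response and keep the top few. Ball ranks by ball correlation (and is an R package); the absolute Pearson correlation below is only a stand-in dependence measure, and the function name and data are illustrative assumptions, not the package's API.

    import numpy as np

    def sis_screen(X, y, keep):
        """Rank features by a marginal dependence measure with y and keep the top `keep`.
        BCor-SIS ranks by ball correlation; absolute Pearson correlation is a stand-in here."""
        scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
        return np.argsort(scores)[::-1][:keep]

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2000))                 # n = 100, p = 2000 (ultrahigh-dimensional)
    y = X[:, 3] - 2 * X[:, 7] + rng.normal(size=100)
    selected = sis_screen(X, y, keep=20)             # indices of the retained features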
BCRA
Understanding the genetic architecture of brain functions is essential to clarify the biological etiologies of behavioral and psychiatric disorders. Functional connectivity, representing pairwise correlations of neural activities between brain regions, is moderately heritable. Current methods to identify single nucleotide polymorphisms (SNPs) linked to functional connectivity either neglect the complex structure of functional connectivity or fail to control false discoveries. Therefore, we propose a SNP-set hypothesis test, Ball Covariance Ranking and Aggregation (BCRA), to select and test the significance of SNP sets related to functional connectivity, incorporating matrix structure and controlling false discovery rate. Additionally, we present subsample-BCRA, a faster version for large-scale datasets.
Faculty: Heping Zhang, PhD
Download: GitHub / BCRA package
Platform: R
Reference: doi.org (BCRA)
betacomp.f
Software for implementing Spiegelman D, Rosner B. “Estimation and inference for binary data with covariate measurement error and misclassification for main study/validation study designs.” Submitted for publication, J American Statist Assoc, June, 1997.
Faculty: Donna Spiegelman, ScD
Download: betacomp.f package
Platform: Fortran
Reference: doi.org (betacomp.f)
%blinplus
The macro %blinplus corrects logistic regression coefficients, their standard errors, and odds ratios with 95% confidence intervals for measurement error in one or more model covariates; the odds ratios correspond to a biologically meaningful difference specified by the user (the "weights"). Regression model parameters from Cox models (PROC PHREG) and linear regression models (PROC REG) can also be corrected. A validation study is required to empirically characterize the measurement error model. Options are given for main study/external validation study designs and main study/internal validation study designs (Spiegelman, Carroll, Kipnis; 2001). Technical details are given in Rosner et al. (1989), Rosner et al. (1990), and Spiegelman et al. (1997).
Faculty: Donna Spiegelman, ScD
Download: %blinplus package
Platform: SAS
Reference: doi.org (%blinplus)
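For a single error-prone covariate under a main study/external validation design, the regression calibration correction the macro applies amounts to dividing the naive coefficient by the attenuation factor estimated in the validation study. A minimal Python sketch under that assumption (the naive estimates are placeholder numbers, and the simple standard error scaling ignores the uncertainty in the attenuation factor, which the macro does propagate):

    import numpy as np

    # Validation study: regress the true exposure x on its error-prone surrogate w
    # to estimate the attenuation (regression dilution) factor lambda.
    rng = np.random.default_rng(1)
    x_val = rng.normal(size=200)
    w_val = x_val + rng.normal(scale=0.5, size=200)
    lam = np.polyfit(w_val, x_val, 1)[0]

    # Naive logistic regression results for w from the main study (placeholder numbers)
    beta_naive, se_naive = 0.30, 0.08
    beta_corrected = beta_naive / lam              # regression calibration correction
    se_corrected = se_naive / lam                  # first-order correction; ignores var(lam)

    increment = 1.0                                # biologically meaningful difference ("weight")
    odds_ratio = np.exp(beta_corrected * increment)
    ci = np.exp(increment * (beta_corrected + np.array([-1.96, 1.96]) * se_corrected))
    print(odds_ratio, ci)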
BrainSubnetwork
Constructing Brain Subnetworks via a High-Dimensional Multi-Task Learning Model with Group-wise Mixtures
Faculty: Heping Zhang, PhD
Download: GitHub / BrainSubnetwork package
Platform: R
Reference: doi.org (BrainSubnetwork)
ge.int.f
Software for implementing Foppa I, Spiegelman D. “Power and sample size calculations for case-control studies of gene-environment interactions with a polytomous exposure variable”. American Journal of Epidemiology, 1997; 146:596-604.
Faculty: Donna Spiegelman, ScD
Download: ge.int.f package
Platform: Fortran
Reference: doi.org (ge.int.f)
%glmcurv9
The %GLMCURV9 macro uses SAS PROC GENMOD and restricted cubic splines to test whether there is a nonlinear relation between a continuous exposure and an outcome variable. The macro can automatically select spline variables for a model. It produces a publication-quality graph of the relationship.
Faculty: Donna Spiegelman, ScD
Download: %glmcurv9 package
Platform: SAS
goodwin.f77
Software for implementing Crouch EAC, Spiegelman D. "The evaluation of integrals of the form ∫f(t)exp(−t²)dt: Application to logistic-normal models." Journal of the American Statistical Association 1990; 85: 464-469.
Faculty: Donna Spiegelman, ScD
Download: goodwin.f77 package
Platform: Fortran
Reference: doi.org (goodwin.f77)
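The Fortran routine implements Crouch and Spiegelman's algorithm for integrals of this form, which arise as marginal likelihood contributions in logistic-normal models. As a point of reference only (this is Gauss-Hermite quadrature, not their method), the same type of integral can be approximated in Python:

    import numpy as np

    # Nodes and weights for the weight function exp(-t^2)
    nodes, weights = np.polynomial.hermite.hermgauss(40)

    def logistic_normal_integral(eta, sigma):
        """Approximate (1/sqrt(pi)) * integral of expit(eta + sqrt(2)*sigma*t) * exp(-t^2) dt,
        i.e. a logistic response marginalized over a normal random effect."""
        f = 1.0 / (1.0 + np.exp(-(eta + np.sqrt(2.0) * sigma * nodes)))
        return np.sum(weights * f) / np.sqrt(np.pi)

    print(logistic_normal_integral(eta=0.5, sigma=1.0))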
insectDisease
David Onstad provided us with this insect disease database, sometimes referred to as the 'Ecological Database of the World's Insect Pathogens' or EDWIP. Files have been converted from 'SQL' to csv, and ported into 'R' for easy exploration and analysis. Thanks to the Macroecology of Infectious Disease Research Coordination Network (RCN) for funding and support. Data are also served online in a static format at <https://edwip.ecology.uga.edu/>.
Faculty: Colin J. Carlson, PhD
Download: insectDisease package
Platform: R
Reference: doi.org (insectDisease)
%int2way
The %INT2WAY macro is a SAS macro that constructs all the 2-way interactions among a set of variables. It also makes a global macro variable that lists the new variables.
Faculty: Donna Spiegelman, ScD
Download: %int2way package
Platform: SAS
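A minimal pandas sketch of the same idea, with illustrative variable names rather than the macro's syntax:

    import pandas as pd
    from itertools import combinations

    df = pd.DataFrame({"age": [50, 61, 47], "bmi": [24.0, 31.2, 27.5], "smoke": [0, 1, 1]})
    variables = ["age", "bmi", "smoke"]

    interaction_names = []                          # analogue of the macro's global variable list
    for a, b in combinations(variables, 2):         # every 2-way interaction among the variables
        name = f"{a}_x_{b}"
        df[name] = df[a] * df[b]
        interaction_names.append(name)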
Jot: Journal Targeter
Jot builds upon the API of Jane (Journal/Author Name Estimator, https://jane.biosemantics.org/) to identify PubMed articles that are similar in content to a manuscript's title and abstract. Jot gathers these articles and their similarity scores together with manuscript citations and journal metadata assembled from the National Library of Medicine (NLM) Catalog, the Directory of Open Access Journals (DOAJ), Sherpa Romeo, and impact metric databases. The result is a personalized, multi-dimensional data set that can be navigated through a series of linked, interactive plots and tables, allowing an author to sort and study journals according to the attributes most important to them.
Faculty: Jeffrey Townsend, PhD
Download: GitHub / Jot: Journal Targeter package
Platform: Python
Reference: doi.org (Jot: Journal Targeter)
%kmplot9
The %KMPLOT9 macro makes publication-quality Kaplan-Meier curves for a whole sample or for subgroups/strata. If there are subgroups/strata, it does the log-rank test.
Faculty: Donna Spiegelman, ScD
Download: %kmplot9 package
Platform: SAS
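For comparison, the same two steps (stratified Kaplan-Meier curves plus a log-rank test) in Python with the lifelines package; the data and labels are illustrative, and the SAS macro additionally handles the publication formatting:

    import pandas as pd
    from lifelines import KaplanMeierFitter
    from lifelines.statistics import logrank_test

    df = pd.DataFrame({
        "time":  [5, 8, 12, 3, 9, 14, 7, 11],
        "event": [1, 0, 1, 1, 0, 1, 1, 0],
        "group": ["A", "A", "A", "A", "B", "B", "B", "B"],
    })

    ax = None
    for name, grp in df.groupby("group"):           # one curve per subgroup/stratum
        kmf = KaplanMeierFitter()
        kmf.fit(grp["time"], grp["event"], label=name)
        ax = kmf.plot_survival_function(ax=ax)

    a, b = df[df.group == "A"], df[df.group == "B"]
    result = logrank_test(a["time"], b["time"], a["event"], b["event"])
    print(result.p_value)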
LaPreprint
A template for easily creating pretty, nicely formatted preprints in LaTeX.
Faculty: Colin J. Carlson, PhD
Download: LaPreprint package
Platform: LaTeX
LCP
We propose a new inference framework called localized conformal prediction. It generalizes the framework of conformal prediction by offering a single-test-sample adaptive construction that emphasizes a local region around this test sample, and can be combined with different conformal score constructions. The proposed framework enjoys an assumption-free finite sample marginal coverage guarantee, and it also offers additional local coverage guarantees under suitable assumptions. We demonstrate how to change from conformal prediction to localized conformal prediction using several conformal scores, and we illustrate a potential gain via numerical examples.
Faculty: Leying Guan, PhD
Download: GitHub / LCP package
Platform: R
Reference: doi.org (LCP)
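As a baseline for what LCP generalizes, here is a minimal split conformal prediction sketch in Python with an absolute-residual score; LCP would additionally weight the calibration scores toward a neighborhood of each test point. The data and model are illustrative assumptions, not the package's interface.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 5))
    y = X @ np.array([1.0, -2.0, 0.5, 0.0, 0.0]) + rng.normal(size=400)

    # Split: fit on one half, calibrate conformal scores on the other
    X_fit, y_fit = X[:200], y[:200]
    X_cal, y_cal = X[200:], y[200:]
    model = LinearRegression().fit(X_fit, y_fit)

    scores = np.abs(y_cal - model.predict(X_cal))        # absolute-residual conformal score
    alpha = 0.1
    q = np.quantile(scores, np.ceil((len(scores) + 1) * (1 - alpha)) / len(scores))

    x_test = rng.normal(size=(1, 5))
    pred = model.predict(x_test)[0]
    interval = (pred - q, pred + q)                      # ~90% marginal coverage guarantee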
%lgtphcurv9
The %LGTPHCURV9 macro fits restricted cubic splines to unconditional logistic, pooled logistic, conditional logistic, and proportional hazards regression models to examine non-parametrically the (possibly non-linear) relation between an exposure and the odds ratio (OR) or incidence rate ratio (IRR) of the outcome of interest. It allows control for covariates and stepwise selection among the spline variables. The output is the set of p-values from the likelihood ratio tests for non-linearity, a linear relation, and any relation, as well as a graph of the OR, IRR, predicted cumulative incidence or prevalence, or predicted incidence rate (IR), with or without its confidence band. The confidence band can be shown as upper and lower bound curves or as a "cloud" (gray area) around the OR/IRR/RR curve. In addition, the macro can display a smoothed histogram of the distribution of the exposure variable in the data being used.
Faculty: Donna Spiegelman, ScD
Download: %lgtphcurv9 package
Platform: SAS
Reference: doi.org (%lgtphcurv9)
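A rough Python sketch of the likelihood ratio test for non-linearity that the macro reports, using a logistic model with a 3-knot restricted cubic spline term on simulated data; the knot placement and the macro's other outputs (plots, other model types) are not reproduced here:

    import numpy as np
    import statsmodels.api as sm
    from scipy.stats import chi2

    def rcs_term(x, t1, t2, t3):
        # Nonlinear component of a 3-knot restricted cubic spline (Harrell's parameterization)
        pos3 = lambda u: np.clip(u, 0.0, None) ** 3
        return (pos3(x - t1) - pos3(x - t2) * (t3 - t1) / (t3 - t2)
                + pos3(x - t3) * (t2 - t1) / (t3 - t2)) / (t3 - t1) ** 2

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, 500)
    y = rng.binomial(1, 1 / (1 + np.exp(-(-2 + 0.15 * (x - 5) ** 2))))   # truly non-linear

    knots = np.quantile(x, [0.1, 0.5, 0.9])
    X_full = sm.add_constant(np.column_stack([x, rcs_term(x, *knots)]))
    X_lin = sm.add_constant(x)

    full = sm.Logit(y, X_full).fit(disp=0)
    linear = sm.Logit(y, X_lin).fit(disp=0)
    p_nonlinearity = chi2.sf(2 * (full.llf - linear.llf), df=1)   # LR test for non-linearity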
%lefttrunc
The %LEFTTRUNC macro makes publication-ready Kaplan-Meier-type curves using left-truncated data for a whole sample or for subgroups/strata.
Faculty: Donna Spiegelman, ScD
Download: %lefttruncs package
Platform: SAS
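In Python, the lifelines package handles the same left-truncation adjustment through its entry argument; a minimal sketch with illustrative values:

    from lifelines import KaplanMeierFitter

    entry =     [50, 55, 48, 62]      # age (or time) at which each subject came under observation
    durations = [70, 63, 75, 66]      # age (or time) at event or censoring
    events =    [1, 0, 1, 1]

    kmf = KaplanMeierFitter()
    kmf.fit(durations, event_observed=events, entry=entry, label="left-truncated KM")
    kmf.plot_survival_function()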
%makespl
The %MAKESPL macro is a SAS macro that makes restricted cubic spline variables to be used in procedures. It is incorporated in several of the macros that test for non-linearity, but can also be used on its own to create spline variables for covariates (allowing better control for the covariates, usually while using fewer degrees of freedom).
Faculty: Donna Spiegelman, ScD
Download: %makespl package
Platform: SAS
Reference: doi.org (%makespl)
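A general-purpose sketch of the restricted cubic spline construction in Python, using Harrell's truncated-power parameterization with knots at chosen quantiles; the macro's exact knot placement and scaling conventions may differ:

    import numpy as np

    def rcs_basis(x, knots):
        """Return the linear term plus k-2 restricted cubic spline terms for knots t1 < ... < tk."""
        x = np.asarray(x, dtype=float)
        t = np.asarray(knots, dtype=float)
        k = len(t)
        pos3 = lambda u: np.clip(u, 0.0, None) ** 3
        cols = [x]
        for j in range(k - 2):
            term = (pos3(x - t[j])
                    - pos3(x - t[k - 2]) * (t[k - 1] - t[j]) / (t[k - 1] - t[k - 2])
                    + pos3(x - t[k - 1]) * (t[k - 2] - t[j]) / (t[k - 1] - t[k - 2]))
            cols.append(term / (t[k - 1] - t[0]) ** 2)   # common normalization by (tk - t1)^2
        return np.column_stack(cols)

    x = np.random.default_rng(0).uniform(0, 10, 300)
    knots = np.quantile(x, [0.05, 0.35, 0.65, 0.95])     # 4 knots -> x plus 2 spline columns
    design = rcs_basis(x, knots)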
Multsurr method
The SAS macro %multisurr described in this documentation performs regression calibration for multiple surrogates with one exposure, as discussed in the paper by Weller et al. (submitted to Biostatistics, 2004). This type of data is often encountered in occupational studies, where the measurement of exposure can be quite complex and is characterized by numerous factors of the workplace; therefore, multiple surrogates often describe one exposure.
Faculty: Donna Spiegelman, ScD
Download: Multsurr method package
Platform: SAS; S-PLUS
Reference: doi.org (Multsurr method)
NextDoor
We propose a simple method for evaluating the model that has been chosen by an adaptive regression procedure, our main focus being the lasso. This procedure deletes each chosen predictor and refits the lasso to get a set of models that are "close" to the chosen "base model," and compares the error rate of the base model with those of the nearby models. If the deletion of a predictor leads to significant deterioration in the model's predictive power, the predictor is called indispensable; otherwise, the nearby model is called acceptable and can serve as a good alternative to the base model. This provides both an assessment of the predictive contribution of each variable and a set of alternative models that may be used in place of the chosen model. We call this procedure "Next-Door analysis" since it examines models "next" to the base model. It can be applied to supervised learning problems with ℓ1 penalization and stepwise procedures. We have implemented it in the R language as a library to accompany the well-known glmnet library.
Faculty: Leying Guan, PhD
Download: GitHub / NextDoor package
Platform: R
Reference: doi.org (NextDoor)
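A stripped-down sketch of the Next-Door idea using scikit-learn: fit the lasso, then drop each chosen predictor, refit, and compare cross-validated error. The actual R library works with glmnet and adds a formal assessment of "significant deterioration"; the comparison below is just raw cross-validated MSE on simulated data.

    import numpy as np
    from sklearn.linear_model import Lasso, LassoCV
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 20))
    y = 2 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=200)

    base = LassoCV(cv=5).fit(X, y)
    chosen = np.flatnonzero(base.coef_ != 0)             # predictors in the base model

    def cv_mse(cols):
        model = Lasso(alpha=base.alpha_)
        return -cross_val_score(model, X[:, cols], y, cv=5,
                                scoring="neg_mean_squared_error").mean()

    base_err = cv_mse(chosen)
    for j in chosen:                                     # nearby model: drop one predictor, refit
        nearby = [c for c in chosen if c != j]
        print(f"drop x{j}: CV MSE {cv_mse(nearby):.3f} vs base {base_err:.3f}")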
NVivotools
"A range of tools to help you get more out of NVivo(tm). Thanks to the wonderful Wooey some of these tools are now available for use on our server at http://wooey.barraqda.org. The core of NVivotools its ability to convert qualitative research data (sources, nodes, coding, etc.) into and out of NVivo's proprietary format. Some reasons why you might want to do this include:
- Freeing your work. Make your research data available to whomever your want (including your future self), not only those with their own current NVivo licence.
- Choose the tools you want to manipulate your data. NVivo's GUI isn't bad, but sometimes you'd prefer to be able to automate. Use some of the plethora of data management tools or your own coding skills to take charge of your data.
- Interface with the rest of your IT world. Make NVivo part of your tookit, not your whole world.
The core of NVivotools is its ability to make sense of NVivo's proprietary file structures. These files are, in face, relational database. The Windows version uses Microsoft SQL Server while the Mac version uses SQL Anywhere. NVivotools is able to minimise the difficulties of working with different database engines by using SQLAlchemy."
Faculty: Colin J. Carlson, PhD
Download: NVivotools package
Platform: Python
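A minimal illustration of the SQLAlchemy-based approach the description mentions: point an engine at the project database and reflect its tables. The connection URL below is a placeholder assumption; the real server, database name, and credentials depend on the local NVivo installation, and this is not NVivotools' own configuration.

    from sqlalchemy import create_engine, inspect

    # Placeholder URL: an NVivo for Windows project lives in a Microsoft SQL Server database.
    engine = create_engine(
        "mssql+pyodbc://user:password@localhost/NVivoProject?driver=ODBC+Driver+17+for+SQL+Server"
    )

    inspector = inspect(engine)
    for table in inspector.get_table_names():       # enumerate the relational tables the project uses
        print(table, [col["name"] for col in inspector.get_columns(table)])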
%pctl9
The %PCTL9 macro is intended to make any desired number of quantiles for a list of variables. It can also make quantile indicators and median-score trend variables. A subset of the data can be used to determine the quantile boundaries.
Faculty: Donna Spiegelman, ScD
Download: %pctl9 package
Platform: SAS
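The core operations have a compact pandas analogue (column names and the quantile count are illustrative, not the macro's syntax):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"intake": np.random.default_rng(0).lognormal(size=500)})

    # Quantile categories (here quintiles) and 0/1 indicator variables
    df["intake_q"] = pd.qcut(df["intake"], q=5, labels=False) + 1
    indicators = pd.get_dummies(df["intake_q"], prefix="intake_q")

    # Median-score trend variable: each observation gets its quantile's median value
    df["intake_trend"] = df.groupby("intake_q")["intake"].transform("median")

    # Quantile boundaries could instead be determined from a subset of the data,
    # e.g. df.loc[some_subset, "intake"].quantile([0.2, 0.4, 0.6, 0.8])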
pharos-api
This repository is part of the Pharos project, which is split across several repositories: pharos-frontend (frontend application and deployment infrastructure), pharos-api (API and deployment infrastructure), pharos-database (SQL database and deployment infrastructure), and pharos-documentation (Markdown files used to generate about pages).
Faculty: Colin J. Carlson, PhD
Download: pharos-api package
Platform: Python
plasma
Tools for making Plasmodium maps.
Faculty: Colin J. Carlson, PhD
Download: plasma package
Platform: R
rangeshifts
A new package that tracks species range shifts. It's still super in development so don't expect anything here for a while. Drop me a line if you're interested in helping somehow. (Dave Matthews voice) ants marching
Faculty: Colin J. Carlson, PhD
Download: rangeshifts package
Platform: R
%relibpls8
The macro %relibpls8 calculates regression coefficients, their standard errors, and, when relevant, odds ratios and 95% confidence intervals for a biologically meaningful difference specified by the user (the "increments"), where all are corrected for measurement error in one or more model covariates. Linear (PROC REG), logistic (PROC LOGISTIC), survival and conditional logistic (PROC PHREG), and mixed (PROC MIXED) models are implemented. A reliability study is required to empirically characterize the measurement error model. Details are given in Rosner et al. (1989), Rosner et al. (1990), and Rosner et al. (1992), including "real data" examples.
Faculty: Donna Spiegelman, ScD
Download: %relibpls8 package
Platform: SAS
Reference: doi.org (%relibpls8)
%rrc
The macro %rrc uses the risk set regression calibration (RRC) method to correct the point and interval estimates of the relative risk in the Cox proportional hazards regression model for bias due to measurement error in one or more baseline or time-varying exposures, including time-varying variables that are functions of the exposure history, such as the 12-month moving average exposure, cumulative average exposure, cumulative total exposure, etc. Both external and internal validation study designs can be used with this macro. Technical details are given in Liao et al. (2011) and Liao et al. (2018).
Faculty: Donna Spiegelman, ScD
Download: %rrc package
Platform: SAS
Reference: doi.org (%rrc)
r-reproducible-repo
The repository provides a template for reproducible R projects. It uses {targets} to manage code execution (among other things) and {renv} to manage packages. Compute environments are managed via Docker, and continuous integration happens in GitHub Actions.
Faculty: Colin J. Carlson, PhD
Download: r-reproducible-repo package
Platform: R
%robreg9
The %ROBREG9 macro is a SAS version 9 macro that runs robust linear regression models showing both the model-based (assuming normality) and empirical standard errors, for situations where it is reasonable to use PROC REG (i.e., no repeated measures, continuous dependent variable). This macro can also calculate point and interval estimates of effect on the (unitless) percent change scale, which is often more widely interpretable.
Faculty: Donna Spiegelman, ScD
Download: %robreg9 package
Platform: SAS
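In Python, statsmodels reports the analogous model-based and empirical (sandwich) standard errors. In the sketch below, the percent-change computation assumes a log-transformed outcome, which is one common route to that scale rather than the macro's exact method; the data are simulated.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = rng.normal(size=300)
    log_y = 0.05 * x + rng.normal(scale=0.3, size=300)        # outcome analyzed on the log scale
    X = sm.add_constant(x)

    model_based = sm.OLS(log_y, X).fit()                      # classical (normality-based) SEs
    empirical = sm.OLS(log_y, X).fit(cov_type="HC1")          # robust / empirical (sandwich) SEs

    beta, se = empirical.params[1], empirical.bse[1]
    pct_change = 100 * (np.exp(beta) - 1)                     # percent change per unit of x
    pct_ci = 100 * (np.exp(beta + np.array([-1.96, 1.96]) * se) - 1)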
virosolver
Methods to infer epidemiologic incidence curves from viral load (PCR Ct value) data, sampled cross-sectionally or in repeated cross-sections.
Faculty: Lee Kennedy-Shaffer, PhD
Download: GitHub / virosolver package
Platform: R
Reference: doi.org (virosolver)
westbound
Some convenience tools for BART modeling.
Faculty: Colin J. Carlson, PhD
Download: westbound package
Platform: R
wddsWizard
This is an R package for validating data against the Wildlife Disease Data Standard. It allows users to restructure and validate data sets.
Faculty: Colin J. Carlson, PhD
Download: wddsWizard package
Platform: R
%yoll
The SAS %YOLL macro uses PROC PHREG to compute the expected time from a specified start time (or age) to an outcome, or the expected time lost between the outcome and a specified end time (or age). It uses the bootstrap to obtain confidence bounds on these values. It computes these values for different levels of an exposure at specified values of the covariates.
Faculty: Donna Spiegelman, ScD
Download: %yoll package
Platform: SAS
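A rough Python sketch of the same kind of quantity: restricted mean survival time (expected time to the outcome before a horizon) per exposure group from Kaplan-Meier curves, with bootstrap percentile confidence bounds. This omits the covariate adjustment the macro obtains from PROC PHREG, and all data are simulated.

    import numpy as np
    import pandas as pd
    from lifelines import KaplanMeierFitter

    def rmst(time, event, horizon):
        """Area under the Kaplan-Meier curve up to `horizon` (expected time to the outcome)."""
        kmf = KaplanMeierFitter().fit(time, event)
        sf = kmf.survival_function_
        t = np.append(sf.index.to_numpy(), horizon)
        s = np.append(sf.iloc[:, 0].to_numpy(), sf.iloc[-1, 0])
        keep = t <= horizon
        t, s = t[keep], s[keep]
        return np.sum(np.diff(t) * s[:-1])               # step-function integral of S(t)

    df = pd.DataFrame({
        "time":    [4, 9, 13, 6, 11, 15, 8, 12, 5, 10],
        "event":   [1, 1, 0, 1, 0, 1, 1, 0, 1, 1],
        "exposed": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    })
    horizon = 12
    point = {g: rmst(grp["time"], grp["event"], horizon) for g, grp in df.groupby("exposed")}

    # Bootstrap percentile confidence bounds for the exposed-vs-unexposed difference
    diffs = []
    for i in range(500):
        boot = df.sample(frac=1.0, replace=True, random_state=i)
        d = {g: rmst(grp["time"], grp["event"], horizon) for g, grp in boot.groupby("exposed")}
        if len(d) == 2:
            diffs.append(d[1] - d[0])
    ci = np.percentile(diffs, [2.5, 97.5])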