Miscellaneous
ANNI
Aligned Deep Neural Network for Integrative Analysis with High-dimensional Input.
Faculty: Shuangge Steven Ma, PhD
Download: ANNI package
Platform: Python
Ball
Hypothesis tests and a sure independence screening (SIS) procedure based on ball statistics, including ball divergence <doi:10.1214/17-AOS1579>, ball covariance <doi:10.1080/01621459.2018.1543600>, and ball correlation <doi:10.1080/01621459.2018.1462709>, are developed to analyze complex data in metric spaces, e.g., shape, directional, compositional, and symmetric positive definite matrix data. The ball divergence and ball covariance based distribution-free tests are implemented to detect distribution differences and associations in metric spaces <doi:10.18637/jss.v097.i06>. Furthermore, several generic non-parametric feature selection procedures based on ball correlation, BCor-SIS and all of its variants, are implemented to tackle the challenges of ultrahigh-dimensional data. A fast implementation of large-scale multiple K-sample testing with ball divergence <doi:10.1002/gepi.22423> is supported, which is particularly helpful for genome-wide association studies.
Faculty: Heping Zhang, PhD
Download: CRAN / Ball package
Platform: R
Reference: doi.org (Ball)
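To make the BCor-SIS screening workflow described above concrete, here is a minimal Python sketch of sure independence screening: rank features by a marginal dependence measure with the response and keep the top few. Ball ranks by ball correlation (and is an R package); the absolute Pearson correlation below is only a stand-in dependence measure, and the function name and data are illustrative assumptions, not the package's API.

    import numpy as np

    def sis_screen(X, y, keep):
        """Rank features by a marginal dependence measure with y and keep the top `keep`.
        BCor-SIS ranks by ball correlation; absolute Pearson correlation is a stand-in here."""
        scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
        return np.argsort(scores)[::-1][:keep]

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2000))                 # n = 100, p = 2000 (ultrahigh-dimensional)
    y = X[:, 3] - 2 * X[:, 7] + rng.normal(size=100)
    selected = sis_screen(X, y, keep=20)             # indices of the retained features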
BCRA
Understanding the genetic architecture of brain functions is essential to clarify the biological etiologies of behavioral and psychiatric disorders. Functional connectivity, representing pairwise correlations of neural activities between brain regions, is moderately heritable. Current methods to identify single nucleotide polymorphisms (SNPs) linked to functional connectivity either neglect the complex structure of functional connectivity or fail to control false discoveries. Therefore, we propose a SNP-set hypothesis test, Ball Covariance Ranking and Aggregation (BCRA), to select and test the significance of SNP sets related to functional connectivity, incorporating matrix structure and controlling false discovery rate. Additionally, we present subsample-BCRA, a faster version for large-scale datasets.
Faculty: Heping Zhang, PhD
Download: GitHub / BCRA package
Platform: R
Reference: doi.org (BCRA)
betacomp.f
Software for implementing Spiegelman D, Rosner B. “Estimation and inference for binary data with covariate measurement error and misclassification for main study/validation study designs.” Submitted for publication, J American Statist Assoc, June, 1997.
Faculty: Donna Spiegelman, ScD
Download: betacomp.f package
Platform: Fortran
Reference: doi.org (betacomp.f)
%blinplus
The macro %blinplus corrects logistic regression coefficients, their standard errors, and odds ratios with 95% confidence intervals for measurement error in one or more model covariates; the odds ratios correspond to a biologically meaningful difference specified by the user (the "weights"). Regression model parameters from Cox models (PROC PHREG) and linear regression models (PROC REG) can also be corrected. A validation study is required to empirically characterize the measurement error model. Options are given for main study/external validation study designs and main study/internal validation study designs (Spiegelman, Carroll, Kipnis; 2001). Technical details are given in Rosner et al. (1989), Rosner et al. (1990), and Spiegelman et al. (1997).
Faculty: Donna Spiegelman, ScD
Download: %blinplus package
Platform: SAS
Reference: doi.org (%blinplus)
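For a single error-prone covariate under a main study/external validation design, the regression calibration correction the macro applies amounts to dividing the naive coefficient by the attenuation factor estimated in the validation study. A minimal Python sketch under that assumption (the naive estimates are placeholder numbers, and the simple standard error scaling ignores the uncertainty in the attenuation factor, which the macro does propagate):

    import numpy as np

    # Validation study: regress the true exposure x on its error-prone surrogate w
    # to estimate the attenuation (regression dilution) factor lambda.
    rng = np.random.default_rng(1)
    x_val = rng.normal(size=200)
    w_val = x_val + rng.normal(scale=0.5, size=200)
    lam = np.polyfit(w_val, x_val, 1)[0]

    # Naive logistic regression results for w from the main study (placeholder numbers)
    beta_naive, se_naive = 0.30, 0.08
    beta_corrected = beta_naive / lam              # regression calibration correction
    se_corrected = se_naive / lam                  # first-order correction; ignores var(lam)

    increment = 1.0                                # biologically meaningful difference ("weight")
    odds_ratio = np.exp(beta_corrected * increment)
    ci = np.exp(increment * (beta_corrected + np.array([-1.96, 1.96]) * se_corrected))
    print(odds_ratio, ci)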
BrainSubnetwork
Constructing Brain Subnetworks via a High-Dimensional Multi-Task Learning Model with Group-wise Mixtures
Faculty: Heping Zhang, PhD
Download: GitHub / BrainSubnetwork package
Platform: R
Reference: doi.org (BrainSubnetwork)
ge.int.f
Software for implementing Foppa I, Spiegelman D. “Power and sample size calculations for case-control studies of gene-environment interactions with a polytomous exposure variable”. American Journal of Epidemiology, 1997; 146:596-604.
Faculty: Donna Spiegelman, ScD
Download: ge.int.f package
Platform: Fortran
Reference: doi.org (ge.int.f)
%glmcurv9
The %GLMCURV9 macro uses SAS PROC GENMOD and restricted cubic splines to test whether there is a nonlinear relation between a continuous exposure and an outcome variable. The macro can automatically select spline variables for a model. It produces a publication-quality graph of the relationship.
Faculty: Donna Spiegelman, ScD
Download: %glmcurv9 package
Platform: SAS
goodwin.f77
Software for implementing Crouch EAC, Spiegelman D. "The evaluation of integrals of the form ∫f(t)exp(−t²)dt: Application to logistic-normal models." Journal of the American Statistical Association 1990; 85: 464-469.
Faculty: Donna Spiegelman, ScD
Download: goodwin.f77 package
Platform: Fortran
Reference: doi.org (goodwin.f77)
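The Fortran routine implements Crouch and Spiegelman's algorithm for integrals of this form, which arise as marginal likelihood contributions in logistic-normal models. As a point of reference only (this is Gauss-Hermite quadrature, not their method), the same type of integral can be approximated in Python:

    import numpy as np

    # Nodes and weights for the weight function exp(-t^2)
    nodes, weights = np.polynomial.hermite.hermgauss(40)

    def logistic_normal_integral(eta, sigma):
        """Approximate (1/sqrt(pi)) * integral of expit(eta + sqrt(2)*sigma*t) * exp(-t^2) dt,
        i.e. a logistic response marginalized over a normal random effect."""
        f = 1.0 / (1.0 + np.exp(-(eta + np.sqrt(2.0) * sigma * nodes)))
        return np.sum(weights * f) / np.sqrt(np.pi)

    print(logistic_normal_integral(eta=0.5, sigma=1.0))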
insectDisease
David Onstad provided us with this insect disease database, sometimes referred to as the 'Ecological Database of the World's Insect Pathogens' or EDWIP. Files have been converted from 'SQL' to csv, and ported into 'R' for easy exploration and analysis. Thanks to the Macroecology of Infectious Disease Research Coordination Network (RCN) for funding and support. Data are also served online in a static format at <https://edwip.ecology.uga.edu/>.
Faculty: Colin J. Carlson, PhD
Download: insectDisease package
Platform: R
Reference: doi.org (insectDisease)
%int2way
The %INT2WAY macro is a SAS macro that constructs all the 2-way interactions among a set of variables. It also makes a global macro variable that lists the new variables.
Faculty: Donna Spiegelman, ScD
Download: %int2way package
Platform: SAS
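A minimal pandas sketch of the same idea, with illustrative variable names rather than the macro's syntax:

    import pandas as pd
    from itertools import combinations

    df = pd.DataFrame({"age": [50, 61, 47], "bmi": [24.0, 31.2, 27.5], "smoke": [0, 1, 1]})
    variables = ["age", "bmi", "smoke"]

    interaction_names = []                          # analogue of the macro's global variable list
    for a, b in combinations(variables, 2):         # every 2-way interaction among the variables
        name = f"{a}_x_{b}"
        df[name] = df[a] * df[b]
        interaction_names.append(name)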
Jot: Journal Targeter
Jot builds upon the API of Jane (Journal/Author Name Estimator, https://jane.biosemantics.org/) to identify PubMed articles that are similar in content to a manuscript's title and abstract. Jot gathers these articles and their similarity scores together with manuscript citations and journal metadata assembled from the National Library of Medicine (NLM) Catalog, the Directory of Open Access Journals (DOAJ), Sherpa Romeo, and impact metric databases. The result is a personalized, multi-dimensional data set that can be navigated through a series of linked, interactive plots and tables, allowing an author to sort and study journals according to the attributes most important to them.
Faculty: Jeffrey Townsend, PhD
Download: GitHub / Jot: Journal Targeter package
Platform: Python
Reference: doi.org (Jot: Journal Targeter)
%kmplot9
The %KMPLOT9 macro makes publication-quality Kaplan-Meier curves for a whole sample or for subgroups/strata. If there are subgroups/strata, it does the log-rank test.
Faculty: Donna Spiegelman, ScD
Download: %kmplot9 package
Platform: SAS
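For comparison, the same two steps (stratified Kaplan-Meier curves plus a log-rank test) in Python with the lifelines package; the data and labels are illustrative, and the SAS macro additionally handles the publication formatting:

    import pandas as pd
    from lifelines import KaplanMeierFitter
    from lifelines.statistics import logrank_test

    df = pd.DataFrame({
        "time":  [5, 8, 12, 3, 9, 14, 7, 11],
        "event": [1, 0, 1, 1, 0, 1, 1, 0],
        "group": ["A", "A", "A", "A", "B", "B", "B", "B"],
    })

    ax = None
    for name, grp in df.groupby("group"):           # one curve per subgroup/stratum
        kmf = KaplanMeierFitter()
        kmf.fit(grp["time"], grp["event"], label=name)
        ax = kmf.plot_survival_function(ax=ax)

    a, b = df[df.group == "A"], df[df.group == "B"]
    result = logrank_test(a["time"], b["time"], a["event"], b["event"])
    print(result.p_value)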
LaPreprint
A template for easily creating pretty, nicely formatted preprints in LaTeX.
Faculty: Colin J. Carlson, PhD
Download: LaPreprint package
Platform: LaTeX
LCP
We propose a new inference framework called localized conformal prediction. It generalizes the framework of conformal prediction by offering a single-test-sample adaptive construction that emphasizes a local region around this test sample, and can be combined with different conformal score constructions. The proposed framework enjoys an assumption-free finite sample marginal coverage guarantee, and it also offers additional local coverage guarantees under suitable assumptions. We demonstrate how to change from conformal prediction to localized conformal prediction using several conformal scores, and we illustrate a potential gain via numerical examples.
Faculty: Leying Guan, PhD
Download: GitHub / LCP package
Platform: R
Reference: doi.org (LCP)
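As a baseline for what LCP generalizes, here is a minimal split conformal prediction sketch in Python with an absolute-residual score; LCP would additionally weight the calibration scores toward a neighborhood of each test point. The data and model are illustrative assumptions, not the package's interface.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 5))
    y = X @ np.array([1.0, -2.0, 0.5, 0.0, 0.0]) + rng.normal(size=400)

    # Split: fit on one half, calibrate conformal scores on the other
    X_fit, y_fit = X[:200], y[:200]
    X_cal, y_cal = X[200:], y[200:]
    model = LinearRegression().fit(X_fit, y_fit)

    scores = np.abs(y_cal - model.predict(X_cal))        # absolute-residual conformal score
    alpha = 0.1
    q = np.quantile(scores, np.ceil((len(scores) + 1) * (1 - alpha)) / len(scores))

    x_test = rng.normal(size=(1, 5))
    pred = model.predict(x_test)[0]
    interval = (pred - q, pred + q)                      # ~90% marginal coverage guarantee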
%lgtphcurv9
The %LGTPHCURV9 macro fits restricted cubic splines to unconditional logistic, pooled logistic, conditional logistic, and proportional hazards regression models to examine non-parametrically the (possibly non-linear) relation between an exposure and the odds ratio (OR) or incidence rate ratio (IRR) of the outcome of interest. It allows control for covariates and stepwise selection among the spline variables. The output is the set of p-values from the likelihood ratio tests for non-linearity, a linear relation, and any relation, as well as a graph of the OR, IRR, predicted cumulative incidence or prevalence, or predicted incidence rate (IR), with or without its confidence band. The confidence band can be shown as upper and lower bound curves or as a "cloud" (gray area) around the OR/IRR/RR curve. In addition, the macro can display a smoothed histogram of the distribution of the exposure variable in the data being used.
Faculty: Donna Spiegelman, ScD
Download: %lgtphcurv9 package
Platform: SAS
Reference: doi.org (%lgtphcurv9)
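A rough Python sketch of the likelihood ratio test for non-linearity that the macro reports, using a logistic model with a 3-knot restricted cubic spline term on simulated data; the knot placement and the macro's other outputs (plots, other model types) are not reproduced here:

    import numpy as np
    import statsmodels.api as sm
    from scipy.stats import chi2

    def rcs_term(x, t1, t2, t3):
        # Nonlinear component of a 3-knot restricted cubic spline (Harrell's parameterization)
        pos3 = lambda u: np.clip(u, 0.0, None) ** 3
        return (pos3(x - t1) - pos3(x - t2) * (t3 - t1) / (t3 - t2)
                + pos3(x - t3) * (t2 - t1) / (t3 - t2)) / (t3 - t1) ** 2

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, 500)
    y = rng.binomial(1, 1 / (1 + np.exp(-(-2 + 0.15 * (x - 5) ** 2))))   # truly non-linear

    knots = np.quantile(x, [0.1, 0.5, 0.9])
    X_full = sm.add_constant(np.column_stack([x, rcs_term(x, *knots)]))
    X_lin = sm.add_constant(x)

    full = sm.Logit(y, X_full).fit(disp=0)
    linear = sm.Logit(y, X_lin).fit(disp=0)
    p_nonlinearity = chi2.sf(2 * (full.llf - linear.llf), df=1)   # LR test for non-linearity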
%lefttrunc
The %LEFTTRUNC macro makes publication-ready Kaplan-Meier-type curves using left-truncated data for a whole sample or for subgroups/strata.
Faculty: Donna Spiegelman, ScD
Download: %lefttruncs package
Platform: SAS
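In Python, the lifelines package handles the same left-truncation adjustment through its entry argument; a minimal sketch with illustrative values:

    from lifelines import KaplanMeierFitter

    entry =     [50, 55, 48, 62]      # age (or time) at which each subject came under observation
    durations = [70, 63, 75, 66]      # age (or time) at event or censoring
    events =    [1, 0, 1, 1]

    kmf = KaplanMeierFitter()
    kmf.fit(durations, event_observed=events, entry=entry, label="left-truncated KM")
    kmf.plot_survival_function()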
%makespl
The %MAKESPL macro is a SAS macro that makes restricted cubic spline variables to be used in procedures. It is incorporated in several of the macros that test for non-linearity, but can also be used on its own to create spline variables for covariates (allowing better control for the covariates, usually while using fewer degrees of freedom).
Faculty: Donna Spiegelman, ScD
Download: %makespl package
Platform: SAS
Reference: doi.org (%makespl)
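A general-purpose sketch of the restricted cubic spline construction in Python, using Harrell's truncated-power parameterization with knots at chosen quantiles; the macro's exact knot placement and scaling conventions may differ:

    import numpy as np

    def rcs_basis(x, knots):
        """Return the linear term plus k-2 restricted cubic spline terms for knots t1 < ... < tk."""
        x = np.asarray(x, dtype=float)
        t = np.asarray(knots, dtype=float)
        k = len(t)
        pos3 = lambda u: np.clip(u, 0.0, None) ** 3
        cols = [x]
        for j in range(k - 2):
            term = (pos3(x - t[j])
                    - pos3(x - t[k - 2]) * (t[k - 1] - t[j]) / (t[k - 1] - t[k - 2])
                    + pos3(x - t[k - 1]) * (t[k - 2] - t[j]) / (t[k - 1] - t[k - 2]))
            cols.append(term / (t[k - 1] - t[0]) ** 2)   # common normalization by (tk - t1)^2
        return np.column_stack(cols)

    x = np.random.default_rng(0).uniform(0, 10, 300)
    knots = np.quantile(x, [0.05, 0.35, 0.65, 0.95])     # 4 knots -> x plus 2 spline columns
    design = rcs_basis(x, knots)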
Multsurr method
The SAS macro %multisurr described in this documentation performs regression calibration for multiple surrogates with one exposure, as discussed in the paper by Weller et al. (submitted to Biostatistics, 2004). This type of data is often encountered in occupational studies, where the measurement of exposure can be quite complex and is characterized by numerous factors of the workplace; therefore, multiple surrogates often describe one exposure.
Faculty: Donna Spiegelman, ScD
Download: Multsurr method package
Platform: SAS; S-PLUS
Reference: doi.org (Multsurr method)
NextDoor
We propose a simple method for evaluating the model that has been chosen by an adaptive regression procedure, our main focus being the lasso. This procedure deletes each chosen predictor and refits the lasso to get a set of models that are "close" to the chosen "base model," and compares the error rate of the base model with those of the nearby models. If the deletion of a predictor leads to significant deterioration in the model's predictive power, the predictor is called indispensable; otherwise, the nearby model is called acceptable and can serve as a good alternative to the base model. This provides both an assessment of the predictive contribution of each variable and a set of alternative models that may be used in place of the chosen model. We call this procedure "Next-Door analysis" since it examines models "next" to the base model. It can be applied to supervised learning problems with ℓ1 penalization and stepwise procedures. We have implemented it in the R language as a library to accompany the well-known glmnet library.
Faculty: Leying Guan, PhD
Download: GitHub / NextDoor package
Platform: R
Reference: doi.org (NextDoor)
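A stripped-down sketch of the Next-Door idea using scikit-learn: fit the lasso, then drop each chosen predictor, refit, and compare cross-validated error. The actual R library works with glmnet and adds a formal assessment of "significant deterioration"; the comparison below is just raw cross-validated MSE on simulated data.

    import numpy as np
    from sklearn.linear_model import Lasso, LassoCV
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 20))
    y = 2 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=200)

    base = LassoCV(cv=5).fit(X, y)
    chosen = np.flatnonzero(base.coef_ != 0)             # predictors in the base model

    def cv_mse(cols):
        model = Lasso(alpha=base.alpha_)
        return -cross_val_score(model, X[:, cols], y, cv=5,
                                scoring="neg_mean_squared_error").mean()

    base_err = cv_mse(chosen)
    for j in chosen:                                     # nearby model: drop one predictor, refit
        nearby = [c for c in chosen if c != j]
        print(f"drop x{j}: CV MSE {cv_mse(nearby):.3f} vs base {base_err:.3f}")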
NVivotools
"A range of tools to help you get more out of NVivo(tm). Thanks to the wonderful Wooey some of these tools are now available for use on our server at http://wooey.barraqda.org. The core of NVivotools its ability to convert qualitative research data (sources, nodes, coding, etc.) into and out of NVivo's proprietary format. Some reasons why you might want to do this include:
- Freeing your work. Make your research data available to whomever your want (including your future self), not only those with their own current NVivo licence.
- Choose the tools you want to manipulate your data. NVivo's GUI isn't bad, but sometimes you'd prefer to be able to automate. Use some of the plethora of data management tools or your own coding skills to take charge of your data.
- Interface with the rest of your IT world. Make NVivo part of your tookit, not your whole world.
The core of NVivotools is its ability to make sense of NVivo's proprietary file structures. These files are, in face, relational database. The Windows version uses Microsoft SQL Server while the Mac version uses SQL Anywhere. NVivotools is able to minimise the difficulties of working with different database engines by using SQLAlchemy."
Faculty: Colin J. Carlson, PhD
Download: NVivotools package
Platform: Python
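A minimal illustration of the SQLAlchemy-based approach the description mentions: point an engine at the project database and reflect its tables. The connection URL below is a placeholder assumption; the real server, database name, and credentials depend on the local NVivo installation, and this is not NVivotools' own configuration.

    from sqlalchemy import create_engine, inspect

    # Placeholder URL: an NVivo for Windows project lives in a Microsoft SQL Server database.
    engine = create_engine(
        "mssql+pyodbc://user:password@localhost/NVivoProject?driver=ODBC+Driver+17+for+SQL+Server"
    )

    inspector = inspect(engine)
    for table in inspector.get_table_names():       # enumerate the relational tables the project uses
        print(table, [col["name"] for col in inspector.get_columns(table)])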
%pctl9
The %PCTL9 macro is intended to make any desired number of quantiles for a list of variables. It can also make quantile indicators and median-score trend variables. A subset of the data can be used to determine the quantile boundaries.
Faculty: Donna Spiegelman, ScD
Download: %pctl9 package
Platform: SAS
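The core operations have a compact pandas analogue (column names and the quantile count are illustrative, not the macro's syntax):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"intake": np.random.default_rng(0).lognormal(size=500)})

    # Quantile categories (here quintiles) and 0/1 indicator variables
    df["intake_q"] = pd.qcut(df["intake"], q=5, labels=False) + 1
    indicators = pd.get_dummies(df["intake_q"], prefix="intake_q")

    # Median-score trend variable: each observation gets its quantile's median value
    df["intake_trend"] = df.groupby("intake_q")["intake"].transform("median")

    # Quantile boundaries could instead be determined from a subset of the data,
    # e.g. df.loc[some_subset, "intake"].quantile([0.2, 0.4, 0.6, 0.8])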
pharos-api
This repository is part of the Pharos project, which is split across several repositories: pharos-frontend (frontend application and deployment infrastructure), pharos-api (API and deployment infrastructure), pharos-database (SQL database and deployment infrastructure), and pharos-documentation (Markdown files used to generate about pages).
Faculty: Colin J. Carlson, PhD
Download: pharos-api package
Platform: Python
plasma
Tools for making Plasmodium maps.
Faculty: Colin J. Carlson, PhD
Download: plasma package
Platform: R
rangeshifts
A new package that tracks species range shifts. It's still super in development so don't expect anything here for a while. Drop me a line if you're interested in helping somehow. (Dave Matthews voice) ants marching
Faculty: Colin J. Carlson, PhD
Download: rangeshifts package
Platform: R
%relibpls8
The macro %relibpls8 calculates regression coefficients, their standard errors, and, when relevant, odds ratios and 95% confidence intervals for a biologically meaningful difference specified by the user (the "increments"), where all are corrected for measurement error in one or more model covariates. Linear (PROC REG), logistic (PROC LOGISTIC), survival and conditional logistic (PROC PHREG), and mixed (PROC MIXED) models are implemented. A reliability study is required to empirically characterize the measurement error model. Details are given in Rosner et al. (1989), Rosner et al. (1990), and Rosner et al. (1992), including "real data" examples.
Faculty: Donna Spiegelman, ScD
Download: %relibpls8 package
Platform: SAS
Reference: doi.org (%relibpls8)
%rrc
The macro %rrc uses the risk set regression calibration (RRC) method to correct the point and interval estimates of the relative risk in the Cox proportional hazards regression model for bias due to measurement error in one or more baseline or time-varying exposures, including time-varying variables that are functions of the exposure history, such as the 12-month moving average exposure, cumulative average exposure, cumulative total exposure, etc. Both external and internal validation study designs can be used with this macro. Technical details are given in Liao et al. (2011) and Liao et al. (2018).
Faculty: Donna Spiegelman, ScD
Download: %rrc package
Platform: SAS
Reference: doi.org (%rrc)
r-reproducible-repo
The repository provides a template for reproducible R projects. It uses {targets} to manage code execution (among other things) and {renv} to manage packages. Compute environments are managed via Docker, and continuous integration happens in GitHub Actions.
Faculty: Colin J. Carlson, PhD
Download: r-reproducible-repo package
Platform: R
%robreg9
The %ROBREG9 macro is a SAS version 9 macro that runs robust linear regression models showing both the model-based (assuming normality) and empirical standard errors, for situations where it is reasonable to use PROC REG (i.e., no repeated measures, continuous dependent variable). This macro can also calculate point and interval estimates of effect on the (unitless) percent change scale, which is often more widely interpretable.
Faculty: Donna Spiegelman, ScD
Download: %robreg9 package
Platform: SAS
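In Python, statsmodels reports the analogous model-based and empirical (sandwich) standard errors. In the sketch below, the percent-change computation assumes a log-transformed outcome, which is one common route to that scale rather than the macro's exact method; the data are simulated.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = rng.normal(size=300)
    log_y = 0.05 * x + rng.normal(scale=0.3, size=300)        # outcome analyzed on the log scale
    X = sm.add_constant(x)

    model_based = sm.OLS(log_y, X).fit()                      # classical (normality-based) SEs
    empirical = sm.OLS(log_y, X).fit(cov_type="HC1")          # robust / empirical (sandwich) SEs

    beta, se = empirical.params[1], empirical.bse[1]
    pct_change = 100 * (np.exp(beta) - 1)                     # percent change per unit of x
    pct_ci = 100 * (np.exp(beta + np.array([-1.96, 1.96]) * se) - 1)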
virosolver
Methods to infer epidemiologic incidence curves from viral load (PCR Ct value) data, sampled cross-sectionally or in repeated cross-sections.
Faculty: Lee Kennedy-Shaffer, PhD
Download: GitHub / virosolver package
Platform: R
Reference: doi.org (virosolver)
westbound
Some convenience tools for BART modeling.
Faculty: Colin J. Carlson, PhD
Download: westbound package
Platform: R
wddsWizard
This is an R package for validating data against the Wildlife Disease Data Standard. It allows users to restructure and validate data sets.
Faculty: Colin J. Carlson, PhD
Download: wddsWizard package
Platform: R
%yoll
The SAS %YOLL macro uses PROC PHREG to compute the expected time from a specified start time (or age) to an outcome, or the expected time lost between the outcome and a specified end time (or age). It uses the bootstrap to obtain confidence bounds on these values. It computes these values for different levels of an exposure at specified values of the covariates.
Faculty: Donna Spiegelman, ScD
Download: %yoll package
Platform: SAS
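A rough Python sketch of the same kind of quantity: restricted mean survival time (expected time to the outcome before a horizon) per exposure group from Kaplan-Meier curves, with bootstrap percentile confidence bounds. This omits the covariate adjustment the macro obtains from PROC PHREG, and all data are simulated.

    import numpy as np
    import pandas as pd
    from lifelines import KaplanMeierFitter

    def rmst(time, event, horizon):
        """Area under the Kaplan-Meier curve up to `horizon` (expected time to the outcome)."""
        kmf = KaplanMeierFitter().fit(time, event)
        sf = kmf.survival_function_
        t = np.append(sf.index.to_numpy(), horizon)
        s = np.append(sf.iloc[:, 0].to_numpy(), sf.iloc[-1, 0])
        keep = t <= horizon
        t, s = t[keep], s[keep]
        return np.sum(np.diff(t) * s[:-1])               # step-function integral of S(t)

    df = pd.DataFrame({
        "time":    [4, 9, 13, 6, 11, 15, 8, 12, 5, 10],
        "event":   [1, 1, 0, 1, 0, 1, 1, 0, 1, 1],
        "exposed": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    })
    horizon = 12
    point = {g: rmst(grp["time"], grp["event"], horizon) for g, grp in df.groupby("exposed")}

    # Bootstrap percentile confidence bounds for the exposed-vs-unexposed difference
    diffs = []
    for i in range(500):
        boot = df.sample(frac=1.0, replace=True, random_state=i)
        d = {g: rmst(grp["time"], grp["event"], horizon) for g, grp in boot.groupby("exposed")}
        if len(d) == 2:
            diffs.append(d[1] - d[0])
    ci = np.percentile(diffs, [2.5, 97.5])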