Software
We have developed a variety of software that has been used for genetic, genomic, and, more generally, biomedical research. Some of our programs are broadly distributed, have academic, research, and commercial uses, and perform reliably and precisely under the computing demands of end users. Others are fully developed but less widely distributed, and some still need more work and testing. Below are the programs that we have developed, with a brief description of each:
AAOT — Association Analysis of an Ordinal Trait
This is a SAS program that converts an ordinal trait to a quantitative trait and prepares a data set that can be fed into FBAT for association analysis.
ABESS - A polynomial time algorithm for the best-subset selection problem
ABESS is an R package that implements a polynomial time algorithm to identify the best-subset model in linear regression.
CTMBR — Classification Trees for Multiple Binary Responses
This is a program to construct classification trees for multiple binary responses. In biomedical research, many diagnoses, such as depression and anxiety, are based on multiple items. This program makes it possible to conduct analysis at the item level.
DIPM - Depth Importance in Precision Medicine
The Depth Importance in Precision Medicine (DIPM) method is a classification tree method designed to identify subgroups relevant to the precision medicine setting.
diTARV – Tree-based Analysis of Rare Variants with Depth Importance
This program implements a tree-based method that uses depth-importance measures to explore the association between rare variants and human diseases and to find potential gene-gene interactions.
eLASSO — Robust Variable Selection Using Exponential Squared Loss
Several MATLAB scripts are provided for implementing eLASSO.
HapForest — Forest for Detecting Haplotypes and Interactions among Them in Association with a Disease
This program implements a forest-based approach that accommodates haplotype uncertainty and uses variable importance to sort out significant haplotypes and their interactions in genome-wide case-control association studies.
LOT - Linkage Analysis of Ordinal Traits
This program performs linkage analysis of ordinal traits for pedigree data. It implements a latent-variable proportional-odds logistic model that relates inheritance patterns to the distribution of the ordinal trait.
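As a rough illustration of the proportional-odds logistic form mentioned above, the sketch below computes category probabilities for an ordinal trait from a set of cutpoints and a linear predictor. It is not LOT's code and does not model the latent inheritance variable. The function names, cutpoints, and predictor values are hypothetical and chosen only for the example.

```python
import math


def logistic(z: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def ordinal_probabilities(cutpoints, linear_predictor):
    """Category probabilities under a generic proportional-odds model.

    Assumes P(Y <= k) = logistic(cutpoints[k] - linear_predictor),
    with cutpoints given in increasing order. The same linear predictor
    shifts every cumulative logit, which is the proportional-odds assumption.
    """
    cumulative = [logistic(c - linear_predictor) for c in cutpoints] + [1.0]
    probs = []
    previous = 0.0
    for value in cumulative:
        probs.append(value - previous)  # P(Y = k) = P(Y <= k) - P(Y <= k-1)
        previous = value
    return probs


# Hypothetical values: three cutpoints define four ordered categories.
baseline = ordinal_probabilities([-1.0, 0.0, 1.5], 0.0)
shifted = ordinal_probabilities([-1.0, 0.0, 1.5], 0.8)
```

With these made-up numbers, a larger linear predictor moves probability mass from the lowest category toward the highest one, while the cutpoints themselves stay fixed.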
MASAL — Multivariate Adaptive Splines for Analysis of Longitudinal Data
The standalone program takes input data structured like that of "CTMBR", with the addition of a time variable "t". An R package is also available.
modSaRa — Modified Screening and Ranking Algorithm to Detect Chromosome Copy Number Variations
This program uses a modified screening and ranking algorithm to detect chromosome copy number variations (CNVs). It is an optimal and accurate approach that addresses practical issues in CNV detection.
modSaRa2 — Modified Screening and Ranking Algorithm to Detect Chromosome Copy Number Variations Version 2
modSaRa2 improves on our previously developed modified Screening and Ranking algorithm (modSaRa) by integrating the relative allelic intensity with prior empirical statistics.
multiSaRa — A Screening and Ranking Algorithm to Detect Chromosome Copy Number Variations in Multiple Sequences
This is a program that enhances the screening and ranking algorithm to detect chromosome copy number variations in multiple sequences.
pLASSO — prior LASSO
We provide two R functions to run pLASSO.
Pregnancy Calculator
RTREE — Classification Trees for Risk Profile and Diagnosis
This program analyzes relative risk and conducts sib-pair linkage analysis using tree-based methods. It can generate a tree structure automatically or let the user construct a tree of their choice.
SaRa — Screening and Ranking Algorithm to Detect Chromosome Copy Number Variations
This is a program to detect chromosome copy number variations. It is fast and possesses optimal theoretical properties.
simuRare — Simulating Realistic Genomic Data with Rare Variants
simuRare is a regression-based algorithm that imputes rare variants in currently available SNP array data and uses a resampling approach to simulate samples that contain both common and rare SNPs.
SSSS — A Super Scalable Short Segment Detection Algorithm
STB-STC — Super-Taxon Approach for Human Microbes Association Studies
STB-STC is a statistical method that identifies joint effects of microbes in human disease while accounting for sparsity and using the hierarchical information in the taxonomy annotation. STB and STC yield better detection performance than state-of-the-art differential abundance analysis approaches when microbes are highly correlated. We distribute one core R function (SVB-SVC.R) related to STB and STC. It applies the chosen method (STB or STC) to a group of microbes. The script utilizes.R contains all the code needed to run SVB-SVC.R.
STREE — Survival Analysis Trees
Survival analysis is one of the most popular uses of tree-based methods. This program identifies prognostic factors that are predictive of survival outcome and time to an event of interest. It partitions a study sample into strata to reveal distinct patterns of survival among subgroups.
TARV — Tree-based Analysis of Rare Variants
TARV is a tree-based method to explore the association between rare variants and complex diseases and to find potential genetic and environmental factors and their interactions.
Twin Analysis — Twin Analysis Using SAS
This program uses SAS PROC NLMIXED and PROC MIXED to conduct twin analysis to estimate heritability of binary and quantitative traits.
Willows
This is a software package that includes three classifiers: classification tree, random forest, and deterministic forest. This package is built on RTREE mainly to implement the most efficient memory use for SNP