Heping Zhang, PhD
ABESS
Best-subset selection aims to find a small subset of predictors such that the resulting linear model is expected to achieve the best prediction accuracy. ABESS is an R package that implements a polynomial-time algorithm to identify the best-subset model in linear regression.
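To make the best-subset problem concrete, the sketch below enumerates every subset of a fixed size and keeps the one with the smallest residual sum of squares. It is an illustration only and is not the ABESS algorithm: exhaustive enumeration grows exponentially with the number of predictors, which is the cost a polynomial-time method avoids. The function names (`solve`, `rss_for_subset`, `best_subset`) are hypothetical, the fit has no intercept, and the chosen columns are assumed to be linearly independent.

```python
from itertools import combinations


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rss_for_subset(X, y, cols):
    """Residual sum of squares of the no-intercept least-squares fit on cols."""
    xtx = [[sum(row[i] * row[j] for row in X) for j in cols] for i in cols]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in cols]
    beta = solve(xtx, xty)  # normal equations (assumes independent columns)
    return sum(
        (yi - sum(b * row[c] for b, c in zip(beta, cols))) ** 2
        for row, yi in zip(X, y)
    )


def best_subset(X, y, k):
    """Return the size-k tuple of column indices with the smallest RSS."""
    p = len(X[0])
    return min(combinations(range(p), k),
               key=lambda cols: rss_for_subset(X, y, cols))


# Toy data: the response depends only on columns 0 and 2 (y = 2*x0 - x2).
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1], [1, 2, 3]]
y = [2 * r[0] - r[2] for r in X]
print(best_subset(X, y, 2))
```

With three predictors there are only three subsets of size two to check; with hundreds of predictors the same loop becomes infeasible.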
Faculty: Heping Zhang, PhD
Download: ABESS package
Platform: R
Reference: doi.org (ABESS)
CTMBR
This program constructs classification trees for multiple binary responses. In biomedical research, many diagnoses, such as depression and anxiety, are based on multiple items. This program makes it possible to conduct analysis at the item level.
Faculty: Heping Zhang, PhD
Download: CTMBR package
Platform: Unix
Reference: doi.org (CTMBR)
DIPM
An implementation by Chen, Li, and Zhang (2022) <doi:10.1093/bioadv/vbac041> of the Depth Importance in Precision Medicine (DIPM) method in Chen and Zhang (2022) <doi:10.1093/biostatistics/kxaa021> and Chen and Zhang (2020) <doi:10.1007/978-3-030-46161-4_16>. The DIPM method is a classification tree that searches for subgroups with especially poor or strong performance in a given treatment group.
Faculty: Heping Zhang, PhD
Download: CranR / DIPM package
Platform: R
Reference: doi.org (DIPM)
diTARV
diTARV is a tree-based method to explore the association between rare variants and certain human diseases and to find potential gene-gene interactions. It considers depth importance in the tree model to measure the strength of association of each variant. This program implements the method described in Hu J., Li T., Wang S., and Zhang H. Supervariants Identification for Breast Cancer. Genetic Epidemiology 44(8), 934-947, 2020.
Faculty: Heping Zhang, PhD
Download: diTARV package
Platform: R
Reference: doi.org (diTARV)
eLASSO
This is a Matlab implementation of eLASSO. The zipped file contains 14 M-files, 7 of which relate to the optimization problem and are translated from the block coordinate gradient descent (BCGD) method proposed by Paul Tseng and Sangwoon Yun. They include cgdsq.m, dirq.m, nz.m, signx.m, fnc.m, and grad.m. This program implements the method described in: Wang X., Jiang Y., Huang M., and Zhang H. Robust Variable Selection with Exponential Squared Loss. Journal of the American Statistical Association, 108: 632-643, 2013.
Faculty: Heping Zhang, PhD
Download: eLASSO package
Platform: Matlab
Reference: doi.org (eLASSO)
HapForest
This program implements a forest-based approach that accommodates haplotype uncertainty and uses variable importance to sort out significant haplotypes and their interactions in genome-wide case-control association studies.
Faculty: Heping Zhang, PhD
Download: HapForest package
Platform: Java
Reference: doi.org (HapForest)
LOT
This program performs linkage analysis of ordinal traits for pedigree data. It implements a latent-variable proportional-odds logistic model that relates inheritance patterns to the distribution of the ordinal trait.
Faculty: Heping Zhang, PhD
Download: LOT package
Platform: Java
Reference: doi.org (LOT)
MASAL
Multivariate Adaptive Splines for Analysis of Longitudinal Data. The standalone program takes a data structure similar to that of "CTMBR", except that it also includes a time variable "t". An R package is also available.
Faculty: Heping Zhang, PhD
Download: MASAL package
Platform: Unix; R
Reference: doi.org (MASAL)
modSaRa
The modified Screening and Ranking algorithm (modSaRa) can detect chromosome copy number variants with high sensitivity and specificity. For a sequence of intensity values, the modified SaRa will process it by quantile normalization, search for change-point candidates, eliminate unlikely change-points, and then output the potential CNV segments by presenting the start point and end point by SNP or CNV marker index.
Faculty: Heping Zhang, PhD
Download: modSaRa package
Platform: R
Reference: doi.org (modSaRa)
modSaRa2
Although widely used change-point-based methods have been shown to increase statistical power to identify variants, it remains challenging to identify CNVs with weak signals because genotyping intensity data are noisy. modSaRa2 improves on our previously developed modified Screening and Ranking algorithm (modSaRa) by integrating the relative allelic intensity with prior empirical statistics. modSaRa2 markedly improved both sensitivity and specificity over existing methods. The improvement is most substantial for detecting weak CNV signals, and stability also improves when CNV size varies.
Faculty: Heping Zhang, PhD
Download: modSaRa2 package
Platform: R
Reference: doi.org (modSaRa2)
pLASSO
pLASSO is a statistical method that incorporates prior information into L1-penalized generalized linear models. We distribute here two R functions (function_linear.R and function_logistic.R) related to pLASSO, for linear regression and logistic regression, respectively. Both functions can compute all six estimators compared in Jiang, He, and Zhang (2014), i.e., LASSO, p, pLASSO, LASSO-A, p-A, and pLASSO-A. The functions use cross-validation to select the optimal tuning parameters. For more details, see: Jiang Y., He Y., and Zhang H. (2014). Variable selection with prior information for generalized linear models via the pLASSO method.
Faculty: Heping Zhang, PhD
Download: pLASSO package
Platform: R
Reference: doi.org (pLASSO)
Pregnancy Calculator
This website provides online calculators for predicting certain pregnancy outcomes, particularly live birth rate. Please cite the references on which these calculators are based when appropriate. This program is copyrighted by Heping Zhang, Yale University. Thanks to Jiuzhou Wang and Yajie Duan from SUSTech in China and Zhe Cai from Shanghai University for implementing the program. By using this program, you understand and agree that you are fully responsible for its use. Suggestions for improvement can be emailed to heping.zhang@yale.edu. If you accept these terms of use, you can select a choice below and proceed.
Faculty: Heping Zhang, PhD
Website: Pregnancy Calculator
Platform: Website
RTREE
This program analyzes relative risk and conducts sib-pair linkage analysis using tree-based methods. It can generate a tree structure automatically or allow the user to construct a tree of their choice.
Faculty: Heping Zhang, PhD
Download: RTREE package
Platform: Unix
Reference: doi.org (RTREE)
SaRa
The Screening and Ranking algorithm (SaRa) detects chromosome copy number variants quickly and accurately, with computational complexity of order O(n). This program implements the methods described in: Niu and Zhang. The screening and ranking algorithm to detect DNA copy number variations. Ann. Appl. Stat. 6, 1306-1326 (2012). Hao, Niu and Zhang. Multiple change-point detection via a screening and ranking algorithm. Statistica Sinica 23 (2013).
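As a rough illustration of the screening idea, the sketch below computes a local diagnostic at each position (the mean of the next h values minus the mean of the previous h values) and keeps positions where its absolute value is both a local maximum and above a threshold. This is a simplified sketch under stated assumptions, not the package's implementation: the function names, the window size `h`, and the fixed `threshold` are hypothetical choices, and the ranking step is omitted. Prefix sums make the diagnostic pass linear in the sequence length; the local-maximum check shown here costs an extra factor of h.

```python
from itertools import accumulate


def local_diagnostic(y, h):
    """Return d where d[x] = mean(y[x:x+h]) - mean(y[x-h:x]); 0.0 near the ends."""
    n = len(y)
    prefix = [0.0] + list(accumulate(y))  # prefix[i] = sum of y[:i]
    d = [0.0] * (n + 1)
    for x in range(h, n - h + 1):
        right = prefix[x + h] - prefix[x]
        left = prefix[x] - prefix[x - h]
        d[x] = (right - left) / h
    return d


def sara_candidates(y, h, threshold):
    """Positions where |d| is an h-local maximum and at least threshold."""
    d = local_diagnostic(y, h)
    n = len(y)
    picks = []
    for x in range(h, n - h + 1):
        lo, hi = max(h, x - h), min(n - h, x + h)
        window_max = max(abs(d[j]) for j in range(lo, hi + 1))
        if abs(d[x]) >= threshold and abs(d[x]) == window_max:
            picks.append(x)
    return picks


# Toy sequence: the mean jumps from 0 to 2 starting at index 10.
signal = [0.0] * 10 + [2.0] * 10
print(sara_candidates(signal, h=3, threshold=1.0))
```

A returned index x means the estimated change lies between `y[x-1]` and `y[x]`.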
Faculty: Heping Zhang, PhD
Download: SaRa package
Platform: R
Reference: doi.org (SaRa)
simuRare
simuRare is a regression-based algorithm that imputes rare variants in currently available SNP array data and uses a resampling approach to simulate samples that contain both common and rare SNPs.
Faculty: Heping Zhang, PhD
Download: simuRare package
Platform: R
Reference: doi.org (simuRare)
SSSS
This package provides a fast nonparametric method for short segment detection.
Faculty: Heping Zhang, PhD
Download: SSSS package
Platform: R
STB-STC
STB-STC is a statistical method that identifies joint effects of microbes in human disease, accounting for sparsity and using the hierarchical information in taxonomy annotation. Compared with state-of-the-art differential abundance analysis approaches, STB and STC yield better detection performance when microbes are highly correlated. We distribute one core R function (SVB-SVC.R) related to STB and STC. It applies the method (STB or STC) to a group of microbes. The script utilizes.R includes all the code needed to run SVB-SVC.R.
Faculty: Heping Zhang, PhD
Download: STB-STC package
Platform: R
STREE
Survival analysis is one of the most popular uses of tree-based methods. This program identifies prognostic factors that are predictive of survival outcome and time to an event of interest. It partitions a study sample into strata to reveal distinct patterns of survival among subgroups.
Faculty: Heping Zhang, PhD
Download: STREE package
Platform: Unix
TARV
TARV is a tree-based method to explore the association between rare variants and complex diseases, and find potential genetic and environmental factors and their interactions.
Faculty: Heping Zhang, PhD
Download: TARV package
Platform: R
Reference: doi.org (TARV)
Twin Analysis
This program uses SAS PROC NLMIXED and PROC MIXED to conduct twin analysis to estimate heritability of binary and quantitative traits.
Faculty: Heping Zhang, PhD
Download: Twin Analysis package
Platform: SAS
Reference: doi.org (Twin Analysis)
Willows
Willows is a software package that includes three classifiers: classification tree, random forest, and deterministic forest. This package is built on Heping Zhang's RTREE program and has two distinctive features. First, the accumulation of data on single nucleotide polymorphisms (SNPs) has produced data sets so large that specific steps are needed to improve the memory use of the existing software. Willows uses memory efficiently for SNP data while maintaining its general functionality. The second important feature of Willows is a friendly graphical user interface.
Faculty: Heping Zhang, PhD
Download: Willows package
Platform: Unix
Reference: doi.org (Willows)
Ball
Hypothesis tests and sure independence screening (SIS) procedures based on ball statistics, including ball divergence <doi:10.1214/17-AOS1579>, ball covariance <doi:10.1080/01621459.2018.1543600>, and ball correlation <doi:10.1080/01621459.2018.1462709>, are developed to analyze complex data in metric spaces, e.g., shape, directional, compositional, and symmetric positive definite matrix data. The distribution-free tests based on ball divergence and ball covariance are implemented to detect distribution differences and associations in metric spaces <doi:10.18637/jss.v097.i06>. Furthermore, several generic non-parametric feature selection procedures based on ball correlation, BCor-SIS and all of its variants, are implemented to handle ultrahigh-dimensional data. A fast implementation for large-scale multiple K-sample testing with ball divergence <doi:10.1002/gepi.22423> is supported, which is particularly helpful for genome-wide association studies.
Faculty: Heping Zhang, PhD
Download: CranR / Ball package
Platform: R
Reference: doi.org (Ball)
cdcsis
Conditional distance correlation <doi:10.1080/01621459.2014.993081> is a novel measure of conditional dependence between two multivariate random variables given a confounding variable. This package provides conditional distance correlation, performs the conditional distance correlation sure independence screening procedure for ultrahigh-dimensional data, and conducts the conditional distance covariance test for the conditional independence assumption of two multivariate variables.
Faculty: Heping Zhang, PhD
Download: CranR / cdcsis package
Platform: R
Reference: doi.org (cdcsis)
sure
An implementation of the surrogate approach to residuals and diagnostics for ordinal and general regression models; for details, see Liu and Zhang (2017) <doi:10.1080/01621459.2017.1292915>. These residuals can be used to construct standard residual plots for model diagnostics (e.g., residual-vs-fitted value plots, residual-vs-covariate plots, Q-Q plots, etc.). The package also provides an 'autoplot' function for producing standard diagnostic plots using 'ggplot2' graphics. The package currently supports cumulative link models from packages 'MASS', 'ordinal', 'rms', and 'VGAM'. Support for binary regression models using the standard 'glm' function is also available.
Faculty: Heping Zhang, PhD
Download: CranR / sure package
Platform: R
Reference: doi.org (sure)
BrainSubnetwork
Constructing Brain Subnetworks via a High-Dimensional Multi-Task Learning Model with Group-wise Mixtures
Faculty: Heping Zhang, PhD
Download: GitHub / BrainSubnetwork package
Platform: R
Reference: doi.org (BrainSubnetwork)
BCRA
Understanding the genetic architecture of brain functions is essential to clarify the biological etiologies of behavioral and psychiatric disorders. Functional connectivity, representing pairwise correlations of neural activities between brain regions, is moderately heritable. Current methods to identify single nucleotide polymorphisms (SNPs) linked to functional connectivity either neglect the complex structure of functional connectivity or fail to control false discoveries. Therefore, we propose a SNP-set hypothesis test, Ball Covariance Ranking and Aggregation (BCRA), to select and test the significance of SNP sets related to functional connectivity, incorporating matrix structure and controlling false discovery rate. Additionally, we present subsample-BCRA, a faster version for large-scale datasets.
Faculty: Heping Zhang, PhD
Download: GitHub / BCRA package
Platform: R
Reference: doi.org (BCRA)
RAS
Genome-wide association studies (GWAS) are crucial for identifying numerous single nucleotide polymorphisms (SNPs) linked to various diseases. However, current methods struggle with regional associations due to small effects and the high number of variants, leading to suboptimal power and inflated type I error. To tackle these challenges, we propose a powerful and visualizable method which quantifies regional association strengths at individual SNPs, converts these into time series data, and uses change point detection algorithms to identify key association regions. Extensive simulations demonstrate that our method not only increases detection power but also maintains a significantly lower false positive rate compared to existing techniques, positioning it as a promising tool for regional association detection in GWAS.
Faculty: Heping Zhang, PhD; Yiran Jiang, PhD
Download: GitHub / RAS package
Platform: R
Reference: doi.org (RAS)