Heping Zhang, PhD

ABESS

Best-subset selection aims to find a small subset of predictors such that the resulting linear model is expected to have the best prediction accuracy. ABESS is an R package that implements a polynomial-time algorithm to identify the best-subset model in linear regression.
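To make the goal concrete, here is a brute-force sketch of best-subset selection for a fixed subset size k: fit least squares on every k-predictor subset and keep the one with the smallest residual sum of squares. This is our own Python illustration, not the ABESS algorithm or its API. Exhaustive search needs on the order of C(p, k) fits, which is exactly the cost a polynomial-time algorithm such as ABESS is designed to avoid. The function names (`solve`, `ols_rss`, `best_subset`) are hypothetical, and the fits omit an intercept for brevity.

```python
from itertools import combinations


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_rss(X, y, cols):
    """Least-squares fit (no intercept) of y on the columns in `cols`; return the residual sum of squares."""
    Xs = [[row[j] for j in cols] for row in X]
    k = len(cols)
    xtx = [[sum(r[i] * r[j] for r in Xs) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(Xs, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations X'X beta = X'y
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(Xs, y))


def best_subset(X, y, k):
    """Try every size-k subset of predictors and return the one with the smallest RSS."""
    p = len(X[0])
    return min(combinations(range(p), k), key=lambda cols: ols_rss(X, y, cols))


X = [[1, 2, 0, 1], [2, 0, 1, 3], [0, 1, 3, 2], [3, 1, 1, 0], [1, 3, 2, 1], [2, 2, 0, 2]]
y = [2 * r[0] - 3 * r[2] for r in X]  # the true model uses predictors 0 and 2 only
print(best_subset(X, y, 2))  # (0, 2)
```

Because `y` is an exact linear combination of columns 0 and 2, that pair is the only size-2 subset with zero residual sum of squares.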

Faculty: Heping Zhang, PhD

Download: ABESS package

Platform: R

Reference: doi.org (ABESS)


CTMBR

This program constructs classification trees for multiple binary responses. In biomedical research, many diagnoses, such as those of depression and anxiety, are based on multiple items. This program makes it possible to conduct the analysis at the item level.

Faculty: Heping Zhang, PhD

Download: CTMBR package

Platform: Unix

Reference: doi.org (CTMBR)


DIPM

An implementation by Chen, Li, and Zhang (2022) <doi:10.1093/bioadv/vbac041> of the Depth Importance in Precision Medicine (DIPM) method in Chen and Zhang (2022) <doi:10.1093/biostatistics/kxaa021> and Chen and Zhang (2020) <doi:10.1007/978-3-030-46161-4_16>. The DIPM method is a classification tree that searches for subgroups with especially poor or strong performance in a given treatment group.

Faculty: Heping Zhang, PhD

Download: CranR / DIPM package

Platform: R

Reference: doi.org (DIPM)


diTARV

diTARV is a tree-based method to explore the association between rare variants and certain human diseases and to find potential gene-gene interactions. It uses depth importance in the tree model to measure the strength of association of each variant. This program implements the method described in Hu J., Li T., Wang S., and Zhang H. Supervariants Identification for Breast Cancer. Genetic Epidemiology 44(8), 9324-947, 2020.

Faculty: Heping Zhang, PhD

Download: diTARV package

Platform: R

Reference: doi.org (diTARV)


eLASSO

This is a Matlab implementation of eLASSO. The zipped file contains 14 M-files, 7 of which relate to the optimization problem and are translated from the block coordinate gradient descent (BCGD) method proposed by Paul Tseng and Sangwoon Yun. These include cgdsq.m, dirq.m, nz.m, signx.m, fnc.m, and grad.m. This program implements the method described in: Wang X., Jiang Y., Huang M., and Zhang H. Robust Variable Selection with Exponential Squared Loss. Journal of the American Statistical Association, 108: 632-643, 2013.
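The key ingredient named in the eLASSO reference is the exponential squared loss, 1 - exp(-t^2/gamma), applied to residuals t with a tuning parameter gamma. Because this loss is bounded by 1, a single gross outlier cannot dominate the fit. The Python sketch below is our illustration only; the distributed code is Matlab and also includes a penalty term and the BCGD solver, neither of which is reproduced here. It compares the ordinary mean with the location that minimizes the summed exponential squared loss on a toy sample. The function names and the grid search are assumptions made for clarity.

```python
import math


def exp_squared_loss(t, gamma):
    """Exponential squared loss 1 - exp(-t**2 / gamma); it never exceeds 1, however large |t| is."""
    return 1.0 - math.exp(-t * t / gamma)


def robust_location(y, gamma, step=0.01):
    """Grid-search the location mu that minimizes the summed exponential squared loss."""
    lo = min(y)
    n_steps = int(round((max(y) - lo) / step))
    grid = [lo + i * step for i in range(n_steps + 1)]
    return min(grid, key=lambda mu: sum(exp_squared_loss(v - mu, gamma) for v in y))


y = [1.0, 1.2, 0.9, 1.1, 0.8, 50.0]  # five clustered values plus one gross outlier
print(sum(y) / len(y))  # the sample mean is dragged toward the outlier
print(robust_location(y, gamma=1.0))  # stays with the bulk of the data, about 1.0
```

With squared loss the outlier's contribution grows without bound; with the exponential squared loss it saturates at 1, so the minimizer is governed by the clustered values.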

Faculty: Heping Zhang, PhD

Download: eLASSO package

Platform: Matlab

Reference: doi.org (eLASSO)


HapForest

This program implements a forest-based approach that accommodates haplotype uncertainty and uses variable importance to identify significant haplotypes and their interactions in genome-wide case-control association studies.

Faculty: Heping Zhang, PhD

Download: HapForest package

Platform: Java

Reference: doi.org (HapForest)


LOT

This program performs linkage analysis of ordinal traits for pedigree data. It implements a latent-variable proportional-odds logistic model that relates inheritance patterns to the distribution of the ordinal trait.
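LOT's model links inheritance patterns to the ordinal trait through latent variables, and the pedigree machinery is beyond a short example. As a minimal sketch of just the proportional-odds piece, the Python code below computes category probabilities from ordered thresholds and a linear predictor eta, then checks the defining property that changing eta shifts every cumulative logit by the same amount. The sign convention logit P(Y <= j) = theta_j - eta and all function names are our assumptions, not LOT's interface.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def category_probs(thresholds, eta):
    """Category probabilities under a proportional-odds (cumulative logit) model.

    P(Y <= j) = logistic(theta_j - eta) for increasing thresholds theta_1 < ... < theta_{J-1};
    the same eta enters every cumulative logit."""
    cumulative = [logistic(t - eta) for t in thresholds] + [1.0]
    return [cumulative[0]] + [cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))]


def cumulative_logits(probs):
    """Recover logit P(Y <= j), j = 1..J-1, from category probabilities."""
    out, running = [], 0.0
    for p in probs[:-1]:
        running += p
        out.append(math.log(running / (1.0 - running)))
    return out


thresholds = [-1.0, 1.0]  # three ordered categories
p_low, p_high = category_probs(thresholds, 0.0), category_probs(thresholds, 1.5)
print([round(p, 3) for p in p_low], [round(p, 3) for p in p_high])
# Proportional odds: every cumulative logit moves by the same amount (here 1.5).
print([a - b for a, b in zip(cumulative_logits(p_low), cumulative_logits(p_high))])
```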

Faculty: Heping Zhang, PhD

Download: LOT package

Platform: Java

Reference: doi.org (LOT)


MASAL

MASAL implements Multivariate Adaptive Splines for Analysis of Longitudinal Data. The standalone program takes input data in a structure similar to that of CTMBR, except that it includes a time variable "t". An R package is also available.

Faculty: Heping Zhang, PhD

Download: MASAL package

Platform: Unix; R

Reference: doi.org (MASAL)


modSaRa

The modified Screening and Ranking algorithm (modSaRa) detects chromosomal copy number variants (CNVs) with high sensitivity and specificity. Given a sequence of intensity values, modSaRa applies quantile normalization, searches for change-point candidates, eliminates unlikely change points, and then outputs the potential CNV segments, reporting the start and end points of each segment by SNP or CNV marker index.
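Quantile normalization, the first step listed above, puts several intensity samples on a common distribution by replacing each value with the across-sample average of the values sharing its rank. The Python function below is a generic textbook version under simplifying assumptions (equal-length samples, ties broken by position). It is not taken from modSaRa, whose normalization details may differ, and the function name is hypothetical.

```python
def quantile_normalize(samples):
    """Quantile-normalize equal-length samples so that they share one empirical distribution.

    Each value is replaced by the mean, across samples, of the values with the same rank.
    Ties are broken by position (a simplification)."""
    n = len(samples[0])
    sorted_cols = [sorted(s) for s in samples]
    rank_means = [sum(col[i] for col in sorted_cols) / len(samples) for i in range(n)]
    out = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # indices of s from smallest to largest
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = rank_means[rank]
        out.append(normalized)
    return out


samples = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(samples))  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization both samples contain exactly the values 1.5, 3.5, and 5.5, while each keeps its original ordering of positions.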

Faculty: Heping Zhang, PhD

Download: modSaRa package

Platform: R

Reference: doi.org (modSaRa)


modSaRa2

Although widely used change-point-based methods have been shown to increase statistical power to identify variants, it remains challenging to identify CNVs with weak signals effectively because genotyping intensity data are noisy. modSaRa2 improves on our previously developed modified Screening and Ranking algorithm (modSaRa) by integrating the relative allelic intensity with prior empirical statistics. modSaRa2 markedly improves both sensitivity and specificity over existing methods. The improvement is most substantial for detecting weak CNV signals, and modSaRa2 also improves stability when CNV size varies.

Faculty: Heping Zhang, PhD

Download: modSaRa2 package

Platform: R

Reference: doi.org (modSaRa2)


pLASSO

pLASSO is a statistical method that incorporates prior information into L1-penalized generalized linear models. We distribute two R functions related to pLASSO, function_linear.R and function_logistic.R, for linear regression and logistic regression, respectively. Both functions can compute all six estimators compared in Jiang, He, and Zhang (2014), i.e., LASSO, p, pLASSO; LASSO-A, p-A, pLASSO-A. The functions use cross-validation to select the optimal tuning parameters. For more details, see Jiang Y, He Y, and Zhang H. (2014). Variable selection with prior information for generalized linear models via the pLASSO method.

Faculty: Heping Zhang, PhD

Download: pLASSO package

Platform: R

Reference: doi.org (pLASSO)


Pregnancy Calculator

This website provides online calculators for predicting certain pregnancy outcomes, particularly the live birth rate. When appropriate, please cite the references on which these calculators are based. This program is copyrighted by Heping Zhang, Yale University. Thanks to Jiuzhou Wang and Yajie Duan from SUSTech in China and Zhe Cai from Shanghai University for implementing the program. By using this program, you understand and agree that you are fully responsible for its use. Suggestions for improvement can be emailed to heping.zhang@yale.edu. If you accept these terms of use, you may select a calculator and proceed.

Faculty: Heping Zhang, PhD

Website: Pregnancy Calculator

Platform: Website


RTREE

This program analyzes relative risk and conducts sib-pair linkage analysis using tree-based methods. It can generate a tree structure automatically or let users construct a tree of their choice.
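Tree-based programs such as RTREE grow a tree by repeatedly choosing the split that most reduces node impurity. The Python sketch below shows that step for a binary response and a single numeric predictor, using entropy as the impurity measure. Entropy is one standard choice; we have not checked RTREE's exact splitting criteria, and the function names are hypothetical.

```python
import math


def entropy(labels):
    """Entropy impurity of a node with 0/1 labels (0 for an empty or pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def best_split(x, y):
    """Return (threshold, impurity_reduction) for the best split 'x <= threshold' on one predictor."""
    parent = entropy(y)
    n = len(y)
    best = (None, 0.0)
    for t in sorted(set(x))[:-1]:  # a split at the largest value would leave one side empty
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        child = (len(left) * entropy(left) + len(right) * entropy(right)) / n  # weighted child impurity
        if parent - child > best[1]:
            best = (t, parent - child)
    return best


x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 0, 1, 1, 1]
print(best_split(x, y))  # threshold 3; the reduction equals log(2), about 0.693
```

A full tree would apply this search to every predictor in every node and recurse on the two daughter nodes; pruning then trims the grown tree.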

Faculty: Heping Zhang, PhD

Download: RTREE package

Platform: Unix

Reference: doi.org (RTREE)


SaRa

The Screening and Ranking algorithm (SaRa) detects chromosomal copy number variants quickly and accurately, with computational complexity of order O(n). This program implements the methods described in: Niu and Zhang. The screening and ranking algorithm to detect DNA copy number variations. Ann. Appl. Stat. 6, 1306-1326 (2012). Hao, Niu and Zhang. Multiple change-point detection via a screening and ranking algorithm. Statistica Sinica 23 (2013).
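The screening-and-ranking idea can be sketched in a few lines of Python. For each position x, a local diagnostic compares the mean of the h observations after x with the mean of the h observations ending at x; prefix sums make each comparison O(1), which is where the linear cost comes from. Screening keeps positions whose absolute diagnostic is the largest within an h-neighborhood, and a simple user-chosen threshold stands in for the ranking step. This is our simplified reading of the method with hypothetical names, not the SaRa package; for simplicity, the neighborhood check here costs O(nh).

```python
def sara_sketch(y, h, threshold):
    """Simplified screening-and-ranking change-point sketch (not the SaRa package code).

    D(x) = mean(y[x+1 .. x+h]) - mean(y[x-h+1 .. x]).
    Screening keeps x where |D(x)| is the largest within +/- (h - 1) positions;
    ranking (here a plain threshold) keeps those with |D(x)| > threshold.
    Returns the indices where a new segment starts (x + 1)."""
    n = len(y)
    prefix = [0.0]
    for v in y:
        prefix.append(prefix[-1] + v)  # prefix[i] = sum(y[:i]), so each D(x) costs O(1)
    d = {}
    for x in range(h - 1, n - h):
        left = prefix[x + 1] - prefix[x - h + 1]
        right = prefix[x + h + 1] - prefix[x + 1]
        d[x] = (right - left) / h
    change_points = []
    for x, dx in d.items():
        neighbors = [abs(d[z]) for z in range(x - h + 1, x + h) if z in d]
        if abs(dx) >= max(neighbors) and abs(dx) > threshold:
            change_points.append(x + 1)
    return change_points


y = [0.0] * 10 + [2.0] * 10 + [0.0] * 10  # one raised segment, e.g. a duplication
print(sara_sketch(y, h=3, threshold=1.0))  # [10, 20]
```

The reported indices mark where the raised segment starts and where the signal returns to baseline; real intensity data would also need noise-aware choices of h and the threshold.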

Faculty: Heping Zhang, PhD

Download: SaRa package

Platform: R

Reference: doi.org (SaRa)


simuRare

simuRare is a regression-based algorithm that imputes rare variants in currently available SNP array data and uses a resampling approach to simulate samples that contain both common and rare SNPs.

Faculty: Heping Zhang, PhD

Download: simuRare package

Platform: R

Reference: doi.org (simuRare)


SSSS

This package provides a fast nonparametric method for short segment detection.

Faculty: Heping Zhang, PhD

Download: SSSS package

Platform: R


STB-STC

STB-STC is a statistical method that identifies joint effects of microbes in human disease, accounting for sparsity and using the hierarchical information in taxonomic annotation. When microbes are highly correlated, STB and STC yield better detection performance than state-of-the-art differential abundance analysis approaches. We distribute one core R function (SVB-SVC.R) that performs either STB or STC on a group of microbes. The script utilizes.R contains all the supporting code needed to run SVB-SVC.R.

Faculty: Heping Zhang, PhD

Download: STB-STC package

Platform: R


STREE

This program addresses one of the most popular uses of tree-based methods: survival analysis. It identifies prognostic factors that are predictive of survival outcome and time to an event of interest, and it partitions a study sample into strata to reveal distinct patterns of survival among subgroups.
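A common way to grow survival trees is to score each candidate split "x <= c" with the two-sample log-rank statistic and keep the split with the largest value. The Python sketch below implements that criterion for right-censored data. It illustrates the general technique and is not taken from STREE, whose splitting rule may differ; the function names are hypothetical.

```python
def logrank_statistic(times, events, group):
    """Two-sample log-rank chi-square statistic for right-censored data.

    times: follow-up times; events: 1 = event observed, 0 = censored;
    group: 1 for subjects sent to the left daughter node, 0 for the right one."""
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        # subjects still at risk at time t, each with an indicator of an event exactly at t
        at_risk = [(g, e if tt == t else 0) for tt, e, g in zip(times, events, group) if tt >= t]
        n = len(at_risk)
        n1 = sum(1 for g, _ in at_risk if g == 1)
        d = sum(e for _, e in at_risk)
        d1 = sum(e for g, e in at_risk if g == 1)
        obs_minus_exp += d1 - d * n1 / n  # observed minus expected events in group 1
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


def best_survival_split(x, times, events):
    """Choose the cut point c on covariate x whose split 'x <= c' maximizes the log-rank statistic."""
    best = (None, 0.0)
    for c in sorted(set(x))[:-1]:
        group = [1 if xi <= c else 0 for xi in x]
        stat = logrank_statistic(times, events, group)
        if stat > best[1]:
            best = (c, stat)
    return best


times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 0, 1]  # the fifth subject is censored
x = [1, 2, 3, 4, 5, 6]
print(best_survival_split(x, times, events))
```

Applying this search to every covariate in each node, and recursing on the daughters, yields strata with distinct survival patterns.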

Faculty: Heping Zhang, PhD

Download: STREE package

Platform: Unix


TARV

TARV is a tree-based method to explore the association between rare variants and complex diseases, and find potential genetic and environmental factors and their interactions.

Faculty: Heping Zhang, PhD

Download: TARV package

Platform: R

Reference: doi.org (TARV)


Twin Analysis

This program uses SAS PROC NLMIXED and PROC MIXED to conduct twin analysis to estimate heritability of binary and quantitative traits.
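The program fits mixed models in SAS. As a quick back-of-the-envelope check on such estimates for a quantitative trait, Falconer's classical formulas derive variance shares from the MZ and DZ twin correlations: h^2 = 2(r_MZ - r_DZ), c^2 = 2 r_DZ - r_MZ, and e^2 = 1 - r_MZ. The Python sketch below computes them, using a double-entry Pearson correlation as a simple stand-in for the intraclass correlation. It is not the likelihood-based model the program fits, it does not handle binary traits (which need liability-scale correlations), and the names are hypothetical.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def twin_correlation(pairs):
    """Double-entry correlation: each pair enters as (t1, t2) and (t2, t1), so twin order does not matter."""
    first = [a for a, b in pairs] + [b for a, b in pairs]
    second = [b for a, b in pairs] + [a for a, b in pairs]
    return pearson(first, second)


def falconer(r_mz, r_dz):
    """Falconer's shares of additive genetic (h2), shared environment (c2), and unique environment (e2) variance."""
    h2 = 2.0 * (r_mz - r_dz)
    c2 = 2.0 * r_dz - r_mz
    e2 = 1.0 - r_mz
    return h2, c2, e2


print(falconer(0.8, 0.5))  # roughly (0.6, 0.2, 0.2)
```

In practice `r_mz` and `r_dz` would come from `twin_correlation` applied to the MZ and DZ pairs separately.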

Faculty: Heping Zhang, PhD

Download: Twin Analysis package

Platform: SAS

Reference: doi.org (Twin Analysis)


Willows

Willows is a software package that includes three classifiers: classification tree, random forest, and deterministic forest. It builds on Heping Zhang's RTREE program and adds two distinctive features. First, the accumulation of data on single nucleotide polymorphisms (SNPs) has produced datasets so large that specific steps were needed to improve the memory use of the existing software; Willows handles SNP data with highly efficient memory use while maintaining its general functionality. Second, Willows provides a user-friendly graphical user interface.

Faculty: Heping Zhang, PhD

Download: Willows package

Platform: Unix

Reference: doi.org (Willows)


Ball

Hypothesis tests and sure independence screening (SIS) procedures based on ball statistics, including ball divergence <doi:10.1214/17-AOS1579>, ball covariance <doi:10.1080/01621459.2018.1543600>, and ball correlation <doi:10.1080/01621459.2018.1462709>, are developed to analyze complex data in metric spaces, e.g., shape, directional, compositional, and symmetric positive definite matrix data. Distribution-free tests based on ball divergence and ball covariance are implemented to detect distributional differences and associations in metric spaces <doi:10.18637/jss.v097.i06>. Furthermore, several generic nonparametric feature selection procedures based on ball correlation, BCor-SIS and all of its variants, are implemented to tackle the challenges of ultrahigh-dimensional data. A fast implementation for large-scale multiple K-sample testing with ball divergence <doi:10.1002/gepi.22423> is supported, which is particularly helpful for genome-wide association studies.
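The screening skeleton behind SIS procedures is simple: score every candidate predictor by a marginal dependence measure with the response, then keep the top d. The Python sketch below shows that skeleton with a pluggable utility. BCor-SIS would plug in ball correlation, which we do not reimplement here, so absolute Pearson correlation serves as a hypothetical stand-in. The default d = floor(n / log n) follows a common convention in the SIS literature and is an assumption, not necessarily the Ball package's default.

```python
import math


def abs_pearson(x, y):
    """Absolute Pearson correlation, used here only as a stand-in marginal utility."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / math.sqrt(sxx * syy)


def sis(features, y, d=None, utility=abs_pearson):
    """Sure independence screening: keep the indices of the d features with the largest marginal utility.

    features: list of columns, one list per candidate predictor."""
    n = len(y)
    if d is None:
        d = int(n / math.log(n))  # common default from the SIS literature
    scores = [utility(col, y) for col in features]
    ranked = sorted(range(len(features)), key=lambda j: scores[j], reverse=True)
    return ranked[:d]


y = [1, 2, 3, 4, 5, 6, 7, 8]
features = [
    [3, 1, 4, 1, 5, 9, 2, 6],  # unrelated noise
    [2 * v for v in y],  # perfectly linear in y
    [9 - v for v in y],  # perfectly linear, negative slope
    [2, 7, 1, 8, 2, 8, 1, 8],  # unrelated noise
]
print(sorted(sis(features, y, d=2)))  # [1, 2]
```

Swapping `utility` for a dependence measure that works in metric spaces is what lets the same skeleton screen non-Euclidean or nonlinearly related predictors.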

Faculty: Heping Zhang, PhD

Download: CranR / Ball package

Platform: R

Reference: doi.org (Ball)


cdcsis

Conditional distance correlation <doi:10.1080/01621459.2014.993081> is a novel measure of conditional dependence between two multivariate random variables given a confounding variable. This package computes conditional distance correlation, performs the conditional distance correlation sure independence screening procedure for ultrahigh-dimensional data, and conducts a conditional distance covariance test of the conditional independence of two multivariate variables.
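The conditional measure builds on ordinary distance correlation (Székely, Rizzo and Bakirov, 2007); as we understand the reference, the conditional version weights observations with a kernel in the conditioning variable. As a minimal sketch of the unconditional building block only, the Python code below computes sample distance correlation from double-centered Euclidean distance matrices. The function names are hypothetical, and neither cdcsis's interface nor its conditional estimator is reproduced.

```python
import math


def _centered_distances(points):
    """Double-centered distance matrix: A_ij = a_ij - row mean_i - column mean_j + grand mean."""
    pts = [p if isinstance(p, (list, tuple)) else (p,) for p in points]  # scalars become 1-d points
    n = len(pts)
    a = [[math.dist(pts[i], pts[j]) for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in a]  # a is symmetric, so row means equal column means
    grand = sum(row) / n
    return [[a[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation; its population version is zero if and only if x and y are independent."""
    A, B = _centered_distances(x), _centered_distances(y)
    n = len(A)
    dcov2 = sum(A[i][j] * B[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(v * v for r in A for v in r) / n ** 2
    dvar_y = sum(v * v for r in B for v in r) / n ** 2
    if dvar_x * dvar_y == 0:
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar_x * dvar_y))


x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]  # y depends on x, yet the Pearson correlation of x and y is exactly 0
print(distance_correlation(x, y))  # clearly positive, about 0.52
```

Points may be scalars or equal-length tuples, so the same function handles multivariate inputs.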

Faculty: Heping Zhang, PhD

Download: CranR / cdcsis package

Platform: R

Reference: doi.org (cdcsis)


sure

An implementation of the surrogate approach to residuals and diagnostics for ordinal and general regression models; for details, see Liu and Zhang (2017) <doi:10.1080/01621459.2017.1292915>. These residuals can be used to construct standard residual plots for model diagnostics (e.g., residual-vs-fitted value plots, residual-vs-covariate plots, Q-Q plots, etc.). The package also provides an 'autoplot' function for producing standard diagnostic plots using 'ggplot2' graphics. The package currently supports cumulative link models from packages 'MASS', 'ordinal', 'rms', and 'VGAM'. Support for binary regression models using the standard 'glm' function is also available.
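As we understand Liu and Zhang (2017), the surrogate approach replaces the discrete ordinal outcome with a continuous surrogate S drawn from the fitted latent-variable distribution restricted to the interval implied by the observed category, and the residual is S - E[S | x]. Under a correctly specified model these residuals behave like draws of the latent error, so ordinary residual plots apply. The Python sketch below does this for one observation from a cumulative logit model. It is our illustration, not code from the R package sure; the latent sign convention (logit P(Y <= j) = theta_j - eta, which matches MASS::polr) and all names are assumptions.

```python
import math
import random


def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_quantile(u):
    return math.log(u / (1.0 - u))


def surrogate_residual(y, eta, thresholds, rng=random):
    """Surrogate residual for one observation from a cumulative logit model (sketch).

    Assumed latent model: Z = eta + eps with eps ~ standard logistic, and Y = j (j = 1..J)
    when theta_{j-1} < Z <= theta_j, with theta_0 = -inf and theta_J = +inf."""
    bounds = [-math.inf] + list(thresholds) + [math.inf]
    lo = 0.0 if bounds[y - 1] == -math.inf else logistic_cdf(bounds[y - 1] - eta)
    hi = 1.0 if bounds[y] == math.inf else logistic_cdf(bounds[y] - eta)
    u = rng.uniform(lo, hi)
    while not 0.0 < u < 1.0:  # avoid log(0) at the extreme ends
        u = rng.uniform(lo, hi)
    s = eta + logistic_quantile(u)  # surrogate S drawn from Z truncated to the interval for y
    return s - eta  # R = S - E[S | x]; E[S | x] = eta because eps has mean 0


rng = random.Random(1)
thresholds, eta = [-1.0, 1.0], 0.5
# For y = 2 every residual lies in (-1.5, 0.5], the category interval shifted by -eta.
print([round(surrogate_residual(2, eta, thresholds, rng), 3) for _ in range(5)])
```

Plotting such residuals against fitted values or covariates, or in a Q-Q plot against the logistic distribution, gives the standard diagnostics described above.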

Faculty: Heping Zhang, PhD

Download: CranR / sure package

Platform: R

Reference: doi.org (sure)


BrainSubnetwork

Constructing Brain Subnetworks via a High-Dimensional Multi-Task Learning Model with Group-wise Mixtures

Faculty: Heping Zhang, PhD

Download: GitHub / BrainSubnetwork package

Platform: R

Reference: doi.org (BrainSubnetwork)


BCRA

Understanding the genetic architecture of brain functions is essential to clarify the biological etiologies of behavioral and psychiatric disorders. Functional connectivity, representing pairwise correlations of neural activities between brain regions, is moderately heritable. Current methods to identify single nucleotide polymorphisms (SNPs) linked to functional connectivity either neglect the complex structure of functional connectivity or fail to control false discoveries. Therefore, we propose a SNP-set hypothesis test, Ball Covariance Ranking and Aggregation (BCRA), to select SNP sets related to functional connectivity and test their significance while incorporating the matrix structure and controlling the false discovery rate. Additionally, we present subsample-BCRA, a faster version for large-scale datasets.

Faculty: Heping Zhang, PhD

Download: GitHub / BCRA package

Platform: R

Reference: doi.org (BCRA)


RAS

Genome-wide association studies (GWAS) are crucial for identifying numerous single nucleotide polymorphisms (SNPs) linked to various diseases. However, current methods struggle to detect regional associations because of small effect sizes and the large number of variants, which leads to suboptimal power and inflated type I error. To tackle these challenges, we propose a powerful and visualizable method that quantifies regional association strength at individual SNPs, converts these values into time series data, and uses change-point detection algorithms to identify key association regions. Extensive simulations demonstrate that our method not only increases detection power but also maintains a significantly lower false positive rate than existing techniques, making it a promising tool for regional association detection in GWAS.

Faculty: Heping Zhang, PhD; Yiran Jiang, PhD

Download: GitHub / RAS package

Platform: R

Reference: doi.org (RAS)