This is an updated version of the CTMBR software submitted to JASA.
Here, I used only the h(t) defined in (4) of the paper.
Please visit http://peace.med.yale.edu for future updates.
@Copyright Heping Zhang 1/24/03
Introduction to the files in this directory:
ctmbr.*: These are the executables compiled on various systems, as
indicated by the suffix. For example, ctmbr.solaris8 was compiled
on a SUN SPARC Ultra 5 with Solaris 8.
Save the corresponding executable as, say, ctmbr, then type ctmbr
at your command line and follow the prompts.
sample.dat: This is a sample data file. The 3 numbers in the first row are
the number of study subjects (i.e., sample size), the number of
covariates, and the number of binary responses. The second
row indicates the type of each variable: 0 excludes that
covariate from the analysis; 1 means a continuous or an ordinal
covariate; 3 means a nominal covariate; -1 means an outcome.
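To make the two header rows concrete, here is a minimal Python sketch that reads them. It is not part of CTMBR. It assumes the values are separated by whitespace and that the second row has one entry per data column; parse_header, Header, and the example header are hypothetical and were not taken from sample.dat.

```python
from typing import List, NamedTuple


class Header(NamedTuple):
    n_subjects: int
    n_covariates: int
    n_responses: int
    column_types: List[int]


# Type codes as described above. Any other code is reported as
# "not documented" rather than rejected, since it is not described here.
TYPE_LABELS = {
    0: "excluded from the analysis",
    1: "continuous or ordinal covariate",
    3: "nominal covariate",
    -1: "outcome",
}


def parse_header(text: str) -> Header:
    """Parse the first two non-empty rows (whitespace-separated layout assumed)."""
    lines = [line for line in text.splitlines() if line.strip()]
    n_subjects, n_covariates, n_responses = (int(tok) for tok in lines[0].split())
    column_types = [int(tok) for tok in lines[1].split()]
    return Header(n_subjects, n_covariates, n_responses, column_types)


def describe_types(header: Header) -> List[str]:
    """Return a readable label for each entry of the second row."""
    return [TYPE_LABELS.get(code, "not documented") for code in header.column_types]


# Made-up header for illustration only.
example = "100 3 2\n1 3 0 -1 -1\n"
header = parse_header(example)
for position, label in enumerate(describe_types(header), start=1):
    print(position, label)
```

Checking these two rows before a long run can catch a mistyped type code early.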
infor.*: The output file. The file consists of three parts:
Part I can be read as follows:
Column 1: node number. Node 0 is the root node.
Column 2: number of subjects in the node.
Column 3: left daughter node, e.g. node 1 is the left daughter
node of node 0.
Column 4: right daughter node, e.g. node 2 is the right daughter
node of node 0.
Column 5: The splitting variable, numbered starting from 1.
Column 6: The splitting value corresponding to the splitting
variable.
For example, the split for node 0 is whether variable
14 > 3.0.
See the remark below for categorical variables.
Part II records the series of complexity parameters and the nodes
where pruning has occurred.
Part III saves the cross-validation results for selecting the
final trees.
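As an illustration of the six columns of Part I, the following Python sketch turns one row into a readable description. It is not part of CTMBR and assumes the columns are separated by whitespace. The names Node, parse_part1_row, and describe are hypothetical, and the subject count of 200 in the example row is made up.

```python
from typing import NamedTuple


class Node(NamedTuple):
    node: int           # Column 1: node number (0 is the root)
    n_subjects: int     # Column 2: number of subjects in the node
    left: int           # Column 3: left daughter node
    right: int          # Column 4: right daughter node
    split_var: int      # Column 5: splitting variable, numbered from 1
    split_value: float  # Column 6: splitting value


def parse_part1_row(row: str) -> Node:
    """Parse one Part I row (whitespace-separated columns assumed)."""
    tok = row.split()
    return Node(int(tok[0]), int(tok[1]), int(tok[2]), int(tok[3]),
                int(tok[4]), float(tok[5]))


def describe(node: Node) -> str:
    """Describe a split in the same form as the example above."""
    return (f"node {node.node} (n={node.n_subjects}): "
            f"variable {node.split_var} > {node.split_value}; "
            f"daughters {node.left} and {node.right}")


# Made-up row mirroring the example split for node 0.
root = parse_part1_row("0 200 1 2 14 3.0")
print(describe(root))
```

The sketch deliberately does not route observations to daughters or mark terminal nodes, because how those are encoded is not described here.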
After you execute ctmbr, you will be asked to enter the data file name
(e.g., sample.dat), whether there is any missing data (0 for no and 1 for
yes), and, if there is, the code for the missing value (I used -9). Finally,
you will be asked to enter the number of folds for cross-validation. If you
do not want to wait long, enter 2. Now, be patient. The computation may
take quite a while because the tree has to be grown many times during
cross-validation.
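If you prefer not to type the answers each time, the prompts above could be answered from a script. The Python sketch below only builds the answer text. It assumes that the program reads its answers from standard input, one per line, and that the missing-value prompt is skipped when the answer to the missing-data question is 0. Neither assumption has been checked against the executable, and build_answers is a hypothetical helper.

```python
from typing import Optional


def build_answers(data_file: str, folds: int,
                  missing_value: Optional[int] = None) -> str:
    """Return the prompt answers, one per line, in the order described above."""
    answers = [data_file]
    if missing_value is None:
        answers.append("0")                       # no missing data
    else:
        answers.extend(["1", str(missing_value)])  # missing data and its code
    answers.append(str(folds))                    # number of cross-validation folds
    return "\n".join(answers) + "\n"


answers = build_answers("sample.dat", folds=2, missing_value=-9)
print(repr(answers))

# To pass the answers to the program (not run here; assumes ./ctmbr exists):
#   import subprocess
#   subprocess.run(["./ctmbr"], input=answers, text=True, check=True)
```

If the executable does not accept piped input on your system, enter the same answers by hand in the same order.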
There are some minor numerical differences when the program is run on different
platforms. So, different infor files are provided here.
Remark: For a categorical variable, if your levels start with 1, an artificial
level 0 is added for convenience. A missing value is assigned the level 1 plus the
maximum level. The levels printed in the output are those that send observations
to the right daughter node.
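The level coding in the remark can be sketched in Python as follows. This is one reading of the remark, not the program's own code. The names recode_missing and level_range are hypothetical, the marker -9 is the one I used above, and the example values are made up.

```python
def recode_missing(values, missing_marker=-9):
    """Replace the missing marker by 1 plus the maximum observed level."""
    observed = [v for v in values if v != missing_marker]
    fill = max(observed) + 1
    return [fill if v == missing_marker else v for v in values]


def level_range(values, missing_marker=-9):
    """List the levels from 0 up to the level reserved for missing values.

    Starting at 0 covers the artificial level 0 that is added when the
    actual levels start with 1.
    """
    observed = [v for v in values if v != missing_marker]
    return list(range(0, max(observed) + 2))


# Made-up nominal covariate with levels 1 to 3 and one missing value.
values = [1, 3, -9, 2, 3]
print(recode_missing(values))  # [1, 3, 4, 2, 3]
print(level_range(values))     # [0, 1, 2, 3, 4]
```

Under this reading, a printed level of 4 for this covariate would mean that subjects with a missing value go to the right daughter node.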