YSPH Biostatistics Seminar: “Marginal Structural Models for Causal Inference with Continuous-Time Treatment and Censored Survival Outcomes"

December 08, 2020

Information

Liangyuan Hu, PhD, Assistant Professor - Department of Population Health Science and Policy

Icahn School of Medicine at Mount Sinai

December 8, 2020

ID5980

To CiteDCA Citation Guide

00:00- We want to the last Biostatistics seminar
00:03for the fall series.
00:05It's my great pleasure to welcome our speaker,
00:07Dr. Liangyuan Hu.
00:09Dr. Hu is an Assistant Professor of Biostatistics
00:13in the Department of Population Health Sciences and Policy
00:16at Mount Sinai School of Medicine.
00:19She received her PhD in Biostatistics from Brown University.
00:23Her methods research focuses on causal inference
00:25with complex longitudinal and survival data
00:28and Bayesian machine learning.
00:30Her independent research has been funded by NIH
00:33and Patient Centered Outcomes Research Institute.
00:36And her paper in Biometrics has been selected to receive
00:39the 2019 Outstanding Statistical Application Award
00:45by the American Statistical Association.
00:48Today, she's going to share with us her recent work
00:50on developing a continuous time marginal structure of models
00:54for complex survival outcomes.
00:56Liangyuan, the floor is yours.
00:58- Well, thank you Li Fan.
01:00Thank you so much Fan for your introduction,
01:02for the invite also.
01:05Let me just share my slides full screen.
01:08I'm really excited to be here today
01:10to talk about some of the projects I've been working on
01:15in the causal inference field,
01:17namely, how do we use marginal structure models
01:21for more complex comparative effectiveness
01:26research questions involving continuous-time treatment
01:30and censored survival outcomes.
01:32So I'd like to first acknowledge my colleagues,
01:35especially Doctors Hogan and Daniels
01:37who had been instrumental to me
01:40during the time I was working on this project.
01:44And let me just shift to the bar a little if I can.
01:50Okay.
01:51So this is just for those who aren't very familiar
01:56with causal inference,
01:58and simple slide to introduce the concept.
02:01Some key concept.
02:02Suppose we are interested in estimating the causal effect
02:06of a binary treatment A on some outcome Y.
02:11Using the potential outcomes framework,
02:13we can define the average treatment effect
02:16as the difference between the mean
02:19of the two sets of potential outcomes.
02:22So, Y1 here is the potential outcome
02:26that would have been observed
02:27had everyone in the population received the treatment.
02:30Similarly, Y0 here is the potential outcome
02:33that would have been observed
02:34had no one in the population received the treatment.
02:38To estimate the causal effect,
02:39the gold standard is the randomized controlled file.
02:43So in an RCT, we would randomly allocate patients
02:48to receive either treatment or the control or placebo,
02:53the randomization would make the two groups of patients
02:57more or less very similar in terms of their characteristics.
03:01So in a sense that these two groups are exchangeable,
03:08so that an individual's potential outcome
03:11to either treatment or control
03:14would not depend on which treatment group
03:16this person was assigned to.
03:18But just depends on how the treatment works.
03:21And this way we can simply look at the difference
03:24and the mean of the observed outcome
03:30between the two treatment groups
03:31and just to estimate the causal effect.
03:34But in many, many situations,
03:36we cannot conduct an RCT
03:39and we have to rely on observational data
03:42to get the causal inference about treatment effects.
03:45So in these situations,
03:47the independence between the potential outcome
03:49and treatment assignment would no longer hold.
03:53Because there might be exists a confounder
03:56that is predictive of the outcome,
03:58such that the probability of receiving the treatment
04:01depends on the confounder.
04:03So for example, age might be such a confounder.
04:06For example, younger patients may be more likely
04:10to receive the treatment.
04:13So in this case,
04:13if you take the difference in the average
04:16of the observed outcome between the two groups,
04:20then this estimate would not bear a causal interpretation
04:24because the difference might be confounded by age.
04:29So we would have to use specialized
04:31causal inference techniques to remove the confounding.
04:35And there are just many, many techniques out there,
04:37but today I'm just gonna focus on marginal structure model,
04:41because it is simple to implement.
04:46It has good statistical properties,
04:49and it is versatile enough
04:51to accommodate many, many complications
04:54posed by observational data that I'll talk about later.
04:58So we can propose a marginal structure model
05:00relating the potential outcome to the treatment assignment.
05:04And here theta one would capture the causal effect.
05:08But in reality,
05:09we can only fit a model to the observer data.
05:12And as I talked earlier,
05:16the parameter estimator beta one here
05:19would not bear a causal interpretation,
05:23it just measures association.
05:26But we can get to causation
05:28if by solving the weighted estimating equation,
05:32using the weight as W inverse of conditional probability
05:40of treatment assignment given the measured covariance.
05:43And this works because the IP weighting
05:46or inverse probability weighting
05:47removes confounding by measured covariance X
05:51in the weighted pseudo-population.
05:54So that's just a simple example
05:57to illustrate the use of marginal structure model.
06:03And traditionally treatment assignment,
06:06treatment is assigned at baseline and it's time fixed.
06:10So it means that the treatment doesn't change over time,
06:14but with increased availability of healthcare data sets,
06:21there are increased demands for more refined
06:23causal inference methods to evaluate complex
06:28treatment regimens.
06:29So one example is that treatment initiation
06:34can actually depend on time, so it changes over time.
06:40In this case,
06:41it would just be impractical to conduct RCTs
06:44because there are just simply too many
06:46treatment initiation time points.
06:48So I'm going to use two motivating examples in this talk.
06:54The first example is about timing of treatment initiation
07:00for patients who present both HIV and TB, tuberculosis.
07:05For these patients,
07:07TB treatment will be initiated immediately
07:09after the diagnosis, but during the TB treatment,
07:13when is the optimal time to initiate the HIV treatment
07:17or ART, anti-retroviral therapy?
07:20That is a very important question to answer,
07:22because if you initiate the treatment too early,
07:25there might be drug interactions, drug toxicity,
07:29but if you delay the treatment too much,
07:31then there's also increased the mortality
07:33associated with AIDS.
07:36The second example is timing of HIV treatment
07:39for adolescents.
07:42The timing now is defined with respect
07:45to the evolving value of a biomarker CD4.
07:49And this is also an important question to answer
07:51because the WHO guideline is in the form of
07:57treat this person when the person's CD4 cell count
08:01drops below 350, for example,
08:04and for the population of adolescents
08:06currently there's no concrete evidence
08:10for supporting the optimal threshold.
08:15So to statistically formulate these two motivating examples,
08:20the first one,
08:21when is the best time to initiate a treatment?
08:25So this is actually a static treatment regimen
08:27with respect to time,
08:32and the initiation can occur on the continuous timescale.
08:35And second example is actually a dynamic treatment regimen.
08:39It's dynamic because it depends on the evolving history
08:45of treatment and a biomarker,
08:48but initiation can also occur on the continuous timescale.
08:54So marginal structure models are suitable
08:56for addressing a time dependent treatment,
08:59but in order to use the models,
09:02we have to overcome some statistical challenges.
09:05The first challenge is that we need
09:08to estimate the causal effect of the actual timing,
09:11not compare protocols defined by some specific intervals,
09:15which is a lot of existing studies did.
09:21And also a lot of RCT reported these kinds of results.
09:25Because as I said earlier,
09:27it's just impractical for RCTs
09:30to report continuous time causal effects.
09:34We would also need to address complications
09:36posed by observational data.
09:39This is something I'll talk about later.
09:41And also we are dealing with censored survival outcomes
09:45that adds another layer of complexity.
09:49So these are four sensory patterns observed in our data.
09:53So our goal is to estimate the causal effect of A,
09:57treatment initiation time and T, death time.
10:01And we have almost 5,000 patients
10:05and only a very small proportion of patients
10:07have both observed A and T.
10:10A lot of patients don't have observed T.
10:13So their death time is censored by C.
10:16And we have about 20% of our patients,
10:18they don't even have observed A.
10:20Their treatment initiation time
10:22can be censored by death time or censored by C, dropout,
10:27for example.
10:28So our goal is to estimate effect of A on T,
10:32but we only have about 300 patients
10:35have complete information.
10:36Most of the patients we have incomplete information
10:40on either A or T or both.
10:43How do we probably use these incomplete information
10:46to draw causal inference about A on T,
10:49the effect of A on T,
10:51that's a problem we solve in this project.
10:56So three challenges.
10:59First one, treatment initiation time,
11:02this is observational data, so it's not randomly allocated.
11:06We don't know the actual functional form of causal effect
11:10of initiation timing or mortality rate.
11:13And we see that, Oh, there's incomplete information
11:17on either exposure or outcome or both.
11:22The general solutions we proposed
11:26that we first formulate a flexible structural
11:29causal hazard model
11:31that can capture the effects of both timing and duration
11:36of the treatment.
11:37And then we can derive methods
11:39to consistently estimate the model parameters
11:44under non random allocation and complex censoring patterns.
11:48Using the model outputs we can estimate the functional form
11:53of the causal relationship between our initiation timing
11:56and mortality.
11:59So some notation before we introduce our approach,
12:02note that we have three time to events in our study,
12:06we have treatment initiation time, death time,
12:09censoring time.
12:10We'll use T sub cap A to denote death time
12:13associated with the actual treatment time.
12:17And potential outcomes T sub small A,
12:20this is the death time.
12:21If treatment initiated at time A,
12:24and we use T infinity to denote death time
12:28if treatment is initiated beyond sometime point
12:31of our interest.
12:33Because of all the censoring,
12:36all the three time to events can be censored by one another.
12:40We use T star to denote the minimum of T and C.
12:44Delta T is a corresponding event indicator.
12:47So A star is the minimum of the three time to events.
12:51Delta A is a corresponding event in the data.
12:54Adopting the convention in the causal inference literature,
12:58we use overbar to denote history.
13:00So overbar L of T here is a covariate history
13:06up to a time T.
13:08Putting everything together,
13:09we have a set of observed data.
13:12Now back to the censoring patterns.
13:15In case one, we observed both A and T.
13:18So we would observe A, we would observe T sub A.
13:23Case two T is censored by C,
13:25so we observe A, we just know TA
13:28is going to be greater than C.
13:29Case three,
13:31we will observe A,
13:32but we know A is greater than TA.
13:35And case four we don't observe A, we don't observe T
13:38but we know A is greater than C and TA is greater than C.
13:43Okay.
13:44So now we propose a structural causal
13:46proportional hazards model
13:49to capture the survival effect of treatment initiation time.
13:54Lambda AT here is a hazard function
13:56for the potential outcome T sub A,
13:59we start from lambda infinity T right here.
14:01This is a reference hazard for T infinity.
14:05So we start from here.
14:07Once the treatment is initiated at A,
14:10there is an instantaneous effect of treatment initiation
14:14captured by the G1 function here,
14:17and the effect of staying on the treatment
14:20at any given time point T,
14:22is captured by the G2 function of ART duration.
14:27And the G3 function here captures the interaction
14:30between treatment initiation and treatment duration.
14:35So we leave this structural model relatively flexible.
14:41First, the reference hazard is left unspecified
14:44and the 3G functions, we also left them
14:47as unspecified smooth function
14:49of treatment initiation time duration and their interaction.
14:54So now we can parametrize these three functions
14:58using natural cubic splines,
15:00and by rewriting the risk function of our structural model,
15:07we can use beta this parameter
15:09to include the causal effects of ART initiation time
15:13on mortality hazard.
15:15The problem here now,
15:17our goal is to how do we obtain a consistent
15:20estimate of beta using observed a data?
15:24Once we have obtained that
15:26we can use beta hat to estimate the 3G functions,
15:30to understand the relative contribution of timing
15:33versus duration and interactions.
15:37And we could also estimate the causal does-response
15:41of initiation time versus mortality
15:43by relating the survival function to the hazard function.
15:46We can derive this from our structural model.
15:52And now we can also estimate the model-based
15:55optimal initiation time
15:57that will lead to the maximal survival probability
16:01at say 52 weeks after diagnosis.
16:07Okay, how to obtain a consistent estimate of beta.
16:10So first let's assume if A is randomly allocated
16:15and both A and T are observed,
16:17then we can write the partial likelihood score function
16:22of our structural model.
16:24And this is a sample average of score function
16:28is an unbiased estimator of the expectation
16:32of the score function.
16:33So E sub R here is the expectation
16:37under the randomized treatment assignment.
16:40So this would be an unbiased estimator function,
16:47and solving this unbiased estimating equation
16:50would give us a consistent estimator of beta.
16:55Now, if A is still randomly allocated,
16:58but T can occur before A,
17:02so A may be censored by T.
17:05In this case,
17:06we would need to break the mean
17:08of an individual score contribution into two parts.
17:11In one part A is observed.
17:13The second part is A is not observed.
17:16And then we can apply the law of total expectation
17:19to the second part.
17:21The inner expectation would be conditioning
17:24on the observed information.
17:26Then using this strategy and taking in account
17:30the survival hazard structure,
17:32we can revise the estimating equation.
17:37And by solving this to obtain a consistent estimate of beta.
17:42In the case of non random allocation of treatment,
17:46then if we want to estimate the causal effect of A on T,
17:50then we would have to make a key assumption,
17:56ignore ability assumption.
17:57Essentially the assumption says
17:59that the initiation of treatment at any given time T
18:04is sequentially randomized in the sense
18:06that as a potential outcome beyond this time
18:09is independent of treatment initiation.
18:12Conditioning on all covariate history up to T.
18:16So with this assumption,
18:19we will be able to use observed data
18:21to derive the causal effect.
18:23So say PR is the data distribution under randomized A,
18:27and PO is the data distribution.
18:30And they're not random allocation of A.
18:33Note that in both settings,
18:35there is a same set of observed data.
18:39And as long as the observed data under PR
18:43is absolutely continues with the observed data under PO.
18:48Now we can derive a random-nikodym derivative.
18:52And so Murphy's 2001 paper
18:55developed a version of R-N derivative
18:58that connects the distribution of the observed data
19:01under PR and under PO for discrete time
19:05and ordinary GEE score.
19:07Johnson's 2005 paper extended this version of R-N derivative
19:12to continuous time still for ordinary GEE score.
19:15In this paper we extended the R-N derivative
19:20for time to event setting.
19:23So this is a version of R-N derivative
19:26for survival data.
19:28The reason why we wanted to use R-N derivative
19:32is that we can then use it
19:34to derive an unbiased estimating equation
19:37using some weighted version of the observed data.
19:41So we can estimate the causal effect.
19:43So now we want to use this R-N derivative for survival data.
19:49We want to apply that to Cox score
19:51and to derive S rated estimating equation.
19:55That's a little bit more complex than the GEE score,
20:00but we can observe that the Cox score
20:02can essentially be represented in three averages.
20:06The one in blue,
20:08the one in orange and the whole average.
20:13And each average converges to its expectation.
20:17And as I showed earlier,
20:19we can always break the expectation into two parts.
20:23In one part A is observed,
20:25second part is not observed.
20:27For the second part,
20:28we can apply the total law of expectation,
20:32the law of total expectation,
20:35and recognizing the survival structure
20:40to derive the second part.
20:43And then we can apply the R-N derivative for survival data
20:46to each piece separately,
20:48to construct the unbiased score equation.
20:53So after some derivation, we would arrive at the weights
20:59and actually the weights come down in a very neat form.
21:03Essentially, it suggests that for patients
21:06who have initiated treatment by time T,
21:10we would weight them by the marginals density function
21:13of A divided by the conditional density of A
21:18given their covariate history after time T.
21:23And for those who are censored,
21:25so not initiated by the time T,
21:28we would weight them by some survival function
21:31of the treatment initiation process.
21:36And then by applying this weighting scheme,
21:39we will be able to derive a weighted estimating equation.
21:43And just a note that we have to apply
21:46the same weighting scheme to the people
21:49who are still in the risk set at any time T.
21:54And so now that said, previously we have assumed
21:58there's no censoring.
21:59Now with censoring,
22:00we need to assume another similar assumption,
22:06similar to the ignore ability assumption,
22:09and then using the similar strategy
22:12to derive another set of weight for censoring.
22:16For those who stay, remain in the study,
22:18we would weight them by the survival function
22:23for censoring.
22:24And this would lead to the final modification
22:27of the estimating equation for beta.
22:29So censoring contributes information about the parameter
22:33in two ways,
22:35FC is observed as the person is actually censored.
22:40It contributes to the risk set up to C.
22:42If C is not observed, so C could be censored by T.
22:46If death's occurred,
22:47then it contributes to the individual partial likelihood
22:50to weight for C but evaluated at death time.
22:55Okay, now we know how to weight.
22:56Back to the four censoring patterns.
22:59The first one, both A and T are observed.
23:02We would weight them by the first set of weight for A
23:06evaluated at A,
23:08T occurred, so the weight for C but evaluated at T.
23:13Second case, T is not observed,
23:17A is observed.
23:19So first set of weight for A evaluated at A
23:23and C just contributes information to the risks set.
23:28Third case, A is not observed,
23:31so second weight for A evaluated at T.
23:35And weight for C, censoring evaluated at T.
23:40The fourth case or final case, A is not observed,
23:44again, second set of weight for A,
23:47but evaluated at C, and C also contributes to the risks set.
23:51Okay, so now we know how to weight.
23:54We would have to estimate the weights.
24:00The approach we used in the paper
24:02is that we model the intensity processes
24:06associated with the two counting processes,
24:09one for A, one for C.
24:12And then when we fit Cox proportional hazards models
24:15for the two intensity processes,
24:17we use fitted hazard to estimate the weights.
24:21We use empirical cumulative hazards
24:24to estimate the conditional density and function.
24:27And for the marginal density function,
24:29we use some nonparametric Nelson-Aalen estimator,
24:32and use similar fashion to estimate rates for censoring.
24:36Then we apply our methods to the AMPATH data.
24:39AMPATH is a large HIV care program based in West Kenya,
24:44our data has almost 5,000 patients
24:47and for covariates, we have demographic information
24:51and some disease-specific information.
24:54Some of them are time varying like, weight, the CD4,
24:56these are time varying variables.
24:59We categorize the baseline CD4 subgroups into two groups,
25:06the less than, or below 50 group,
25:08this is the highest risk group.
25:11So CD4 the higher, the better.
25:14So below 50, this is a highest risk group.
25:16And between 200 and 350,
25:19there's relatively healthy patients.
25:21The reason we categorize them into three groups
25:23is because the program guidelines
25:26are based on these subgroups
25:28and RCT is reported results for below 50 group.
25:33We want to compare our results to our CT findings.
25:37So this plot shows the three estimated G functions.
25:42The G1 A here suggests that the instantaneous effect
25:47of a treatment initiation has a U shape,
25:50achieving maximum benefit, or the lowest mortality hazard
25:53at just about 10 weeks.
25:56And after that, the longer the treatment is delayed,
26:00the less the benefit of the treatment initiation.
26:03And this is the effect of duration,
26:06in general, it says that the longer
26:08you stay on the treatment, the more benefit you get.
26:11There's an upward trend for the interaction effect.
26:15Essentially suggesting that delayed treatment initiation
26:19would reduce the benefit associated
26:22with long ART duration.
26:27And so the net causal effect of treatment initiation
26:31is summarized in this plot.
26:33Top panel shows the mortality rate at one year
26:38versus treatment initiation time.
26:40Bottom panel compares immediate initiation
26:44versus delayed initiation at A.
26:48So we can see that the benefit of early initiation
26:53is most pronounced for the CD4 below 50 group,
26:57or the highest risk group.
26:58And the curves here are pretty flat,
27:01suggesting that there's not much benefit
27:03of early initiation for relatively healthy patients.
27:09Several advantages for this approach.
27:12It's easy to get optimal initiation time
27:17based on the model outputs.
27:20And we could also use the model outputs
27:22to emulate comparisons between regimens reported in RCTs.
27:27So we could mimic random allocation
27:32of treatment initiation time to specific intervals
27:36by assuming a distribution for A,
27:39for treatment initiation time A,
27:41that is independent of covariates and outcome
27:44and compare interval specific mortality rates
27:49and draw inferences about treatment initiation.
27:53But with the continuous time marginal structure model,
27:56we'll also be able to conduct a higher resolution analysis
28:00that can potentially generate new insights
28:03in relation to a randomized control trial.
28:09For the sake of timing,
28:10I just gonna briefly talk about the simulation.
28:14We conduct simulation to examine
28:15the finite-sample properties of weighted estimators,
28:24we evaluate sensitivity of our estimators
28:27to the violations of the ignore ability,
28:30or no unmeasured confounding assumption,
28:32but we only considered confounding at baseline.
28:35So the sensitivity analysis strategy
28:39for time-varying confounding,
28:41especially with the censored survival outcome
28:44is kind of very complex topic,
28:48and we were still working on this project right now,
28:51but in this paper we just consider confounding at baseline.
28:56Under random allocation of treatment,
28:59our estimator produced a new zero bias
29:02and nominal coverage probability,
29:05in the presence of measured confounding,
29:07it eliminated nearly all the biases
29:09and provided close to nominal coverage probability,
29:13but in the presence of unmeasured confounding,
29:16there was bias in our estimator.
29:19And the biases were in proportion
29:22to the degree of measured confounding.
29:26Okay,
29:27so moving to the second example,
29:29this is a continuous time dynamic treatment regimen
29:34of the form,
29:34initiate treatment when a biomarker crosses a threshold.
29:40It's dynamic treatment regimen
29:42because it depends on evolving history of treatment
29:45and a tailoring variable.
29:47So in our case, CD4 is a tailoring variable.
29:50That means we make our treatment decision
29:53based on this variable.
29:55A little bit different from our previous motivating example.
30:00The outcome interest is different.
30:04This is a pediatric data.
30:05So for the kids, the mortality rate is very low
30:09and our data I think it's around 3%.
30:12And for kids, we're also interested
30:14in their CD4 measurements,
30:17because CD4 is important marker of immune system function
30:21and both outcomes, both mortality rate and CD4
30:24are sparsely measured in our data,
30:27but we are interested in both.
30:30Other than that, we also have complications
30:33posed by observational data.
30:36So this is a picture of nine randomly selected individuals
30:41from our data,
30:43X axis here, follow-up time in days,
30:46Y axis here square root of CD4,
30:49purple line is end of follow-up,
30:53two gray lines here mark one year
30:57and two years post diagnosis.
31:00Empty circles here mean that the patient
31:04has not been treated.
31:06Solid circles, mean that they're on the treatment.
31:09So we can see that there's a lot of variability
31:12in terms of the treatment initiation time.
31:16And some people are followed much longer
31:20than some other patients.
31:22And the follow-up time is pretty irregularly spaced
31:29and overall the CD4 measurements are quite sparse,
31:34and there's also incomplete information
31:36for example, these two they either died
31:41or were lost to follow up
31:44before they even got a chance to be treated.
31:47So there's also a lot of complication in the data.
31:51There's a continuous time measurement
31:54of the treatment initiation.
31:55It just happens all over the place.
31:58The longitudinal outcome of interest are sparsely measured,
32:03leading to incomplete data.
32:05There's also a censoring due to dropout or deaths.
32:09So our general solution is that we'll use weighting
32:11to handle time-varying confounding.
32:14And will show how to derive a continuous time versions
32:17of the weights.
32:19For the missing outcomes
32:21that is caused by sparse measurement and censoring
32:24we'll use imputations from a model of the joint distribution
32:28of CD4 and mortality.
32:30And because we're interested in both mortality status
32:33and CD4, we'll develop a composite outcome.
32:38So our general approach is to emulate a randomized trial
32:42in which we would randomize individuals
32:45to follow specific DTR Q.
32:48And Q equals zero means never treated,
32:51because CD4 can never drop below zero.
32:54Now, Q equals infinity means treat immediately.
32:58So after randomization,
32:59all the individuals will be followed
33:02for a fixed amount of time,
33:05at which point, say T star,
33:07both their mortality status.
33:10And among those who are alive at T star,
33:14their CD4 count will be assessed.
33:18So what define a composite outcome XQ,
33:21that is the product of the test indicator
33:25and the potential CD4.
33:28So the cumulative distribution of this composite outcome
33:32is a useful measure of treatment utility,
33:36because it has appointments at zero
33:39corresponding to mortality rate.
33:41Thereby capturing both mortality status
33:45and CD4 count among survivors at T star.
33:52So for example, the probability of a positive XQ,
33:56that's the survival fraction,
33:58and the probability of XQ greater than X,
34:02that's the fraction of survivors with CD4 above X.
34:08Okay, so similar to the first motivating example,
34:12we again have three timed events.
34:16Death time, censoring time, treatment initiation time.
34:19And now we have a tailoring variable, CD4 count.
34:23So the CD four process is defined for all continuous time,
34:27but it's just measured at discrete times.
34:30And we also have a P by one covariate process.
34:35Using a convention in the DTR literature,
34:38we assume that the treatment decision
34:40is always made after observing the covariate history
34:45and the CD4 count.
34:48Putting everything together,
34:50we have a history information indicator.
34:55For each individual, we'll have a observed a data process.
35:00And just note that each person
35:02can have a different lens of followup
35:04at different time points.
35:08Our goal is to evaluate the effect of DTRs,
35:12but we're dealing with observational data,
35:14so we'll have to map the observed treatment regimen
35:18to specific DTRs that we are interested in evaluating.
35:22Essentially we'll follow the deterministic function
35:28to create the mapping.
35:30Essentially there are three rules.
35:32First rule says not to treat the person
35:34if the person has not yet initiated treatment
35:38and their CD4 has not fallen below Q,
35:41or has not been observed.
35:44Second rule says, treat this person if their time T,
35:47CD4 has fallen below Q for the very first time.
35:51Once treated, always treat them.
35:54Following these three rules,
35:56we'll be able to create a regimen specific compliant process
36:00for each individual in the data.
36:02So essentially if the rule says treat,
36:05and if the person is actually treated by the time T,
36:09then this person is compliant at time T.
36:13If the rule says do not treat,
36:14and the person was not treated at the time T,
36:17so this person is still compliant to the rule.
36:20And so we'll be able to observe a compliant process
36:24for each person.
36:25Here a simple example to show you how to create the mapping.
36:29For example, we're interested in Q equals 350.
36:33This person came in at baseline,
36:35had a measurement 400 above the threshold.
36:39The rule says do not treat,
36:40the person was not treated.
36:41At this point, it's compliant with the rule.
36:44Next visit, no new CD4 observation.
36:48So the rule says do not treat,
36:50the person's still not treated,
36:52still compliant at this point.
36:53Third visit, the person's CD4 drops to 330,
36:58which is below the threshold for the very first time,
37:01the rules are start treating this person,
37:05the person was actually treated.
37:06So compliant at this point.
37:10Next visit the rule says once treated always treat them,
37:14the person kept being treated.
37:16So this person was compliant with the rule 350
37:20all throughout his or her followup.
37:23Next example, the first two rows are the same.
37:27The third visit, the person's CD4 jumps to 450,
37:33which is above the threshold.
37:35The rule says do not treat,
37:37but on the contrary, the person was actually treated
37:40and kept being treated.
37:42So from this time point onward,
37:46the person was not compliant with this rule.
37:51Okay, so that's just some simple example
37:55to show how to create the mapping.
37:58With missing outcomes for those alive
38:01at the target measurement time T star,
38:05the observed outcome XI is the CD4 measurement at T star.
38:11But because of CD4 is sparsely measured
38:14and irregularly spaced,
38:17Z of T star is directly observed
38:19only when the person's followup time is exactly at T star.
38:25So in this case, it is pretty common
38:27to predefine a interval and capture the CD4 that is measured
38:37at the time closest to the target measurement time.
38:41But even using this strategy,
38:43there's still a possibility that there is no measurement
38:48in predefined interval.
38:51Then we say this person has a missing outcome.
38:54And it's also possible that the person dropped out
38:57before TA.
38:59And so in this case, the outcome is also missing.
39:04For these missing outcomes, our general strategy
39:08is to use multiple imputation.
39:10So we would specify and fit model
39:12for the joint distribution of the CD4 process
39:16and the mortality process.
39:18For those known to be alive,
39:20but without a CD4 measurement,
39:23we would impute the CD4 count from the fitted CD4 sub-model.
39:28And for those missing the CD4,
39:31because of right censoring,
39:33we would calculate the mortality probability
39:37from the fitted survival sub-model,
39:39and then impute the death indicator
39:42from the Bernoulli distribution
39:44with this calculated probability.
39:46If the death indicator was imputed to be zero,
39:49then we further impute a CD4 count for this person.
39:52Otherwise we'll set X to be zero.
39:56And again, we would have to assume
39:59some standard causal inference assumptions
40:02in order to draw causal effects about the DTRQ
40:08using observational data.
40:09And we can estimate and compare DTRs along a continuum.
40:15We can formulate a causal model
40:17for the smooth effect of Q on the task quantile of XQ.
40:22This is our composite outcome with separate parameters
40:25capturing the effect of treat immediately,
40:29and the effect of never treat.
40:31And then we can parametrize the model using splines of Q
40:35for the third term here, to gain statistical efficiency.
40:41And we can obtain a consistent estimator of effect of Q
40:47by solving the weighted quantile regression
40:50estimating equation.
40:53So what should be the weights?
40:57First, we assume there's no dropout or death
40:59prior to the target measurement time.
41:02In the discrete time setting with common time point,
41:07the form of the weights have already been done.
41:10It has been derived in several papers.
41:14Essentially, the denominator of the weight
41:17is this conditional probability.
41:19It's a conditional probability of the person being compliant
41:22all throughout the follow up, given the covariate history.
41:29So if we have a common set of discrete time points,
41:34it's a cumulative a product of the conditional probability
41:38of this person being compliant at every time point.
41:42And essentially if the rule says treat,
41:46it's a condition of probability of the person
41:48actually being treated at this time point,
41:51if the rule says not treat,
41:53as a conditional probability of this person not treated
41:57by this time point.
41:58So in order to estimate this probability,
42:02we just need to model the observed
42:04treatment initiation process among those regimen compliers,
42:09but this is for discrete time setting.
42:12What would be the continuous time weights?
42:14We note that the occurrence of treatment initiation
42:18in a small time interval T and T plus TD
42:21is actually a Bernoulli trial with outcome DNA of T.
42:28So then we can rewrite this probability,
42:31this probability here,
42:33in the form of individual partial likelihood
42:37for the counting process of A.
42:40And now we note that when DT becomes smaller and smaller,
42:45this finite product approaches a product integral.
42:50So then this finite product can be rewritten
42:54as a final product over jump times of the counting process
42:58for A times the survival function.
43:02And then by recognizing that each individual
43:05had at most one jump at exactly AI.
43:10Now we can further reduce this probability to this form.
43:15Which suggests weighting scheme.
43:18Essentially it says for those who have been treated
43:23by a T star, we would weight them
43:25by the conditional density function of A.
43:28For those who haven't been treated by the time T star,
43:32we would weight them by the survival or function of A.
43:36So if you recall the weighting scheme
43:39for the first motivating example, this is exactly the same,
43:43the same rating scheme,
43:45but we took different approaches.
43:47The first example,
43:48we use a random Aalen derivatives
43:51to derive the weighting scheme.
43:52The second project we derive the limit
43:58of the finite product,
44:00but using different approaches,
44:02we arrive at the same weighting scheme.
44:06And so similarly we modeled the intensity process
44:10of treatment initiation.
44:12We estimate the weights.
44:15So if there was a censoring or death
44:18prior to target measurement time,
44:20we would have to assume once lost to follow up
44:23at a time prior to T star,
44:25the treatment and regimen status remain constant.
44:28And this way we will just estimate the weights
44:31up to a time point CI, and if the person died before T star,
44:37then we would only evaluate compliance
44:40and treatment initiation processes up to time TI.
44:45Okay, so for missing outcomes,
44:49we propose a joint modeling approach.
44:52We specify a two-level model for the observed CD4 process.
44:56The first level,
44:57the observed CD4 process is a true CD4 trajectory
45:02plus some arrow process.
45:04The second level,
45:05we relate the true CD4 trajectory
45:07to baseline characteristics and treatment initiation time,
45:13and some subject specific random effects,
45:16capturing subject-specific deviations
45:19from the mean trajectories.
45:22And now we propose a hazard model for deaths
45:26uses the true CD4 trajectory as a covariate
45:30linking the two processes,
45:33Linking the death process and linking with a CD4 process.
45:38Now we use the joint model
45:39to impute the missing outcomes
45:43and estimate the variance of the target estimator
45:46using Rubin's combination wall.
45:50So we applied this method to the IeDEA dataset.
45:54IeDEA is another HIV consortium based in West Kenya.
46:00So we have almost 2000 data.
46:03We see that the CD4 is pretty sparsely measured
46:07and death rate is low around three and 4%.
46:12Most of patients have been treated by one year
46:16and we have a set of covariates.
46:21Some of them are time varying, some of them are time fixed.
46:26We proposed three target estimators,
46:30so first we're interested in mortality proportion.
46:34We're also interested in the median of the distribution
46:38of the composite outcome XQ.
46:41We also looked at CD4 among survivors,
46:45but this estimator does not have a causal interpretation
46:49because it conditions on having survived two T star.
46:53So it only measures association,
46:55but the first two estimators have causal interpretations.
47:03So we first look at the effectiveness
47:05of five specific regimens for both one year and two years
47:10after diagnosis.
47:12We can see that the immediate treatment initiation
47:18lead to significant lower mortality rate
47:21and significantly higher median values
47:24of the composite alcohol compared to delayed treatment.
47:30And the never treat initiation
47:32will lead to a significantly higher mortality probability.
47:37And for those who do survive to T star,
47:41their CD4 count is higher.
47:44So resulting higher to theta Q2 and higher theta Q3
47:48compared to other delayed treatment regimen.
47:54So this may suggest that those who do survive
47:57to T-star without any treatment,
48:00maybe they are relatively healthier
48:02at the beginning of the followup.
48:05Okay, and then we also plot the dose response curve
48:10of the median value of the composite outcome
48:14versus DTR Q,
48:17also suggests that the immediate treatment
48:22would lead to significantly higher median values of XQ,
48:27and also as illustration
48:29of the gained statistical efficiency
48:30by modeling the smooth effect Q on the quantile of the XQ.
48:37The variance in the one year outcome
48:40associated with Q equals 350, achieved about 15% reduction
48:46compared to that from the regimen specific estimates.
48:51So we gain a bit of our statistical efficiency
48:55by modeling the smooth effect.
49:00So there are several strands of continuous time
49:02marginal structure model.
49:04We see that we can derive, using different approaches,
49:08closed form of weights for continuous-time treatment.
49:12It can handle complex dataset on its own terms
49:16without having to artificially align measurement times,
49:20which could possibly lead to loss of information.
49:24It is amenable to many different outcomes.
49:27We've used the survival outcomes,
49:28we've used composite outcomes.
49:31You can also handle many data complications
49:34introduced by various censoring patterns
49:37within the same marginal structure model.
49:41So these are the strengths,
49:43but there are also limitations with this approach of course.
49:46One notable limitation is extreme ways,
49:50which could possibly lead to unstable estimates.
49:54So how to address this issue,
49:57especially for time varying confounding
50:00with censored outcome, this would be a challenging task,
50:04but if we can solve this issue,
50:06it might be a very important contribution to the field.
50:10So this is something my colleagues and I
50:14have been thinking about and working on for some time.
50:18Another limitation is that we know
50:22that weighting-based estimator is less efficient
50:25than the so-called G methods.
50:28The G computation, G estimation,
50:30and both G methods require integrating
50:32over the space of longitudinal confounders.
50:35So the G methods are computationally
50:38much, much more expensive
50:40than the marginal structure model-based methods.
50:44And as far as I know,
50:48currently there's no continuous time version
50:50of the G computation methods.
50:53Judith Lok has a paper, back in 2008.
50:56She developed theory for continuous time G-estimation,
51:00but I have yet to see a practical implementation
51:03of this method.
51:05So this could be another avenue for future research,
51:11how to increase efficiency of the continuous time
51:15weighting-based methods.
51:18And here's some key references.
51:22Thank you.
51:24- Thank you Liangyuan for this very interesting
51:25and comprehensive presentation.
51:29Let's see if we have any questions from the audience.
51:32If there's any questions,
51:34please feel free to unmute yourself and speak
51:36or type in the chat.
51:43- [Donna] Thanks, it was a very interesting talk.
51:45This is Donna Spiegelman.
51:47- Hi, Donna.
51:48- Yeah, hi.
51:49I was wondering I might've missed it,
51:51but did you say much about estimating the variance?
51:55I see you have (indistinct) around the curve,
51:58so you must derive the variance.
52:01So I'm wondering if you could say a little bit about that
52:03or a little more about that
52:05if I missed what you did say.
52:07- Sure, sure, sure.
52:09So for this one, this is the second example,
52:12for this one we have multiple amputation
52:15and we also have weighting.
52:20So with weighting part,
52:22the variance was estimated using bootstrap
52:26for multiple amputation, and then we combined,
52:29so it's a bootstrap nested within multiple imputation.
52:33So then we use the Rubin's combination role
52:35to estimate the total variance.
52:38For the first example, we actually used a bootstrap,
52:45and the coverage probability was actually okay.
52:50It's good for the estimator.
52:52- Did you think about asymptotic variants derivations?
52:56- I did.
52:58It was a very difficult task,
53:03there's a story about our first paper
53:05found that about it.
53:10It was first submitted to Jaza
53:12and then they asked about the asymptotic variants
53:16about the estimator.
53:18And it's quite complex because they involve the splice
53:22and involves the survival data.
53:25And we have already approved as a consistency,
53:34and it also involves optimization.
53:38So it's just comes to-
53:40- What's the optimization piece.
53:43- Oh, it's the model based optimal treatment initiation time
53:47that will lead to the maximum survival
53:49at predefined time points.
53:54Right, so they are interested in the optimization.
53:58So the inference about the optimized
54:00treatment initiation time.
54:02We did some empirical evidence
54:04for like the largest sample convergence rate,
54:08but we weren't successful at deriving asymptotic variants.
54:14So that's another piece, I think maybe,
54:18I don't know.
54:19We had this discussion among colleagues
54:21and also my advisor at the time,
54:24we just not sure about whether it's worth the effort
54:28to go and do that route.
54:30- It's probably way more complex
54:32than just the usual derivation.
54:34'Cause you do have like two weighting models,
54:36which are also survival models,
54:40and also the derivation that these variances
54:42sometimes can be specific to the choice
54:44of these (indistinct) models.
54:46And so if you have a variance and the cup's model,
54:49it does not apply to other forms of models,
54:52I guess it's really a trade-off right?
54:54- Yeah, it is a trade off.
54:58It's still an open question and nobody had done it yet,
55:03but just, whether you're thinking it's was the effort
55:06just to devote a couple of years to work on that.
55:10- So was bootstrap time consuming for these datasets,
55:15for this data analysis, or they're pretty manageable.
55:18- They're pretty manageable.
55:20And it looks complicated because we have to weight everybody
55:23that had event.
55:24We also have to weight everywhere in the risk set
55:27at any time point.
55:28So it looks pretty complex, but still manageable.
55:35Another reason is because we use parametric models.
55:38If we wanted to,
55:41I'm not aware of any machine learning algorithm
55:46that can handle survival data,
55:48but also with time varying covariates,
55:52that's something I'm also thinking about.
55:54Like, if we use those algorithm
55:56might be more time consuming,
55:59but with just a parametric models, it's pretty manageable.
56:03- And when you're bootstrapped,
56:04you go back to the weight models
56:06and refit the weight models every time?
56:08- Yeah.
56:10- But the variable is pre-determined.
56:12So that's what you mentioned, machine learning.
56:15So the variables are predetermined
56:17and they're functional forms in the model,
56:19but the coefficients that correspond to them
56:22are re estimated for each bootstrap.
56:24- Very estimated.
56:26Right, right, right.
56:27Exactly.
56:28Yeah.
56:29- Great question.
56:30- Yeah.
56:31So a lot of open questions still.
56:35- So any other questions from the audience?
56:41- I have another comment.
56:44So by getting back to this,
56:47that you re estimated the coefficients
56:49for the weight models.
56:50So in sort of the standard marginal structural model,
56:54the variability due to those weight models is ignored.
56:59And the robust variance is used
57:01and said to be an overestimate,
57:02implying that if you took that variation into account,
57:06you'd get a smaller variance
57:08and you might see the same thing here with your bootstraps.
57:11If you took the weight models as fixed,
57:15you might find that you have a less efficient estimator,
57:18which is kind of interesting
57:20just in terms of say a methods paper to show,
57:23because there's different ways to do bootstraps,
57:26but here you're automatically taking the estimation
57:30of the weight models into account,
57:32which is not saying that say the classic paper
57:35by Hernan in epidemiology,
57:38that's ignored and the robust variance is recommended.
57:42- Hmm.
57:43It's a very great comment.
57:46Something I have to think about.
57:47So you're saying that in each bootstrap,
57:52when we estimate the weight model, we fix the weight model.
57:57So the coefficients from the weight model stay fixed-
58:01- Yeah, so you don't even do a bootstrap for that.
58:03You basically hold the weight model as a constant,
58:06and then you'd-
58:07- Robust variance.
58:09- Yeah, you use the robust variance,
58:11which I guess it's a little tricky
58:13because now you don't have the robust variance
58:15because you're not using it,
58:16but it seems the bootstrap analog of the approach taken
58:21would be to just fit the weight model once,
58:24treat that fixed unknown,
58:27and then only bootstrap on the outcome model.
58:31- Right, right.
58:32Yeah.
58:33- [Fan Li] Totally. Yeah.
58:35- Interesting.
58:36Take that in as a note.
58:39- So I do have a question as well.
58:42I think Liangyuan you had presented two applications
58:44at the HIV observational studies,
58:47do you see the application that these new methods
58:51to other areas as well
58:54to solve the other questions? - Yeah.
58:57Yeah, actually this is not pertaining to HIV area.
59:02It's actually in the public health areas.
59:06A lot of questions are involving
59:13this statistical formulation.
59:16So for example,
59:18I've been collaborating with an epidemiologist at Columbia.
59:23They are doing cardiovascular research.
59:27So one research question is that,
59:31I think it's blood pressure lowering intervention.
59:37So blood lowering innovation is very useful
59:40for preventing cardiovascular diseases,
59:45but they don't know.
59:47And there also a lack of randomized control trials.
59:50What is the optimal threshold
59:53to start giving the blood lowering treatment?
59:57So this is exactly the same form
59:59as our second motivating example.
01:00:01Like what is the optimal CD4 threshold
01:00:04to start the HIV treatment?
01:00:06And their question is what is the optimal threshold
01:00:09to start the blood lowering treatment?
01:00:13So I think there's a lot of possibility
01:00:20as to apply these kinds of methods
01:00:22in other health research area.
01:00:24- Yeah, it's a huge controversy
01:00:26in terms of the treatment of hypertension,
01:00:28what's the optimal blood pressure
01:00:31to start antihypertensives.
01:00:33And I think there was a very large trial
01:00:35that showed that it was better to start it
01:00:38at a much earlier threshold than what current practices.
01:00:42And it's very troublesome for people around the world
01:00:47because these medicines are expensive.
01:00:49And if you see now,
01:00:51like another like 40% of the population
01:00:54should now be initiated a antihypertensive medication,
01:00:58well, most countries can't even afford that.
01:01:01So the implications of these different thresholds
01:01:05is a very big topic of sort of substantive research
01:01:09and debate right now.
01:01:10- Well, that's great to know,
01:01:12there's urgent need for that.
01:01:14(indistinct)
01:01:16- Totally.
01:01:17All right, I think we are at the hour,
01:01:19so thanks Liangyuan again for your great presentation
01:01:25and if the audience has any questions,
01:01:28I'm sure Liangyuan is happy to take any questions offline
01:01:31by emails.
01:01:33And I think this is the final seminar of our fall series,
01:01:38and I hope to see everyone next spring,
01:01:40have a good holiday.
01:01:42Thank you.
01:01:43- Thank you.
01:01:44- Bye. - Bye.