# YSPH Biostatistics Seminar: “Marginal Structural Models for Causal Inference with Continuous-Time Treatment and Censored Survival Outcomes"

December 08, 2020## Information

Liangyuan Hu, PhD, Assistant Professor - Department of Population Health Science and Policy

Icahn School of Medicine at Mount Sinai

December 8, 2020

ID5980

To CiteDCA Citation Guide

- 00:00- We want to the last Biostatistics seminar
- 00:03for the fall series.
- 00:05It's my great pleasure to welcome our speaker,
- 00:07Dr. Liangyuan Hu.
- 00:09Dr. Hu is an Assistant Professor of Biostatistics
- 00:13in the Department of Population Health Sciences and Policy
- 00:16at Mount Sinai School of Medicine.
- 00:19She received her PhD in Biostatistics from Brown University.
- 00:23Her methods research focuses on causal inference
- 00:25with complex longitudinal and survival data
- 00:28and Bayesian machine learning.
- 00:30Her independent research has been funded by NIH
- 00:33and Patient Centered Outcomes Research Institute.
- 00:36And her paper in Biometrics has been selected to receive
- 00:39the 2019 Outstanding Statistical Application Award
- 00:45by the American Statistical Association.
- 00:48Today, she's going to share with us her recent work
- 00:50on developing a continuous time marginal structure of models
- 00:54for complex survival outcomes.
- 00:56Liangyuan, the floor is yours.
- 00:58- Well, thank you Li Fan.
- 01:00Thank you so much Fan for your introduction,
- 01:02for the invite also.
- 01:05Let me just share my slides full screen.
- 01:08I'm really excited to be here today
- 01:10to talk about some of the projects I've been working on
- 01:15in the causal inference field,
- 01:17namely, how do we use marginal structure models
- 01:21for more complex comparative effectiveness
- 01:26research questions involving continuous-time treatment
- 01:30and censored survival outcomes.
- 01:32So I'd like to first acknowledge my colleagues,
- 01:35especially Doctors Hogan and Daniels
- 01:37who had been instrumental to me
- 01:40during the time I was working on this project.
- 01:44And let me just shift to the bar a little if I can.
- 01:50Okay.
- 01:51So this is just for those who aren't very familiar
- 01:56with causal inference,
- 01:58and simple slide to introduce the concept.
- 02:01Some key concept.
- 02:02Suppose we are interested in estimating the causal effect
- 02:06of a binary treatment A on some outcome Y.
- 02:11Using the potential outcomes framework,
- 02:13we can define the average treatment effect
- 02:16as the difference between the mean
- 02:19of the two sets of potential outcomes.
- 02:22So, Y1 here is the potential outcome
- 02:26that would have been observed
- 02:27had everyone in the population received the treatment.
- 02:30Similarly, Y0 here is the potential outcome
- 02:33that would have been observed
- 02:34had no one in the population received the treatment.
- 02:38To estimate the causal effect,
- 02:39the gold standard is the randomized controlled file.
- 02:43So in an RCT, we would randomly allocate patients
- 02:48to receive either treatment or the control or placebo,
- 02:53the randomization would make the two groups of patients
- 02:57more or less very similar in terms of their characteristics.
- 03:01So in a sense that these two groups are exchangeable,
- 03:08so that an individual's potential outcome
- 03:11to either treatment or control
- 03:14would not depend on which treatment group
- 03:16this person was assigned to.
- 03:18But just depends on how the treatment works.
- 03:21And this way we can simply look at the difference
- 03:24and the mean of the observed outcome
- 03:30between the two treatment groups
- 03:31and just to estimate the causal effect.
- 03:34But in many, many situations,
- 03:36we cannot conduct an RCT
- 03:39and we have to rely on observational data
- 03:42to get the causal inference about treatment effects.
- 03:45So in these situations,
- 03:47the independence between the potential outcome
- 03:49and treatment assignment would no longer hold.
- 03:53Because there might be exists a confounder
- 03:56that is predictive of the outcome,
- 03:58such that the probability of receiving the treatment
- 04:01depends on the confounder.
- 04:03So for example, age might be such a confounder.
- 04:06For example, younger patients may be more likely
- 04:10to receive the treatment.
- 04:13So in this case,
- 04:13if you take the difference in the average
- 04:16of the observed outcome between the two groups,
- 04:20then this estimate would not bear a causal interpretation
- 04:24because the difference might be confounded by age.
- 04:29So we would have to use specialized
- 04:31causal inference techniques to remove the confounding.
- 04:35And there are just many, many techniques out there,
- 04:37but today I'm just gonna focus on marginal structure model,
- 04:41because it is simple to implement.
- 04:46It has good statistical properties,
- 04:49and it is versatile enough
- 04:51to accommodate many, many complications
- 04:54posed by observational data that I'll talk about later.
- 04:58So we can propose a marginal structure model
- 05:00relating the potential outcome to the treatment assignment.
- 05:04And here theta one would capture the causal effect.
- 05:08But in reality,
- 05:09we can only fit a model to the observer data.
- 05:12And as I talked earlier,
- 05:16the parameter estimator beta one here
- 05:19would not bear a causal interpretation,
- 05:23it just measures association.
- 05:26But we can get to causation
- 05:28if by solving the weighted estimating equation,
- 05:32using the weight as W inverse of conditional probability
- 05:40of treatment assignment given the measured covariance.
- 05:43And this works because the IP weighting
- 05:46or inverse probability weighting
- 05:47removes confounding by measured covariance X
- 05:51in the weighted pseudo-population.
- 05:54So that's just a simple example
- 05:57to illustrate the use of marginal structure model.
- 06:03And traditionally treatment assignment,
- 06:06treatment is assigned at baseline and it's time fixed.
- 06:10So it means that the treatment doesn't change over time,
- 06:14but with increased availability of healthcare data sets,
- 06:21there are increased demands for more refined
- 06:23causal inference methods to evaluate complex
- 06:28treatment regimens.
- 06:29So one example is that treatment initiation
- 06:34can actually depend on time, so it changes over time.
- 06:40In this case,
- 06:41it would just be impractical to conduct RCTs
- 06:44because there are just simply too many
- 06:46treatment initiation time points.
- 06:48So I'm going to use two motivating examples in this talk.
- 06:54The first example is about timing of treatment initiation
- 07:00for patients who present both HIV and TB, tuberculosis.
- 07:05For these patients,
- 07:07TB treatment will be initiated immediately
- 07:09after the diagnosis, but during the TB treatment,
- 07:13when is the optimal time to initiate the HIV treatment
- 07:17or ART, anti-retroviral therapy?
- 07:20That is a very important question to answer,
- 07:22because if you initiate the treatment too early,
- 07:25there might be drug interactions, drug toxicity,
- 07:29but if you delay the treatment too much,
- 07:31then there's also increased the mortality
- 07:33associated with AIDS.
- 07:36The second example is timing of HIV treatment
- 07:39for adolescents.
- 07:42The timing now is defined with respect
- 07:45to the evolving value of a biomarker CD4.
- 07:49And this is also an important question to answer
- 07:51because the WHO guideline is in the form of
- 07:57treat this person when the person's CD4 cell count
- 08:01drops below 350, for example,
- 08:04and for the population of adolescents
- 08:06currently there's no concrete evidence
- 08:10for supporting the optimal threshold.
- 08:15So to statistically formulate these two motivating examples,
- 08:20the first one,
- 08:21when is the best time to initiate a treatment?
- 08:25So this is actually a static treatment regimen
- 08:27with respect to time,
- 08:32and the initiation can occur on the continuous timescale.
- 08:35And second example is actually a dynamic treatment regimen.
- 08:39It's dynamic because it depends on the evolving history
- 08:45of treatment and a biomarker,
- 08:48but initiation can also occur on the continuous timescale.
- 08:54So marginal structure models are suitable
- 08:56for addressing a time dependent treatment,
- 08:59but in order to use the models,
- 09:02we have to overcome some statistical challenges.
- 09:05The first challenge is that we need
- 09:08to estimate the causal effect of the actual timing,
- 09:11not compare protocols defined by some specific intervals,
- 09:15which is a lot of existing studies did.
- 09:21And also a lot of RCT reported these kinds of results.
- 09:25Because as I said earlier,
- 09:27it's just impractical for RCTs
- 09:30to report continuous time causal effects.
- 09:34We would also need to address complications
- 09:36posed by observational data.
- 09:39This is something I'll talk about later.
- 09:41And also we are dealing with censored survival outcomes
- 09:45that adds another layer of complexity.
- 09:49So these are four sensory patterns observed in our data.
- 09:53So our goal is to estimate the causal effect of A,
- 09:57treatment initiation time and T, death time.
- 10:01And we have almost 5,000 patients
- 10:05and only a very small proportion of patients
- 10:07have both observed A and T.
- 10:10A lot of patients don't have observed T.
- 10:13So their death time is censored by C.
- 10:16And we have about 20% of our patients,
- 10:18they don't even have observed A.
- 10:20Their treatment initiation time
- 10:22can be censored by death time or censored by C, dropout,
- 10:27for example.
- 10:28So our goal is to estimate effect of A on T,
- 10:32but we only have about 300 patients
- 10:35have complete information.
- 10:36Most of the patients we have incomplete information
- 10:40on either A or T or both.
- 10:43How do we probably use these incomplete information
- 10:46to draw causal inference about A on T,
- 10:49the effect of A on T,
- 10:51that's a problem we solve in this project.
- 10:56So three challenges.
- 10:59First one, treatment initiation time,
- 11:02this is observational data, so it's not randomly allocated.
- 11:06We don't know the actual functional form of causal effect
- 11:10of initiation timing or mortality rate.
- 11:13And we see that, Oh, there's incomplete information
- 11:17on either exposure or outcome or both.
- 11:22The general solutions we proposed
- 11:26that we first formulate a flexible structural
- 11:29causal hazard model
- 11:31that can capture the effects of both timing and duration
- 11:36of the treatment.
- 11:37And then we can derive methods
- 11:39to consistently estimate the model parameters
- 11:44under non random allocation and complex censoring patterns.
- 11:48Using the model outputs we can estimate the functional form
- 11:53of the causal relationship between our initiation timing
- 11:56and mortality.
- 11:59So some notation before we introduce our approach,
- 12:02note that we have three time to events in our study,
- 12:06we have treatment initiation time, death time,
- 12:09censoring time.
- 12:10We'll use T sub cap A to denote death time
- 12:13associated with the actual treatment time.
- 12:17And potential outcomes T sub small A,
- 12:20this is the death time.
- 12:21If treatment initiated at time A,
- 12:24and we use T infinity to denote death time
- 12:28if treatment is initiated beyond sometime point
- 12:31of our interest.
- 12:33Because of all the censoring,
- 12:36all the three time to events can be censored by one another.
- 12:40We use T star to denote the minimum of T and C.
- 12:44Delta T is a corresponding event indicator.
- 12:47So A star is the minimum of the three time to events.
- 12:51Delta A is a corresponding event in the data.
- 12:54Adopting the convention in the causal inference literature,
- 12:58we use overbar to denote history.
- 13:00So overbar L of T here is a covariate history
- 13:06up to a time T.
- 13:08Putting everything together,
- 13:09we have a set of observed data.
- 13:12Now back to the censoring patterns.
- 13:15In case one, we observed both A and T.
- 13:18So we would observe A, we would observe T sub A.
- 13:23Case two T is censored by C,
- 13:25so we observe A, we just know TA
- 13:28is going to be greater than C.
- 13:29Case three,
- 13:31we will observe A,
- 13:32but we know A is greater than TA.
- 13:35And case four we don't observe A, we don't observe T
- 13:38but we know A is greater than C and TA is greater than C.
- 13:43Okay.
- 13:44So now we propose a structural causal
- 13:46proportional hazards model
- 13:49to capture the survival effect of treatment initiation time.
- 13:54Lambda AT here is a hazard function
- 13:56for the potential outcome T sub A,
- 13:59we start from lambda infinity T right here.
- 14:01This is a reference hazard for T infinity.
- 14:05So we start from here.
- 14:07Once the treatment is initiated at A,
- 14:10there is an instantaneous effect of treatment initiation
- 14:14captured by the G1 function here,
- 14:17and the effect of staying on the treatment
- 14:20at any given time point T,
- 14:22is captured by the G2 function of ART duration.
- 14:27And the G3 function here captures the interaction
- 14:30between treatment initiation and treatment duration.
- 14:35So we leave this structural model relatively flexible.
- 14:41First, the reference hazard is left unspecified
- 14:44and the 3G functions, we also left them
- 14:47as unspecified smooth function
- 14:49of treatment initiation time duration and their interaction.
- 14:54So now we can parametrize these three functions
- 14:58using natural cubic splines,
- 15:00and by rewriting the risk function of our structural model,
- 15:07we can use beta this parameter
- 15:09to include the causal effects of ART initiation time
- 15:13on mortality hazard.
- 15:15The problem here now,
- 15:17our goal is to how do we obtain a consistent
- 15:20estimate of beta using observed a data?
- 15:24Once we have obtained that
- 15:26we can use beta hat to estimate the 3G functions,
- 15:30to understand the relative contribution of timing
- 15:33versus duration and interactions.
- 15:37And we could also estimate the causal does-response
- 15:41of initiation time versus mortality
- 15:43by relating the survival function to the hazard function.
- 15:46We can derive this from our structural model.
- 15:52And now we can also estimate the model-based
- 15:55optimal initiation time
- 15:57that will lead to the maximal survival probability
- 16:01at say 52 weeks after diagnosis.
- 16:07Okay, how to obtain a consistent estimate of beta.
- 16:10So first let's assume if A is randomly allocated
- 16:15and both A and T are observed,
- 16:17then we can write the partial likelihood score function
- 16:22of our structural model.
- 16:24And this is a sample average of score function
- 16:28is an unbiased estimator of the expectation
- 16:32of the score function.
- 16:33So E sub R here is the expectation
- 16:37under the randomized treatment assignment.
- 16:40So this would be an unbiased estimator function,
- 16:47and solving this unbiased estimating equation
- 16:50would give us a consistent estimator of beta.
- 16:55Now, if A is still randomly allocated,
- 16:58but T can occur before A,
- 17:02so A may be censored by T.
- 17:05In this case,
- 17:06we would need to break the mean
- 17:08of an individual score contribution into two parts.
- 17:11In one part A is observed.
- 17:13The second part is A is not observed.
- 17:16And then we can apply the law of total expectation
- 17:19to the second part.
- 17:21The inner expectation would be conditioning
- 17:24on the observed information.
- 17:26Then using this strategy and taking in account
- 17:30the survival hazard structure,
- 17:32we can revise the estimating equation.
- 17:37And by solving this to obtain a consistent estimate of beta.
- 17:42In the case of non random allocation of treatment,
- 17:46then if we want to estimate the causal effect of A on T,
- 17:50then we would have to make a key assumption,
- 17:56ignore ability assumption.
- 17:57Essentially the assumption says
- 17:59that the initiation of treatment at any given time T
- 18:04is sequentially randomized in the sense
- 18:06that as a potential outcome beyond this time
- 18:09is independent of treatment initiation.
- 18:12Conditioning on all covariate history up to T.
- 18:16So with this assumption,
- 18:19we will be able to use observed data
- 18:21to derive the causal effect.
- 18:23So say PR is the data distribution under randomized A,
- 18:27and PO is the data distribution.
- 18:30And they're not random allocation of A.
- 18:33Note that in both settings,
- 18:35there is a same set of observed data.
- 18:39And as long as the observed data under PR
- 18:43is absolutely continues with the observed data under PO.
- 18:48Now we can derive a random-nikodym derivative.
- 18:52And so Murphy's 2001 paper
- 18:55developed a version of R-N derivative
- 18:58that connects the distribution of the observed data
- 19:01under PR and under PO for discrete time
- 19:05and ordinary GEE score.
- 19:07Johnson's 2005 paper extended this version of R-N derivative
- 19:12to continuous time still for ordinary GEE score.
- 19:15In this paper we extended the R-N derivative
- 19:20for time to event setting.
- 19:23So this is a version of R-N derivative
- 19:26for survival data.
- 19:28The reason why we wanted to use R-N derivative
- 19:32is that we can then use it
- 19:34to derive an unbiased estimating equation
- 19:37using some weighted version of the observed data.
- 19:41So we can estimate the causal effect.
- 19:43So now we want to use this R-N derivative for survival data.
- 19:49We want to apply that to Cox score
- 19:51and to derive S rated estimating equation.
- 19:55That's a little bit more complex than the GEE score,
- 20:00but we can observe that the Cox score
- 20:02can essentially be represented in three averages.
- 20:06The one in blue,
- 20:08the one in orange and the whole average.
- 20:13And each average converges to its expectation.
- 20:17And as I showed earlier,
- 20:19we can always break the expectation into two parts.
- 20:23In one part A is observed,
- 20:25second part is not observed.
- 20:27For the second part,
- 20:28we can apply the total law of expectation,
- 20:32the law of total expectation,
- 20:35and recognizing the survival structure
- 20:40to derive the second part.
- 20:43And then we can apply the R-N derivative for survival data
- 20:46to each piece separately,
- 20:48to construct the unbiased score equation.
- 20:53So after some derivation, we would arrive at the weights
- 20:59and actually the weights come down in a very neat form.
- 21:03Essentially, it suggests that for patients
- 21:06who have initiated treatment by time T,
- 21:10we would weight them by the marginals density function
- 21:13of A divided by the conditional density of A
- 21:18given their covariate history after time T.
- 21:23And for those who are censored,
- 21:25so not initiated by the time T,
- 21:28we would weight them by some survival function
- 21:31of the treatment initiation process.
- 21:36And then by applying this weighting scheme,
- 21:39we will be able to derive a weighted estimating equation.
- 21:43And just a note that we have to apply
- 21:46the same weighting scheme to the people
- 21:49who are still in the risk set at any time T.
- 21:54And so now that said, previously we have assumed
- 21:58there's no censoring.
- 21:59Now with censoring,
- 22:00we need to assume another similar assumption,
- 22:06similar to the ignore ability assumption,
- 22:09and then using the similar strategy
- 22:12to derive another set of weight for censoring.
- 22:16For those who stay, remain in the study,
- 22:18we would weight them by the survival function
- 22:23for censoring.
- 22:24And this would lead to the final modification
- 22:27of the estimating equation for beta.
- 22:29So censoring contributes information about the parameter
- 22:33in two ways,
- 22:35FC is observed as the person is actually censored.
- 22:40It contributes to the risk set up to C.
- 22:42If C is not observed, so C could be censored by T.
- 22:46If death's occurred,
- 22:47then it contributes to the individual partial likelihood
- 22:50to weight for C but evaluated at death time.
- 22:55Okay, now we know how to weight.
- 22:56Back to the four censoring patterns.
- 22:59The first one, both A and T are observed.
- 23:02We would weight them by the first set of weight for A
- 23:06evaluated at A,
- 23:08T occurred, so the weight for C but evaluated at T.
- 23:13Second case, T is not observed,
- 23:17A is observed.
- 23:19So first set of weight for A evaluated at A
- 23:23and C just contributes information to the risks set.
- 23:28Third case, A is not observed,
- 23:31so second weight for A evaluated at T.
- 23:35And weight for C, censoring evaluated at T.
- 23:40The fourth case or final case, A is not observed,
- 23:44again, second set of weight for A,
- 23:47but evaluated at C, and C also contributes to the risks set.
- 23:51Okay, so now we know how to weight.
- 23:54We would have to estimate the weights.
- 24:00The approach we used in the paper
- 24:02is that we model the intensity processes
- 24:06associated with the two counting processes,
- 24:09one for A, one for C.
- 24:12And then when we fit Cox proportional hazards models
- 24:15for the two intensity processes,
- 24:17we use fitted hazard to estimate the weights.
- 24:21We use empirical cumulative hazards
- 24:24to estimate the conditional density and function.
- 24:27And for the marginal density function,
- 24:29we use some nonparametric Nelson-Aalen estimator,
- 24:32and use similar fashion to estimate rates for censoring.
- 24:36Then we apply our methods to the AMPATH data.
- 24:39AMPATH is a large HIV care program based in West Kenya,
- 24:44our data has almost 5,000 patients
- 24:47and for covariates, we have demographic information
- 24:51and some disease-specific information.
- 24:54Some of them are time varying like, weight, the CD4,
- 24:56these are time varying variables.
- 24:59We categorize the baseline CD4 subgroups into two groups,
- 25:06the less than, or below 50 group,
- 25:08this is the highest risk group.
- 25:11So CD4 the higher, the better.
- 25:14So below 50, this is a highest risk group.
- 25:16And between 200 and 350,
- 25:19there's relatively healthy patients.
- 25:21The reason we categorize them into three groups
- 25:23is because the program guidelines
- 25:26are based on these subgroups
- 25:28and RCT is reported results for below 50 group.
- 25:33We want to compare our results to our CT findings.
- 25:37So this plot shows the three estimated G functions.
- 25:42The G1 A here suggests that the instantaneous effect
- 25:47of a treatment initiation has a U shape,
- 25:50achieving maximum benefit, or the lowest mortality hazard
- 25:53at just about 10 weeks.
- 25:56And after that, the longer the treatment is delayed,
- 26:00the less the benefit of the treatment initiation.
- 26:03And this is the effect of duration,
- 26:06in general, it says that the longer
- 26:08you stay on the treatment, the more benefit you get.
- 26:11There's an upward trend for the interaction effect.
- 26:15Essentially suggesting that delayed treatment initiation
- 26:19would reduce the benefit associated
- 26:22with long ART duration.
- 26:27And so the net causal effect of treatment initiation
- 26:31is summarized in this plot.
- 26:33Top panel shows the mortality rate at one year
- 26:38versus treatment initiation time.
- 26:40Bottom panel compares immediate initiation
- 26:44versus delayed initiation at A.
- 26:48So we can see that the benefit of early initiation
- 26:53is most pronounced for the CD4 below 50 group,
- 26:57or the highest risk group.
- 26:58And the curves here are pretty flat,
- 27:01suggesting that there's not much benefit
- 27:03of early initiation for relatively healthy patients.
- 27:09Several advantages for this approach.
- 27:12It's easy to get optimal initiation time
- 27:17based on the model outputs.
- 27:20And we could also use the model outputs
- 27:22to emulate comparisons between regimens reported in RCTs.
- 27:27So we could mimic random allocation
- 27:32of treatment initiation time to specific intervals
- 27:36by assuming a distribution for A,
- 27:39for treatment initiation time A,
- 27:41that is independent of covariates and outcome
- 27:44and compare interval specific mortality rates
- 27:49and draw inferences about treatment initiation.
- 27:53But with the continuous time marginal structure model,
- 27:56we'll also be able to conduct a higher resolution analysis
- 28:00that can potentially generate new insights
- 28:03in relation to a randomized control trial.
- 28:09For the sake of timing,
- 28:10I just gonna briefly talk about the simulation.
- 28:14We conduct simulation to examine
- 28:15the finite-sample properties of weighted estimators,
- 28:24we evaluate sensitivity of our estimators
- 28:27to the violations of the ignore ability,
- 28:30or no unmeasured confounding assumption,
- 28:32but we only considered confounding at baseline.
- 28:35So the sensitivity analysis strategy
- 28:39for time-varying confounding,
- 28:41especially with the censored survival outcome
- 28:44is kind of very complex topic,
- 28:48and we were still working on this project right now,
- 28:51but in this paper we just consider confounding at baseline.
- 28:56Under random allocation of treatment,
- 28:59our estimator produced a new zero bias
- 29:02and nominal coverage probability,
- 29:05in the presence of measured confounding,
- 29:07it eliminated nearly all the biases
- 29:09and provided close to nominal coverage probability,
- 29:13but in the presence of unmeasured confounding,
- 29:16there was bias in our estimator.
- 29:19And the biases were in proportion
- 29:22to the degree of measured confounding.
- 29:26Okay,
- 29:27so moving to the second example,
- 29:29this is a continuous time dynamic treatment regimen
- 29:34of the form,
- 29:34initiate treatment when a biomarker crosses a threshold.
- 29:40It's dynamic treatment regimen
- 29:42because it depends on evolving history of treatment
- 29:45and a tailoring variable.
- 29:47So in our case, CD4 is a tailoring variable.
- 29:50That means we make our treatment decision
- 29:53based on this variable.
- 29:55A little bit different from our previous motivating example.
- 30:00The outcome interest is different.
- 30:04This is a pediatric data.
- 30:05So for the kids, the mortality rate is very low
- 30:09and our data I think it's around 3%.
- 30:12And for kids, we're also interested
- 30:14in their CD4 measurements,
- 30:17because CD4 is important marker of immune system function
- 30:21and both outcomes, both mortality rate and CD4
- 30:24are sparsely measured in our data,
- 30:27but we are interested in both.
- 30:30Other than that, we also have complications
- 30:33posed by observational data.
- 30:36So this is a picture of nine randomly selected individuals
- 30:41from our data,
- 30:43X axis here, follow-up time in days,
- 30:46Y axis here square root of CD4,
- 30:49purple line is end of follow-up,
- 30:53two gray lines here mark one year
- 30:57and two years post diagnosis.
- 31:00Empty circles here mean that the patient
- 31:04has not been treated.
- 31:06Solid circles, mean that they're on the treatment.
- 31:09So we can see that there's a lot of variability
- 31:12in terms of the treatment initiation time.
- 31:16And some people are followed much longer
- 31:20than some other patients.
- 31:22And the follow-up time is pretty irregularly spaced
- 31:29and overall the CD4 measurements are quite sparse,
- 31:34and there's also incomplete information
- 31:36for example, these two they either died
- 31:41or were lost to follow up
- 31:44before they even got a chance to be treated.
- 31:47So there's also a lot of complication in the data.
- 31:51There's a continuous time measurement
- 31:54of the treatment initiation.
- 31:55It just happens all over the place.
- 31:58The longitudinal outcome of interest are sparsely measured,
- 32:03leading to incomplete data.
- 32:05There's also a censoring due to dropout or deaths.
- 32:09So our general solution is that we'll use weighting
- 32:11to handle time-varying confounding.
- 32:14And will show how to derive a continuous time versions
- 32:17of the weights.
- 32:19For the missing outcomes
- 32:21that is caused by sparse measurement and censoring
- 32:24we'll use imputations from a model of the joint distribution
- 32:28of CD4 and mortality.
- 32:30And because we're interested in both mortality status
- 32:33and CD4, we'll develop a composite outcome.
- 32:38So our general approach is to emulate a randomized trial
- 32:42in which we would randomize individuals
- 32:45to follow specific DTR Q.
- 32:48And Q equals zero means never treated,
- 32:51because CD4 can never drop below zero.
- 32:54Now, Q equals infinity means treat immediately.
- 32:58So after randomization,
- 32:59all the individuals will be followed
- 33:02for a fixed amount of time,
- 33:05at which point, say T star,
- 33:07both their mortality status.
- 33:10And among those who are alive at T star,
- 33:14their CD4 count will be assessed.
- 33:18So what define a composite outcome XQ,
- 33:21that is the product of the test indicator
- 33:25and the potential CD4.
- 33:28So the cumulative distribution of this composite outcome
- 33:32is a useful measure of treatment utility,
- 33:36because it has appointments at zero
- 33:39corresponding to mortality rate.
- 33:41Thereby capturing both mortality status
- 33:45and CD4 count among survivors at T star.
- 33:52So for example, the probability of a positive XQ,
- 33:56that's the survival fraction,
- 33:58and the probability of XQ greater than X,
- 34:02that's the fraction of survivors with CD4 above X.
- 34:08Okay, so similar to the first motivating example,
- 34:12we again have three timed events.
- 34:16Death time, censoring time, treatment initiation time.
- 34:19And now we have a tailoring variable, CD4 count.
- 34:23So the CD four process is defined for all continuous time,
- 34:27but it's just measured at discrete times.
- 34:30And we also have a P by one covariate process.
- 34:35Using a convention in the DTR literature,
- 34:38we assume that the treatment decision
- 34:40is always made after observing the covariate history
- 34:45and the CD4 count.
- 34:48Putting everything together,
- 34:50we have a history information indicator.
- 34:55For each individual, we'll have a observed a data process.
- 35:00And just note that each person
- 35:02can have a different lens of followup
- 35:04at different time points.
- 35:08Our goal is to evaluate the effect of DTRs,
- 35:12but we're dealing with observational data,
- 35:14so we'll have to map the observed treatment regimen
- 35:18to specific DTRs that we are interested in evaluating.
- 35:22Essentially we'll follow the deterministic function
- 35:28to create the mapping.
- 35:30Essentially there are three rules.
- 35:32First rule says not to treat the person
- 35:34if the person has not yet initiated treatment
- 35:38and their CD4 has not fallen below Q,
- 35:41or has not been observed.
- 35:44Second rule says, treat this person if their time T,
- 35:47CD4 has fallen below Q for the very first time.
- 35:51Once treated, always treat them.
- 35:54Following these three rules,
- 35:56we'll be able to create a regimen specific compliant process
- 36:00for each individual in the data.
- 36:02So essentially if the rule says treat,
- 36:05and if the person is actually treated by the time T,
- 36:09then this person is compliant at time T.
- 36:13If the rule says do not treat,
- 36:14and the person was not treated at the time T,
- 36:17so this person is still compliant to the rule.
- 36:20And so we'll be able to observe a compliant process
- 36:24for each person.
- 36:25Here a simple example to show you how to create the mapping.
- 36:29For example, we're interested in Q equals 350.
- 36:33This person came in at baseline,
- 36:35had a measurement 400 above the threshold.
- 36:39The rule says do not treat,
- 36:40the person was not treated.
- 36:41At this point, it's compliant with the rule.
- 36:44Next visit, no new CD4 observation.
- 36:48So the rule says do not treat,
- 36:50the person's still not treated,
- 36:52still compliant at this point.
- 36:53Third visit, the person's CD4 drops to 330,
- 36:58which is below the threshold for the very first time,
- 37:01the rules are start treating this person,
- 37:05the person was actually treated.
- 37:06So compliant at this point.
- 37:10Next visit the rule says once treated always treat them,
- 37:14the person kept being treated.
- 37:16So this person was compliant with the rule 350
- 37:20all throughout his or her followup.
- 37:23Next example, the first two rows are the same.
- 37:27The third visit, the person's CD4 jumps to 450,
- 37:33which is above the threshold.
- 37:35The rule says do not treat,
- 37:37but on the contrary, the person was actually treated
- 37:40and kept being treated.
- 37:42So from this time point onward,
- 37:46the person was not compliant with this rule.
- 37:51Okay, so that's just some simple example
- 37:55to show how to create the mapping.
- 37:58With missing outcomes for those alive
- 38:01at the target measurement time T star,
- 38:05the observed outcome XI is the CD4 measurement at T star.
- 38:11But because of CD4 is sparsely measured
- 38:14and irregularly spaced,
- 38:17Z of T star is directly observed
- 38:19only when the person's followup time is exactly at T star.
- 38:25So in this case, it is pretty common
- 38:27to predefine a interval and capture the CD4 that is measured
- 38:37at the time closest to the target measurement time.
- 38:41But even using this strategy,
- 38:43there's still a possibility that there is no measurement
- 38:48in predefined interval.
- 38:51Then we say this person has a missing outcome.
- 38:54And it's also possible that the person dropped out
- 38:57before TA.
- 38:59And so in this case, the outcome is also missing.
- 39:04For these missing outcomes, our general strategy
- 39:08is to use multiple imputation.
- 39:10So we would specify and fit model
- 39:12for the joint distribution of the CD4 process
- 39:16and the mortality process.
- 39:18For those known to be alive,
- 39:20but without a CD4 measurement,
- 39:23we would impute the CD4 count from the fitted CD4 sub-model.
- 39:28And for those missing the CD4,
- 39:31because of right censoring,
- 39:33we would calculate the mortality probability
- 39:37from the fitted survival sub-model,
- 39:39and then impute the death indicator
- 39:42from the Bernoulli distribution
- 39:44with this calculated probability.
- 39:46If the death indicator was imputed to be zero,
- 39:49then we further impute a CD4 count for this person.
- 39:52Otherwise we'll set X to be zero.
- 39:56And again, we would have to assume
- 39:59some standard causal inference assumptions
- 40:02in order to draw causal effects about the DTRQ
- 40:08using observational data.
- 40:09And we can estimate and compare DTRs along a continuum.
- 40:15We can formulate a causal model
- 40:17for the smooth effect of Q on the task quantile of XQ.
- 40:22This is our composite outcome with separate parameters
- 40:25capturing the effect of treat immediately,
- 40:29and the effect of never treat.
- 40:31And then we can parametrize the model using splines of Q
- 40:35for the third term here, to gain statistical efficiency.
- 40:41And we can obtain a consistent estimator of effect of Q
- 40:47by solving the weighted quantile regression
- 40:50estimating equation.
- 40:53So what should be the weights?
- 40:57First, we assume there's no dropout or death
- 40:59prior to the target measurement time.
- 41:02In the discrete time setting with common time point,
- 41:07the form of the weights have already been done.
- 41:10It has been derived in several papers.
- 41:14Essentially, the denominator of the weight
- 41:17is this conditional probability.
- 41:19It's a conditional probability of the person being compliant
- 41:22all throughout the follow up, given the covariate history.
- 41:29So if we have a common set of discrete time points,
- 41:34it's a cumulative a product of the conditional probability
- 41:38of this person being compliant at every time point.
- 41:42And essentially if the rule says treat,
- 41:46it's a condition of probability of the person
- 41:48actually being treated at this time point,
- 41:51if the rule says not treat,
- 41:53as a conditional probability of this person not treated
- 41:57by this time point.
- 41:58So in order to estimate this probability,
- 42:02we just need to model the observed
- 42:04treatment initiation process among those regimen compliers,
- 42:09but this is for discrete time setting.
- 42:12What would be the continuous time weights?
- 42:14We note that the occurrence of treatment initiation
- 42:18in a small time interval T and T plus TD
- 42:21is actually a Bernoulli trial with outcome DNA of T.
- 42:28So then we can rewrite this probability,
- 42:31this probability here,
- 42:33in the form of individual partial likelihood
- 42:37for the counting process of A.
- 42:40And now we note that when DT becomes smaller and smaller,
- 42:45this finite product approaches a product integral.
- 42:50So then this finite product can be rewritten
- 42:54as a final product over jump times of the counting process
- 42:58for A times the survival function.
- 43:02And then by recognizing that each individual
- 43:05had at most one jump at exactly AI.
- 43:10Now we can further reduce this probability to this form.
- 43:15Which suggests weighting scheme.
- 43:18Essentially it says for those who have been treated
- 43:23by a T star, we would weight them
- 43:25by the conditional density function of A.
- 43:28For those who haven't been treated by the time T star,
- 43:32we would weight them by the survival or function of A.
- 43:36So if you recall the weighting scheme
- 43:39for the first motivating example, this is exactly the same,
- 43:43the same rating scheme,
- 43:45but we took different approaches.
- 43:47The first example,
- 43:48we use a random Aalen derivatives
- 43:51to derive the weighting scheme.
- 43:52The second project we derive the limit
- 43:58of the finite product,
- 44:00but using different approaches,
- 44:02we arrive at the same weighting scheme.
- 44:06And so similarly we modeled the intensity process
- 44:10of treatment initiation.
- 44:12We estimate the weights.
- 44:15So if there was a censoring or death
- 44:18prior to target measurement time,
- 44:20we would have to assume once lost to follow up
- 44:23at a time prior to T star,
- 44:25the treatment and regimen status remain constant.
- 44:28And this way we will just estimate the weights
- 44:31up to a time point CI, and if the person died before T star,
- 44:37then we would only evaluate compliance
- 44:40and treatment initiation processes up to time TI.
- 44:45Okay, so for missing outcomes,
- 44:49we propose a joint modeling approach.
- 44:52We specify a two-level model for the observed CD4 process.
- 44:56The first level,
- 44:57the observed CD4 process is a true CD4 trajectory
- 45:02plus some arrow process.
- 45:04The second level,
- 45:05we relate the true CD4 trajectory
- 45:07to baseline characteristics and treatment initiation time,
- 45:13and some subject specific random effects,
- 45:16capturing subject-specific deviations
- 45:19from the mean trajectories.
- 45:22And now we propose a hazard model for deaths
- 45:26uses the true CD4 trajectory as a covariate
- 45:30linking the two processes,
- 45:33Linking the death process and linking with a CD4 process.
- 45:38Now we use the joint model
- 45:39to impute the missing outcomes
- 45:43and estimate the variance of the target estimator
- 45:46using Rubin's combination wall.
- 45:50So we applied this method to the IeDEA dataset.
- 45:54IeDEA is another HIV consortium based in West Kenya.
- 46:00So we have almost 2000 data.
- 46:03We see that the CD4 is pretty sparsely measured
- 46:07and death rate is low around three and 4%.
- 46:12Most of patients have been treated by one year
- 46:16and we have a set of covariates.
- 46:21Some of them are time varying, some of them are time fixed.
- 46:26We proposed three target estimators,
- 46:30so first we're interested in mortality proportion.
- 46:34We're also interested in the median of the distribution
- 46:38of the composite outcome XQ.
- 46:41We also looked at CD4 among survivors,
- 46:45but this estimator does not have a causal interpretation
- 46:49because it conditions on having survived two T star.
- 46:53So it only measures association,
- 46:55but the first two estimators have causal interpretations.
- 47:03So we first look at the effectiveness
- 47:05of five specific regimens for both one year and two years
- 47:10after diagnosis.
- 47:12We can see that the immediate treatment initiation
- 47:18lead to significant lower mortality rate
- 47:21and significantly higher median values
- 47:24of the composite alcohol compared to delayed treatment.
- 47:30And the never treat initiation
- 47:32will lead to a significantly higher mortality probability.
- 47:37And for those who do survive to T star,
- 47:41their CD4 count is higher.
- 47:44So resulting higher to theta Q2 and higher theta Q3
- 47:48compared to other delayed treatment regimen.
- 47:54So this may suggest that those who do survive
- 47:57to T-star without any treatment,
- 48:00maybe they are relatively healthier
- 48:02at the beginning of the followup.
- 48:05Okay, and then we also plot the dose response curve
- 48:10of the median value of the composite outcome
- 48:14versus DTR Q,
- 48:17also suggests that the immediate treatment
- 48:22would lead to significantly higher median values of XQ,
- 48:27and also as illustration
- 48:29of the gained statistical efficiency
- 48:30by modeling the smooth effect Q on the quantile of the XQ.
- 48:37The variance in the one year outcome
- 48:40associated with Q equals 350, achieved about 15% reduction
- 48:46compared to that from the regimen specific estimates.
- 48:51So we gain a bit of our statistical efficiency
- 48:55by modeling the smooth effect.
- 49:00So there are several strands of continuous time
- 49:02marginal structure model.
- 49:04We see that we can derive, using different approaches,
- 49:08closed form of weights for continuous-time treatment.
- 49:12It can handle complex dataset on its own terms
- 49:16without having to artificially align measurement times,
- 49:20which could possibly lead to loss of information.
- 49:24It is amenable to many different outcomes.
- 49:27We've used the survival outcomes,
- 49:28we've used composite outcomes.
- 49:31You can also handle many data complications
- 49:34introduced by various censoring patterns
- 49:37within the same marginal structure model.
- 49:41So these are the strengths,
- 49:43but there are also limitations with this approach of course.
- 49:46One notable limitation is extreme ways,
- 49:50which could possibly lead to unstable estimates.
- 49:54So how to address this issue,
- 49:57especially for time varying confounding
- 50:00with censored outcome, this would be a challenging task,
- 50:04but if we can solve this issue,
- 50:06it might be a very important contribution to the field.
- 50:10So this is something my colleagues and I
- 50:14have been thinking about and working on for some time.
- 50:18Another limitation is that we know
- 50:22that weighting-based estimator is less efficient
- 50:25than the so-called G methods.
- 50:28The G computation, G estimation,
- 50:30and both G methods require integrating
- 50:32over the space of longitudinal confounders.
- 50:35So the G methods are computationally
- 50:38much, much more expensive
- 50:40than the marginal structure model-based methods.
- 50:44And as far as I know,
- 50:48currently there's no continuous time version
- 50:50of the G computation methods.
- 50:53Judith Lok has a paper, back in 2008.
- 50:56She developed theory for continuous time G-estimation,
- 51:00but I have yet to see a practical implementation
- 51:03of this method.
- 51:05So this could be another avenue for future research,
- 51:11how to increase efficiency of the continuous time
- 51:15weighting-based methods.
- 51:18And here's some key references.
- 51:22Thank you.
- 51:24- Thank you Liangyuan for this very interesting
- 51:25and comprehensive presentation.
- 51:29Let's see if we have any questions from the audience.
- 51:32If there's any questions,
- 51:34please feel free to unmute yourself and speak
- 51:36or type in the chat.
- 51:43- [Donna] Thanks, it was a very interesting talk.
- 51:45This is Donna Spiegelman.
- 51:47- Hi, Donna.
- 51:48- Yeah, hi.
- 51:49I was wondering I might've missed it,
- 51:51but did you say much about estimating the variance?
- 51:55I see you have (indistinct) around the curve,
- 51:58so you must derive the variance.
- 52:01So I'm wondering if you could say a little bit about that
- 52:03or a little more about that
- 52:05if I missed what you did say.
- 52:07- Sure, sure, sure.
- 52:09So for this one, this is the second example,
- 52:12for this one we have multiple amputation
- 52:15and we also have weighting.
- 52:20So with weighting part,
- 52:22the variance was estimated using bootstrap
- 52:26for multiple amputation, and then we combined,
- 52:29so it's a bootstrap nested within multiple imputation.
- 52:33So then we use the Rubin's combination role
- 52:35to estimate the total variance.
- 52:38For the first example, we actually used a bootstrap,
- 52:45and the coverage probability was actually okay.
- 52:50It's good for the estimator.
- 52:52- Did you think about asymptotic variants derivations?
- 52:56- I did.
- 52:58It was a very difficult task,
- 53:03there's a story about our first paper
- 53:05found that about it.
- 53:10It was first submitted to Jaza
- 53:12and then they asked about the asymptotic variants
- 53:16about the estimator.
- 53:18And it's quite complex because they involve the splice
- 53:22and involves the survival data.
- 53:25And we have already approved as a consistency,
- 53:34and it also involves optimization.
- 53:38So it's just comes to-
- 53:40- What's the optimization piece.
- 53:43- Oh, it's the model based optimal treatment initiation time
- 53:47that will lead to the maximum survival
- 53:49at predefined time points.
- 53:54Right, so they are interested in the optimization.
- 53:58So the inference about the optimized
- 54:00treatment initiation time.
- 54:02We did some empirical evidence
- 54:04for like the largest sample convergence rate,
- 54:08but we weren't successful at deriving asymptotic variants.
- 54:14So that's another piece, I think maybe,
- 54:18I don't know.
- 54:19We had this discussion among colleagues
- 54:21and also my advisor at the time,
- 54:24we just not sure about whether it's worth the effort
- 54:28to go and do that route.
- 54:30- It's probably way more complex
- 54:32than just the usual derivation.
- 54:34'Cause you do have like two weighting models,
- 54:36which are also survival models,
- 54:40and also the derivation that these variances
- 54:42sometimes can be specific to the choice
- 54:44of these (indistinct) models.
- 54:46And so if you have a variance and the cup's model,
- 54:49it does not apply to other forms of models,
- 54:52I guess it's really a trade-off right?
- 54:54- Yeah, it is a trade off.
- 54:58It's still an open question and nobody had done it yet,
- 55:03but just, whether you're thinking it's was the effort
- 55:06just to devote a couple of years to work on that.
- 55:10- So was bootstrap time consuming for these datasets,
- 55:15for this data analysis, or they're pretty manageable.
- 55:18- They're pretty manageable.
- 55:20And it looks complicated because we have to weight everybody
- 55:23that had event.
- 55:24We also have to weight everywhere in the risk set
- 55:27at any time point.
- 55:28So it looks pretty complex, but still manageable.
- 55:35Another reason is because we use parametric models.
- 55:38If we wanted to,
- 55:41I'm not aware of any machine learning algorithm
- 55:46that can handle survival data,
- 55:48but also with time varying covariates,
- 55:52that's something I'm also thinking about.
- 55:54Like, if we use those algorithm
- 55:56might be more time consuming,
- 55:59but with just a parametric models, it's pretty manageable.
- 56:03- And when you're bootstrapped,
- 56:04you go back to the weight models
- 56:06and refit the weight models every time?
- 56:08- Yeah.
- 56:10- But the variable is pre-determined.
- 56:12So that's what you mentioned, machine learning.
- 56:15So the variables are predetermined
- 56:17and they're functional forms in the model,
- 56:19but the coefficients that correspond to them
- 56:22are re estimated for each bootstrap.
- 56:24- Very estimated.
- 56:26Right, right, right.
- 56:27Exactly.
- 56:28Yeah.
- 56:29- Great question.
- 56:30- Yeah.
- 56:31So a lot of open questions still.
- 56:35- So any other questions from the audience?
- 56:41- I have another comment.
- 56:44So by getting back to this,
- 56:47that you re estimated the coefficients
- 56:49for the weight models.
- 56:50So in sort of the standard marginal structural model,
- 56:54the variability due to those weight models is ignored.
- 56:59And the robust variance is used
- 57:01and said to be an overestimate,
- 57:02implying that if you took that variation into account,
- 57:06you'd get a smaller variance
- 57:08and you might see the same thing here with your bootstraps.
- 57:11If you took the weight models as fixed,
- 57:15you might find that you have a less efficient estimator,
- 57:18which is kind of interesting
- 57:20just in terms of say a methods paper to show,
- 57:23because there's different ways to do bootstraps,
- 57:26but here you're automatically taking the estimation
- 57:30of the weight models into account,
- 57:32which is not saying that say the classic paper
- 57:35by Hernan in epidemiology,
- 57:38that's ignored and the robust variance is recommended.
- 57:42- Hmm.
- 57:43It's a very great comment.
- 57:46Something I have to think about.
- 57:47So you're saying that in each bootstrap,
- 57:52when we estimate the weight model, we fix the weight model.
- 57:57So the coefficients from the weight model stay fixed-
- 58:01- Yeah, so you don't even do a bootstrap for that.
- 58:03You basically hold the weight model as a constant,
- 58:06and then you'd-
- 58:07- Robust variance.
- 58:09- Yeah, you use the robust variance,
- 58:11which I guess it's a little tricky
- 58:13because now you don't have the robust variance
- 58:15because you're not using it,
- 58:16but it seems the bootstrap analog of the approach taken
- 58:21would be to just fit the weight model once,
- 58:24treat that fixed unknown,
- 58:27and then only bootstrap on the outcome model.
- 58:31- Right, right.
- 58:32Yeah.
- 58:33- [Fan Li] Totally. Yeah.
- 58:35- Interesting.
- 58:36Take that in as a note.
- 58:39- So I do have a question as well.
- 58:42I think Liangyuan you had presented two applications
- 58:44at the HIV observational studies,
- 58:47do you see the application that these new methods
- 58:51to other areas as well
- 58:54to solve the other questions? - Yeah.
- 58:57Yeah, actually this is not pertaining to HIV area.
- 59:02It's actually in the public health areas.
- 59:06A lot of questions are involving
- 59:13this statistical formulation.
- 59:16So for example,
- 59:18I've been collaborating with an epidemiologist at Columbia.
- 59:23They are doing cardiovascular research.
- 59:27So one research question is that,
- 59:31I think it's blood pressure lowering intervention.
- 59:37So blood lowering innovation is very useful
- 59:40for preventing cardiovascular diseases,
- 59:45but they don't know.
- 59:47And there also a lack of randomized control trials.
- 59:50What is the optimal threshold
- 59:53to start giving the blood lowering treatment?
- 59:57So this is exactly the same form
- 59:59as our second motivating example.
- 01:00:01Like what is the optimal CD4 threshold
- 01:00:04to start the HIV treatment?
- 01:00:06And their question is what is the optimal threshold
- 01:00:09to start the blood lowering treatment?
- 01:00:13So I think there's a lot of possibility
- 01:00:20as to apply these kinds of methods
- 01:00:22in other health research area.
- 01:00:24- Yeah, it's a huge controversy
- 01:00:26in terms of the treatment of hypertension,
- 01:00:28what's the optimal blood pressure
- 01:00:31to start antihypertensives.
- 01:00:33And I think there was a very large trial
- 01:00:35that showed that it was better to start it
- 01:00:38at a much earlier threshold than what current practices.
- 01:00:42And it's very troublesome for people around the world
- 01:00:47because these medicines are expensive.
- 01:00:49And if you see now,
- 01:00:51like another like 40% of the population
- 01:00:54should now be initiated a antihypertensive medication,
- 01:00:58well, most countries can't even afford that.
- 01:01:01So the implications of these different thresholds
- 01:01:05is a very big topic of sort of substantive research
- 01:01:09and debate right now.
- 01:01:10- Well, that's great to know,
- 01:01:12there's urgent need for that.
- 01:01:14(indistinct)
- 01:01:16- Totally.
- 01:01:17All right, I think we are at the hour,
- 01:01:19so thanks Liangyuan again for your great presentation
- 01:01:25and if the audience has any questions,
- 01:01:28I'm sure Liangyuan is happy to take any questions offline
- 01:01:31by emails.
- 01:01:33And I think this is the final seminar of our fall series,
- 01:01:38and I hope to see everyone next spring,
- 01:01:40have a good holiday.
- 01:01:42Thank you.
- 01:01:43- Thank you.
- 01:01:44- Bye. - Bye.