Addressing Bias in Causal Effects Estimated Under Misspecified Interference Sets, with Application to HIV Prevention Trials

April 19, 2024
  • 00:01<v Laura>All right, let's get started.</v>
  • 00:03Thank you, everyone, for coming.
  • 00:05So let me introduce our speaker today.
  • 00:08Ariel Chao is a PhD student in the department
  • 00:11of biostatistics, advised by me and Donna Spiegelman.
  • 00:16So let me say a few things about her, about Ariel.
  • 00:21So I've been working with Ariel for three years now
  • 00:24and I have to say it's been a real pleasure.
  • 00:27Ariel is an extraordinary student, very patient,
  • 00:30definitely her characteristic and independent.
  • 00:34I've always been impressed by her creativity
  • 00:37and the way she would always find solutions by herself.
  • 00:41We have been having issues
  • 00:42with getting data from our collaborators
  • 00:45and she never gave up and found ways to keep working
  • 00:48on what she had while waiting.
  • 00:51So she deeply cares about the applications
  • 00:53she's working on and she has a great intuition.
  • 00:57I've also been impressed
  • 00:59by how she can work on several things at the same time.
  • 01:03And as you will see today, she's very talented
  • 01:06and I wish her the best for her future career.
  • 01:10Before that, today, she will present her work
  • 01:13on addressing bias in causal effects,
  • 01:15estimated under misspecified interference sets
  • 01:18with application to HIV prevention trials.
  • 01:22So let's give Ariel a warm welcome.
  • 01:26Ariel, (crackling drowns out speaker).
  • 01:30<v Speaker>Let me just add,</v>
  • 01:32Ariel has a lot of material to present
  • 01:34so we decided to not take questions while she's talking
  • 01:38or she'll never get through it, necessarily.
  • 01:40And then we're gonna allow
  • 01:42for around 10 minutes at the end for questions.
  • 01:44So write down the questions
  • 01:45and then we'll try to give as many people a chance
  • 01:48to ask her questions at the end.
  • 01:51<v Laura>I will keep track of it.</v>
  • 01:53<v Speaker>I'm just monitoring the chat.</v>
  • 01:55<v ->Oh yes, can someone, 'cause I don't think I can see you.</v>
  • 01:58<v Speaker>I can see you.</v>
  • 02:01<v ->All right.</v>
  • 02:02So thank you, Laura, and it's been a real pleasure
  • 02:05working with you as well.
  • 02:06So today, I'll be presenting on my dissertation research,
  • 02:10which is on addressing bias in causal effects
  • 02:14estimated under misspecified interference sets.
  • 02:16And we've applied our methods through the analysis
  • 02:19of HIV prevention trials.
  • 02:23So as an introduction, so interference
  • 02:26or spillover is often present in either randomized
  • 02:29or observational studies.
  • 02:31Where, by interference, we mean
  • 02:32that a participant's outcome can be determined
  • 02:35by not only their own exposure
  • 02:37but also the exposure of others.
  • 02:38So a common example is with vaccines.
  • 02:42So say, my disease status is not only affected
  • 02:44by my own vaccination status,
  • 02:46but also the vaccination status of others around me.
  • 02:49And in the context of HIV prevention trials,
  • 02:52it's been found in several network-based studies
  • 02:54that when only some participants of a network
  • 02:57are trained on say HIV knowledge
  • 03:00or safe practices, that the members
  • 03:03who weren't trained in the network
  • 03:05also demonstrated increased knowledge
  • 03:07and reduced risk behaviors.
  • 03:09And this is known as the spillover effect.
  • 03:12So causal inference that is conducted in the presence
  • 03:17of interference is often done under assumptions
  • 03:19on the extent and mechanism of interference.
  • 03:22And typically, this will require a specification
  • 03:25of an interference set for each participant.
  • 03:28Where, by interference set,
  • 03:29we mean the group of individuals
  • 03:32who can affect the outcome of that participant.
  • 03:35And then to this interference set,
  • 03:37we also typically apply an exposure mapping function
  • 03:41that will take the exposure vector
  • 03:43observed in this interference set
  • 03:45and map it to some scalar quantity.
  • 03:47And we'll see some examples of this later.
  • 03:51So in the existing literature, interference sets
  • 03:53are typically assumed to be correctly specified
  • 03:56so that the exposures
  • 03:57that are mapped from these interference sets
  • 03:59are also correctly measured.
  • 04:01But often, correctly specifying
  • 04:04an interference set is challenging.
  • 04:06For example, networks can be mismeasured
  • 04:10and when interference sets are misspecified,
  • 04:12we show under various settings that causal effects estimated
  • 04:15by usual approaches are typically biased.
  • 04:18And there have been several publications
  • 04:20that have addressed this issue.
  • 04:22And the majority of these publications aim
  • 04:25to first estimate the true networks
  • 04:27and then using these estimated networks
  • 04:29to estimate the causal effects.
  • 04:31And there have also been methods proposed
  • 04:32for a sensitivity analysis as well.
  • 04:36However, we pursue a different approach where we assume
  • 04:39that we have a validation study
  • 04:41in which the true interference sets are measured alongside
  • 04:44the observed or surrogate ones for a subset
  • 04:46of the study sample.
  • 04:48And this will allow us
  • 04:49to empirically estimate the measurement error process
  • 04:52and use the estimated measurement error parameters
  • 04:55to bias correct causal effects.
  • 04:58So again, this dissertation is a collection of three papers
  • 05:01where we first consider the setting
  • 05:03of an egocentric network randomized trial
  • 05:06where at most one person per network
  • 05:08can receive the intervention.
  • 05:10Then we extend our methods
  • 05:12to consider cluster randomized trials
  • 05:13where multiple participants per cluster
  • 05:15can receive the intervention.
  • 05:17And we also consider general settings
  • 05:19where interference sets can be mismeasured
  • 05:22and the exposure is not necessarily randomized.
  • 05:27So I'll begin with the first paper
  • 05:29on egocentric network randomized trials.
  • 05:32So under this design, we have index participants
  • 05:34who are recruited into this study
  • 05:36and they're each asked to nominate a set of network members,
  • 05:40which can be their drug injection partners or sex partners,
  • 05:44and they form egocentric networks.
  • 05:46And the index participants are the ones in the study
  • 05:49who are randomized to receive the intervention.
  • 05:52And examples of this would typically
  • 05:57be peer education or behavioral-based.
  • 05:59And the index participants are asked to encourage
  • 06:03behavioral change to their network members.
  • 06:07So for some notation,
  • 06:08we have participant ik being the ith participant
  • 06:11in the kth network.
  • 06:13And we'll let i equal one denote the index participant
  • 06:16in each network and i greater than one
  • 06:18denote the network members.
  • 06:20We'll also define a network neighborhood for participant ik,
  • 06:24which comprises participants
  • 06:26who share a network link with ik.
  • 06:30And then we also have a true membership matrix
  • 06:33which essentially represents whether a participant
  • 06:37is a network member of a certain index.
  • 06:39And we also have an intervention assignment indicator,
  • 06:45which again the intervention is randomized
  • 06:46and only received by the index member.
  • 06:51So we'll let A, throughout this dissertation,
  • 06:54represent an individual exposure to the intervention.
  • 06:57So here in an ENRT, only index participants
  • 07:01who are randomized to treatment can receive the treatment
  • 07:05and therefore, A is only equal to one for a treated index
  • 07:09and A is equal to zero for everyone else.
  • 07:13And so to define potential outcomes under interference,
  • 07:16we need to make assumptions on the interference structure.
  • 07:18So here, we assume neighborhood interference
  • 07:22with an exposure mapping function, which essentially says
  • 07:26that ik's potential outcome is determined
  • 07:28by their own individual exposure
  • 07:31and the exposures of those in ik's network neighborhood
  • 07:34and not anyone outside of it,
  • 07:36including participants from other networks
  • 07:39and out of study individuals.
  • 07:42So we further apply an exposure mapping function
  • 07:46to this network neighborhood
  • 07:48and in this paper, we consider an exposure mapping function
  • 07:51defined by the number of treated neighbors.
  • 07:54So under this assumption,
  • 07:58ik's potential outcome is given by y indexed by A and G,
  • 08:01which is their individual exposure
  • 08:02and the spillover exposure given by the number
  • 08:06of treated neighbors in their network neighborhood.
  • 08:11So this figure is a representation
  • 08:14of two networks where in order
  • 08:16to define the spillover exposure,
  • 08:19we further make the assumption
  • 08:21that the networks are not overlapping.
  • 08:23And so this means that index participants
  • 08:25cannot be connected amongst themselves
  • 08:28and network members can only be connected
  • 08:30to one index participant.
  • 08:32And by making this assumption we can obtain G
  • 08:36by multiplying M and R, which is the membership matrix
  • 08:39and intervention assignment.
  • 08:41And so the spillover exposure
  • 08:42for each participant is only determined
  • 08:44by whether they are connected to a treated index member,
  • 08:47which is shown in this figure.
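The relationship just described, where G is obtained by multiplying the membership matrix M by the intervention assignment R, can be sketched as follows. This is a minimal illustration on hypothetical toy data (five participants, two non-overlapping networks); the array layout is an assumption and not taken from the talk.

```python
import numpy as np

# Hypothetical toy data: 5 participants, 2 egocentric networks.
# M[i, k] = 1 if participant i is a network member of index k.
# Index participants have all-zero rows (non-overlapping networks).
M = np.array([
    [0, 0],  # index participant of network 1
    [1, 0],  # network member of network 1
    [1, 0],  # network member of network 1
    [0, 0],  # index participant of network 2
    [0, 1],  # network member of network 2
])

# R[k] = 1 if the index of network k was randomized to the intervention.
R = np.array([1, 0])

# Spillover exposure: number of treated index participants each person is linked to.
G = M @ R
```

Under the non-overlapping assumption, each entry of G is either zero or one, so it reduces to an indicator of being connected to a treated index.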
  • 08:53So the causal estimand of interest in this paper is
  • 08:56the average spillover effect, which is the impact
  • 08:58of the intervention on the network members.
  • 09:01And here, we define the spillover effect
  • 09:03as a risk difference and as a risk ratio.
  • 09:06And under assumptions of positivity
  • 09:08and unconfoundedness which is guaranteed in the ENRT design
  • 09:13under perfect treatment compliance,
  • 09:20we can identify the spillover effects
  • 09:22using observed outcomes and estimate them
  • 09:25using observed outcomes as well.
  • 09:29So under the unconfoundedness assumption,
  • 09:33if we had data on the true exposures, we would estimate
  • 09:36the spillover effect using sample average estimators
  • 09:40using the two by two table at the top.
  • 09:43The issue with this is that in ENRTs, the networks
  • 09:46that are observed, which are the ones
  • 09:47that are collected at study baseline,
  • 09:49they may not represent the true connections that take place
  • 09:52during the study period.
  • 09:54So for example, a network member can fall out of touch
  • 09:56with the index participant that they enrolled with
  • 09:59and they can also befriend another index of another network.
  • 10:04And so under these observed networks,
  • 10:06their spillover exposure may also be misclassified.
  • 10:10And using these misclassified spillover exposures,
  • 10:14we will instead estimate the spillover effect
  • 10:16using the two by two table at the bottom of this slide.
  • 10:20And we show that the estimated spillover effects
  • 10:23under this table would be biased.
  • 10:27And here's a representation of the types
  • 10:29of network misclassification that can occur in ENRT.
  • 10:34So here, the black links represent
  • 10:36the correctly measured network ties.
  • 10:39The blue links represent network links that are observed
  • 10:42but are in fact not true.
  • 10:44And the red links represent the ones that are not observed
  • 10:47but in fact occur during the study period.
  • 10:50And because of these types of misclassification, a
  • 10:55person's true spillover exposure
  • 10:56can be different from their observed one.
  • 11:02In this paper, we further assume
  • 11:04non-differential misclassification,
  • 11:07which is that the misclassification process
  • 11:09doesn't depend on potential outcomes.
  • 11:12So under this assumption we derive an expression
  • 11:15for the bias using four parameters,
  • 11:17which are the baseline outcome rate,
  • 11:20the true spillover risk ratio, PM, which is the probability
  • 11:25of being classified into the correct network as well as PR,
  • 11:28which is the intervention allocation probability.
  • 11:32And using these expressions we can show
  • 11:34that there's no bias when PM is one,
  • 11:36which is when everyone is correctly classified
  • 11:38into the correct network.
  • 11:41We can also show that the bias is always towards the null
  • 11:44under the non-differential misclassification assumption.
  • 11:47So the ASpE would always be underestimated
  • 11:50under this assumption
  • 11:51if spillover exposures were misclassified.
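The attenuation toward the null can be illustrated with a short calculation. This is a sketch that uses a generic sensitivity and specificity parameterization rather than the four parameters named in the talk; the function name and arguments are hypothetical, and it assumes non-differential misclassification (the outcome is independent of the observed exposure given the true exposure).

```python
def observed_risk_difference(risk_exposed, risk_unexposed, prevalence, sens, spec):
    """Risk difference that would be seen when the binary spillover exposure
    is misclassified non-differentially with the given sensitivity/specificity."""
    # Probability of being observed as exposed
    p_obs_exposed = sens * prevalence + (1 - spec) * (1 - prevalence)
    # Predictive values follow from Bayes' rule
    ppv = sens * prevalence / p_obs_exposed
    npv = spec * (1 - prevalence) / (1 - p_obs_exposed)
    # Observed groups are mixtures of truly exposed and truly unexposed people
    risk_obs_exposed = ppv * risk_exposed + (1 - ppv) * risk_unexposed
    risk_obs_unexposed = (1 - npv) * risk_exposed + npv * risk_unexposed
    return risk_obs_exposed - risk_obs_unexposed
```

Algebraically the result equals (PPV + NPV - 1) times the true risk difference, so whenever classification is better than chance the observed effect keeps its sign but shrinks in magnitude.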
  • 11:57So in order to correct for this bias,
  • 11:59we use a validation study.
  • 12:01So again, this is where the true network
  • 12:03or spillover exposure is measured
  • 12:05alongside the mismeasured ones
  • 12:07for a subsample of the main study.
  • 12:09And then in this paper, we estimate the sensitivity
  • 12:12and specificity of spillover exposure classification
  • 12:15among network members and we assume that the parameters
  • 12:18that are estimated in the validation study
  • 12:20are generalizable to the main study.
  • 12:24We can show that the sensitivity
  • 12:26and specificity can be expressed as functions of PM and PR
  • 12:32where the intuition is that if a participant
  • 12:35or if a network member is observed to be connected
  • 12:38to a treated index, given that they really are connected
  • 12:43to a treated index,
  • 12:44this can be because they are correctly classified
  • 12:47or it could be because they were misclassified
  • 12:49but still connected
  • 12:50to a treated index just from another network.
  • 12:54We can also estimate the sensitivity
  • 12:55and specificity using the two by two table.
  • 12:58Here, when we assume that the misclassification process
  • 13:02doesn't depend on covariates.
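Estimating sensitivity and specificity from a validation two by two table can be sketched as below. The function name and the cell naming (true exposure first, observed exposure second) are assumptions made for illustration.

```python
def sens_spec(n11, n10, n01, n00):
    """Estimate sensitivity and specificity from validation counts.

    n_gs is the number of validation participants with true spillover
    exposure G = g and observed spillover exposure G* = s.
    """
    sens = n11 / (n11 + n10)  # P(G* = 1 | G = 1)
    spec = n00 / (n01 + n00)  # P(G* = 0 | G = 0)
    return sens, spec
```

This simple version pools all validation participants, which matches the case where the misclassification process does not depend on covariates.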
  • 13:07So we propose three estimators in this paper.
  • 13:10The first is called the matrix method estimator.
  • 13:13And here, this estimator takes the estimated sensitivity
  • 13:17and specificity from the validation study
  • 13:20to bias-correct the cell counts observed
  • 13:22from the two by two table in the main study.
  • 13:25And this would be the form of the bias corrected estimators
  • 13:27for the spillover effect.
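A minimal sketch of a matrix-method correction is given below. It follows the standard textbook form of the method for a misclassified binary exposure under non-differential misclassification; the exact estimator shown on the slide may differ, and the function and argument names are hypothetical.

```python
def matrix_method(cases_exp, cases_unexp, noncases_exp, noncases_unexp, sens, spec):
    """Correct observed 2x2 cell counts and return (risk difference, risk ratio).

    The *_exp / *_unexp counts are by OBSERVED spillover exposure in the main study.
    """
    denom = sens + spec - 1
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")

    def correct(n_obs_exp, n_obs_unexp):
        total = n_obs_exp + n_obs_unexp
        # Solve n_obs_exp = sens * n_exp + (1 - spec) * (total - n_exp) for n_exp
        n_exp = (n_obs_exp - (1 - spec) * total) / denom
        return n_exp, total - n_exp

    a, b = correct(cases_exp, cases_unexp)        # with the outcome
    c, d = correct(noncases_exp, noncases_unexp)  # without the outcome
    if min(a, b, c, d) < 0:
        raise ValueError("correction produced a negative cell count")

    risk_exposed = a / (a + c)
    risk_unexposed = b / (b + d)
    return risk_exposed - risk_unexposed, risk_exposed / risk_unexposed
```

The two guards make the failure modes explicit: the correction is undefined when classification is no better than chance, and it can yield negative counts when the observed table is incompatible with the estimated error rates.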
  • 13:29And we can obtain its variance
  • 13:31by the multivariate delta method.
  • 13:33And if we believe that there is clustering in the study,
  • 13:36we can also adjust for this by a design effect inflation
  • 13:40or we can perform network bootstrapping
  • 13:43where we re-sample networks as a whole.
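Resampling networks as a whole can be sketched as follows. This is an illustrative implementation, not the one used in the paper: the function names are hypothetical, and a plain risk difference stands in for the bias-corrected estimator that would be plugged in.

```python
import numpy as np

def risk_difference(y, g):
    """Placeholder estimator: difference in outcome means by exposure group."""
    return y[g == 1].mean() - y[g == 0].mean()

def network_bootstrap(y, g, network_id, estimator, n_boot=1000, seed=0):
    """Bootstrap distribution of an estimator, resampling entire networks."""
    rng = np.random.default_rng(seed)
    networks = np.unique(network_id)
    # Row indices belonging to each network, so members always move together
    rows_by_network = {k: np.flatnonzero(network_id == k) for k in networks}
    estimates = []
    for _ in range(n_boot):
        sampled = rng.choice(networks, size=len(networks), replace=True)
        rows = np.concatenate([rows_by_network[k] for k in sampled])
        estimates.append(estimator(y[rows], g[rows]))
    return np.array(estimates)
```

The standard deviation of the returned array gives a bootstrap standard error, and its percentiles give a confidence interval that respects within-network correlation.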
  • 13:46And we note that to use this method
  • 13:49and for this method to perform well,
  • 13:52there needs to be constraints on the value of sensitivity
  • 13:55and specificity for the estimator to be stable
  • 13:58and to avoid estimating negative cell counts.
  • 14:02So when these constraints are not met,
  • 14:05we can instead consider an inverse matrix method estimator
  • 14:10which corrects the cell counts in the two by two table
  • 14:13in the main study using the positive
  • 14:15and negative predictive values
  • 14:16instead of the sensitivity and specificity.
  • 14:19And this method uses the PPV and NPV
  • 14:22estimated separately for those with and without the outcome.
  • 14:26And therefore the matrix method estimator
  • 14:28may be more efficient relative to this estimator
  • 14:31if the outcome is rare
  • 14:32and the validation study is small.
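The inverse matrix method correction for one outcome stratum can be sketched like this. It is a generic illustration using predictive values; the function name is hypothetical, and in practice the PPV and NPV would be estimated in the validation study separately for those with and without the outcome.

```python
def inverse_matrix_counts(n_obs_exp, n_obs_unexp, ppv, npv):
    """Reallocate observed exposure counts within one outcome stratum.

    ppv = P(G = 1 | G* = 1) and npv = P(G = 0 | G* = 0) for that stratum.
    Returns the estimated numbers truly exposed and truly unexposed.
    """
    n_exp = ppv * n_obs_exp + (1 - npv) * n_obs_unexp
    n_unexp = (1 - ppv) * n_obs_exp + npv * n_obs_unexp
    return n_exp, n_unexp
```

Because the corrected counts are weighted sums with weights between zero and one, they cannot be negative, which is why this estimator remains usable when the matrix method's constraints fail.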
  • 14:37And the last estimator we considered
  • 14:39was a likelihood-based estimator
  • 14:42because while the matrix
  • 14:43and inverse matrix estimators are easily implemented,
  • 14:47there's no clear way of directly incorporating the effect
  • 14:49of clustering into the inference.
  • 14:52And therefore, we can specify an outcome model
  • 14:55including a network random effect
  • 15:00to account for clustering by networks
  • 15:02and using the likelihood specified here,
  • 15:06we can obtain the MLE of the ASpE
  • 15:10and its variance by the inverse
  • 15:13of the observed information matrix.
  • 15:19Right, so I'll go over our application
  • 15:23of our methods using the HPTN 037 study,
  • 15:26which was an ENRT that was conducted in Philadelphia
  • 15:30and Chiang Mai Thailand
  • 15:31where in the study indexes were randomized
  • 15:34to receive an intervention that consisted
  • 15:36of a peer education training where they were encouraged
  • 15:42to disseminate HIV knowledge
  • 15:44and injection and sexual risk reduction behaviors
  • 15:47with their network members.
  • 15:49In the study, we were interested in looking at the effect
  • 15:53of the intervention on any self-reported HIV risk behaviors
  • 15:57at one year after study enrollment.
  • 16:01Here, we define G star,
  • 16:03which is the observed spillover exposure
  • 16:05based on the intervention assigned to the networks
  • 16:07and received by their index members.
  • 16:09And we define G the true spillover exposure
  • 16:17based on an exposure contamination survey
  • 16:19that was taken at six months post baseline.
  • 16:22So in the survey, participants were asked
  • 16:24to recall five specific terminologies associated
  • 16:28with the intervention training.
  • 16:31So we suppose that if a network member were able
  • 16:33to recall any of these five terms, that they were exposed
  • 16:36to the intervention through a treated index
  • 16:38and if they weren't able to recall
  • 16:40any of the terms, then they weren't exposed.
  • 16:43And because there was a possibility
  • 16:45that network members may be exposed to the training
  • 16:48but just didn't remember any of the terms
  • 16:51or that there may be network members
  • 16:53who were not exposed to the training
  • 16:56but just said that they remember terms
  • 16:57because of social desirability,
  • 16:59we only included network members
  • 17:01who recalled the positive control term which was exposed
  • 17:04to everybody regardless of their randomization arm
  • 17:08and none of the negative control terms,
  • 17:09which none of the participants were supposed to know
  • 17:14so only these participants
  • 17:15were included in the validation study
  • 17:16so we can more accurately estimate
  • 17:19the sensitivity and specificity.
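The inclusion rule and the definition of the true spillover exposure described above can be written out as a small sketch. The record fields are hypothetical and do not reflect the actual HPTN 037 data dictionary.

```python
def in_validation_study(member):
    """Keep network members who recalled the positive control term
    and none of the negative control terms."""
    return (member["recalled_positive_control"]
            and member["n_negative_controls_recalled"] == 0)

def true_spillover_exposure(member):
    """G = 1 if the member recalled any of the five intervention terms."""
    return int(member["n_intervention_terms_recalled"] > 0)

# Hypothetical records for three network members
members = [
    {"recalled_positive_control": True, "n_negative_controls_recalled": 0,
     "n_intervention_terms_recalled": 2},
    {"recalled_positive_control": True, "n_negative_controls_recalled": 1,
     "n_intervention_terms_recalled": 3},
    {"recalled_positive_control": False, "n_negative_controls_recalled": 0,
     "n_intervention_terms_recalled": 0},
]
validation = [m for m in members if in_validation_study(m)]
true_g = [true_spillover_exposure(m) for m in validation]
```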
  • 17:23So here are the effects of the intervention
  • 17:27or the spillover effects of the intervention
  • 17:29on risk behaviors, where from the validation study we see
  • 17:34that there was indeed some degree
  • 17:36of network misclassification where the sensitivity was 60%
  • 17:41and specificity was 79%.
  • 17:44So the intent-to-treat estimator uses G star,
  • 17:47which is the intervention assigned to the networks.
  • 17:50And we do see already
  • 17:51that there was significant spillover effect
  • 17:54of the intervention on reducing risk behaviors.
  • 17:57And this effect was amplified
  • 17:59after applying our bias correction method.
  • 18:03So we applied the matrix method estimator as well as the MLE
  • 18:07and the inverse matrix method
  • 18:11was not an ideal choice in this study
  • 18:14because of our small validation study
  • 18:15and the number of participants
  • 18:18who had the outcome within the study.
  • 18:21Here, we also compared several standard errors, where first,
  • 18:26we consider standard errors obtained
  • 18:27from the delta method and those inflated
  • 18:30by the design effect.
  • 18:32And we see that the confidence intervals here
  • 18:34were pretty wide, which is due
  • 18:36to the small validation study sample size.
  • 18:40However, when we consider network bootstrapping
  • 18:42or the likelihood-based method,
  • 18:44we see that the confidence intervals significantly narrowed
  • 18:47and we were able to see a significant spillover effect.
  • 18:54So as a summary of this first paper,
  • 18:56we proposed several bias correction estimators
  • 18:58for the spillover effect
  • 19:00to address network misclassification in ENRTs.
  • 19:04So our methods here assume that both the exposure
  • 19:07and outcome are binary measures
  • 19:09and we did not consider covariate adjustment
  • 19:12because the intervention is randomized.
  • 19:15And so as a segue to the second paper,
  • 19:18we will be developing methods
  • 19:20for non-binary exposures and outcomes
  • 19:23as well as allowing for covariate adjustment.
  • 19:26And we develop these methods in the setting
  • 19:27of cluster randomized trials.
  • 19:35So causal inference in cluster randomized trials,
  • 19:38or CRTs, often relies on the assumption
  • 19:41of partial interference,
  • 19:43which is that participants are separated
  • 19:46into non-overlapping clusters
  • 19:47and interference is assumed
  • 19:50to be only contained within these clusters
  • 19:52and not across clusters.
  • 19:54And this assumption is typically made
  • 19:56because there is an absence of social network data in CRTs.
  • 20:02So interference sets defined under the partial
  • 20:06interference assumption are usually given
  • 20:07by the randomization clusters in the trial.
  • 20:10So this can be villages or communities
  • 20:16And the interference sets that are given
  • 20:17by the randomization clusters can be measured with error
  • 20:20because they might be a lot larger than the true networks
  • 20:24if they were considered to be whole communities.
  • 20:26And also interactions can exist
  • 20:28across these communities as well.
  • 20:31So this figure was taken from a phylogenetic analysis
  • 20:35from BCPP where BCPP was an HIV prevention CRT
  • 20:40that was conducted in 30 communities in Botswana.
  • 20:44And so the randomization clusters were communities
  • 20:48but from the phylogenetic analysis, where they sequenced
  • 20:52HIV viral genomes,
  • 20:58they saw that the viral transmission chains obtained
  • 21:00from the sequences, the majority of them
  • 21:04actually crossed two or more communities,
  • 21:06which was an indication
  • 21:07of high inter-cluster mixing in this study.
  • 21:10And interference sets that are defined
  • 21:11just by communities would be misspecified in this case.
  • 21:18So here we again have participant ik
  • 21:20as the ith participant in the kth cluster.
  • 21:24We have script N to denote the study sample.
  • 21:28In this study, we first consider a two-stage CRT
  • 21:33where clusters are first randomized
  • 21:36to an intervention allocation strategy.
  • 21:38Here, we consider strategies alpha one and alpha two
  • 21:42and alpha one and alpha two are probabilities.
  • 21:44And under a balanced design,
  • 21:46half of the clusters would be assigned to alpha one
  • 21:48and half would be assigned to alpha two.
  • 21:51Then after the first stage randomization,
  • 21:55participants within these clusters
  • 21:57would be randomized to receive the intervention
  • 21:59with the probability equal to the one
  • 22:01that was assigned to their cluster.
  • 22:04And we'll extend our methods to consider general CRTs,
  • 22:07which can be considered as a special case
  • 22:10of a two-stage design where alpha one
  • 22:12and alpha two are one and zero, which means
  • 22:15that clusters are randomized to intervention or control
  • 22:18and there isn't a second stage randomization
  • 22:20at the participant level.
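The two-stage randomization can be sketched as a short simulation. This is illustrative only: the function name is hypothetical, and it assumes independent Bernoulli assignment within clusters, whereas a real trial might instead fix the number treated per cluster.

```python
import numpy as np

def two_stage_randomize(cluster_sizes, alpha1, alpha2, seed=0):
    """Stage 1: assign half of the clusters to each allocation strategy.
    Stage 2: treat each participant with their cluster's probability."""
    rng = np.random.default_rng(seed)
    n_clusters = len(cluster_sizes)
    n_alpha1 = n_clusters // 2
    # Balanced design: half of the clusters get alpha1, the rest get alpha2
    strategy = np.array([alpha1] * n_alpha1 + [alpha2] * (n_clusters - n_alpha1),
                        dtype=float)
    rng.shuffle(strategy)
    # Participant-level assignment indicators, one array per cluster
    assignments = [rng.binomial(1, p, size=n)
                   for p, n in zip(strategy, cluster_sizes)]
    return strategy, assignments
```

Calling it with alpha1 = 1 and alpha2 = 0 reproduces the general CRT special case, since every participant in a cluster then shares the cluster's assignment.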
  • 22:25So we again have A to denote the individual exposure.
  • 22:29And to define potential outcomes in this setting,
  • 22:32we first define a subset of the study sample
  • 22:35for participant ik and we denote this by script I.
  • 22:40So here, we make the partial interference assumption
  • 22:45where ik's potential outcome is influenced
  • 22:48by their own exposure as well as the exposures
  • 22:52of the participants within this subset
  • 22:55and not anyone outside of this subset.
  • 22:57So because only the exposures of the participants
  • 23:00will affect the outcome of ik, we call this subset
  • 23:05ik's interference set.
  • 23:08And we can further apply an exposure mapping function
  • 23:11to this interference set
  • 23:14to obtain a scalar quantity of a spillover exposure.
  • 23:17Here, we consider stratified interference,
  • 23:22which essentially assumes that spillover occurs
  • 23:25through the proportion of treated participants
  • 23:27in the interference set regardless of who they are.
  • 23:30So the spillover exposure would be given by this proportion
  • 23:34and as in the first paper,
  • 23:35we can index potential outcomes by A and G.
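The exposure mapping under stratified interference can be sketched as below. The function name and data layout are hypothetical, and returning zero for an empty interference set is a convention chosen here for illustration, not something stated in the talk.

```python
def spillover_exposure(exposures, interference_set):
    """Proportion of treated participants in an interference set.

    exposures maps participant id -> individual exposure A (0 or 1);
    interference_set lists the ids that can affect this participant.
    """
    if not interference_set:
        return 0.0  # assumed convention for participants with no connections
    treated = sum(exposures[j] for j in interference_set)
    return treated / len(interference_set)

# Hypothetical example: participant "a" is connected to "b", "c", and "d"
A = {"a": 1, "b": 1, "c": 0, "d": 0}
G_a = spillover_exposure(A, ["b", "c", "d"])
```

Only the proportion matters here, not which members are treated, which is what distinguishes stratified interference from mappings that depend on identities.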
  • 23:41Here, we consider four causal effects
  • 23:45which are the individual effect, spillover effect,
  • 23:48total effect and overall effect.
  • 23:50The individual effect is the effect
  • 23:53of the individual exposure under a fixed spillover exposure.
  • 23:57And on the other hand, the spillover effect is the effect
  • 24:01of the spillover exposure under a fixed individual exposure.
  • 24:05And then the total effect is the effect
  • 24:07of having both an individual exposure to the intervention
  • 24:10and some degree of spillover exposure
  • 24:12versus neither type of exposure.
  • 24:15And then the overall effect compares the effect
  • 24:17of being assigned to a cluster randomized
  • 24:21to treatment allocation strategy alpha one versus alpha two.
  • 24:27So these causal effects can again be identified
  • 24:32under the identifying assumptions
  • 24:35we made earlier,
  • 24:37as in the first paper,
  • 24:38which include the unconfoundedness assumption,
  • 24:42which would hold under a two-stage design
  • 24:44given perfect treatment compliance.
  • 24:47And we estimate these effects
  • 24:49using a regression-based estimation approach,
  • 24:52which is consistent
  • 24:53and efficient under a correctly specified model
  • 24:56for the potential outcome.
  • 24:59So here, we consider an outcome model in this form
  • 25:06where we include a cluster random effect to account
  • 25:10for the effect of clustering in the inference
  • 25:12and we also have an interaction between A and G
  • 25:15so that we can allow the individual effect to vary
  • 25:18with G and the spillover effect to vary with A.
  • 25:23So once we have the estimated coefficients from this model,
  • 25:27we can estimate causal effects
  • 25:30using these estimated coefficients.
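How causal effects follow from the estimated coefficients can be sketched for a simple case. The model form on the slide is not captured in the transcript, so this assumes a linear mean model with an identity link, E[Y(a, g)] = b0 + b1·a + b2·g + b3·a·g; the function names are hypothetical, and the overall effect is omitted because it requires averaging over the exposure distribution under each allocation strategy.

```python
def mean_outcome(beta, a, g):
    """Model-based mean potential outcome under exposures (a, g)."""
    b0, b1, b2, b3 = beta
    return b0 + b1 * a + b2 * g + b3 * a * g

def individual_effect(beta, g):
    """Effect of the individual exposure at a fixed spillover exposure g."""
    return mean_outcome(beta, 1, g) - mean_outcome(beta, 0, g)

def spillover_effect(beta, a, g, g_ref):
    """Effect of spillover exposure g versus g_ref at a fixed individual exposure a."""
    return mean_outcome(beta, a, g) - mean_outcome(beta, a, g_ref)

def total_effect(beta, g, g_ref):
    """Treated with spillover exposure g versus untreated with spillover exposure g_ref."""
    return mean_outcome(beta, 1, g) - mean_outcome(beta, 0, g_ref)
```

With the interaction coefficient b3 nonzero, the individual effect changes with g and the spillover effect changes with a, which is the purpose of including the A by G interaction.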
  • 25:36So again, in CRTs,
  • 25:38because we don't have data on social connections
  • 25:41and when we consider interference to be given
  • 25:43by randomization clusters, they can be measured with error.
  • 25:47And as a consequence, this spillover exposure
  • 25:49can also be measured with error.
  • 25:51So we have shown in this paper
  • 25:53that when the outcome model is fit with G star instead of G,
  • 25:57the estimated model coefficients will be biased
  • 26:00and the causal effects estimated
  • 26:01with these biased coefficients will therefore also be biased.
  • 26:08And to correct for the bias
  • 26:10in these regression coefficients,
  • 26:12we apply a regression calibration approach
  • 26:15which is developed under the assumption
  • 26:17that the measurement error is additive
  • 26:20and also under non-differential measurement error,
  • 26:22as in the previous paper.
  • 26:25So to apply this method,
  • 26:27we will first regress the outcome
  • 26:30on the mismeasured exposure
  • 26:31in the main study as we would
  • 26:34under the intent-to-treat analysis.
  • 26:36In the validation study
  • 26:37because we assumed that the measurement error is additive,
  • 26:41we fit a linear measurement error model of the true exposure
  • 26:45given the mismeasured spillover exposure.
  • 26:49And then we can obtain bias corrected regression
  • 26:51coefficients using the coefficients
  • 26:53obtained from these two models.
  • 26:56And we can obtain the variance
  • 26:59of these corrected coefficients using the delta method.
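The regression calibration steps above can be sketched for the simplest case of a single mismeasured exposure and a linear outcome model. This is an illustration under those assumptions, not the estimator from the paper, which also involves the individual exposure, an interaction, and a cluster random effect; the function names are hypothetical.

```python
import numpy as np

def ols(x, y):
    """Least-squares fit of y on an intercept and x; returns (intercept, slope)."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def regression_calibration(y_main, g_star_main, g_val, g_star_val):
    """Bias-corrected slope for the true spillover exposure G.

    Main study:       Y = b0* + b* G*  (naive slope b*)
    Validation study: G = c0 + c1 G*   (linear measurement error model)
    Corrected slope:  b = b* / c1
    """
    _, naive_slope = ols(g_star_main, y_main)
    _, calibration_slope = ols(g_star_val, g_val)
    return naive_slope / calibration_slope
```

The division works because, under the linear measurement error model, the expected outcome given G* has slope b·c1, so the naive slope is the true slope scaled by the calibration slope.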
  • 27:06We can also extend this approach to account
  • 27:09for covariate adjustment
  • 27:11and there may be several reasons why we need
  • 27:13to adjust for covariates.
  • 27:15First, if we step out of the two stage CRT setting
  • 27:18and we consider a general CRT where intervention is work,
  • 27:22clusters are assigned to either intervention or control.
  • 27:25And a lot of public health studies that the interventions
  • 27:29that are given to these clusters may be prone
  • 27:31to non-compliance.
  • 27:33And intervention uptake will depend
  • 27:36on individual characteristics
  • 27:38that may need to be accounted for.
  • 27:41So when there are confounders between the outcome
  • 27:46and the individual or spillover exposure, A or G,
  • 27:50we would need to assume conditional unconfoundedness
  • 27:55of the individual and spillover exposures.
  • 27:57So when there are covariates or confounders between Y and A,
  • 28:02we would adjust them in the outcome model.
  • 28:05And if there were confounders between Y and G,
  • 28:09we would adjust for them in the outcome model
  • 28:11as well as in the measurement error model.
  • 28:15We might need to also make the non-differential
  • 28:20measurement error assumption conditional on covariates.
  • 28:25And in this case, because these covariates are related
  • 28:27to the measurement error as well as the outcome,
  • 28:30they would need to be adjusted in both models as well.
  • 28:34And lastly, we may only be able
  • 28:36to generalize the measurement error parameters estimated
  • 28:40in the validation study to the main study
  • 28:43conditional on covariates.
  • 28:44And in this case, we would adjust
  • 28:46for these covariates as well.
  • 28:50But regardless of the types of covariates that are adjusted
  • 28:53for, the regression calibration estimators
  • 28:56and variance estimators
  • 29:01for the coefficients that are of interest, which are used
  • 29:04to estimate the causal effects, would not change
  • 29:08from the case without covariates.
  • 29:14We've applied our methods to the BCPP study,
  • 29:19which was an HIV prevention CRT
  • 29:22in 30 Botswana communities that was conducted
  • 29:26between 2013 and 2018.
  • 29:28And this trial was to assess whether an intervention package
  • 29:33will reduce HIV incidence.
  • 29:35So in this trial, 15 communities were randomized
  • 29:39to receive an intervention package
  • 29:41that included HIV testing, linkage to care,
  • 29:45and early ART initiation for those who are HIV positive
  • 29:49as well as increased access
  • 29:50to voluntary medical male circumcision.
  • 29:53And the other 15 communities
  • 29:55were randomized to a standard of care.
  • 29:58So in the primary analysis they found
  • 30:00that there were decreased incidence rates
  • 30:03and increased viral suppression rates
  • 30:05in the intervention communities
  • 30:06compared to control communities.
  • 30:08And in our application, we are accounting for non-compliance
  • 30:12to the components, where we analyzed the individual,
  • 30:15spillover, total, and overall effects
  • 30:17of the package intervention
  • 30:18that was received on behavioral and clinical outcomes.
  • 30:23And here, we consider the communities
  • 30:26to be the misspecified interference sets
  • 30:29and we determined the true exposures
  • 30:33using phylogenetic data, which we consider
  • 30:36as our validation data.
  • 30:41So the phylogenetic data was obtained from the study shown
  • 30:45in the beginning of this section
  • 30:46where they found viral transmission chains
  • 30:49that crossed multiple clusters.
  • 30:53So here, they approached HIV positive individuals
  • 30:56in the study and obtained blood samples from them
  • 30:59and they were able to sequence their viral genomes
  • 31:03and construct HIV clusters
  • 31:06where participants grouped in the same viral cluster
  • 31:09were implied to be from the same viral transmission chain.
  • 31:14So here, each viral cluster they found to be composed
  • 31:18of two to 27 participants who were
  • 31:19from one to 16 communities.
  • 31:23And there were several considerations that we had to make
  • 31:27by using the phylogenetic data as our validation data
  • 31:32because the viral clusters only captured participants
  • 31:35who were infected by the same HIV strain
  • 31:38and would not necessarily represent a participant's
  • 31:42entire true interference set.
  • 31:45So we had to make some assumptions
  • 31:47to obtain the true spillover exposure
  • 31:50using this phylogenetic data where, first,
  • 31:53we considered that the connections observed
  • 31:56within the viral cluster were representative
  • 32:01of the participants
  • 32:03who were HIV positive in ik's true interference set.
  • 32:08And then we also assume transportability
  • 32:12of the measurement error process where we assume
  • 32:15that the inter-cluster interactions
  • 32:17that we observed from the viral clusters
  • 32:19would've been the same among those who were HIV negative.
  • 32:23And lastly, we considered
  • 32:25that those who are HIV positive
  • 32:26might not have the same characteristics
  • 32:28for intervention uptake as those who are HIV negative,
  • 32:32so we derived the true spillover exposure
  • 32:35based on a weighted average of cluster intervention uptake.
  • 32:40So for example, if there were five participants observed
  • 32:44in the viral cluster
  • 32:46from two different randomization communities,
  • 32:49then we take the intervention uptake
  • 32:52of these two communities
  • 32:53and weight them by the proportion of participants
  • 32:56from each community that were observed in the viral cluster.
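The weighting described in this example can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name, the community labels, and the uptake values are hypothetical and do not come from the BCPP data.

```python
def weighted_uptake(members_per_community, uptake_per_community):
    """Weighted average of community-level intervention uptake for one viral cluster.

    members_per_community: dict mapping community -> number of cluster members
        observed from that community (hypothetical input format).
    uptake_per_community: dict mapping community -> proportion of that
        community that took up the intervention.
    """
    total = sum(members_per_community.values())
    if total == 0:
        raise ValueError("viral cluster has no members")
    # Each community's uptake is weighted by its share of the viral cluster.
    return sum(
        (count / total) * uptake_per_community[community]
        for community, count in members_per_community.items()
    )


# Hypothetical viral cluster: 5 participants, 3 from community "A", 2 from "B".
example = weighted_uptake({"A": 3, "B": 2}, {"A": 0.60, "B": 0.20})
print(round(example, 2))
```

With these made-up inputs the weights are 3/5 and 2/5, so the derived exposure sits closer to the uptake of community "A".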
  • 33:04And so here are some details on the intervention components
  • 33:09for the study.
  • 33:11I don't know, do I have enough to?
  • 33:13Okay, so basically,
  • 33:16there are four components to this intervention package
  • 33:19and different study populations were eligible
  • 33:22for each of these components.
  • 33:24So here are the eligibility criteria that we had considered
  • 33:28for our application where for testing, we considered
  • 33:33that they were eligible for testing if they did not have
  • 33:35documented HIV positive status prior to baseline
  • 33:38and participants were eligible for HIV care
  • 33:41and ART initiation if they were HIV positive at baseline.
  • 33:47And for circumcision, we considered someone
  • 33:51to be eligible for this treatment
  • 33:54if they were an HIV negative male at baseline
  • 33:56who had not been circumcised.
  • 33:59And we also considered several definitions
  • 34:02of the individual exposure which were receiving
  • 34:05at least one of these intervention components
  • 34:08or receiving all eligible components
  • 34:10versus some or none of them.
  • 34:13In this paper, we also considered three outcomes.
  • 34:17First was a behavioral outcome that we defined
  • 34:20as a sexual risk behavior score
  • 34:22and this is defined as the number
  • 34:26of self-reported behaviors that they had reported
  • 34:30at their survey interview at one year post baseline.
  • 34:35And then we also looked at two clinical outcomes,
  • 34:37which were viral load at one year post baseline
  • 34:40and HIV incidence by the end of the study.
  • 34:46Before we looked at the effect
  • 34:48of receiving the individual components,
  • 34:50we first assessed the overall effect of being assigned
  • 34:54to an intervention cluster versus control
  • 34:57on these three outcomes where the ITT estimates
  • 35:01were conducted assuming that the interference sets
  • 35:06were communities.
  • 35:08And we see that there was a minimal overall effect
  • 35:15of cluster assignment on decreasing sexual risk behaviors.
  • 35:20But there was a significant effect on viral load
  • 35:23and incidence, where our findings echoed the ones
  • 35:28from the primary analysis where they found
  • 35:30increased viral suppression
  • 35:32and decreased incidence for clusters assigned
  • 35:34to intervention versus control.
  • 35:37And after bias correction we see
  • 35:39that these effects are again amplified,
  • 35:43which was expected due to the high levels
  • 35:45of inter-cluster mixing where say,
  • 35:48some preventative measures from intervention communities
  • 35:52may have gone into the control communities
  • 35:54and some incidence observed in intervention communities
  • 35:57may have been attributable to control communities.
  • 36:03And we also looked at the effect
  • 36:05of receiving at least one component
  • 36:08on the sexual risk behavior score, where we see
  • 36:12that after applying our bias correction method,
  • 36:15that there was a significant total
  • 36:17and overall effect of receiving at least one component
  • 36:21on decreased sexual risk behaviors.
  • 36:27And there was also a significant individual effect
  • 36:30of receiving both HIV care
  • 36:33and ART on decreased viral load which was expected.
  • 36:42So here, we proposed methods to bias correct
  • 36:47causal effects estimated under misspecified interference
  • 36:50sets in a CRT, although our methods are not restricted
  • 36:54to this setting and can be applied to broader settings as well.
  • 36:59And to use our regression calibration method,
  • 37:01we had to assume that both the measurement error
  • 37:04and outcome models were correctly specified.
  • 37:08And we also made some assumptions
  • 37:10on the measurement error structure.
  • 37:11So we proposed, for a third paper, an IPW-based method
  • 37:19where parametric assumptions on the outcome model
  • 37:22were not required and also we didn't need
  • 37:24to make assumptions on the additive
  • 37:26or non-differential nature of the measurement error process.
  • 37:34Okay, so propensity score based methods are widely used
  • 37:39to estimate intervention effects when characteristics
  • 37:43of the exposed and unexposed participants may be unbalanced,
  • 37:47which may be the case in an observational setting
  • 37:50where the exposure is not randomized.
  • 37:55And in particular, we're focused on an IPW estimator
  • 37:58that has been previously extended
  • 38:00to estimate causal effects in the setting of interference.
  • 38:04And this is typically done assuming the interference sets
  • 38:07are known and true.
  • 38:09And in this paper, we show that when interference sets
  • 38:15are mismeasured and spillover exposures are mismeasured
  • 38:18as a consequence, there is an error
  • 38:19in not only the spillover exposure
  • 38:21but also in the propensity score estimates.
  • 38:28So for notations, here, we have,
  • 38:31we're outside of the network and cluster setting
  • 38:34so we have just i from one to n participants.
  • 38:37Here, the individual exposure status may depend
  • 38:40on observed individual covariates.
  • 38:43And also, here, we assume
  • 38:45the partial interference assumption as in our second paper.
  • 38:48Although this method doesn't require
  • 38:51the partial interference assumption.
  • 38:53We can also make the neighborhood interference assumption
  • 38:55if we were working in a setting of social networks.
  • 39:01In this paper, we define a binary spillover exposure,
  • 39:05although our methods can be generalized
  • 39:07to categorical measures of the spillover exposures as well.
  • 39:10And here we consider an extension
  • 39:13of the stratified interference assumption
  • 39:14that we made in the previous paper where G,
  • 39:18we define by one if the proportion
  • 39:21of treated participants in interference set exceeds
  • 39:23a certain pre-specified threshold.
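A minimal sketch of this exposure mapping is shown below. It assumes the caller passes in the treatment indicators of the members of a participant's interference set (whether the participant themself is included is left to the caller), and it uses a strict inequality because the talk says "exceeds". The function and argument names are illustrative, not the speaker's.

```python
def binary_spillover_exposure(treated_flags, threshold):
    """Return G = 1 if the proportion treated in the interference set
    exceeds the pre-specified threshold, else G = 0.

    treated_flags: sequence of 0/1 treatment indicators for the members
        of one participant's interference set (assumed non-empty).
    threshold: pre-specified cutoff between 0 and 1.
    """
    if len(treated_flags) == 0:
        raise ValueError("interference set is empty")
    proportion_treated = sum(treated_flags) / len(treated_flags)
    # Strict ">" follows the word "exceeds"; an "at least" rule would use ">=".
    return int(proportion_treated > threshold)


# Hypothetical interference set of four people, two of them treated.
print(binary_spillover_exposure([1, 1, 0, 0], threshold=0.25))
```

Generalizing to a categorical spillover exposure would amount to replacing the single cutoff with several ordered cutoffs.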
  • 39:27And again, potential outcomes are indexed by A and G.
  • 39:31In this paper, we're interested in the individual, spillover,
  • 39:35and total effects.
  • 39:39So this is the IPW estimator
  • 39:42for the average potential outcome
  • 39:45where in the denominator, we have
  • 39:47an estimated joint propensity score for the individual
  • 39:51and spillover exposures.
  • 39:53And this can be expressed as the product
  • 39:55of the individual exposure propensity score
  • 39:58and the spillover exposure propensity score.
  • 40:00And these can be estimated
  • 40:01using (indistinct) regression models.
  • 40:04And we can obtain the variance of this estimator
  • 40:07by bootstrap resampling where we can resample
  • 40:11at the individual level or at the cluster level
  • 40:15if we were working in a setting with clusters
  • 40:17as in our second paper.
  • 40:20And this estimator is consistent, if the models
  • 40:23for the propensity scores are correctly specified.
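The estimator described here can be sketched as follows. This is a Horvitz-Thompson-style version written under assumptions: the two propensity scores are taken as already estimated and passed in as arrays, and all names are illustrative. The slide's exact formula is not reproduced in the transcript, so this may differ in detail from the speaker's estimator (for example, in normalization).

```python
import numpy as np


def ipw_mean_potential_outcome(y, a, g, ps_a, ps_g, a_level, g_level):
    """IPW estimate of the average potential outcome Y(a_level, g_level).

    y: outcomes; a: individual exposures (0/1); g: spillover exposures (0/1).
    ps_a: estimated P(A = 1 | covariates) for each participant.
    ps_g: estimated P(G = 1 | A, covariates) for each participant.
    """
    y = np.asarray(y, dtype=float)
    a = np.asarray(a)
    g = np.asarray(g)
    ps_a = np.asarray(ps_a, dtype=float)
    ps_g = np.asarray(ps_g, dtype=float)

    # Probability of the exposure level of interest under each model.
    p_a = ps_a if a_level == 1 else 1.0 - ps_a
    p_g = ps_g if g_level == 1 else 1.0 - ps_g

    # Joint propensity score as the product of the two components.
    joint = p_a * p_g

    # Only participants observed at (a_level, g_level) contribute.
    indicator = (a == a_level) & (g == g_level)
    return float(np.mean(indicator * y / joint))


# Tiny hypothetical dataset with constant propensity scores of 0.5.
est = ipw_mean_potential_outcome(
    y=[1, 2, 3, 4], a=[0, 0, 1, 1], g=[0, 1, 0, 1],
    ps_a=[0.5] * 4, ps_g=[0.5] * 4, a_level=1, g_level=1,
)
print(est)
```

A bootstrap variance would wrap this function in a loop that resamples participants (or clusters) with replacement and re-estimates the propensity scores in each replicate.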
  • 40:29So as in the previous cases, when interference sets are misspecified,
  • 40:35we would observe G star instead of G.
  • 40:38And if we were to use G star in the IPW estimator,
  • 40:43we would get a biased estimate
  • 40:45because the expected value of this estimator is given
  • 40:49by the form shown in the bottom here where we see
  • 40:53that this estimator is only unbiased if the probability
  • 40:57of observing the true exposure equal to G,
  • 40:59given that the observed spillover exposure
  • 41:02is also equal to G is equal to one,
  • 41:04which means that there's no measurement error.
  • 41:07And also from the form of this expectation, we can also see
  • 41:12that the bias can be eliminated if we divide both terms
  • 41:17on the right-hand side by this measurement error probability
  • 41:21and then subtract away the second term,
  • 41:25which is the approach that we took.
  • 41:29And this was an approach that was first proposed by Brown
  • 41:33and colleagues in the setting without interference.
  • 41:36And here, we extended this estimator
  • 41:38to the setting of interference.
  • 41:42So from this bias corrected IPW estimator, we see
  • 41:46that on the right-hand side in the first term,
  • 41:49we have the IPW estimator that is estimated
  • 41:53in the main study.
  • 41:55We also have an IPW estimator
  • 41:57that is estimated in the validation study alone.
  • 42:00And the measurement error probabilities
  • 42:04are also estimated in the validation study.
  • 42:09And because here, we are estimating potential outcomes
  • 42:12in the validation study, we need to assume generalizability
  • 42:17of the potential outcome
  • 42:18and measurement error process in this study
  • 42:21so that the effects that are estimated
  • 42:23in the validation study alone
  • 42:25would be unbiased for the average effect
  • 42:28that would be observed in the main study.
  • 42:33So using these bias corrected IPW estimators,
  • 42:37we can obtain a bias corrected estimator
  • 42:39for the causal effect which is given as contrast
  • 42:42between potential outcomes estimated
  • 42:45using the bias corrected IPW estimators.
  • 42:48And here, we can write this estimator
  • 42:55with weights, W here,
  • 42:57where the weights are meant to minimize the variance
  • 43:00of the bias corrected causal effects
  • 43:03and the weights are given at the bottom here
  • 43:07where the variance and covariance terms can also be estimated
  • 43:11using bootstrap resampling.
  • 43:17So while this estimator directly eliminates the bias,
  • 43:21it does require the outcome
  • 43:22to be available in the validation study.
  • 43:25So when this is not available,
  • 43:28we propose an alternative estimator
  • 43:30that does not impose this requirement
  • 43:33where we've extended methods proposed by rule
  • 43:36and colleagues to the setting of interference.
  • 43:40And so this is a regression calibration-based approach
  • 43:43where first, we assume that we have a continuous measure
  • 43:47of the spillover exposure.
  • 43:49And we will predict the true continuous spillover exposures
  • 43:53given the observed ones.
  • 43:55And then under the exposure mapping
  • 43:57that we had specified previously with the threshold,
  • 44:00we would dichotomize this proportion.
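The two steps just described (predict the true continuous exposure, then dichotomize) can be sketched as below. The sketch assumes a simple linear measurement error model with no covariates, fitted by least squares in the validation data, and all names and numbers are hypothetical.

```python
import numpy as np


def calibrate_and_dichotomize(obs_validation, true_validation, obs_main, threshold):
    """Regression calibration of a continuous spillover exposure.

    obs_validation, true_validation: observed (surrogate) and true proportions
        treated, both available in the validation study.
    obs_main: observed proportions treated in the main study.
    threshold: pre-specified cutoff of the exposure mapping.
    Returns the predicted true proportions and their binary version.
    """
    obs_validation = np.asarray(obs_validation, dtype=float)
    true_validation = np.asarray(true_validation, dtype=float)
    obs_main = np.asarray(obs_main, dtype=float)

    # Fit true = intercept + slope * observed in the validation data.
    slope, intercept = np.polyfit(obs_validation, true_validation, deg=1)

    # Predict the true continuous exposure for main-study participants.
    predicted = intercept + slope * obs_main

    # Apply the threshold exposure mapping to the predictions.
    predicted_binary = (predicted > threshold).astype(int)
    return predicted, predicted_binary


# Hypothetical validation and main-study values.
pred, pred_bin = calibrate_and_dichotomize(
    obs_validation=[0.0, 0.5, 1.0],
    true_validation=[0.1, 0.35, 0.6],
    obs_main=[0.2, 0.8],
    threshold=0.25,
)
print(pred, pred_bin)
```

In practice the calibration model could also include covariates; that extension is omitted here to keep the sketch short.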
  • 44:07And the regression calibration based IPW estimator
  • 44:10would use the predicted binary true exposures
  • 44:15as well as the propensity scores estimated
  • 44:17under these predictive values.
  • 44:19And we've shown that as in the previous paper,
  • 44:23that this estimator is only consistent
  • 44:25if a linear measurement error model fits the data.
  • 44:32In this paper, we further consider the case
  • 44:35where we might observe multiple surrogate
  • 44:37interference sets in a study.
  • 44:39And this was motivated by our illustrative example of BCPP
  • 44:45where we may consider a surrogate interference set
  • 44:48defined by a randomization cluster.
  • 44:50And we can also consider a second surrogate interference set
  • 44:54that is defined by household GPS data,
  • 44:57which is available in the study.
  • 45:00So when we have multiple surrogate interference sets,
  • 45:04we propose to first apply our bias corrected estimators,
  • 45:08either the first or the regression calibration-based one
  • 45:13to each surrogate interference set individually.
  • 45:16And then we will combine these individual estimates
  • 45:18using a weighted average estimator to reduce the variance
  • 45:21of the final estimate.
  • 45:25So the weights are given by C in the bottom here
  • 45:29where we would estimate the variance-covariance matrix
  • 45:33of the individual bias corrected causal effects.
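One standard way to build such a weighted average is the minimum-variance linear combination, whose weights are proportional to the inverse variance-covariance matrix times a vector of ones. The sketch below assumes that form; the exact weights on the slide are not reproduced in the transcript, so this is an illustration rather than the speaker's formula, and the numbers are made up.

```python
import numpy as np


def combine_estimates(estimates, cov):
    """Minimum-variance weighted average of several estimates of one effect.

    estimates: bias-corrected effect estimates, one per surrogate
        interference set.
    cov: their estimated variance-covariance matrix (for example, from
        bootstrap resampling).
    Returns the combined estimate, its variance, and the weights.
    """
    estimates = np.asarray(estimates, dtype=float)
    cov = np.asarray(cov, dtype=float)
    ones = np.ones(len(estimates))

    # Solve cov @ x = 1 rather than inverting the matrix explicitly.
    solved = np.linalg.solve(cov, ones)
    weights = solved / (ones @ solved)  # weights sum to one

    combined = float(weights @ estimates)
    combined_variance = float(1.0 / (ones @ solved))
    return combined, combined_variance, weights


# Two hypothetical estimates with variances 1 and 3 and no covariance.
combined, variance, weights = combine_estimates([1.0, 3.0], [[1.0, 0.0], [0.0, 3.0]])
print(combined, variance, weights)
```

With uncorrelated estimates this reduces to ordinary inverse-variance weighting, so the more precise estimate receives the larger weight.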
  • 45:42Here, similar to the second paper, we've applied our methods
  • 45:45to BCPP where we analyzed the individual,
  • 45:49spillover, and total effects
  • 45:50of receiving at least one intervention component
  • 45:53on sexual risk behaviors one year after study enrollment.
  • 45:58So as a reminder, the components here are HIV testing,
  • 46:01HIV care, ART and circumcision.
  • 46:06And here, we consider a binary outcome, which we define
  • 46:09by one if a participant had reported having engaged
  • 46:13in at least 30% of the surveyed risk behaviors.
  • 46:18Here, for application,
  • 46:19we consider the randomization clusters
  • 46:22or communities as our first surrogate interference set.
  • 46:26And we also consider a second surrogate interference set
  • 46:30that is defined by smaller geographical plots.
  • 46:33And these geographical plots comprised
  • 46:38two to 18 participants on average,
  • 46:42which were much smaller than randomization clusters,
  • 46:44which were about 400 participants each.
  • 46:49And in both of these interference sets,
  • 46:52we define the spillover exposure to be one if at least 25%
  • 46:57of participants in the interference set received
  • 46:59at least one intervention component.
  • 47:03And as in the second paper,
  • 47:04we determine the true spillover exposures
  • 47:08from the phylogenetic dataset.
  • 47:13So here are the risk differences
  • 47:16of receiving at least one intervention component
  • 47:19on self-reported sexual risk behaviors
  • 47:22where we compare the estimates obtained when we consider
  • 47:25communities, the randomization clusters,
  • 47:27or the geographical plots as the interference sets.
  • 47:32And we compare these to the bias corrected estimates
  • 47:35where here, I'm presenting the estimates
  • 47:38coming from the weighted average
  • 47:40of the bias corrected estimates applied individually
  • 47:43to the community and to the geographical plots.
  • 47:48Here, under the surrogate interference sets,
  • 47:52we see that most effects were null.
  • 47:56However, after bias correction, we see
  • 48:00that there is a beneficial AIE when G is equal to one
  • 48:05and beneficial ASP when A is equal to one.
  • 48:08Which means that for participants
  • 48:11who received at least one component of the intervention,
  • 48:14if they were in the presence of at least 25%
  • 48:17of participants who also received the intervention,
  • 48:20they had decreased risk behaviors.
  • 48:22And likewise for ASP one,
  • 48:28for participants who did receive the intervention,
  • 48:32if they were exposed to at least 25% of participants
  • 48:37in the interference set who also received the intervention,
  • 48:41then their risk behaviors were also reduced.
  • 48:45But on the other hand,
  • 48:48if a participant did not receive at least one component
  • 48:51and greater than 75% of those in their interference set
  • 48:56also did not receive the intervention,
  • 48:59then this had an adverse effect on the risk behaviors.
  • 49:03So overall, we see that a participant's risk behaviors
  • 49:08are influenced by their own treatment
  • 49:10and also in synergy with the treatment received
  • 49:14by those in their interference set.
  • 49:21So to wrap up,
  • 49:24so we proposed several bias corrected estimators,
  • 49:27which serve to decrease the bias in assessment
  • 49:29of causal effects so that future intervention strategies
  • 49:32can be more efficiently designed and interpreted.
  • 49:36And our methods assume
  • 49:38that we have a suitable validation study that provides us
  • 49:42with true measures of the interference set.
  • 49:44However, as we see from our application,
  • 49:48an exposure contamination dataset
  • 49:50or a phylogenetic dataset are still imperfect measures
  • 49:53of true social connections,
  • 49:55although we do assume
  • 49:56that these are more accurate than interference sets defined
  • 49:59by general say spatial boundaries
  • 50:02or administrative boundaries.
  • 50:06And we propose for future extensions
  • 50:08that we can perform sensitivity analysis
  • 50:13on departures from the assumptions
  • 50:14that are made in this dissertation.