Addressing Bias in Causal Effects Estimated Under Misspecified Interference Sets, with Application to HIV Prevention Trials

April 19, 2024

The identification and estimation of causal effects in the presence of interference relies on assumptions about, and correct measurement of, interference sets within which one individual's exposure may influence another one’s outcome. It can be challenging to properly specify interference sets, and when misspecified or mismeasured, intervention effects estimated by usual approaches will typically be biased. To address this bias, we propose several bias-correction methods developed under various study designs and estimators. We illustrate our methods in the analysis of HIV prevention trials, where social and sexual networks play a critical role in disease transmission.

Speaker: Ariel Chao, MPH

March 15th, 2024

ID
11595

Transcript

  • 00:01<v Laura>All right, let's get started.</v>
  • 00:03Thank you, everyone, for coming.
  • 00:05So let me introduce our speaker today.
  • 00:08Ariel Chao is a PhD student in the department
  • 00:11of biostatistics, advised by me and Donna Spiegelman.
  • 00:16So let me say a few things about her, about Ariel.
  • 00:21So I've been working with Ariel for three years now
  • 00:24and I have to say it's been a real pleasure.
  • 00:27Ariel is an extraordinary student, very patient,
  • 00:30definitely her characteristic and independent.
  • 00:34I've always been impressed by her creativity
  • 00:37and the way she would always find solutions by herself.
  • 00:41We have been having issues
  • 00:42with getting data from our collaborators
  • 00:45and she never gave up and found ways to keep working
  • 00:48on what she had while waiting.
  • 00:51So she deeply cares about the applications
  • 00:53she's working on, and she has a great intuition.
  • 00:57I've also been impressed
  • 00:59by how she can work on several things at the same time.
  • 01:03And as you will see today, she's very talented
  • 01:06and I wish her the best for her future career.
  • 01:10Before that, today, she will present her work
  • 01:13on addressing bias in causal effects,
  • 01:15estimated under misspecified interference sets
  • 01:18with application to HIV prevention trials.
  • 01:22So let's give Ariel a warm welcome.
  • 01:26Ariel, (crackling drowns out speaker).
  • 01:30<v Speaker>Let me just add,</v>
  • 01:32Ariel has a lot of material to present
  • 01:34so we decided to not take questions while she's talking
  • 01:38or she'll never get through the material.
  • 01:40And then we're gonna allow
  • 01:42for around 10 minutes at the end for questions.
  • 01:44So write down the questions
  • 01:45and then we'll try to give as many people a chance
  • 01:48to ask her questions at the end.
  • 01:51<v Laura>I will keep track of it.</v>
  • 01:53<v Speaker>I'm just monitoring the chat.</v>
  • 01:55<v ->Oh yes, can someone, 'cause I don't think I can see you.</v>
  • 01:58<v Speaker>I can see you.</v>
  • 02:01<v ->All right.</v>
  • 02:02So thank you, Laura, and it's been a real pleasure
  • 02:05working with you as well.
  • 02:06So today, I'll be presenting on my dissertation research,
  • 02:10which is on addressing bias in causal effects
  • 02:14estimated under misspecified interference sets.
  • 02:16And we've applied our methods through the analysis
  • 02:19of HIV prevention trials.
  • 02:23So as an introduction, so interference
  • 02:26or spillover is often present in either randomized
  • 02:29or observational studies.
  • 02:31Whereby interference, we mean
  • 02:32that a participant's outcome can be determined
  • 02:35by not only their own exposure
  • 02:37but also the exposure of others.
  • 02:38So a common example is with vaccines.
  • 02:42So say, my disease status is not only affected
  • 02:44by my own vaccination status,
  • 02:46but also the vaccination status of others around me.
  • 02:49And in the context of HIV prevention trials,
  • 02:52it's been found in several network-based studies
  • 02:54that when only some participants of a network
  • 02:57are trained on say HIV knowledge
  • 03:00or safe practices, that the members
  • 03:03who weren't trained in the network
  • 03:05also demonstrated increased knowledge
  • 03:07and reduced risk behaviors.
  • 03:09And this is known as the disseminated effect.
  • 03:12So causal inference that is conducted in the presence
  • 03:17of interference is often done under assumptions
  • 03:19on the extent and mechanism of interference.
  • 03:22And typically, this will require a specification
  • 03:25of an interference set for each participant.
  • 03:28Whereby interference sets,
  • 03:29we mean that a group of individuals
  • 03:32who can affect the outcome of that participant.
  • 03:35And then to this interference set,
  • 03:37we also typically apply an exposure mapping function
  • 03:41that will take the exposure vector
  • 03:43observed in this interference set
  • 03:45and map it to some scalar quantity.
  • 03:47And we'll see some examples of this later.
  • 03:51So in the existing literature, interference sets
  • 03:53are typically assumed to be correctly specified
  • 03:56so that the exposures
  • 03:57that are mapped from these interference sets
  • 03:59are also correctly measured.
  • 04:01But often, correctly specifying
  • 04:04an interference set is challenging.
  • 04:06For example, networks can be mismeasured
  • 04:10and when interference sets are misspecified,
  • 04:12we show under various settings that causal effects estimated
  • 04:15by usual approaches are typically biased.
  • 04:18And there have been several publications
  • 04:20that have addressed this issue.
  • 04:22And the majority of these publications aim
  • 04:25to first estimate the true networks
  • 04:27and then use these estimated networks
  • 04:29to estimate the causal effects.
  • 04:31And there have also been methods proposed
  • 04:32for a sensitivity analysis as well.
  • 04:36However, we pursue a different approach where we assume
  • 04:39that we have a validation study
  • 04:41in which the true interference sets are measured alongside
  • 04:44the observed or surrogate ones for a subset
  • 04:46of the study sample.
  • 04:48And this will allow us
  • 04:49to empirically estimate the measurement error process
  • 04:52and use the estimated measurement error parameters
  • 04:55to bias correct causal effects.
  • 04:58So again, this dissertation is a collection of three papers
  • 05:01where we first consider the setting
  • 05:03of an egocentric network randomized trial
  • 05:06where at most one person per network
  • 05:08can receive the intervention.
  • 05:10Then we extend our methods
  • 05:12to consider cluster randomized trials
  • 05:13where multiple participants per cluster
  • 05:15can receive the intervention.
  • 05:17And we also consider general settings
  • 05:19where interference sets can be mismeasured
  • 05:22and the exposure is not necessarily randomized.
  • 05:27So I'll begin with the first paper
  • 05:29on egocentric network randomized trials.
  • 05:32So under this design, we have index participants
  • 05:34who are recruited into this study
  • 05:36and they're each asked to nominate a set of network members,
  • 05:40which can be their drug injection partners or sex partners,
  • 05:44and they form egocentric networks.
  • 05:46And the index participants are the ones in the study
  • 05:49who are randomized to receive their intervention.
  • 05:52And examples of this would typically
  • 05:57be peer education or behavioral-based.
  • 05:59And the index participants are asked to encourage
  • 06:03behavioral change to their network members.
  • 06:07So for some notation,
  • 06:08we have participant ik being the i-th participant
  • 06:11in the k-th network.
  • 06:13And we'll let i equal one denote the index participant
  • 06:16in each network and i greater than one
  • 06:18denote the network members.
  • 06:20We'll also define a network neighborhood for participant ik,
  • 06:24which comprises participants
  • 06:26who share a network link with ik.
  • 06:30And then we also have a true membership matrix
  • 06:33which essentially represents whether a participant
  • 06:37is a network member of a certain index.
  • 06:39And we also have an intervention assignment indicator,
  • 06:45which again the intervention is randomized
  • 06:46and only received by the index member.
  • 06:51So we'll let A, throughout this dissertation,
  • 06:54represent an individual exposure to the intervention.
  • 06:57So here in an ENRT, only index participants
  • 07:01who are randomized to treatment can receive the treatment
  • 07:05and therefore, A is only equal to one for a treated index
  • 07:09and A is equal to zero for everyone else.
  • 07:13And so to define potential outcomes under interference,
  • 07:16we need to make assumptions on the interference structure.
  • 07:18So here, we assume neighborhood interference
  • 07:22with an exposure mapping function, which essentially says
  • 07:26that ik's potential outcome is determined
  • 07:28by their own individual exposure
  • 07:31and the exposures of those in ik's network neighborhood
  • 07:34and not anyone outside of it,
  • 07:36including participants from other networks
  • 07:39and out of study individuals.
  • 07:42So we further apply an exposure mapping function
  • 07:46to this network neighborhood
  • 07:48and in this paper, we consider an exposure mapping function
  • 07:51defined by the number of treated neighbors.
  • 07:54So under this assumption,
  • 07:58ik's potential outcome is given by y indexed by A and G,
  • 08:01which is their individual exposure
  • 08:02and the spillover exposure given by the number
  • 08:06of treated neighbors in their network neighborhood.
  • 08:11So this figure is a representation
  • 08:14of two networks where in order
  • 08:16to define the spillover exposure,
  • 08:19we further make the assumption
  • 08:21that the networks are not overlapping.
  • 08:23And so this means that index participants
  • 08:25cannot be connected amongst themselves
  • 08:28and network members can only be connected
  • 08:30to one index participant.
  • 08:32And by making this assumption we can obtain G
  • 08:36by multiplying M and R, which is the membership matrix
  • 08:39and intervention assignment.
  • 08:41And so the spillover exposure
  • 08:42for each participant is only determined
  • 08:44by whether they are connected to a treated index member,
  • 08:47which is shown in this figure.
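As a rough illustration of the step just described, the sketch below computes the spillover exposure G as the product of a membership matrix M and an assignment vector R under the non-overlapping-network assumption. The toy data, array layout, and helper arrays (`is_index`, `network`) are hypothetical and are not taken from the talk.

```python
import numpy as np

# Hypothetical toy data: 6 participants in 2 non-overlapping egocentric networks.
# Rows of M are participants, columns are networks; M[i, k] = 1 when participant i
# is a (non-index) network member of network k. Index rows are all zero because
# index participants are assumed not to be connected to each other.
M = np.array([
    [0, 0],  # index of network 0
    [1, 0],
    [1, 0],
    [0, 0],  # index of network 1
    [0, 1],
    [0, 1],
])
R = np.array([1, 0])                      # network 0 randomized to intervention
is_index = np.array([1, 0, 0, 1, 0, 0])   # who is an index participant
network = np.array([0, 0, 0, 1, 1, 1])    # enrolled network of each participant

A = is_index * R[network]   # individual exposure: 1 only for a treated index
G = M @ R                   # spillover exposure: linked to a treated index or not

print("A =", A.tolist())
print("G =", G.tolist())
```

With this layout, only the two network members of the treated index end up with G equal to one.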
  • 08:53So the causal estimand of interest in this paper
  • 08:56is the average spillover effect, which is the impact
  • 08:58of the intervention on the network members.
  • 09:01And here, we define the spillover effect
  • 09:03as a risk difference and as a risk ratio.
  • 09:06And under assumptions of positivity
  • 09:08and unconfoundedness, which is guaranteed in the ENRT design
  • 09:13under perfect treatment compliance,
  • 09:20we can identify the spillover effects
  • 09:22using observed outcomes and estimate them
  • 09:25using observed outcomes as well.
  • 09:29So under the unconfoundedness assumption,
  • 09:33if we had data on the true exposures, we would estimate
  • 09:36the spillover effect using sample average estimators
  • 09:40using the two by two table at the top.
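A minimal sketch of the sample average estimators mentioned here, written in terms of the four cell counts of a two by two table among network members. The function name and the example counts are made up for illustration.

```python
def spillover_effects(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Risk difference and risk ratio comparing network members with G = 1 vs G = 0."""
    risk1 = cases_exposed / n_exposed        # outcome risk among those with G = 1
    risk0 = cases_unexposed / n_unexposed    # outcome risk among those with G = 0
    return risk1 - risk0, risk1 / risk0

# Hypothetical counts: 20 of 100 exposed and 30 of 100 unexposed members report the outcome.
rd, rr = spillover_effects(20, 100, 30, 100)
print(f"risk difference = {rd:.3f}, risk ratio = {rr:.3f}")
```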
  • 09:43The issue with this is that in ENRTs, the networks
  • 09:46that are observed, which are the ones
  • 09:47that are collected at study baseline,
  • 09:49they may not represent the true connections that take place
  • 09:52during the study period.
  • 09:54So for example, a network member can fall out of touch
  • 09:56with the index participant that they enrolled with
  • 09:59and they can also befriend another index of another network.
  • 10:04And so under these observed networks,
  • 10:06their spillover exposure may also be misclassified.
  • 10:10And using these misclassified spillover exposures,
  • 10:14we will instead estimate the spillover effect
  • 10:16using the two by two table at the bottom of this slide.
  • 10:20And we show that the estimated spillover effects
  • 10:23under this table would be biased.
  • 10:27And here's a representation of the types
  • 10:29of network misclassification that can occur in an ENRT.
  • 10:34So here, the black links represent
  • 10:36the correctly measured network ties.
  • 10:39The blue links represent network links that are observed
  • 10:42but are in fact not true.
  • 10:44And the red links represent the ones that are not observed
  • 10:47but in fact occur during the study period.
  • 10:50And because of these types of misclassification,
  • 10:55a person's true spillover exposure
  • 10:56can be different from their observed one.
  • 11:02In this paper, we further assume
  • 11:04non-differential misclassification,
  • 11:07which is that the misclassification process
  • 11:09doesn't depend on potential outcomes.
  • 11:12So under this assumption we derive an expression
  • 11:15for the bias using four parameters,
  • 11:17which are the baseline outcome rate,
  • 11:20the true spillover risk ratio, PM, which is the probability
  • 11:25of being classified into the correct network as well as PR,
  • 11:28which is the intervention allocation probability.
  • 11:32And using these expressions we can show
  • 11:34that there's no bias when PM is one,
  • 11:36which is when everyone is correctly classified
  • 11:38into the correct network.
  • 11:41We can also show that the bias is always towards the null
  • 11:44under the non-differential misclassification assumption.
  • 11:47So the ASP would always be underestimated
  • 11:50under this assumption
  • 11:51if spillover exposures were misclassified.
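The attenuation can be illustrated with a generic calculation. This is a sketch in terms of sensitivity and specificity rather than the talk's own bias expression in terms of the baseline outcome rate, risk ratio, PM, and PR, which the transcript does not show. It assumes a binary exposure misclassified independently of the outcome, and all numbers are hypothetical.

```python
def observed_risks(p1, p0, prev, se, sp):
    """Outcome risks by observed exposure G* when G* misclassifies G independently of Y.

    p1, p0: true risks under G = 1 and G = 0; prev: P(G = 1);
    se, sp: sensitivity and specificity of G* for G.
    """
    exposed_star = se * prev + (1 - sp) * (1 - prev)            # P(G* = 1)
    risk1_star = (se * prev * p1 + (1 - sp) * (1 - prev) * p0) / exposed_star
    risk0_star = ((1 - se) * prev * p1 + sp * (1 - prev) * p0) / (1 - exposed_star)
    return risk1_star, risk0_star

# Hypothetical truth: baseline risk 0.30, risk ratio 0.5, half of members truly exposed.
p0, p1 = 0.30, 0.15
r1_star, r0_star = observed_risks(p1, p0, prev=0.5, se=0.8, sp=0.8)
print("true RR =", p1 / p0, " observed RR =", round(r1_star / r0_star, 3))
print("true RD =", round(p1 - p0, 3), " observed RD =", round(r1_star - r0_star, 3))
```

In this example the observed risk ratio and risk difference both sit closer to the null than the true values.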
  • 11:57So in order to correct for this bias,
  • 11:59we use a validation study.
  • 12:01So again, this is where the true network
  • 12:03or spillover exposure is measured
  • 12:05alongside the mismeasured ones
  • 12:07for a subsample of the main study.
  • 12:09And then in this paper, we estimate the sensitivity
  • 12:12and specificity of spillover exposure classification
  • 12:15among network members and we assume that the parameters
  • 12:18that are estimated in the validation study
  • 12:20are generalizable to the main study.
  • 12:24We can show that the sensitivity
  • 12:26and specificity can be expressed as functions of PM and PR
  • 12:32where the intuition is that if a participant
  • 12:35or if a network member is observed to be connected
  • 12:38to a treated index, given that they really are connected
  • 12:43to a treated index,
  • 12:44this can be because they are correctly classified
  • 12:47or it could be because they were misclassified
  • 12:49but still connected
  • 12:50to a treated index just from another network.
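One way to read this intuition is sketched below. The sensitivity formula follows the two cases described for a truly exposed network member. The specificity formula is an analogous assumption made here for illustration, and the exact expressions in the paper are not shown in the transcript. The function name is hypothetical.

```python
def sens_spec_from_pm_pr(p_m, p_r):
    """Sensitivity and specificity of the observed spillover exposure.

    p_m: probability a network member is classified into the correct network.
    p_r: probability that any given index is randomized to the intervention.
    Assumption: a misclassified member is effectively linked to some other network
    whose index is treated with probability p_r, independently of everything else.
    """
    sensitivity = p_m + (1 - p_m) * p_r          # correct, or wrong network but treated index
    specificity = p_m + (1 - p_m) * (1 - p_r)    # correct, or wrong network but untreated index
    return sensitivity, specificity

print(sens_spec_from_pm_pr(1.0, 0.5))   # perfect classification
print(sens_spec_from_pm_pr(0.6, 0.5))   # 40% of members in the wrong network
```

Under this reading, sensitivity plus specificity equals one plus PM, so the two are perfect exactly when PM is one, which matches the no-bias case described earlier.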
  • 12:54We can also estimate the sensitivity
  • 12:55and specificity using the two by two table here,
  • 12:58when we assume that the misclassification process
  • 13:02doesn't depend on covariates.
  • 13:07So we propose three estimators in this paper.
  • 13:10The first is called the matrix method estimator.
  • 13:13And here, this estimator takes the estimated sensitivity
  • 13:17and specificity from the validation study
  • 13:20to bias-correct the cell counts observed
  • 13:22from the two by two table in the main study.
  • 13:25And this would be the form of the bias-corrected estimators
  • 13:27for the spillover effect.
  • 13:29And we can obtain its variance
  • 13:31by the multivariate delta method.
  • 13:33And if we believe that there is clustering in the study,
  • 13:36we can also adjust for this by a design effect inflation
  • 13:40or we can perform network bootstrapping
  • 13:43where we re-sample networks as a whole.
  • 13:46And we note that to use this method
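A minimal sketch of network bootstrapping, assuming the data are stored as flat arrays with a network identifier per participant. It is shown with an uncorrected risk difference for brevity, and in practice the bias-corrected estimator would be recomputed in each replicate. All names and the toy data are hypothetical.

```python
import numpy as np

def risk_difference(g, y):
    return y[g == 1].mean() - y[g == 0].mean()

def network_bootstrap_se(net_id, g, y, n_boot=500, seed=0):
    """Bootstrap standard error that resamples whole networks with replacement."""
    rng = np.random.default_rng(seed)
    ids = np.unique(net_id)
    rows_by_net = {k: np.flatnonzero(net_id == k) for k in ids}
    estimates = []
    for _ in range(n_boot):
        drawn = rng.choice(ids, size=ids.size, replace=True)
        rows = np.concatenate([rows_by_net[k] for k in drawn])
        gb, yb = g[rows], y[rows]
        if gb.min() == gb.max():   # skip draws that lack one exposure group
            continue
        estimates.append(risk_difference(gb, yb))
    return float(np.std(estimates, ddof=1))

# Hypothetical toy data: 6 networks of 3 members, 3 networks with a treated index.
net_id = np.repeat(np.arange(6), 3)
g = np.repeat([1, 1, 1, 0, 0, 0], 3)
y = np.array([0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0])
print("bootstrap SE:", network_bootstrap_se(net_id, g, y))
```

Resampling networks rather than individuals keeps each network's members together, which is what lets the replicate-to-replicate spread reflect within-network correlation.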
  • 13:49and for this method to perform well,
  • 13:52there needs to be constraints on the value of sensitivity
  • 13:55and specificity for the estimator to be stable
  • 13:58and to avoid estimating negative cell counts.
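The correction can be sketched with the standard matrix method for a misclassified binary exposure, which may differ in detail from the estimator in the paper. The counts are hypothetical and were constructed so that a sensitivity and specificity of 0.8 recover a true risk ratio of 0.5.

```python
import numpy as np

def matrix_method(obs, se, sp):
    """Correct a 2x2 table for non-differential exposure misclassification.

    obs: rows = observed exposure (G* = 1, G* = 0), columns = outcome (Y = 1, Y = 0).
    Returns the corrected table, risk difference, and risk ratio.
    """
    if se + sp <= 1:
        raise ValueError("need sensitivity + specificity > 1 for a stable correction")
    P = np.array([[se, 1 - sp],
                  [1 - se, sp]])                 # maps true counts to expected observed counts
    true_counts = np.linalg.solve(P, np.asarray(obs, dtype=float))
    if (true_counts < 0).any():
        raise ValueError("correction produced a negative cell count")
    risk1 = true_counts[0, 0] / true_counts[0].sum()
    risk0 = true_counts[1, 0] / true_counts[1].sum()
    return true_counts, risk1 - risk0, risk1 / risk0

obs = [[18, 82],   # observed exposed:   18 with the outcome, 82 without
       [27, 73]]   # observed unexposed: 27 with the outcome, 73 without
table, rd, rr = matrix_method(obs, se=0.8, sp=0.8)
print(table.round(1).tolist(), round(rd, 3), round(rr, 3))
```

The two checks that raise errors correspond to the constraints mentioned in the talk: the estimator becomes unstable as sensitivity plus specificity approaches one, and the corrected cell counts can go negative.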
  • 14:02So when these constraints are not met,
  • 14:05we can instead consider an inverse matrix method estimator
  • 14:10which corrects the cell counts in the two by two table
  • 14:13in the main study using the positive
  • 14:15and negative predictive values
  • 14:16instead of the sensitivity and specificity.
  • 14:19And this method uses the PPV and NPV
  • 14:22estimated separately for those with and without the outcome.
  • 14:26And therefore the matrix method estimator
  • 14:28may be more efficient relative to this estimator
  • 14:31if the outcome is rare
  • 14:32and the validation study is small.
  • 14:37And the last estimator we considered
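A sketch of the inverse matrix idea, assuming predictive values are available separately for participants with and without the outcome. The function name and numbers are hypothetical, and the predictive values were chosen to be consistent with a true risk ratio of 0.5.

```python
import numpy as np

def inverse_matrix_method(obs, ppv, npv):
    """Reallocate observed counts to true exposure groups using predictive values.

    obs: rows = observed exposure (G* = 1, G* = 0), columns = outcome (Y = 1, Y = 0).
    ppv, npv: length-2 sequences, one value per outcome column (Y = 1, Y = 0).
    """
    obs = np.asarray(obs, dtype=float)
    ppv = np.asarray(ppv, dtype=float)
    npv = np.asarray(npv, dtype=float)
    true_exposed = ppv * obs[0] + (1 - npv) * obs[1]      # counts with G = 1, by outcome
    true_unexposed = (1 - ppv) * obs[0] + npv * obs[1]    # counts with G = 0, by outcome
    risk1 = true_exposed[0] / true_exposed.sum()
    risk0 = true_unexposed[0] / true_unexposed.sum()
    return risk1 - risk0, risk1 / risk0

obs = [[18, 82], [27, 73]]
rd, rr = inverse_matrix_method(obs, ppv=[12 / 18, 68 / 82], npv=[24 / 27, 56 / 73])
print(round(rd, 3), round(rr, 3))
```

Because four predictive values are needed, two per outcome stratum, each one is estimated from fewer validation participants, which is the efficiency concern raised for rare outcomes and small validation studies.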
  • 14:39was a likelihood-based estimator
  • 14:42because while the matrix
  • 14:43and inverse matrix estimators are easily implemented,
  • 14:47there's no clear way of directly incorporating the effect
  • 14:49of clustering into the inference.
  • 14:52And therefore, we can specify an outcome model
  • 14:55including a network random effect
  • 15:00to account for clustering by networks
  • 15:02and using the likelihood specified here,
  • 15:06we can obtain the MLE of the ASP
  • 15:10and its variance by the inverse
  • 15:13of the observed information matrix.
  • 15:19Right, so I'll go over our application
  • 15:23of our methods using the HPTN 037 study,
  • 15:26which was an ENRT that was conducted in Philadelphia
  • 15:30and Chiang Mai, Thailand,
  • 15:31where in the study indexes were randomized
  • 15:34to receive an intervention that consisted
  • 15:36of a peer education training where they were encouraged
  • 15:42to disseminate HIV knowledge
  • 15:44and injection and sexual risk reduction behaviors
  • 15:47with their network members.
  • 15:49In the study, we were interested in looking at the effect
  • 15:53of the intervention on any self-reported HIV risk behaviors
  • 15:57at one year after study enrollment.
  • 16:01Here, we define G star,
  • 16:03which is the observed spillover exposure
  • 16:05based on the intervention assigned to the networks
  • 16:07and received by their index members.
  • 16:09And we define G the true spillover exposure
  • 16:17based on an exposure contamination survey
  • 16:19that was taken at six months post baseline.
  • 16:22So in the survey, participants were asked
  • 16:24to recall five specific terminologies associated
  • 16:28with the intervention training.
  • 16:31So we suppose that if a network member were able
  • 16:33to recall any of these five terms, that they were exposed
  • 16:36to the intervention through a treated index
  • 16:38and if they weren't able to recall
  • 16:40any of the terms, then they weren't exposed.
  • 16:43And because there was a possibility
  • 16:45that network members may be exposed to the training
  • 16:48but just didn't remember any of the terms
  • 16:51or that there may be network members
  • 16:53who were not exposed to the training
  • 16:56but just said that they remember terms
  • 16:57because of social desirability,
  • 16:59we only included network members
  • 17:01who recalled the positive control term which was exposed
  • 17:04to everybody regardless of their randomization arm
  • 17:08and none of the negative control terms,
  • 17:09which none of the participants were supposed to know
  • 17:14so that only these participants
  • 17:15were included in the validation study
  • 17:16so we can more accurately estimate
  • 17:19the sensitivity and specificity.
  • 17:23So here are the effects of the intervention
  • 17:27or the spillover effects of the intervention
  • 17:29on risk behaviors, where from the validation study we see
  • 17:34that there was indeed some degree
  • 17:36of network misclassification where the sensitivity was 60%
  • 17:41and specificity was 79%.
  • 17:44So the intent-to-treat estimator uses G star,
  • 17:47which is the intervention assigned to the networks.
  • 17:50And we do see already
  • 17:51that there was a significant spillover effect
  • 17:54of the intervention on reducing risk behaviors.
  • 17:57And this effect was amplified
  • 17:59after applying our bias correction method.
  • 18:03So we applied the matrix method estimator as well as the MLE
  • 18:07and the inverse matrix method
  • 18:11was not an ideal choice in this study
  • 18:14because of our small validation study
  • 18:15and the number of participants
  • 18:18who had the outcome within the study.
  • 18:21Here, we also compared several standard errors, where first,
  • 18:26we consider standard errors obtained
  • 18:27from the delta method and those inflated
  • 18:30by the design effect.
  • 18:32And we see that the confidence intervals here
  • 18:34were pretty wide, which is due
  • 18:36to the small validation study sample size.
  • 18:40However, when we consider network bootstrapping
  • 18:42or the likelihood-based method,
  • 18:44we see that the confidence intervals significantly narrowed
  • 18:47and we were able to see a significant spillover effect.
  • 18:54So as a summary of this first paper,
  • 18:56we proposed several bias correction estimators
  • 18:58for this spillover effect
  • 19:00to address network misclassification in ENRTs.
  • 19:04So our methods here assume that both the exposure
  • 19:07and outcome are binary measures
  • 19:09and we did not consider covariate adjustment
  • 19:12because the intervention is randomized.
  • 19:15And so as a segue to the second paper,
  • 19:18we will be developing methods
  • 19:20for non-binary exposures and outcomes
  • 19:23as well as allowing for covariate adjustment.
  • 19:26And we develop these methods in the setting
  • 19:27of cluster randomized trials.
  • 19:35So causal inference in cluster randomized trials,
  • 19:38or CRTs, often relies on the assumption
  • 19:41of partial interference,
  • 19:43which is that participants are separated
  • 19:46into non-overlapping clusters
  • 19:47and interference is assumed
  • 19:50to be only contained within these clusters
  • 19:52and not across clusters.
  • 19:54And this assumption is typically made
  • 19:56because there is an absence of social network data in CRTs.
  • 20:02So interference sets defined under the partial
  • 20:06interference assumption are usually given
  • 20:07by the randomization clusters in the trial.
  • 20:10So this can be villages or communities
  • 20:16and the interference sets that are given
  • 20:17by the randomization clusters can be measured with error
  • 20:20because they might be a lot larger than the true networks
  • 20:24if they were considered to be whole communities.
  • 20:26And also interactions can exist
  • 20:28across these communities as well.
  • 20:31So this figure was taken from a phylogenetic analysis
  • 20:35from BCPP, where BCPP was an HIV prevention CRT
  • 20:40that was conducted in 30 communities in Botswana.
  • 20:44And so the randomization clusters were communities
  • 20:48but from the phylogenetic analysis where they sequenced
  • 20:52HIV genes viral sequences
  • 20:58they saw that the viral transmission chains obtained
  • 21:00from the sequences, the majority of them
  • 21:04actually crossed two or more communities,
  • 21:06which was an indication
  • 21:07of high inter-cluster mixing in this study.
  • 21:10And interference sets that are defined
  • 21:11just by communities would be misspecified in this case.
  • 21:18So here we again have participant ik
  • 21:20as the i-th participant in the k-th cluster.
  • 21:24We have script N to denote the study sample.
  • 21:28In this study, we first consider a two-stage CRT
  • 21:33where clusters are first randomized
  • 21:36to an intervention allocation strategy.
  • 21:38Here, we consider strategies alpha one and alpha two
  • 21:42and alpha one and alpha two are probabilities.
  • 21:44And under a balanced design,
  • 21:46half of the clusters would be assigned to alpha one
  • 21:48and half would be assigned to alpha two.
  • 21:51Then after the first stage randomization,
  • 21:55participants within these clusters
  • 21:57would be randomized to receive the intervention
  • 21:59with the probability equal to the one
  • 22:01that was assigned to their cluster.
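The two randomization stages can be sketched as follows. The function name, cluster sizes, and allocation probabilities are hypothetical, and a balanced first stage is assumed.

```python
import numpy as np

def two_stage_randomize(cluster_sizes, alpha1, alpha2, seed=0):
    """Stage 1: assign half of the clusters to each allocation strategy.
    Stage 2: treat each participant with their cluster's assigned probability."""
    rng = np.random.default_rng(seed)
    n_clusters = len(cluster_sizes)
    strategies = np.array([alpha1] * (n_clusters // 2)
                          + [alpha2] * (n_clusters - n_clusters // 2))
    rng.shuffle(strategies)                                   # random cluster-level assignment
    exposures = [rng.binomial(1, alpha, size=n)               # participant-level assignment
                 for alpha, n in zip(strategies, cluster_sizes)]
    return strategies, exposures

strategies, exposures = two_stage_randomize([8, 8, 8, 8], alpha1=0.5, alpha2=0.2)
for alpha, a in zip(strategies, exposures):
    print(alpha, a.tolist())
```

Setting `alpha1=1.0` and `alpha2=0.0` treats everyone in half of the clusters and no one in the rest, so there is effectively no second-stage randomization.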
  • 22:04And we'll extend our methods to consider general CRTs,
  • 22:07which can be considered as a special case
  • 22:10of a two-stage design where alpha one
  • 22:12and alpha two are one and zero, which means
  • 22:15that clusters are randomized to intervention or control
  • 22:18and there isn't a second stage randomization
  • 22:20at the participant level.
  • 22:25So we again have A to denote the individual exposure.
  • 22:29And to define potential outcomes in this setting,
  • 22:32we first define a subset of the study sample
  • 22:35for participant ik and we denote this by script I.
  • 22:40So here, we make the partial interference assumption
  • 22:45where ik's potential outcome is influenced
  • 22:48by their own exposure as well as the exposures
  • 22:52of the participants within this subset
  • 22:55and not anyone outside of this subset.
  • 22:57So because only the exposures of the participants
  • 23:00will affect the outcome of ik, we call this subset
  • 23:05ik's interference set.
  • 23:08And we can further apply an exposure mapping function
  • 23:11to this interference set
  • 23:14to obtain a scalar quantity of a spillover exposure.
  • 23:17Here, we consider stratified interference,
  • 23:22which essentially assumes that spillover occurs
  • 23:25through the proportion of treated participants
  • 23:27in the interference set regardless of who they are.
  • 23:30So the spillover exposure would be given by this proportion
  • 23:34and as in the first paper,
  • 23:35we can index potential outcomes by A and G.
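A small sketch of this exposure mapping, assuming each participant's interference set is stored as a set of other participants' indices. Whether the set includes the participant themselves is not stated in the talk. Here it is assumed to exclude them, and an empty set is given exposure zero by convention.

```python
import numpy as np

def spillover_exposure(a, interference_sets):
    """Proportion of treated participants in each participant's interference set."""
    a = np.asarray(a)
    return np.array([a[list(s)].mean() if s else 0.0 for s in interference_sets])

# Hypothetical example with four participants.
a = [1, 0, 1, 0]
interference_sets = [{1, 2}, {0, 2, 3}, {0}, set()]
g = spillover_exposure(a, interference_sets)
print(g.round(3).tolist())
```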
  • 23:41Here, we consider four causal effects
  • 23:45which are the individual effect, spillover effect,
  • 23:48total effect and overall effect.
  • 23:50The individual effect is the effect
  • 23:53of the individual exposure under a fixed spillover exposure.
  • 23:57And on the other hand, the spillover effect is the effect
  • 24:01of the spillover exposure under a fixed individual exposure.
  • 24:05And then the total effect is the effect
  • 24:07of having both an individual exposure to the intervention
  • 24:10and some degree of spillover exposure
  • 24:12versus neither type of exposure.
  • 24:15And then the overall effect compares the effect
  • 24:17of being assigned to a cluster randomized
  • 24:21to treatment allocation strategy alpha one versus alpha two.
  • 24:27So these causal effects can again be identified
  • 24:32under the identifying assumptions
  • 24:35we made earlier,
  • 24:37as in the first paper,
  • 24:38namely the unconfoundedness assumption,
  • 24:42which would hold under a two-stage design
  • 24:44given perfect treatment compliance.
  • 24:47And we estimate these effects
  • 24:49using a regression-based estimation approach,
  • 24:52which is consistent
  • 24:53and efficient under a correctly specified model
  • 24:56for the potential outcome.
  • 24:59So here, we consider an outcome model in this form
  • 25:06where we include a cluster random effect to account
  • 25:10for the effect of clustering in the inference,
  • 25:12and we also have an interaction between A and G
  • 25:15so that we can allow the individual effect to vary
  • 25:18with G and the spillover effect to vary with A.
  • 25:23So once we have the estimated coefficients from this model,
  • 25:27we can estimate causal effects
  • 25:30using these estimated coefficients.
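To make this step concrete, the sketch below assumes a linear mean model with an A by G interaction, E[Y(a, g)] = b0 + bA*a + bG*g + bAG*a*g, and writes three of the effects as contrasts of that mean. The link function, coefficient values, and function names are assumptions for illustration, and the cluster random effect is left out because it does not enter these contrasts on the linear scale.

```python
def mean_outcome(a, g, beta):
    """Mean potential outcome under individual exposure a and spillover exposure g."""
    b0, b_a, b_g, b_ag = beta
    return b0 + b_a * a + b_g * g + b_ag * a * g

def individual_effect(g, beta):
    """Effect of own exposure at a fixed spillover exposure g."""
    return mean_outcome(1, g, beta) - mean_outcome(0, g, beta)

def spillover_effect(g, g_ref, a, beta):
    """Effect of spillover exposure g versus g_ref at a fixed own exposure a."""
    return mean_outcome(a, g, beta) - mean_outcome(a, g_ref, beta)

def total_effect(g, g_ref, beta):
    """Own exposure plus spillover exposure g versus no own exposure and g_ref."""
    return mean_outcome(1, g, beta) - mean_outcome(0, g_ref, beta)

beta_hat = (0.30, -0.10, -0.20, 0.05)   # hypothetical estimated coefficients
print(individual_effect(0.5, beta_hat))
print(spillover_effect(0.5, 0.0, 0, beta_hat), spillover_effect(0.5, 0.0, 1, beta_hat))
print(total_effect(0.5, 0.0, beta_hat))
```

The interaction coefficient is what makes the individual effect depend on g and the spillover effect depend on a.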
  • 25:36So again, in CRTs,
  • 25:38because we don't have data on social connections
  • 25:41and when we consider interference to be given
  • 25:43by randomization clusters, they can be measured with error.
  • 25:47And as a consequence, this spillover exposure
  • 25:49can also be measured with error.
  • 25:51So we have shown in this paper
  • 25:53that when the outcome model is fit with G star instead of G,
  • 25:57the estimated model coefficients will be biased
  • 26:00and the causal effects estimated
  • 26:01with these biased coefficients will therefore also be biased.
  • 26:08And to correct for the bias
  • 26:10in these regression coefficients,
  • 26:12we apply a regression calibration approach
  • 26:15which is developed under the assumption
  • 26:17that the measurement error is additive
  • 26:20and also non-differential measurement error,
  • 26:22as in the previous paper.
  • 26:25So to apply this method,
  • 26:27we will first regress the outcome
  • 26:30on the mismeasured exposure
  • 26:31in the main study as we would
  • 26:34under the intent-to-treat analysis.
  • 26:36In the validation study
  • 26:37because we assumed that the measurement error is additive,
  • 26:41we fit a linear measurement error model of the true exposure
  • 26:45given the mismeasured spillover exposure.
  • 26:49And then we can obtain bias corrected regression
  • 26:51coefficients using the coefficients
  • 26:53obtained from these two models.
  • 26:56And we can obtain the variance
  • 26:59of these corrected coefficients using the delta method.
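A simplified simulation sketch of regression calibration for a single mismeasured exposure. It omits the individual exposure, the interaction term, and the cluster random effect from the model described in the talk, and every parameter value is made up. With a linear measurement error model G = gamma0 + gamma1 * G* + error, the naive slope estimates the true slope times gamma1, so dividing by the validation-study estimate of gamma1 undoes the attenuation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_main, n_val = 5000, 500

# Simulated data (all parameter values are hypothetical).
g_star = rng.uniform(0, 1, n_main)                           # mismeasured spillover exposure
g_true = 0.1 + 0.7 * g_star + rng.normal(0, 0.05, n_main)    # true exposure
y = 1.0 + 2.0 * g_true + rng.normal(0, 0.5, n_main)          # outcome, true slope = 2

# Step 1 (main study): regress the outcome on the mismeasured exposure.
naive_slope, naive_intercept = np.polyfit(g_star, y, 1)

# Step 2 (validation study): regress the true exposure on the mismeasured one.
val = slice(0, n_val)                                        # subsample with both measures
gamma1, gamma0 = np.polyfit(g_star[val], g_true[val], 1)

# Step 3: combine the two fits to correct the coefficients.
corrected_slope = naive_slope / gamma1
corrected_intercept = naive_intercept - corrected_slope * gamma0

print(f"naive slope {naive_slope:.2f}, corrected slope {corrected_slope:.2f}")
```

A delta method or bootstrap variance would need to account for the uncertainty in both fitted models, which this sketch does not attempt.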
  • 27:06We can also extend this approach to account
  • 27:09for covariate adjustment
  • 27:11and there may be several reasons why we need
  • 27:13to adjust for covariates.
  • 27:15First, if we step out of the two-stage CRT setting
  • 27:18and we consider a general CRT, where
  • 27:22clusters are assigned to either intervention or control,
  • 27:25in a lot of public health studies, the interventions
  • 27:29that are given to these clusters may be prone
  • 27:31to non-compliance.
  • 27:33And intervention uptake will depend
  • 27:36on individual characteristics
  • 27:38that may need to be accounted for.
  • 27:41So when there are confounders between the outcome
  • 27:46and the exposures, A or G,
  • 27:50we would need to assume conditional unconfoundedness
  • 27:55of the exposures.
  • 27:57So when there are covariates or confounders between Y and A,
  • 28:02we would adjust for them in the outcome model.
  • 28:05And if there were confounders between Y and G,
  • 28:09we would adjust for them in the outcome model
  • 28:11as well as in the measurement error model.
  • 28:15We might need to also make the non-differential
  • 28:20measurement error assumption conditional on covariates.
  • 28:25And in this case, because these covariates are related
  • 28:27to the measurement error as well as the outcome,
  • 28:30they would need to be adjusted for in both models as well.
  • 28:34And lastly, we may only be able
  • 28:36to generalize the measurement error parameters estimated
  • 28:40in the validation study to the main study
  • 28:43conditional on covariates.
  • 28:44And in this case, we would adjust
  • 28:46for these covariates as well.
  • 28:50But regardless of the types of covariates that are adjusted for,
  • 28:53the regression calibration estimators
  • 28:56and variance estimators
  • 29:01for the coefficients that are of interest, which are used
  • 29:04to estimate the causal effects, would take the same form
  • 29:08as in the case without covariates.
  • 29:14We've applied our methods to the BCPP study,
  • 29:19which was an HIV prevention CRT
  • 29:22in 30 Botswana communities that was conducted
  • 29:26between 2013 and 2018.
  • 29:28And this trial was to assess whether an intervention package
  • 29:33would reduce HIV incidence.
  • 29:35So in this trial, 15 communities were randomized
  • 29:39to receive an intervention package
  • 29:41that included HIV testing, linkage to care,
  • 29:45and early ART initiation for those who are HIV positive
  • 29:49as well as increased access
  • 29:50to voluntary medical male circumcision.
  • 29:53And the other 15 communities
  • 29:55were randomized to a standard of care.
  • 29:58So in the primary analysis they found
  • 30:00that there were decreased incidence rates
  • 30:03and increased viral suppression rates
  • 30:05in the intervention communities
  • 30:06compared to control communities.
  • 30:08And in our application, we are accounting for non-compliance
  • 30:12to the components, where we analyzed the individual,
  • 30:15spillover, total, and overall effects
  • 30:17of the package intervention
  • 30:18that was received on behavioral and clinical outcomes.
  • 30:23And here, we consider the communities
  • 30:26to be the misspecified interference sets
  • 30:29and we determined the true exposures
  • 30:33using phylogenetics data, which we consider
  • 30:36as our validation data.
  • 30:41So the phylogenetic data was obtained from the study shown
  • 30:45in the beginning of this section
  • 30:46where they found viral transmission chains
  • 30:49that crossed multiple clusters.
  • 30:53So here, they approached HIV positive individuals
  • 30:56in the study and obtained blood samples from them
  • 30:59and they were able to sequence their viral genomes
  • 31:03and construct HIV clusters
  • 31:06where participants grouped in the same viral cluster
  • 31:09were implied to be from the same viral transmission chain.
  • 31:14So here, each viral cluster was found to be composed
  • 31:18of two to 27 participants who were
  • 31:19from one to 16 communities.
  • 31:23And there were several considerations that we had to make
  • 31:27in using the phylogenetic data as our validation data
  • 31:32because the viral clusters only captured participants
  • 31:35who were infected by the same HIV strain
  • 31:38and would not necessarily represent a participant's
  • 31:42entire true interference set.
  • 31:45So we had to make some assumptions
  • 31:47to obtain the true spillover exposure
  • 31:50using this phylogenetic data. First,
  • 31:53we considered the connections observed
  • 31:56within the viral cluster to be representative
  • 32:01of the participants
  • 32:03who were HIV positive in participant ik's true interference set.
  • 32:08And then we also assume transportability
  • 32:12of the measurement error process where we assume
  • 32:15that the inter-cluster interactions
  • 32:17that we observed from the viral clusters
  • 32:19would've been the same among those who were HIV negative.
  • 32:23And lastly, we considered
  • 32:25that those who are HIV positive
  • 32:26might not have the same characteristics
  • 32:28for intervention uptake as those who are HIV negative,
  • 32:32so we derived the true spillover exposure
  • 32:35based on a weighted average of cluster intervention uptake.
  • 32:40So for example, if there were five participants observed
  • 32:44in the viral cluster
  • 32:46from two different randomization communities,
  • 32:49then we take the intervention uptake
  • 32:52of these two communities
  • 32:53and weight it by the proportion of participants
  • 32:56from each community that were observed in the viral cluster.
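The weighting described above can be sketched in a few lines of Python. This is only an illustration: the function name `weighted_uptake`, the community labels, and the uptake proportions are hypothetical and not taken from the BCPP data.

```python
def weighted_uptake(members_by_community, uptake_by_community):
    """Weighted average of community-level intervention uptake.

    members_by_community: number of viral-cluster members observed in each community.
    uptake_by_community: proportion of each community that took up the intervention.
    """
    total = sum(members_by_community.values())
    return sum(
        (count / total) * uptake_by_community[community]
        for community, count in members_by_community.items()
    )


# Hypothetical viral cluster: 5 participants, 3 from community "A" and 2 from "B".
members = {"A": 3, "B": 2}
uptake = {"A": 0.60, "B": 0.20}  # made-up uptake proportions
true_spillover = weighted_uptake(members, uptake)
print(round(true_spillover, 2))  # 0.44
```

Here the weights are 3/5 and 2/5, so the derived exposure is 0.6 × 0.60 + 0.4 × 0.20.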
  • 33:04And so here are some details on the intervention components
  • 33:09for the study.
  • 33:11I don't know, do I have enough to?
  • 33:13Okay, so basically,
  • 33:16there are four components to this intervention package
  • 33:19and different study populations were eligible
  • 33:22for each of these components.
  • 33:24So here are the eligibility criteria that we had considered
  • 33:28for our application where for testing, we considered
  • 33:33that they were eligible for testing if they did not have
  • 33:35a documented HIV positive status prior to baseline,
  • 33:38and participants were eligible for HIV care
  • 33:41and ART initiation if they were HIV positive at baseline.
  • 33:47And for circumcision, we considered someone
  • 33:51to be eligible for this treatment
  • 33:54if they were an HIV negative male at baseline
  • 33:56who had not been circumcised.
  • 33:59And we also considered several definitions
  • 34:02of the individual exposure which were receiving
  • 34:05at least one of these intervention components
  • 34:08or receiving all eligible components
  • 34:10versus some or none of them.
  • 34:13In this paper, we also considered three outcomes.
  • 34:17First was a behavioral outcome that we defined
  • 34:20as a sexual risk behavior score
  • 34:22and this is defined as the number
  • 34:26of risk behaviors that they had reported
  • 34:30at their survey interview at one year post baseline.
  • 34:35And then we also looked at two clinical outcomes,
  • 34:37which were viral load at one year post baseline
  • 34:40and HIV incidence by the end of the study.
  • 34:46Before we looked at the effect
  • 34:48of receiving the individual components,
  • 34:50we first assessed the overall effect of being assigned
  • 34:54to an intervention cluster versus control
  • 34:57on these three outcomes, where the ITT estimates
  • 35:01were obtained assuming that the interference sets
  • 35:06were communities.
  • 35:08And we see that there was a minimal overall effect
  • 35:15of cluster assignment on decreasing sexual risk behaviors.
  • 35:20But there was a significant effect on viral load
  • 35:23and incidence, where our findings echoed the ones
  • 35:28from the primary analysis where they found
  • 35:30increased viral suppression
  • 35:32and decreased incidence for clusters assigned
  • 35:34to intervention versus control.
  • 35:37And after bias correction we see
  • 35:39that these effects are again amplified,
  • 35:43which was expected due to the high levels
  • 35:45of inter-cluster mixing where say,
  • 35:48some preventative measures from intervention communities
  • 35:52may have gone into the control communities
  • 35:54and some incident infections observed in intervention communities
  • 35:57may have been attributable to control communities.
  • 36:03And we also looked at the effect
  • 36:05of receiving at least one component
  • 36:08on the sexual risk behavior score, where we see
  • 36:12that after applying our bias correction method,
  • 36:15there was a significant total
  • 36:17and overall effect of receiving at least one component
  • 36:21on decreased sexual risk behaviors.
  • 36:27And there was also a significant individual effect
  • 36:30of receiving both HIV care
  • 36:33and ART on decreased viral load which was expected.
  • 36:42So here, we proposed methods to bias correct
  • 36:47causal effects estimated under misspecified interference
  • 36:50sets in a CRT, although our methods are not restricted
  • 36:54to this setting and can be applied to broader settings as well.
  • 36:59And to use our regression calibration method,
  • 37:01we had to assume that both the measurement error
  • 37:04and outcome models were correctly specified.
  • 37:08And we also made some assumptions
  • 37:10on the measurement error structure.
  • 37:11So for our third paper, we proposed an IPW-based method
  • 37:19where parametric assumptions on the outcome model
  • 37:22were not required and also we didn't need
  • 37:24to make assumptions on the additive
  • 37:26or non-differential nature of the measurement error process.
  • 37:34Okay, so propensity score based methods are widely used
  • 37:39to estimate intervention effects when characteristics
  • 37:43of the exposed and unexposed participants may be unbalanced,
  • 37:47which may occur in an observational setting
  • 37:50where the exposure is not randomized.
  • 37:55And in particular, we're focused on an IPW estimator
  • 37:58that has been previously extended
  • 38:00to estimate causal effects in the setting of interference.
  • 38:04And this is typically done assuming the interference sets
  • 38:07are known and true.
  • 38:09And in this paper, we show that when interference sets
  • 38:15are mismeasured and spillover exposures are mismeasured
  • 38:18as a consequence, there is an error
  • 38:19in not only the spillover exposure
  • 38:21but also in the propensity score estimates.
  • 38:28So for notation, here
  • 38:31we're outside of the network and cluster setting,
  • 38:34so we just have participants i from one to n.
  • 38:37Here, the individual exposure status may depend
  • 38:40on observed individual covariates.
  • 38:43And also, here, we assume
  • 38:45the partial interference assumption as in our second paper,
  • 38:48although this method doesn't require
  • 38:51the partial interference assumption.
  • 38:53We can also make the neighborhood interference assumption
  • 38:55if we were working in a setting of social networks.
  • 39:01In this paper, we define a binary spillover exposure,
  • 39:05although our methods can be generalized
  • 39:07to categorical measures of the spillover exposures as well.
  • 39:10And here we consider an extension
  • 39:13of the stratified interference assumption
  • 39:14that we made in the previous paper, where
  • 39:18we define G to be one if the proportion
  • 39:21of treated participants in the interference set exceeds
  • 39:23a certain pre-specified threshold.
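As a minimal sketch of this exposure mapping, the binary spillover exposure could be computed as below. The function name `binary_spillover` is hypothetical, and two choices are assumptions rather than details from the talk: the comparison is strict ("exceeds"), and the list passed in is taken to be whichever members the analyst counts as the interference set.

```python
def binary_spillover(treated_flags, threshold):
    """Return 1 if the treated proportion in the interference set exceeds threshold.

    treated_flags: 0/1 treatment indicators for members of the interference set.
    threshold: pre-specified cut-off on the treated proportion (between 0 and 1).
    """
    if not treated_flags:
        raise ValueError("interference set is empty")
    proportion = sum(treated_flags) / len(treated_flags)
    # Strict inequality is an assumption; use >= for an "at least" definition.
    return int(proportion > threshold)


print(binary_spillover([1, 0, 0, 1], threshold=0.25))  # 1
print(binary_spillover([0, 0, 0, 1], threshold=0.25))  # 0
```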
  • 39:27And again, potential outcomes are indexed by A and G.
  • 39:31In this paper, we're interested in the individual, spillover,
  • 39:35and total effects.
  • 39:39So this is the IPW estimator
  • 39:42for the average potential outcome
  • 39:45where in the denominator, we have
  • 39:47an estimated joint propensity score for the individual
  • 39:51and spillover exposures.
  • 39:53And this can be expressed as the product
  • 39:55of the individual exposure propensity score
  • 39:58and the spillover exposure propensity score.
  • 40:00And these can be estimated
  • 40:01using (indistinct) regression models.
  • 40:04And we can obtain the variance of this estimator
  • 40:07by bootstrap resampling where we can resample
  • 40:11at the individual level or at the cluster level
  • 40:15if we were working in a setting with clusters
  • 40:17as in our second paper.
  • 40:20And this estimator is consistent, if the models
  • 40:23for the propensity scores are correctly specified.
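A rough sketch of an IPW estimator of this kind is given below, assuming the two propensity scores have already been estimated and are supplied as arrays. The function name `ipw_mean` and the toy data are hypothetical, and the simple unnormalized form is an assumption; the slide may use a different normalization or condition the spillover propensity on the individual exposure.

```python
import numpy as np


def ipw_mean(y, a, g, ps_a, ps_g, a_level, g_level):
    """IPW estimate of the average potential outcome under (a_level, g_level).

    ps_a: estimated P(A = 1 | covariates) for each participant.
    ps_g: estimated P(G = 1 | covariates) for each participant.
    The joint propensity score is taken as the product of the two.
    """
    y, a, g = np.asarray(y, float), np.asarray(a), np.asarray(g)
    ps_a, ps_g = np.asarray(ps_a, float), np.asarray(ps_g, float)
    prob_a = ps_a if a_level == 1 else 1.0 - ps_a
    prob_g = ps_g if g_level == 1 else 1.0 - ps_g
    indicator = (a == a_level) & (g == g_level)
    return float(np.mean(indicator * y / (prob_a * prob_g)))


# Toy data with constant propensities of 0.5 (made up for illustration).
y = [1.0, 2.0, 3.0, 4.0]
a = [1, 1, 0, 0]
g = [1, 0, 1, 0]
ps_a = [0.5] * 4
ps_g = [0.5] * 4
mu_11 = ipw_mean(y, a, g, ps_a, ps_g, a_level=1, g_level=1)
print(mu_11)  # 1.0
```

For the variance, the same function could be re-applied to bootstrap resamples of participants, or of whole clusters when the data are clustered.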
  • 40:29So as in the previous cases, when interference sets are misspecified,
  • 40:35we would observe G star instead of G.
  • 40:38And if we were to use G star in the IPW estimator,
  • 40:43we would get a biased estimate
  • 40:45because the expected value of this estimator is given
  • 40:49by the form shown in the bottom here where we see
  • 40:53that this estimator is only unbiased if the probability
  • 40:57of the true exposure being equal to g,
  • 40:59given that the observed spillover exposure
  • 41:02is also equal to g, is equal to one,
  • 41:04which means that there's no measurement error.
  • 41:07And also from the form of this expectation, we can also see
  • 41:12that the bias can be eliminated if we divide both terms
  • 41:17on the right-hand side by this measurement error probability
  • 41:21and then subtract away the second term,
  • 41:25which is the approach that we took.
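The algebra of this correction can be illustrated with a small sketch. The exact expression on the slide is not reproduced in the transcript, so the code only assumes the general shape described: the expectation of the naive estimator equals the measurement error probability times the target, plus a second term. The function name and the numbers are hypothetical.

```python
def bias_corrected_mean(naive_estimate, second_term, p_correct):
    """Solve naive = p_correct * target + second_term for the target.

    naive_estimate: IPW estimate computed with the mismeasured exposure G*.
    second_term: estimate of the contaminating term (from validation data).
    p_correct: estimated P(G = g | G* = g), the measurement error probability.
    """
    if not 0.0 < p_correct <= 1.0:
        raise ValueError("p_correct must be in (0, 1]")
    return (naive_estimate - second_term) / p_correct


# Illustration under an assumed mixture form for the second term:
# with probability 0.8 the observed exposure is correct, otherwise the
# outcome comes from a group with a different mean.
mu_target, mu_other, p = 2.0, 5.0, 0.8
naive = p * mu_target + (1.0 - p) * mu_other  # what the naive estimator targets
corrected = bias_corrected_mean(naive, (1.0 - p) * mu_other, p)
print(round(corrected, 6))  # 2.0
```

When `p_correct` equals one and the second term is zero, the corrected and naive estimates coincide, which matches the no-measurement-error case.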
  • 41:29And this was an approach that was first proposed by Braun
  • 41:33and colleagues in the setting without interference.
  • 41:36And here, we extended this estimator
  • 41:38to the setting of interference.
  • 41:42So from this bias corrected IPW estimator, we see
  • 41:46that on the right-hand side in the first term,
  • 41:49we have the IPW estimator that is estimated
  • 41:53in the main study.
  • 41:55We also have an IPW estimator
  • 41:57that is estimated in the validation study alone.
  • 42:00And the measurement error probabilities
  • 42:04are also estimated in the validation study.
  • 42:09And because here, we are estimating potential outcomes
  • 42:12in the validation study, we need to assume generalizability
  • 42:17of the potential outcome
  • 42:18and measurement error process in this study
  • 42:21so that the effects that are estimated
  • 42:23in the validation study alone
  • 42:25would be unbiased for the average effect
  • 42:28that would be observed in the main study.
  • 42:33So using these bias corrected IPW estimators,
  • 42:37we can obtain a bias corrected estimator
  • 42:39for the causal effect, which is given as a contrast
  • 42:42between potential outcomes estimated
  • 42:45using the bias corrected IPW estimators.
  • 42:48And here, we can write this estimator
  • 42:55with weights W,
  • 42:57where the weights are meant to minimize the variance
  • 43:00of the bias corrected causal effects
  • 43:03and the weights are given at the bottom here,
  • 43:07where the variance and covariance terms can also be estimated
  • 43:11using bootstrap resampling.
  • 43:17So while this estimator directly eliminates the bias,
  • 43:21it does require the outcome
  • 43:22to be available in the validation study.
  • 43:25So when this is not available,
  • 43:28we propose an alternative estimator
  • 43:30that does not impose this requirement
  • 43:33where we've extended methods proposed by rule
  • 43:36and colleagues to the setting of interference.
  • 43:40And so this is a regression calibration-based approach
  • 43:43where first, we assume that we have a continuous measure
  • 43:47of the spillover exposure.
  • 43:49And we will predict the true continuous spillover exposures
  • 43:53given the observed ones.
  • 43:55And then under the exposure mapping
  • 43:57that we had specified previously with the threshold,
  • 44:00we would dichotomize this proportion.
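The two steps just described (predict the continuous true exposure, then dichotomize) might look roughly like this. The sketch assumes a simple linear measurement error model with no covariates, fitted in the validation study; the function name, the clipping of predictions to [0, 1], and the toy numbers are all assumptions made for illustration.

```python
import numpy as np


def calibrate_and_dichotomize(obs_val, true_val, obs_main, threshold):
    """Regression calibration followed by thresholding.

    obs_val, true_val: observed and true continuous spillover exposures
        (proportions) in the validation study.
    obs_main: observed continuous spillover exposures in the main study.
    """
    # np.polyfit returns coefficients from highest degree down: slope, intercept.
    slope, intercept = np.polyfit(obs_val, true_val, deg=1)
    predicted = intercept + slope * np.asarray(obs_main, float)
    predicted = np.clip(predicted, 0.0, 1.0)  # keep predictions valid proportions
    return predicted, (predicted > threshold).astype(int)


# Toy validation data where the true exposure is 0.1 + 0.5 * observed.
obs_val = np.array([0.1, 0.3, 0.5, 0.7])
true_val = 0.1 + 0.5 * obs_val
pred, g_hat = calibrate_and_dichotomize(obs_val, true_val, [0.2, 0.6], threshold=0.25)
print(g_hat)  # [0 1]
```

The predicted binary exposures would then be used both in the IPW estimator and in the propensity score model for the spillover exposure.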
  • 44:07And the regression calibration based IPW estimator
  • 44:10would use the predicted binary true exposures
  • 44:15as well as the propensity scores estimated
  • 44:17under these predicted values.
  • 44:19And we've shown, as in the previous paper,
  • 44:23that this estimator is only consistent
  • 44:25if a linear measurement error model fits the data.
  • 44:32In this paper, we further consider the case
  • 44:35where we might observe multiple surrogate
  • 44:37interference sets in a study.
  • 44:39And this was motivated by our illustrative example of BCPP
  • 44:45where we may consider a surrogate interference set
  • 44:48defined by a randomization cluster.
  • 44:50And we can also consider a second surrogate interference set
  • 44:54that is defined by household GPS data,
  • 44:57which is available in the study.
  • 45:00So when we have multiple surrogate interference sets,
  • 45:04we propose to first apply our bias corrected estimators,
  • 45:08either the first or the regression calibration-based one
  • 45:13to each surrogate interference set individually.
  • 45:16And then we will combine these individual estimates
  • 45:18using a weighted average estimator to reduce the variance
  • 45:21of the final estimate.
  • 45:25So the weights are given by C in the bottom here
  • 45:29where we would estimate the variance-covariance matrix
  • 45:33of the individual bias corrected causal effects.
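One common way to build such a weighted average is the minimum-variance linear combination with weights proportional to the inverse variance-covariance matrix applied to a vector of ones. The transcript does not spell out the formula for C, so treating it as this standard form is an assumption; the function name and the numbers are hypothetical.

```python
import numpy as np


def min_variance_combination(estimates, cov):
    """Combine estimates of the same effect with variance-minimizing weights.

    estimates: bias corrected effect estimates, one per surrogate interference set.
    cov: their estimated variance-covariance matrix (e.g. from the bootstrap).
    Weights are inv(cov) @ 1, normalized to sum to one.
    """
    estimates = np.asarray(estimates, float)
    cov = np.asarray(cov, float)
    ones = np.ones(len(estimates))
    solved = np.linalg.solve(cov, ones)   # inv(cov) @ ones without forming the inverse
    weights = solved / (ones @ solved)    # normalize so the weights sum to one
    combined = float(weights @ estimates)
    variance = float(1.0 / (ones @ solved))
    return combined, weights, variance


# Two made-up estimates, e.g. from communities and from geographical plots.
estimates = [-0.10, -0.06]
cov = [[0.0004, 0.0], [0.0, 0.0012]]
combined, weights, variance = min_variance_combination(estimates, cov)
print(np.round(weights, 2))  # [0.75 0.25]
```

With uncorrelated estimates this reduces to inverse-variance weighting, and the combined variance is never larger than the smallest individual variance.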
  • 45:42Here, similar to the second paper, we've applied our methods
  • 45:45to BCPP, where we analyzed the individual,
  • 45:49spillover, and total effects
  • 45:50of receiving at least one intervention component
  • 45:53on sexual risk behaviors one year after study enrollment.
  • 45:58So as a reminder, the components here are HIV testing,
  • 46:01HIV care, ART and circumcision.
  • 46:06And here, we consider a binary outcome, which we define
  • 46:09by one if a participant had reported having engaged
  • 46:13in at least 30% of the surveyed risk behaviors.
  • 46:18Here, for our application,
  • 46:19we consider the randomization clusters
  • 46:22or communities as our first surrogate interference set.
  • 46:26And we also consider a second surrogate interference set
  • 46:30that is defined by smaller geographical plots.
  • 46:33And these geographical plots comprised
  • 46:38two to 18 participants on average,
  • 46:42which were much smaller than randomization clusters,
  • 46:44which were about 400 participants each.
  • 46:49And in both of these interference sets,
  • 46:52we define the spillover exposure to be one if at least 25%
  • 46:57of participants in the interference set received
  • 46:59at least one intervention component.
  • 47:03And as in the second paper,
  • 47:04we determine the true spillover exposures
  • 47:08from the phylogenetic dataset.
  • 47:13So here are the risk differences
  • 47:16of receiving at least one intervention component
  • 47:19on self-reported sexual risk behaviors
  • 47:22where we compare the estimates obtained when we consider
  • 47:25the communities, that is, the randomization clusters,
  • 47:27or the geographical plots as the interference sets.
  • 47:32And we compare these to the bias corrected estimates
  • 47:35where here, I'm presenting the estimates
  • 47:38coming from the weighted average
  • 47:40of the bias corrected estimates applied individually
  • 47:43to the community and to the geographical plots.
  • 47:48Here, under the surrogate interference sets,
  • 47:52we see that most effects were null.
  • 47:56However, after bias correction, we see
  • 48:00that there is a beneficial AIE when G is equal to one
  • 48:05and beneficial ASP when A is equal to one.
  • 48:08This means that for participants
  • 48:11who received at least one component of the intervention,
  • 48:14if they were in the presence of at least 25%
  • 48:17of participants who also received the intervention,
  • 48:20they had decreased risk behaviors.
  • 48:22And likewise for ASP one,
  • 48:28for participants who did receive the intervention,
  • 48:32if they were exposed to at least 25% of participants
  • 48:37in the interference set who also received the intervention,
  • 48:41then their risk behaviors were also reduced.
  • 48:45But on the other hand,
  • 48:48if a participant did not receive at least one component
  • 48:51and greater than 75% of those in their interference set
  • 48:56also did not receive the intervention,
  • 48:59then this had an adverse effect on the risk behaviors.
  • 49:03So overall, we see that a participant's risk behaviors
  • 49:08are influenced by their own treatment
  • 49:10and also in synergy with the treatment received
  • 49:14by those in their interference set.
  • 49:21So to wrap up,
  • 49:24so we proposed several bias corrected estimators,
  • 49:27which serve to decrease the bias in assessment
  • 49:29of causal effects so that future intervention strategies
  • 49:32can be more efficiently designed and interpreted.
  • 49:36And our methods assume
  • 49:38that we have a suitable validation study that provides us
  • 49:42with true measures of the interference set.
  • 49:44However, as we see from our application,
  • 49:48an exposure contamination dataset
  • 49:50or a phylogenetic dataset is still an imperfect measure
  • 49:53of true social connections,
  • 49:55although we do assume
  • 49:56that these are more accurate than interference sets defined
  • 49:59by, say, general spatial boundaries
  • 50:02or administrative boundaries.
  • 50:06And we propose for future extensions
  • 50:08that we can perform sensitivity analysis
  • 50:13on departures from the assumptions
  • 50:14that are made in this dissertation.