Power and Sample Size Calculations for Evaluating Spillover Effects in Networks with Non-randomized Interventions

July 07, 2023

Speaker: Ashley Buchanan

February 10, 2023

Co-sponsored by the Department of Biostatistics, Yale School of Public Health

ID
10112

Transcript

  • 00:59<v Vin>Donna's looking over it.</v>
  • 01:00I'll just start.
  • 01:04So can you hear us okay online?
  • 01:08<v Donna>Yeah, if you want you can go to the podium.</v>
  • 01:10<v ->Yeah.</v> <v ->Okay.</v>
  • 01:11'Cause this is last-minute,
  • 01:13so I need to just get your bio (indistinct).
  • 01:15(laughing)
  • 01:18So, hi, everyone.
  • 01:20It's my pleasure to welcome Dr. Ashley Buchanan
  • 01:24today as our speaker in this seminar series.
  • 01:27And Dr. Buchanan is associate professor of biostatistics
  • 01:31in the Department of Pharmacy Practice
  • 01:34in University Rhode Island
  • 01:35and also as an adjunct in Brown University Biostatistics.
  • 01:40Hi, Donna. <v ->Hi.</v>
  • 01:41<v Vin>And she specializes in the area</v>
  • 01:43of epidemiology and causal inference.
  • 01:45And she has a lot of experience
  • 01:47collaborating on HIV/AIDS research,
  • 01:49working closely with colleagues both domestically
  • 01:52and internationally to develop
  • 01:54and apply causal methods to improve treatment
  • 01:57and prevention of HIV and AIDS.
  • 01:59And without further ado,
  • 02:02I'll give the floor to you, Ashley.
  • 02:04(indistinct)
  • 02:06<v ->Thanks, Vin, for that nice introduction.</v>
  • 02:09And thanks for the invitation, Donna,
  • 02:10to speak at (indistinct) today.
  • 02:12It's nice to be here in person with folks
  • 02:15that I normally just see on Zoom.
  • 02:17So great to be here.
  • 02:18And welcome to all the folks on Zoom, as well.
  • 02:22Just get my slide.
  • 02:26I see that the slides are already sharing.
  • 02:29Let's do the slideshow.
  • 02:32Oops.
  • 02:34<v Vin>It's the lower right.</v>
  • 02:36It's a little-
  • 02:37<v ->Is this gonna work? (drowned out)</v>
  • 02:41Can the folks on Zoom still see the slides?
  • 02:44<v Vin>You did share, right?</v>
  • 02:45<v ->Yeah, I think it's sharing.</v>
  • 02:47<v Gabrielle>We see a full screen.</v>
  • 02:48(indistinct)
  • 02:50<v ->Okay, great.</v> <v ->Perfect.</v>
  • 02:53<v ->Okay, so today, I'm gonna be presenting work</v>
  • 02:56about study design, power,
  • 02:57and sample size calculation for evaluating
  • 03:00spillover in networks in the context
  • 03:02of interventions that are not randomized.
  • 03:05This is definitely work in progress, ongoing work.
  • 03:08So we have some initial simulation results
  • 03:12and some promising findings
  • 03:13and then a lot of open questions
  • 03:15that I'd love to have some discussion about towards the end,
  • 03:19sort of about where the practical world
  • 03:21meets the statistical world,
  • 03:23and how can we bring these ideas into practice
  • 03:26for designing these network type studies.
  • 03:30I'd like to start off with acknowledgements.
  • 03:32So Ke Zhang is a graduate student at URI,
  • 03:36and she's been primarily leading
  • 03:38a lot of the simulation work.
  • 03:39She's been a key individual in this work.
  • 03:43We also have collaborators, Doctors Katenka, Wu, and Lee.
  • 03:46And then I also wanna thank a larger list of collaborators
  • 03:49that have been part of this ongoing work with Avenir,
  • 03:52including Dr. Lee, Forastiere,
  • 03:54Halloran, Friedman, and Nikolopoulos.
  • 03:58And then just to acknowledge our funding support
  • 04:00and funding support that collected the motivating data set.
  • 04:07So an outline for today,
  • 04:10I'm gonna give a little bit of background
  • 04:11and talk about the motivating study of TRIP,
  • 04:13talk about the objectives of this particular work.
  • 04:16And then we'll look at some of the simulation results
  • 04:19and then discuss conclusions and future directions.
  • 04:24So this work is focused on people who inject drugs,
  • 04:28and these individuals are at risk for HIV
  • 04:31due to drug use, sharing equipment,
  • 04:34and sexual risk behaviors.
  • 04:36In addition, these individuals are often part of networks.
  • 04:39So when they receive an intervention,
  • 04:42the intervention can benefit not only them
  • 04:44but their partners and possibly even beyond that.
  • 04:47So in these networks, interventions often have
  • 04:50what's known as spillover effects,
  • 04:51sometimes called the indirect effect
  • 04:53in interference literature.
  • 04:55So spillover,
  • 04:58historically in the causal inference literature,
  • 05:00it's been called interference.
  • 05:02Here I'll be calling it spillover.
  • 05:04So that's when one individual's exposure
  • 05:06affects another's outcome.
  • 05:09And recently, there's been several papers
  • 05:12that have been looking at how do we assess
  • 05:14these spillover effects in network studies.
  • 05:21So our motivating study
  • 05:22is the Transmission Reduction Intervention Project.
  • 05:25This was a network-based study of injection drug users
  • 05:27and their contacts in Athens, Greece, 2013 to 2015.
  • 05:32And the individuals were connected
  • 05:35through sexual and drug use partnerships.
  • 05:37The original study was focused on using
  • 05:40this new network tracing technique
  • 05:42to find recently infected individuals
  • 05:44and get them on treatment.
  • 05:46So the idea is when individuals are acutely infected,
  • 05:49they're more likely to transmit.
  • 05:51So if we can find more
  • 05:52of these recently infected individuals,
  • 05:54get them on treatment,
  • 05:55they'll be less likely to infect their partners.
  • 05:58And the punchline from the main study
  • 05:59was this was very successful in finding
  • 06:01more recently infected individuals.
  • 06:05<v Ke>Excuse me.</v>
  • 06:06<v Ashley>What?</v>
  • 06:07<v Ke>I'm so sorry for the bothering,</v>
  • 06:09but from my end, the slides are not moving.
  • 06:13<v ->Not at all, okay, let me try again.</v>
  • 06:15One second. (indistinct)
  • 06:17(Donna laughing)
  • 06:19<v Donna>For one more day (laughs).</v>
  • 06:22<v Vin>At least the (indistinct), so that's okay.</v>
  • 06:24<v ->Yeah, yeah, we haven't made it too far.</v>
  • 06:26(laughing)
  • 06:28<v Donna>Thanks for telling us.</v>
  • 06:28<v Vin>Thanks for letting us know.</v>
  • 06:34How 'bout now?
  • 06:37<v Gabrielle>Yep, we can see the motivating study slide.</v>
  • 06:40<v ->[Donna And Ashley] Okay.</v>
  • 06:41Is it the slide?
  • 06:42Is it in presentation view or is it the slide?
  • 06:45<v ->On the right-hand side,</v>
  • 06:46we can see the next slide and then some notes.
  • 06:50<v ->Oh, so it's in presentation.</v>
  • 06:51I mean, that's not the worst thing,
  • 06:52but sometimes, it's better if they can
  • 06:56just see the whole slide (laughs).
  • 06:58Sorry about that.
  • 07:03<v ->Think you'll have to maybe go</v>
  • 07:04out of the presentation mode.
  • 07:07<v ->Exit presentation mode.</v>
  • 07:08<v Vin>Yeah, so then it's the same in the computer</v>
  • 07:12and the screen sharing.
  • 07:27<v ->Sorry.</v>
  • 07:30How do you do it, Vin?
  • 07:31<v Vin>Just that little button, yeah.</v>
  • 07:33You're actually on it right now.
  • 07:35<v ->I think they're still see-</v> <v ->If you could just click</v>
  • 07:36on that.
  • 07:37<v Donna>Or you can go to the top bar, too, I think.</v>
  • 07:39And there we go. <v ->No, I think they'll</v>
  • 07:40still see that.
  • 07:41<v ->And then I think over here,</v>
  • 07:42maybe there's a way to even exit presentation mode.
  • 07:47(indistinct)
  • 07:49It's that.
  • 07:50<v ->(indistinct) slideshow.</v>
  • 07:53(indistinct)
  • 08:00<v ->(indistinct) if there's any.</v>
  • 08:04(indistinct) presenter view.
  • 08:06<v ->There we go.</v>
  • 08:08<v ->Okay, thanks, Vin.</v>
  • 08:08<v ->Does that look okay for (drowned out)?</v>
  • 08:10(laughing) (indistinct)
  • 08:12<v Gabrielle>Yep, now, it's in presentation mode.</v>
  • 08:15<v ->Okay, great.</v>
  • 08:17Sorry about that, thanks for your patience.
  • 08:20So where were we; so we were talking
  • 08:21about the Transmission Reduction Intervention Project.
  • 08:24So this worked well to find
  • 08:26these recently infected individuals
  • 08:28and refer them to treatment.
  • 08:29So it was this successful strategic network
  • 08:32tracing approach.
  • 08:33In addition in this study,
  • 08:34they also delivered community alerts.
  • 08:36So if there is an individual
  • 08:38who was recently infected in the network...
  • 08:41Get this outta the way so you guys can see the figure.
  • 08:44There's an individual who was recently infected
  • 08:46in the proximity of a particular individual in the network,
  • 08:50these community alerts would be distributed,
  • 08:52which were basically flyers, handouts,
  • 08:55or flyers even posted on the wall of frequented venues.
  • 09:00So then individuals in the network
  • 09:02either received these community alerts
  • 09:04from the investigators or they did not.
  • 09:06So the little red dots are those individuals
  • 09:09who received the alerts.
  • 09:10And then the blue ones are those who were not alerted.
  • 09:14And then we looked at this in our previous paper.
  • 09:17We looked at the spillover effects of the community alerts
  • 09:19on HIV injection risk behavior at six months
  • 09:23to see if receiving this alert yourself
  • 09:26reduced your injection risk behavior.
  • 09:27Or if you had contacts who were alerted,
  • 09:31then did that information spill over to you,
  • 09:33and then you also reduced your injection risk behavior?
  • 09:44<v Donna>So is that the actual network, that picture?</v>
  • 09:47<v ->Yep, that's the visualization of the network among...</v>
  • 09:50There's some missing data
  • 09:51and this is the network among the individuals
  • 09:53that had all the outcomes observed.
  • 09:57Okay, good, the slides can move.
  • 10:00So I'm just gonna,
  • 10:01for those who are not familiar with networks,
  • 10:03I'll define some terminology using this slide.
  • 10:06So this is a visualization of the network here,
  • 10:09the TRIP network.
  • 10:10There's 216 individuals here.
  • 10:14So the individuals are denoted by the blue dots.
  • 10:16Those are people who inject drugs
  • 10:18and their sexual and drug use partners.
  • 10:20And then the edges represent when two individuals,
  • 10:24or nodes, share a partnership.
  • 10:26And we call those connections edges sometimes.
  • 10:29And then the little pink one is an example of a component.
  • 10:34So that's a connected subnetwork
  • 10:36where individuals in that group are connected
  • 10:38to each other through at least one path
  • 10:40but not connected to others in the network.
  • 10:43So right away, we see that TRIP primarily comprised
  • 10:45this one, large, connected component
  • 10:48and several other small components.
  • 10:50We can sort of see them out on the edges of the network.
  • 10:54And then when we zoom in on the component,
  • 10:57the individual in red is the,
  • 11:01we'll call that the index person.
  • 11:03And then the individuals shaded
  • 11:05in this lighter pink are their neighbors
  • 11:08or their first-degree contacts.
  • 11:10So as I go through presenting these methods,
  • 11:12there are some times when I'll be talking about components.
  • 11:14And then in terms of defining the spillover effects,
  • 11:17in this particular paper,
  • 11:18we defined it using the exposure of the nearest neighbors.
  • 11:24<v Donna>By nearest neighbors,</v>
  • 11:25you mean just first-degree (drowned out)?
  • 11:26<v ->First-degree, yeah, or maybe said</v>
  • 11:28in even more applied terms, their partners.
  • 11:31<v ->Okay.</v> <v ->Right, so we're really</v>
  • 11:32thinking about their immediate partners,
  • 11:36and these would be individuals
  • 11:37that they either used drugs with or had sex with,
  • 11:39and they reported that in the study
  • 11:41for that edge to be there.
  • 11:43Yep.
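The network terms defined above (nodes, edges, components, degree, and first-degree neighbors) can be illustrated with a small sketch. The toy network and the function name `connected_components` are hypothetical and are not taken from the TRIP data or the speaker's code; this only shows how the terms relate to one another.

```python
from collections import deque


def connected_components(adjacency):
    """Return the components of an undirected graph as a list of sets of nodes."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects everyone reachable from `start`.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        components.append(component)
    return components


# Hypothetical toy network (not TRIP data): each key is a participant (node)
# and each value is the set of partners they share an edge with.
toy_network = {
    "a": {"b", "c"},
    "b": {"a"},
    "c": {"a"},
    "d": {"e"},
    "e": {"d"},
    "f": set(),
}

components = connected_components(toy_network)
# Degree is the number of partners a participant reported.
degree = {node: len(partners) for node, partners in toy_network.items()}
# First-degree contacts (nearest neighbors) of index person "a".
neighbors_of_a = toy_network["a"]
```

Here the index person "a" has two nearest neighbors, and the network splits into three components that are not connected to each other.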
  • 11:47So a little bit of notation.
  • 11:49So we have N is denoting the participants in the study.
  • 11:53A is going to be the intervention
  • 11:55based on the community alerts in our example.
  • 11:57We have baseline covariates,
  • 11:59and then we index the neighbor, the partners who were...
  • 12:02I guess in the networks they call it the neighbors.
  • 12:04But in this case, it's really just their partners,
  • 12:06set of participants that share an edge
  • 12:09or partnership with person i.
  • 12:12We have the degree.
  • 12:13And then we have a vector
  • 12:14of the baseline covariates for the neighbors,
  • 12:16vector of baseline covariates for...
  • 12:19Sorry, the treatment for the neighbors,
  • 12:21baseline covariates for the neighbors.
  • 12:23And then we denote the non-overlapping subnetworks by G.
  • 12:31So we're doing causal inference with an intervention
  • 12:35that's not randomized in a network.
  • 12:37So this requires numerous assumptions
  • 12:39in order to be able to identify these causal effects.
  • 12:43So first, as in the figure,
  • 12:46what I alluded to is we're assuming
  • 12:48the nearest neighbor interference set.
  • 12:50So basically, it's only the person's exposure themselves
  • 12:54or the exposure of their neighbors that can impact
  • 12:58the potential outcome or affect the potential outcome.
  • 13:01We have an exchangeability assumption that applies
  • 13:05not only to the exposure for the person
  • 13:07but, also, the vector of exposures for their neighbors.
  • 13:10So we have comparability between individuals
  • 13:15who are exposed and not exposed.
  • 13:17This is, of course, conditional on baseline covariates.
  • 13:20We require a positivity assumption
  • 13:23so that there's a positive probability of exposure
  • 13:25at each level of the covariates, again,
  • 13:27both for the individual and their neighbors.
  • 13:29And we also assume if there are different versions
  • 13:33of the community alerts, for example,
  • 13:35they don't matter for the potential outcome.
  • 13:37So it's really whether you just got the alert,
  • 13:40whether you got it as a paper flyer handed to you,
  • 13:43or you saw it as a poster,
  • 13:45we're just assuming it's the same intervention.
  • 13:49So with these assumptions,
  • 13:50we can write the potential outcome indexed
  • 13:52by the exposure for the individual and their neighbors.
  • 13:55And then by consistency,
  • 13:57the observed outcome is one of the potential outcomes
  • 13:59corresponding to the intervention received.
  • 14:02And there's a little bit of notation
  • 14:05that goes into the background of defining these effects.
  • 14:09But long story short,
  • 14:10we define the average potential outcomes
  • 14:13using a Bernoulli allocation strategy,
  • 14:15which is why, when we define the spillover effect,
  • 14:19it's a Y bar.
  • 14:20And then what this effect is,
  • 14:23is it's comparing the average potential outcome
  • 14:25of unexposed individuals
  • 14:27under two different allocation strategies.
  • 14:30So that's the spillover effect
  • 14:32that is in the first paper that we worked on.
  • 14:35And then now when we're doing the power
  • 14:37and sample size stuff,
  • 14:38this is, basically, the parameter of interest.
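In symbols, the parameter described above can be sketched as follows. The notation is reconstructed from the spoken description and is an assumption: the label SE, the argument order, and the sign convention may differ from the slides and the paper.

```latex
\begin{aligned}
\overline{Y}(a, \alpha) &: \text{average potential outcome when the individual has exposure } a \\
&\phantom{:}\ \text{and neighbors are exposed under a Bernoulli allocation strategy with coverage } \alpha, \\
\overline{SE}(\alpha_1, \alpha_0) &= \overline{Y}(0, \alpha_1) - \overline{Y}(0, \alpha_0).
\end{aligned}
```

Both terms fix the individual's own exposure at zero, so the contrast isolates what changes when only the coverage among neighbors moves from one allocation strategy to the other.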
  • 14:48In the first paper, there's two different estimators.
  • 14:51To get started with this study design stuff,
  • 14:53we're looking at the second IPW estimator,
  • 14:56which uses a generalized propensity score
  • 14:59extending work in Laura's paper from 2021
  • 15:02from a stratified estimator
  • 15:05to an inverse probability weighted estimator.
  • 15:07And we actually made the decision
  • 15:08to start with this one first,
  • 15:10because in the simulations of the first paper,
  • 15:13it actually had slightly better finite sample performance.
  • 15:16And then in actual application,
  • 15:18we were able to add more covariates
  • 15:20to this model to control for confounding.
  • 15:22So we decided to start here.
  • 15:23We'll also look at IPW-1 as a different estimator
  • 15:27for the study design stuff.
  • 15:28But we decided to start with IPW-2.
  • 15:32And IPW-2, what this does is it uses
  • 15:37a stratified interference assumption.
  • 15:39So it looks at,
  • 15:42instead of looking at the vector
  • 15:44of exposures of the neighbors,
  • 15:46it looks at S_i, which is the number
  • 15:47of your neighbors that were exposed.
  • 15:50Then, there's also a reducible propensity score assumption,
  • 15:54which allows us to factor that generalized propensity score
  • 15:57into a propensity score for the individual
  • 16:02and then a propensity score
  • 16:03for the neighbors conditional on the individual.
  • 16:08I may have just mixed that up,
  • 16:09but it's on the next slide.
  • 16:11Yeah, this is the neighbors conditional on the individual
  • 16:13and then the individual conditional on their covariates.
  • 16:17Okay, got it right (laughs).
  • 16:21So then this estimator looks like this.
  • 16:25And then just to kind of break apart what's going on here,
  • 16:29so it's an inverse probability weighted estimator
  • 16:31where we have this generalized propensity score,
  • 16:34where we have the individual exposure
  • 16:35following a Bernoulli distribution
  • 16:38with a certain probability.
  • 16:39And then the S_i variable,
  • 16:41the number of the neighbors exposed,
  • 16:43following a binomial distribution.
  • 16:45And then with that reducible propensity score assumption,
  • 16:48we can factor,
  • 16:50one approach is to factor it this way.
  • 16:52And then we can use these forms
  • 16:54to estimate the propensity score.
  • 16:58And then we still have this pi term here,
  • 17:00because we're standardizing
  • 17:01to a certain allocation strategy.
  • 17:03So we're thinking about specific policies here
  • 17:06when defining the counterfactuals.
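The structure just described (a Bernoulli term for the individual's exposure, a binomial term for the number of exposed neighbors, and a policy term for the target allocation strategy) can be sketched in code. This is a simplified illustration under stated assumptions, not the speaker's IPW-2 estimator: the function names, the inputs `p_a` and `p_s` (fitted probabilities assumed to come from propensity models that are not shown), and the choice of a binomial policy term are all hypothetical.

```python
from math import comb


def binomial_pmf(s, d, p):
    """P(S = s) for S ~ Binomial(d, p)."""
    return comb(d, s) * p**s * (1 - p) ** (d - s)


def ipw2_sketch(y, a, s, d, component, p_a, p_s, a_level, alpha):
    """Simplified IPW-type estimate of the average potential outcome Y(a_level, alpha).

    y, a, s, d : per-person outcome, own exposure (0/1), number of exposed
                 neighbors, and degree.
    component  : per-person component label.
    p_a        : per-person fitted P(A_i = 1 | covariates) (assumed given).
    p_s        : per-person fitted per-neighbor exposure probability, so that
                 S_i given A_i and covariates is treated as Binomial(d_i, p_s).
    """
    totals, counts = {}, {}
    for i in range(len(y)):
        counts[component[i]] = counts.get(component[i], 0) + 1
        if a[i] != a_level:
            continue  # only people at the exposure level of interest contribute
        # Numerator: probability of s[i] exposed neighbors under the target policy.
        policy = binomial_pmf(s[i], d[i], alpha)
        # Denominator: generalized propensity score, factored as an own-exposure
        # (Bernoulli) term times a neighbor-exposure (binomial) term.
        own = p_a[i] if a[i] == 1 else 1 - p_a[i]
        gps = own * binomial_pmf(s[i], d[i], p_s[i])
        totals[component[i]] = totals.get(component[i], 0.0) + y[i] * policy / gps
    # Average within each component, then across components.
    return sum(totals.get(c, 0.0) / n for c, n in counts.items()) / len(counts)
```

The sketch divides by the estimated propensity, so it fails when that probability is zero, which is the positivity assumption showing up in practice. A spillover contrast would be the difference between two calls with `a_level=0` and two different values of `alpha`.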
  • 17:08<v Donna>Ashley, I have a question.</v>
  • 17:09The very first equation where you have Y hat IPW-2,
  • 17:16open paren zero comma alpha one.
  • 17:19What does the zero mean?
  • 17:20<v ->That means that the individual...</v>
  • 17:22So A refers to the exposure for the individual.
  • 17:26So it means the individual is not exposed,
  • 17:30possibly contrary to fact.
  • 17:32So they're all counterfactuals,
  • 17:33but the individual themselves is not exposed.
  • 17:35<v Donna>They're not directly exposed.</v>
  • 17:37<v ->I don't like the words, "Directly exposed."</v>
  • 17:40So in my mind, it's like we're either exposed or we're not.
  • 17:43I don't know, it cleans it up in my mind a little bit,
  • 17:45but I know what you're saying.
  • 17:46So the individual themselves did not receive the...
  • 17:48Let's make it in the context of the problem.
  • 17:50Individual themselves did not receive the community alert
  • 17:53from the TRIP investigative staff.
  • 17:55<v Donna>Okay.</v>
  • 17:56<v ->They may have gotten it secondhand,</v>
  • 17:57which is the whole thing we're trying to estimate.
  • 18:00So they didn't get it from the investigators,
  • 18:02but then their neighbors,
  • 18:05so these orange folks, alpha percent of them,
  • 18:09a certain percentage of them received the alert.
  • 18:12So maybe we're interested in if 75% of your neighbors
  • 18:16were alerted versus just 20%.
  • 18:20And then there's sort of some practical considerations
  • 18:23that I try to follow in our work.
  • 18:25So we actually look at the distribution
  • 18:27of coverage of treatment for the neighbors,
  • 18:29and we only wanna be estimating effects
  • 18:31sort of within the range of what we're seeing.
  • 18:33So say 20% to maybe 60% were alerted
  • 18:38and we have a lot of data there,
  • 18:39then we could do contrast
  • 18:41for those alpha levels in the data.
  • 18:44Maybe some people feel more comfortable
  • 18:46going out of the range of data,
  • 18:47but I like to know we have information there.
  • 18:50'Cause I think a lot of the times,
  • 18:51it'll give you an estimate,
  • 18:52but it feels better knowing we have this many neighbors,
  • 18:55neighborhoods that had this type of exposure.
  • 19:00Does that make sense? <v ->Yeah.</v>
  • 19:02It does, though I don't agree with the last thing.
  • 19:04(laughing)
  • 19:05<v ->Okay.</v>
  • 19:08We all have different preferences I guess (laughs).
  • 19:12<v Donna>I mean, yeah, you take that</v>
  • 19:13to its logical extreme,
  • 19:14I would say that it (indistinct) having a simple regression.
  • 19:17You would have to observe X at every single value.
  • 19:21<v ->Not every single value, but just the range.</v>
  • 19:24So say that it stops at six-
  • 19:25<v ->You don't wanna-</v> <v ->Say it stops at-</v>
  • 19:26(drowned out).
  • 19:27<v ->Yeah, yeah, say it stops at 60%,</v>
  • 19:30and then we're trying to estimate 95% coverage.
  • 19:32It almost feels too far out.
  • 19:34<v Donna>So you don't wanna extrapolate,</v>
  • 19:35but you're willing to interpolate.
  • 19:37<v ->Yeah, yep.</v> <v ->Okay, I thought you</v>
  • 19:38were saying you weren't willing to interpolate.
  • 19:40<v ->No, then the coverage levels,</v>
  • 19:41if you look at the distribution,
  • 19:42it kind of bumps around and there's some that are missing.
  • 19:44But I'm okay going over that range of the data, but-
  • 19:48<v Donna>Then I do.</v>
  • 19:49<v ->Okay, that's good.</v>
  • 19:50<v Colleague>I mean, you can still do it,</v>
  • 19:51people do like to extrapolate,
  • 19:53but you know that the (indistinct) we'll get
  • 19:55is gonna be higher, right?
  • 19:56'Cause you don't have data there.
  • 19:58<v ->Yep.</v>
  • 20:01That's a little digression
  • 20:02from where I wanted to go with the slides,
  • 20:04but it's still interesting (laughs).
  • 20:06<v Donna>Ashley, can I ask you a question about the,</v>
  • 20:08so (indistinct) design IPW-1,
  • 20:10but you said that you weren't able
  • 20:13to include more covariates (indistinct).
  • 20:16<v ->In the TRIP data.</v>
  • 20:17<v Donna>And what (indistinct)?</v>
  • 20:19<v ->So I think it has to do with,</v>
  • 20:20so just to say it's not really even on this slide,
  • 20:22but IPW-1 uses a generalized logit model
  • 20:26to estimate the propensity score.
  • 20:28And basically, that thing's kind of a bugger.
  • 20:31It's pretty sensitive.
  • 20:32It doesn't...
  • 20:33Linear mixed models tend to do pretty well,
  • 20:36but these ones with the logit link
  • 20:38I find in practice they can be,
  • 20:41they run into these convergence issues.
  • 20:45And then this one that extended Laura's estimator,
  • 20:48in practice at least,
  • 20:49we haven't run it in hundreds of data sets or anything,
  • 20:51but the few that we have,
  • 20:53we tend to be able to add more covariates.
  • 20:55And because of the nonrandomized intervention,
  • 20:57that just seems like the right thing to do,
  • 20:59because we want better control for confounding.
  • 21:03<v Donna>Thanks.</v>
  • 21:04<v ->Yours is winning.</v>
  • 21:05(laughing)
  • 21:06(indistinct)
  • 21:08At least with our team recently.
  • 21:12And that's not to say IPW-1...
  • 21:14It's a great estimator, as well.
  • 21:16It has some nice properties,
  • 21:17but there's just sort of this practical issue
  • 21:19of the generalized logit model.
  • 21:23<v Donna>Yeah, the benefit of that one, though,</v>
  • 21:24is that you don't have to assume
  • 21:26the stratified interference.
  • 21:27<v ->Right, you don't have to assume stratified interference,</v>
  • 21:29and then we don't have to make
  • 21:30this reducible propensity score assumption.
  • 21:32So pros and cons, right?
  • 21:37Yeah, and then it's interesting
  • 21:38to think about what are our practical recommendations
  • 21:40when folks have a menu of estimators to choose from.
  • 21:43What do we tell folks to do in their substantive papers?
  • 21:48Do we ask them to check both?
  • 21:50I think that's what I've been advising for now,
  • 21:52just that one is your main analysis,
  • 21:54one is for sensitivity analysis,
  • 21:56but I think that's another open question.
  • 22:01So I spared us all the notation on this slide,
  • 22:03but just to say the variance estimation
  • 22:07is used in the study design issue.
  • 22:09So we use M estimation here.
  • 22:11And then to do M estimation,
  • 22:13we're using the union of the connected subnetworks
  • 22:17to break up the graph.
  • 22:22But at the same time,
  • 22:23we also preserve the underlying connection.
  • 22:26So we maintained that nearest neighbor structure
  • 22:29when calculating the variance.
  • 22:31And then in the simulation study,
  • 22:33we found that accounting for that
  • 22:36as compared to just doing complete partial interference
  • 22:39was more efficient.
  • 22:41So the complete partial interference
  • 22:43would be you would assume
  • 22:45the entire component is the interference set
  • 22:48versus, here, we maintain that the neighbors
  • 22:50are the interference set.
  • 22:51But then we still leverage
  • 22:53the components as independent units,
  • 22:55because it's required for M estimation.
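For readers less familiar with M-estimation, the generic form of the variance estimator can be sketched as below, with the m components treated as the independent units. This is the standard textbook sandwich form, written under the assumption that the parameter vector is stacked into a single theta with component-level estimating functions psi; it is not the specific set of estimating equations from the speaker's paper.

```latex
\begin{aligned}
\sum_{\nu=1}^{m} \psi_\nu(\hat{\theta}) &= 0, \\
\hat{A} &= \frac{1}{m} \sum_{\nu=1}^{m} \left. -\frac{\partial \psi_\nu(\theta)}{\partial \theta^{\top}} \right|_{\theta = \hat{\theta}}, \qquad
\hat{B} = \frac{1}{m} \sum_{\nu=1}^{m} \psi_\nu(\hat{\theta})\, \psi_\nu(\hat{\theta})^{\top}, \\
\widehat{\operatorname{Var}}(\hat{\theta}) &= \frac{1}{m}\, \hat{A}^{-1} \hat{B} \left(\hat{A}^{-1}\right)^{\top}.
\end{aligned}
```

Because the sums run over components, the number of components m plays the role of the sample size in the asymptotics, even though each psi can still use the nearest-neighbor structure inside its component.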
  • 23:01Okay.
  • 23:02So that was all the background to build up to (laughs)
  • 23:06(indistinct) to do study design in these networks
  • 23:10with these particular methods
  • 23:11that have been developed over the recent years.
  • 23:16So basically, I don't know.
  • 23:17I don't think I need to sell it to this group,
  • 23:19but to understand how features
  • 23:23of the study design impact the power is important.
  • 23:26As far as we can tell,
  • 23:27this hasn't been a real emphasis in network-based studies,
  • 23:32particularly in the area of substance use in HIV.
  • 23:34Folks kind of get the sample that they can get.
  • 23:37It's a ton of work,
  • 23:38so they're not thinking about designing them
  • 23:40like a cluster randomized trial.
  • 23:43Or even in observational studies,
  • 23:45there's some proposals where they'll wanna see
  • 23:48at least power calculations to show
  • 23:50that there's a large enough sample size.
  • 23:53So there are approaches coming out
  • 23:55in the statistics literature.
  • 23:57Of course, there are some older ones about overall effects
  • 24:00in cluster randomized trials.
  • 24:02I just put one reference there,
  • 24:03but that's a very large literature.
  • 24:05But then getting into the causal spillover effects,
  • 24:08there are some papers by Baird et al.
  • 24:11looking at a two-stage randomized design.
  • 24:13And I found another paper by Sinclair in 2012
  • 24:16that was a multi-level randomized design,
  • 24:18which kind of had the similar flavor
  • 24:20to a cluster randomized design,
  • 24:22but it was from the econ literature,
  • 24:23so they had a slightly different name for it.
  • 24:26However, when we're doing a sociometric network study,
  • 24:30these larger network-based studies,
  • 24:33it would be difficult to implement
  • 24:35a two-stage randomized design
  • 24:37just because of how folks are recruited.
  • 24:40And then we're also interested in being able to evaluate
  • 24:42interventions that are not randomized.
  • 24:45So we wanna have adequately powered studies
  • 24:48to evaluate these interventions.
  • 24:55So this overall paper,
  • 24:57we're gonna start off with simulation studies,
  • 25:00thinking about varying the number of components
  • 25:03and the number of nodes,
  • 25:05and then changing different parameters
  • 25:07in the network including effect size,
  • 25:10features of the network like degree,
  • 25:13intracluster correlation,
  • 25:15and see how these impact the power.
  • 25:17And then lastly, trying to work on deriving
  • 25:20an expression for the minimal detectable effect
  • 25:24as well as expressions for sample size.
  • 25:30So the ongoing work I'll be presenting today
  • 25:32is focusing mostly on the first aim,
  • 25:35so simulation study to detect spillover effects,
  • 25:37varying the number of components
  • 25:39or the number of nodes in the network.
  • 25:41And then as the next step for this,
  • 25:44we have some initial results for a Wald test statistic
  • 25:47and showing that that test statistic
  • 25:49is normally distributed.
  • 25:51So just an overview of how we've generated some of the data.
  • 25:54We started off by generating
  • 25:55a network with certain features.
  • 25:57Then on that network, we simulate random variables
  • 25:59and then generate the potential outcomes
  • 26:03and then, subsequently, the observed outcomes.
  • 26:05In each data set, we estimate the spillover effects using,
  • 26:08in this case we used IPW-2 and confidence intervals.
  • 26:12And then we calculate the power
  • 26:13and the empirical coverage probability.
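The last step of that workflow, turning per-data-set estimates and confidence intervals into empirical power and coverage, can be sketched as follows. This is a minimal illustration that assumes a two-sided Wald test against a null of no spillover effect and a normal critical value of 1.96; the function name and the example numbers are hypothetical and are not simulation results from the talk.

```python
def power_and_coverage(estimates, std_errors, true_effect, z_crit=1.96):
    """Empirical power and confidence-interval coverage across simulated data sets.

    estimates, std_errors : one spillover estimate and standard error per data set.
    true_effect           : the spillover effect used to generate the data.
    """
    rejections = 0
    covered = 0
    for est, se in zip(estimates, std_errors):
        # Wald statistic for the null hypothesis of no spillover effect.
        wald = est / se
        if abs(wald) > z_crit:
            rejections += 1
        # Wald confidence interval around the estimate.
        lower, upper = est - z_crit * se, est + z_crit * se
        if lower <= true_effect <= upper:
            covered += 1
    n = len(estimates)
    return rejections / n, covered / n


# Hypothetical estimates from four simulated data sets.
power, coverage = power_and_coverage(
    estimates=[-0.45, -0.10, -0.25, -0.05],
    std_errors=[0.10, 0.10, 0.10, 0.10],
    true_effect=-0.20,
)
```

In a real simulation study the lists would hold one entry per replicate, and coverage close to the nominal 95% is a check that the variance estimator is behaving before the power numbers are trusted.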
  • 26:18(coughs)
  • 26:20Sip of water.
  • 26:25So in the first setting,
  • 26:27we're looking to see if power varies by components,
  • 26:30which I thought was a good place to start,
  • 26:32because our M estimation,
  • 26:34the effective sample size is M,
  • 26:36or the number of components.
  • 26:38So we had two different approaches.
  • 26:40We keep the component size the same
  • 26:42and increase the number of components,
  • 26:44or we fix the number of nodes
  • 26:46and then increase the number of components.
  • 26:48So the first one is really how the statistics
  • 26:51of the M estimation are working.
  • 26:53And the second one I think is empirically interesting.
  • 26:55I don't think it's as founded in the theory
  • 26:58of the estimation, just to be clear,
  • 27:02but nonetheless, I think interesting to look at.
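The two ways of increasing the number of components can be made concrete with a small sketch that only produces component sizes for each design. The function names are hypothetical, and the even split in the second design is an assumption; the talk does not say how nodes were divided among components.

```python
def fixed_component_size(num_components, component_size):
    """Design 1: every component has the same size, so total nodes grow with the number of components."""
    return [component_size] * num_components


def fixed_total_nodes(total_nodes, num_components):
    """Design 2: the total number of nodes is fixed and split as evenly as possible across components."""
    base, extra = divmod(total_nodes, num_components)
    return [base + 1] * extra + [base] * (num_components - extra)


# Design 1: going from 10 to 20 components of 20 nodes doubles the total sample.
design_1 = [sum(fixed_component_size(m, 20)) for m in (10, 20)]

# Design 2: 216 nodes (the size of the motivating network) split into 10 or 20 components.
design_2_ten = fixed_total_nodes(216, 10)
design_2_twenty = fixed_total_nodes(216, 20)
```

Under the second design, more components necessarily means smaller components, which is one reason it is less directly tied to the asymptotics of the estimator than the first.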
  • 27:04<v Donna>Could you go back a second?</v>
  • 27:06<v ->Yeah.</v> <v ->So what did</v>
  • 27:07the motivating study have in terms
  • 27:09of the number of components and the number of nodes?
  • 27:13<v ->The motivating study has 10 components, 216 nodes.</v>
  • 27:18And then what we did in our first paper
  • 27:20was to try to increase the number of components.
  • 27:21We tried to break up that largest connected component using
  • 27:26network science community detection methods, which is okay.
  • 27:30I don't think it's the most satisfying answer.
  • 27:33And then once we do the community detection,
  • 27:34then we had 20 components.
  • 27:36So the actual motivating data set
  • 27:39is really 10 to 20 components, about 216 individuals.
  • 27:44<v Donna>Okay, so nodes and individuals are the same thing?</v>
  • 27:47<v ->Yep, sorry, I may have been using those-</v>
  • 27:50<v ->No, that's okay.</v> <v ->Individual, yeah.</v>
  • 27:51216 nodes, yep.
  • 27:54<v Donna>Ashley, can I ask you another question?</v>
  • 27:55<v ->Yeah.</v>
  • 27:56<v Donna>So is that in general?</v>
  • 27:57And you see that treatment, right?
  • 28:00Like in the previous slide said (indistinct) treatment
  • 28:02and potential outcomes I guess, right?
  • 28:05(indistinct) treatment.
  • 28:08So do you do that (indistinct) thing of observational study
  • 28:10like simulating the treatment from propensity score?
  • 28:14<v ->Yeah, so we fit the propensity score in the TRIP data,</v>
  • 28:19and then you'll see in a couple slides
  • 28:20I have the actual values of the parameters that we used.
  • 28:23And then we, obviously, can't fit a model to,
  • 28:26we just fit a model to the observed outcome
  • 28:28to try to get the betas for the model,
  • 28:30the potential outcome out of the TRIP.
  • 28:33Again, the motivating data.
  • 28:35Yep, good question.
  • 28:36And this is like a roadmap.
  • 28:38I'm gonna actually go through
  • 28:39a lot of detail for each one now (laughs).
  • 28:42<v Vin>Sorry, I also have a question.</v>
  • 28:43So in the simulation for component,
  • 28:45and there's nobody in that component received
  • 28:47the treatment in the simulations,
  • 28:50is that possible?
  • 28:51<v ->Yep, that could happen.</v>
  • 28:52<v Vin>And then like for that component,</v>
  • 28:54is that excluded from this,
  • 28:56because perhaps it violates
  • 28:58the positivity assumption I guess?
  • 29:00<v ->Well, it depends on...</v>
  • 29:01They would come into play
  • 29:03if you're interested in a coverage of 0%.
  • 29:07Right, so it depends on what your...
  • 29:09So that would be if you're interested in estimating
  • 29:11Y of zero with alpha equals 0%.
  • 29:17It's like a pure control group.
  • 29:19So it would be that case.
  • 29:23Yep.
  • 29:25Yeah, so we didn't exclude anyone on that case,
  • 29:28but in another paper, we did exclude...
  • 29:30We were actually looking at HIV seroconversion
  • 29:32in the other paper,
  • 29:33and we did an analysis by components.
  • 29:35So if the component had no HIV-infected individuals
  • 29:40at baseline and the components
  • 29:42in the study were not allowed to change,
  • 29:44then that was like a,
  • 29:46I forget what the epi term for it,
  • 29:47there's no way anyone can get infected.
  • 29:50So it was a perfectly protected component.
  • 29:52So we excluded those.
  • 29:54So we wanted components in that study that were at risk.
  • 29:57So we had to have at least one individual
  • 30:01in the component with HIV at baseline,
  • 30:03so there was some chance that it could spread.
  • 30:06<v Colleague>But it seems that even if you don't exclude</v>
  • 30:08these components where no one is treated,
  • 30:10the (indistinct) weights will be very low, right?
  • 30:12<v ->Yep, they'll just get downgraded for the treatment thing.</v>
  • 30:16But then I guess it made
  • 30:17my mind go to thinking about,
  • 30:20particularly for HIV seroconversion,
  • 30:22if you have a case where there is a really small,
  • 30:25maybe it's one of these little components,
  • 30:27and it's just these two people,
  • 30:30like the two, like a little dyad, neither have HIV.
  • 30:35I guess then, if you're assuming
  • 30:36that there's no other edges into there,
  • 30:39then there can be no events.
  • 30:41So thinking about like, you know.
  • 30:43I think it makes sense to exclude that,
  • 30:44because they're not at risk as a group, as a dyad.
  • 30:51And maybe that's another tangent (laughs).
  • 30:55Okay, so approach one.
  • 30:58We have this regular connected network with degree four,
  • 31:01which is approximately the observed degree
  • 31:03in the TRIP network.
  • 31:05And then we sampled the number of nodes from a Poisson(10) distribution.
  • 31:09And then we repeat this
  • 31:10and then combine the M subnetworks to form the full network.
  • 31:15So this is the first case where we,
  • 31:19yeah, we have the number,
  • 31:21we keep the component size the same,
  • 31:22and then we're increasing the number of components.
  • 31:26Alternatively for approach two,
  • 31:28we have the same four-degree network.
  • 31:31We have M components but for a fixed set of number of nodes,
  • 31:37and then we generate the connected network,
  • 31:39and then, again, combine the subnetworks.
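A minimal sketch of the two network-generation approaches described above, using only the Python standard library. Several pieces are assumptions made for illustration and are not taken from the talk: the degree-four network is built as a ring lattice (each node linked to the nodes one and two steps away), component sizes drawn from the Poisson(10) distribution are truncated below at five so that a simple four-regular graph exists, and the function names `approach_one` and `approach_two` are hypothetical.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def ring_lattice(n, offset=0):
    """Edges of a connected 4-regular graph on n >= 5 nodes.

    Assumption: a ring lattice stands in for whatever regular-graph
    generator was actually used in the study.
    """
    if n < 5:
        raise ValueError("need at least 5 nodes for a simple 4-regular ring")
    edges = set()
    for i in range(n):
        for step in (1, 2):
            j = (i + step) % n
            edges.add((offset + min(i, j), offset + max(i, j)))
    return edges


def build_network(component_sizes):
    """Combine disjoint subnetworks into one full network."""
    components, edges, offset = [], set(), 0
    for n in component_sizes:
        components.append(list(range(offset, offset + n)))
        edges |= ring_lattice(n, offset)
        offset += n
    return components, edges


def approach_one(m, rng, mean_size=10, min_size=5):
    """Approach one: m components, each with a Poisson-distributed size."""
    sizes = [max(min_size, sample_poisson(mean_size, rng)) for _ in range(m)]
    return build_network(sizes)


def approach_two(m, total_nodes):
    """Approach two: a fixed total number of nodes split into m components."""
    base, extra = divmod(total_nodes, m)
    sizes = [base + (1 if c < extra else 0) for c in range(m)]
    return build_network(sizes)
```

Under approach one the total number of nodes grows with the number of components, while under approach two the component size is the total number of nodes divided by the number of components.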
  • 31:45So in either case, there's sort of these two scenarios
  • 31:47where we're generating the network,
  • 31:50and then we generate the potential outcomes
  • 31:53and the observed outcomes.
  • 31:55We assign random effects to induce
  • 31:57correlation within each component,
  • 31:59and then simulate...
  • 32:01We just have one binary covariate for now.
  • 32:03Of course, we wanna extend this
  • 32:04to multiple covariates, continuous covariates.
  • 32:08And then we generate the potential outcome
  • 32:09using this formula here
  • 32:11where the values of the parameters
  • 32:13are from an estimated model in the TRIP data.
  • 32:16And then we generate the treatment
  • 32:18or exposure using this Bernoulli random variable.
  • 32:21Again, with the parameter values
  • 32:23from a model in the TRIP data.
  • 32:27And then depending on what the value of A is,
  • 32:30and A and I is,
  • 32:32then we can pull off the observed outcome
  • 32:35from the vector of potential outcomes for each individual.
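The data-generating steps just described can be sketched as follows for a single component. This is an illustration under stated assumptions, not the study's code: the logistic form, the coefficient values in `beta` and `gamma`, the random-effect standard deviation `sigma`, and the 50% covariate prevalence are all hypothetical placeholders (the talk says the real values come from models fitted to the TRIP data), and the function name `simulate_component` is invented here.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate_component(neighbors, rng, sigma=0.5,
                       beta=(-0.5, -0.3, -0.4, 0.2), gamma=(-0.2, 0.4)):
    """Simulate one component under nearest-neighbor interference.

    neighbors: dict mapping node -> list of its first-degree neighbors.
    Returns dict node -> (covariate, treatment, proportion of treated
    neighbors, observed outcome). All parameter values are hypothetical.
    """
    # Shared random effect induces correlation within the component.
    b = rng.gauss(0.0, sigma)
    # One binary covariate per individual.
    z = {i: int(rng.random() < 0.5) for i in neighbors}
    # Non-randomized treatment: Bernoulli with a covariate-dependent propensity.
    a = {i: int(rng.random() < expit(gamma[0] + gamma[1] * z[i]))
         for i in neighbors}

    result = {}
    for i, nbrs in neighbors.items():
        u = rng.random()  # one uniform draw couples this person's potential outcomes
        potential = {}
        for own in (0, 1):
            for k in range(len(nbrs) + 1):
                prop = k / len(nbrs)
                p = expit(beta[0] + beta[1] * own + beta[2] * prop
                          + beta[3] * z[i] + b)
                potential[(own, k)] = int(u < p)
        # The observed outcome is the potential outcome indexed by the
        # realized own treatment and realized number of treated neighbors.
        k_obs = sum(a[j] for j in nbrs)
        result[i] = (z[i], a[i], k_obs / len(nbrs), potential[(a[i], k_obs)])
    return result
```

Each individual's vector of potential outcomes is indexed by their own treatment and by how many of their neighbors are treated, which matches the nearest-neighbor interference structure discussed in the questions that follow.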
  • 32:40<v Donna>I have a question.</v>
  • 32:41<v ->Yep.</v>
  • 32:43<v Donna>So earlier, you said you were</v>
  • 32:45only allowing spillover between first-degree,
  • 32:49nodes that were connected by first-degree.
  • 32:52<v ->Mn-hm.</v>
  • 32:54<v Donna>But then if you're same kind of variable</v>
  • 32:58to describe spillovers,
  • 33:00the proportion of nodes,
  • 33:03or the proportion of, I don't know what you call them,
  • 33:06participants in a component that are exposed,
  • 33:12then it's ignoring that.
  • 33:14<v ->So yeah, maybe I was mixing papers.</v>
  • 33:16In this paper, it's really the proportion
  • 33:20of the neighbors that are treated.
  • 33:21So you have each person.
  • 33:22It's the proportion of their neighbors that are treated
  • 33:25that's going to define their potential outcome.
  • 33:27<v Donna>That has to be a first-degree neighbors-</v>
  • 33:29<v ->In this-</v> <v ->Anybody (indistinct).</v>
  • 33:31<v ->In this paper you, we could extend this to second,</v>
  • 33:34third-degree, different interference structures.
  • 33:37But in this particular paper, that's how it's defined.
  • 33:39But I think what I was doing,
  • 33:40I was actually giving an example from another paper
  • 33:42where we assume partial interference by component.
  • 33:45In this paper, it's the nearest neighbor interference.
  • 33:48So the potential outcomes depend on the number
  • 33:51of the neighbors that are treated
  • 33:53out of the total, the proportion.
  • 33:56<v Donna>One other question.</v>
  • 33:57So at this point, five squared between subjects' variance,
  • 34:03what kind of ICC does that give, do you know?
  • 34:06<v ->I don't remember off the top of my head,</v>
  • 34:07but we can check.
  • 34:10And I'm trying to remember.
  • 34:11I think we got that from looking at the TRIP data,
  • 34:14but I'd have to go back and check how we landed on that.
  • 34:19But yeah, it's a good idea to check.
  • 34:27And then we estimate the spillover effect
  • 34:30and the corresponding 95% confidence interval
  • 34:33in each data set using the methods
  • 34:36that were presented earlier.
  • 34:37And then we calculate the power
  • 34:39and the empirical coverage probability.
  • 34:41We simulated across 500 data sets,
  • 34:43and we're still working on deriving
  • 34:46and evaluating the test statistic.
  • 34:47So for now, we just use the confidence interval
  • 34:50to see if the null value is in the confidence interval
  • 34:52or not as a way to assess the power.
  • 34:55And then just as a sanity check,
  • 34:57we checked it in the first paper,
  • 34:58but we also look at the empirical coverage probability
  • 35:01just to make sure the estimators are behaving as we expect.
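The confidence-interval approach to power described here can be sketched as below. It assumes each simulated data set yields a point estimate and a standard error for the spillover effect and that a normal-based 95% interval is used; the function name and arguments are hypothetical.

```python
def ci_power_and_coverage(estimates, std_errors, true_effect,
                          null_value=0.0, z=1.96):
    """Empirical power and coverage across simulated data sets.

    Power: share of intervals that exclude the null value.
    Coverage: share of intervals that contain the true effect.
    """
    rejections = 0
    covered = 0
    for est, se in zip(estimates, std_errors):
        lower, upper = est - z * se, est + z * se
        if not (lower <= null_value <= upper):
            rejections += 1
        if lower <= true_effect <= upper:
            covered += 1
    n = len(estimates)
    return rejections / n, covered / n
```

With 500 simulated data sets, as in the talk, the two returned proportions would be the empirical power and the empirical coverage probability for one scenario.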
  • 35:05<v Donna>So is there a test statistic?</v>
  • 35:08<v ->It's derived and we're looking</v>
  • 35:09at the normality of it first, assessing it.
  • 35:12And then the next step, which we ran outta time
  • 35:14to do for today is we wanna redo these simulations.
  • 35:16So that's step four.
  • 35:19Sub two is based on the test statistic,
  • 35:22not the confidence interval.
  • 35:24I mean, they should largely agree,
  • 35:25but what makes me nervous is it's a confidence interval
  • 35:28for an estimation of two parameters.
  • 35:32And sometimes in that case, the confidence interval
  • 35:34may not always agree with the test statistics.
  • 35:36So it should typically, but to be...
  • 35:40I think it's correct.
  • 35:42It's more appropriate to be using the test statistic.
  • 35:46<v Vin>The confidence interval or the indirect effect?</v>
  • 35:49<v ->Yeah.</v>
  • 35:50<v Vin>So you will...</v>
  • 35:52I mean, I think there are...
  • 35:53They should agree, right?
  • 35:55<v ->But I worry about-</v> <v ->(drowned out) the null</v>
  • 35:56distribution for the test statistic.
  • 35:59<v ->Yeah.</v>
  • 36:00<v Donna>That's the main thing.</v>
  • 36:01If it's a Wald test statistic,
  • 36:04then we use the null distribution,
  • 36:07which you can't do (indistinct) have
  • 36:09different statistical (drowned out).
  • 36:09<v Vin>Yeah, I see, yeah.</v>
  • 36:13<v ->So I think this is a good way</v>
  • 36:14that we got started as we're working on...
  • 36:17We first wanna evaluate we got the test statistic correct
  • 36:19before we blow through all this.
  • 36:22<v Donna>The other thing is that the robust standard errors</v>
  • 36:24are problematic in smaller samples, too.
  • 36:27And there are all these different fixes to it.
  • 36:29So I don't know if the test statistic
  • 36:31would also have that problem.
  • 36:33<v ->Yeah, potentially.</v> <v ->We've mostly seen it</v>
  • 36:35about confidence intervals.
  • 36:36Have you seen it about test statistics?
  • 36:39<v ->Yeah.</v> <v ->With the robust</v>
  • 36:40standardized- <v ->The same thing (indistinct).</v>
  • 36:42They would agree,
  • 36:43because we're always talking about,
  • 36:45assuming normality, the variance doesn't change
  • 36:48across the hypothesis (indistinct) space.
  • 36:53But then, CI here,
  • 36:54you're referring to the CI of the impact (indistinct).
  • 36:57<v ->Correct, yeah.</v>
  • 36:58<v Vin>And that's already accounting for the covariance.</v>
  • 37:01The two potential outcome estimates.
  • 37:05So if normality holds, they would agree.
  • 37:09If you can derive the normality of the estimator,
  • 37:13then the CI I think (indistinct).
  • 37:15<v ->Yeah, so we have the normality of the estimator already,</v>
  • 37:17and then in a couple slides,
  • 37:18I'll show what we have for the test statistic.
  • 37:20And I have some preliminary results showing
  • 37:22that it looks approximately normal,
  • 37:23but I don't think it's quite ready for prime time (laughs).
  • 37:29<v Donna>So then that error is reliant</v>
  • 37:30on M estimation, right?
  • 37:31<v ->Correct.</v>
  • 37:32Yep. (drowned out)
  • 37:33Yeah, and that's the AOS paper.
  • 37:35All the M estimations worked out for this.
  • 37:38The IPW-2, for example.
  • 37:39<v ->Right.</v> <v ->Yep.</v>
  • 37:44In our first results, we actually had a,
  • 37:49this is a smaller, yep, smaller effect size.
  • 37:50The effect size is -0.1,
  • 37:51and this is on the difference scale.
  • 37:53So the smaller effect size,
  • 37:55the power was actually surprisingly low.
  • 37:58Even as we increased the number of components,
  • 38:00it didn't even reach 40%.
  • 38:02Although, the coverage of the estimator was approximately
  • 38:05where we'd expect it to be performing.
  • 38:08So the next thing we looked at was changing the effect size,
  • 38:12making the effect size,
  • 38:13in this case, actually making it larger
  • 38:15and seeing how that impacts the power.
  • 38:19So we basically picked...
  • 38:22There's the supplemental slide if anyone has questions,
  • 38:24but we have the original effect size,
  • 38:27the largest effect size that we could obtain
  • 38:29in this particular simulation setting,
  • 38:32and then something in between.
  • 38:34So we see as we increase the effect size
  • 38:36that the largest effect size is -0.42.
  • 38:40That actually achieves 80% power.
  • 38:42Excuse me.
  • 38:43A little bit, actually, it's right around 20 components.
  • 38:47But then as we see, as the effect size gets smaller,
  • 38:50it's harder for it to achieve that 80% power level.
  • 38:56So I thought that was kinda interesting.
  • 38:58And then approach two.
  • 39:01We wanted to see changing the number of components
  • 39:05for a fixed number of nodes.
  • 39:07So here, we fixed a hundred, 300, 600, or a thousand nodes,
  • 39:12and we see it doesn't really matter so much
  • 39:14how many components are in the problem,
  • 39:15which was a little bit surprising to me.
  • 39:17So this is preliminary results.
  • 39:19I'm not sure if this is gonna hold up as we keep
  • 39:21pulling on the threads here, just as a disclaimer.
  • 39:25But we see that with a hundred nodes,
  • 39:30it doesn't achieve the appropriate power.
  • 39:34Once we get up to 300 nodes
  • 39:39and a thousand, sorry, 600 nodes,
  • 39:41and then a thousand nodes,
  • 39:42we see it's at 80% power or higher.
  • 39:46<v Donna>So just to say cluster randomized designs,</v>
  • 39:50in certain structures, you can find that no matter how much,
  • 39:54like if you say the components are like the clusters,
  • 39:58and then the nodes are like
  • 39:59the number of people in that cluster,
  • 40:00you can have a situation where,
  • 40:02for a fixed number of components,
  • 40:04no matter how many people you put into each component,
  • 40:11you have an asymptote.
  • 40:12Never get to the power you want.
  • 40:14The only way to get to it is by increasing components.
  • 40:18But you're finding an asymptote with components.
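The asymptote mentioned here for cluster randomized designs follows from the standard design effect, 1 + (m - 1) × ICC, for clusters of equal size m. A small sketch, with a hypothetical function name and an ICC value chosen only for illustration:

```python
def effective_sample_size(num_clusters, cluster_size, icc):
    """Effective sample size under the design effect 1 + (m - 1) * ICC."""
    total = num_clusters * cluster_size
    return total / (1.0 + (cluster_size - 1) * icc)


# With 10 clusters and ICC = 0.05, adding people per cluster approaches
# but never exceeds num_clusters / icc = 200 effective observations.
for m in (10, 100, 1000, 100000):
    print(m, round(effective_sample_size(10, m, 0.05), 1))
```

When the ICC is zero the design effect is one and the effective sample size equals the total number of individuals, which is the noncluster case raised later in the discussion.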
  • 40:22<v ->Yeah, but here this is the number of people overall</v>
  • 40:26in the whole study, not per component.
  • 40:29So this was a little bit surprising
  • 40:31that it seems to be a bigger driver
  • 40:34is just the number of people enrolled in the network
  • 40:36regardless of the number of components.
  • 40:39<v Donna>So you fixed the total number of units,</v>
  • 40:41and essentially you have them divided
  • 40:45into different numbers of components.
  • 40:47<v ->Yep.</v>
  • 40:48<v Donna>And you're seeing that it doesn't change how many</v>
  • 40:50components (indistinct). <v ->Yeah,</v>
  • 40:51which I also acknowledge that's an artificial thing
  • 40:53that probably would never happen in the real world, right?
  • 40:56Because say we enroll 600 people,
  • 40:59we can't force them into different sets
  • 41:02of partners to get the statistics to work.
  • 41:04So this is a very theoretical thought exercise.
  • 41:08<v Vin>I also wonder if it's a function</v>
  • 41:10of the residual correlation you were specifying
  • 41:12in the simulation study.
  • 41:13<v ->The random effect?</v>
  • 41:15<v Donna>Yeah.</v>
  • 41:17<v ->Interesting.</v> <v ->'Cause that'll definitely</v>
  • 41:19affect the effective sample size, right?
  • 41:20<v ->Mn-hm.</v> <v ->Yeah.</v>
  • 41:21<v Vin>So maybe it's relatively small</v>
  • 41:23and doesn't really matter in this simulation,
  • 41:25and that could be-
  • 41:26<v ->Oh, so if we-</v> <v ->a possibility.</v>
  • 41:27<v ->If we increase the amount of correlation in the component,</v>
  • 41:30this story could be very different.
  • 41:32<v Donna>It might but might not.</v>
  • 41:33So that's something to check maybe.
  • 41:35<v ->Yep.</v>
  • 41:36That's why, yeah, another disclaimer.
  • 41:38This is very preliminary.
  • 41:39And I think even at the end I remind us
  • 41:41that needs more investigation.
  • 41:43<v Vin>Right, but it's cool,</v>
  • 41:44because I guess the cluster randomized design
  • 41:46is sort of a limiting design in some sense.
  • 41:49They probably would not have
  • 41:50the same outputting (indistinct) anyways.
  • 41:54That's good to-
  • 41:55<v Colleague>What's the minimum number</v>
  • 41:56of components you could use?
  • 42:01<v ->Looking at the dots, it looks like she went</v>
  • 42:02all the way down to maybe about two,
  • 42:05but it depends on, looks like there's a...
  • 42:07Depending on which number of nodes you have,
  • 42:10she looks at different numbers of components,
  • 42:12because when Ke generated it, it's from here.
  • 42:19Yeah, the cluster size is the number of nodes
  • 42:21divided by the number of components.
  • 42:24<v Colleague>So I'm wondering, with these few components</v>
  • 42:27(indistinct) specified?
  • 42:31<v ->Yeah, we should.</v>
  • 42:33Based on other results, it should be.
  • 42:35We start to see good coverage around 50 components.
  • 42:39<v Colleague>That's what I see.</v>
  • 42:40<v ->Yeah.</v>
  • 42:41<v Donna>But I think it would depend</v>
  • 42:42on if the cluster randomized designs
  • 42:43or anything like this would also depend on the ICC.
  • 42:47Because if that ICC is zero,
  • 42:50then you could have one component (indistinct)
  • 42:52is equivalent to, again, a noncluster design.
  • 42:56<v ->Yeah.</v> <v ->Yep.</v>
  • 43:02Okay, so here's the preliminary results
  • 43:03for the Wald test statistic.
  • 43:05So I changed the notation a little bit here
  • 43:08just to make this easier to read.
  • 43:09So to be explicit, the estimator is this theta hat.
  • 43:12Based on the AOS paper, we have that this will converge
  • 43:15in distribution to a multivariate normal.
  • 43:17And then we actually have an estimator
  • 43:20of the variance in that paper, as well.
  • 43:27Yeah, and then building a Wald test statistic
  • 43:30from that parameter, we have a form that looks like this.
  • 43:34And then actually in the AOS paper,
  • 43:35just a minor note is the normalizing constant
  • 43:38of one over M is tucked into the sigma term.
  • 43:42I had to go back and double check that yesterday.
  • 43:44So then we have a Wald test statistic
  • 43:46that's a form like this.
  • 43:47It should follow a normal distribution.
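A hedged sketch of what such a Wald statistic could look like, assuming the spillover effect is the difference between two estimated potential-outcome means and that the supplied 2×2 covariance matrix already includes the 1/M normalizing constant, as noted above. The function name, argument layout, and example numbers are hypothetical; the exact form on the slide is not reproduced in the transcript.

```python
import math


def wald_statistic(theta_hat, cov_hat, null_value=0.0):
    """Wald z-statistic and two-sided p-value for theta_hat[0] - theta_hat[1].

    theta_hat: (estimate under one coverage, estimate under the other).
    cov_hat: 2x2 covariance matrix of theta_hat, with 1/M already folded in.
    """
    contrast = theta_hat[0] - theta_hat[1]
    # Variance of a difference accounts for the covariance between estimates.
    variance = cov_hat[0][0] + cov_hat[1][1] - 2.0 * cov_hat[0][1]
    w = (contrast - null_value) / math.sqrt(variance)
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|w|)).
    p_value = math.erfc(abs(w) / math.sqrt(2.0))
    return w, p_value


w, p = wald_statistic((0.3, 0.5), [[0.004, 0.001], [0.001, 0.004]])
print(w, p)
```

Rejecting when the absolute value of the statistic exceeds 1.96 matches the 95% interval for the same contrast when the same variance estimate is used in both.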
  • 43:54So then we started looking at this
  • 43:56empirically across the simulations.
  • 43:59And this looks, to my eye, to be approximately normal.
  • 44:02And what we're working on now,
  • 44:04the results aren't quite ready,
  • 44:05is actually doing a test for normality
  • 44:08like a Kolmogorov-Smirnov test
  • 44:10to test for normality across these different scenarios.
  • 44:14So we're working on those results now,
  • 44:16and that's something we wanted to confirm
  • 44:18before we fold it into the rest of the simulations.
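As an illustration of the normality check being described, here is a minimal one-sample Kolmogorov-Smirnov statistic against the standard normal, written with the standard library only. It assumes the simulated Wald statistics are compared with N(0, 1) directly; the function names are hypothetical, and only the distance is computed, not a p-value.

```python
import math


def standard_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ks_statistic(values):
    """Largest gap between the empirical CDF and the N(0, 1) CDF."""
    ordered = sorted(values)
    n = len(ordered)
    distance = 0.0
    for i, x in enumerate(ordered, start=1):
        cdf = standard_normal_cdf(x)
        # Check the gap just after and just before each jump of the empirical CDF.
        distance = max(distance, i / n - cdf, cdf - (i - 1) / n)
    return distance
```

A small distance is consistent with normality but does not establish it, which is relevant to the point about low power raised in the discussion that follows.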
  • 44:23<v Donna>That test has very low power (indistinct).</v>
  • 44:25<v ->Low power?</v>
  • 44:27Yeah, and then there's other tests too,
  • 44:29but some of 'em are-
  • 44:30<v Donna>I think they all have low power.</v>
  • 44:32<v ->Yeah.</v>
  • 44:34So if anyone has any other thoughts about that,
  • 44:36about how to evaluate.
  • 44:37Like we derived this, but how do we-
  • 44:40<v Donna>In some sense, your simulations will tell you,</v>
  • 44:42because the property's relying
  • 44:45on that (indistinct) normality.
  • 44:47And so if you don't have 5% type one error,
  • 44:51and then you know (indistinct),
  • 44:53you now have...
  • 44:55I guess that would be the main thing
  • 44:56would be 5% type one error.
  • 45:02<v Vin>I think maybe another way to visualize</v>
  • 45:04that is to try to increase the M,
  • 45:09and then actually gradually see if that looks more normal.
  • 45:12I guess that's just-
  • 45:13<v ->Yep.</v>
  • 45:15<v Vin>And I think people tend to do something like that.</v>
  • 45:18When they check convergence rate,
  • 45:20they would probably do something like plot
  • 45:23the results along with the sample size
  • 45:25and see how well they converge.
  • 45:28And then the limiting end would correspond
  • 45:29to the perfect results,
  • 45:31and then you'll see more of a bell curve shape.
  • 45:33But I think right now, looking at these 10 iterations,
  • 45:36it's a little spiky sometimes.
  • 45:37<v ->Yeah, and it doesn't seem...</v>
  • 45:38Like this one down in the far corner
  • 45:40is already a hundred components,
  • 45:41and it doesn't really seem like it's getting too much...
  • 45:45I mean, these are at least, yeah.
  • 45:48There's not a trend of constant-
  • 45:49<v Vin>(drowned out) specified model, right?</v>
  • 45:51It's definitely correctly specified
  • 45:54propensity score models and everything-
  • 45:55<v ->Should be, but we can double check.</v>
  • 45:57<v Vin>So the simulation models</v>
  • 45:59are basically identical to the models (drowned out).
  • 46:01<v ->Yep.</v>
  • 46:04<v Donna>But the spiking,</v>
  • 46:05this also just depends arbitrarily on the event size?
  • 46:08<v Vin>Yeah, that's right.</v>
  • 46:09<v Donna>So you could make it look very spiky</v>
  • 46:11if you have bigger events.
  • 46:13<v Vin>Right, and (indistinct)</v>
  • 46:14you could even Q-Q plot events sometimes.
  • 46:15<v ->Yeah.</v> <v ->Yep.</v>
  • 46:17(drowned out)
  • 46:18Vin says Q-Q plot.
  • 46:21(Donna laughs)
  • 46:22(indistinct)
  • 46:23Okay.
  • 46:25So that's the direction where we're heading in with this.
  • 46:28From simulation two, we also have some preliminary results.
  • 46:31So this is fixing the number of components
  • 46:33and varying the number of nodes.
  • 46:36In here, we see power increases with the number of nodes,
  • 46:41but we don't see any variation
  • 46:44between the number of components.
  • 46:46So the power is plotted against the number of nodes
  • 46:48and each line represents a different number of components,
  • 46:52which I think kind of echoes the other results
  • 46:54that we were seeing earlier in the talk.
  • 47:01<v Donna>That's the opposite of cluster randomized trials,</v>
  • 47:04'cause you're getting a lot of power by increasing nodes,
  • 47:08and you're barely seeing any impact of components.
  • 47:10Whereas with cluster randomized trials,
  • 47:12it's all in the clusters,
  • 47:14and it doesn't matter that much after a relatively
  • 47:17small number of people within cluster.
  • 47:19<v Vin>Right.</v>
  • 47:20<v ->Which this is still very surprising to me,</v>
  • 47:22because the M estimation,
  • 47:23the effective sample size is the number of components.
  • 47:26So yeah, this is pretty surprising.
  • 47:29<v Vin>(indistinct) interested to really check</v>
  • 47:30how that changes or not changes with the-
  • 47:32<v ->The ICC?</v> <v ->Yeah.</v>
  • 47:34<v ->Change the. (drowned out)</v>
  • 47:36Yes. <v ->Yeah.</v>
  • 47:40<v Donna>What is the outcome?</v>
  • 47:41Like sort of this idea in this simulation,
  • 47:43what were you thinking of?
  • 47:45Is it a binary or a continuous?
  • 47:47<v ->Binary HIV risk behavior.</v>
  • 47:51So yeah, whether the person reports,
  • 47:52specifically injection risk behavior.
  • 47:57And then the intervention,
  • 47:58all the effects that we're looking at are negative,
  • 48:00because the intervention should be reducing the behavior.
  • 48:06<v Donna>Yeah, so with an ICC of 0.5 times 1 minus 0.5,</v>
  • 48:10that's the maximum amount of binomial variance.
  • 48:13So this should be...
  • 48:14The simulation is done under a very high ICC.
  • 48:20Like it might be the highest possible with binary-
  • 48:22<v ->For that binary data, yep.</v>
  • 48:27Okay, so zooming out a little bit,
  • 48:29thinking about network study design in practice,
  • 48:33some of the things that might come out of this work.
  • 48:35So there are definitely features that can be planned
  • 48:37when designing the study, right?
  • 48:39So we could increase the number of components
  • 48:41by having multiple sites or multiple cities
  • 48:44contributing to one particular study.
  • 48:49Although, that's, you know, can be very costly,
  • 48:50very time consuming.
  • 48:52We can, of course, increase more individuals recruited,
  • 48:55but that depends on who,
  • 48:57'cause it's a network study, who are their contacts,
  • 48:59if they don't have contacts
  • 49:00to kind of come to an end in the network.
  • 49:03We can try to ensure distance between components some way.
  • 49:06And I put distance in quotes,
  • 49:08'cause that could mean all sorts of things,
  • 49:10not just geographical distance.
  • 49:12And then we have some control
  • 49:13over the intervention treatment.
  • 49:15What proportion do we want to expose to the intervention?
  • 49:19And then I was thinking about features
  • 49:20that likely cannot be planned,
  • 49:22'cause maybe someone's really creative.
  • 49:24And we could think about ways
  • 49:25that these could be manipulated.
  • 49:28So once we have a given set of individuals,
  • 49:31pretty sure we can't force them into different components,
  • 49:34unless we're doing, actually now that's coming to my mind,
  • 49:37unless we're doing a network intervention
  • 49:39that's meant to change the edges.
  • 49:41Then, we would have some control
  • 49:42over who's interacting with whom,
  • 49:45but that's a little bit complicated,
  • 49:46because then your structure is intertwined
  • 49:48with your intervention.
  • 49:51The features of the network like degree,
  • 49:53centrality, intracluster correlation,
  • 49:56we don't have control over those.
  • 49:58Who's connected to whom:
  • 49:59these are individual sexual and drug partnerships.
  • 50:02We don't have control over that.
  • 50:04What the effect sizes are
  • 50:05or what the outcome prevalence is in the particular study.
  • 50:08<v Donna>Well, you can choose your study population,</v>
  • 50:12though, to have certain of these characteristics.
  • 50:15You can't change them.
  • 50:17Let's say you could do a study
  • 50:18of 10 different kind of places, communities,
  • 50:21and some might be more-
  • 50:23<v ->Different outcome prevalences or-</v>
  • 50:25<v Donna>Yeah, or different degrees of centrality,</v>
  • 50:27and they could have different ICCs and all of that.
  • 50:31So if people know what's important,
  • 50:33they could look for study populations that have the features
  • 50:37that will maximize power of the study.
  • 50:39<v ->Yep, that's a good point.</v>
  • 50:42That's why I said likely,
  • 50:43'cause I knew Donna would think of something.
  • 50:45(laughs)
  • 50:47(indistinct)
  • 50:51<v Colleague>What about the propensity score?</v>
  • 50:52You also don't have control.
  • 50:57<v ->Yeah, I mean that's the...</v>
  • 50:59It was non-randomized intervention.
  • 51:01So it's what the folks are choosing or being exposed to
  • 51:05and then just their observed covariates.
  • 51:08<v Donna>Oh, there's one way, just randomize them.</v>
  • 51:12(drowned out) (laughing)
  • 51:13In epidemiology, we always talk about this
  • 51:15as one of the ways to control confounding,
  • 51:18which is to choose a homogeneous population
  • 51:20so you have no variation in the risk factors,
  • 51:24and that lowers the amount of confounding.
  • 51:28You might lose the ability to externally generalize,
  • 51:31but you'll reduce confounding.
  • 51:38<v ->Yeah, so I think there's a lot of thinking</v>
  • 51:41and papers that need to be written for design in networks.
  • 51:45I mean, I think in designing trials
  • 51:47and designing cluster randomized trials,
  • 51:48even thinking about observational studies,
  • 51:51I think it's clear to me how you have
  • 51:53more control over certain things.
  • 51:55But then here, I think there's a lot of work
  • 51:58to think about how do we take...
  • 52:01It's just in the beginning
  • 52:02with some of these statistical results,
  • 52:04but how do we take these statistical findings
  • 52:06and translate them into something that folks
  • 52:08can actually use in study designs,
  • 52:10grant proposals for network-based studies in public health.
  • 52:14So I think that's a call to action
  • 52:15to some of the folks in the room and on Zoom.
  • 52:21So just some highlights from what we found so far.
  • 52:23So the power for estimating spillover effects
  • 52:25increases with more nodes or larger effect sizes.
  • 52:30It requires, of course, more investigation
  • 52:32like we've been discussing today.
  • 52:34There's some things we need to look into,
  • 52:35but the number of components may have less impact on power,
  • 52:39but that requires looking at some additional features.
  • 52:42When the effect size is large enough,
  • 52:44the spillover effect has reasonable power.
  • 52:46And then in the initial setting,
  • 52:47that was even with only 20 components.
  • 52:50And then just as a sanity check,
  • 52:52we saw the empirical coverage probability
  • 52:55was around the nominal level
  • 52:57as we would expect from our earlier paper.
  • 53:01So future directions.
  • 53:02We wanna keep looking at the impact
  • 53:04of other design parameters on the power,
  • 53:07continue working with this test statistic
  • 53:10and making sure it's performing as we expect,
  • 53:13and then using it in the simulation study
  • 53:15and working on getting a minimal detectable effect size,
  • 53:18as well as number of individuals
  • 53:21and/or components required for adequate power.
  • 53:24And if we can find closed forms,
  • 53:26we'll have those expressions.
  • 53:27If not, we'll have some simulation-based programs
  • 53:29to look at this.
  • 53:30And then we want to...
  • 53:32We've done some kind of back-of-the-envelope things
  • 53:34in thinking about the power that we might have had
  • 53:36in TRIP to detect these effects
  • 53:38but doing that more carefully and formally.
  • 53:42And then last was sort of the issue
  • 53:43I was talking about at the end
  • 53:44is all of these statistical results are really interesting
  • 53:48and exciting for folks like us,
  • 53:50but then how do we make it practical
  • 53:52and useful and something that individuals
  • 53:56can use in their grant writing
  • 53:57when getting their network based studies funded.
  • 54:01Okay, and then this is my shameless plug.
  • 54:03If you thought this talk was interesting,
  • 54:06we're going to have an online workshop hosted by my group
  • 54:10at URI on Friday, March 10th from 2:00 to 5:00.
  • 54:13It's free and we have a star-studded lineup
  • 54:17of speakers that'll be joining for the workshop.
  • 54:20And I have some flyers,
  • 54:22and I can email the flyer around, as well.
  • 54:24<v Donna>We can circulate everything.</v>
  • 54:26(indistinct) also.
  • 54:28<v ->Yeah, that'd be great.</v>
  • 54:30Yeah, so welcome everyone on the call,
  • 54:31everyone in the room to join,
  • 54:34and I think it'll be a really informative
  • 54:36and interesting afternoon.
  • 54:38And if you're interested in this methods area,
  • 54:40it'd be a nice way to get caught up
  • 54:42on some of the literature
  • 54:44and start thinking about how you can use this
  • 54:46in some of your work.
  • 54:49So just a couple of references, as well.
  • 54:53And I know I've been taking questions as we go along,
  • 54:55but if there's any other questions from the audience,
  • 54:58happy to discuss.
  • 55:02<v Vin>So it's interesting to see that the component size</v>
  • 55:04doesn't have a very strong effect on the power,
  • 55:07but do you think in reality we need to also consider
  • 55:12variability in that component size?
  • 55:14'Cause we always see a huge component.
  • 55:16<v ->Yep, that's a really good point.</v>
  • 55:18<v Vin>But there are a lot of very small components.</v>
  • 55:20<v ->Yep. (drowned out)</v>
  • 55:22Yep, great point.
  • 55:24And particularly in these HIV risk networks,
  • 55:28I mean, it's not like there's hundreds of them,
  • 55:29but the handful that we have and we've been able to look at,
  • 55:33there is a lot of variability.
  • 55:34We have, usually, there's one giant connected component
  • 55:37and then these smaller components.
  • 55:39And of course, whether or not that's the real network,
  • 55:42that's some of Laura's work, right?
  • 55:45These smaller components may actually even be connected
  • 55:47to the larger component,
  • 55:48or they might be connected to each other, as well.
  • 55:50But in this work, we assume that the network we observe
  • 55:54is the true or known network for now
  • 55:58just so we can look at some of these other issues.
  • 56:00But of course, there's always the caveat
  • 56:01that the network itself is mismeasured.
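A minimal sketch of how one might summarize the component-size variability discussed above, assuming the observed network is available as an edge list. The function names and the toy network are hypothetical illustrations, not material from the talk, and the sketch treats the observed network as correct, as the speaker does.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev


def component_sizes(n_nodes, edges):
    """Return connected-component sizes, largest first, via breadth-first search."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = set()
    sizes = []
    for start in range(n_nodes):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        sizes.append(size)
    return sorted(sizes, reverse=True)


def size_cv(sizes):
    """Coefficient of variation (population SD / mean) of component sizes."""
    return pstdev(sizes) / mean(sizes)


# Hypothetical toy network: one larger component (nodes 0-5),
# two pairs (6-7, 8-9), and one isolated node (10).
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (6, 7), (8, 9)]
sizes = component_sizes(11, toy_edges)
print("component sizes:", sizes)
print("coefficient of variation:", round(size_cv(sizes), 3))
```

A summary like this could be reported alongside a power calculation so that readers can see how far the observed components depart from equal sizes.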
  • 56:04<v ->'Cause they-</v> <v ->Ashley, there's a bunch</v>
  • 56:05of things up in the chat maybe.
  • 56:07Just to give other people a chance.
  • 56:09<v ->Sure.</v> <v ->Some of it might have to do</v>
  • 56:11with the beginning when we were having technical problems,
  • 56:13but it might have some questions.
  • 56:16<v ->"See you have some technical problems.</v>
  • 56:17Slide's not moving."
  • 56:18Oh, and then thanks, Gabby.
  • 56:20Gabby's part of the URI team.
  • 56:22She put in a link to register for the workshop.
  • 56:25We actually,
  • 56:27we just have a couple survey questions as you register,
  • 56:29because what we wanna do is try to tailor
  • 56:30the content to the folks that are showing up.
  • 56:34So there's just a couple of quick questions,
  • 56:35and then that's all you have to do.
  • 56:37It's free (laughs).
  • 56:39Just answer a little survey.
  • 56:40And then Gabby put a link for some more details
  • 56:44about the workshop, as well.
  • 56:48(indistinct)
  • 56:48<v Donna>This has gotta be our last question,</v>
  • 56:50'cause we're down to 12.
  • 56:51<v Vin>Yeah, just a short comment.</v>
  • 56:52I think there's a potential to make this work more impactful
  • 56:55is that it doesn't have to be attached to IPW-2 I think,
  • 57:00because you're providing a simulation framework.
  • 57:01And theoretically, one can fit other IPW estimators,
  • 57:05stratified estimators, regression-based estimators,
  • 57:07and even doubly robust estimators.
  • 57:09And I would also imagine that they could have
  • 57:12different operating characteristics,
  • 57:14and so the impact of M and N could also,
  • 57:19that could also be specific
  • 57:21to not only the simulation parameters we choose,
  • 57:26but also to the estimators we choose.
  • 57:28I think it's an underappreciated point,
  • 57:31but it's very important to emphasize is that the power
  • 57:35we calculate is always gonna be based on the approach.
  • 57:38<v Donna>It's true that it's underappreciated.</v>
  • 57:40Surprisingly, right?
  • 57:41<v Vin>Yeah, like you could say I use the approach</v>
  • 57:43to consider IPW-2 based power,
  • 57:46but I think a regression-based approach
  • 57:48in terms of power would be different.
  • 57:50It's actually very specific, too.
  • 57:51And also, the curves you show could have some difference.
  • 57:55<v ->Yeah, that's interesting.</v>
  • 57:56So we can start, 'cause we have IPW-1, IPW-2 ready to go.
  • 57:59So we could start, for this work, we could look at that.
  • 58:02But I think maybe an idea would be to write the code.
  • 58:05Like if we have our programs that we're gonna share for this
  • 58:07to write it flexible enough so that the user-
  • 58:10<v Vin>That's something people should be able to choose.</v>
  • 58:11Or even if you have an estimator-specific program,
  • 58:14that should be sort of emphasized and clarified.
  • 58:17'Cause as a very simple example, if you are,
  • 58:20like in the cluster (indistinct) literature,
  • 58:22if you're assuming working independence
  • 58:24and working exchangeable,
  • 58:25the results can be very different
  • 58:26in terms of the efficiency.
  • 58:28And the extent to which the cluster size variation
  • 58:33impacts the study power is also specific to whether you adopt
  • 58:37an independence working correlation or an exchangeable one.
  • 58:41So sometimes, we have a unified conclusion,
  • 58:45but that's almost always coming from a specific estimator
  • 58:51and cannot really be overly generalized.
  • 58:53<v ->Yep, yeah, that's a great point.</v>
  • 58:54Thanks, Vin.
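The suggestion above, a simulation framework in which the user chooses the estimator, might look roughly like the sketch below. It is a simplified illustration under stated assumptions, not the speaker's program: treatment is assigned by alternating components rather than being non-randomized, no IPW-1 or IPW-2 weighting is implemented, and every name (`simulate_components`, `pooled_diff`, `component_mean_diff`, `simulated_power`) is hypothetical. The critical value is calibrated by simulating under a null effect, so any function that returns a point estimate can be plugged in.

```python
import random
from statistics import mean


def simulate_components(rng, n_components, mean_size, size_spread, effect, icc_sd=0.5):
    """Simulate outcomes within components; returns a list of (treated, outcomes).

    Assumptions for illustration only: treatment alternates by component,
    sizes vary uniformly around mean_size, and a shared Gaussian
    component-level term induces within-component correlation.
    """
    data = []
    for i in range(n_components):
        size = max(1, mean_size + rng.randint(-size_spread, size_spread))
        treated = i % 2
        shared = rng.gauss(0.0, icc_sd)
        outcomes = [effect * treated + shared + rng.gauss(0.0, 1.0) for _ in range(size)]
        data.append((treated, outcomes))
    return data


def pooled_diff(data):
    """Individual-level difference in means (components weighted by size)."""
    treated = [y for a, ys in data if a == 1 for y in ys]
    control = [y for a, ys in data if a == 0 for y in ys]
    return mean(treated) - mean(control)


def component_mean_diff(data):
    """Difference in average component means (components weighted equally)."""
    treated = [mean(ys) for a, ys in data if a == 1]
    control = [mean(ys) for a, ys in data if a == 0]
    return mean(treated) - mean(control)


def simulated_power(estimator, effect, n_sims=500, alpha=0.05, seed=1, **design):
    """Monte Carlo power for any estimator that returns a point estimate."""
    rng = random.Random(seed)
    # Null distribution of |estimate| gives an estimator-specific critical value.
    null = sorted(
        abs(estimator(simulate_components(rng, effect=0.0, **design)))
        for _ in range(n_sims)
    )
    crit = null[min(n_sims - 1, int((1 - alpha) * n_sims))]
    rejections = sum(
        abs(estimator(simulate_components(rng, effect=effect, **design))) > crit
        for _ in range(n_sims)
    )
    return rejections / n_sims


design = dict(n_components=40, mean_size=10, size_spread=8)
for est in (pooled_diff, component_mean_diff):
    print(est.__name__, simulated_power(est, effect=0.4, n_sims=200, **design))
```

Because the estimator is just an argument, the same design can be rerun with each candidate estimator, which makes explicit that the resulting power curve belongs to that estimator rather than to the design alone.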
  • 58:56<v ->Well, this was a really interesting seminar, Ashley.</v>
  • 58:59<v ->Thank you.</v> <v ->You presented it</v>
  • 59:00very clearly,
  • 59:01so we really appreciate it.
  • 59:03Thank you so much and thanks to everybody else (indistinct).
  • 59:06<v ->Thank you, thanks, everyone.</v>
  • 59:13<v Donna>So go ahead and close the Zoom.</v>
  • 59:14<v ->Sure, yeah, thanks, everyone, for joining.</v>
  • 59:16We hope to see you at the online workshop.