
Power and Sample Size Calculations for Evaluating Spillover Effects in Networks with Non-randomized Interventions

July 07, 2023
  • 00:59<v Vin>Donna's looking over it.</v>
  • 01:00I'll just start.
  • 01:04So can you hear us okay online?
  • 01:08<v Donna>Yeah, if you want you can go to the podium.</v>
  • 01:10<v ->Yeah.</v> <v ->Okay.</v>
  • 01:11'Cause this is last-minute,
  • 01:13so I need to just get your bio (indistinct).
  • 01:15(laughing)
  • 01:18So, hi, everyone.
  • 01:20It's my pleasure to welcome Dr. Ashley Buchanan
  • 01:24today as our speaker in this seminar series.
  • 01:27And Dr. Buchanan is associate professor of biostatistics
  • 01:31in the Department of Pharmacy Practice
  • 01:34at the University of Rhode Island
  • 01:35and also an adjunct in Brown University Biostatistics.
  • 01:40Hi, Donna. <v ->Hi.</v>
  • 01:41<v Vin>And she specializes in the area</v>
  • 01:43of epidemiology and causal inference.
  • 01:45And she has a lot of experience
  • 01:47collaborating on HIV/AIDS research,
  • 01:49working closely with colleagues both domestically
  • 01:52and internationally to develop
  • 01:54and apply causal methods to improve treatment
  • 01:57and prevention of HIV and AIDS.
  • 01:59And without further ado,
  • 02:02I'll give the floor to you, Ashley.
  • 02:04(indistinct)
  • 02:06<v ->Thanks, Vin, for that nice introduction.</v>
  • 02:09And thanks for the invitation, Donna,
  • 02:10to speak at (indistinct) today.
  • 02:12It's nice to be here in person with folks
  • 02:15that I normally just see on Zoom.
  • 02:17So great to be here.
  • 02:18And welcome to all the folks on Zoom, as well.
  • 02:22Just get my slide.
  • 02:26I see that the slides are already sharing.
  • 02:29Let's do the slideshow.
  • 02:32Oops.
  • 02:34<v Vin>It's the lower right.</v>
  • 02:36It's a little-
  • 02:37<v ->Is this gonna work? (drowned out)</v>
  • 02:41Can the folks on Zoom still see the slides?
  • 02:44<v Vin>You did share, right?</v>
  • 02:45<v ->Yeah, I think it's sharing.</v>
  • 02:47<v Gabrielle>We see a full screen.</v>
  • 02:48(indistinct)
  • 02:50<v ->Okay, great.</v> <v ->Perfect.</v>
  • 02:53<v ->Okay, so today, I'm gonna be presenting work</v>
  • 02:56about study design, power,
  • 02:57and sample size calculation for evaluating
  • 03:00spillover in networks in the context
  • 03:02of interventions that are not randomized.
  • 03:05This is definitely work in progress, ongoing work.
  • 03:08So we have some initial simulation results
  • 03:12and some promising findings
  • 03:13and then a lot of open questions
  • 03:15that I'd love to have some discussion about towards the end,
  • 03:19sort of about where the practical world
  • 03:21meets the statistical world,
  • 03:23and how can we bring these ideas into practice
  • 03:26for designing these network type studies.
  • 03:30I'd like to start off with acknowledgements.
  • 03:32So Ke Zhang is a graduate student at URI,
  • 03:36and she's been primarily leading
  • 03:38a lot of the simulation work.
  • 03:39She's been a key individual in this work.
  • 03:43We also have collaborators, Doctors Katenka, Wu, and Lee.
  • 03:46And then I also wanna thank a larger list of collaborators
  • 03:49that have been part of this ongoing work with Avenir,
  • 03:52including Drs. Lee, Forastiere,
  • 03:54Halloran, Friedman, and Nikolopoulos.
  • 03:58And then just to acknowledge our funding support
  • 04:00and funding support that collected the motivating data set.
  • 04:07So an outline for today,
  • 04:10I'm gonna give a little bit of background
  • 04:11and talk about the motivating study of TRIP,
  • 04:13talk about the objectives of this particular work.
  • 04:16And then we'll look at some of the simulation results
  • 04:19and then discuss conclusions and future directions.
  • 04:24So this work is focused on people who inject drugs,
  • 04:28and these individuals are at risk for HIV
  • 04:31due to drug use, sharing equipment,
  • 04:34and sexual risk behaviors.
  • 04:36In addition, these individuals are often part of networks.
  • 04:39So when they receive an intervention,
  • 04:42the intervention can benefit not only them
  • 04:44but their partners and possibly even beyond that.
  • 04:47So in these networks, interventions often have
  • 04:50what's known as spillover effects,
  • 04:51sometimes called the indirect effect
  • 04:53in interference literature.
  • 04:55So spillover,
  • 04:58historically in the causal inference literature,
  • 05:00it's been called interference.
  • 05:02Here I'll be calling it spillover.
  • 05:04So that's when one individual's exposure
  • 05:06affects another's outcome.
  • 05:09And recently, there's been several papers
  • 05:12that have been looking at how do we assess
  • 05:14these spillover effects in network studies.
  • 05:21So our motivating study
  • 05:22is the Transmission Reduction Intervention Project.
  • 05:25This was a network-based study of injection drug users
  • 05:27and their contacts in Athens, Greece, 2013 to 2015.
  • 05:32And the individuals were connected
  • 05:35through sexual and drug use partnerships.
  • 05:37The original study was focused on using
  • 05:40this new network tracing technique
  • 05:42to find recently infected individuals
  • 05:44and get them on treatment.
  • 05:46So the idea is when individuals are acutely infected,
  • 05:49they're more likely to transmit.
  • 05:51So if we can find more
  • 05:52of these recently infected individuals,
  • 05:54get them on treatment,
  • 05:55they'll be less likely to infect their partners.
  • 05:58And the punchline from the main study
  • 05:59was this was very successful in finding
  • 06:01more recently infected individuals.
  • 06:05<v Ke>Excuse me.</v>
  • 06:06<v Ashley>What?</v>
  • 06:07<v Ke>I'm so sorry for bothering you,</v>
  • 06:09but from my end, the slides are not moving.
  • 06:13<v ->Not at all, okay, let me try again.</v>
  • 06:15One second. (indistinct)
  • 06:17(Donna laughing)
  • 06:19<v Donna>For one more day (laughs).</v>
  • 06:22<v Vin>At least the (indistinct), so that's okay.</v>
  • 06:24<v ->Yeah, yeah, we haven't made it too far.</v>
  • 06:26(laughing)
  • 06:28<v Donna>Thanks for telling us.</v>
  • 06:28<v Vin>Thanks for letting us know.</v>
  • 06:34How 'bout now?
  • 06:37<v Gabrielle>Yep, we can see the motivating study slide.</v>
  • 06:40<v ->[Donna And Ashley] Okay.</v>
  • 06:41Is it the slide?
  • 06:42Is it in presentation view or is it the slide?
  • 06:45<v ->On the right-hand side,</v>
  • 06:46we can see the next slide and then some notes.
  • 06:50<v ->Oh, so it's in presentation.</v>
  • 06:51I mean, that's not the worst thing,
  • 06:52but sometimes, it's better if they can
  • 06:56just see the whole slide (laughs).
  • 06:58Sorry about that.
  • 07:03<v ->Think you'll have to maybe go</v>
  • 07:04out of the presentation mode.
  • 07:07<v ->Exit presentation mode.</v>
  • 07:08<v Vin>Yeah, so then it's the same in the computer</v>
  • 07:12and the screen sharing.
  • 07:27<v ->Sorry.</v>
  • 07:30How do you do it, Vin?
  • 07:31<v Vin>Just that little button, yeah.</v>
  • 07:33You're actually on it right now.
  • 07:35<v ->I think they're still see-</v> <v ->If you could just click</v>
  • 07:36on that.
  • 07:37<v Donna>Or you can go to the top bar, too, I think.</v>
  • 07:39And there we go. <v ->No, I think they'll</v>
  • 07:40still see that.
  • 07:41<v ->And then I think over here,</v>
  • 07:42maybe there's a way to even exit presentation mode.
  • 07:47(indistinct)
  • 07:49It's that.
  • 07:50<v ->(indistinct) slideshow.</v>
  • 07:53(indistinct)
  • 08:00<v ->(indistinct) if there's any.</v>
  • 08:04(indistinct) presenter view.
  • 08:06<v ->There we go.</v>
  • 08:08<v ->Okay, thanks, Vin.</v>
  • 08:08<v ->Does that look okay for (drowned out)?</v>
  • 08:10(laughing) (indistinct)
  • 08:12<v Gabrielle>Yep, now, it's in presentation mode.</v>
  • 08:15<v ->Okay, great.</v>
  • 08:17Sorry about that, thanks for your patience.
  • 08:20So where were we; so we were talking
  • 08:21about the Transmission Reduction Intervention Project.
  • 08:24So this worked well to find
  • 08:26these recently infected individuals
  • 08:28and refer them to treatment.
  • 08:29So it was this successful strategic network
  • 08:32tracing approach.
  • 08:33In addition in this study,
  • 08:34they also delivered community alerts.
  • 08:36So if there is an individual
  • 08:38who was recently infected in the network...
  • 08:41Get this outta the way so you guys can see the figure.
  • 08:44There's an individual who was recently infected
  • 08:46in the proximity of a particular individual in the network,
  • 08:50these community alerts would be distributed,
  • 08:52which were basically flyers, handouts,
  • 08:55or flyers even posted on the wall of frequented venues.
  • 09:00So then individuals in the network
  • 09:02either received these community alerts
  • 09:04from the investigators or they did not.
  • 09:06So the little red dots are those individuals
  • 09:09who received the alerts.
  • 09:10And then the blue ones are those who were not alerted.
  • 09:14And then we looked at this in our previous paper.
  • 09:17We looked at the spillover effects of the community alerts
  • 09:19on HIV injection risk behavior at six months
  • 09:23to see if receiving this alert yourself
  • 09:26reduced your injection risk behavior.
  • 09:27Or if you had contacts who were alerted,
  • 09:31then did that information spill over to you,
  • 09:33and then you also reduced your injection risk behavior?
  • 09:44<v Donna>So is that the actual network, that picture?</v>
  • 09:47<v ->Yep, that's the visualization of the network among...</v>
  • 09:50There's some missing data
  • 09:51and this is the subnetwork among the individuals
  • 09:53that had all the outcomes observed.
  • 09:57Okay, good, the slides can move.
  • 10:00So I'm just gonna,
  • 10:01for those who are not familiar with networks,
  • 10:03I'll define some terminology using this slide.
  • 10:06So this is a visualization of the network here,
  • 10:09the TRIP network.
  • 10:10There's 216 individuals here.
  • 10:14So the individuals are denoted by the blue dots.
  • 10:16Those are people who inject drugs
  • 10:18and their sexual and drug use partners.
  • 10:20And then the edges represent when two individuals,
  • 10:24or nodes, share a partnership.
  • 10:26And we call those connections edges sometimes.
  • 10:29And then the little pink one is an example of a component.
  • 10:34So that's a connected subnetwork
  • 10:36for individuals in that group are connected
  • 10:38to each other through at least one path
  • 10:40but not connected to others in the network.
  • 10:43So right away, we see that TRIP primarily comprised
  • 10:45this one, large, connected component
  • 10:48and several other small components.
  • 10:50We can sort of see them out on the edges of the network.
  • 10:54And then when we zoom in on the component,
  • 10:57the individual in red is the,
  • 11:01we'll call that the index person.
  • 11:03And then the individuals shaded
  • 11:05in this lighter pink are their neighbors
  • 11:08or their first-degree contacts.
  • 11:10So as I go through presenting these methods,
  • 11:12there are some times when I'll be talking about components.
  • 11:14And then in terms of defining the spillover effects,
  • 11:17in this particular paper,
  • 11:18we defined it using the exposure of the nearest neighbors.
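The terminology above (nodes, edges, components, first-degree neighbors) can be sketched with a toy network; this is purely illustrative and not the TRIP data:

```python
from collections import deque

# Toy undirected network (hypothetical, not the TRIP data): nodes are
# participants; edges are reported sexual or drug-use partnerships.
edges = [(0, 1), (1, 2), (2, 0), (3, 4)]
adj = {i: set() for i in range(5)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

def components(adj):
    """Connected components: maximal groups joined by at least one path."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        while queue:
            u = queue.popleft()
            if u in comp:
                continue
            comp.add(u)
            queue.extend(adj[u] - comp)  # visit unvisited neighbors
        seen |= comp
        comps.append(comp)
    return comps

print(components(adj))  # two components: {0, 1, 2} and {3, 4}
print(adj[1])           # nearest neighbors (first-degree contacts) of node 1
print(len(adj[1]))      # degree of node 1
```

Here a node's interference set is just `adj[node]`, its first-degree contacts, which matches the nearest-neighbor definition used for the spillover effects.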
  • 11:24<v Donna>By nearest neighbors,</v>
  • 11:25you mean just first-degree (drowned out)?
  • 11:26<v ->First-degree, yeah, or maybe said</v>
  • 11:28more applied, their partners.
  • 11:31<v ->Okay.</v> <v ->Right, so we're really</v>
  • 11:32thinking about their immediate partners,
  • 11:36and these would be individuals
  • 11:37that they either used drugs with or had sex with,
  • 11:39and they reported that in the study
  • 11:41for that edge to be there.
  • 11:43Yep.
  • 11:47So a little bit of notation.
  • 11:49So we have N is denoting the participants in the study.
  • 11:53A is going to be the intervention
  • 11:55based on the community alerts in our example.
  • 11:57We have baseline covariates,
  • 11:59and then we index the neighbor, the partners who were...
  • 12:02I guess in the networks they call it the neighbors.
  • 12:04But in this case, it's really just their partners,
  • 12:06set of participants that share an edge
  • 12:09or partnership with person i.
  • 12:12We have the degree.
  • 12:13And then we have a vector
  • 12:14of the baseline covariates for the neighbors,
  • 12:16vector of baseline covariates for...
  • 12:19Sorry, the treatment for the neighbors,
  • 12:21baseline covariates for the neighbors.
  • 12:23And then we denote the non-overlapping subnetworks by G.
  • 12:31So we're doing causal inference with an intervention
  • 12:35that's not randomized in a network.
  • 12:37So this requires numerous assumptions
  • 12:39in order to be able to identify these causal effects.
  • 12:43So first, as in the figure,
  • 12:46what I alluded to is we're assuming
  • 12:48the nearest neighbor interference set.
  • 12:50So basically, it's only the person's exposure themselves
  • 12:54or the exposure of their neighbors that can impact
  • 12:58the potential outcome or affect the potential outcome.
  • 13:01We have an exchangeability assumption that applies
  • 13:05not only to the exposure for the person
  • 13:07but, also, the vector of exposures for their neighbors.
  • 13:10So we have comparability between individuals
  • 13:15who are exposed and not exposed.
  • 13:17This is, of course, conditional on baseline covariates.
  • 13:20We require a positivity assumption
  • 13:23so that there's a positive probability of exposure.
  • 13:25at each level of the covariates, again,
  • 13:27both for the individual and their neighbors.
  • 13:29And we also assume if there are different versions
  • 13:33of the community alerts, for example,
  • 13:35they don't matter for the potential outcome.
  • 13:37So it's really whether you just got the alert,
  • 13:40whether you got it as a paper flyer handed to you,
  • 13:43or you saw it as a poster,
  • 13:45we're just assuming it's the same intervention.
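As one illustration of the positivity assumption just stated, a crude empirical check (a hypothetical helper, not from the paper) is to verify that both exposure levels actually occur within each discrete covariate stratum:

```python
from collections import defaultdict

def check_positivity(records):
    """For each discrete covariate stratum x, report whether both
    exposure levels (0 and 1) were observed.  records: (a, x) pairs,
    with a the exposure and x a hashable covariate value."""
    seen = defaultdict(set)
    for a, x in records:
        seen[x].add(a)
    return {x: levels == {0, 1} for x, levels in seen.items()}

# Toy example: both exposure levels occur for stratum 'm',
# but A = 1 is never observed for stratum 'f'.
print(check_positivity([(0, 'm'), (1, 'm'), (0, 'f'), (0, 'f')]))
```

In practice the assumption also has to hold for the neighbors' exposures, so fitted propensity scores are typically inspected rather than raw strata.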
  • 13:49So with these assumptions,
  • 13:50we can write the potential outcome index
  • 13:52by the exposure for the individual and their neighbors.
  • 13:55And then by consistency,
  • 13:57the observed outcome is one of the potential outcomes
  • 13:59corresponding to the intervention received.
  • 14:02And there's a little bit of notation
  • 14:05that goes into the background of defining these effects.
  • 14:09But long story short,
  • 14:10we define the average potential outcomes
  • 14:13using a Bernoulli allocation strategy,
  • 14:15which is why, when we define the spillover effect,
  • 14:19it's a Y-bar.
  • 14:20And then what this effect is,
  • 14:23is it's comparing the average potential outcome
  • 14:25of unexposed individuals
  • 14:27under two different allocation strategies.
  • 14:30So that's the spillover effect
  • 14:32that is in the first paper that we worked on.
  • 14:35And then now when we're doing the power
  • 14:37and sample size stuff,
  • 14:38this is, basically, the parameter of interest.
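Schematically, writing $\bar{Y}(0,\alpha)$ for the average potential outcome of an unexposed person whose neighbors are treated according to a Bernoulli($\alpha$) allocation strategy, the spillover effect being described is a contrast of the form (the notation is a sketch, not a verbatim reproduction of the slide):

```latex
\overline{SE}(\alpha_1, \alpha_0) \;=\; \bar{Y}(0, \alpha_1) \;-\; \bar{Y}(0, \alpha_0)
```

That is, the average outcome among unexposed individuals under neighbor coverage $\alpha_1$ versus neighbor coverage $\alpha_0$.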
  • 14:48In the first paper, there's two different estimators.
  • 14:51To get started with this study design stuff,
  • 14:53we're looking at the second IPW estimator,
  • 14:56which uses a generalized propensity score
  • 14:59extending work in Laura's paper from 2021
  • 15:02from a stratified estimator
  • 15:05to an inverse probability weighted estimator.
  • 15:07And we actually made the decision
  • 15:08to start with this one first,
  • 15:10because in the simulations of the first paper,
  • 15:13it actually had slightly better finite sample performance.
  • 15:16And then in actual application,
  • 15:18we were able to add more covariates
  • 15:20to this model to control for confounding.
  • 15:22So we decided to start here.
  • 15:23We'll also look at IPW-1 as a different estimator
  • 15:27for the study design stuff.
  • 15:28But we decided to start with IPW-2.
  • 15:32And IPW-2, what this does is it uses
  • 15:37a stratified interference assumption.
  • 15:39So it looks at,
  • 15:42instead of looking at the vector
  • 15:44of exposures of the neighbors,
  • 15:46it looks at S_i, which is the number
  • 15:47of your neighbors that were exposed.
  • 15:50Then, there's also a reducible propensity score assumption,
  • 15:54which allows us to factor that generalized propensity score
  • 15:57into a propensity score for the individual
  • 16:02and then a propensity score
  • 16:03for the neighbors' conditional on the individual.
  • 16:08I may have just mixed that up,
  • 16:09but it's on the next slide.
  • 16:11Yeah, this is the neighbor's conditional on the individual
  • 16:13and then the individual conditional on their covariates.
  • 16:17Okay, got it right (laughs).
  • 16:21So then this estimator looks like this.
  • 16:25And then just to kind of break apart what's going on here,
  • 16:29so it's an inverse probability weighted estimator
  • 16:31where we have this generalized propensity score,
  • 16:34where we have the individual exposure
  • 16:35following a Bernoulli distribution
  • 16:38with a certain probability.
  • 16:39And then the S_i variable,
  • 16:41the number of the neighbors exposed,
  • 16:43following a binomial distribution.
  • 16:45And then with that reducible propensity score assumption,
  • 16:48we can factor,
  • 16:50one approach is to factor it this way.
  • 16:52And then we can use these forms
  • 16:54to estimate the propensity score.
  • 16:58And then we still have this pi term here,
  • 17:00because we're standardizing
  • 17:01to a certain allocation strategy.
  • 17:03So we're thinking about specific policies here
  • 17:06when defining the counterfactuals.
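A deliberately simplified sketch of that weighting logic (not the estimator from the paper: the propensity values are plugged in as given numbers rather than fitted, and the averaging convention shown is one of several in use):

```python
from math import comb

def binom_pmf(s, d, p):
    """P(S = s) when S ~ Binomial(d, p)."""
    return comb(d, s) * p ** s * (1 - p) ** (d - s)

def ipw2_mean(records, alpha, a=0):
    """Sketch of an IPW-2-style average potential outcome Ybar(a, alpha).

    Each record is (y, A, s, d, p_ind, p_nbr):
      y      observed outcome
      A      individual's observed exposure
      s      number of treated neighbors; d the degree
      p_ind  estimated P(A = 1 | X)               (stand-in for a fitted model)
      p_nbr  estimated P(neighbor treated | A, X) (stand-in, binomial form)

    Numerator: Bernoulli(alpha) allocation probability of the observed
    neighbor coverage; denominator: the factored generalized propensity
    score.  Averaged over all participants (one common convention).
    """
    total = 0.0
    for y, A, s, d, p_ind, p_nbr in records:
        if A != a:
            continue
        numer = binom_pmf(s, d, alpha)                          # pi(s; alpha)
        denom = binom_pmf(s, d, p_nbr) * (p_ind if A == 1 else 1 - p_ind)
        total += numer / denom * y
    return total / len(records)

# Toy data, purely illustrative.
recs = [(1, 0, 1, 2, 0.4, 0.5),
        (0, 0, 2, 2, 0.4, 0.5),
        (1, 1, 0, 3, 0.4, 0.5)]
est = ipw2_mean(recs, alpha=0.6)
```

A spillover contrast would then be estimated as `ipw2_mean(recs, 0.75) - ipw2_mean(recs, 0.20)`, comparing 75% versus 20% neighbor coverage among unexposed individuals.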
  • 17:08<v Donna>Ashley, I have a question.</v>
  • 17:09The very first equation where you have Y-hat IPW-2,
  • 17:16open paren zero comma alpha one.
  • 17:19What does the zero mean?
  • 17:20<v ->That means that the individual...</v>
  • 17:22So A refers to the exposure for the individual.
  • 17:26So it means the individual is not exposed,
  • 17:30possibly contrary to facts.
  • 17:32So they're all counterfactuals,
  • 17:33but the individual themselves is not exposed.
  • 17:35<v Donna>They're not directly exposed.</v>
  • 17:37<v ->I don't like the words, "Directly exposed."</v>
  • 17:40So in my mind, it's like we're either exposed or we're not.
  • 17:43I don't know, it cleans it up in my mind a little bit,
  • 17:45but I know what you're saying.
  • 17:46So the individual themselves did not receive the...
  • 17:48Let's make it in the context of the problem.
  • 17:50Individual themselves did not receive the community alert
  • 17:53from the TRIP investigative staff.
  • 17:55<v Donna>Okay.</v>
  • 17:56<v ->They may have gotten it secondhand,</v>
  • 17:57which is the whole thing we're trying to estimate.
  • 18:00So they didn't get it from the investigators,
  • 18:02but then their neighbors,
  • 18:05so these orange folks, alpha output percent of them,
  • 18:09a certain percentage of them received the alert.
  • 18:12So maybe we're interested in if 75% of your neighbors
  • 18:16were alerted versus just 20%.
  • 18:20And then there's sort of some practical considerations
  • 18:23that I try to follow in our work.
  • 18:25So we actually look at the distribution
  • 18:27of coverage of treatment for the neighbors,
  • 18:29and we only wanna be estimating effects
  • 18:31sort of within the range of what we're seeing.
  • 18:33So say 20% to maybe 60% were alerted
  • 18:38and we have a lot of data there,
  • 18:39then we could do contrast
  • 18:41for those alpha levels in the data.
  • 18:44Maybe some people feel more comfortable
  • 18:46going out of the range of data,
  • 18:47but I like to know we have information there.
  • 18:50'Cause I think a lot of the times,
  • 18:51it'll give you an estimate,
  • 18:52but it feels better knowing we have this many neighbors,
  • 18:55neighborhoods that had this type of exposure.
  • 19:00Does that make sense? <v ->Yeah.</v>
  • 19:02It does, though I don't agree with the last thing.
  • 19:04(laughing)
  • 19:05<v ->Okay.</v>
  • 19:08We all have different preferences I guess (laughs).
  • 19:12<v Donna>I mean, yeah, you take that</v>
  • 19:13to its logical extreme,
  • 19:14I would say that it (indistinct) having a simple regression.
  • 19:17You would have to observe X at every single value.
  • 19:21<v ->Not every single value, but just the range.</v>
  • 19:24So say that it stops at six-
  • 19:25<v ->You don't wanna-</v> <v ->Say it stops at-</v>
  • 19:26(drowned out).
  • 19:27<v ->Yeah, yeah, say it stops at 60%,</v>
  • 19:30and then we're trying to estimate 95% coverage.
  • 19:32It almost feels too far out.
  • 19:34<v Donna>So you don't wanna extrapolate,</v>
  • 19:35but you're willing to interpolate.
  • 19:37<v ->Yeah, yep.</v> <v ->Okay, I thought you</v>
  • 19:38were saying you weren't willing to interpolate.
  • 19:40<v ->No, then the coverage levels,</v>
  • 19:41if you look at the distribution,
  • 19:42it kind of bumps around and there's some that are missing.
  • 19:44But I'm okay going over that range of the data, but-
  • 19:48<v Donna>Then I do.</v>
  • 19:49<v ->Okay, that's good.</v>
  • 19:50<v Colleague>I mean, you can still do it,</v>
  • 19:51people do it like to extrapolate,
  • 19:53but you know that the (indistinct) we'll get
  • 19:55is gonna be higher, right?
  • 19:56'Cause you don't have data there.
  • 19:58<v ->Yep.</v>
  • 20:01That's a little digression
  • 20:02from where I wanted to go with the slides,
  • 20:04but it's still interesting (laughs).
  • 20:06<v Donna>Ashley, can I ask you a question about the,</v>
  • 20:08so (indistinct) design IPW-1,
  • 20:10but you said that you weren't able
  • 20:13to include more covariates (indistinct).
  • 20:16<v ->In the TRIP data.</v>
  • 20:17<v Donna>And what (indistinct)?</v>
  • 20:19<v ->So I think it has to do with,</v>
  • 20:20so just to say it's not really even on this slide,
  • 20:22but IPW-1 uses a generalized logit model
  • 20:26to estimate the propensity score.
  • 20:28And basically, that thing's kind of a bugger.
  • 20:31It's pretty sensitive.
  • 20:32It doesn't...
  • 20:33Linear mixed models tend to do pretty well,
  • 20:36but these ones with the logit link
  • 20:38I find in practice they can be,
  • 20:41they run into these convergence issues.
  • 20:45And then this one that extended Laura's estimator,
  • 20:48in practice at least,
  • 20:49we haven't run it in hundreds of data sets or anything,
  • 20:51but the few that we have,
  • 20:53we tend to be able to add more covariates.
  • 20:55And because the nonrandomized intervention,
  • 20:57that just seems like the right thing to do,
  • 20:59because we want better control for confounding.
  • 21:03<v Donna>Thanks.</v>
  • 21:04<v ->Yours is winning.</v>
  • 21:05(laughing)
  • 21:06(indistinct)
  • 21:08At least with our team recently.
  • 21:12And that's not to say IPW-1...
  • 21:14It's a great estimator, as well.
  • 21:16It has some nice properties,
  • 21:17but there's just sort of this practical issue
  • 21:19of the generalized logit model.
  • 21:23<v Donna>Yeah, the benefit of that one, though,</v>
  • 21:24is that you don't have to assume
  • 21:26the stratified interference.
  • 21:27<v ->Right, you don't have to assume stratified interference,</v>
  • 21:29and then we don't have to make
  • 21:30this reducible propensity score assumption.
  • 21:32So pros and cons, right?
  • 21:37Yeah, and then it's interesting
  • 21:38to think about what are our practical recommendations
  • 21:40when folks have a menu of estimators to choose from.
  • 21:43What do we tell folks to do in their substantive papers?
  • 21:48Do we ask them to check both?
  • 21:50I think that's what I've been advising for now,
  • 21:52just as it's one is your main analysis,
  • 21:54one is for sensitivity analysis,
  • 21:56but I think that's another open question.
  • 22:01So I spared us all the notation on this slide,
  • 22:03but just to say the variance estimation
  • 22:07is what we use for the study design.
  • 22:09So we use M estimation here.
  • 22:11And then to do M estimation,
  • 22:13we're using the union of the connected subnetworks
  • 22:17to break up the graph.
  • 22:22But at the same time,
  • 22:23we also preserve the underlying connection.
  • 22:26So we maintained that nearest neighbor structure
  • 22:29when calculating the variance.
  • 22:31And then in the simulation study,
  • 22:33we found that accounting for that
  • 22:36as compared to just doing complete partial interference
  • 22:39was more efficient.
  • 22:41So the complete partial interference
  • 22:43would be you would assume
  • 22:45the entire component is the interference set
  • 22:48versus, here, we maintain that the neighbors
  • 22:50are the interference set.
  • 22:51But then we still leverage
  • 22:53the components as independent units,
  • 22:55because it's required for M estimation.
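As a heavily simplified illustration of why M, the number of components, acts as the effective sample size in this M-estimation setup (this is just a clustered variance of a mean, not the paper's stacked estimating equations):

```python
from statistics import mean

def component_level_se(psi):
    """Standard error treating the M component-level contributions psi_g
    as independent units: empirical variance of the contributions,
    scaled by 1/M.  A crude stand-in for the sandwich variance."""
    M = len(psi)
    pbar = mean(psi)
    var = sum((p - pbar) ** 2 for p in psi) / (M * (M - 1))
    return var ** 0.5

# Hypothetical component-level contributions to a spillover estimate.
print(component_level_se([0.1, 0.3, 0.2, 0.4, 0.0]))
```

More components shrink this standard error at roughly the usual 1/sqrt(M) rate, which is why the first simulation setting varies the number of components directly.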
  • 23:01Okay.
  • 23:02So that was all the background to build up to (laughs)
  • 23:06(indistinct) to do study design in these networks
  • 23:10with these particular methods
  • 23:11that have been developed over the recent years.
  • 23:16So basically, I don't know.
  • 23:17I don't think I need to sell it to this group,
  • 23:19but to understand how features
  • 23:23of the study design impact the power is important.
  • 23:26As far as we can tell,
  • 23:27this hasn't been a real emphasis in network-based studies,
  • 23:32particularly in the area of substance use in HIV.
  • 23:34Folks kind of get the sample that they can get.
  • 23:37It's a ton of work,
  • 23:38so they're not thinking about designing them
  • 23:40like a cluster randomized trial.
  • 23:43Or even in observational studies,
  • 23:45there's some proposals where they'll wanna see
  • 23:48at least power calculations to show
  • 23:50that there's a large enough sample size.
  • 23:53So there are approaches coming out
  • 23:55in the statistics literature.
  • 23:57Of course, there are some older ones about overall effects
  • 24:00in cluster randomized trials.
  • 24:02I just put one reference there,
  • 24:03but that's a very large literature.
  • 24:05But then getting into the causal spillover effects,
  • 24:08there are some papers by Baird et al.
  • 24:11looking at a two-stage randomized design.
  • 24:13And I found another paper by Sinclair in 2012
  • 24:16that was a multi-level randomized design,
  • 24:18which kind of had the similar flavor
  • 24:20to a cluster randomized design,
  • 24:22but it was from the econ literature,
  • 24:23so they had a slightly different name for it.
  • 24:26However, when we're doing a sociometric network study,
  • 24:30these larger network-based studies,
  • 24:33it would be difficult to implement
  • 24:35a two-stage randomized design
  • 24:37just because of how folks are recruited.
  • 24:40And then we're also interested in being able to evaluate
  • 24:42interventions that are not randomized.
  • 24:45So we wanna have adequately powered studies
  • 24:48to evaluate these interventions.
  • 24:55So this overall paper,
  • 24:57we're gonna start off with simulation studies,
  • 25:00thinking about the varying the number of components
  • 25:03and the number of nodes,
  • 25:05and then changing different parameters
  • 25:07in the network including effect size,
  • 25:10features of the network like degree,
  • 25:13intracluster correlation,
  • 25:15and see how these impact the power.
  • 25:17And then lastly, trying to work on driving
  • 25:20an expression for the minimal detectable effect
  • 25:24as well as expressions for sample size.
  • 25:30So the ongoing work I'll be presenting today
  • 25:32are focusing on mostly on the first aim,
  • 25:35so simulation study to detect spillover effects,
  • 25:37varying the number of components
  • 25:39for the number of nodes in the network.
  • 25:41And then as the next step for this,
  • 25:44we have some initial results for a wall test statistic
  • 25:47and showing that that test statistic
  • 25:49is normally distributed.
  • 25:51So just an overview of how we've generated some of the data.
  • 25:54We started off by generating
  • 25:55a network with certain features.
  • 25:57Then on that network, we simulate random variables
  • 25:59and then generate the potential outcomes
  • 26:03and then, subsequently, the observed outcomes.
  • 26:05In each data set, we estimate the spillover effects using,
  • 26:08in this case we used IPW-2 and confidence intervals.
  • 26:12And then we calculate the power
  • 26:13in the empirical coverage probability.
  • 26:18(coughs)
  • 26:20Sip of water.
  • 26:25So in the first setting,
  • 26:27we're looking to see if power varies by components,
  • 26:30which I thought was a good place to start,
  • 26:32because our M estimation,
  • 26:34the effective sample size is M,
  • 26:36or the number of components.
  • 26:38So we had two different approaches.
  • 26:40We keep the component size the same
  • 26:42and increase the number of components,
  • 26:44or we fix the number of nodes
  • 26:46and then increase the number of components.
  • 26:48So the first one is really how the statistics
  • 26:51of the M estimation are working.
  • 26:53And the second one I think is empirically interesting.
  • 26:55I don't think it's as founded in the theory
  • 26:58of the estimation, just to be clear,
  • 27:02but nonetheless, I think interesting to look at.
  • 27:04<v Donna>Could you go back a second?</v>
  • 27:06<v ->Yeah.</v> <v ->So what did</v>
  • 27:07the motivating study have in terms
  • 27:09of the number of components and the number of nodes?
  • 27:13<v ->The motivating study has 10 components, 216 nodes.</v>
  • 27:18And then what we did in our first paper
  • 27:20was to try to increase the number of components.
  • 27:21We tried to break up that largest connected component using
  • 27:26network science community detection methods, which is okay.
  • 27:30I don't think it's the most satisfying answer.
  • 27:33And then once we do the community detection,
  • 27:34then we had 20 components.
  • 27:36So the actual motivating data set
  • 27:39is really 10 to 20 components, about 216 individuals.
  • 27:44<v Donna>Okay, so nodes and individuals are the same thing?</v>
  • 27:47<v ->Yep, sorry, I've probably been using those-</v>
  • 27:50<v ->No, that's okay.</v> <v ->Individual, yeah.</v>
  • 27:51216 nodes, yep.
  • 27:54<v Donna>Ashley, can I ask you another question?</v>
  • 27:55<v ->Yeah.</v>
  • 27:56<v Donna>So is that in general?</v>
  • 27:57And you see that treatment, right?
  • 28:00Like in the previous slide said (indistinct) treatment
  • 28:02and potential outcomes I guess, right?
  • 28:05(indistinct) treatment.
  • 28:08So do you do that (indistinct) thing of observational study
  • 28:10like simulating the treatment from propensity score?
  • 28:14<v ->Yeah, so we fit the propensity score in the TRIP data,</v>
  • 28:19and then you'll see in a couple slides
  • 28:20I have the actual values of the parameters that we used.
  • 28:23And then we, obviously, can't fit a model to,
  • 28:26we just fit a model to the observed outcome
  • 28:28to try to get the betas for the model,
  • 28:30the potential outcome out of the TRIP.
  • 28:33Again, the motivating data.
  • 28:35Yep, good question.
  • 28:36And this is like a roadmap.
  • 28:38I'm gonna actually go through
  • 28:39a lot of detail for each one now (laughs).
  • 28:42<v Vin>Sorry, I also have a question.</v>
  • 28:43So in the simulation for component,
  • 28:45and there's nobody in that component received
  • 28:47the treatment in the simulations,
  • 28:50is that possible?
  • 28:51<v ->Yep, that could happen.</v>
  • 28:52<v Vin>And then like for that component,</v>
  • 28:54is that excluded from this,
  • 28:56because perhaps it violates
  • 28:58the positivity assumption, I guess?
  • 29:00<v ->Well, it depends on...</v>
  • 29:01They would come into play
  • 29:03if you're interested in a coverage of 0%.
  • 29:07Right, so it depends on what your...
  • 29:09So that would be if you're interested in estimating
  • 29:11Y of zero with alpha equals 0%.
  • 29:17It's like a pure control group.
  • 29:19So it would be that case.
  • 29:23Yep.
  • 29:25Yeah, so we didn't exclude anyone on that case,
  • 29:28but in another paper, we did exclude...
  • 29:30We were actually looking at HIV seroconversion
  • 29:32in the other paper,
  • 29:33and we did an analysis by components.
  • 29:35So if the component had no HIV-infected individuals
  • 29:40at baseline and the components
  • 29:42in the study were not allowed to change,
  • 29:44then that was like a,
  • 29:46I forget what the epi term for it,
  • 29:47there's no way anyone can get infected.
  • 29:50So it was a perfectly protected component.
  • 29:52So we excluded those.
  • 29:54So we wanted components in that study that were at risk.
  • 29:57So we had to have at least one individual
  • 30:01in the component with HIV at baseline,
  • 30:03so there was some chance that it could spread.
  • 30:06<v Colleague>But it seems that even if you don't exclude</v>
  • 30:08these components where no one is treated,
  • 30:10the (indistinct) weights will be very low, right?
  • 29:12<v ->Yep, they'll just get down-weighted for the treatment thing.</v>
  • 29:16But then I guess it made
  • 29:17my mind go to thinking about,
  • 30:20particularly for HIV seroconversion,
  • 30:22if you have a case where there is a really small,
  • 30:25maybe it's one of these little components,
  • 30:27and it's just these two people,
  • 30:30like the two, like a little dyad, neither have HIV.
  • 30:35I guess then, if you're assuming
  • 30:36that there's no other edges into there,
  • 30:39then there can be no events.
  • 30:41So thinking about like, you know.
  • 30:43I think it makes sense to exclude that,
  • 30:44because they're not at risk as a group, as a dyad.
  • 30:51And maybe that's another tangent (laughs).
  • 30:55Okay, so approach one.
  • 30:58We have this regular connected network with degree four,
  • 31:01which is approximately the observed degree
  • 31:03in the TRIP network.
  • 31:05And then we sampled the number of nodes from a Poisson(10) distribution.
  • 31:09And then we repeat this
  • 31:10and then combine the M subnetworks to form the full network.
  • 31:15So this is the first case where we,
  • 31:19yeah, we have the number,
  • 31:21we keep the component size the same,
  • 31:22and then we're increasing the number of components.
  • 31:26Alternatively for approach two,
  • 31:28we have the same four-degree network.
  • 31:31We have M components but for a fixed set of number of nodes,
  • 31:37and then we generate the connected network,
  • 31:39and then, again, combine the subnetworks.
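The two network-generation approaches described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it builds degree-4 ring-lattice components (one simple way to get a 4-regular graph, matching the approximate observed degree in TRIP) and combines M of them into one network, as in approach one with fixed component sizes.

```python
def regular_component(n, offset):
    """Degree-4 ring lattice on n nodes labeled offset..offset+n-1:
    each node links to its two nearest neighbors on each side."""
    edges = set()
    for i in range(n):
        for step in (1, 2):
            a, b = offset + i, offset + (i + step) % n
            edges.add((min(a, b), max(a, b)))
    return edges

def build_network(m, nodes_per_component):
    """Approach one: combine m disconnected degree-4 components."""
    edges, offset = set(), 0
    for _ in range(m):
        edges |= regular_component(nodes_per_component, offset)
        offset += nodes_per_component
    return offset, edges  # (total node count, edge set)

n_total, edges = build_network(m=10, nodes_per_component=20)
degree = {}
for a, b in edges:
    degree[a] = degree.get(a, 0) + 1
    degree[b] = degree.get(b, 0) + 1
print(n_total, len(edges))  # 200 nodes, 400 edges
```

Approach two would instead fix `n_total` and vary `m`, setting `nodes_per_component = n_total // m`.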
  • 31:45So in either case, there's sort these two scenarios
  • 31:47where we're generating the network,
  • 31:50and then we generate the potential outcomes
  • 31:53and the observed outcomes.
  • 31:55We assign random effects to induce
  • 31:57correlation within each component,
  • 31:59and then simulate...
  • 32:01We just have one binary covariate for now.
  • 32:03Of course, we wanna extend this
  • 32:04to multiple covariates, continuous covariates.
  • 32:08And then we generate the potential outcome
  • 32:09using this formula here
  • 32:11where the values of the parameters
  • 32:13are from an estimated model in the TRIP data.
  • 32:16And then we generate the treatment
  • 32:18or exposure using this Bernoulli random variable.
  • 32:21Again, with the parameter values
  • 32:23from a model in the TRIP data.
  • 32:27And then depending on what the value of A_i is,
  • 32:32then we can pull off the observed outcome
  • 32:35from the vector of potential outcomes for each individual.
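A hedged sketch of the data-generating steps just described: a component-level random effect to induce correlation, one binary covariate, treatment drawn from a propensity-style Bernoulli model, and an outcome that depends on the proportion of treated neighbors. All coefficient values below are made-up placeholders (the real values came from models fit to the TRIP data), and for brevity this draws the observed outcome directly rather than the full vector of potential outcomes.

```python
import math
import random

random.seed(7)  # reproducible draws

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

# Placeholder coefficients -- NOT the TRIP-estimated values.
beta = {"int": -0.5, "a": -0.4, "prop": -0.3, "x": 0.2}
gamma = {"int": -0.2, "x": 0.5}
SIGMA_B = 0.5  # SD of the component-level random effect

def simulate_component(n_nodes, neighbors):
    """Simulate one component: covariate, treatment, observed outcome."""
    b = random.gauss(0.0, SIGMA_B)  # shared effect -> within-component correlation
    x = [random.random() < 0.5 for _ in range(n_nodes)]  # binary covariate
    # treatment/exposure from a propensity-style Bernoulli model
    a = [random.random() < expit(gamma["int"] + gamma["x"] * x[i])
         for i in range(n_nodes)]
    y = []
    for i in range(n_nodes):
        prop_treated = sum(a[j] for j in neighbors[i]) / len(neighbors[i])
        p = expit(beta["int"] + beta["a"] * a[i]
                  + beta["prop"] * prop_treated + beta["x"] * x[i] + b)
        y.append(random.random() < p)  # observed binary outcome
    return a, y

# tiny 5-node cycle as the neighbor structure
nbrs = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
a_vec, y_vec = simulate_component(5, nbrs)
```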
  • 32:40<v Donna>I have a question.</v>
  • 32:41<v ->Yep.</v>
  • 32:43<v Donna>So earlier, you said you were</v>
  • 32:45only allowing spillover between first-degree,
  • 32:49nodes that were connected by first-degree.
  • 32:52<v ->Mn-hm.</v>
  • 32:54<v Donna>But then if you're using the same kind of variable</v>
  • 32:58to describe spillovers,
  • 33:00the proportion of nodes,
  • 33:03or the proportion of, I don't know what you call them,
  • 33:06participants in a component that are exposed,
  • 33:12then it's ignoring that.
  • 33:14<v ->So yeah, maybe I was mixing papers.</v>
  • 33:16In this paper, it's really the proportion
  • 33:20of the neighbors that are treated.
  • 33:21So you have each person.
  • 33:22It's the proportion of their neighbors that are treated
  • 33:25that's going to define their potential outcome.
  • 33:27<v Donna>That has to be a first-degree neighbors-</v>
  • 33:29<v ->In this-</v> <v ->Anybody (indistinct).</v>
  • 33:31<v ->In this paper you, we could extend this to second,</v>
  • 33:34third-degree, different interference structures.
  • 33:37But in this particular paper, that's how it's defined.
  • 33:39But I think what I was doing,
  • 33:40I was actually giving an example from another paper
  • 33:42where we assume partial interference by component.
  • 33:45In this paper, it's the nearest neighbor interference.
  • 33:48So the potential outcomes depend on the number
  • 33:51of the neighbors that are treated
  • 33:53out of the total, the proportion.
  • 33:56<v Donna>One other question.</v>
  • 33:57So at this 0.5-squared between-subjects variance,
  • 34:03what kind have ICC does that give, do you know?
  • 34:06<v ->I don't remember off the top of my head,</v>
  • 34:07but we can check.
  • 34:10And I'm trying to remember.
  • 34:11I think we got that from looking at the TRIP data,
  • 34:14but I'd have to go back and check how we landed on that.
  • 34:19But yeah, it's a good idea to check.
  • 34:27And then we estimate the spillover effect
  • 34:30and the corresponding 95% confidence interval
  • 34:33in each data set using the methods
  • 34:36that were presented earlier.
  • 34:37And then we calculate the power
  • 34:39in the empirical coverage probability.
  • 34:41We simulated across 500 data sets,
  • 34:43and we're still working on deriving
  • 34:46and evaluating the test statistic.
  • 34:47So for now, we just use the confidence interval
  • 34:50to see if the null value is in the confidence interval
  • 34:52or not as a way to assess the power.
  • 34:55And then just as a sanity check,
  • 34:57we checked it in the first paper,
  • 34:58but we also look at the empirical coverage probability
  • 35:01just to make sure the estimators are behaving as we expect.
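The power and coverage calculation described here can be written compactly: power is the share of simulated 95% confidence intervals that exclude the null value, and empirical coverage is the share that contain the true effect. A minimal sketch (function name and toy inputs are illustrative, not the authors' code):

```python
def power_and_coverage(estimates, std_errors, true_effect,
                       null_value=0.0, z=1.96):
    """Empirical power: share of 95% CIs excluding the null value.
    Empirical coverage: share of 95% CIs containing the true effect."""
    reject = contain = 0
    for est, se in zip(estimates, std_errors):
        lo, hi = est - z * se, est + z * se
        if not (lo <= null_value <= hi):
            reject += 1
        if lo <= true_effect <= hi:
            contain += 1
    n = len(estimates)
    return reject / n, contain / n

# toy inputs standing in for, say, 500 simulated data sets
power, coverage = power_and_coverage(
    estimates=[-0.42, -0.10, -0.45, -0.38],
    std_errors=[0.10, 0.20, 0.12, 0.15],
    true_effect=-0.42)
print(power, coverage)  # 0.75 1.0
```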
  • 35:05<v Donna>So is there a test statistic?</v>
  • 35:08<v ->It's derived and we're looking</v>
  • 35:09at the normality of it first, assessing it.
  • 35:12And then the next step, which we ran outta time
  • 35:14to do for today is we wanna redo these simulations.
  • 35:16So that's step four.
  • 35:19Sub two is based on the test statistic,
  • 35:22not the confidence interval.
  • 35:24I mean, they should largely agree,
  • 35:25but what makes me nervous is it's a confidence interval
  • 35:28for an estimator of two parameters.
  • 35:32And sometimes in that case, the confidence interval
  • 35:34may not always agree with the test statistics.
  • 35:36So it should typically, but to be...
  • 35:40I think it's correct.
  • 35:42It's more appropriate to be using the test statistic.
  • 35:46<v Vin>The confidence interval or the indirect effect?</v>
  • 35:49<v ->Yeah.</v>
  • 35:50<v Vin>So you will...</v>
  • 35:52I mean, I think there are...
  • 35:53They should agree, right?
  • 35:55<v ->But I worry about-</v> <v ->(drowned out) the null</v>
  • 35:56distribution for the test statistic.
  • 35:59<v ->Yeah.</v>
  • 36:00<v Donna>That's the main thing.</v>
  • 36:01If it's a Wald test statistic,
  • 36:04then we use the null distribution,
  • 36:07which you can't do (indistinct) have
  • 36:09different statistical (drowned out).
  • 36:09<v Vin>Yeah, I see, yeah.</v>
  • 36:13<v ->So I think this is a good way</v>
  • 36:14that we got started as we're working on...
  • 36:17We first wanna evaluate we got the test statistic correct
  • 36:19before we blow through all this.
  • 36:22<v Donna>The other thing is that the robust standard errors</v>
  • 36:24are problematic in smaller samples, too.
  • 36:27And there are all these different fixes to it.
  • 36:29So I don't know if the test statistic
  • 36:31would also have that problem.
  • 36:33<v ->Yeah, potentially.</v> <v ->We've mostly seen it</v>
  • 36:35about confidence intervals.
  • 36:36Have you seen it about test statistics?
  • 36:39<v ->Yeah.</v> <v ->With the robust</v>
  • 36:40standardized- <v ->The same thing (indistinct).</v>
  • 36:42They would agree,
  • 36:43because we're always talking about,
  • 36:45assuming normality, the variance doesn't change
  • 36:48across the hypothesis (indistinct) space.
  • 36:53But then, CI here,
  • 36:54you're referring to the CI of the impact (indistinct).
  • 36:57<v ->Correct, yeah.</v>
  • 36:58<v Vin>And that's already accounting for the covariance.</v>
  • 37:01The two potential outcome estimates.
  • 37:05So if normality holds, they would agree.
  • 37:09If you can derive the normality of the estimator,
  • 37:13then the CI I think (indistinct).
  • 37:15<v ->Yeah, so we have the normality of the estimator already,</v>
  • 37:17and then in a couple slides,
  • 37:18I'll show what we have for the test statistic.
  • 37:20And I have some preliminary results showing
  • 37:22that it looks approximately normal,
  • 37:23but I don't think it's quite ready for prime time (laughs).
  • 37:29<v Donna>So then that error is reliant</v>
  • 37:30on M estimation, right?
  • 37:31<v ->Correct.</v>
  • 37:32Yep. (drowned out)
  • 37:33Yeah, and that's the AOS paper.
  • 37:35All the M estimations worked out for this.
  • 37:38The IPW-2, for example.
  • 37:39<v ->Right.</v> <v ->Yep.</v>
  • 37:44In our first results, we actually had a,
  • 37:49yep, a smaller effect size.
  • 37:50The effect size is -0.1,
  • 37:51and this is on a different scale.
  • 37:53So the smaller effect size,
  • 37:55the power was actually surprisingly low.
  • 37:58Even as we increased the number of components,
  • 38:00it didn't even reach 40%.
  • 38:02Although, the coverage of the estimator was approximately
  • 38:05where we'd expect it to be performing.
  • 38:08So the next thing we looked at was changing the effect size,
  • 38:12making the effect size,
  • 38:13in this case, actually making it larger
  • 38:15and seeing how that impacts the power.
  • 38:19So we basically picked...
  • 38:22There's the supplemental slide if anyone has questions,
  • 38:24but we have the original effect size,
  • 38:27the largest effect size that we could obtain
  • 38:29in this particular simulation setting,
  • 38:32and then something in between.
  • 38:34So we see as we increase the effect size
  • 38:36that the largest effect size is -0.42.
  • 38:40That actually achieves 80% power.
  • 38:42Excuse me.
  • 38:43A little bit, actually, it's right around 20 components.
  • 38:47But then as we see, as the effect size gets smaller,
  • 38:50it's harder for it to achieve that 80% power level.
  • 38:56So I thought that was kinda interesting.
  • 38:58And then approach two.
  • 39:01We wanted to see changing the number of components
  • 39:05for a fixed number of nodes.
  • 39:07So here, we fixed a hundred, 300, 600, or a thousand nodes,
  • 39:12and we see it doesn't really matter so much
  • 39:14how many components are in the problem,
  • 39:15which was a little bit surprising to me.
  • 39:17So these are preliminary results.
  • 39:19I'm not sure if this is gonna hold up as we keep
  • 39:21pulling on the threads here, just as a disclaimer.
  • 39:25But we see that with a hundred nodes,
  • 39:30it doesn't achieve the appropriate power.
  • 39:34Once we get up to 300 nodes
  • 39:39and a thousand, sorry, 600 nodes,
  • 39:41and then a thousand nodes,
  • 39:42we see it's at 80% power or higher.
  • 39:46<v Donna>So just to say cluster randomized designs,</v>
  • 39:50in certain structures, you can find that no matter how much,
  • 39:54like if you say the components are like the clusters,
  • 39:58and then the nodes are like
  • 39:59the number of people in that cluster,
  • 40:00you can have a situation where,
  • 40:02for a fixed number of components,
  • 40:04no matter how many people you put into each component,
  • 40:11you have an asymptote.
  • 40:12You never get to the power you want.
  • 40:14The only way to get to it is by increasing components.
  • 40:18But you're finding an asymptote with components.
  • 40:22<v ->Yeah, but here this is the number of people overall</v>
  • 40:26in the whole study, not per component.
  • 40:29So this was a little bit surprising
  • 40:31that it seems to be a bigger driver
  • 40:34is just the number of people enrolled in the network
  • 40:36regardless of the number of components.
  • 40:39<v Donna>So you fixed the total number of units,</v>
  • 40:41and essentially you have them divided
  • 40:45into different numbers of components.
  • 40:47<v ->Yep.</v>
  • 40:48<v Donna>And you're seeing that it doesn't change how many</v>
  • 40:50components (indistinct). <v ->Yeah,</v>
  • 40:51which I also acknowledge that's an artificial thing
  • 40:53that probably would never happen in the real world, right?
  • 40:56Because say we enroll 600 people,
  • 40:59we can't force them into different sets
  • 41:02of partners to get the statistics to work.
  • 41:04So this is a very theoretical thought exercise.
  • 41:08<v Vin>I also wonder if it's a function</v>
  • 41:10of the residual correlation you were specifying
  • 41:12in the simulation study.
  • 41:13<v ->The random effect?</v>
  • 41:15<v Donna>Yeah.</v>
  • 41:17<v ->Interesting.</v> <v ->'Cause that'll definitely</v>
  • 41:19affect the effective sample size, right?
  • 41:20<v ->Mn-hm.</v> <v ->Yeah.</v>
  • 41:21<v Vin>So maybe it's relatively small</v>
  • 41:23and doesn't really matter in this simulation,
  • 41:25and that could be-
  • 41:26<v ->Oh, so if we-</v> <v ->a possibility.</v>
  • 41:27<v ->If we increase the amount of correlation in the component,</v>
  • 41:30this story could be very different.
  • 41:32<v Donna>It might but might not.</v>
  • 41:33So that's something to check maybe.
  • 41:35<v ->Yep.</v>
  • 41:36That's why, yeah, another disclaimer.
  • 41:38This is very preliminary.
  • 41:39And I think even at the end I remind us
  • 41:41that needs more investigation.
  • 41:43<v Vin>Right, but it's cool,</v>
  • 41:44because I guess the cost of randomized design
  • 41:46is sort of a limiting design in some sense.
  • 41:49They probably would not have
  • 41:50the same outputting (indistinct) anyways.
  • 41:54That's good to-
  • 41:55<v Colleague>What's the minimum number</v>
  • 41:56of components you could use?
  • 42:01<v ->Looking at the dots, it looks like she went</v>
  • 42:02all the way down to maybe about two,
  • 42:05but it depends on, looks like there's a...
  • 42:07Depending on which number of nodes you have,
  • 42:10she looks at different numbers of components,
  • 42:12because when Ke generated it, it's from here.
  • 42:19Yeah, the cluster size is the number of nodes
  • 42:21divided by the number of components.
  • 42:24<v Colleague>So I'm wondering, with these few components</v>
  • 42:27(indistinct) specified?
  • 42:31<v ->Yeah, we should.</v>
  • 42:33Based on other results, it should be.
  • 42:35We start to see good coverage around 50 components.
  • 42:39<v Colleague>That's what I see.</v>
  • 42:40<v ->Yeah.</v>
  • 42:41<v Donna>But I think it would depend</v>
  • 42:42on if the cluster randomized designs
  • 42:43or anything like this would also depend on the ICC.
  • 42:47Because if that ICC is zero,
  • 42:50then you could have one component (indistinct)
  • 42:52is equivalent to, again, a noncluster design.
  • 42:56<v ->Yeah.</v> <v ->Yep.</v>
  • 43:02Okay, so here's the preliminary results
  • 43:03for the Wald test statistic.
  • 43:05So I changed the notation a little bit here
  • 43:08just to make this easier to read.
  • 43:09So here we express the estimator as this theta hat.
  • 43:12Based on the AOS paper, we have that this will converge
  • 43:15in distribution to a multivariate normal.
  • 43:17And then we actually have an estimator
  • 43:20of the variance in that paper, as well.
  • 43:27Yeah, and then building a Wald test statistic
  • 43:30from that parameter, we have a form that looks like this.
  • 43:34And then actually in the AOS paper,
  • 43:35just a minor note is the normalizing constant
  • 43:38of one over M is tucked into the sigma term.
  • 43:42I had to go back and double check that yesterday.
  • 43:44So then we have a Wald test statistic
  • 43:46that's a form like this.
  • 43:47It should follow a normal distribution.
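In symbols, the slide's construction is roughly the following (notation reconstructed from the talk, so treat it as an assumption; per the speaker, the 1/M normalizing constant is absorbed into the variance term):

```latex
\hat{\theta} \;\approx\; \mathcal{N}\!\big(\theta,\ \hat{\Sigma}\big),
\qquad
W \;=\; \frac{\hat{\theta} - \theta_{0}}{\sqrt{\hat{\Sigma}}}
\;\overset{H_{0}}{\sim}\; \mathcal{N}(0,\,1)\ \text{approximately},
```

where \(\theta_{0}\) is the null value of the spillover effect and \(\hat{\Sigma}\) is the M-estimation variance from the AOS paper.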
  • 43:54So then we started looking at this
  • 43:56empirically across the simulations.
  • 43:59And this looks, to my eye, to be approximately normal.
  • 44:02And what we're working on now,
  • 44:04the results aren't quite ready,
  • 44:05is actually doing a test for normality
  • 44:08like a Kolmogorov-Smirnov test
  • 44:10to test for normality across these different scenarios.
  • 44:14So we're working on those results now,
  • 44:16and that's something we wanted to confirm
  • 44:18before we fold it into the rest of the simulations.
  • 44:23<v Donna>That test has very low power (indistinct).</v>
  • 44:25<v ->Low power?</v>
  • 44:27Yeah, and then there's other tests too,
  • 44:29but some of 'em are-
  • 44:30<v Donna>I think they all have low power.</v>
  • 44:32<v ->Yeah.</v>
  • 44:34So if anyone has any other thoughts about that,
  • 44:36about how to evaluate.
  • 44:37Like we derived this, but how do we-
  • 44:40<v Donna>In some sense, your simulations will tell you,</v>
  • 44:42because the property's relying
  • 44:45on that (indistinct) normality.
  • 44:47And so if you don't have 5% type one error,
  • 44:51and then you know (indistinct),
  • 44:53you now have...
  • 44:55I guess that would be the main thing
  • 44:56would 5% type one error.
  • 45:02<v Vin>I think maybe another way to visualize</v>
  • 45:04that is to try to increase the M,
  • 45:09and then actually gradually see if that looks more normal.
  • 45:12I guess that's just-
  • 45:13<v ->Yep.</v>
  • 45:15<v Vin>And I think people tend to do something like that.</v>
  • 45:18When they check convergence rate,
  • 45:20they would probably do something like plot
  • 45:23the results along with the sample size
  • 45:25and see how well they converge.
  • 45:28And then the limiting end would correspond
  • 45:29to the perfect results,
  • 45:31and then you'll see more of a bell curve shape.
  • 45:33But I think right now, looking at these 10 iterations,
  • 45:36it's a little spiky sometimes.
  • 45:37<v ->Yeah, and it doesn't seem...</v>
  • 45:38Like this one down in the far corner
  • 45:40is already a hundred components,
  • 45:41and it doesn't really seem like it's getting too much...
  • 45:45I mean, these are at least, yeah.
  • 45:48There's not a trend of constant-
  • 45:49<v Vin>(drowned out) specified model, right?</v>
  • 45:51It's definitely correctly specified
  • 45:54propensity score models and everything-
  • 45:55<v ->Should be, but we can double check.</v>
  • 45:57<v Vin>So the simulation models</v>
  • 45:59are basically identical to the models (drowned out).
  • 46:01<v ->Yep.</v>
  • 46:04<v Donna>But the spiking,</v>
  • 46:05this also just depends arbitrarily on the bin size?
  • 46:08<v Vin>Yeah, that's right.</v>
  • 46:09<v Donna>So you could make it look very spiky</v>
  • 46:11if you have bigger bins.
  • 46:13<v Vin>Right, and (indistinct)</v>
  • 46:14you could even Q-Q plot it sometimes.
  • 46:15<v ->Yeah.</v> <v ->Yep.</v>
  • 46:17(drowned out)
  • 46:18Vin says Q-Q plot.
  • 46:21(Donna laughs)
  • 46:22(indistinct)
  • 46:23Okay.
  • 46:25So that's the direction where we're heading in with this.
  • 46:28From simulation two, we also have some preliminary results.
  • 46:31So this is fixing the number of components
  • 46:33and varying the number of nodes.
  • 46:36In here, we see power increases with the number of nodes,
  • 46:41but we don't see any variation
  • 46:44between the number of components.
  • 46:46So the power is plotted against the number of nodes
  • 46:48and each line represents a different number of components,
  • 46:52which I think kind of echoes the other results
  • 46:54that we were seeing earlier in the talk.
  • 47:01<v Donna>That's the opposite of cluster randomized trials,</v>
  • 47:04'cause you're getting a lot of power by increasing nodes,
  • 47:08and you're barely seeing any impact of components.
  • 47:10Whereas with cluster randomized trials,
  • 47:12it's all in the clusters,
  • 47:14and it doesn't matter that much after a relatively
  • 47:17small number of people within cluster.
  • 47:19<v Vin>Right.</v>
  • 47:20<v ->Which this is still very surprising to me,</v>
  • 47:22because the M estimation,
  • 47:23the effective sample size is the number of components.
  • 47:26So yeah, this is pretty surprising.
  • 47:29<v Vin>(indistinct) interested to really check</v>
  • 47:30how that changes or not changes with the-
  • 47:32<v ->The ICC?</v> <v ->Yeah.</v>
  • 47:34<v ->Change the. (drowned out)</v>
  • 47:36Yes. <v ->Yeah.</v>
  • 47:40<v Donna>What is the outcome?</v>
  • 47:41Like sort of this idea in this simulation,
  • 47:43what were you thinking of?
  • 47:45Is it a binary or a continuous?
  • 47:47<v ->Binary HIV risk behavior.</v>
  • 47:51So yeah, whether the person reports,
  • 47:52specifically injection risk behavior.
  • 47:57And then the intervention,
  • 47:58all the effects that we're looking at are negative,
  • 48:00because the intervention should be reducing the behavior.
  • 48:06<v Donna>Yeah, so with an ICC of 0.5 times 1 minus 0.5,</v>
  • 48:10that's the maximum amount of binomial variance.
  • 48:13So this should be...
  • 48:14The simulation is done under a very high ICC.
  • 48:20Like it might be the highest possible with binary-
  • 48:22<v ->For that binary data, yep.</v>
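Donna's point checks out numerically: Bernoulli variance p(1 − p) peaks at p = 0.5, so a between-subjects variance of 0.5 × 0.5 = 0.25 is the largest possible for a binary outcome. A quick, purely illustrative check:

```python
# Bernoulli variance p(1 - p) over a grid of p values
variances = {p / 100: (p / 100) * (1 - p / 100) for p in range(101)}
p_max = max(variances, key=variances.get)
print(p_max, variances[p_max])  # 0.5 0.25
```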
  • 48:27Okay, so zooming out a little bit,
  • 48:29thinking about network study design in practice,
  • 48:33some of the things that might come out of this work.
  • 48:35So there are definitely features that can be planned
  • 48:37when designing the study, right?
  • 48:39So we could increase the number of components
  • 48:41by having multiple sites or multiple cities
  • 48:44contributing to one particular study.
  • 48:49Although, that's, you know, can be very costly,
  • 48:50very time consuming.
  • 48:52We can, of course, increase more individuals recruited,
  • 48:55but that depends on who,
  • 48:57'cause it's a network study, who are their contacts,
  • 48:59if they don't have contacts
  • 49:00to kind of come to an end in the network.
  • 49:03We can try to ensure distance between components some way.
  • 49:06And I put distance in quotes,
  • 49:08'cause that could mean all sorts of things,
  • 49:10not just geographical distance.
  • 49:12And then we have some control
  • 49:13over the intervention treatment.
  • 49:15What proportion do we want to expose to the intervention?
  • 49:19And then I was thinking about features
  • 49:20that likely cannot be planned,
  • 49:22'cause maybe someone's really creative.
  • 49:24And we could think about ways
  • 49:25that these could be manipulated.
  • 49:28So once we have a given set of individuals,
  • 49:31pretty sure we can't force them into different components,
  • 49:34unless we're doing, actually now that's coming to my mind,
  • 49:37unless we're doing a network intervention
  • 49:39that's meant to change the edges.
  • 49:41Then, we would have some control
  • 49:42over who's interacting with whom,
  • 49:45but that's a little bit complicated,
  • 49:46because then your structure is intertwined
  • 49:48with your intervention.
  • 49:51The features of the network like degree,
  • 49:53centrality, intracluster correlation,
  • 49:56we don't have control over those.
  • 49:58Who's connected to whom:
  • 49:59these are individual sexual and drug partnerships.
  • 50:02We don't have control over that.
  • 50:04What the effect sizes are
  • 50:05or what the outcome prevalence is in the particular study.
  • 50:08<v Donna>Well, you can't choose your study population,</v>
  • 50:12though, to have certain of these characteristics.
  • 50:15You can't change them.
  • 50:17Let's say you could do a study
  • 50:18of 10 different kind of places, communities,
  • 50:21and some might be more-
  • 50:23<v ->Different outcome prevalences or-</v>
  • 50:25<v Donna>Yeah, or different degrees of centrality,</v>
  • 50:27and they could have different ICCs and all of that.
  • 50:31So if people know what's important,
  • 50:33they could look for study populations that have the features
  • 50:37that will maximize power of the study.
  • 50:39<v ->Yep, that's a good point.</v>
  • 50:42That's why I said likely,
  • 50:43'cause I knew Donna would think of something.
  • 50:45(laughs)
  • 50:47(indistinct)
  • 50:51<v Colleague>What about the propensity score?</v>
  • 50:52You also don't have control.
  • 50:57<v ->Yeah, I mean that's the...</v>
  • 50:59It was non-randomized intervention.
  • 51:01So it's what the folks are choosing or being exposed to
  • 51:05and then just their observed covariates.
  • 51:08<v Donna>Oh, there's one way, just randomize them.</v>
  • 51:12(drowned out) (laughing)
  • 51:13In epidemiology, we always talk about this
  • 51:15as one of the ways to control confounding,
  • 51:18which is to choose a homogeneous population
  • 51:20so you have no variation in the risk factors,
  • 51:24and that lowers the amount of confounding.
  • 51:28You might lose the ability to externally generalize,
  • 51:31but you'll reduce confounding.
  • 51:38<v ->Yeah, so I think there's a lot of thinking</v>
  • 51:41and papers that need to be written for design in networks.
  • 51:45I mean, I think in designing trials
  • 51:47and designing cluster randomized trials,
  • 51:48even thinking about observational studies,
  • 51:51I think it's clear to me how you have
  • 51:53more control over certain things.
  • 51:55But then here, I think there's a lot of work
  • 51:58to think about how do we take...
  • 52:01It's just in the beginning
  • 52:02with some of these statistical results,
  • 52:04but how do we take these statistical findings
  • 52:06and translate them into something that folks
  • 52:08can actually use in study designs,
  • 52:10grant proposals for network-based studies in public health.
  • 52:14So I think that's a call to action
  • 52:15to some of the folks in the room and on Zoom.
  • 52:21So just some highlights from what we found so far.
  • 52:23So the power for estimating spillover effects
  • 52:25increases with more nodes or larger effect sizes.
  • 52:30It requires, of course, more investigation
  • 52:32like we've been discussing today.
  • 52:34There's some things we need to look into,
  • 52:35but the number of components may have less impact on power,
  • 52:39but that requires looking at some additional features.
  • 52:42When the effect size is large enough,
  • 52:44the spillover effect has reasonable power.
  • 52:46And then in the initial setting,
  • 52:47that was even with only 20 components.
  • 52:50And then just as a sanity check,
  • 52:52we saw the empirical coverage probability
  • 52:55was around the nominal level
  • 52:57as we would expect from our earlier paper.
  • 53:01So future directions.
  • 53:02We wanna keep looking at the impact
  • 53:04of other design parameters on the power,
  • 53:07continue working with this test statistic
  • 53:10and making sure it's performing as we expect,
  • 53:13and then using it in the simulation study
  • 53:15and working on getting a minimal detectable effect size,
  • 53:18as well as number of individuals
  • 53:21and/or components required for adequate power.
  • 53:24And if we can find closed forms,
  • 53:26we'll have those expressions.
  • 53:27If not, we'll have some simulation-based programs
  • 53:29to look at this.
  • 53:30And then we want to...
  • 53:32We've done some kind of back-of-the-envelope things
  • 53:34in thinking about the power that we might have had
  • 53:36in TRIP to detect these effects
  • 53:38but doing that more carefully and formally.
  • 53:42And then last was sort of the issue
  • 53:43I was talking about at the end
  • 53:44is all of these statistical results are really interesting
  • 53:48and exciting for folks like us,
  • 53:50but then how do we make it practical
  • 53:52and useful and something that individuals
  • 53:56can use in their grant writing
  • 53:57when getting their network based studies funded.
  • 54:01Okay, and then this is my shameless plug.
  • 54:03If you thought this talk was interesting,
  • 54:06we're going to have an online workshop hosted by my group
  • 54:10at URI on Friday, March 10th from 2:00 to 5:00.
  • 54:13It's free and we have a star-studded lineup
  • 54:17of speakers that'll be joining for the workshop.
  • 54:20And I have some flyers,
  • 54:22and I can email the flyer around, as well.
  • 54:24<v Donna>We can circulate everything.</v>
  • 54:26(indistinct) also.
  • 54:28<v ->Yeah, that'd be great.</v>
  • 54:30Yeah, so welcome everyone on the call,
  • 54:31everyone in the room to join,
  • 54:34and I think it'll be a really informative
  • 54:36and interesting afternoon.
  • 54:38And if you're interested in this methods area,
  • 54:40it'd be a nice way to get caught up
  • 54:42on some of the literature
  • 54:44and start thinking about how you can use this
  • 54:46in some of your work.
  • 54:49So just a couple of references, as well.
  • 54:53And I know I've been taking questions as we go along,
  • 54:55but if there's any other questions from the audience,
  • 54:58happy to discuss.
  • 55:02<v Vin>So it's interesting to see that the component size</v>
  • 55:04doesn't have a very strong effect on the power,
  • 55:07but do you think in reality we need also consider
  • 55:12variability in that component size?
  • 55:14'Cause we always see a huge component.
  • 55:16<v ->Yep, that's a really good point.</v>
  • 55:18<v Vin>But there are a lot of very small components.</v>
  • 55:20<v ->Yep. (drowned out)</v>
  • 55:22Yep, great point.
  • 55:24And particularly in these HIV risk networks,
  • 55:28I mean, it's not like there's hundreds of them,
  • 55:29but the handful that we have and we've been able to look at,
  • 55:33there is a lot of variability.
  • 55:34We have, usually, there's one giant connected component
  • 55:37and then these smaller components.
  • 55:39And of course, whether or not that's the real network,
  • 55:42that's some of Laura's work, right?
  • 55:45These smaller components may actually even be connected
  • 55:47to the larger component,
  • 55:48or they might be connected to each other, as well.
  • 55:50But in this work, we assume that the network we observe
  • 55:54is the true, known network for now
  • 55:58just so we can look at some of these other issues.
  • 56:00But of course, there's always the caveat
  • 56:01that the network itself is mismeasured.
  • 56:04<v ->'Cause they-</v> <v ->Ashley, there's a bunch</v>
  • 56:05of things up in the chat maybe.
  • 56:07Just to give other people a chance.
  • 56:09<v ->Sure.</v> <v ->Some of it might have to do</v>
  • 56:11with the beginning when we were having technical problems,
  • 56:13but it might have some questions.
  • 56:16<v ->"See you have some technical problems.</v>
  • 56:17Slide's not moving."
  • 56:18Oh, and then thanks, Gabby.
  • 56:20Gabby's part of the URI team.
  • 56:22She put in a link to register for the workshop.
  • 56:25We actually,
  • 56:27we just have a couple survey questions as you register,
  • 56:29because what we wanna do is try to tailor
  • 56:30the content to the folks that are showing up.
  • 56:34So there's just a couple of quick questions,
  • 56:35and then that's all you have to do.
  • 56:37It's free (laughs).
  • 56:39Just answer a little survey.
  • 56:40And then Gabby put a link for some more details
  • 56:44about the workshop, as well.
  • 56:48(indistinct)
  • 56:48<v Donna>This has gotta be our last question,</v>
  • 56:50'cause we're down to 12.
  • 56:51<v Vin>Yeah, just a short comment.</v>
  • 56:52I think there's potential to make this work more impactful:
  • 56:55it doesn't have to be attached to IPW-2,
  • 57:00because you're providing a simulation framework.
  • 57:01And theoretically, one can fit other IPW estimators,
  • 57:05stratified estimators, regression-based estimators,
  • 57:07and even double robust estimators.
  • 57:09And I would also imagine that they could have
  • 57:12different operating characteristics,
  • 57:14and so the impact of M and N could also be specific
  • 57:21to not only the simulation parameters we choose,
  • 57:26but also to the estimators we choose.
  • 57:28I think it's an underappreciated point,
  • 57:31but it's very important to emphasize that the power
  • 57:35we calculate is always gonna be based on the approach.
  • 57:38<v Donna>It's true that it's underappreciated.</v>
  • 57:40Surprisingly, right?
  • 57:41<v Vin>Yeah, like you could say I use the approach</v>
  • 57:43to consider IPW-2 based power,
  • 57:46but I think a regression based approach
  • 57:48in terms of power would be different.
  • 57:50It's actually very specific, too.
  • 57:51And also, the curves you show could have some differences.
  • 57:55<v ->Yeah, that's interesting.</v>
  • 57:56So we can start, 'cause we have IPW-1, IPW-2 ready to go.
  • 57:59So we could start, for this work, we could look at that.
  • 58:02But I think maybe an idea would be to write the code.
  • 58:05Like if we have our programs that we're gonna share for this
  • 58:07to write it flexible enough so that the user-
  • 58:10<v Vin>That's something people should be able to choose.</v>
  • 58:11Or even if you have an estimator-specific program,
  • 58:14that should be sort of emphasized and clarified.
  • 58:17'Cause as a very simple example, if you are,
  • 58:20like in the cluster (indistinct) literature,
  • 58:22if you're assuming working independence
  • 58:24versus working exchangeable,
  • 58:25the results can be very different
  • 58:26in terms of the efficiency.
  • 58:28And the extent to which the cluster size variation
  • 58:33impacts the study power is also specific to whether you adopt
  • 58:37an independence working correlation or an exchangeable one.
  • 58:41So sometimes, we have a unified conclusion,
  • 58:45but that's almost always coming from a specific estimator
  • 58:51and cannot really be overly generalized.
  • 58:53<v ->Yep, yeah, that's a great point.</v>
  • 58:54Thanks, Vin.
  • 58:56<v ->Well, this was a really interesting seminar, Ashley.</v>
  • 58:59<v ->Thank you.</v> <v ->You presented it</v>
  • 59:00very clearly,
  • 59:01so we really appreciate it.
  • 59:03Thank you so much and thanks to everybody else (indistinct).
  • 59:06<v ->Thank you, thanks, everyone.</v>
  • 59:13<v Donna>So go ahead and close the Zoom.</v>
  • 59:14<v ->Sure, yeah, thanks, everyone, for joining.</v>
  • 59:16We hope to see you at the online workshop.