# BIS Seminar: Dealing with observed and unobserved effect moderators when estimating population average treatment effects

September 22, 2020

## Information

Elizabeth Stuart

Associate Dean for Education

Bloomberg Professor of American Health


ID5661


- 00:00- Maybe one or two minutes and then,
- 00:02I'll have you introduced.
- 00:03- And it's about, and so I...
- 00:05And it's gonna be more fun for me if it's a little
- 00:07interactive, as much as we can make it.
- 00:09So I won't be able to see all of you nodding and whatnot,
- 00:12but please feel free to jump in.
- 00:15And the talk's gonna be pretty non-technical.
- 00:17My goal is mostly to sort of help
- 00:19convey some of the concepts and ideas.
- 00:23Hopefully it will be a reasonable topic to do via Zoom.
- 00:30Great, so I think,
- 00:33Frank basically gave this stuff that's relevant
- 00:36on this slide.
- 00:37I do also wanna apologize, those of you guys
- 00:39who I was supposed to meet with this morning, we have a...
- 00:41My husband broke his collarbone over the weekend.
- 00:44So I've had to cancel things this morning,
- 00:47but I'm glad I'm able to still do this seminar,
- 00:51I didn't wanna,
- 00:52have to cancel that.
- 00:54So again,
- 00:56the topic is gonna be sort of this idea of external
- 00:59validity, which I think is a topic that people often
- 01:01are interested in because it's the sort of thing
- 01:04that we often think sort of qualitatively about,
- 01:06but there hasn't been a lot of work thinking about it
- 01:08quantitatively.
- 01:09So again, my goal today will be to sort of help
- 01:11give a framework for thinking about external validity
- 01:15in sort of a more formal way.
- 01:19So let's start out with the sorts of questions
- 01:22that might be relevant when you're thinking about
- 01:25external validity.
- 01:27So it might be research questions like a health insurer
- 01:30is deciding whether or not to approve some new treatment
- 01:34for back pain.
- 01:36There might be interest in predicting overall population
- 01:39impacts of a broad public health media campaign.
- 01:43A physician practice might be deciding whether training
- 01:46providers in a new intervention would actually be cost
- 01:49effective given the patient population that they have.
- 01:53And then I felt like I needed to get some COVID
- 01:55example in...
- 01:57But, for example, a healthcare system,
- 01:59might wanna know whether it's sort of giving convalescent
- 02:02plasma to all of the individuals recently diagnosed
- 02:06with COVID-19 in their system, whether that would
- 02:08sort of lead to better outcomes overall.
- 02:12So all of these...
- 02:15What I'm distinguishing here or sort of trying to convey
- 02:17is that all of these reflect what I will call a population
- 02:20average treatment effect.
- 02:22So across some well-defined population,
- 02:25does some intervention work sort of on average.
- 02:28The population might be pretty narrow.
- 02:30Again, it might be the patients in one particular
- 02:33physician practice, or might be quite broad.
- 02:35It could be everyone in the State of Connecticut
- 02:38or in the entire country.
- 02:40But either way, it's a well-defined kind of population
- 02:44and we'll come back to that.
- 02:46What's really important,
- 02:48and this will sort of underlie much of the talk
- 02:50is that kind of the whole point is that there might
- 02:52be underlying treatment effect heterogeneity.
- 02:55So there might be some individuals
- 02:57for whom this treatment of interest is actually
- 02:59more effective than others.
- 03:01But what I wanna be clear about, is the goal of inference
- 03:04that I'm talking about today, is gonna be about
- 03:07this overall population average.
- 03:09So we're not trying to say like which people
- 03:11are gonna benefit more or sort of to which people
- 03:14should we give this treatment.
- 03:16It's really more a question of sort of more population
- 03:20level decisions, sort of if we have...
- 03:22If we're making a decision, that's sort of a policy
- 03:24kind of population level,
- 03:25on average is this gonna be something that makes sense.
- 03:28So I hope that distinction makes sense.
- 03:30I'm happy to come back to that.
- 03:35So again until I don't know, five or,
- 03:38well maybe now more than 10 years ago,
- 03:41there had been relatively little attention
- 03:43to the question of how well results from
- 03:46kind of well-designed studies like a randomized trial
- 03:50might carry over to a relevant target population.
- 03:53I think in much of statistics as well as fields
- 03:56like education research, public policy, even healthcare,
- 04:00there's really been a focus on randomized trials
- 04:03and getting internal validity,
- 04:05and I'll formalize this in a minute.
- 04:07But in the past 10 or so years, there's been more and more
- 04:10interest in this idea of how well can we take the results
- 04:13from a particular study and then project them
- 04:17to well-defined target population.
- 04:20And again, so today I'm gonna try to give
- 04:21sort of an overview of the thinking in this area,
- 04:24along with some of the limitations and in particular,
- 04:27the data limitations that we have in thinking about this.
- 04:33One thing I do wanna be clear about is there's a lot
- 04:36of reasons why results from randomized trials
- 04:38might not generalize.
- 04:40There's some classic examples in education
- 04:42where there are scale-up problems.
- 04:44The classic example is one I'm looking at,
- 04:50class size.
- 04:51And so, in Tennessee, they randomly assigned kids
- 04:54to be in smaller versus larger classes
- 04:57and found quite large effects of smaller classes.
- 05:00But then, when the State of California tried to implement
- 05:03this, the problem is that you need a lot more teachers
- 05:06to kind of roll that out statewide.
- 05:08And so, it led actually to a different pool of teachers
- 05:11being hired.
- 05:12And so, there's sort of scale-up problems
- 05:14sometimes with the interventions and that might lead
- 05:16to different contexts or different implementation.
- 05:19Today, what I'm gonna be focusing on are differences
- 05:21between a sample and a population.
- 05:25The differences are in sort of baseline characteristics
- 05:28that moderate treatment effects.
- 05:29And again, I'll formalize this a little bit as we go along.
- 05:33Just as a little bit of an aside,
- 05:34but in case some of you know this field a little bit,
- 05:37just to give you a little, just...
- 05:39I wanna flag this.
- 05:40Some people might use the term transportability.
- 05:43So some of the literature in this field uses the term
- 05:46transportability.
- 05:47I tend to use generalizability.
- 05:50There's some subtle differences between the two,
- 05:52which we can come back to, but for all intents and purposes,
- 05:55like they basically can think of them interchangeably
- 05:59for now.
- 06:00I also wanna note, if any of you kind of come
- 06:02from like a survey world, these debates about
- 06:06kind of how well a particular sample reflects a target
- 06:09population are exactly, not exactly the same,
- 06:12but very similar to the debates happening in the survey
- 06:15world around non-probability samples and sort of concerns
- 06:19about,
- 06:21the use of like say online surveys and things that might not
- 06:25have a true formal sort of survey sampling design,
- 06:28and sort of some of the concerns that arise about
- 06:31generalizability.
- 06:32So there's this whole parallel literature in the survey
- 06:34world.
- 06:35Andrew Mercer has a nice summary of that.
- 06:37Again, I'm happy to talk more about that.
- 06:41Okay, any questions before I keep going?
- 06:49Okay.
- 06:49So let me formalize kind of what we're talking about
- 06:52a little bit.
- 06:53This is...
- 06:55This framework is now 12 years old.
- 06:59Time goes quickly.
- 07:01But this is just to formalize what we're interested in.
- 07:05The goal is to estimate, again, this what I'll call
- 07:07a population average treatment effect or PATE.
- 07:10And so here,
- 07:12hopefully you're familiar with sort of potential outcomes
- 07:14and causal inference.
- 07:16But the idea is that we have some well-defined population
- 07:19of size N.
- 07:20And Y(1) are the potential outcomes if people
- 07:24in that population receive the treatment condition
- 07:28of interest.
- 07:29Y(0) are the outcomes if they receive the control
- 07:32or comparison condition of interest.
- 07:34So here, we're just saying we're interested
- 07:35in the average effect, basically sort of the difference
- 07:40in potential outcomes, average across the population.
- 07:46We could be doing this with risk ratios
- 07:49or odds ratios or something.
- 07:51Those are a little more complicated because the math
- 07:53doesn't work as nicely.
- 07:55So for now think about it more like risk differences
- 07:57or something, if you have a binary outcome,
- 08:00the same fundamental points hold.
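Written out, that estimand is just the average of the unit-level effect over all N people in the target population:

```latex
\mathrm{PATE} = \frac{1}{N}\sum_{i=1}^{N}\bigl(Y_i(1) - Y_i(0)\bigr)
```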
- 08:03So I'm not gonna tell you right now where
- 08:05the data we have came from, but imagine that we just
- 08:08have a simple estimate of this PATE,
- 08:11as the difference in means of some outcome
- 08:14between an observed treated group and an observed
- 08:16control group.
- 08:17So again, we see that there's a bunch of people
- 08:20who got treated, a bunch of people who got control,
- 08:22and we might estimate this PATE as just the simple
- 08:25difference in means between again, the treatment group
- 08:28and the control group.
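In code, that naive estimate is just a difference in observed group means (toy numbers, purely illustrative):

```python
import numpy as np

# Toy data (hypothetical): y outcomes, z treatment indicator
y = np.array([5.1, 6.3, 4.8, 7.0, 5.5, 4.9])
z = np.array([1, 1, 1, 0, 0, 0], dtype=bool)

# Naive estimate of the PATE: treated mean minus control mean
delta_hat = y[z].mean() - y[~z].mean()
```

The bias discussed next is exactly the gap between this quantity and the true population effect.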
- 08:29So what I wanna talk through for the next couple of minutes,
- 08:32is the bias in this sort of naive estimate of the PATE.
- 08:36So we'll call that Delta.
- 08:38So I'm being a little loose with notation here,
- 08:40but think of the bias essentially
- 08:43as sort of the difference between
- 08:45the true population effect and our naive estimate of it.
- 08:49And what this paper did with Gary King and Kosuke Imai,
- 08:54we sort of laid out how different choices of study designs
- 08:58impact the size of this bias.
- 09:01And in particular, we showed that sort of under
- 09:03some simplifying situations,
- 09:05sort of mathematical simplicity,
- 09:07you can decompose that overall bias into four pieces.
- 09:11So the two Delta S terms are
- 09:15what we call sample selection bias.
- 09:17So basically, the bias that comes in if our data sample
- 09:22is not representative of the target population
- 09:25that we care about.
- 09:27The Delta T terms are our typical sort of confounding bias.
- 09:31So bias that comes in if our treatment group is dissimilar
- 09:36from our control group.
- 09:38The X refers to the variables we observe,
- 09:40and the U refers to variables that we don't observe.
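In symbols, the four-piece decomposition being described here can be written as follows (with S for sample selection, T for treatment selection, and subscripts X and U for observed and unobserved variables):

```latex
\Delta = \underbrace{\Delta_{S_X} + \Delta_{S_U}}_{\text{sample selection bias}}
       + \underbrace{\Delta_{T_X} + \Delta_{T_U}}_{\text{treatment selection bias}}
```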
- 09:45So what we then did in the paper,
- 09:46and this is sort of what motivates a lot of this work
- 09:49is to think through these, again, the trade offs
- 09:51in these different designs.
- 09:53And essentially what we're trying to sort of point out
- 09:56is that...
- 09:59Let's go to the second row of this table first actually,
- 10:01a typical experiment.
- 10:02So a typical experiment, I would say is one where
- 10:06we kind of take whoever comes in the door,
- 10:08we kind of try to recruit people for a randomized trial,
- 10:11whether that's schools or patients or whatever it is.
- 10:16And we randomized them to treatment and control groups.
- 10:19So that is our typical randomized experiment.
- 10:22The treatment selection bias in that case is zero.
- 10:26In expectation, that's why we like randomized experiments.
- 10:29In expectation, there is no confounding
- 10:32and we get an unbiased treatment effect estimate
- 10:34for the sample at hand.
- 10:37The problem for population inference
- 10:40is that the Delta S terms might be big,
- 10:43because the people that agree to be in a randomized trial,
- 10:46might be quite different from the overall population
- 10:49that we care about.
- 10:51So in this paper, we're trying to just sort of...
- 10:53In some ways, be a little provocative and point this out
- 10:56that our standard thinking about study designs
- 10:59and sort of our prioritization of randomized trials,
- 11:03implicitly prioritizes internal validity over external
- 11:07validity.
- 11:08And in particular, if we really care about
- 11:12population effects, we really should be thinking about
- 11:15these together and trying to sort of have small
- 11:18sample selection bias and small treatment selection bias.
- 11:22So an ideal experiment would be one where we can randomly
- 11:25select people for our trial.
- 11:28Let's say we have...
- 11:30Well, actually, I'll come back to that in a second.
- 11:31Randomly select people for our trial and then randomly
- 11:34assign people to treatment or control groups.
- 11:37And in expectation, we will have zero bias in our population
- 11:41effect estimate.
- 11:42But these other designs, and again,
- 11:44like a typical experiment might end up having larger bias
- 11:47overall, than a well designed non-experimental study,
- 11:51where if we do a really good job like adjusting
- 11:54for confounders,
- 11:55it may be that well done non-experimental study
- 11:59conducted using say the electronic health records
- 12:02from a healthcare system might actually give us lower bias
- 12:06for a population effect estimate
- 12:08than does a non-representative small randomized trial.
- 12:12Again, a little provocative,
- 12:13but I think useful to be thinking about what is really our
- 12:17target of inference and how do we get data that is most
- 12:19relevant for that.
- 12:22I will also just as a small aside,
- 12:24maybe a little on the personal side,
- 12:26but it's been striking to me in the past two days.
- 12:28So my husband broke his collarbone over the weekend.
- 12:31And it turns out the break is one where there's a little bit
- 12:35of debate about whether you should have surgery or not.
- 12:38Although kind of recent thinking is that
- 12:39there should be surgery.
- 12:40And I was doing a PubMed search as a good statistician
- 12:44public health person whose family member
- 12:47needs medical treatment.
- 12:49And I found all these randomized trials that actually
- 12:52randomized people to get surgery or not.
- 12:55And then I came home...
- 12:56Oh, no, I didn't come home, we were home all the time.
- 12:59I asked my husband later, I was like,
- 13:00would you ever agree to be randomized?
- 13:02Like right now, we are trying to make this decision about,
- 13:05should you have surgery or not.
- 13:07And would we ever agree to be randomized?
- 13:09And he's like, no, we wouldn't.
- 13:11We're gonna go with what the physician recommends
- 13:15and what we feel is comfortable.
- 13:16And it really just hit home for me at this point that
- 13:19the people who agree to be randomized or the context
- 13:22under which we can sort of randomize
- 13:26are sometimes fairly limited.
- 13:28And again, so partly what this body of research is trying
- 13:31to do is sort of think through what are the implications
- 13:33of that when we do wanna make population inferences.
- 13:38Make sense so far?
- 13:39I can't see faces, so hopefully.
- 13:43Okay.
- 13:47So,
- 13:48I will say a lot of my work in this area has actually,
- 13:50in part been just helping or trying to raise awareness
- 13:53of thinking about external validity bias.
- 13:56So some of the research in this area has been trying
- 14:00to understand how big of a problem is this.
- 14:03If maybe people don't agree to be in randomized trials
- 14:06very often,
- 14:07but maybe that doesn't really cause bias in terms
- 14:10of our population effect estimates.
- 14:12So what I've done in a couple of papers,
- 14:15cited on this slide, is basically trying to formalize
- 14:18this and it's pretty intuitive, but basically we show,
- 14:22and I'm not showing you the formulas here.
- 14:24But intuitively, there will be bias in a population effect
- 14:28estimate essentially if participation in the trial
- 14:33is associated with the size of the impacts.
- 14:35So in particular,
- 14:38what I'll call the external validity bias.
- 14:39So,
- 14:40those Delta S terms kind of the bias
- 14:42due to the lack of representativeness
- 14:45is a function of the variation of the probabilities
- 14:48of participating in a trial,
- 14:50variation and treatment effects,
- 14:52and then the correlation between those things.
- 14:54So if constant...
- 14:56If we have constant treatment effects
- 14:58or the treatment effect is zero
- 14:59or is two for everyone, there's gonna be no external
- 15:02validity bias.
- 15:03It doesn't matter who is in our study.
- 15:06Or if there...
- 15:08If everyone has an equal probability of participating
- 15:10in the study, we really do have a nice random selection,
- 15:14then again, there's gonna be no external validity bias.
- 15:17Or if the factors that influence whether or not you
- 15:20participate in the study are independent of the factors
- 15:23that moderate treatment effects,
- 15:25again, there'll be no external validity bias.
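Those no-bias conditions can be seen in a tiny hypothetical simulation (not from the talk): the participant-average effect drifts from the PATE only when participation probabilities are correlated with the individual effects.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000  # hypothetical target population

# A moderator x drives treatment effect heterogeneity
x = rng.normal(size=N)
tau = 2.0 + 1.0 * x           # individual treatment effects
pate = tau.mean()             # true population average effect (~2.0)

# Case 1: equal participation probability for everyone
s1 = rng.random(N) < 0.05
ate_equal = tau[s1].mean()    # ~ PATE: no external validity bias

# Case 2: participation probability rises with x, so it is
# correlated with the size of the treatment effect
p = 1.0 / (1.0 + np.exp(-(-3.0 + 1.5 * x)))
s2 = rng.random(N) < p
ate_corr = tau[s2].mean()     # drifts well above the PATE
```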
- 15:29The problem is that we often have very limited information
- 15:32about these pieces.
- 15:34We, as a field, I think medicine, public health, education,
- 15:38all the fields I worked in, there has not been much
- 15:41attention paid to these processes of how we actually
- 15:44enroll people in studies.
- 15:46And so it's hard to know kind of what factors relate
- 15:49to those and if those then also moderate treatment effects.
- 15:53(phone ringing)
- 15:54Oops, sorry.
- 15:55Incoming phone call, which I will ignore.
- 15:58So,
- 15:59there has been...
- 16:01Sorry.
- 16:03There has been a little bit of work trying to document this
- 16:05in real data and find empirical evidence on these sizes.
- 16:11The problem, and sorry, some of the...
- 16:13Some of you might...
- 16:14If any of you are familiar with, like,
- 16:16what's called the within-study comparison
- 16:18literature.
- 16:19So there's this whole literature on non-experimental studies
- 16:23that sort of try to estimate the bias due to non-random
- 16:28treatment assignment.
- 16:30This is sort of analogous to that.
- 16:32But the problem here is that what you need is you need
- 16:34an accurate estimate of the impact in the population.
- 16:37And then you also need sort of estimates of the impact
- 16:40in samples that are sort of obtained in kind of typical
- 16:44ways.
- 16:45So that's actually really hard to do.
- 16:47So I'll just briefly talk through two examples.
- 16:49And if any of you have data examples that you think might
- 16:52sort of be useful for generating evidence,
- 16:55that would be incredibly useful.
- 16:57So one of the examples is...
- 17:00So let me back up for a second.
- 17:02In the field of mental health research,
- 17:03there's been a push recently, or actually not so much
- 17:06recently in the past, like 10, 15 years
- 17:08to do what I call or what are called pragmatic trials
- 17:12with the idea of enrolling much more...
- 17:16A much broader set of people, using a broader set of practices
- 17:21or locations around the country.
- 17:23And so what Wisniewski et al. did was they took
- 17:27the data from one of those large pragmatic trials.
- 17:29And the idea they...
- 17:30Again, the idea was that it should be more representative
- 17:33of people in this case with depression
- 17:35across the U.S.
- 17:37And then, they said, well, what if...
- 17:38In fact, we didn't have that.
- 17:40What if we use sort of our normal study inclusion
- 17:44and exclusion criteria, that is, we subset
- 17:47this pragmatic trial data to the people that we think
- 17:50would have been more typically included in a sort of more
- 17:53standard randomized trial.
- 17:55And sort of not surprisingly, they found that
- 17:58the people in the sort of what they call
- 17:59the efficacy sample, those sort of typical trial sample
- 18:03had better outcomes and larger treatment effects
- 18:05than the overall pragmatic trial sample as a whole.
- 18:10We did something similar sort of in education research where
- 18:15it's a little bit in the weeds.
- 18:16I don't really wanna get into the details,
- 18:18but we essentially had a pretty reasonable regression
- 18:22discontinuity design.
- 18:23So we were able to get estimates of the effects of this
- 18:26Reading First intervention across a number of states.
- 18:30And we then compared those state wide impact estimates
- 18:34to the estimates you would get if we enrolled only
- 18:38the sorts of schools and school districts that are typically
- 18:41included in educational evaluations.
- 18:44And there we found that this external validity bias
- 18:48was about 0.1 standard deviations,
- 18:50which in education world is fairly large.
- 18:53Certainly people would be concerned about an internal
- 18:56validity bias of that size.
- 18:58So we were able to sort of use this to say, look,
- 19:00if we really wanna be serious about external validity,
- 19:03it might be as much of a problem as sort of typical internal
- 19:06validity bias that people care about in that field.
- 19:13So again, the problem though, is we don't usually
- 19:15have these sorts of designs where we have a population
- 19:17effect estimate, and then sample estimates,
- 19:19and we can compare them.
- 19:21And so instead we can sometimes try to get evidence on sort
- 19:24of the pieces.
- 19:25So, but again, we basically often have very little
- 19:28information on why people end up participating in trials.
- 19:31And we also are having,
- 19:34I think there's growing numbers of methods,
- 19:36but there's still limited information on treatment effect
- 19:39heterogeneity.
- 19:40Individual randomized trials are almost never powered
- 19:43to detect subgroup effects.
- 19:45Although, there is really growing research in this field
- 19:48and that is maybe a topic for another day.
- 19:52Okay.
- 19:53But again, there is a little...
- 19:55I think I'll go through this really quickly, but,
- 19:58I will give credit to some fields which are trying to better
- 20:01understand kind of who are the people that enroll in trials
- 20:04and how do they compare to policy populations of interest.
- 20:08So a lot of that has been done in sort of the substance
- 20:11use field.
- 20:12And you can see a bunch of citations here
- 20:14documenting that people who participate in randomized trials
- 20:18of substance use treatment do actually differ quite
- 20:22substantially from people seeking treatment for substance
- 20:25use problems more generally.
- 20:27So for example, per the Okuda reference, the eligibility criteria
- 20:32in cannabis treatment RCTs would exclude about 80%
- 20:36of patients across the U.S. seeking treatment
- 20:38for cannabis use.
- 20:40And so again, there are sort of indications
- 20:43that the people that participate in trials
- 20:45are not necessarily reflective of the people
- 20:48for whom decisions are having to be made.
- 20:54Okay, so hopefully that at least kind of gives some
- 20:57motivation for why we want to think more carefully
- 21:01about the population average treatment effect
- 21:04and why we might wanna think about designing studies
- 21:06or analyzing data in ways that help us estimate that.
- 21:10Any questions before I move to, how do we do that?
- 21:19Okay.
- 21:20I will end...
- 21:21I'm gonna hopefully end it at about 12:45 or 12:50,
- 21:24so we'll have time at the end, too.
- 21:27So, as a statistician, I feel obligated to say,
- 21:31and actually I have a quote on this at the very end
- 21:32of the talk.
- 21:33If we wanna be serious about estimating something,
- 21:36it's better to incorporate that through the design
- 21:38of our study, rather than trying to do it post hoc
- 21:41at the end.
- 21:44So let's talk briefly about how we can improve external
- 21:47validity through study or randomized trial design.
- 21:52So again,
- 21:53as I alluded to earlier with the sort of ideal experiment,
- 21:56an ideal scenario is one where we can randomly sample
- 21:59from a population and then randomly assign treatment
- 22:02and control conditions.
- 22:04Doing this will give us a formally unbiased treatment effect
- 22:07estimate in the population of interest.
- 22:10This is wonderful.
- 22:11I know of about six examples of this type.
- 22:17Most of the examples I know of are actually federal
- 22:19government programs where they are administered through
- 22:23like centers or sites.
- 22:25And the federal government was able to mandate participation
- 22:28in an evaluation.
- 22:29So the classic example is the Head Start Impact Study,
- 22:33where they were able to randomly select Head Start centers
- 22:36to participate.
- 22:37And then within each center,
- 22:39they randomized kids to be able to get in off the wait list
- 22:42versus not.
- 22:44An Upward Bound evaluation had a very similar design.
- 22:48It's funny, I was...
- 22:50I gave a talk on this topic at Facebook and I was like,
- 22:52why is Facebook gonna care about this?
- 22:54Because you would think at a place like Facebook,
- 22:56they have their user sample,
- 22:59they should be able to do randomization within,
- 23:02like they should be able to pick users randomly
- 23:04and then do any sort of random assignment they want
- 23:06within that.
- 23:07It turns out it's more complicated than that, and so,
- 23:10they were interested in this topic,
- 23:12but I think that's another sort of example where people
- 23:15should be thinking, could we do this?
- 23:16Like,
- 23:18in a health system.
- 23:20I can imagine Geisinger or something implementing something
- 23:22in their electronic health record where
- 23:24it's about messaging or something.
- 23:26And you could imagine actually picking people randomly
- 23:29to then randomize.
- 23:31But again, that's pretty rare.
- 23:33There's an idea that's called purposive sampling.
- 23:35And this goes back to like the 1960s or 70s
- 23:39and the idea is sort of picking subjects purposefully.
- 23:44So one example here is like maybe we think
- 23:47that this intervention might look different
- 23:49or have different effects for large versus small
- 23:52school districts.
- 23:53So in our study, we just make an effort to enroll
- 23:56both large and small districts.
- 23:59This is sort of nice.
- 24:00It kind of gives you some variability in the types of people
- 24:05or subjects in the trial, but, it doesn't have the formal
- 24:09representativeness and sort of the formal unbiasedness
- 24:12of the random sampling I just talked about.
- 24:15And then again, sort of similar is this idea and this push
- 24:17in many fields towards pragmatic or practical clinical
- 24:20trials, where the idea is just to sort of try to enroll
- 24:24like a kind of more representative sample
- 24:27in sort of a hand wavy way like I'm doing now.
- 24:29So no, it doesn't have this sort of formal statistical
- 24:31underpinning, but at least it's trying to make sure
- 24:35that it's not just patients from the Yale hospital
- 24:38and the Hopkins hospital and whatever sort of large medical
- 24:41centers, at least they might be trying to enroll patients
- 24:45from a broader spectrum across the U.S.
- 24:49Unfortunately, though, as much as I want to do things
- 24:53by design, often we're in a case where there's a study
- 24:56that's already been conducted and we are just
- 25:00sort of stuck analyzing it.
- 25:01And we wanna get a sense for how representative
- 25:04the results might be for a population.
- 25:09Sometimes people, when I talk about this,
- 25:10people are like, well, isn't this what meta-analysis does?
- 25:13Like meta-analysis enables you to combine multiple
- 25:16randomized trials and come up with sort of an overall
- 25:20effect estimate.
- 25:23And my answer to that is sort of yes maybe, or no maybe.
- 25:26Basically, the challenge with meta-analysis,
- 25:30is that until recently, no one really had a potential target
- 25:34population.
- 25:35It was not very formal about what the target population is.
- 25:38I think underlying that analysis is generally
- 25:41sort of a belief that the effects are constant
- 25:44and we're just trying to pool data.
- 25:48And it...
- 25:48And even just like, you can sort of see this,
- 25:50like if all of the trials sampled the same
- 25:52non-representative population,
- 25:54combining them is not going to help you get towards
- 25:57representativeness.
- 25:59On that, I'll note I have a former postdoc, Hwanhee Hong,
- 26:01who's now at Duke.
- 26:03And she has been doing some work to try to bridge
- 26:06these worlds and sort of really try to think through,
- 26:08well, how can we better use multiple trials
- 26:12to get to target population effects?
- 26:16There's another approach, it's called cross-design
- 26:18synthesis or research synthesis.
- 26:21This is sort of neat.
- 26:22It's one where you kind of combine randomized trial data,
- 26:26which might not be representative, with non-experimental
- 26:30study data.
- 26:31So sort of explicitly trading off the internal and external
- 26:34validity.
- 26:36I'm not gonna get into the details,
- 26:37there's some references here.
- 26:38Ellie Kaizar at Ohio State is one of the people
- 26:41that's done a lot of work on this.
- 26:45And part of the reason I'm not focused on this is that
- 26:48I work in a lot of areas like education and public health,
- 26:53sort of social science areas,
- 26:54where we often don't have multiple studies.
- 26:56So we often are stuck with just one study and we're trying
- 27:00to use that to learn about target populations.
- 27:04So I'm gonna briefly talk about an example
- 27:07where we're trying to sort of do this.
- 27:12And basically, the fundamental idea is to re-weight
- 27:16the study sample to look like the target population.
- 27:21This idea is related to post stratification
- 27:25or, oh my gosh, I'm blanking now.
- 27:27Raking adjustments in surveys.
- 27:31So post stratification would be sort of at a simple level,
- 27:33would be something like...
- 27:35Well, if we know that males and females
- 27:38have different effects, or let's say young and old
- 27:41have different effects, let's estimate the effects
- 27:44separately for young versus old.
- 27:47And then re-weight those using the population proportions
- 27:51of sort of young versus old.
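With made-up numbers (hypothetical, just to make the arithmetic concrete), that young-versus-old re-weighting looks like:

```python
# Hypothetical stratum-specific effect estimates from a trial
effect_young, effect_old = 2.0, 0.5

# Suppose the trial is 80% young but the target population is 40% young
trial_share_young = 0.80
pop_share_young = 0.40

# Trial-composition estimate vs post-stratified population estimate
naive = (trial_share_young * effect_young
         + (1 - trial_share_young) * effect_old)          # ~1.7
post_stratified = (pop_share_young * effect_young
                   + (1 - pop_share_young) * effect_old)  # ~1.1
```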
- 27:54That sort of stratification doesn't work if you have more
- 27:58than like one or two categorical effect moderators.
- 28:02And so,
- 28:03what I'm gonna show today is an approach where we use
- 28:06weighting, where we fit a model,
- 28:08predicting participation in the trial,
- 28:10and then weight the trial sample to look like the target
- 28:13population.
- 28:14So similar idea to things like propensity score weights
- 28:17or non-response adjustment weights in samples.
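A small simulated sketch of that weighting strategy (all numbers are hypothetical, and the logistic participation model is hand-rolled with Newton-Raphson so the snippet stays self-contained):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical target population with one moderator ("age", standardized)
N = 50_000
age = rng.normal(size=N)

# Trial participation is more likely for younger people
p_true = 1 / (1 + np.exp(-(-2.5 - 1.0 * age)))
in_trial = rng.random(N) < p_true
age_t = age[in_trial]

# Within the trial: randomize treatment; the effect is moderated by age
z = rng.random(age_t.size) < 0.5
tau = 1.0 + 0.8 * age_t
y = 0.5 * age_t + np.where(z, tau, 0.0) + rng.normal(scale=0.5, size=age_t.size)

# Fit a logistic model predicting trial participation (Newton-Raphson)
X = np.column_stack([np.ones(N), age])
s = in_trial.astype(float)
beta = np.zeros(2)
for _ in range(25):
    prob = 1 / (1 + np.exp(-X @ beta))
    grad = X.T @ (s - prob)
    hess = -(X * (prob * (1 - prob))[:, None]).T @ X
    beta -= np.linalg.solve(hess, grad)

# Weight trial members by the inverse of their participation probability
p_hat = 1 / (1 + np.exp(-(beta[0] + beta[1] * age_t)))
w = 1 / p_hat

sate_hat = y[z].mean() - y[~z].mean()             # in-sample effect
pate_hat = (np.average(y[z], weights=w[z])
            - np.average(y[~z], weights=w[~z]))   # reweighted to population
true_pate = (1.0 + 0.8 * age).mean()              # ~1.0
```

The unweighted estimate reflects the younger trial sample, while the weighted one moves back toward the population average effect.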
- 28:21There is a different approach, too.
- 28:23What I'm gonna illustrate today is sort of this sample
- 28:27selection weighting strategy.
- 28:29You also can tackle this external validity
- 28:32by trying to model the outcome very flexibly
- 28:35and then project outcomes in the population.
- 28:40In some work I did with Jennifer Hill and others,
- 28:43we showed that BART, Bayesian Additive Regression Trees,
- 28:46can actually work quite well for that purpose.
- 28:49And more recently, Issa Dahabreh at Brown has done some
- 28:53nice work sort of bridging these two and showing
- 28:55basically a doubly robust kind of idea where we can use
- 28:58both the sample membership model and the outcome model
- 29:04to have better performance.
- 29:06But today, I'm gonna just illustrate the weighting approach,
- 29:08partly because it's a really nice sort of pedagogical
- 29:11example and helps you kind of see what's going on
- 29:14in the data.
- 29:16Okay, any questions before I continue?
- 29:21Okay.
- 29:22So the example I'm gonna use is...
- 29:26There was this, I mean, some of you probably know much more
- 29:28about HIV treatment than I do, but the ACTG Trial,
- 29:33which is now quite an old trial,
- 29:36but it was one of the ones that basically showed that
- 29:39HAART therapy, highly active antiretroviral therapy
- 29:42was quite effective at reducing time to AIDS or death
- 29:46compared to standard combination therapy at the time.
- 29:49So it randomized about 1200 U.S. HIV positive adults
- 29:54to treatment versus control.
- 29:56And the intent-to-treat analysis in the trial
- 29:59had a hazard ratio of 0.51.
- 30:01So again, very effective at reducing time to AIDS or death.
- 30:07So Steve Cole and I then kind of asked the question, well,
- 30:10we don't necessarily just care about the people
- 30:13in the trial.
- 30:14This seems to be a very effective treatment.
- 30:16Could we use this data to project out
- 30:19sort of what the effects of the treatment would be
- 30:22if it were implemented nationwide?
- 30:25So we got estimates from the CDC of the number of people
- 30:28newly infected with HIV in 2006.
- 30:32And basically, asked the question sort of if hypothetically,
- 30:35everyone in that group were able to get HAART versus
- 30:40standard combination therapy,
- 30:42what would be the population impacts of this treatment?
- 30:48In this case, because of sort of data availability,
- 30:50we only had the joint distribution of age, sex and race
- 30:55for the population.
- 30:56So we made sort of a pseudo population, again,
- 30:59sort of representing the U.S. population
- 31:02of newly infected people.
- 31:03But again, all we have is sex, race and age,
- 31:06which I will come back to.
- 31:08So this table documents the trial and the population.
- 31:12So you can see for example,
- 31:15that the trial tended to have more sort of 30 to 39 year
- 31:20olds, many fewer people under 30.
- 31:25The trial had more males and also had more whites
- 31:29and fewer blacks, Hispanic was similar.
- 31:32But I wanna flag and we'll come back to this in a minute
- 31:35that, in what I'm gonna show,
- 31:38we can adjust for the age, sex, race distribution.
- 31:41But, there's a real limitation,
- 31:43which is that the CD4 cell count as sort of a measure
- 31:46of disease severity is not available in the population.
- 31:50So this is a potential effect moderator,
- 31:53which we don't observe in the population.
- 31:56So in sort of projecting the impacts, we can say, well,
- 31:59here is the predicted impact given the age, sex,
- 32:03race distribution, but there's this unobserved
- 32:06potential effect moderator that we sort of might be worried
- 32:09about kind of in the back of our heads.
- 32:15So again, I briefly mentioned this,
- 32:17this is like the super basic description
- 32:20of what can be done.
- 32:22There are more nuances and I have some citations at the end
- 32:24for sort of more details.
- 32:26But basically, fundamentally, again,
- 32:28we sort of think about it as we kind of stack
- 32:30our data sets together.
- 32:31So we put our trial sample and our population data set
- 32:34together.
- 32:35We have an indicator for whether someone is in the trial
- 32:38versus the population.
- 32:40And then, we're gonna weight the trial members
- 32:43by their inverse probability of being in the trial
- 32:46as a function of the observed covariates.
- 32:48And again, very similar intuition and ideas
- 32:51and theory underlying this as underlying things
- 32:55like Horvitz-Thompson estimation in sample surveys
- 32:58and inverse probability of treatment weighting
- 33:01in non-experimental studies.
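A minimal sketch of that stacking-and-weighting recipe, using a single binary covariate and cell frequencies in place of a fitted participation model. All counts are made up; with many covariates you would fit a logistic regression for trial membership instead. Note that when the population data doesn't contain the trial, the weight that maps the trial onto the population is the membership odds, which plays the same role as the inverse probability of being in the trial:

```python
# Sketch of participation weighting: the trial over-represents x = 1
# relative to the target population, and re-weighting trial members by
# pop_counts[x] / trial_counts[x] (the membership odds, standing in for
# inverse participation probabilities) recovers the population mix.
# All counts below are hypothetical.

trial_counts = {1: 70, 0: 30}   # covariate x in the trial
pop_counts = {1: 40, 0: 60}     # covariate x in the target population

def weight(x):
    # With only cell counts, P(in population | x) / P(in trial | x)
    # reduces to a simple ratio of counts
    return pop_counts[x] / trial_counts[x]

# Weighted covariate mix of the trial sample
total = sum(trial_counts[x] * weight(x) for x in trial_counts)
weighted_share_x1 = trial_counts[1] * weight(1) / total

print(round(weighted_share_x1, 2))
```

After weighting, the trial's share of x = 1 matches the population's 40 percent, even though 70 percent of the trial had x = 1.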
- 33:06So I showed you earlier that age, sex and race
- 33:09are all related to participation in the trial.
- 33:13What I'm not showing you the details of,
- 33:15but just trust me is that those factors also moderate
- 33:19effects in the trial.
- 33:20So the trial showed the largest effects for those ages,
- 33:2430 to 39, males and black individuals.
- 33:28And so, this is exactly why we might think
- 33:31that the overall trial estimate might not reflect
- 33:34what we would see population-wide.
- 33:39Ironically though, it turns out actually
- 33:40it kind of all cancels out.
- 33:41So this table shows the estimated population effects.
- 33:45So the first row again, is just the sort of naive trial
- 33:48results.
- 33:50We can then sort of weight by each characteristic
- 33:52separately, and then the bottom row is the combined
- 33:56age, sex, race adjustments.
- 33:58And you can see sort of actually the hazard ratio
- 34:01was remarkably similar.
- 34:03It's partly because like the age weighting
- 34:05sort of makes the impact smaller,
- 34:07but then the race weighting makes it bigger.
- 34:10And so then it kind of just washes out.
- 34:13But again, it's sort of a nice example,
- 34:15cause you can sort of see how the patterns
- 34:17evolve based on the size of the effects
- 34:20and the sample selection.
- 34:23I also wanna point out though that, of course,
- 34:25the confidence interval is wider,
- 34:27and that is sort of reflecting the fact that we are doing
- 34:30this extrapolation from the trial sample to the population.
- 34:33And so there's sort of a variance price we'll pay for that.
- 34:39Okay.
- 34:40So I haven't been super formal on the assumptions,
- 34:44but I alluded to them.
- 34:45So I wanna just take a few minutes to turn
- 34:48to what about unobserved moderators?
- 34:50Because again, we can interpret this 0.57
- 34:54as the sort of overall population effect estimate
- 34:58only under an assumption that there are no unobserved
- 35:01moderators that differ between sample and population,
- 35:06once we adjust for age, sex, race.
- 35:11Okay, and in reality,
- 35:14such unobserved effect moderators are likely the rule,
- 35:17not the exception.
- 35:18So again, sort of, as I just said,
- 35:20the key assumption is that we've basically adjusted
- 35:23for all of the effect moderators.
- 35:26Very kind of comparable assumption to the assumption
- 35:30of no unobserved confounding in a non-experimental study.
- 35:35And one of the reasons this is an important assumption
- 35:38to think about, is that, it is quite rare actually
- 35:42to have extensive covariate data overlap
- 35:46between the sample and the population.
- 35:48I have been working in this area for...
- 35:51How many years now?
- 35:52At least 10 years.
- 35:53And I've found time and time again,
- 35:56across a number of content areas,
- 35:58that it is quite rare to have a randomized trial sample
- 36:01and the target population dataset
- 36:03with very many comparable measures.
- 36:06So in the Stuart and Rhodes paper,
- 36:08this was in like an early childhood setting
- 36:12and each data set, the trial and the population data
- 36:15had like over 400 variables observed at baseline.
- 36:19There were literally only seven that were measured
- 36:22consistently between the two samples.
- 36:25So essentially we have very limited ability then to adjust
- 36:28for these factors because they just don't have much overlap.
- 36:32So that then motivated us to create some sensitivity
- 36:37analyses to basically probe and say, well,
- 36:40what if there is an unobserved effect moderator,
- 36:43how much would that change our population effect estimate?
- 36:47Again, this is very comparable to sensitivity analyses
- 36:51for unobserved confounding in non-experimental studies,
- 36:54sort of adapted for this purpose of trial-to-population
- 36:59generalizability.
- 37:03I think I can skip this in the interest of time and not go
- 37:06through all the details.
- 37:07If anyone wants the slides by the way,
- 37:08feel free to email me, I'm happy to send them.
- 37:13I'm gonna skip this too cause I've already said
- 37:15sort of the key assumption that is relevant for right now,
- 37:19but basically what we propose is,
- 37:24I'm gonna talk about two cases.
- 37:26So the easier case is this one where we're gonna assume
- 37:29that the randomized trial observes all of the effect
- 37:32moderators.
- 37:33And the issue is that our target population dataset
- 37:36does not have some moderators observed.
- 37:41I think this is fairly realistic because I at least
- 37:43like to think that the people running the randomized trials
- 37:47have enough scientific knowledge and expertise
- 37:50that they sort of know what the likely effect moderators
- 37:52are and that they measure them in the trial.
- 37:55That is probably not fully realistic, but I'm...
- 37:58I like to give them sort of the benefit of the doubt
- 38:00on that.
- 38:01And that's sort of what the ACTG example was like:
- 38:05CD4 count would be an example of this,
- 38:07where we have CD4 count in the trial,
- 38:11but we just don't have it in the population.
- 38:14So what we showed is that there's actually,
- 38:16a couple of different ways you can implement
- 38:18this sort of sensitivity analysis.
- 38:22One is essentially kind of an outcome-model-based one,
- 38:25where,
- 38:28basically, we just sort of specify a range
- 38:30for the unobserved moderator V in the population.
- 38:34So we kind of say, well, we don't know
- 38:36the distribution of this moderator in the population,
- 38:40but we're gonna guess that it's in some range.
- 38:43And then, we kind of project it out using data from the trial
- 38:48to understand like the extent of the moderation
- 38:51due to that variable.
- 38:53There's another variation on this,
- 38:55which is sort of the weighting variation
- 38:58where you kind of adjust the weights,
- 39:00essentially again for this unobserved moderator.
- 39:03Again, either way you sort of basically just have to specify
- 39:07a potential range for this V, the unobserved moderator
- 39:11in the population.
- 39:14So here's an example of that.
- 39:16This is a different example, where we were looking
- 39:18at the effects of a smoking cessation intervention
- 39:21among people in substance use treatment.
- 39:24And in the randomized trial, the mean addiction score
- 39:31was four.
- 39:33But we didn't have this addiction score,
- 39:35in the target population of interest.
- 39:37And so, what the sensitivity analysis allows us to do
- 39:40is to say, well, let's imagine that range is anywhere
- 39:44from three to five.
- 39:45And how much does that change our population effect
- 39:49estimates?
- 39:51Essentially, how steep this line is, is gonna
- 39:54sort of determine how much it matters.
- 39:57And the steepness of the line basically
- 39:59is how much of a moderator is it,
- 40:02sort of how much effect heterogeneity is there in the trial
- 40:05as a result of that variable.
- 40:07But again, this is at least one way to sort of turn
- 40:11this sort of worry about an unobserved moderator
- 40:13into a more formal statement about how much
- 40:16it really might matter.
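That sensitivity analysis can be sketched as a simple projection. This is a hypothetical linear version: the effect value and moderation slope below are made up, with only the trial mean of 4 and the 3-to-5 range taken from the example:

```python
# Sensitivity-analysis sketch for an unobserved moderator V: the trial
# gives an effect at its own mean of V and an estimate of how the effect
# changes with V (the slope, i.e. the steepness of the line); we then
# sweep the unknown population mean of V over a plausible range.
# The effect and slope values are hypothetical.

trial_effect = 0.10       # hypothetical effect estimate at the trial's mean of V
trial_mean_v = 4.0        # mean addiction score in the trial (from the example)
moderation_slope = 0.03   # hypothetical effect change per unit of V

def population_effect(pop_mean_v):
    # Linear projection to a population whose mean of V differs from the trial's
    return trial_effect + moderation_slope * (pop_mean_v - trial_mean_v)

# Sweep the assumed population mean over the plausible 3-to-5 range
for v in (3.0, 4.0, 5.0):
    print(v, round(population_effect(v), 3))
```

The flatter the line (the smaller the slope), the less the unknown mean of V matters; a slope of zero means V isn't a moderator at all and the projected effect never moves.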
- 40:21I'm not gonna get into this in detail.
- 40:22so you might also be thinking, well,
- 40:24what if the trial doesn't know what all the moderators are?
- 40:27And what if there's some fully unobserved moderator
- 40:31that we'll call U?
- 40:34This is a much, much harder problem, basically,
- 40:36if anyone wants to try to dig into it, that would be great.
- 40:39Part of the reason it's harder is because you have to make
- 40:42very strong assumptions about the distribution
- 40:44of the observed covariates and U together.
- 40:48We put out one approach,
- 40:49but it is a fairly special case and not very general.
- 40:53So again, hopefully we're not in this sort of scenario
- 40:56very often.
- 41:01This is a little bit of a technicality,
- 41:03but often epidemiologists ask this question.
- 41:05So I've laid stuff out again with respect to kind of a risk
- 41:09difference or a difference in outcomes
- 41:12and sort of like more of like an additive treatment scale.
- 41:15There is this real complication that arises,
- 41:17which is that if you have like a binary outcome,
- 41:20the scale of the outcome matters in terms of effect
- 41:25moderation.
- 41:26And in particular, there might be sort of more apparent
- 41:30effect heterogeneity on one scale versus another.
- 41:33So I'm just kind of flagging this, that like this exists,
- 41:37there are some people sort of looking at this in a more
- 41:39formal way, but again for now sort of just think about
- 41:44like a risk difference kind of scale.
- 41:47Okay, great.
- 41:48So let me just conclude with a few kind of final thoughts.
- 41:51So, I think all of us, not all of us,
- 41:54but often we sort of want to assume that study results
- 41:58generalize.
- 41:58Often people write a discussion section in a paper,
- 42:01where they kind of qualitatively have some sentences
- 42:05about why they do or don't think that the results
- 42:08in this paper kind of extend to other groups
- 42:10or other populations.
- 42:13But I think until the past again, sort of five or so years,
- 42:16a lot of that discussion was very hand-wavy
- 42:19and sort of qualitative.
- 42:21I think that what we are seeing in epidemiology
- 42:24and statistics and biostatistics
- 42:26recently has been a push towards having more
- 42:29ability to quantify this and make sort of more formal
- 42:33statements.
- 42:35So I think if we do wanna be serious though,
- 42:37about assessing and enhancing external validity,
- 42:41again, we really need these different pieces.
- 42:43We need information on the factors that influence effect
- 42:46heterogeneity, the moderators.
- 42:49We need information on the factors that influence
- 42:51participation in rigorous studies like randomized trials.
- 42:55And we need data on all of those things,
- 42:57in the trial and the population.
- 43:00And then finally, we need statistical methods that allow us
- 43:04to use that data to estimate population treatment effects.
- 43:08I would argue that that last bullet is sort of much further
- 43:12along than any of the others.
- 43:13That in my experience,
- 43:15the limiting factor is usually not the methods.
- 43:19The limiting factor at this point in time is the data
- 43:22and sort of the scientific knowledge
- 43:25about these different factors.
- 43:29And that's what this slide is.
- 43:30So I think I've already said this, but again,
- 43:33one of the motivations for the sensitivity
- 43:35analysis is just a recognition that it's often
- 43:39really quite hard to get data that
- 43:42is consistently measured between a trial and a population.
- 43:47So on that point, recommendations again,
- 43:49if we wanna be serious about effect heterogeneity
- 43:51or about estimating population treatment effects,
- 43:55we need better information on treatment effect heterogeneity
- 43:59that might be better analysis of existing trials,
- 44:02that might be meta-analysis of existing trials.
- 44:05That might also be theoretical models for the interventions
- 44:07to understand what the likely moderators are.
- 44:12We also need better information on the factors
- 44:14that influence participation in trials and more discussion
- 44:17of how trial samples are selected.
- 44:22We need to standardize measures.
- 44:23So again, it's incredibly frustrating when you have trial
- 44:26and population data, but the measures in them are not
- 44:30consistent.
- 44:31There are methods that can be used for this,
- 44:33some data harmonization approaches,
- 44:36but, they require assumptions.
- 44:39It's better if we can be thoughtful and strategic about,
- 44:42for example, common measures across studies.
- 44:45I will say one of the frustrations too,
- 44:47is that in some fields like the early childhood data
- 44:51I talked about,
- 44:52part of the problem was like the two data sets might
- 44:55actually have the same measure,
- 44:56but they didn't give the raw data,
- 44:58and they like standardized the scales differently.
- 45:01Like they standardized them to their own population,
- 45:03not sort of more generally.
- 45:05And so they weren't sort of on the same scale in the end.
- 45:10As a statistician, of course, I will say we do need more
- 45:12research on the methods and understanding when they work
- 45:15and when they don't.
- 45:16There are some pretty strong assumptions
- 45:19in these approaches.
- 45:20But again, I think that sort of in some ways,
- 45:24that is further along than some of the data situations.
- 45:29So I just wanted to take one minute to flag some current
- 45:32work, partly in case anyone wants to ask questions about
- 45:34these.
- 45:36One thing I'm kind of excited about,
- 45:38especially in my education world is...
- 45:42So what I've been talking about today has mostly been,
- 45:44if we have a trial sample and we wanna project
- 45:46to kind of a larger target population.
- 45:49But there's an equally interesting question,
- 45:51which is sort of how well can randomized trials inform
- 45:54local decision making?
- 45:56So if we have a randomized trial with 60 schools in it,
- 46:01how well can the results from that trial be used to inform
- 46:04individual school districts' decisions?
- 46:07Turns out, not particularly well.
- 46:09(laughs)
- 46:10We can talk more about that.
- 46:12I mentioned earlier, Issa Dahabreh, who's at Brown,
- 46:15and he's really interested in developing sort of the formal
- 46:18theories underlying different ways of estimating
- 46:21these population effects, again, including some
- 46:23doubly robust approaches.
- 46:26Trang Nguyen, who works at Hopkins with me,
- 46:29we are still looking at sort of the sensitivity analysis
- 46:32for unobserved moderators.
- 46:34I mentioned Hwanhee Hong already, who's now at Duke.
- 46:37And she, again, sort of straddles the meta-analysis world
- 46:40in this world, which has some really interesting
- 46:43connections.
- 46:45My former student Ben Ackerman, who's been at Flatiron
- 46:48Health as of a few months ago,
- 46:50did some work on sort of measurement error
- 46:53and sort of partly how to deal with some of these
- 46:55measurement challenges between the sample and population.
- 47:00And then I'll just briefly mention Daniel Westreich at UNC,
- 47:04who is really...
- 47:05If you come from sort of more of an epidemiology world,
- 47:09Daniel has some really nice papers that are sort of trying
- 47:11to translate these ideas to epidemiology,
- 47:14and this concept of what he calls target validity.
- 47:17So sort of rather than thinking about internal and external
- 47:20validity separately, and as potentially,
- 47:23in kind of conflict with each other,
- 47:26instead really think carefully about a target of inference
- 47:29and then thinking of internal and external validity
- 47:31sort of within that and not sort of trying to prioritize
- 47:35one over the other.
- 47:37And then just an aside, one thing,
- 47:40I would love to do more in the coming years is thinking
- 47:43about combining experimental and non-experimental evidence.
- 47:46I think that is probably where it would be very beneficial
- 47:49to go next, sort of more of that cross-design synthesis
- 47:52kind of idea.
- 47:55But again, I wanna conclude with this,
- 47:57which gets us back to design, and that again,
- 48:01sort of what is often the limiting factor here is the data
- 48:04and just sort of strong designs.
- 48:07So Rubin, 2005: with better data, fewer assumptions
- 48:10are needed. And then Light, Singer and Willett,
- 48:13who are sort of big education methodologists:
- 48:16you can't fix by analysis what you've bungled by design.
- 48:19So again, just wanna highlight that if we wanna be serious
- 48:22about estimating population effects,
- 48:24we need to be serious about that in our study designs,
- 48:27both in terms of who we recruit,
- 48:30but then also what variables we collect on them.
- 48:32But if we do that,
- 48:33I think that we can have the potential to really help guide
- 48:37policy and practice by thinking more carefully
- 48:39about the populations that we care about.
- 48:43So for more...
- 48:44Here's this, there's my email, if you wanna email me
- 48:47for the slides.
- 48:49And thanks to various funders, and then I'll leave this up
- 48:53for a couple minutes,
- 48:55which are all in tiny font, some of the references,
- 48:59but then I'll take that down in a minute so that we can see
- 49:01each other more.
- 49:02So thank you, and I'm very happy to take some questions.
- 49:14I don't know if you all have a way to organize
- 49:16or people just can
- 49:19jump in.
- 49:24- So maybe I'll ask the question.
- 49:25Thanks Liz, for this very interesting and great talk.
- 49:29So I noticed that you've talked about the target population
- 49:34in this framework.
- 49:35And I think there are situations where the population sample
- 49:39is actually a survey from a larger population.
- 49:43- Yeah.
- 49:44- Cause we cannot really afford to observe the entire
- 49:47actual population, which will contain
- 49:49like millions of individuals.
- 49:50And so in that situation, does the framework still apply
- 49:55particularly in terms of the sensitivity analysis?
- 49:58And is there any caveat that we should also know in dealing
- 50:01with those data?
- 50:03- Great question.
- 50:05And actually, thank you for asking that because I forgot
- 50:07to mention that Ben Ackerman's dissertation,
- 50:10also looked at that.
- 50:11So I mentioned his measurement error stuff.
- 50:13But yes, actually, so Ben's second dissertation paper
- 50:17did exactly that, where we sort of laid out the theory
- 50:21for when the target population data
- 50:24comes from a complex survey itself.
- 50:29Short answer is yes, it all still works.
- 50:31Like you have to use the weights, there are some nuances,
- 50:34but, and you're right, like essentially,
- 50:36especially like in...
- 50:38Like for representing the U.S. population, often, the data
- 50:41we have is like the National Health Interview Survey
- 50:44or the Add Health Survey of Adolescents,
- 50:47which are these complex surveys.
- 50:49So short answer is, yeah, it still can work.
- 50:53Your question about the sensitivity analysis is actually
- 50:55a really good one and we have not extended...
- 50:58I'd have to think, I don't know, off hand, like,
- 51:00I think it would be sort of straightforward to extend
- 51:04the sensitivity analysis to that, but we haven't actually
- 51:07done it.
- 51:08- Thanks Liz.
- 51:11The other short question is that I noticed that
- 51:12in your slide, you first define PATE as the population ATE,
- 51:16but then in one slide you have this TATE,
- 51:19which I assume is the target ATE.
- 51:21And so, I'm just really curious as to like, is there any,
- 51:25like differences or nuances in the choice of this
- 51:27terminology?
- 51:29- Good question.
- 51:30And no, yeah, I'm not...
- 51:31I wasn't very precise with that, but in my mind, no.
- 51:35Over time I've been trying to use TATE,
- 51:38but you can see that kind of just by default,
- 51:40I still sometimes use PATE.
- 51:43Part of the reason I use TATE is because I think
- 51:46the target is just a slightly more general term.
- 51:48Like people sometimes think,
- 51:50if we say PATE, the population has to be like
- 51:53the U.S. population or some like very sort of big,
- 51:58very official population in some sense.
- 52:01Whereas, the target average treatment effect,
- 52:04TATE terminology, I think reflects that sometimes
- 52:06it's just a target group that's well-defined.
- 52:10- Gotcha.
- 52:11Thanks, that's very helpful.
- 52:12And I think we have a question coming from the chat as well.
- 52:15- Yeah, I just saw that.
- 52:16So I can read that.
- 52:17We have theory for inference from a sample to a target
- 52:20population, and well-defined internal validity approaches;
- 52:23what theory is there for connecting the internal validity
- 52:25methods to external validity?
- 52:29So I think, what you mean is sort of,
- 52:33what is the formal theory for projecting the impact
- 52:37to the target population?
- 52:38That is exactly what some of those people that I referenced
- 52:41sort of lay out.
- 52:42Like I didn't...
- 52:42For this talk, I didn't get into all the theoretical weeds,
- 52:45but if you're interested in that stuff,
- 52:46probably some of Issa Dahabreh's work would be the most
- 52:49relevant to look at.
- 52:51Cause he really lays out sort of the formal theory.
- 52:54I mean, some of my early papers on this topic did it,
- 52:58but his is like a little bit more formal and sort of makes
- 53:01connections to the doubly robust literature
- 53:04and things like that.
- 53:04And so it's really...
- 53:06Anyway, that's what this whole literature
- 53:08is partly building: that theoretical base
- 53:11for doing this.
- 53:17Any other questions?
- 53:28- [Ofer] Liz,
- 53:29I'm Ofer Harel.
- 53:30- Oh, hi Ofer?
- 53:31- [Ofer] Hi.
- 53:33(mumbles)
- 53:39So in most of the studies that I would work on,
- 53:43they don't do really have a great idea about
- 53:46what really the population is and how really to measure
- 53:50those.
- 53:51So it's great if I have some measure of the population,
- 53:54but most of the time, in the studies that I work on,
- 53:57I have no real measurements on that population.
- 54:02What happens then?
- 54:03- Yeah, great question.
- 54:04And in part, I meant to say this,
- 54:06but that's one of the reasons why the analogy...
- 54:08Why the design strategies don't always work particularly
- 54:10well is like, especially when you're just starting out
- 54:13a study, right?
- 54:14We don't really know the target population.
- 54:17I think certainly to do any of these procedures,
- 54:21you need eventually to have a well defined population.
- 54:25But I think that's partly why some of the analysis
- 54:27approaches are useful is that,
- 54:29you might have multiple target populations.
- 54:31Like we might have one trial,
- 54:33and we might be interested in saying,
- 54:35how well does this generalize to the State of New Hampshire
- 54:39or the State of Vermont or the State of Connecticut?
- 54:41And so, you could imagine one study that's used to inform
- 54:45multiple target populations.
- 54:48With different assumptions,
- 54:49sort of you have to think through the assumptions
- 54:50for each one.
- 54:52If you don't even,
- 54:54I guess I would say if you don't even know
- 54:56who your population is, you shouldn't be using these methods
- 54:59at all, cause like the whole premise is that there is some
- 55:02well-defined target population and you do need data on it
- 55:05or at least...
- 55:07Yeah, the joint distribution of some covariates
- 55:09or something.
- 55:10Without that, you're kind of just like,
- 55:13I don't know, what a good analogy is,
- 55:15but you're kinda just like guessing at everything.
- 55:24(mumbles)
- 55:26- No, go ahead.
- 55:27Go ahead.
- 55:29- Oh, Vinod, yeah.
- 55:30All my friends are popping up, it's great.
- 55:32(laughs)
- 55:34- [Vinod] Can I go ahead?
- 55:35I feel like I'm talking to someone.
- 55:39- Yeah, go ahead Vinod.
- 55:40- [Vinod] That was a great talk.
- 55:42So I have a little ill formulated question,
- 55:44but it's queuing after just the last question
- 55:47that was asked is,
- 55:49in clinical set populations where,
- 55:55in some ways we're using this clinical samples
- 55:58to learn about the population because unless they seek help,
- 56:02we often don't know what they are in the wild, so to speak.
- 56:05And so, each sampling of that clinical population
- 56:09is maybe a biased sampling of that larger population
- 56:13in the wild.
- 56:14So I guess my question is, how do you get around this,
- 56:18I guess Rumsfeld problem, which is every time you sample
- 56:22there's this unknown, unknown, but there's no way to get
- 56:24at them because in some ways, your sampling relies on...
- 56:27If we could say it relies on help seeking,
- 56:30which is by itself a process.
- 56:33And if we could just stipulate, there's no way to get
- 56:35around that.
- 56:36How do you see this going forward?
- 56:40- Yeah, good question.
- 56:40I think right, particularly relevant in mental health
- 56:43research where there's a lot of people who are not seeking
- 56:46treatment.
- 56:47These methods are not gonna help with that in a sense
- 56:50like again, they are gonna be sort of tuned to whatever
- 56:53population you have.
- 56:55I think though there are...
- 56:57If you really wanna be thoughtful about that
- 57:00problem, that's where sort of some of the strategies
- 57:03that were used like the Epidemiologic Catchment Area
- 57:05Surveys, where they would go door to door and knock on doors
- 57:08and do diagnostic interviews.
- 57:11Like if we wanna be really serious about trying to reach
- 57:14everyone and get an estimate of the really sort of true
- 57:17population, then we really have to tackle that
- 57:20very creatively and with a lot of resources probably.
- 57:25- [Vinod] Thanks.
- 57:27- Welcome.
- 57:29- Hi Liz?
- 57:30Yeah, it's gonna be a true question and great talk
- 57:33by the way.
- 57:35I'm curious, you mentioned there could be a slight
- 57:38difference between the terms transportability
- 57:40and generalizability.
- 57:41Yeah, I'm curious about that.
- 57:43- Yeah, briefly, this is a little bit of a...
- 57:48What's the word?
- 57:48Simplification, but briefly I think of generalizability
- 57:51as one where the sample that, like the trial sample
- 57:55is a proper subset of the population.
- 57:57So we do a trial in New Hampshire,
- 58:01and we're trying to generalize to new England.
- 58:04Whereas transportability is one where it is not a proper
- 58:08subset, so we do a trial in the United States
- 58:10and we wanna transport to Europe.
- 58:14Underlying both, the reason I don't worry too much about it,
- 58:17the terms is because either way,
- 58:19the assumption is essentially the same.
- 58:21Like you still have to make this assumption about
- 58:23no unobserved moderators.
- 58:25It's just that it's probably gonna be a stronger assumption
- 58:28and harder to believe,
- 58:30when transporting rather than when generalizing.
- 58:33Cause you sort of know that you're going from one place
- 58:36to another in some sense.
- 58:39- Thanks, makes sense.
- 58:41- Sure.
- 58:43- I think there's another question in the chat.
- 58:45- Yeah, so this is a great question.
- 58:46I'm glad shows you on.
- 58:48I hope I got that.
- 58:50It seems there are multiple ways to calculate the TATE,
- 58:53from standardization to weighting to the outcome model.
- 58:55Do you have comments for their performance under different
- 58:57circumstances?
- 58:58Great question, and I don't.
- 59:01I mean, there has been...
- 59:02This is an area where I think
- 59:04it'd be great to have more research on this topic.
- 59:06So I have this one paper with Holger Kern and Jennifer Hill
- 59:09where we sort of did try to kind of explore that.
- 59:14And honestly, what we found not surprisingly
- 59:16is that if that no unmeasured moderator assumption holds,
- 59:20all the different methods are pretty good and fine.
- 59:23And like, we didn't see much difference in them.
- 59:25If that no unobserved moderator assumption doesn't hold
- 59:28then of course, none of them are good.
- 59:29So it sort of is like similar to propensity score world.
- 59:33Like, the data you have is more important than what you do
- 59:35with the data in a sense.
- 59:38But anyway, I think that that is something that like,
- 59:40we need a lot more work on.
- 59:42One thing, for example, I do have a student working on this.
- 59:45Like, we're trying to see if your sample
- 59:47is a tiny proportion of the population, like how...
- 59:51Cause like there's different.
- 59:52That's one where like waiting might not work as well
- 59:54actually, who knows.
- 59:56Anyways, so like all of these different data scenarios,
- 59:58I think need a lot more investigation to have better
- 01:00:01guidance on when the different methods work well.
- 01:00:09Anything else or maybe we're out of time?
- 01:00:11I don't know, how tight you are at one o'clock.
- 01:00:20- I think we're at an hour, so let's...