BIS Seminar: Dealing with observed and unobserved effect moderators when estimating population average treatment effects
September 22, 2020
Elizabeth Stuart
Associate Dean for Education
Bloomberg Professor of American Health
ID5661
- 00:00- Maybe one or two minutes and then,
- 00:02I'll have you introduced.
- 00:03- And it's about, and so I...
- 00:05And it's gonna be more fun for me if it's a little
- 00:07interactive, as much as we can make it.
- 00:09So I won't be able to see all of you nodding and whatnot,
- 00:12but please feel free to jump in.
- 00:15And the talk's gonna be pretty non-technical.
- 00:17My goal is mostly to sort of help
- 00:19convey some of the concepts and ideas and so I will.
- 00:23Hopefully it will be a reasonable topic to do via Zoom.
- 00:30Great, so I think,
- 00:33Frank basically gave this stuff that's relevant
- 00:36on this slide.
- 00:37I do also wanna apologize, those of you guys
- 00:39who I was supposed to meet with this morning, we have a...
- 00:41My husband broke his collarbone over the weekend.
- 00:44So I've had to cancel things this morning,
- 00:47but I'm glad I'm able to still do this seminar,
- 00:51I didn't wanna,
- 00:52have to cancel that.
- 00:54So again,
- 00:56the topic is gonna be sort of this idea of external
- 00:59validity, which I think is a topic that people often
- 01:01are interested in because it's the sort of thing
- 01:04that we often think sort of qualitatively about,
- 01:06but there hasn't been a lot of work thinking about it
- 01:08quantitatively.
- 01:09So again, my goal today will be to sort of help
- 01:11give a framework for thinking about external validity
- 01:15in sort of a more formal way.
- 01:19So let's start out with the sorts of questions
- 01:22that might be relevant when you're thinking about
- 01:25external validity.
- 01:27So it might be research questions like a health insurer
- 01:30is deciding whether or not to approve some new treatment
- 01:34for back pain.
- 01:36They might be interested in predicting overall population
- 01:39impacts of a broad public health media campaign.
- 01:43A physician practice might be deciding whether training
- 01:46providers in a new intervention would actually be cost
- 01:49effective given the patient population that they have.
- 01:53And that I felt like I needed to get some COVID
- 01:55example in...
- 01:57But, for example, a healthcare system,
- 01:59might wanna know whether it's sort of giving convalescent
- 02:02plasma to all of the individuals recently diagnosed
- 02:06with COVID-19 in their system, whether that would
- 02:08sort of lead to better outcomes overall.
- 02:12So all of these...
- 02:15What I'm distinguishing here or sort of trying to convey
- 02:17is that all of these reflect what I will call a population
- 02:20average treatment effect.
- 02:22So across some well-defined population,
- 02:25does some intervention work sort of on average.
- 02:28The population might be pretty narrow.
- 02:30Again, it might be the patients in one particular
- 02:33physician practice, or might be quite broad.
- 02:35It could be everyone in the State of Connecticut
- 02:38or in the entire country.
- 02:40But either way, it's a well-defined kind of population
- 02:44and we'll come back to that.
- 02:46What's really important,
- 02:48and this will sort of underlie much of the talk
- 02:50is that kind of the whole point is that there might
- 02:52be underlying treatment effect heterogeneity.
- 02:55So there might be some individuals
- 02:57for whom this treatment of interest is actually
- 02:59more effective than others.
- 03:01But what I wanna be clear about, is the goal of inference
- 03:04that I'm talking about today, is gonna be about
- 03:07this overall population average.
- 03:09So we're not trying to say like which people
- 03:11are gonna benefit more or sort of to which people
- 03:14should we give this treatment.
- 03:16It's really more a question of sort of more population
- 03:20level decisions, sort of if we have...
- 03:22If we're making a decision, that's sort of a policy
- 03:24kind of population level,
- 03:25on average is this gonna be something that makes sense.
- 03:28So I hope that distinction makes sense.
- 03:30I'm happy to come back to that.
- 03:35So again until I don't know, five or,
- 03:38well maybe now more than 10 years ago,
- 03:41there had been relatively little attention
- 03:43to the question of how well results from
- 03:46kind of well-designed studies like a randomized trial
- 03:50might carry over to a relevant target population.
- 03:53I think in much of statistics as well as fields
- 03:56like education research, public policy, even healthcare,
- 04:00there's really been a focus on randomized trials
- 04:03and getting internal validity,
- 04:05and I'll formalize this in a minute.
- 04:07But in the past 10 or so years, there's been more and more
- 04:10interest in this idea of how well can we take the results
- 04:13from a particular study and then project them
- 04:17to well-defined target population.
- 04:20And again, so today I'm gonna try to give
- 04:21sort of an overview of the thinking in this area,
- 04:24along with some of the limitations and in particular,
- 04:27the data limitations that we have in thinking about this.
- 04:33One thing I do wanna be clear about is there's a lot
- 04:36of reasons why results from randomized trials
- 04:38might not generalize.
- 04:40There's some classic examples in education
- 04:42where there are scale-up problems.
- 04:44The classic example is one I'm looking at,
- 04:50class size.
- 04:51And so, in Tennessee, they randomly assign kids
- 04:54to be in smaller versus larger classes
- 04:57and found quite large effects of smaller classes.
- 05:00But then, when the State of California tried to implement
- 05:03this, the problem is that you need a lot more teachers
- 05:06to kind of roll that out statewide.
- 05:08And so, it led actually to a different pool of teachers
- 05:11being hired.
- 05:12And so, there's sort of scale-up problems
- 05:14sometimes with the interventions and that might lead
- 05:16to different contexts or different implementation.
- 05:19Today, what I'm gonna be focusing on are differences
- 05:21between a sample and a population.
- 05:25That is, differences in sort of baseline characteristics
- 05:28that moderate treatment effects.
- 05:29And again, I'll formalize this a little bit as we go along.
- 05:33Just as a little bit of an aside,
- 05:34but in case some of you know this field a little bit,
- 05:37just to give you a little, just...
- 05:39I wanna flag this.
- 05:40Some people might use the term transportability.
- 05:43So some of the literature in this field uses the term
- 05:46transportability.
- 05:47I tend to use generalizability.
- 05:50There's some subtle differences between the two,
- 05:52which we can come back to, but for all intents and purposes,
- 05:55you can basically think of them interchangeably
- 05:59for now.
- 06:00I also wanna note, if any of you kind of come
- 06:02from like a survey world, these debates about
- 06:06kind of how well a particular sample reflects a target
- 06:09population are exactly, not exactly the same,
- 06:12but very similar to the debates happening in the survey
- 06:15world around non-probability samples and sort of concerns
- 06:19about,
- 06:21the use of like say online surveys and things that might not
- 06:25have a true formal sort of survey sampling design,
- 06:28and sort of some of the concerns that arise about
- 06:31generalizability.
- 06:32So there's this whole parallel literature in the survey
- 06:34world.
- 06:35Andrew Mercer has a nice summary of that.
- 06:37Again, I'm happy to talk more about that.
- 06:41Okay, any questions before I keep going?
- 06:49Okay.
- 06:49So let me formalize kind of what we're talking about
- 06:52a little bit.
- 06:53This is...
- 06:55This framework is now, 12 years old.
- 06:59Time goes quickly.
- 07:01But this is just to formalize what we're interested in.
- 07:05The goal is to estimate, again, this what I'll call
- 07:07a population average treatment effect or PATE.
- 07:10And so here,
- 07:12hopefully you're familiar with sort of potential outcomes
- 07:14and causal inference.
- 07:16But the idea is that we have some well-defined population
- 07:19of size N.
- 07:20And Y(1) is the potential outcomes, if people
- 07:24in that population receive the treatment condition
- 07:28of interest.
- 07:29Y(0) are the outcomes if they receive the control
- 07:32or comparison condition of interest.
- 07:34So here, we're just saying we're interested
- 07:35in the average effect, basically sort of the difference
- 07:40in potential outcomes, average across the population.
- 07:46We could be doing this with risk ratios
- 07:49or odds ratios or something.
- 07:51Those are a little more complicated because the math
- 07:53doesn't work as nicely.
- 07:55So for now think about it more like risk differences
- 07:57or something, if you have a binary outcome,
- 08:00the same fundamental points hold.
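To make the estimand concrete, here is a minimal numerical sketch in Python. The population size, outcome model, and age-based effect heterogeneity are all hypothetical, chosen only to show what "average the difference in potential outcomes across the population" means.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000  # size of a hypothetical well-defined target population

# Hypothetical potential outcomes: Y(1) if treated, Y(0) if control.
age = rng.uniform(20, 70, N)
y0 = 50 + 0.1 * age + rng.normal(0, 5, N)   # outcome under control
y1 = y0 + 2 + 0.05 * (age - 45)             # treatment effect varies with age

# The PATE is the individual-level difference in potential outcomes,
# averaged across the entire population.
pate = np.mean(y1 - y0)
print(pate)
```

Note that subgroups here have different effects (older people benefit more), but the PATE is still a single population-level number, matching the earlier point that the goal is the overall average, not subgroup targeting.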
- 08:03So I'm not gonna tell you right now where
- 08:05the data we have came from, but imagine that we just
- 08:08have a simple estimate of this PATE,
- 08:11as the difference in means of some outcome
- 08:14between an observed treated group and an observed
- 08:16control group.
- 08:17So again, we see that there's a bunch of people
- 08:20who got treated, a bunch of people who got control,
- 08:22and we might estimate this PATE as just the simple
- 08:25difference in means between again, the treatment group
- 08:28and the control group.
- 08:29So what I wanna talk through for the next couple of minutes,
- 08:32is the bias in this sort of naive estimate of the PATE.
- 08:36So we'll call that Delta.
- 08:38So I'm being a little loose with notation here,
- 08:40but the bias, essentially, you can
- 08:43think of it as sort of the difference between
- 08:45the true population effect and our naive estimate of it.
- 08:49And what we did in this paper with Gary King and Kosuke Imai
- 08:54was sort of lay out how different choices of study designs
- 08:58impact the size of this bias.
- 09:01And in particular, we showed that
- 09:03under some simplifying assumptions,
- 09:05for mathematical simplicity,
- 09:07you can decompose that overall bias into four pieces.
- 09:11So the two Delta S terms are
- 09:15what we call sample selection bias.
- 09:17So basically, the bias that comes in if our data sample
- 09:22is not representative of the target population
- 09:25that we care about.
- 09:27The Delta T terms are our typical sort of confounding bias.
- 09:31So bias that comes in if our treatment group is dissimilar
- 09:36from our control group.
- 09:38The X refers to the variables we observe,
- 09:40and the U refers to variables that we don't observe.
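Written out, the decomposition being described (from the paper with King and Imai) takes roughly this form; the subscript notation here is a paraphrase and may differ slightly from the paper itself:

```latex
\Delta \;=\; \underbrace{\Delta_{S_X} + \Delta_{S_U}}_{\text{sample selection bias}}
\;+\; \underbrace{\Delta_{T_X} + \Delta_{T_U}}_{\text{treatment (confounding) bias}}
```

The S terms capture non-representativeness of the sample on observed (X) and unobserved (U) characteristics; the T terms capture treatment-control imbalance on those same two kinds of variables.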
- 09:45So what we then did in the paper,
- 09:46and this is sort of what motivates a lot of this work
- 09:49is to think through these, again, the trade offs
- 09:51in these different designs.
- 09:53And essentially what we're trying to sort of point out
- 09:56is that...
- 09:59Let's go to the second row of this table first actually,
- 10:01a typical experiment.
- 10:02So a typical experiment, I would say is one where
- 10:06we kind of take whoever comes in the door,
- 10:08we kind of try to recruit people for a randomized trial,
- 10:11whether that's schools or patients or whatever it is.
- 10:16And we randomized them to treatment and control groups.
- 10:19So that is our typical randomized experiment.
- 10:22The treatment selection bias in that case is zero.
- 10:26In expectation, that's why we like randomized experiments.
- 10:29In expectation, there is no confounding
- 10:32and we get an unbiased treatment effect estimate
- 10:34for the sample at hand.
- 10:37The problem for population inference
- 10:40is that the Delta S terms might be big,
- 10:43because the people that agree to be in a randomized trial,
- 10:46might be quite different from the overall population
- 10:49that we care about.
- 10:51So in this paper, we're trying to just sort of...
- 10:53In some ways, be a little provocative and point this out
- 10:56that our standard thinking about study designs
- 10:59and sort of our prioritization of randomized trials,
- 11:03implicitly prioritizes internal validity over external
- 11:07validity.
- 11:08And in particular, if we really care about
- 11:12population effects, we really should be thinking about
- 11:15these together and trying to sort of have small
- 11:18sample selection bias and small treatment selection bias.
- 11:22So an ideal experiment would be one where we can randomly
- 11:25select people for our trial.
- 11:28Let's say we have...
- 11:30Well, actually, I'll come back to that in a second.
- 11:31Randomly select people for our trial and then randomly
- 11:34assign people to treatment or control groups.
- 11:37And in expectation, we will have zero bias in our population
- 11:41effect estimate.
- 11:42But these other designs, and again,
- 11:44like a typical experiment might end up having larger bias
- 11:47overall, than a well designed non-experimental study,
- 11:51where if we do a really good job like adjusting
- 11:54for confounders,
- 11:55it may be that well done non-experimental study
- 11:59conducted using say the electronic health records
- 12:02from a healthcare system might actually give us lower bias
- 12:06for a population effect estimate than does
- 12:08a non-representative small randomized trial.
- 12:12Again, a little provocative,
- 12:13but I think useful to be thinking about what is really our
- 12:17target of inference and how do we get data that is most
- 12:19relevant for that.
- 12:22I will also just as a small aside,
- 12:24maybe a little on the personal side,
- 12:26but it's been striking to me in the past two days.
- 12:28So my husband broke his collarbone over the weekend.
- 12:31And it turns out the break is one where there's a little bit
- 12:35of debate about whether you should have surgery or not.
- 12:38Although kind of recent thinking is that
- 12:39there should be surgery.
- 12:40And I was doing a PubMed search as a good statistician
- 12:44public health person whose family member
- 12:47needs medical treatment.
- 12:49And I found all these randomized trials that actually
- 12:52randomized people to get surgery or not.
- 12:55And then I came home...
- 12:56Oh, no, I didn't come home, we were home all the time.
- 12:59I asked my husband later, I was like,
- 13:00would you ever agree to be randomized?
- 13:02Like right now, we are trying to make this decision about,
- 13:05should you have surgery or not.
- 13:07And would we ever agree to be randomized?
- 13:09And he's like, no, we wouldn't.
- 13:11We're gonna go with what the physician recommends
- 13:15and what we feel is comfortable.
- 13:16And it really just hit home for me at this point that
- 13:19the people who agree to be randomized or the context
- 13:22under which we can sort of randomize
- 13:26are sometimes fairly limited.
- 13:28And again, so partly what this body of research is trying
- 13:31to do is sort of think through what are the implications
- 13:33of that when we do wanna make population inferences.
- 13:38Make sense so far?
- 13:39I can't see faces, so hopefully.
- 13:43Okay.
- 13:47So,
- 13:48I will say a lot of my work in this area has actually,
- 13:50in part been just helping or trying to raise awareness
- 13:53of thinking about external validity bias.
- 13:56So some of the research in this area has been trying
- 14:00to understand how big of a problem is this.
- 14:03If maybe people don't agree to be in randomized trials
- 14:06very often,
- 14:07but maybe that doesn't really cause bias in terms
- 14:10of our population effect estimates.
- 14:12So what I've done in a couple of papers,
- 14:15cited on this slide, is basically try to formalize
- 14:18this, and it's pretty intuitive, but basically we show,
- 14:22and I'm not showing you the formulas here.
- 14:24But intuitively, there will be bias in a population effect
- 14:28estimate essentially if participation in the trial
- 14:33is associated with the size of the impacts.
- 14:35So in particular,
- 14:38what I'll call the external validity bias,
- 14:39those Delta S terms,
- 14:40kind of the bias
- 14:42due to the lack of representativeness
- 14:45is a function of the variation of the probabilities
- 14:48of participating in a trial,
- 14:50variation in treatment effects,
- 14:52and then the correlation between those things.
- 14:54So if...
- 14:56If we have constant treatment effects
- 14:58or the treatment effect is zero
- 14:59or is two for everyone, there's gonna be no external
- 15:02validity bias.
- 15:03It doesn't matter who is in our study.
- 15:06Or if there...
- 15:08If everyone has an equal probability of participating
- 15:10in the study, we really do have a nice random selection,
- 15:14then again, there's gonna be no external validity bias.
- 15:17Or if the factors that influence whether or not you
- 15:20participate in the study are independent of the factors
- 15:23that moderate treatment effects,
- 15:25again, there'll be no external validity bias.
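Those no-bias conditions can be seen in a small simulation; every number below is hypothetical, but it illustrates that the sample estimate goes wrong only when participation probabilities are correlated with the treatment effects.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000
x = rng.normal(0, 1, N)    # a baseline characteristic that moderates effects
tau = 2 + x                # heterogeneous individual treatment effects
pate = tau.mean()          # population average treatment effect, about 2

def sample_ate(p):
    """Average treatment effect among units that enroll with probabilities p."""
    enrolled = rng.random(N) < p
    return tau[enrolled].mean()

# Equal participation probability for everyone: no external validity bias.
ate_flat = sample_ate(np.full(N, 0.05))

# Participation depends on x, which also moderates the effect: biased.
p_corr = 1 / (1 + np.exp(-(-3 + 1.5 * x)))
ate_corr = sample_ate(p_corr)

print(pate, ate_flat, ate_corr)
```

Making the effects constant, making the participation probabilities equal, or making the two independent of each other each drives the bias back toward zero, which is exactly the set of conditions listed above.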
- 15:29The problem is that we often have very limited information
- 15:32about these pieces.
- 15:34We, as a field, I think medicine, public health, education,
- 15:38all the fields I worked in, there has not been much
- 15:41attention paid to these processes of how we actually
- 15:44enroll people in studies.
- 15:46And so it's hard to know kind of what factors relate
- 15:49to those and if those then also moderate treatment effects.
- 15:53(phone ringing)
- 15:54Oops, sorry.
- 15:55Incoming phone call, which I will ignore.
- 15:58So,
- 15:59there has been...
- 16:01Sorry.
- 16:03There has been a little bit of work trying to document this
- 16:05in real data and find empirical evidence on these sizes.
- 16:11The problem, and sorry, some of the...
- 16:13Some of you might...
- 16:14If any of you are familiar with, like,
- 16:16what's called the within-study comparison
- 16:18literature.
- 16:19So there's this whole literature on non-experimental studies
- 16:23that sort of try to estimate the bias due to non-random
- 16:28treatment assignment.
- 16:30This is sort of analogous to that.
- 16:32But the problem here is that what you need is you need
- 16:34an accurate estimate of the impact in the population.
- 16:37And then you also need sort of estimates of the impact
- 16:40in samples that are sort of obtained in kind of typical
- 16:44ways.
- 16:45So that's actually really hard to do.
- 16:47So I'll just briefly talk through two examples.
- 16:49And if any of you have data examples that you think might
- 16:52sort of be useful for generating evidence,
- 16:55that would be incredibly useful.
- 16:57So one of the examples is...
- 17:00So let me back up for a second.
- 17:02In the field of mental health research,
- 17:03there's been a push recently, or actually not so much
- 17:06recently in the past, like 10, 15 years
- 17:08to do what I call or what are called pragmatic trials
- 17:12with the idea of enrolling much more...
- 17:16A much broader set of people, using a broader set of practices
- 17:21or locations around the country.
- 17:23And so what Wisniewski et al. did was they took
- 17:27the data from one of those large pragmatic trials.
- 17:29And the idea they...
- 17:30Again, the idea was that it should be more representative
- 17:33of people in this case with depression
- 17:35across the U.S.
- 17:37And then, they said, well, what if...
- 17:38What if, in fact, we didn't have that?
- 17:40What if we used sort of our normal study inclusion
- 17:44and exclusion criteria, and we subset
- 17:47this pragmatic trial data to the people that we think
- 17:50would have been more typically included in a sort of more
- 17:53standard randomized trial.
- 17:55And sort of not surprisingly, they found that
- 17:58the people in the sort of what they call
- 17:59the efficacy sample, those sort of typical trial sample
- 18:03had better outcomes and larger treatment effects
- 18:05than the overall pragmatic trial sample as a whole.
- 18:10We did something similar sort of in education research where
- 18:15it's a little bit in the weeds.
- 18:16I don't really wanna get into the details,
- 18:18but we essentially had a pretty reasonable regression
- 18:22discontinuity design.
- 18:23So we were able to get estimates of the effects of this
- 18:26Reading First intervention across a number of states.
- 18:30And we then compared those state wide impact estimates
- 18:34to the estimates you would get if we enrolled only
- 18:38the sorts of schools and school districts that are typically
- 18:41included in educational evaluations.
- 18:44And there we found that this external validity bias
- 18:48was about 0.1 standard deviations,
- 18:50which in education world is fairly large.
- 18:53Certainly people would be concerned about an internal
- 18:56validity bias of that size.
- 18:58So we were able to sort of use this to say, look,
- 19:00if we really wanna be serious about external validity,
- 19:03it might be as much of a problem as sort of typical internal
- 19:06validity bias that people care about in that field.
- 19:13So again, the problem though, is we don't usually
- 19:15have these sorts of designs where we have a population
- 19:17effect estimate, and then sample estimates,
- 19:19and we can compare them.
- 19:21And so instead we can sometimes try to get evidence on sort
- 19:24of the pieces.
- 19:25So, but again, we basically often have very little
- 19:28information on why people end up participating in trials.
- 19:31And also,
- 19:34I think there's a growing number of methods,
- 19:36but there's still limited information on treatment effect
- 19:39heterogeneity.
- 19:40Individual randomized trials are almost never powered
- 19:43to detect subgroup effects.
- 19:45Although, there is really growing research in this field
- 19:48and that is maybe a topic for another day.
- 19:52Okay.
- 19:53But again, there is a little...
- 19:55I think I'll go through this really quickly, but,
- 19:58I will give credit to some fields which are trying to better
- 20:01understand kind of who are the people that enroll in trials
- 20:04and how they compare to policy populations of interest.
- 20:08So a lot of that has been done in sort of the substance
- 20:11use field.
- 20:12And you can see a bunch of cites here
- 20:14documenting that people who participate in randomized trials
- 20:18of substance use treatment do actually differ quite
- 20:22substantially from people seeking treatment for substance
- 20:25use problems more generally.
- 20:27So for example, the Okuda reference shows the eligibility criteria
- 20:32in cannabis treatment RCTs would exclude about 80%
- 20:36of patients across the U.S. seeking treatment
- 20:38for cannabis use.
- 20:40And so again, it's sort of there's indications
- 20:43that the people that participate in trials
- 20:45are not necessarily reflective of the people
- 20:48for whom decisions are having to be made.
- 20:54Okay, so hopefully that at least kind of give some
- 20:57motivation for why we want to think more carefully
- 21:01about the population average treatment effect
- 21:04and why we might wanna think about designing studies
- 21:06or analyzing data in ways that help us estimate that.
- 21:10Any questions before I move to, how do we do that?
- 21:19Okay.
- 21:20I will end...
- 21:21I'm gonna hopefully end it at about 12:45, 12:50,
- 21:24so we'll have time at the end, too.
- 21:27So, as a statistician, I feel obligated to say,
- 21:31and actually I have a quote on this at the very end
- 21:32of the talk.
- 21:33If we wanna be serious about estimating something,
- 21:36it's better to incorporate that through the design
- 21:38of our study, rather than trying to do it post hoc
- 21:41at the end.
- 21:44So let's talk briefly about how we can improve external
- 21:47validity through study or randomized trial design.
- 21:52So again,
- 21:53as I alluded to earlier with the sort of ideal experiment.
- 21:56An ideal scenario is one where we can randomly sample
- 21:59from a population and then randomly assign treatment
- 22:02and control conditions.
- 22:04Doing this will give us a formally unbiased treatment effect
- 22:07estimate in the population of interest.
- 22:10This is wonderful.
- 22:11I know of about six examples of this type.
- 22:17Most of the examples I know of are actually federal
- 22:19government programs where they are administered through
- 22:23like centers or sites.
- 22:25And the federal government was able to mandate participation
- 22:28in an evaluation.
- 22:29So classic example is the Head Start Impact Study,
- 22:33where they were able to randomly select Head Start centers
- 22:36to participate.
- 22:37And then within each center,
- 22:39they randomized kids to be able to get in off the wait list
- 22:42versus not.
- 22:44An Upward Bound evaluation had a very similar design.
- 22:48It's funny, I was...
- 22:50I gave a talk on this topic at Facebook and I was like,
- 22:52why is Facebook gonna care about this?
- 22:54Because you would think at a place like Facebook,
- 22:56they have their user sample,
- 22:59they should be able to do randomization within,
- 23:02like they should be able to pick users randomly
- 23:04and then do any sort of random assignment they want
- 23:06within that.
- 23:07It turns out it's more complicated than that, and so,
- 23:10they were interested in this topic,
- 23:12but I think that's another sort of example where people
- 23:15should be thinking, could we do this?
- 23:16Like,
- 23:18in a health system.
- 23:20I can imagine Geisinger or something implementing something
- 23:22in their electronic health record where
- 23:24it's about messaging or something.
- 23:26And you could imagine actually picking people randomly
- 23:29to then randomize.
- 23:31But again, that's pretty rare.
- 23:33There's an idea that's called purposive sampling.
- 23:35And this goes back to like the 1960s or 70s
- 23:39and the idea is sort of picking subjects purposefully.
- 23:44So one example here is like maybe we think
- 23:47that this intervention might look different
- 23:49or have different effects for large versus small
- 23:52school districts.
- 23:53So in our study, we just make an effort to enroll
- 23:56both large and small districts.
- 23:59This is sort of nice.
- 24:00It kind of gives you some variability in the types of people
- 24:05or subjects in the trial, but, it doesn't have the formal
- 24:09representativeness and sort of the formal unbiasedness,
- 24:12like the random sampling I just talked about.
- 24:15And then again, sort of similar is this idea and this push
- 24:17in many fields towards pragmatic or practical clinical
- 24:20trials, where the idea is just to sort of try to enroll
- 24:24like kind of a more representative sample
- 24:27in sort of a hand wavy way like I'm doing now.
- 24:29So it doesn't have this sort of formal statistical
- 24:31underpinning, but at least it's trying to make sure
- 24:35that it's not just patients from the Yale hospital
- 24:38and the Hopkins hospital and whatever sort of large medical
- 24:41centers, at least they might be trying to enroll patients
- 24:45from a broader spectrum across the U.S.
- 24:49Unfortunately, though, as much as I want to do things
- 24:53by design, often we're in a case where there's a study
- 24:56that's already been conducted and we are just
- 25:00sort of stuck analyzing it.
- 25:01And we wanna get a sense for how representative
- 25:04the results might be for a population.
- 25:09Sometimes people, when I talk about this,
- 25:10people are like, well, isn't this what meta-analysis does?
- 25:13Like meta-analysis enables you to combine multiple
- 25:16randomized trials and come up with sort of an overall
- 25:20effect estimate.
- 25:23And my answer to that is sort of yes maybe, or no maybe.
- 25:26Basically, the challenge with meta-analysis
- 25:30is that until recently, no one really had
- 25:34a well-defined target population.
- 25:35The field was not very formal about what the target population is.
- 25:38I think underlying that analysis is generally
- 25:41sort of a belief that the effects are constant
- 25:44and we're just trying to pool data.
- 25:48And it...
- 25:48And even just like, you can sort of see this,
- 25:50like if all of the trials sampled the same
- 25:52non-representative population,
- 25:54combining them is not going to help you get towards
- 25:57representativeness.
- 25:59That said, I have a former postdoc, Hwanhee Hong,
- 26:01who's now at Duke.
- 26:03And she has been doing some work to try to bridge
- 26:06these worlds and sort of really try to think through,
- 26:08well, how can we better use multiple trials
- 26:12to get to target population effects?
- 26:16There's another field called cross-design
- 26:18synthesis, or research synthesis.
- 26:21This is sort of neat.
- 26:22It's one where you kind of combine randomized trial data,
- 26:26which might be not representative with non-experimental
- 26:30study data.
- 26:31So sort of explicitly trading off the internal and external
- 26:34validity.
- 26:36I'm not gonna get into the details,
- 26:37there's some references here.
- 26:38Ellie Kaizar at Ohio State, is one of the people
- 26:41that's done a lot of work on this.
- 26:45And part of the reason I'm not focused on this is that
- 26:48I work in a lot of areas like education and public health,
- 26:53sort of social science areas,
- 26:54where we often don't have multiple studies.
- 26:56So we often are stuck with just one study and we're trying
- 27:00to use that to learn about target populations.
- 27:04So I'm gonna briefly talk about an example
- 27:07where we were trying to sort of do this.
- 27:12And basically, the fundamental idea is to re-weight
- 27:16the study sample to look like the target population.
- 27:21This idea is related to post stratification
- 27:25or, oh my gosh, I'm blanking now.
- 27:27Raking adjustments in surveys.
- 27:31So post stratification would be sort of at a simple level,
- 27:33would be something like...
- 27:35Well, if we know that males and females
- 27:38have different effects, or let's say young and old
- 27:41have different effects, let's estimate the effects
- 27:44separately for young versus old.
- 27:47And then re-weight those using the population proportions
- 27:51of sort of young versus old.
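As a sketch of that post-stratification logic, here is a simulation with made-up numbers: a trial that over-enrolls an "old" stratum (70%) relative to a target population that is only 40% old, with different true effects by stratum.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000

# Hypothetical trial: 'old' units are over-represented relative to
# the target population. All of these numbers are invented.
old = rng.random(n) < 0.7
treat = rng.random(n) < 0.5                  # randomized assignment
tau = np.where(old, 1.0, 3.0)                # true effect by stratum
y = 10 + 2 * old + treat * tau + rng.normal(0, 1, n)

def stratum_effect(mask):
    """Difference in treated vs. control means within one stratum."""
    return y[mask & treat].mean() - y[mask & ~treat].mean()

# Naive sample estimate reflects the trial's 70/30 mix of old vs. young.
naive = y[treat].mean() - y[~treat].mean()

# Post-stratified estimate: reweight stratum effects by POPULATION shares.
pop_old = 0.4
post_strat = pop_old * stratum_effect(old) + (1 - pop_old) * stratum_effect(~old)

print(naive, post_strat)   # target PATE is 0.4 * 1 + 0.6 * 3 = 2.2
```

With one binary moderator this works fine; as noted next, it breaks down once there are more than one or two categorical moderators, because the strata become too small and too numerous.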
- 27:54That sort of stratification doesn't work if you have more
- 27:58than like one or two categorical effect moderators.
- 28:02And so,
- 28:03what I'm gonna show today is an approach where we use
- 28:06weighting, where we fit a model,
- 28:08predicting participation in the trial,
- 28:10and then weight the trial sample to look like the target
- 28:13population.
- 28:14So similar idea to things like propensity score weights
- 28:17or non-response adjustment weights in samples.
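Here is a sketch of that weighting strategy on simulated data. The participation model is a tiny hand-rolled logistic regression (standing in for whatever fitting routine you would actually use), and all of the numbers and effect sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data: trial enrollees (S=1) skew high on a moderator x
# relative to a sample from the target population (S=0).
x_trial = rng.normal(1.0, 1.0, 500)
x_pop = rng.normal(0.0, 1.0, 5000)
x = np.concatenate([x_trial, x_pop])
s = np.concatenate([np.ones(500), np.zeros(5000)])

# Fit P(S=1 | x) with a minimal logistic regression via gradient ascent.
X = np.column_stack([np.ones_like(x), x])
beta = np.zeros(2)
for _ in range(2000):
    p = 1 / (1 + np.exp(-X @ beta))
    beta += 0.5 * X.T @ (s - p) / len(s)

# Weight each trial member by the odds of non-participation, (1 - p) / p,
# so the weighted trial sample resembles the target population on x.
p_trial = 1 / (1 + np.exp(-X[:500] @ beta))
w = (1 - p_trial) / p_trial

# Effects in the trial depend on x, so the naive sample ATE is off.
treat = rng.random(500) < 0.5
y = 5 + x_trial + treat * (2 + x_trial) + rng.normal(0, 1, 500)

naive = y[treat].mean() - y[~treat].mean()
weighted = (np.average(y[treat], weights=w[treat])
            - np.average(y[~treat], weights=w[~treat]))
print(naive, weighted)   # naive near 3 (sample ATE); weighted pulled toward 2 (PATE)
```

This odds-of-non-participation weighting is one common form of the idea; implementations differ in whether they use odds or inverse probabilities and in how they normalize the weights.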
- 28:21There is a different approach, too.
- 28:23So what I'm gonna illustrate today is sort of this sample
- 28:27selection weighting strategy.
- 28:29But you also can tackle this external validity problem
- 28:32by trying to model the outcome very flexibly
- 28:35and then project outcomes in the population.
- 28:40In some work I did with Jennifer Hill and others,
- 28:43we showed that BART, Bayesian Additive Regression Trees,
- 28:46can actually work quite well for that purpose.
- 28:49And more recently, Issa Dahabreh at Brown has done some
- 28:53nice work sort of bridging these two and showing
- 28:55basically a doubly robust kind of idea where we can use
- 28:58both the sample membership model and the outcome model
- 29:04to have better performance.
- 29:06But today, I'm gonna just illustrate the weighting approach,
- 29:08partly because it's a really nice sort of pedagogical
- 29:11example and helps you kind of see what's going on
- 29:14in the data.
- 29:16Okay, any questions before I continue?
- 29:21Okay.
- 29:22So the example I'm gonna use is...
- 29:26There was this, I mean, some of you probably know much more
- 29:28about HIV treatment than I do, but the ACTG Trial,
- 29:33which was now quite an old trial,
- 29:36but it was one of the ones that basically showed that
- 29:39HAART therapy, highly active antiretroviral therapy
- 29:42was quite effective at reducing time to AIDS or death
- 29:46compared to standard combination therapy at the time.
- 29:49So it randomized about 1200 U.S. HIV-positive adults
- 29:54to treatment versus control.
- 29:56And the intent-to-treat analysis in the trial
- 29:59had a hazard ratio of 0.51.
- 30:01So again, very effective at reducing time to AIDS or death.
- 30:07So Steve Cole and I then kind of asked the question, well,
- 30:10we don't necessarily just care about the people
- 30:13in the trial.
- 30:14This seems to be a very effective treatment.
- 30:16Could we use this data to project out
- 30:19sort of what the effects of the treatment would be
- 30:22if it were implemented nationwide?
- 30:25So we got estimates from the CDC of the number of people
- 30:28newly infected with HIV in 2006.
- 30:32And basically, asked the question sort of if hypothetically,
- 30:35everyone in that group were able to get HAART versus
- 30:40standard combination therapy,
- 30:42what would be the population impacts of this treatment?
- 30:48In this case, because of sort of data availability,
- 30:50we only had the joint distribution of age, sex and race
- 30:55for the population.
- 30:56So we made sort of a pseudo population, again,
- 30:59sort of representing the U.S. population
- 31:02of newly infected people.
- 31:03But again, all we have is sex, race and age,
- 31:06which I will come back to.
- 31:08So this table documents the trial and the population.
- 31:12So you can see for example,
- 31:15that the trial tended to have more sort of 30 to 39 year
- 31:20olds, many fewer people under 30.
- 31:25The trial had more males and also had more whites
- 31:29and fewer blacks, Hispanic was similar.
- 31:32But I wanna flag and we'll come back to this in a minute
- 31:35that, in what I'm gonna show,
- 31:38we can adjust for the age, sex, race distribution.
- 31:41But, there's a real limitation,
- 31:43which is that the CD4 cell count as sort of a measure
- 31:46of disease severity is not available in the population.
- 31:50So this is a potential effect moderator,
- 31:53which we don't observe in the population.
- 31:56So in sort of projecting the impacts, we can say, well,
- 31:59here is the predicted impact given the age, sex,
- 32:03race distribution, but there's this unobserved
- 32:06potential effect moderator that we sort of might be worried
- 32:09about kind of in the back of our heads.
- 32:15So again, I briefly mentioned this,
- 32:17this is like the super basic description
- 32:20of what can be done.
- 32:22There are more nuances and I have some citations at the end
- 32:24for sort of more details.
- 32:26But basically, fundamentally, again,
- 32:28we sort of think about it as we kind of stack
- 32:30our data sets together.
- 32:31So we put our trial sample and our population data set
- 32:34together.
- 32:35We have an indicator for whether someone is in the trial
- 32:38versus the population.
- 32:40And then, we're gonna weight the trial members
- 32:43by their inverse probability of being in the trial
- 32:46as a function of the observed covariates.
- 32:48And again, very similar intuition and ideas
- 32:51and theory underlying this as underlying things
- 32:55like Horvitz-Thompson estimation in sample surveys
- 32:58and inverse probability of treatment weighting
- 33:01in non-experimental studies.
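A minimal sketch of that stacking-and-weighting idea, using made-up counts and raw stratum frequencies in place of a fitted logistic regression (which is what you would use with many covariates):

```python
# Stacked data: s = 1 marks trial membership, x is one binary covariate.
# Counts are hypothetical: the trial is 30% stratum A, the population 70%.
trial = [("A", 1)] * 30 + [("B", 1)] * 70
population = [("A", 0)] * 700 + [("B", 0)] * 300
stacked = trial + population

def p_in_trial(stratum):
    """Estimated P(in trial | x = stratum) from the stacked data."""
    n_trial = sum(1 for x, s in stacked if x == stratum and s == 1)
    n_total = sum(1 for x, s in stacked if x == stratum)
    return n_trial / n_total

# Weight each trial member by the inverse odds of participation,
# (1 - p) / p, so the weighted trial sample mirrors the population.
weights = {stratum: (1 - p_in_trial(stratum)) / p_in_trial(stratum)
           for stratum in ("A", "B")}

weighted_A = 30 * weights["A"]
weighted_B = 70 * weights["B"]
share_A = weighted_A / (weighted_A + weighted_B)
print(round(share_A, 2))  # 0.7, matching the population's share of A
```

The same weights would then go into a weighted outcome analysis (e.g. a weighted hazard model in the ACTG example) to get the population-projected effect.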
- 33:06So I showed you earlier that age, sex and race
- 33:09are all related to participation in the trial.
- 33:13What I'm not showing you the details of,
- 33:15but just trust me is that those factors also moderate
- 33:19effects in the trial.
- 33:20So the trial showed the largest effects for those ages,
- 33:2430 to 39, males and black individuals.
- 33:28And so, this is exactly why we might think
- 33:31that the overall trial estimate might not reflect
- 33:34what we would see population-wide.
- 33:39Ironically though, it turns out actually
- 33:40it kind of all cancels out.
- 33:41So this table shows the estimated population effects.
- 33:45So the first row again, is just the sort of naive trial
- 33:48results.
- 33:50We can then sort of weight by each characteristic
- 33:52separately, and then the bottom row is the combined
- 33:56age, sex, race adjustments.
- 33:58And you can see sort of actually the hazard ratio
- 34:01was remarkably similar.
- 34:03It's partly because like the age weighting
- 34:05sort of makes the impact smaller,
- 34:07but then the race weighting makes it bigger.
- 34:10And so then it kind of just washes out.
- 34:13But again, it's sort of a nice example,
- 34:15cause you can sort of see how the patterns
- 34:17evolve based on the size of the effects
- 34:20and the sample selection.
- 34:23I also wanna point out though that, of course,
- 34:25the confidence interval is wider,
- 34:27and that is sort of reflecting the fact that we are doing
- 34:30this extrapolation from the trial sample to the population.
- 34:33And so there's sort of a variance price we'll pay for that.
- 34:39Okay.
- 34:40So I haven't been super formal on the assumptions,
- 34:44but I alluded to this.
- 34:45So I wanna just take a few minutes to turn
- 34:48to what about unobserved moderators?
- 34:50Because again, we can interpret this 0.57
- 34:54as the sort of overall population effect estimate
- 34:58only under an assumption that there are no unobserved
- 35:01moderators that differ between sample and population,
- 35:06once we adjust for age, sex, race.
- 35:11Okay, and in reality,
- 35:14such unobserved effect moderators are likely the rule,
- 35:17not the exception.
- 35:18So again, sort of, as I just said,
- 35:20the key assumption is that we've basically adjusted
- 35:23for all of the effect moderators.
- 35:26Very kind of comparable assumption to the assumption
- 35:30of no unobserved confounding in a non-experimental study.
- 35:35And one of the reasons this is an important assumption
- 35:38to think about, is that, it is quite rare actually
- 35:42to have extensive covariate data overlap
- 35:46between the sample and the population.
- 35:48I have been working in this area for...
- 35:51How many years now?
- 35:52At least 10 years.
- 35:53And I've found time and time again,
- 35:56across a number of content areas,
- 35:58that it is quite rare to have a randomized trial sample
- 36:01and the target population dataset
- 36:03with very many comparable measures.
- 36:06So in the Stuart and Rhodes paper,
- 36:08this was in like early childhood setting
- 36:12and each data set, the trial and the population data
- 36:15had like over 400 variables observed at baseline.
- 36:19There were literally only seven that were measured
- 36:22consistently between the two samples.
- 36:25So essentially we have very limited ability then to adjust
- 36:28for these factors because they just don't have much overlap.
- 36:32So that then motivated us to create some sensitivity
- 36:37analyses to basically probe and say, well,
- 36:40what if there is an unobserved effect moderator,
- 36:43how much would that change our population effect estimate?
- 36:47Again, this is very comparable to sensitivity analyses
- 36:51for unobserved confounding in non-experimental studies,
- 36:54sort of adapted for this purpose of trial-to-population
- 36:59generalizability.
- 37:03I think I can skip this in the interest of time and not go
- 37:06through all the details.
- 37:07If anyone wants the slides by the way,
- 37:08feel free to email me, I'm happy to send them.
- 37:13I'm gonna skip this too cause I've already said
- 37:15sort of the key assumption that is relevant for right now,
- 37:19but basically, for what we propose,
- 37:24I'm gonna talk about two cases.
- 37:26So the easier case is this one where we're gonna assume
- 37:29that the randomized trial observes all of the effect
- 37:32moderators.
- 37:33And the issue is that our target population dataset
- 37:36does not have some moderators observed.
- 37:41I think this is fairly realistic because I at least
- 37:43like to think that the people running the randomized trials
- 37:47have enough scientific knowledge and expertise
- 37:50that they sort of know what the likely effect moderators
- 37:52are and that they measure them in the trial.
- 37:55That is probably not fully realistic, but I'm...
- 37:58I like to give them sort of the benefit of the doubt
- 38:00on that.
- 38:01And that's sort of what the ACTG example
- 38:05was like: CD4 count would be an example of this,
- 38:07where we have CD4 count in the trial,
- 38:11but we just don't have it in the population.
- 38:14So what we showed is that there's actually,
- 38:16a couple of different ways you can implement
- 38:18this sort of sensitivity analysis.
- 38:22One is essentially kind of an outcome model based one
- 38:25where you,
- 38:28basically, just sort of specify a range
- 38:30for the unobserved moderator V in the population.
- 38:34So we kind of say, well, we don't know
- 38:36the distribution of this moderator in the population,
- 38:40but we're gonna guess that it's in some range.
- 38:43And then, we kind of project it out using data from the trial
- 38:48to understand like the extent of the moderation
- 38:51due to that variable.
- 38:53There's another variation on this,
- 38:55which is sort of the weighting variation
- 38:58where you kind of adjust the weights,
- 39:00essentially again for this unobserved moderator.
- 39:03Again, either way you sort of basically just have to specify
- 39:07a potential range for this V, the unobserved moderator
- 39:11in the population.
- 39:14So here's an example of that.
- 39:16This is a different example, where we were looking
- 39:18at the effects of a smoking cessation intervention
- 39:21among people in substance use treatment.
- 39:24And in the randomized trial, the mean addiction score
- 39:31was four.
- 39:33But we didn't have this addiction score,
- 39:35in the target population of interest.
- 39:37And so, what the sensitivity analysis allows us to do
- 39:40is to say, well, let's imagine that range is anywhere
- 39:44from three to five.
- 39:45And how much does that change our population effect
- 39:49estimates?
- 39:51Essentially, how steep this line is, is gonna
- 39:54sort of determine how much it matters.
- 39:57And the steepness of the line basically
- 39:59is how much of a moderator is it,
- 40:02sort of how much effect heterogeneity is there in the trial
- 40:05as a result of that variable.
- 40:07But again, this is at least one way to sort of turn
- 40:11this sort of worry about an unobserved moderator
- 40:13into a more formal statement about how much
- 40:16it really might matter.
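One way to picture that line: under a roughly linear moderation assumption, the projected population effect is the trial effect shifted by the interaction slope times the difference in the moderator's mean. The addiction-score numbers (trial mean of 4, population mean guessed to lie between 3 and 5) come from the example; the baseline effect and slope below are hypothetical, not results from the study:

```python
# Sketch of the outcome-model sensitivity analysis for a moderator V
# observed in the trial but unobserved in the population. Assumes
# roughly linear moderation; trial_effect and slope are hypothetical.

def projected_effect(trial_effect, slope, v_mean_trial, v_mean_pop):
    """Project the trial effect to a population with a different mean of V."""
    return trial_effect + slope * (v_mean_pop - v_mean_trial)

trial_effect = -2.0   # hypothetical effect at the trial's mean addiction score
slope = 0.5           # hypothetical moderation per unit of addiction score
v_trial = 4.0         # mean addiction score in the trial (from the example)

# Sweep the hypothesized population mean over the plausible range 3 to 5.
for v_pop in (3.0, 4.0, 5.0):
    print(v_pop, projected_effect(trial_effect, slope, v_trial, v_pop))
# A steeper slope (stronger moderation) makes the projection more
# sensitive to the unknown population mean of V.
```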
- 40:21I'm not gonna get into this much,
- 40:22but you might also be thinking, well,
- 40:24what if the trial doesn't know what all the moderators are?
- 40:27And what if there's some fully unobserved moderator
- 40:31that we'll call U?
- 40:34This is a much, much harder problem, basically;
- 40:36if anyone wants to try to dig into it, that would be great.
- 40:39Part of the reason it's harder is because you have to make
- 40:42very strong assumptions about the distribution
- 40:44of the observed covariates and U together.
- 40:48We put out one approach,
- 40:49but it is a fairly special case and not very general.
- 40:53So again, hopefully we're not in this sort of scenario
- 40:56very often.
- 41:01This is a little bit of a technicality,
- 41:03but often epidemiologists ask this question.
- 41:05So I've laid stuff out again with respect to kind of a risk
- 41:09difference or a difference in outcomes
- 41:12and sort of like more of like an additive treatment scale.
- 41:15There is this real complication that arises,
- 41:17which is that if you have like a binary outcome,
- 41:20like the scale of the outcome matters in terms of effect
- 41:25moderation.
- 41:26And in particular, there might be sort of more apparent
- 41:30effect heterogeneity on one scale versus another.
- 41:33So I'm just kind of flagging this, that like this exists,
- 41:37there are some people sort of looking at this in a more
- 41:39formal way, but again for now sort of just think about like a risk
- 41:44difference kind of scale.
- 41:47Okay, great.
- 41:48So let me just conclude with a few kind of final thoughts.
- 41:51So, I think all of us, not all of us,
- 41:54but often we sort of want to assume that study results
- 41:58generalize.
- 41:58Often people write a discussion section in a paper,
- 42:01where they kind of qualitatively have some sentences
- 42:05about why they do or don't think that the results
- 42:08in this paper kind of extend to other groups
- 42:10or other populations.
- 42:13But I think until the past again, sort of five or so years,
- 42:16a lot of that discussion was very hand-wavy
- 42:19and sort of qualitative.
- 42:21I think that what we are seeing in epidemiology
- 42:24and statistics and biostatistics
- 42:26recently has been a push towards having more
- 42:29ability to quantify this and make sort of more formal
- 42:33statements.
- 42:35So I think if we do wanna be serious though,
- 42:37about assessing and enhancing external validity,
- 42:41again, we really need these different pieces.
- 42:43We need information on the factors that influence effect
- 42:46heterogeneity, the moderators.
- 42:49We need information on the factors that influence
- 42:51participation in rigorous studies like randomized trials.
- 42:55And we need data on all of those things,
- 42:57in the trial and the population.
- 43:00And then finally, we need statistical methods that allow us
- 43:04to use that data to estimate population treatment effects.
- 43:08I would argue that that last bullet is sort of much further
- 43:12along than any of the others.
- 43:13That in my experience,
- 43:15the limiting factor is usually not the methods.
- 43:19The limiting factor at this point in time is the data
- 43:22and sort of the scientific knowledge
- 43:25about these different factors.
- 43:29And that's what this slide is.
- 43:30So I think I've already said, but that again,
- 43:33is sort of one of the motivations for the sensitivity
- 43:35analysis: just a recognition that it's often
- 43:39really quite hard to get data that
- 43:42is consistently measured between a trial and a population.
- 43:47So on that point, recommendations again,
- 43:49if we wanna be serious about effect heterogeneity
- 43:51or about estimating population treatment effects,
- 43:55we need better information on treatment effect heterogeneity
- 43:59that might be better analysis of existing trials,
- 44:02that might be meta-analysis of existing trials.
- 44:05That might also be theoretical models for the interventions
- 44:07to understand what the likely moderators are.
- 44:12We also need better information on the factors
- 44:14that influence participation in trials and more discussion
- 44:17of how trial samples are selected.
- 44:22We need to standardize measures.
- 44:23So again, it's incredibly frustrating when you have trial
- 44:26and population data, but the measures in them are not
- 44:30consistent.
- 44:31There are methods that can be used for this,
- 44:33some data harmonization approaches,
- 44:36but, they require assumptions.
- 44:39It's better if we can be thoughtful and strategic about,
- 44:42for example, common measures across studies.
- 44:45I will say one of the frustrations too,
- 44:47is that in some fields like the early childhood data
- 44:51I talked about,
- 44:52part of the problem was like the two data sets might
- 44:55actually have the same measure,
- 44:56but they didn't give the raw data,
- 44:58and they like standardized the scales differently.
- 45:01Like they standardized them to their own population,
- 45:03not sort of more generally.
- 45:05And so they weren't sort of on the same scale in the end.
- 45:10As a statistician, of course, I will say we do need more
- 45:12research on the methods and understanding when they work
- 45:15and when they don't.
- 45:16There are some pretty strong assumptions
- 45:19in these approaches.
- 45:20But again, I think that sort of in some ways,
- 45:24that is further along than some of the data situations.
- 45:29So I just wanted to take one minute to flag some current
- 45:32work, partly in case anyone wants to ask questions about
- 45:34these.
- 45:36One thing I'm kind of excited about,
- 45:38especially in my education world is...
- 45:42So what I've been talking about today has mostly been,
- 45:44if we have a trial sample and we wanna project
- 45:46to kind of a larger target population.
- 45:49But there's an equally interesting question,
- 45:51which is sort of how well can a randomized trial inform
- 45:54local decision making?
- 45:56So if we have a randomized trial with 60 schools in it,
- 46:01how well can the results from that trial be used to inform
- 46:04individual school districts' decisions?
- 46:07Turns out, not particularly well.
- 46:09(laughs)
- 46:10We can talk more about that.
- 46:12I mentioned earlier, Issa Dahabreh, who's at Brown,
- 46:15and he's really interested in developing sort of the formal
- 46:18theories underlying different ways of estimating
- 46:21these population effects, again, including some
- 46:23doubly robust approaches.
- 46:26Trang Nguyen, who works at Hopkins with me,
- 46:29we are still looking at sort of the sensitivity analysis
- 46:32for unobserved moderators.
- 46:34I mentioned Hwanhee Hong already, who's now at Duke.
- 46:37And she, again, sort of straddles the meta-analysis world
- 46:40in this world, which has some really interesting
- 46:43connections.
- 46:45My former student Ben Ackerman, who's at Flatiron Health
- 46:48as of a few months ago,
- 46:50did some work on sort of measurement error
- 46:53and sort of partly how to deal with some of these
- 46:55measurement challenges between the sample and population.
- 47:00And then I'll just briefly mention Daniel Westreich at UNC,
- 47:04who is really...
- 47:05If you come from sort of more of an epidemiology world,
- 47:09Daniel has some really nice papers that are sort of trying
- 47:11to translate these ideas to epidemiology,
- 47:14and this concept of what he calls target validity.
- 47:17So sort of rather than thinking about internal and external
- 47:20validity separately, and as potentially,
- 47:23in kind of conflict with each other,
- 47:26instead really think carefully about a target of inference
- 47:29and then thinking of internal and external validity
- 47:31sort of within that and not sort of trying to prioritize
- 47:35one over the other.
- 47:37And then just an aside, one thing,
- 47:40I would love to do more in the coming years is thinking
- 47:43about combining experimental and non-experimental evidence.
- 47:46I think that is probably where it would be very beneficial
- 47:49to go, sort of more of that cross-design synthesis
- 47:52kind of idea.
- 47:55But again, I wanna conclude with this,
- 47:57which gets us back to design, and that again,
- 48:01sort of what is often the limiting factor here is the data
- 48:04and just sort of strong designs.
- 48:07So Rubin, 2005: with better data, fewer assumptions
- 48:10are needed. And then Light, Singer and Willett,
- 48:13who are sort of big education methodologists.
- 48:16You can't fix by analysis what you've bungled by design.
- 48:19So again, just wanna highlight that if we wanna be serious
- 48:22about estimating population effects,
- 48:24we need to be serious about that in our study designs,
- 48:27both in terms of who we recruit,
- 48:30but then also what variables we collect on them.
- 48:32But if we do that,
- 48:33I think that we can have the potential to really help guide
- 48:37policy and practice by thinking more carefully
- 48:39about the populations that we care about.
- 48:43So for more...
- 48:44Here's this, there's my email, if you wanna email me
- 48:47for the slides.
- 48:49And thanks to various funders, and then I'll leave this up
- 48:53for a couple minutes,
- 48:55which are all big, tiny font, some of the references,
- 48:59but then I'll take that down in a minute so that we can see
- 49:01each other more.
- 49:02So thank you, and I'm very happy to take some questions.
- 49:14I don't know if you all have a way to organize
- 49:16or people just can
- 49:19jump in.
- 49:24- So maybe I'll ask the question.
- 49:25Thanks Liz, for this very interesting and great talk.
- 49:29So I noticed that you've talked about the target population
- 49:34in this framework.
- 49:35And I think there are situations where the population sample
- 49:39is actually a survey from a larger population.
- 49:43- Yeah.
- 49:44- Cause we cannot really afford to observe everything
- 49:47in the actual population, which will contain
- 49:49like millions of individuals.
- 49:50And so in that situation, does the framework still apply
- 49:55particularly in terms of the sensitivity analysis?
- 49:58And is there any caveat that we should also know in dealing
- 50:01with those data?
- 50:03- Great question.
- 50:05And actually, thank you for asking that because I forgot
- 50:07to mention that Ben Ackerman's dissertation,
- 50:10also looked at that.
- 50:11So I mentioned his measurement error stuff.
- 50:13But yes, actually, so Ben's second dissertation paper
- 50:17did exactly that, where we sort of laid out the theory
- 51:21for when the target population data
- 50:24comes from a complex survey itself.
- 50:29Short answer is yes, it all still works.
- 50:31Like you have to use the weights, there are some nuances,
- 50:34but, and you're right, like essentially,
- 50:36especially like in...
- 50:38Like for representing the U.S. population, often, the data
- 50:41we have is like the National Health Interview Survey
- 50:44or the Add Health Survey of Adolescents,
- 50:47which are these complex surveys.
- 50:49So short answer is, yeah, it still can work.
- 50:53Your question about the sensitivity analysis is actually
- 50:55a really good one and we have not extended...
- 50:58I'd have to think, I don't know, off hand, like,
- 51:00I think it would be sort of straightforward to extend
- 51:04the sensitivity analysis to that, but we haven't actually
- 51:07done it.
- 51:08- Thanks Liz.
- 51:11The other short question is that I noticed that
- 51:12in your slide, you first define PATE as population ATE,
- 51:16but then in one slide you have this TATE,
- 51:19which I assume is target ATE.
- 51:21And so, I'm just really curious as to like, is there any,
- 51:25like differences or nuances in the choice of this
- 51:27terminology?
- 51:29- Good question.
- 51:30And no, yeah, I'm not...
- 51:31I wasn't very precise with that, but in my mind, no.
- 51:35Over time I've been trying to use TATE,
- 51:38but you can see that kind of just by default,
- 51:40I still sometimes use PATE.
- 51:43Part of the reason I use TATE is because I think
- 51:46the target is just a slightly more general term.
- 51:48Like people sometimes think,
- 51:50if we say PATE, the population has to be like
- 51:53the U.S. population or some like very sort of big,
- 51:58very official population in some sense.
- 52:01Whereas, the target average treatment effect,
- 52:04TATE terminology, I think reflects that sometimes
- 52:06it's just a target group that's well-defined.
- 52:10- Gotcha.
- 52:11Thanks, that's very helpful.
- 52:12And I think we have a question coming from the chat as well.
- 52:15- Yeah, I just saw that.
- 52:16So I can read that.
- 52:17We have theory for inference from a sample to a target
- 52:20population, and well-defined internal validity approaches;
- 52:23what theory is there for connecting the internal validity
- 52:25methods to external validity?
- 52:29So I think, what you mean is sort of,
- 52:33what is the formal theory for projecting the impact
- 52:37to the target population?
- 52:38That is exactly what some of those people that I referenced
- 52:41sort of lay out.
- 52:42Like I didn't...
- 52:42For this talk, I didn't get into all the theoretical weeds,
- 52:45but if you're interested in that stuff,
- 52:46probably some of Issa Dahabreh's work would be the most
- 52:49relevant to look at.
- 52:51Cause he really lays out sort of the formal theory.
- 52:54I mean, some of my early papers on this topic did it,
- 52:58but his is like a little bit more formal and sort of makes
- 53:01connections to the doubly robust literature
- 53:04and things like that.
- 53:04And so it's really...
- 53:06Anyway, that's what this whole literature
- 53:08and part of it is sort of building is that theoretical base
- 53:11for doing this.
- 53:17Any other questions?
- 53:28- [Ofer] Liz,
- 53:29I'm Ofer Harel.
- 53:30- Oh, hi Ofer?
- 53:31- [Ofer] Hi.
- 53:33(mumbles)
- 53:34Just jump on the corridor, so it's make it great.
- 53:39So in most of the studies that I would work on,
- 53:43they don't really have a great idea about
- 53:46what really the population is and how really to measure
- 53:50those.
- 53:51So it's great if I have some measure of the population,
- 53:54but most of the time it is the studies that I work.
- 53:57I have no real measurements on that population.
- 54:02What happens then?
- 54:03- Yeah, great question.
- 54:04And in part, I meant to say this,
- 54:06but that's one of the reasons why the analogy...
- 54:08Why the design strategies don't always work particularly
- 54:10well is like, especially when you're just starting out
- 54:13a study, right?
- 54:14We don't really know the target population.
- 54:17I think certainly to do any of these procedures,
- 54:21you need eventually to have a well defined population.
- 54:25But I think that's partly why some of the analysis
- 54:27approaches are useful is that,
- 54:29you might have multiple target populations.
- 54:31Like we might have one trial,
- 54:33and we might be interested in saying,
- 54:35how well does this generalize to the State of New Hampshire
- 54:39or the State of Vermont or the State of Connecticut?
- 54:41And so, you could imagine one study that's used to inform
- 54:45multiple target populations.
- 54:48With different assumptions,
- 54:49sort of you have to think through the assumptions
- 54:50for each one.
- 54:52If you don't even,
- 54:54I guess I would say if you don't even know
- 54:56who your population is, you shouldn't be using these methods
- 54:59at all, cause like the whole premise is that there is some
- 55:02well-defined target population and you do need data on it
- 55:05or at least...
- 55:07Yeah, the joint distribution of some covariates
- 55:09or something.
- 55:10Without that, you're kind of just like,
- 55:13I don't know, what a good analogy is,
- 55:15but you're kinda just like guessing at everything.
- 55:24(mumbles)
- 55:26- No, go ahead.
- 55:27Go ahead.
- 55:29- Oh, Vinod, yeah.
- 55:30All my friends are popping up, it's great.
- 55:32(laughs)
- 55:34- [Vinod] Can I go ahead?
- 55:35I feel like I'm talking to someone.
- 55:39- Yeah, go ahead Vinod.
- 55:40- [Vinod] That was a great talk.
- 55:42So I have a little ill formulated question,
- 55:44but it's queuing after just the last question
- 55:47that was asked is,
- 55:49in clinical set populations where,
- 55:55in some ways we're using these clinical samples
- 55:58to learn about the population because unless they seek help,
- 56:02we often don't know what they are in the wild, so to speak.
- 56:05And so, each sampling of that clinical population
- 56:09is maybe a biased sampling of that larger population
- 56:13in the wild.
- 56:14So I guess my question is, how do you get around this,
- 56:18I guess Rumsfeld problem, which is every time you sample
- 56:22there's this unknown, unknown, but there's no way to get
- 56:24at them because in some ways, your sampling relies on...
- 56:27If we could say it relies on help seeking,
- 56:30which is by itself a process.
- 56:33And if we could just stipulate, there's no way to get
- 56:35around that.
- 56:36How do you see this going forward?
- 56:40- Yeah, good question.
- 56:40I think right, particularly relevant in mental health
- 56:43research where there's a lot of people who are not seeking
- 56:46treatment.
- 56:47These methods are not gonna help with that in a sense
- 56:50like again, they are gonna be sort of tuned to whatever
- 56:53population you have.
- 56:55I think though there are...
- 56:57If you really wanna be thoughtful about that
- 57:00problem, that's where sort of some of the strategies
- 57:03that were used like the Epidemiologic Catchment Area
- 57:05Surveys, where they would go door to door and knock on doors
- 57:08and do diagnostic interviews.
- 57:11Like if we wanna be really serious about trying to reach
- 57:14everyone and get an estimate of the really sort of true
- 57:17population, then we really have to tackle that
- 57:20very creatively and with a lot of resources probably.
- 57:25- [Vinod] Thanks.
- 57:27- Welcome.
- 57:29- Hi Liz?
- 57:30Yeah, it's gonna be a true question and great talk
- 57:33by the way.
- 57:35I'm curious, you mentioned there could be a slight
- 57:38difference between the terms transportability
- 57:40and generalizability.
- 57:41Yeah, I'm curious about that.
- 57:43- Yeah, briefly, this is a little bit of a...
- 57:48What's the word?
- 57:48Simplification, but briefly I think of generalizability
- 57:51as one where the sample that, like the trial sample
- 57:55is a proper subset of the population.
- 57:57So we do a trial in New Hampshire,
- 58:01and we're trying to generalize to new England.
- 58:04Whereas transportability is one where it is not a proper
- 58:08subset, so we do a trial in the United States
- 58:10and we wanna transport to Europe.
- 58:14Underlying both, the reason I don't worry too much about it,
- 58:17the terms is because either way,
- 58:19the assumption is essentially the same.
- 58:21Like you still have to make this assumption about
- 58:23no unobserved moderators.
- 58:25It's just that it's probably gonna be a stronger assumption
- 58:28and harder to believe,
- 58:30when transporting rather than when generalizing.
- 58:33Cause you sort of know that you're going from one place
- 58:36to another in some sense.
- 58:39- Thanks, makes sense.
- 58:41- Sure.
- 58:43- I think there's another question in the chat.
- 58:45- Yeah, so this is a great question.
- 58:46I'm glad shows you on.
- 58:48I hope I got that.
- 58:50It seems there are multiple ways to calculate the TATE,
- 58:53from standardization to weighting to the outcome model.
- 58:55Do you have comments for their performance under different
- 58:57circumstances?
- 58:58Great question, and I don't.
- 59:01I mean, there has been...
- 59:02This is an area where I think
- 59:04it'd be great to have more research on this topic.
- 59:06So I have this one paper with Holger Kern and Jennifer Hill
- 59:09where we sort of did try to kind of explore that.
- 59:14And honestly, what we found not surprisingly
- 59:16is that if that no unmeasured moderator assumption holds,
- 59:20all the different methods are pretty good and fine.
- 59:23And like, we didn't see much difference in them.
- 59:25If that no unobserved moderator assumption doesn't hold
- 59:28then of course, none of them are good.
- 59:29So it sort of is like similar to propensity score world.
- 59:33Like, the data you have is more important than what you do
- 59:35with the data in a sense.
- 59:38But anyway, I think that that is something that like,
- 59:40we need a lot more work on.
- 59:42One thing, for example, I do have a student working on this.
- 59:45Like, we're trying to see if your sample
- 59:47is a tiny proportion of the population, like how...
- 59:51Cause like there's differences.
- 59:52That's one where like weighting might not work as well
- 59:54actually, who knows.
- 59:56Anyways, so like all of these different data scenarios,
- 59:58I think need a lot more investigation to have better
- 01:00:01guidance on when the different methods work well.
- 01:00:09Anything else or maybe we're out of time?
- 01:00:11I don't know, how tight you are at one o'clock.
- 01:00:20- I think we're at an hour, so let's...