BIS Seminar: Dealing with observed and unobserved effect moderators when estimating population average treatment effects

September 22, 2020

Information

Elizabeth Stuart

Associate Dean for Education

Bloomberg Professor of American Health

ID5661

00:00- Maybe one or two minutes and then,
00:02I'll have you introduced.
00:03- And it's about, and so I...
00:05And it's gonna be more fun for me if it's a little
00:07interactive, as much as we can make it.
00:09So I won't be able to see all of you nodding and whatnot,
00:12but please feel free to jump in.
00:15And the talk's gonna be pretty non-technical.
00:17My goal is mostly to sort of help
00:19convey some of the concepts and ideas and so I will.
00:23Hopefully it will be a reasonable topic to do via Zoom.
00:30Great, so I think,
00:33Frank basically gave this stuff that's relevant
00:36on this slide.
00:37I do also wanna apologize, those of you guys
00:39who I was supposed to meet with this morning, we have a...
00:41My husband broke his collarbone over the weekend.
00:44So I've had to cancel things this morning,
00:47but I'm glad I'm able to still do this seminar,
00:51I didn't wanna,
00:52have to cancel that.
00:54So again,
00:56the topic is gonna be sort of this idea of external
00:59validity, which I think is a topic that people often
01:01are interested in because it's the sort of thing
01:04that we often think sort of qualitatively about,
01:06but there hasn't been a lot of work thinking about it
01:08quantitatively.
01:09So again, my goal today will be to sort of help
01:11give a framework for thinking about external validity
01:15in sort of a more formal way.
01:19So let's start out with the sorts of questions
01:22that might be relevant when you're thinking about
01:25external validity.
01:27So it might be research questions like a health insurer
01:30is deciding whether or not to approve some new treatment
01:34for back pain.
01:36They might be interested in predicting overall population
01:39impacts of a broad public health media campaign.
01:43A physician practice might be deciding whether training
01:46providers in a new intervention would actually be cost
01:49effective given the patient population that they have.
01:53And then I felt like I needed to get some COVID
01:55example in...
01:57But, for example, a healthcare system,
01:59might wanna know whether sort of giving convalescent
02:02plasma to all of the individuals recently diagnosed
02:06with COVID-19 in their system, whether that would
02:08sort of lead to better outcomes overall.
02:12So all of these...
02:15What I'm distinguishing here or sort of trying to convey
02:17is that all of these reflect what I will call a population
02:20average treatment effect.
02:22So across some well-defined population,
02:25does some intervention work sort of on average.
02:28The population might be pretty narrow.
02:30Again, it might be the patients in one particular
02:33physician practice, or might be quite broad.
02:35It could be everyone in the State of Connecticut
02:38or in the entire country.
02:40But either way, it's a well-defined kind of population
02:44and we'll come back to that.
02:46What's really important,
02:48and this will sort of underlie much of the talk
02:50is that kind of the whole point is that there might
02:52be underlying treatment effect heterogeneity.
02:55So there might be some individuals
02:57for whom this treatment of interest is actually
02:59more effective than others.
03:01But what I wanna be clear about, is the goal of inference
03:04that I'm talking about today, is gonna be about
03:07this overall population average.
03:09So we're not trying to say like which people
03:11are gonna benefit more or sort of to which people
03:14should we give this treatment.
03:16It's really more a question of sort of more population
03:20level decisions, sort of if we have...
03:22If we're making a decision, that's sort of a policy
03:24kind of population level,
03:25on average is this gonna be something that makes sense.
03:28So I hope that distinction makes sense.
03:30I'm happy to come back to that.
03:35So again until I don't know, five or,
03:38well maybe now more than 10 years ago,
03:41there had been relatively little attention
03:43to the question of how well results from
03:46kind of well-designed studies like a randomized trial
03:50might carry over to a relevant target population.
03:53I think in much of statistics as well as fields
03:56like education research, public policy, even healthcare,
04:00there's really been a focus on randomized trials
04:03and getting internal validity,
04:05and I'll formalize this in a minute.
04:07But in the past 10 or so years, there's been more and more
04:10interest in this idea of how well can we take the results
04:13from a particular study and then project them
04:17to a well-defined target population.
04:20And again, so today I'm gonna try to give
04:21sort of an overview of the thinking in this area,
04:24along with some of the limitations and in particular,
04:27the data limitations that we have in thinking about this.
04:33One thing I do wanna be clear about is there's a lot
04:36of reasons why results from randomized trials
04:38might not generalize.
04:40There's some classic examples in education
04:42where there are scale-up problems.
04:44The classic example is one looking at
04:50class size.
04:51And so, in Tennessee, they randomly assigned kids
04:54to be in smaller versus larger classes
04:57and found quite large effects of smaller classes.
05:00But then, when the State of California tried to implement
05:03this, the problem is that you need a lot more teachers
05:06to kind of roll that out statewide.
05:08And so, it led actually to a different pool of teachers
05:11being hired.
05:12And so, there's sort of scale-up problems
05:14sometimes with the interventions and that might lead
05:16to different contexts or different implementation.
05:19Today, what I'm gonna be focusing on are differences
05:21between a sample and a population.
05:25So, differences in sort of baseline characteristics
05:28that moderate treatment effects.
05:29And again, I'll formalize this a little bit as we go along.
05:33Just as a little bit of an aside,
05:34but in case some of you know this field a little bit,
05:37just to give you a little, just...
05:39I wanna flag this.
05:40Some people might use the term transportability.
05:43So some of the literature in this field uses the term
05:46transportability.
05:47I tend to use generalizability.
05:50There's some subtle differences between the two,
05:52which we can come back to, but for all intents and purposes,
05:55like you basically can think of them interchangeably
05:59for now.
06:00I also wanna note, if any of you kind of come
06:02from like a survey world, these debates about
06:06kind of how well a particular sample reflects a target
06:09population are exactly, not exactly the same,
06:12but very similar to the debates happening in the survey
06:15world around non-probability samples and sort of concerns
06:19about,
06:21the use of like say online surveys and things that might not
06:25have a true formal sort of survey sampling design,
06:28and sort of some of the concerns that arise about
06:31generalizability.
06:32So there's this whole parallel literature in the survey
06:34world.
06:35Andrew Mercer has a nice summary of that.
06:37Again, I'm happy to talk more about that.
06:41Okay, any questions before I keep going?
06:49Okay.
06:49So let me formalize kind of what we're talking about
06:52a little bit.
06:53This is...
06:55This framework is now, 12 years old.
06:59Time goes quickly.
07:01But, just to formalize what we're interested in.
07:05The goal is to estimate, again, this what I'll call
07:07a population average treatment effect or PATE.
07:10And so here,
07:12hopefully you're familiar with sort of potential outcomes
07:14and causal inference.
07:16But the idea is that we have some well-defined population
07:19of size N.
07:20And Y(1) are the potential outcomes, if people
07:24in that population receive the treatment condition
07:28of interest.
07:29Y(0) are the outcomes if they receive the control
07:32or comparison condition of interest.
07:34So here, we're just saying we're interested
07:35in the average effect, basically sort of the difference
07:40in potential outcomes, average across the population.
07:46We could be doing this with risk ratios
07:49or odds ratios or something.
07:51Those are a little more complicated because the math
07:53doesn't work as nicely.
07:55So for now think about it more like risk differences
07:57or something, if you have a binary outcome,
08:00the same fundamental points hold.
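
Written out in potential-outcomes notation, the estimand just described looks roughly like the following. This is a sketch: the unit index i is added here for illustration, and the second line assumes a binary outcome, where the average difference is a risk difference.

```latex
\begin{align*}
\mathrm{PATE} &= \frac{1}{N}\sum_{i=1}^{N}\bigl[Y_i(1) - Y_i(0)\bigr] \\
\text{(binary outcome)}\quad \mathrm{PATE} &= \Pr\{Y(1)=1\} - \Pr\{Y(0)=1\}
\end{align*}
```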
08:03So I'm not gonna tell you right now where
08:05the data we have came from, but imagine that we just
08:08have a simple estimate of this PATE,
08:11as the difference in means of some outcome
08:14between an observed treated group and an observed
08:16control group.
08:17So again, we see that there's a bunch of people
08:20who got treated, a bunch of people who got control,
08:22and we might estimate this PATE as just the simple
08:25difference in means between again, the treatment group
08:28and the control group.
08:29So what I wanna talk through for the next couple of minutes,
08:32is the bias in this sort of naive estimate of the PATE.
08:36So we'll call that Delta.
08:38So I'm being a little loose with notation here,
08:40but sort of the bias, essentially
08:43think of it as sort of the difference between
08:45the true population effect and our naive estimate of it.
08:49And what this paper did with Gary King and Kosuke Imai,
08:54we sort of laid out how different choices of study designs
08:58impact the size of this bias.
09:01And in particular, we showed that sort of under
09:03some simplifying situations,
09:05sort of mathematical simplicity,
09:07you can decompose that overall bias into four pieces.
09:11So the two Delta S terms are what are called,
09:15what we call sample selection bias.
09:17So basically, the bias that comes in if our data sample
09:22is not representative of the target population
09:25that we care about.
09:27The Delta T terms are our typical sort of confounding bias.
09:31So bias that comes in if our treatment group is dissimilar
09:36from our control group.
09:38The X refers to the variables we observe,
09:40and the U refers to variables that we don't observe.
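
One way to write down the decomposition described here is shown below. The symbol D for the naive difference in means and the sign convention for Delta are notational assumptions made for this sketch. The additive form holds only under the simplifying conditions mentioned in the talk.

```latex
\begin{align*}
D &= \bar{Y}_{\text{treated}} - \bar{Y}_{\text{control}} \\
\Delta &= \mathrm{PATE} - D \\
\Delta &= \underbrace{\Delta_{S_X} + \Delta_{S_U}}_{\text{sample selection}}
        + \underbrace{\Delta_{T_X} + \Delta_{T_U}}_{\text{treatment selection (confounding)}}
\end{align*}
```

Here the subscript S marks sample selection, T marks treatment selection, X marks observed variables, and U marks unobserved variables.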
09:45So what we then did in the paper,
09:46and this is sort of what motivates a lot of this work
09:49is to think through these, again, the trade offs
09:51in these different designs.
09:53And essentially what we're trying to sort of point out
09:56is that...
09:59Let's go to the second row of this table first actually,
10:01a typical experiment.
10:02So a typical experiment, I would say is one where
10:06we kind of take whoever comes in the door,
10:08we kind of try to recruit people for a randomized trial,
10:11whether that's schools or patients or whatever it is.
10:16And we randomized them to treatment and control groups.
10:19So that is our typical randomized experiment.
10:22The treatment selection bias in that case is zero.
10:26In expectation, that's why we like randomized experiments.
10:29In expectation, there is no confounding
10:32and we get an unbiased treatment effect estimate
10:34for the sample at hand.
10:37The problem for population inference
10:40is that the Delta S terms might be big,
10:43because the people that agree to be in a randomized trial,
10:46might be quite different from the overall population
10:49that we care about.
10:51So in this paper, we're trying to just sort of...
10:53In some ways, be a little provocative and point this out
10:56that our standard thinking about study designs
10:59and sort of our prioritization of randomized trials,
11:03implicitly prioritizes internal validity over external
11:07validity.
11:08And in particular, if we really care about
11:12population effects, we really should be thinking about
11:15these together and trying to sort of have small
11:18sample selection bias and small treatment selection bias.
11:22So an ideal experiment would be one where we can randomly
11:25select people for our trial.
11:28Let's say we have...
11:30Well, actually, I'll come back to that in a second.
11:31Randomly select people for our trial and then randomly
11:34assign people to treatment or control groups.
11:37And in expectation, we will have zero bias in our population
11:41effect estimate.
11:42But these other designs, and again,
11:44like a typical experiment might end up having larger bias
11:47overall, than a well designed non-experimental study,
11:51where if we do a really good job like adjusting
11:54for confounders,
11:55it may be that well done non-experimental study
11:59conducted using say the electronic health records
12:02from a healthcare system might actually give us lower bias
12:06for a population effect estimate
12:08than does a non-representative small randomized trial.
12:12Again, a little provocative,
12:13but I think useful to be thinking about what is really our
12:17target of inference and how do we get data that is most
12:19relevant for that.
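
The contrast between the ideal and the typical experiment can be illustrated with a small simulation. This is a minimal sketch with made-up numbers, not anything from the paper: the population, the binary moderator `x`, the effect of `1 + x`, and the enrollment probabilities are all hypothetical.

```python
import random


def simulate_designs(seed=0, n_pop=100_000, n_ideal=5_000):
    """Compare an 'ideal' and a 'typical' experiment on one simulated population.

    Every number here is a hypothetical choice made for illustration.
    """
    rng = random.Random(seed)

    # Population with a binary moderator x; the individual effect is 1 + x.
    population = []
    for _ in range(n_pop):
        x = 1 if rng.random() < 0.5 else 0
        y0 = rng.gauss(0.0, 1.0)  # potential outcome under control
        y1 = y0 + 1.0 + x         # potential outcome under treatment
        population.append((x, y0, y1))
    pate = sum(y1 - y0 for _, y0, y1 in population) / n_pop

    def randomize_and_estimate(sample):
        # Coin-flip treatment assignment, then a simple difference in means.
        treated, control = [], []
        for _, y0, y1 in sample:
            if rng.random() < 0.5:
                treated.append(y1)
            else:
                control.append(y0)
        return sum(treated) / len(treated) - sum(control) / len(control)

    # Ideal experiment: simple random sample from the population.
    ideal = rng.sample(population, n_ideal)
    # Typical experiment: units with x = 1 are nine times as likely to enroll.
    typical = [unit for unit in population
               if rng.random() < (0.09 if unit[0] == 1 else 0.01)]

    return pate, randomize_and_estimate(ideal), randomize_and_estimate(typical)


pate, ideal_estimate, typical_estimate = simulate_designs()
print(f"PATE {pate:.2f}, ideal {ideal_estimate:.2f}, typical {typical_estimate:.2f}")
```

Both trials are randomized, so both are fine for their own samples. Only the one with random selection into the trial lands near the population effect, because enrollment in the other depends on the effect moderator.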
12:22I will also just as a small aside,
12:24maybe a little on the personal side,
12:26but it's been striking to me in the past two days.
12:28So my husband broke his collarbone over the weekend.
12:31And it turns out the break is one where there's a little bit
12:35of debate about whether you should have surgery or not.
12:38Although kind of recent thinking is that
12:39there should be surgery.
12:40And I was doing a PubMed search as a good statistician
12:44public health person whose family member
12:47needs medical treatment.
12:49And I found all these randomized trials that actually
12:52randomized people to get surgery or not.
12:55And then I came home...
12:56Oh, no, I didn't come home, we were home all the time.
12:59I asked my husband later, I was like,
13:00would you ever agree to be randomized?
13:02Like right now, we are trying to make this decision about,
13:05should you have surgery or not.
13:07And would we ever agree to be randomized?
13:09And he's like, no, we wouldn't.
13:11We're gonna go with what the physician recommends
13:15and what we feel is comfortable.
13:16And it really just hit home for me at this point that
13:19the people who agree to be randomized or the context
13:22under which we can sort of randomize
13:26are sometimes fairly limited.
13:28And again, so partly what this body of research is trying
13:31to do is sort of think through what are the implications
13:33of that when we do wanna make population inferences.
13:38Make sense so far?
13:39I can't see faces, so hopefully.
13:43Okay.
13:47So,
13:48I will say a lot of my work in this area has actually,
13:50in part been just helping or trying to raise awareness
13:53of thinking about external validity bias.
13:56So some of the research in this area has been trying
14:00to understand how big of a problem is this.
14:03If maybe people don't agree to be in randomized trials
14:06very often,
14:07but maybe that doesn't really cause bias in terms
14:10of our population effect estimates.
14:12So what I've done in a couple of papers, and there are
14:15other cites on this slide, is basically trying to formalize
14:18this and it's pretty intuitive, but basically we show,
14:22and I'm not showing you the formulas here.
14:24But intuitively, there will be bias in a population effect
14:28estimate essentially if participation in the trial
14:33is associated with the size of the impacts.
14:35So in particular,
14:38what I'll call the external validity bias.
14:39So,
14:40those Delta S terms kind of the bias
14:42due to the lack of representativeness
14:45is a function of the variation of the probabilities
14:48of participating in a trial,
14:50variation in treatment effects,
14:52and then the correlation between those things.
14:54So if constant...
14:56If we have constant treatment effects
14:58or the treatment effect is zero
14:59or is two for everyone, there's gonna be no external
15:02validity bias.
15:03It doesn't matter who is in our study.
15:06Or if there...
15:08If everyone has an equal probability of participating
15:10in the study, we really do have a nice random selection,
15:14then again, there's gonna be no external validity bias.
15:17Or if the factors that influence whether or not you
15:20participate in the study are independent of the factors
15:23that moderate treatment effects,
15:25again, there'll be no external validity bias.
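
The three no-bias cases can be checked numerically. The sketch below is not the formula from the slides. It relies on a standard large-sample identity: if unit i joins the trial with probability p_i and has effect tau_i, the expected trial-sample average effect is roughly sum(p*tau)/sum(p), so its gap from the population average equals Cov(p, tau) / mean(p). The function names and the toy numbers are hypothetical.

```python
from statistics import fmean


def external_validity_bias(p, tau):
    """Approximate expected trial-sample average effect minus the population average.

    p[i] is unit i's participation probability, tau[i] its treatment effect.
    """
    expected_sample_effect = sum(pi * ti for pi, ti in zip(p, tau)) / sum(p)
    return expected_sample_effect - fmean(tau)


def covariance_form(p, tau):
    """The same quantity written as Cov(p, tau) / mean(p)."""
    p_bar, tau_bar = fmean(p), fmean(tau)
    cov = fmean([(pi - p_bar) * (ti - tau_bar) for pi, ti in zip(p, tau)])
    return cov / p_bar


# Participation correlated with effect size: bias appears.
correlated = external_validity_bias([0.1, 0.1, 0.4, 0.4], [1, 1, 3, 3])
# Constant treatment effects: no bias, whoever enrolls.
constant_effects = external_validity_bias([0.1, 0.1, 0.4, 0.4], [2, 2, 2, 2])
# Equal participation probabilities: no bias.
equal_probabilities = external_validity_bias([0.2, 0.2, 0.2, 0.2], [1, 1, 3, 3])
# Participation varies but is unrelated to effect size: no bias.
uncorrelated = external_validity_bias([0.1, 0.4, 0.1, 0.4], [1, 1, 3, 3])
```

The covariance form makes the three ingredients visible: variation in participation, variation in effects, and the correlation between them. If any one is zero, the bias is zero.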
15:29The problem is that we often have very limited information
15:32about these pieces.
15:34We, as a field, I think medicine, public health, education,
15:38all the fields I worked in, there has not been much
15:41attention paid to these processes of how we actually
15:44enroll people in studies.
15:46And so it's hard to know kind of what factors relate
15:49to those and if those then also moderate treatment effects.
15:53(phone ringing)
15:54Oops, sorry.
15:55Incoming phone call, which I will ignore.
15:58So,
15:59there has been...
16:01Sorry.
16:03There has been a little bit of work trying to document this
16:05in real data and find empirical evidence on these sizes.
16:11The problem, and sorry, some of the...
16:13Some of you might...
16:14If any of you are familiar with, like,
16:16what's called the within-study comparison
16:18literature.
16:19So there's this whole literature on non-experimental studies
16:23that sort of try to estimate the bias due to non-random
16:28treatment assignment.
16:30This is sort of analogous to that.
16:32But the problem here is that what you need is you need
16:34an accurate estimate of the impact in the population.
16:37And then you also need sort of estimates of the impact
16:40in samples that are sort of obtained in kind of typical
16:44ways.
16:45So that's actually really hard to do.
16:47So I'll just briefly talk through two examples.
16:49And if any of you have data examples that you think might
16:52sort of be useful for generating evidence,
16:55that would be incredibly useful.
16:57So one of the examples is...
17:00So let me back up for a second.
17:02In the field of mental health research,
17:03there's been a push recently, or actually not so much
17:06recently in the past, like 10, 15 years
17:08to do what I call or what are called pragmatic trials
17:12with the idea of enrolling much more...
17:16A much broader set of people, using a broader set of practices
17:21or locations around the country.
17:23And so what this Wisniewski et al people did was they took
17:27the data from one of those large pragmatic trials.
17:29And the idea they...
17:30Again, the idea was that it should be more representative
17:33of people in this case with depression
17:35across the U.S.
17:37And then, they said, well, what if...
17:38In fact, we didn't have that.
17:40What if we use sort of our normal study inclusion
17:44and exclusion criteria, and sort of then, we'd, like, subset
17:47this pragmatic trial data to the people that we think
17:50would have been more typically included in a sort of more
17:53standard randomized trial.
17:55And sort of not surprisingly, they found that
17:58the people in the sort of what they call
17:59the efficacy sample, the sort of typical trial sample,
18:03had better outcomes and larger treatment effects
18:05than the overall pragmatic trial sample as a whole.
18:10We did something similar sort of in education research where
18:15it's a little bit in the weeds.
18:16I don't really wanna get into the details,
18:18but we essentially had a pretty reasonable regression
18:22discontinuity design.
18:23So we were able to get estimates of the effects of this
18:26Reading First intervention across a number of states.
18:30And we then compared those statewide impact estimates
18:34to the estimates you would get if we enrolled only
18:38the sorts of schools and school districts that are typically
18:41included in educational evaluations.
18:44And there we found that this external validity bias
18:48was about 0.1 standard deviations,
18:50which in education world is fairly large.
18:53Certainly people would be concerned about an internal
18:56validity bias of that size.
18:58So we were able to sort of use this to say, look,
19:00if we really wanna be serious about external validity,
19:03it might be as much of a problem as sort of typical internal
19:06validity bias that people care about in that field.
19:13So again, the problem though, is we don't usually
19:15have these sorts of designs where we have a population
19:17effect estimate, and then sample estimates,
19:19and we can compare them.
19:21And so instead we can sometimes try to get evidence on sort
19:24of the pieces.
19:25So, but again, we basically often have very little
19:28information on why people end up participating in trials.
19:31And we also are having,
19:34I think there's growing numbers of methods,
19:36but there's still limited information on treatment effect
19:39heterogeneity.
19:40Individual randomized trials are almost never powered
19:43to detect subgroup effects.
19:45Although, there is really growing research in this field
19:48and that is maybe a topic for another day.
19:52Okay.
19:53But again, there is a little...
19:55I think I'll go through this really quickly, but,
19:58I will give credit to some fields which are trying to better
20:01understand kind of who are the people that enroll in trials
20:04and how do they compare to policy populations of interest.
20:08So a lot of that has been done in sort of the substance
20:11use field.
20:12And you can see a bunch of cites here
20:14documenting that people who participate in randomized trials
20:18of substance use treatment do actually differ quite
20:22substantially from people seeking treatment for substance
20:25use problems more generally.
20:27So for example, in the Okuda reference, the eligibility criteria
20:32in cannabis treatment RCTs would exclude about 80%
20:36of patients across the U.S. seeking treatment
20:38for cannabis use.
20:40And so again, it's sort of there's indications
20:43that the people that participate in trials
20:45are not necessarily reflective of the people
20:48for whom decisions are having to be made.
20:54Okay, so hopefully that at least kind of gives some
20:57motivation for why we want to think more carefully
21:01about the population average treatment effect
21:04and why we might wanna think about designing studies
21:06or analyzing data in ways that help us estimate that.
21:10Any questions before I move to, how do we do that?
21:19Okay.
21:20I will end...
21:21I'm gonna hopefully end at about 12:45, 12:50,
21:24so we'll have time at the end, too.
21:27So, as a statistician, I feel obligated to say,
21:31and actually I have a quote on this at the very end
21:32of the talk.
21:33If we wanna be serious about estimating something,
21:36it's better to incorporate that through the design
21:38of our study, rather than trying to do it post hoc
21:41at the end.
21:44So let's talk briefly about how we can improve external
21:47validity through study or randomized trial design.
21:52So again,
21:53as I alluded to earlier with the sort of ideal experiment.
21:56An ideal scenario is one where we can randomly sample
21:59from a population and then randomly assign treatment
22:02and control conditions.
22:04Doing this will give us a formally unbiased treatment effect
22:07estimate in the population of interest.
22:10This is wonderful.
22:11I know of about six examples of this type.
22:17Most of the examples I know of are actually federal
22:19government programs where they are administered through
22:23like centers or sites.
22:25And the federal government was able to mandate participation
22:28in an evaluation.
22:29So classic example is the Head Start Impact Study,
22:33where they were able to randomly select Head Start centers
22:36to participate.
22:37And then within each center,
22:39they randomized kids to be able to get in off the wait list
22:42versus not.
22:44An Upward Bound evaluation had a very similar design.
22:48It's funny, I was...
22:50I gave a talk on this topic at Facebook and I was like,
22:52why is Facebook gonna care about this?
22:54Because you would think at a place like Facebook,
22:56they have their user sample,
22:59they should be able to do randomization within,
23:02like they should be able to pick users randomly
23:04and then do any sort of random assignment they want
23:06within that.
23:07It turns out it's more complicated than that, and so,
23:10they were interested in this topic,
23:12but I think that's another sort of example where people
23:15should be thinking, could we do this?
23:16Like,
23:18in a health system.
23:20I can imagine Geisinger or something implementing something
23:22in their electronic health record where
23:24it's about messaging or something.
23:26And you could imagine actually picking people randomly
23:29to then randomize.
23:31But again, that's pretty rare.
23:33There's an idea that's called purposive sampling.
23:35And this goes back to like the 1960s or 70s
23:39and the idea is sort of picking subjects purposefully.
23:44So one example here is like maybe we think
23:47that this intervention might look different
23:49or have different effects for large versus small
23:52school districts.
23:53So in our study, we just make an effort to enroll
23:56both large and small districts.
23:59This is sort of nice.
24:00It kind of gives you some variability in the types of people
24:05or subjects in the trial, but, it doesn't have the formal
24:09representativeness and sort of the formal unbiasedness,
24:12like the random sampling I just talked about.
24:15And then again, sort of similar is this idea and this push
24:17in many fields towards pragmatic or practical clinical
24:20trials, where the idea is just to sort of try to enroll
24:24like kind of more representative sample
24:27in sort of a hand wavy way like I'm doing now.
24:29So not, it doesn't have this sort of formal statistical
24:31underpinning, but at least it's trying to make sure
24:35that it's not just patients from the Yale hospital
24:38and the Hopkins hospital and whatever sort of large medical
24:41centers, at least they might be trying to enroll patients
24:45from a broader spectrum across the U.S.
24:49Unfortunately, though, as much as I want to do things
24:53by design, often we're in a case where there's a study
24:56that's already been conducted and we are just
25:00sort of stuck analyzing it.
25:01And we wanna get a sense for how representative
25:04the results might be for a population.
25:09Sometimes people, when I talk about this,
25:10people are like, well, isn't this what meta-analysis does?
25:13Like meta-analysis enables you to combine multiple
25:16randomized trials and come up with sort of an overall
25:20effect estimate.
25:23And my answer to that is sort of yes maybe, or no maybe.
25:26Basically, the challenge with meta-analysis,
25:30is that until recently, no one really had a potential target
25:34population.
25:35It was not very formal about what the target population is.
25:38I think underlying meta-analysis is generally
25:41sort of a belief that the effects are constant
25:44and we're just trying to pool data.
25:48And it...
25:48And even just like, you can sort of see this,
25:50like if all of the trials sampled the same
25:52non-representative population,
25:54combining them is not going to help you get towards
25:57representativeness.
25:59That said, I have a former postdoc, Hwanhee Hong,
26:01who's now at Duke.
26:03And she has been doing some work to try to bridge
26:06these worlds and sort of really try to think through,
26:08well, how can we better use multiple trials
26:12to get to target population effects?
26:16There's another field that's called cross-design
26:18synthesis or research synthesis.
26:21This is sort of neat.
26:22It's one where you kind of combine randomized trial data,
26:26which might be not representative with non-experimental
26:30study data.
26:31So sort of explicitly trading off the internal and external
26:34validity.
26:36I'm not gonna get into the details,
26:37there's some references here.
26:38Ellie Kaizar at Ohio State, is one of the people
26:41that's done a lot of work on this.
26:45And part of the reason I'm not focused on this is that
26:48I work in a lot of areas like education and public health,
26:53sort of social science areas,
26:54where we often don't have multiple studies.
26:56So we often are stuck with just one study and we're trying
27:00to use that to learn about target populations.
27:04So I'm gonna briefly talk about an example
27:07where we were trying to sort of do this.
27:12And basically, the fundamental idea is to re-weight
27:16the study sample to look like the target population.
27:21This idea is related to post stratification
27:25or, oh my gosh, I'm blanking now.
27:27Raking adjustments in surveys.
27:31So post stratification would be sort of at a simple level,
27:33would be something like...
27:35Well, if we know that males and females
27:38have different effects, or let's say young and old
27:41have different effects, let's estimate the effects
27:44separately for young versus old.
27:47And then re-weight those using the population proportions
27:51of sort of young versus old.
27:54That sort of stratification doesn't work if you have more
27:58than like one or two categorical effect moderators.
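
The post-stratification idea for a single categorical moderator can be sketched as below. The records, the stratum labels, and the population shares are hypothetical, and the function names are introduced only for this illustration.

```python
from collections import defaultdict


def stratum_effects(records):
    """Difference in mean outcome (treated minus control) within each stratum.

    records: iterable of (stratum, treated, outcome) tuples from the trial.
    """
    cells = defaultdict(lambda: [0.0, 0, 0.0, 0])  # treated sum, n, control sum, n
    for stratum, treated, outcome in records:
        cell = cells[stratum]
        if treated:
            cell[0] += outcome
            cell[1] += 1
        else:
            cell[2] += outcome
            cell[3] += 1
    return {s: c[0] / c[1] - c[2] / c[3] for s, c in cells.items()}


def post_stratified_effect(effects, population_shares):
    """Average the stratum effects using population (not trial) proportions."""
    if abs(sum(population_shares.values()) - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    return sum(effects[s] * share for s, share in population_shares.items())


# Hypothetical trial: mostly young participants.
trial = [
    ("young", 1, 5.0), ("young", 1, 3.0), ("young", 0, 2.0), ("young", 0, 4.0),
    ("old", 1, 9.0), ("old", 0, 6.0),
]
effects = stratum_effects(trial)  # young: 4 - 3, old: 9 - 6
# Hypothetical target population: mostly old.
post_stratified = post_stratified_effect(effects, {"young": 0.3, "old": 0.7})
```

With several moderators the number of cells multiplies and many of them end up empty or tiny in the trial, which is the limitation noted above.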
28:02And so,
28:03what I'm gonna show today is an approach where we use
28:06weighting, where we fit a model,
28:08predicting participation in the trial,
28:10and then weight the trial sample to look like the target
28:13population.
28:14So similar idea to things like propensity score weights
28:17or non-response adjustment weights in samples.
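
A minimal sketch of the weighting idea follows, under some loudly stated assumptions. It swaps in a saturated participation model (cell frequencies in the stacked data) for a fitted regression model. It treats the trial and the population file as separate data sets. It uses inverse-odds-of-participation weights, which is one common choice. The right weight form depends on how the two data sets relate, and the data below are hypothetical.

```python
from collections import Counter


def participation_weights(trial_x, population_x):
    """Inverse-odds weights from a saturated model of trial participation.

    trial_x and population_x are lists of covariate cells (any hashable label).
    """
    n_trial = Counter(trial_x)
    n_pop = Counter(population_x)
    weights = []
    for x in trial_x:
        # Estimated P(in trial | x) in the stacked trial + population data.
        p = n_trial[x] / (n_trial[x] + n_pop[x])
        weights.append((1.0 - p) / p)
    return weights


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_effect(treated, outcomes, weights):
    """Weighted difference in means between treated and control trial members."""
    t = [(y, w) for z, y, w in zip(treated, outcomes, weights) if z]
    c = [(y, w) for z, y, w in zip(treated, outcomes, weights) if not z]
    return (weighted_mean([y for y, _ in t], [w for _, w in t])
            - weighted_mean([y for y, _ in c], [w for _, w in c]))


# Hypothetical trial with one categorical covariate.
trial_x = ["young", "young", "young", "young", "old", "old"]
treated = [1, 1, 0, 0, 1, 0]
outcomes = [5.0, 3.0, 2.0, 4.0, 9.0, 6.0]
# Hypothetical target population file: 30% young, 70% old.
population_x = ["young"] * 3 + ["old"] * 7

weights = participation_weights(trial_x, population_x)
reweighted_effect = weighted_effect(treated, outcomes, weights)
old_share = (sum(w for x, w in zip(trial_x, weights) if x == "old")
             / sum(weights))
```

After weighting, the trial's covariate mix matches the population file (`old_share` moves from one third to 0.7), and the effect estimate shifts toward the stratum that was underrepresented in the trial.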
28:21There is a different approach.
28:23So what I'm gonna illustrate today is sort of this sample
28:27selection weighting strategy.
28:29You also can tackle this external validity
28:32by trying to model the outcome very flexibly
28:35and then project outcomes in the population.
28:40In some work I did with Jennifer Hill and others,
28:43we showed that BART, Bayesian Additive Regression Trees,
28:46can actually work quite well for that purpose.
28:49And more recently, Issa Dahabreh at Brown has done some
28:53nice work sort of bridging these two and showing
28:55basically a doubly robust kind of idea where we can use
28:58both the sample membership model and the outcome model
29:04to have better performance.
29:06But today, I'm gonna just illustrate the weighting approach,
29:08partly because it's a really nice sort of pedagogical
29:11example and helps you kind of see what's going on
29:14in the data.
29:16Okay, any questions before I continue?
29:21Okay.
29:22So the example I'm gonna use is...
29:26There was this, I mean, some of you probably know much more
29:28about HIV treatment than I do, but the ACTG Trial,
29:33which is now quite an old trial,
29:36but it was one of the ones that basically showed that
29:39HAART therapy, highly active antiretroviral therapy
29:42was quite effective at reducing time to AIDS or death
29:46compared to standard combination therapy at the time.
29:49So it randomized about 1200 U.S. HIV positive adults
29:54to treatment versus control.
29:56And the intent-to-treat analysis in the trial
29:59had a hazard ratio of 0.51.
30:01So again, very effective at reducing time to AIDS or death.
30:07So Steve Cole and I though kind of asked the question, well,
30:10we don't necessarily just care about the people
30:13in the trial.
30:14This seems to be a very effective treatment.
30:16Could we use this data to project out
30:19sort of what the effects of the treatment would be
30:22if it were implemented nationwide?
30:25So we from CDC got estimates of the number of people
30:28newly infected with HIV in 2006.
30:32And basically, asked the question sort of if hypothetically,
30:35everyone in that group were able to get HAART versus
30:40standard combination therapy,
30:42what would be the population impacts of this treatment?
30:48In this case, because of sort of data availability,
30:50we only had the joint distribution of age, sex and race
30:55for the population.
30:56So we made sort of a pseudo population, again,
30:59sort of representing the U.S. population
31:02of newly infected people.
31:03But again, all we have is sex, race and age,
31:06which I will come back to.
31:08So this table documents the trial and the population.
31:12So you can see for example,
31:15that the trial tended to have more sort of 30 to 39 year
31:20olds, many fewer people under 30.
31:25The trial had more males and also had more whites
31:29and fewer blacks, Hispanic was similar.
31:32But I wanna flag and we'll come back to this in a minute
31:35that, in what I'm gonna show,
31:38we can adjust for the age, sex, race distribution.
31:41But, there's a real limitation,
31:43which is that the CD4 cell count as sort of a measure
31:46of disease severity is not available in the population.
31:50So this is a potential effect moderator,
31:53which we don't observe in the population.
31:56So in sort of projecting the impacts, we can say, well,
31:59here is the predicted impact given the age, sex,
32:03race distribution, but there's this unobserved
32:06potential effect moderator that we sort of might be worried
32:09about kind of in the back of our heads.
32:15So again, I briefly mentioned this,
32:17this is like the super basic description
32:20of what can be done.
32:22There are more nuances and I have some cites at the end
32:24for sort of more details.
32:26But basically, fundamentally, again,
32:28we sort of think about it as we kind of stack
32:30our data sets together.
32:31So we put our trial sample and our population data set
32:34together.
32:35We have an indicator for whether someone is in the trial
32:38versus the population.
32:40And then, we're gonna weight the trial members
32:43by their inverse probability of being in the trial
32:46as a function of the observed covariates.
32:48And again, very similar intuition and ideas
32:51and theory underlying this as underlying things
32:55like Horvitz-Thompson estimation in sample surveys
32:58and inverse probability of treatment weighting
33:01in non-experimental studies.
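The stacking-and-weighting idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis from the ACTG work: it uses a single categorical covariate, made-up toy data, and a cell-by-cell (saturated) estimate of participation rather than a fitted model. The function names `selection_weights` and `weighted_effect` are hypothetical. Each trial member's weight is the population share of their covariate cell divided by the trial share of that cell, which is proportional to the inverse probability of being in the trial when the population data set stands in for the whole target population.

```python
from collections import Counter


def selection_weights(trial_x, pop_x):
    """Return one weight per trial member: the population share of that
    member's covariate cell divided by the trial share of the same cell."""
    trial_counts = Counter(trial_x)
    pop_counts = Counter(pop_x)
    n_trial, n_pop = len(trial_x), len(pop_x)
    # A population cell with no trial members cannot be weighted up
    # (a positivity problem); this sketch does not handle that case.
    return [
        (pop_counts[x] / n_pop) / (trial_counts[x] / n_trial)
        for x in trial_x
    ]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_effect(y, treated, weights):
    """Weighted difference in mean outcome, treated minus control."""
    t = [(yi, wi) for yi, ti, wi in zip(y, treated, weights) if ti == 1]
    c = [(yi, wi) for yi, ti, wi in zip(y, treated, weights) if ti == 0]
    return weighted_mean(*zip(*t)) - weighted_mean(*zip(*c))


# Toy data (hypothetical): the trial over-represents the "old" cell,
# where the treatment effect is larger (3 versus 1 in the "young" cell).
trial_x = ["young", "young", "old", "old", "old", "old", "old", "old"]
treated = [1, 0, 1, 1, 1, 0, 0, 0]
y = [1.0, 0.0, 3.0, 3.0, 3.0, 0.0, 0.0, 0.0]
pop_x = ["young"] * 50 + ["old"] * 50  # target population is half young

weights = selection_weights(trial_x, pop_x)
naive = weighted_effect(y, treated, [1.0] * len(y))
reweighted = weighted_effect(y, treated, weights)
print(round(naive, 3), round(reweighted, 3))
```

In this toy setup the unweighted trial estimate is 2.5, while the reweighted estimate is 2.0, which matches the average effect in a population that is half young and half old. The adjustment only works for moderators that are observed in both data sets.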
33:06So I showed you earlier that age, sex and race
33:09are all related to participation in the trial.
33:13What I'm not showing you the details of,
33:15but just trust me is that those factors also moderate
33:19effects in the trial.
33:20So the trial showed the largest effects for those aged
33:2430 to 39, males and black individuals.
33:28And so, this is exactly why we might think
33:31that the overall trial estimate might not reflect
33:34what we would see population-wide.
33:39Ironically though, it turns out actually
33:40it kind of all cancels out.
33:41So this table shows the estimated population effects.
33:45So the first row again, is just the sort of naive trial
33:48results.
33:50We can then sort of weight by each characteristic
33:52separately, and then the bottom row is the combined
33:56age, sex, race adjustments.
33:58And you can see sort of actually the hazard ratio
34:01was remarkably similar.
34:03It's partly because the age weighting
34:05sort of makes the impact smaller,
34:07but then the race weighting makes it bigger.
34:10And so then it kind of just washes out.
34:13But again, it's sort of a nice example,
34:15cause you can sort of see how the patterns
34:17evolve based on the size of the effects
34:20and the sample selection.
34:23I also wanna point out though that, of course,
34:25the confidence interval is wider,
34:27and that is sort of reflecting the fact that we are doing
34:30this extrapolation from the trial sample to the population.
34:33And so there's sort of a variance price we'll pay for that.
34:39Okay.
34:40So I haven't been super formal on the assumptions,
34:44but I alluded to this.
34:45So I wanna just take a few minutes to turn
34:48to what about unobserved moderators?
34:50Because again, we can interpret this 0.57
34:54as the sort of overall population effect estimate
34:58only under an assumption that there are no unobserved
35:01moderators that differ between sample and population,
35:06once we adjust for age, sex, race.
35:11Okay, and in reality,
35:14such unobserved effect moderators are likely the rule,
35:17not the exception.
35:18So again, sort of, as I just said,
35:20the key assumption is that we've basically adjusted
35:23for all of the effect moderators.
35:26Very kind of comparable assumption to the assumption
35:30of no unobserved confounding in a non-experimental study.
35:35And one of the reasons this is an important assumption
35:38to think about, is that, it is quite rare actually
35:42to have extensive covariate data overlap
35:46between the sample and the population.
35:48I have been working in this area for...
35:51How many years now?
35:52At least 10 years.
35:53And I've found time and time again,
35:56across a number of content areas,
35:58that it is quite rare to have a randomized trial sample
36:01and the target population dataset
36:03with very many comparable measures.
36:06So in the Stuart and Rhodes paper,
36:08this was in like an early childhood setting
36:12and each data set, the trial and the population data
36:15had like over 400 variables observed at baseline.
36:19There were literally only seven that were measured
36:22consistently between the two samples.
36:25So essentially we have very limited ability then to adjust
36:28for these factors because they just don't have much overlap.
36:32So that then motivated us to create some sensitivity
36:37analysis to basically probe and say, well,
36:40what if there is an unobserved effect moderator,
36:43how much would that change our population effect estimate?
36:47Again, this is very comparable to analysis of sensitivity
36:51to unobserved confounding in non-experimental studies,
36:54sort of adapted for this purpose of trial-to-population
36:59generalizability.
37:03I think I can skip this in the interest of time and not go
37:06through all the details.
37:07If anyone wants the slides by the way,
37:08feel free to email me, I'm happy to send them.
37:13I'm gonna skip this too cause I've already said
37:15sort of the key assumption that is relevant for right now,
37:19but basically what we propose is,
37:24I'm gonna talk about two cases.
37:26So the easier case is this one where we're gonna assume
37:29that the randomized trial observes all of the effect
37:32moderators.
37:33And the issue is that our target population dataset
37:36does not have some moderators observed.
37:41I think this is fairly realistic because I at least
37:43like to think that the people running the randomized trials
37:47have enough scientific knowledge and expertise
37:50that they sort of know what the likely effect moderators
37:52are and that they measure them in the trial.
37:55That is probably not fully realistic, but I'm...
37:58I like to give them sort of the benefit of the doubt
38:00on that.
38:01And that's sort of what the ACTG example was,
38:05like CD4 count would be an example of this,
38:07where we have CD4 count in the trial,
38:11but we just don't have it in the population.
38:14So what we showed is that there's actually,
38:16a couple of different ways you can implement
38:18this sort of sensitivity analysis.
38:22One is essentially kind of an outcome model based one
38:25where you,
38:28basically, we just sort of specify a range
38:30for the unobserved moderator V in the population.
38:34So we kind of say, well, we don't know
38:36the distribution of this moderator in the population,
38:40but we're gonna guess that it's in some range.
38:43And then, we kind of projected out using data from the trial
38:48to understand like the extent of the moderation
38:51due to that variable.
38:53There's another variation on this,
38:55which is sort of the weighting variation
38:58where you kind of adjust the weights,
39:00essentially again for this unobserved moderator.
39:03Again, either way you sort of basically just have to specify
39:07a potential range for this V, the unobserved moderator
39:11in the population.
39:14So here's an example of that.
39:16This is a different example, where we were looking
39:18at the effects of a smoking cessation intervention
39:21among people in substance use treatment.
39:24And in the randomized trial, the mean addiction score
39:31was four.
39:33But we didn't have this addiction score,
39:35in the target population of interest.
39:37And so, what the sensitivity analysis allows us to do
39:40is to say, well, let's imagine that range is anywhere
39:44from three to five.
39:45And how much does that change our population effect
39:49estimates?
39:51Essentially, how steep this line is is gonna
39:54sort of determine how much it matters.
39:57And the steepness of the line basically
39:59is how much of a moderator is it,
40:02sort of how much effect heterogeneity is there in the trial
40:05as a result of that variable.
40:07But again, this is at least one way to sort of turn
40:11this sort of worry about an unobserved moderator
40:13into a more formal statement about how much
40:16it really might matter.
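The sensitivity analysis described above can be sketched as follows. This is a simplified illustration rather than the published method: it assumes the treatment effect varies linearly with the unobserved moderator V, so that the population effect equals the trial effect plus the moderation slope times the difference between the assumed population mean of V and the trial mean of V. The trial mean of 4 and the range of 3 to 5 come from the talk; the trial effect and the moderation slope below are made-up numbers, and the function name `projected_effect` is hypothetical.

```python
def projected_effect(trial_effect, moderation_slope, v_trial_mean, v_pop_mean):
    """Population effect implied by an assumed population mean of V,
    under a linear treatment-by-V interaction."""
    return trial_effect + moderation_slope * (v_pop_mean - v_trial_mean)


TRIAL_EFFECT = 0.10       # hypothetical effect estimate from the trial
MODERATION_SLOPE = -0.02  # hypothetical change in effect per unit of V
V_TRIAL_MEAN = 4.0        # mean addiction score in the trial (from the talk)

# Assumed range for the mean of V in the target population (from the talk).
grid = [3.0, 3.5, 4.0, 4.5, 5.0]
curve = {
    v: projected_effect(TRIAL_EFFECT, MODERATION_SLOPE, V_TRIAL_MEAN, v)
    for v in grid
}

for v, effect in curve.items():
    print(f"assumed population mean of V = {v}: projected effect = {effect:.3f}")
```

The slope of the resulting line is the moderation slope itself, so a variable that barely moderates the effect in the trial produces a nearly flat line, and the projected effect is then insensitive to what is assumed about V in the population.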
40:21I'm not gonna get into this partly,
40:22so you might also be thinking, well,
40:24what if the trial doesn't know what all the moderators are?
40:27And what if there's some fully unobserved moderator
40:31that will call U?
40:34This is a much much harder, basically,
40:36if anyone wants to try to dig into it, that would be great.
40:39Part of the reason it's harder is because you have to make
40:42very strong assumptions about the distribution
40:44of the observed covariates and U together.
40:48We put out one approach,
40:49but it is a fairly special case and not very general.
40:53So again, hopefully we're not in this sort of scenario
40:56very often.
41:01This is a little bit of a technicality,
41:03but often epidemiologists ask this question.
41:05So I've laid stuff out again with respect to kind of a risk
41:09difference or a difference in outcomes
41:12and sort of like more of like an additive treatment scale.
41:15There is this real complication that arises,
41:17which is that if you have like a binary outcome,
41:20the scale of the outcome matters in terms of effect
41:25moderation.
41:26And in particular, there might be sort of more apparent
41:30effect heterogeneity on one scale versus another.
41:33So I'm just kind of flagging this, that like this exists,
41:37there are some people sort of looking at this more
41:39formally, but again, for now, sort of just think about like a risk
41:44difference kind of scale.
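A small numeric illustration of the scale issue, using made-up risks for two hypothetical subgroups: whether there appears to be effect moderation depends on whether the effect is measured as a risk difference or as a risk ratio.

```python
# Hypothetical outcome risks under control and treatment in two subgroups.
subgroups = {
    "lower baseline risk": {"control": 0.10, "treated": 0.20},
    "higher baseline risk": {"control": 0.30, "treated": 0.60},
}


def risk_difference(risks):
    return risks["treated"] - risks["control"]


def risk_ratio(risks):
    return risks["treated"] / risks["control"]


summary = {
    name: (risk_difference(risks), risk_ratio(risks))
    for name, risks in subgroups.items()
}

for name, (rd, rr) in summary.items():
    print(f"{name}: risk difference = {rd:.2f}, risk ratio = {rr:.2f}")
```

Here the risk ratio is 2 in both subgroups, so there is no moderation on the ratio scale, while the risk difference is 0.10 in one subgroup and 0.30 in the other, so there is clear moderation on the additive scale.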
41:47Okay, great.
41:48So let me just conclude with a few kind of final thoughts.
41:51So, I think all of us, not all of us,
41:54but often we sort of want to assume that study results
41:58generalize.
41:58Often people write a discussion section in a paper,
42:01where they kind of qualitatively have some sentences
42:05about why they do or don't think that the results
42:08in this paper kind of extend to other groups
42:10or other populations.
42:13But I think until the past again, sort of five or so years,
42:16a lot of that discussion was very hand-wavy
42:19and sort of qualitative.
42:21I think that what we are seeing in epidemiology
42:24and statistics and biostatistics
42:26recently has been a push towards having more
42:29ability to quantify this and make it sort of more formal
42:33statements.
42:35So I think if we do wanna be serious though,
42:37about assessing and enhancing external validity,
42:41again, we really need these different pieces.
42:43We need information on the factors that influence effect
42:46heterogeneity, the moderators.
42:49We need information on the factors that influence
42:51participation in rigorous studies like randomized trials.
42:55And we need data on all of those things,
42:57in the trial and the population.
43:00And then finally, we need statistical methods that allow us
43:04to use that data to estimate population treatment effects.
43:08I would argue that that last bullet is sort of much further
43:12along than any of the others.
43:13That in my experience,
43:15the limiting factor is usually not the methods.
43:19The limiting factor at this point in time is the data
43:22and sort of the scientific knowledge
43:25about these different factors.
43:29And that's what this slide is.
43:30So I think I've already said, but that again,
43:33is sort of one of the motivations for the sensitivity
43:35analysis is just a recognition that it's often,
43:39really quite hard to get data that
43:42is consistently measured between a trial and a population.
43:47So on that point, recommendations again,
43:49if we wanna be serious about effect heterogeneity
43:51or about estimating population treatment effects,
43:55we need better information on treatment effect heterogeneity
43:59that might be better analysis of existing trials,
44:02that might be meta-analysis of existing trials.
44:05That might also be theoretical models for the interventions
44:07to understand what the likely moderators are.
44:12We also need better information on the factors
44:14that influence participation in trials and more discussion
44:17of how trial samples are selected.
44:22We need to standardize measures.
44:23So again, it's incredibly frustrating when you have trial
44:26and population data, but the measures in them are not
44:30consistent.
44:31There are methods that can be used for this,
44:33some data harmonization approaches,
44:36but, they require assumptions.
44:39It's better if we can be thoughtful and strategic about,
44:42for example, common measures across studies.
44:45I will say one of the frustrations too,
44:47is that in some fields like the early childhood data
44:51I talked about,
44:52part of the problem was like the two data sets might
44:55actually have the same measure,
44:56but they didn't give the raw data,
44:58and they like standardized the scales differently.
45:01Like they standardized them to their own population,
45:03not sort of more generally.
45:05And so they, weren't sort of on the same scale in the end.
45:10As a statistician, of course, I will say we do need more
45:12research on the methods and understanding when they work
45:15and when they don't.
45:16There are some pretty strong assumptions
45:19in these approaches.
45:20But again, I think that sort of in some ways,
45:24that is further along than some of the data situations.
45:29So I just wanted to take one minute to flag some current
45:32work in case partly if anyone wants to ask questions about
45:34these.
45:36One thing I'm kind of excited about,
45:38especially in my education world is...
45:42So what I've been talking about today has mostly been,
45:44if we have a trial sample and we wanna project
45:46to kind of a larger target population.
45:49But there's an equally interesting question,
45:51which is sort of how well can randomized trials inform
45:54local decision making?
45:56So if we have a randomized trial with 60 schools in it,
46:01how well can the results from that trial be used to inform
46:04individual school districts' decisions?
46:07Turns out, not particularly well.
46:09(laughs)
46:10We can talk more about that.
46:12I mentioned earlier, Issa Dahabreh, who's at Brown,
46:15and he's really interested in developing sort of the formal
46:18theories underlying different ways of estimating
46:21these population effects, again, including some
46:23doubly robust approaches.
46:26Trang Nguyen, who works at Hopkins with me,
46:29we are still looking at sort of the sensitivity analysis
46:32for unobserved moderators.
46:34I mentioned Hwanhee Hong already, who's now at Duke.
46:37And she, again, sort of straddles the meta-analysis world
46:40and this world, which has some really interesting
46:43connections.
46:45My former student now he's at Flatiron Health
46:48as of a few months ago.
46:50Ben Ackerman, did some work on sort of measurement error
46:53and sort of partly how to deal with some of these
46:55measurement challenges between the sample and population.
47:00And then I'll just briefly mention Daniel Westreich at UNC,
47:04who is really...
47:05If you come from sort of more of an epidemiology world,
47:09Daniel has some really nice papers that are sort of trying
47:11to translate these ideas to epidemiology,
47:14and this concept of what he calls target validity.
47:17So sort of rather than thinking about internal and external
47:20validity separately, and as potentially,
47:23in kind of conflict with each other,
47:26instead really think carefully about a target of inference
47:29and then thinking of internal and external validity
47:31sort of within that and not sort of trying to prioritize
47:35one over the other.
47:37And then just an aside, one thing,
47:40I would love to do more in the coming years is thinking
47:43about combining experimental and non-experimental evidence.
47:46I think that is probably where it would be very beneficial
47:49to go, sort of more of that cross-design synthesis
47:52kind of idea.
47:55But again, I wanna conclude with this,
47:57which gets us back to design and that again,
48:01sort of what is often the limiting factor here is the data
48:04and just sort of strong designs.
48:07So Rubin, 2005: "With better data, fewer assumptions
48:10are needed." And then Light, Singer and Willett,
48:13who are sort of big education methodologists:
48:16"You can't fix by analysis what you've bungled by design."
48:19So again, just wanna highlight that if we wanna be serious
48:22about estimating population effects,
48:24we need to be serious about that in our study designs,
48:27both in terms of who we recruit,
48:30but then also what variables we collect on them.
48:32But if we do that,
48:33I think that we can have the potential to really help guide
48:37policy and practice by thinking more carefully
48:39about the populations that we care about.
48:43So for more...
48:44Here's this, there's my email, if you wanna email me
48:47for the slides.
48:49And thanks to various funders, and then I'll leave this up
48:53for a couple minutes,
48:55which are, albeit in tiny font, some of the references,
48:59but then I'll take that down in a minute so that we can see
49:01each other more.
49:02So thank you, and I'm very happy to take some questions.
49:14I don't know if you all have a way to organize
49:16or people just can
49:19jump in.
49:24- So maybe I'll ask the question.
49:25Thanks Liz, for this very interesting and great talk.
49:29So I noticed that you've talked about the target population
49:34in this framework.
49:35And I think there are situations where the population sample
49:39is actually a survey from a larger population.
49:43- Yeah.
49:44- 'Cause we cannot really afford to observe everything
49:47in the actual population, which will contain
49:49like millions of individuals.
49:50And so in that situation, does the framework still apply
49:55particularly in terms of the sensitivity analysis?
49:58And is there any caveat that we should also know in dealing
50:01with those data?
50:03- Great question.
50:05And actually, thank you for asking that because I forgot
50:07to mention that Ben Ackerman's dissertation,
50:10also looked at that.
50:11So I mentioned his measurement error stuff.
50:13But yes, actually, so Ben's second dissertation paper
50:17did exactly that, where we sort of laid out the theory
50:21for when the target population data
50:24comes from a complex survey itself.
50:29Short answer is yes, it all still works.
50:31Like you have to use the weights, there are some nuances,
50:34but, and you're right, like essentially,
50:36especially like in...
50:38Like for representing the U.S. population, often, the data
50:41we have is like the National Health Interview Survey
50:44or the Add Health Survey of Adolescents,
50:47which are these complex surveys.
50:49So short answer is, yeah, it still can work.
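As a rough sketch of what "using the weights" can look like when the population data come from a complex survey: the population share of each covariate cell is computed from the survey weights rather than from raw respondent counts, and those shares then feed the trial weights. This is an illustration only, with made-up data and hypothetical function names; it does not reproduce the theory in the dissertation work mentioned above, and it ignores variance estimation under the survey design.

```python
from collections import Counter, defaultdict


def survey_weighted_shares(pop_x, survey_w):
    """Population share of each covariate cell, using survey weights."""
    totals = defaultdict(float)
    for x, w in zip(pop_x, survey_w):
        totals[x] += w
    grand_total = sum(totals.values())
    return {x: total / grand_total for x, total in totals.items()}


def selection_weights_from_shares(trial_x, pop_shares):
    """Trial weights: survey-weighted population share of the cell
    divided by the trial share of the same cell."""
    trial_counts = Counter(trial_x)
    n_trial = len(trial_x)
    return [
        pop_shares.get(x, 0.0) / (trial_counts[x] / n_trial)
        for x in trial_x
    ]


# Hypothetical survey: three "young" respondents who each represent 100
# people, and one "old" respondent who represents 700 people.
pop_x = ["young", "young", "young", "old"]
survey_w = [100.0, 100.0, 100.0, 700.0]

shares = survey_weighted_shares(pop_x, survey_w)
trial_x = ["young", "young", "old", "old"]
trial_weights = selection_weights_from_shares(trial_x, shares)
print(shares, trial_weights)
```

Ignoring the survey weights here would treat the population as 75 percent young, when the weighted survey implies it is 30 percent young, so the trial would be reweighted toward the wrong target.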
50:53Your question about the sensitivity analysis is actually
50:55a really good one and we have not extended...
50:58I'd have to think, I don't know, off hand, like,
51:00I think it would be sort of straightforward to extend
51:04the sensitivity analysis to that, but we haven't actually
51:07done it.
51:08- Thanks Liz.
51:11The other short question is that I noticed that
51:12in your slides, you first define PATE as population ATE,
51:16but then in one slide you have this TATE,
51:19which I assume is target ATE.
51:21And so, I'm just really curious as to like, is there any,
51:25like differences or nuances in the choice of this
51:27terminology?
51:29- Good question.
51:30And no, yeah, I'm not...
51:31I wasn't very precise with that, but in my mind, no.
51:35Over time I've been trying to use TATE,
51:38but you can see that kind of just by default,
51:40I still sometimes use PATE.
51:43Part of the reason I use TATE is because I think
51:46the target is just a slightly more general term.
51:48Like people sometimes, I think, think that
51:50if we say PATE, the population has to be like
51:53the U.S. population or some like very sort of big,
51:58very official population in some sense.
52:01Whereas, the target average treatment effect,
52:04TATE terminology, I think reflects that sometimes
52:06it's just a target group that's well-defined.
52:10- Gotcha.
52:11Thanks, that's very helpful.
52:12And I think we have a question coming from the chat as well.
52:15- Yeah, I just saw that.
52:16So I can read that.
52:17We have theory for inference from a sample to a target
52:20population needs to find that internal validity approaches,
52:23what theory is there for connecting the internal validity
52:25methods to external validity?
52:29So I think, what you mean is sort of,
52:33what is the formal theory for projecting the impact
52:37to the target population?
52:38That is exactly what some of those people that I referenced
52:41sort of lay out.
52:42Like I didn't...
52:42For this talk, I didn't get into all the theoretical weeds,
52:45but if you're interested in that stuff,
52:46probably some of Issa Dahabreh's work would be the most
52:49relevant to look at.
52:51Cause he really lays out sort of the formal theory.
52:54I mean, some of my early papers on this topic did it,
52:58but his is like a little bit more formal and sort of makes
53:01connections to the doubly robust literature
53:04and things like that.
53:04And so it's really...
53:06Anyway, that's what this whole literature
53:08and part of it is sort of building is that theoretical base
53:11for doing this.
53:17Any other questions?
53:28- [Ofer] Liz,
53:29I'm Ofer Harel.
53:30- Oh, hi Ofer?
53:31- [Ofer] Hi.
53:33(mumbles)
53:34Just jump on the corridor, so it's make it great.
53:39So in most of the studies that I would work on,
53:43they don't really have a great idea about
53:46what really the population is and how really to measure
53:50those.
53:51So it's great if I have some measure of the population,
53:54but most of the time, in the studies that I work on,
53:57I have no real measurements on that population.
54:02What happens then?
54:03- Yeah, great question.
54:04And in part, I meant to say this,
54:06but that's one of the reasons why the analogy...
54:08Why the design strategies don't always work particularly
54:10well is like, especially when you're just starting out
54:13a study, right?
54:14We don't really know the target population.
54:17I think certainly to do any of these procedures,
54:21you need eventually to have a well defined population.
54:25But I think that's partly why some of the analysis
54:27approaches are useful is that,
54:29you might have multiple target populations.
54:31Like we might have one trial,
54:33and we might be interested in saying,
54:35how well does this generalize to the State of New Hampshire
54:39or the State of Vermont or the State of Connecticut?
54:41And so, you could imagine one study that's used to inform
54:45multiple target populations.
54:48With different assumptions,
54:49sort of you have to think through the assumptions
54:50for each one.
54:52If you don't even,
54:54I guess I would say if you don't even know
54:56who your population is, you shouldn't be using these methods
54:59at all, cause like the whole premise is that there is some
55:02well-defined target population and you do need data on it
55:05or at least...
55:07Yeah, the joint distribution of some covariates
55:09or something.
55:10Without that, you're kind of just like,
55:13I don't know, what a good analogy is,
55:15but you're kinda just like guessing at everything.
55:24(mumbles)
55:26- No, go ahead.
55:27Go ahead.
55:29- Oh, Vinod, yeah.
55:30All my friends are popping up, it's great.
55:32(laughs)
55:34- [Vinod] Can I go ahead?
55:35I feel like I'm talking to someone.
55:39- Yeah, go ahead Vinod.
55:40- [Vinod] That was a great talk.
55:42So I have a little ill formulated question,
55:44but it's queuing after just the last question
55:47that was asked is,
55:49in clinical set populations where,
55:55in some ways we're using these clinical samples
55:58to learn about the population because unless they seek help,
56:02we often don't know what they are in the wild, so to speak.
56:05And so, each sampling of that clinical population
56:09is maybe a biased sampling of that larger population
56:13in the wild.
56:14So I guess my question is, how do you get around this,
56:18I guess Rumsfeld problem, which is every time you sample
56:22there's this unknown, unknown, but there's no way to get
56:24at them because in some ways, your sampling relies on...
56:27If we could say it relies on help seeking,
56:30which is by itself a process.
56:33And if we could just stipulate, there's no way to get
56:35around that.
56:36How do you see this going forward?
56:40- Yeah, good question.
56:40I think right, particularly relevant in mental health
56:43research where there's a lot of people who are not seeking
56:46treatment.
56:47These methods are not gonna help with that in a sense
56:50like again, they are gonna be sort of tuned to whatever
56:53population you have.
56:55I think though there are...
56:57If you really wanna be thoughtful about that
57:00problem, that's where sort of some of the strategies
57:03that were used like the Epidemiologic Catchment Area
57:05Surveys, where they would go door to door and knock on doors
57:08and do diagnostic interviews.
57:11Like if we wanna be really serious about trying to reach
57:14everyone and get an estimate of the really sort of true
57:17population, then we really have to tackle that
57:20very creatively and with a lot of resources probably.
57:25- [Vinod] Thanks.
57:27- Welcome.
57:29- Hi Liz?
57:30Yeah, it's gonna be a true question and great talk
57:33by the way.
57:35I'm curious, you mentioned there could be a slight
57:38difference between the terms transportability
57:40and generalizability.
57:41Yeah, I'm curious about that.
57:43- Yeah, briefly, this is a little bit of a...
57:48What's the word?
57:48Simplification, but briefly I think of generalizability
57:51as one where the sample that, like the trial sample
57:55is a proper subset of the population.
57:57So we do a trial in New Hampshire,
58:01and we're trying to generalize to New England.
58:04Whereas transportability is one where it is not a proper
58:08subset, so we do a trial in the United States
58:10and we wanna transport to Europe.
58:14Underlying both, the reason I don't worry too much about
58:17the terms is because either way,
58:19the assumption is essentially the same.
58:21Like you still have to make this assumption about
58:23no unobserved moderators.
58:25It's just that it's probably gonna be a stronger assumption
58:28and harder to believe,
58:30when transporting rather than when generalizing.
58:33Cause you sort of know that you're going from one place
58:36to another in some sense.
58:39- Thanks, makes sense.
58:41- Sure.
58:43- I think there's another question in the chat.
58:45- Yeah, so this is a great question.
58:46I'm glad shows you on.
58:48I hope I got that.
58:50It seems there are multiple ways to calculate the TATE
58:53from standardization to weighting to the outcome model.
58:55Do you have comments for their performance under different
58:57circumstances?
58:58Great question, and I don't.
59:01I mean, there has been...
59:02This is an area where I think
59:04it'd be great to have more research on this topic.
59:06So I have this one paper with Holger Kern and Jennifer Hill
59:09where we sort of did try to kind of explore that.
59:14And honestly, what we found not surprisingly
59:16is that if that no unmeasured moderator assumption holds,
59:20all the different methods are pretty good and fine.
59:23And like, we didn't see much difference in them.
59:25If that no unobserved moderator assumption doesn't hold
59:28then of course, none of them are good.
59:29So it sort of is like similar to propensity score world.
59:33Like, the data you have is more important than what you do
59:35with the data in a sense.
59:38But anyway, I think that that is something that like,
59:40we need a lot more work on.
59:42One thing, for example, I do have a student working on this.
59:45Like, we're trying to see if your sample
59:47is a tiny proportion of the population, like how...
59:51Cause like there's different.
59:52That's one where like weighting might not work as well
59:54actually, who knows.
59:56Anyways, so like all of these different data scenarios,
59:58I think need a lot more investigation to have better
01:00:01guidance on when the different methods work well.
01:00:09Anything else or maybe we're out of time?
01:00:11I don't know, how tight you are at one o'clock.
01:00:20- I think we're at an hour, so let's...