
Designing randomized trials to understand treatment effect heterogeneity


Biostatistics Seminar - November 17, 2020

Elizabeth Tipton, PhD
Associate Professor, Department of Statistics, Northwestern University


Transcript

  • 00:00So welcome everyone,
  • 00:01it is my great pleasure to introduce
  • 00:04our seminar speaker today,
  • 00:06Doctor Elizabeth Tipton.
  • 00:07She's an associate professor of
  • 00:09statistics, the co-director of the
  • 00:12Statistics for Evidence-Based Policy
  • 00:14and Practice Center, and a faculty
  • 00:16fellow in the Institute for Policy
  • 00:18Research at Northwestern University.
  • 00:20Her research focuses on
  • 00:23the design and analysis of randomized
  • 00:25experiments with a focus on issues for
  • 00:28external validity and generalizability,
  • 00:30as well as meta analysis with the
  • 00:33focus on dependent effect sizes.
  • 00:35Um, today she's going to share with us
  • 00:37how to design randomized experiments
  • 00:39to better understand treatment effect
  • 00:41heterogeneity. Welcome, Beth.
  • 00:43The floor is yours.
  • 00:44Thank you.
  • 00:45Thank you. I'm very excited to be here today.
  • 00:48I really wish I wasn't talking here
  • 00:50from my office slash closet and was
  • 00:52actually with you guys in person, and
  • 00:55this is my first time doing slides where
  • 00:58I'm on the slide, so it's a little.
  • 01:00It's a little strange.
  • 01:01I don't know what is the protocol
  • 01:04for questions. How do you guys?
  • 01:06How do you usually set this up?
  • 01:08Do people what's the norm?
  • 01:10Do you guys usually jump in
  • 01:12with questions or save them for
  • 01:13the end? So I think as you prefer,
  • 01:16we can do either way. OK, I'm
  • 01:18just, I won't be very good at
  • 01:20checking the chat, so if there's
  • 01:22a question, if somebody can just
  • 01:23speak up, that would be great. We'll do that.
  • 01:26I'll do that in the chat. OK, thank you.
  • 01:28OK so I just want to set out background
  • 01:31for what I'm talking about today,
  • 01:33which is I'm talking about randomized
  • 01:34trials, and I realize that in a
  • 01:36Biostatistics Department,
  • 01:39The idea that randomized trials are
  • 01:41common is probably almost absurdly basic
  • 01:43for the world that you operate in,
  • 01:45but I do a lot of my statistical
  • 01:47work in the areas of education and
  • 01:50psychology and kind of in the field
  • 01:52experiments world and those areas.
  • 01:55Randomized trials have only
  • 01:56become common really,
  • 01:57I'd say the last 20 years.
  • 02:00So almost 20 years ago,
  • 02:03the Institute for Education Sciences
  • 02:05was founded in the Department of
  • 02:08Education in the US government,
  • 02:10and that has funded almost 500
  • 02:12what are called efficacy and
  • 02:15effectiveness trials in education.
  • 02:17Previous to that there were
  • 02:19very few of these.
  • 02:21There's also an increasing number of
  • 02:24nudge experiments in social psychology
  • 02:27experiments that are occurring in the world.
  • 02:31I know that there's a lot of randomized
  • 02:34trials occurring in developing countries,
  • 02:38so this is happening in parallel,
  • 02:40maybe, too.
  • 02:42In public health,
  • 02:43there are randomized trials there,
  • 02:45so I'm just sort of pointing out
  • 02:46that these are becoming increasingly
  • 02:48common for policy decisions,
  • 02:50not just individual decisions.
  • 02:53But the trials as they are currently
  • 02:56designed are not necessarily
  • 02:58ideal, ideal in the sense that they are
  • 03:00not as big as we would like them to be
  • 03:03in order to be able to really explore
  • 03:06the data well. They're often, you know,
  • 03:09sort of somewhat small samples
  • 03:11of clusters in the kind of
  • 03:13education world that I work in.
  • 03:16They're very often just simple two-
  • 03:18arm designs, 50/50 treatment control.
  • 03:19It's much less common to see things
  • 03:22like stepped wedge or SMART designs,
  • 03:24so those are trickling in, I think.
  • 03:28And the goal of these is often
  • 03:30to get into things like clearinghouses
  • 03:32of some sort so that policymakers and
  • 03:35decision makers can use the
  • 03:38information from the trials to
  • 03:40make decisions.
  • 03:41But the problem which is the focus
  • 03:43of my talk is that they have very
  • 03:45often been taking place in samples
  • 03:48that are purely of convenience,
  • 03:50which makes thinking about generalizability
  • 03:52and heterogeneity rather difficult.
  • 03:58If treatment effects vary, if treatment
  • 04:00effects vary across individuals or
  • 04:01they vary across clusters in some way,
  • 04:04then it's pretty straightforward to
  • 04:05see as a group of statisticians here
  • 04:07that the average treatment effect
  • 04:09you would get in the population.
  • 04:11Is probably not exactly the same thing as
  • 04:14the average treatment effect in the sample,
  • 04:16and that these could be quite different
  • 04:18if treatment effects vary a lot and
  • 04:21depending upon how the sample is selected.
  • 04:23So there has been an increasing
  • 04:25amount of work in this area.
  • 04:27There's a couple of papers I think that
  • 04:30are particularly helpful if there's
  • 04:32a paper in education where they're
  • 04:34looking at bias from non-random
  • 04:37treatment assignment, where
  • 04:39they show that the bias from external
  • 04:41validity is on the same order as
  • 04:44internal validity bias, and to do
  • 04:46so they sort of leverage
  • 04:49a natural experiment with a
  • 04:51randomized trial to look at this.
  • 04:52And that's work by Bell,
  • 04:54Olsen, Orr, and Stuart.
  • 04:55There's also work showing that,
  • 04:58in education, the kinds of schools
  • 05:00and school districts that take part in
  • 05:03randomized trials are different from
  • 05:05the various populations
  • 05:08that something like the Institute of
  • 05:10Education Sciences might be interested in.
  • 05:13So I have a paper out with Jessica Spybrook
  • 05:16and our students looking
  • 05:18at 37 randomized trials and the
  • 05:20samples of schools taking part in
  • 05:22those studies and comparing them
  • 05:24to various populations of schools.
  • 05:27In the US.
  • 05:28There's also work, hidden behind me here,
  • 05:31by Liz Stuart
  • 05:33and colleagues, looking at school districts,
  • 05:36and a couple of other papers as well,
  • 05:40and these find fairly consistent things.
  • 05:43For example,
  • 05:44that large school districts are
  • 05:47overrepresented in research.
  • 05:49Relative to the size of districts in the US.
  • 05:52There's been also a lot of work
  • 05:55in this area of generalizability
  • 05:57and post hoc corrections.
  • 05:59I started into this work looking at
  • 06:02using post stratification as a way
  • 06:05of estimating a population average
  • 06:07treatment effect from a sample.
  • 06:09There's also been work using
  • 06:11inverse probability weighting,
  • 06:12maximum entropy weighting
  • 06:14bounding approaches.
  • 06:15There have been some approaches
  • 06:17that focus on this as well,
  • 06:18so I'm thinking like,
  • 06:22here, the paper by Stuart and colleagues,
  • 06:24I think, that does that,
  • 06:25so there's been like a kind of
  • 06:27a flurry of method development.
  • 06:29I think here in this area of
  • 06:31thinking you know,
  • 06:32how do I actually estimate this?
  • 06:34If I have population data of different forms
  • 06:36and I have sample data of different forms,
  • 06:39how can I actually estimate a
  • 06:41population average treatment effect?
  • 06:43But when I first started doing this work,
  • 06:46I realized in a series of examples
  • 06:47that I was working on that the
  • 06:50effectiveness of these methods is
  • 06:51often severely limited in practice
  • 06:53because of undercoverage and
  • 06:55what I mean is that, if it turns out
  • 06:57that there's a part of the population
  • 07:02that's just not represented in the trial,
  • 07:04there's really not much statistical
  • 07:06magic you can do.
  • 07:07You can make some assumptions,
  • 07:09but you can't really reweight
  • 07:11something that doesn't exist.
  • 07:14It's really a reflection of a lack
  • 07:16of positivity in the study. Yes,
  • 07:18exactly thanks. Yeah exactly.
  • 07:19And yeah, I'm just using survey
  • 07:21sampling language for the same thing.
  • 07:23That's right, yeah. And so
  • 07:25that lack of positivity
  • 07:27often arises because people aren't
  • 07:29thinking about what the population is
  • 07:31in advanced and so it's very tricky
  • 07:33for them after the fact to generalize,
  • 07:35because it turns out that maybe
  • 07:36what I as the analyst am now
  • 07:38trying to think of as the population
  • 07:40isn't exactly the population,
  • 07:42but it's very hard for people to
  • 07:44articulate what the population is,
  • 07:46and so I spent a lot of time just trying to
  • 07:49figure out what the population actually is,
  • 07:51and if that population is meaningful,
  • 07:53if that's a population that even matters.
  • 07:55So I realized I pivoted a bit.
  • 07:57I realized that you could do
  • 07:58a lot of this with statistics,
  • 08:00but you were going to be limited
  • 08:02if you didn't design better trials.
  • 08:08And so that's allowed me to think:
  • 08:10well, why don't we just start at
  • 08:13the beginning and do a better job of this?
  • 08:14So what if we started at the
  • 08:18beginning of our studies by asking
  • 08:20what the target population of the
  • 08:22intervention was, thinking about
  • 08:24inclusion and exclusion criteria?
  • 08:25I think in health
  • 08:27This probably matters even more with
  • 08:29like comorbidities and you know.
  • 08:31Like, ruling out people:
  • 08:32You're doing a study on depression,
  • 08:34but you rule out people with anxiety,
  • 08:36and that's like a big problem
  • 08:38for the interpretation since they're
  • 08:39highly related to each other.
  • 08:41This is true in education as well.
  • 08:43So like,
  • 08:44what are the inclusion exclusion
  • 08:45criteria for your trial and how might
  • 08:48that affect where you can generalize?
  • 08:50And then also thinking about
  • 08:51background characteristics and
  • 08:52contextual variables that might
  • 08:54moderate the intervention's effect
  • 08:55and the tricky part here is,
  • 08:56there's a little bit of a circularity
  • 08:59which I'm going to keep coming
  • 09:00back to in what I'm talking about,
  • 09:02which is in order to know these,
  • 09:05you know,
  • 09:05we don't know what these are in advance,
  • 09:08and we don't have a lot of
  • 09:10knowledge generated to date about
  • 09:11what these variables are,
  • 09:13because studies have not been designed to
  • 09:14estimate or test hypotheses about moderation,
  • 09:17and so instead we have to sort of think
  • 09:19through what we think might matter.
  • 09:21Using,
  • 09:22you know,
  • 09:22not a great source of knowledge here,
  • 09:25but the idea is that you sort of
  • 09:27take all of this information and
  • 09:29then you use this, you use
  • 09:31sampling methods, to actually
  • 09:33design recruitment procedures,
  • 09:35like using stratified sampling,
  • 09:36figuring out if you should,
  • 09:38you know, within strata,
  • 09:39use balanced sampling or random
  • 09:41sampling, and thinking about sort of
  • 09:44ways in which you can increase the
  • 09:46coverage so you can have positivity
  • 09:48for the whole target population,
  • 09:50so that when you do you know.
  • 09:52When you do need to make adjustments
  • 09:54at the end of your trial using
  • 09:56these statistical methods,
  • 09:57they are in a realm in which
  • 09:59they can perform well.
  • 10:03This sort of led me to thinking about
  • 10:05tools for generalization,
  • 10:07and so I just want to highlight this
  • 10:10because I think this is a good strategy
  • 10:13for methods people to think about.
  • 10:16So I thought, well, nobody is going to do
  • 10:19what I'm telling them to do if I don't
  • 10:21build a tool, because the kind of people who
  • 10:24plan randomized trials,
  • 10:25at least in my domain,
  • 10:27don't often have statisticians at
  • 10:28the ready to work with them on things,
  • 10:31and they are often writing grant
  • 10:33proposals before they've got funding,
  • 10:34and so it's very possible that they're not
  • 10:37going to think about generalization or
  • 10:39have the training or tools to do it.
  • 10:42I got a grant from the Spencer
  • 10:44Foundation and then I've had follow up
  • 10:46money from the Institute of Education
  • 10:49Sciences to build this tool called
  • 10:51The Generalizer that uses some
  • 10:53basic design principles. It's standalone,
  • 10:55it's very focused on the user
  • 10:58experience, and in the background it
  • 11:01has the Common Core of Data, which is
  • 11:04an annual census of the public
  • 11:06schools in the US, so that the data
  • 11:08has already been cleaned and set up.
  • 11:11We're adding in right now the IPEDS data,
  • 11:13which is higher ed data in the US, and
  • 11:16so the idea is somebody could go in
  • 11:18and walk through inclusion and exclusion
  • 11:20criteria, identify moderators, and
  • 11:22it would build you a stratified
  • 11:24recruitment plan in less than an hour.
  • 11:26You could leave with a list of all
  • 11:29the schools and start being able
  • 11:31to recruit with it.
  • 11:35Great. I've had this going since 2015
  • 11:37and it was very slow going for a while.
  • 11:39I just realized this
  • 11:42year that I could actually extract
  • 11:44a lot of user data, and so what you
  • 11:47can see here is actually it was slow
  • 11:49going and I had some early adopters.
  • 11:51These are people that would be star users,
  • 11:54so many of them are planning
  • 11:56randomized trials,
  • 11:56but there was actually a very big
  • 11:59jump that occurred this summer, and
  • 12:01based on this jump
  • 12:02I actually started digging through
  • 12:04things and realized that the
  • 12:05Institute of Education Sciences had actually
  • 12:07enacted requirements for generalizability
  • 12:09in their requests for proposals.
  • 12:11And so you can see what
  • 12:13I always speculated,
  • 12:15which is that funders really drive change.
  • 12:17So once funders said you need to
  • 12:20pay attention to generalizability,
  • 12:22people actually started paying attention
  • 12:24to generalizability in their proposals.
  • 12:27OK, so I just wanted to give
  • 12:28you all of this background as a
  • 12:30way of explaining sort of
  • 12:32where I'm coming from on
  • 12:34heterogeneity and how I'm
  • 12:36thinking about this.
  • 12:37Um,
  • 12:37so everything I've talked about
  • 12:39is sort of averaging over heterogeneity:
  • 12:42when we talk about analysis, we
  • 12:44estimate an average treatment
  • 12:46effect for a population,
  • 12:47assuming that there's variation of
  • 12:49effects and we're averaging over those.
  • 12:52But to average over those requires
  • 12:54that we know something about how
  • 12:56treatment effects vary, and very often,
  • 12:58I would say,
  • 13:00in general, we don't,
  • 13:02and the reason that we don't have a
  • 13:04great handle on this is because
  • 13:06sample sizes in randomized trials
  • 13:09have been very focused on
  • 13:11the average treatment effect.
  • 13:12Moderators have only become more of a focus,
  • 13:15at least in education.
  • 13:17More recently,
  • 13:17and I think that's true in
  • 13:20psychology and related areas as well.
  • 13:23And they are often more like
  • 13:24exploratory analysis at the end,
  • 13:26so you end up with these problems
  • 13:28where moderator effects don't get
  • 13:29replicated, and they don't get
  • 13:31replicated because there was,
  • 13:32You know,
  • 13:33who knows how many statistical tests
  • 13:35conducted in order to find those moderators.
  • 13:38So they're not very stable,
  • 13:39and we don't really necessarily
  • 13:41understand them, or they're underpowered,
  • 13:42deeply underpowered, like you just
  • 13:44have a very homogeneous sample.
  • 13:45And so how are you going to find
  • 13:47treatment effect variation if
  • 13:49there's not much variation in your
  • 13:51sample to start with,
  • 13:52so they're often an afterthought,
  • 13:54but what I noticed over time is
  • 13:56that as generalizability has become
  • 13:57something people are paying attention to,
  • 13:59people are also starting to pay
  • 14:01attention to the idea that you
  • 14:03could predict treatment effects,
  • 14:04or that you could identify subgroup
  • 14:06effects and that this might
  • 14:08be very useful information.
  • 14:10Which led me to start thinking about
  • 14:12how you would design trials for this.
  • 14:15So what I'm going to,
  • 14:16what I'm leading up to is talking
  • 14:18about designing trials to think
  • 14:20about heterogeneity.
  • 14:20So I'm just going to start with
  • 14:22like a little
  • 14:23bit of a background here.
  • 14:25So we're going to assume,
  • 14:27I'm assuming we've got units,
  • 14:28which are usually, here,
  • 14:29let's say, students, in sites,
  • 14:30which might be schools,
  • 14:32and I'm doing a randomized trial,
  • 14:33and I've got these potential outcomes.
  • 14:36And so we've got both an
  • 14:39average, an intercept, in these,
  • 14:41and we've also got some sort of
  • 14:43fixed variation that we can explain.
  • 14:46And then we have these other parts that
  • 14:48are not affected by the treatment.
  • 14:51We've got some site level and individual
  • 14:54residuals and some idiosyncratic errors.
  • 14:56But what we're interested in
  • 14:58really is in these moderators
  • 15:00of treatment effects,
  • 15:02and so you could say that Delta 0
  • 15:04is the difference in averages.
  • 15:07I'm assuming these are centered variables,
  • 15:09so this is nicely the difference in
  • 15:12averages and that the vector Delta
  • 15:14is the difference between these
  • 15:16covariate effects under treatment
  • 15:18and under control.
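(A sketch of the kind of model being described, in assumed notation, since the slide's equation isn't in the transcript: for unit i in site j under condition t in {0, 1},

$$Y_{ij}(t) = \beta_{0t} + \mathbf{x}_{ij}'\boldsymbol{\beta}_t + u_j + e_{ij},$$

so that, with centered covariates, $\Delta_0 = \beta_{01} - \beta_{00}$ is the difference in averages and $\boldsymbol{\Delta} = \boldsymbol{\beta}_1 - \boldsymbol{\beta}_0$ is the vector of moderator effects.)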
  • 15:26So you have to
  • 15:27think about interpretability
  • 15:28here, of what I mean by
  • 15:31these deltas, and
  • 15:33how to standardize, because we
  • 15:34wanted to talk about these,
  • 15:36they need to have a mean of 0,
  • 15:38but also in order to talk
  • 15:40about treatment effects,
  • 15:41as is sort of done in general for
  • 15:43developing things like power,
  • 15:44we often standardize them. So
  • 15:45often we have effect sizes for
  • 15:47the average treatment effect,
  • 15:48they're standardized in relation
  • 15:49to the variation in the
  • 15:52sample or the population.
  • 15:53And so here I'm going to sort of
  • 15:55say what we need to do is we
  • 15:58need to standardize the covariates,
  • 15:59and we need to standardize the
  • 16:02covariates in relation to the
  • 16:03population standard deviation.
  • 16:04This might not seem like this
  • 16:06is like a radical statement,
  • 16:08but if you look into the power analysis
  • 16:10literature on how to conduct power
  • 16:12analysis for moderator tests,
  • 16:14they are typically standardizing in
  • 16:15relation to the sample standard deviation,
  • 16:17and in doing so,
  • 16:18it makes it impossible to see
  • 16:20how the way you
  • 16:22choose your sample might matter.
  • 16:24Instead, by standardizing by this
  • 16:25fixed value, by the population,
  • 16:27you've identified a population,
  • 16:28and now we're standardizing by
  • 16:29that population standard deviation.
  • 16:31That will make the role that the
  • 16:34sample plays here much more clear.
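(One way to write the standardization being described, again in assumed notation: each moderator is scaled by its population standard deviation, $\tilde{x}_k = (x_k - \mu_{X_k})/\sigma_{X_k}$, so that the standardized moderator effect $\delta_k = \Delta_k\,\sigma_{X_k}/\sigma$ is in effect-size units, with $\sigma$ the outcome standard deviation used for effect sizes.)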
  • 16:37OK,
  • 16:37so the fact that we randomized
  • 16:39to treatment and control allows
  • 16:41us to estimate these deltas, this
  • 16:43vector Delta, using some generalized
  • 16:46least squares of some sort,
  • 16:48and I'm being a little vague here
  • 16:49because I'm trying to encapsulate
  • 16:51cluster randomized, randomized
  • 16:53block, individually randomized,
  • 16:54all versions of this.
  • 16:56OK, so I can separate these.
  • 17:00These are additive, or rather subtractive,
  • 17:02I guess, the treatment and
  • 17:04the control parts.
  • 17:06You can separate them.
  • 17:07And through this I can
  • 17:10think about statistical power
  • 17:12for each of these moderator effects.
  • 17:16And so,
  • 17:16one way you can do that is through the
  • 17:18minimum detectable effect size difference.
  • 17:20I don't know how commonly this
  • 17:22is used in the sort of
  • 17:23biostats world,
  • 17:24but it's a pretty common metric that's
  • 17:27used in cluster randomized trials in
  • 17:30the world I work in,
  • 17:32and so it's nice because it's
  • 17:34sort of easily interpretable.
  • 17:36So this is the smallest effect size
  • 17:39that you could detect for a given
  • 17:42alpha level, which is reflected in this
  • 17:44M sub nu;
  • 17:47that's like the critical value.
  • 17:49This is sort of the smallest
  • 17:51true effect that you could detect
  • 17:53with a given power, like
  • 17:5580% power, for example.
  • 17:58And so this is like a general form for this,
  • 18:01and so what I'm showing is that it's a function,
  • 18:04can I like move my hands?
  • 18:06I don't know,
  • 18:07this is just going to involve a
  • 18:09lot of hand waving, never mind,
  • 18:10so it's a function of the variation
  • 18:13in the population in that covariate.
  • 18:15It's also a function of S,
  • 18:17which you could think of as
  • 18:19the sort of covariance matrix
  • 18:20of the Xs in the sample,
  • 18:23so those are different.
  • 18:24And then it's a function of N,
  • 18:26which is the sample size per cluster.
  • 18:29I'm assuming it's constant here.
  • 18:30J is the number of clusters and P
  • 18:33is the proportion in treatment. So,
  • 18:36what is sigma XK squared? Is it
  • 18:38the population SD of the effect modifier or
  • 18:40the population variance of the effect modifier?
  • 18:42It's the population variance
  • 18:44of the effect moderator, or modifier,
  • 18:46but then you're square rooting it, so
  • 18:48it's going to be on the SD scale.
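(A sketch of the standard minimum detectable effect size form being referenced here: for moderator k,

$$\mathrm{MDES}_k = M_{\nu}\,\sqrt{\operatorname{Var}(\hat{\delta}_k)}, \qquad M_{\nu} \approx t_{\alpha/2,\,\nu} + t_{1-\beta,\,\nu},$$

so for two-sided alpha = .05 and 80% power with large degrees of freedom, $M_\nu \approx 1.96 + 0.84 \approx 2.8$.)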
  • 18:53OK, so just to give you a couple
  • 18:55of special cases where you can
  • 18:57sort of parse out some things.
  • 18:59So there's been previous work,
  • 19:01I meant to include a citation here, by
  • 19:04Jessica Spybrook and colleagues
  • 19:05that's looking at power
  • 19:06for moderator tests.
  • 19:08And so here's two cases: we have
  • 19:10site-level
  • 19:12moderators and individual-
  • 19:14level moderators, and
  • 19:15I'm taking basically what they've got,
  • 19:17but re-tweaking part of it,
  • 19:21Because I'm factoring out this Sigma
  • 19:24squared and noting that you can
  • 19:26actually pull out this thing called
  • 19:28RXK at the front and the RXK is
  • 19:30this ratio of the standard deviation
  • 19:32of the covariate in the sample
  • 19:34compared to the standard deviation
  • 19:36of the covariate in the population.
  • 19:38And So what you can see here is by
  • 19:40doing that you by rewriting it.
  • 19:43This way you can see that our XK is
  • 19:45having just as much of an effect on
  • 19:48statistical power as things like the
  • 19:50square root of N or the square root of P.
  • 19:54These other parameters that most power
  • 19:56analysis has spent has focused on,
  • 19:57and that's true.
  • 19:58You know, in any of these designs.
  • 20:00Love seeing it in any of these designs.
  • 20:03RX shows up. OK,
  • 20:06so if RX is something that matters for power,
  • 20:10a question will be well.
  • 20:12What are people doing in
  • 20:14practice right now right?
  • 20:16So maybe maybe people are choosing
  • 20:19fairly heterogeneous samples,
  • 20:20and So what I've got here is 19.
  • 20:23This is 19 randomized trials
  • 20:26in education that we extracted
  • 20:28information from and we've got.
  • 20:31So these are box plots of
  • 20:33values across each of these 19,
  • 20:35and for each of them I've calculated for
  • 20:38holding like the US population of school.
  • 20:40So this is like the US population
  • 20:43of let's say elementary schools.
  • 20:45I'm looking at the ratio of this
  • 20:47moderate are in the sample in these
  • 20:49studies to the ratio of that to that
  • 20:52standard deviation in the population OK,
  • 20:55and then I'm looking at boxplots of
  • 20:57this and what you can see like do this,
  • 21:00don't?
  • 21:01OK, what you can see here is that
  • 21:03the bar at the bottom.
  • 21:05Can you see my cursor?
  • 21:08Can't tell if you guys can see my curse.
  • 21:10No,
  • 21:11you can't see my cursor.
  • 21:12OK,
  • 21:12so the bar at the bottom there's an R
  • 21:14X = sqrt 1/2 and then there's a line
  • 21:17across the top that's like a dashed one.
  • 21:19That's the R X = sqrt 2.
  • 21:22OK,
  • 21:22and so you can see that most
  • 21:24studies are actually below there,
  • 21:26less heterogeneous than the population there.
  • 21:28Below this line for one,
  • 21:30and they're actually far less heterogeneous
  • 21:32than the than the population there are.
  • 21:35Actually.
  • 21:35If you look at these median values,
  • 21:38many of them are closed 2.5,
  • 21:40so they are about 1/4 of the variation
  • 21:43as we're seeing in the population.
  • 21:45So this gives you a sense that if.
  • 21:49That there's, uh,
  • 21:50an opportunity to improve, right?
  • 21:52Like I could increase power not just by
  • 21:55increasing my sample size or increasing.
  • 21:58My sample size in schools or
  • 21:59my sample size of the number of
  • 22:01schools which are pretty expensive,
  • 22:03but I could also increase my power by
  • 22:05changing the kinds of samples that I select.
  • 22:10And so that's where these numbers came from.
  • 22:12They should have gone to
  • 22:14slightly different order.
  • 22:15So the main point is that design
  • 22:17sensitivity, the way we think,
  • 22:19whether that statistical power or
  • 22:21standard errors or whatever framework
  • 22:22that there this is proportional in
  • 22:25some way to this RX value that we can
  • 22:27improve our design sensitivity by
  • 22:29choosing a more heterogeneous sample.
  • 22:32And so funny, I must have like put
  • 22:34this in here twice on accident.
  • 22:36So this is the same thing but
  • 22:38with a line through it.
  • 22:40OK so if once you have that insight
  • 22:42that heterogeneity matters,
  • 22:43that it's actually something that
  • 22:45we can include in our power analysis
  • 22:47and that is something that is not
  • 22:49actually happening in practice.
  • 22:50Then we can start thinking about how
  • 22:52we might plan studies differently.
  • 22:54OK, so if.
  • 22:56So how can we improve statistical power?
  • 22:58Well,
  • 22:58a lot of the literature as I was saying,
  • 23:01is focused on improving power by
  • 23:03increasing sample size or instead.
  • 23:04But what I'm arguing here is that you
  • 23:07could increase instead this ratio.
  • 23:08You could increase the variation
  • 23:10in your sample choosing more
  • 23:11heterogeneous sample annual have
  • 23:12more statistical power for test
  • 23:14of heterogeneity of moderators,
  • 23:15and So what would you do with this?
  • 23:18It would mean you know purposefully
  • 23:19choosing sites that were more extreme,
  • 23:21it might end,
  • 23:22and that's easy enough to do in one variable.
  • 23:25And I'm going to talk a little bit about
  • 23:28how to do that with multiple variables.
  • 23:31So with a simple,
  • 23:32let's just say we had one single continuous.
  • 23:35Moderate are like this is a normal
  • 23:37distance normally distributed.
  • 23:38This theory would tell us that we
  • 23:41should choose half of our sample.
  • 23:42We would choose half of our sample
  • 23:45from the upper from the upper an
  • 23:47lower tails and choosing them from
  • 23:49the upper and lower tails were
  • 23:51actually getting an RX of sqrt 2.
  • 23:53This is actually a rather large,
  • 23:55so this is going to create a much
  • 23:57more homogeneous heterogeneous sample,
  • 23:59thus increasing our statistical power
  • 24:01because it's more heterogeneous than the.
  • 24:03In the population.
  • 24:07Similarly, if we had two
  • 24:09correlated normal variables,
  • 24:10when we this is, you know,
  • 24:12we could imagine getting the corners of this.
  • 24:14These are all principles, by the way,
  • 24:17straight up from experimental design.
  • 24:18If you think if you think about it,
  • 24:21there are principles from like 2.
  • 24:23You know two factor studies or
  • 24:25multi factor studies where you're
  • 24:26manipulating and instead I'm just saying
  • 24:29instead of manipulating these factors
  • 24:31were now measuring these factors.
  • 24:32Someplace you could choose them
  • 24:34to be extreme design points.
  • 24:36It gets a little harder once
  • 24:37things become correlated,
  • 24:38so when they become correlated,
  • 24:40I don't have as much sample available
  • 24:42to me because there's just fewer
  • 24:44population units in those corners,
  • 24:46and so it's going to become
  • 24:47increasingly hard as I add variables,
  • 24:49it might become harder and harder in
  • 24:52order to figure out what these units
  • 24:54are that I could be sampling from.
  • 25:02So I started thinking about
  • 25:04how you would do this,
  • 25:06and I realized that there is actually
  • 25:08a literature on this in in the world
  • 25:11of sort of industrial experiments
  • 25:12and industrial experiments,
  • 25:14and in psychology people again are
  • 25:16thinking about multi factor studies.
  • 25:18So they're thinking about things you could
  • 25:21better in the experimenters control.
  • 25:25But we could instead bout sampling in
  • 25:27the same as the same kind of thing.
  • 25:30Except that we don't have control
  • 25:32over manipulating them.
  • 25:33We can find these units and as as
  • 25:35an alternative approach,
  • 25:36so one of the things we want to
  • 25:38do is we want to make sure that
  • 25:40we observe the full range of
  • 25:43covariate values in the population,
  • 25:45so it requires us to actually think,
  • 25:47you know,
  • 25:47explore the population data and
  • 25:49make sure that we can understand
  • 25:51what that range of values is.
  • 25:53We might need to think carefully about
  • 25:55moderators that are highly correlated.
  • 25:57It can be very hard to D alias these effects,
  • 26:00so if you have two highly correlated
  • 26:02moderators. I think about that.
  • 26:04I have two highly correlated
  • 26:06moderators like this.
  • 26:06If I want to estimate and
  • 26:08understand moderators of X,
  • 26:10if I want to explore X&Z and
  • 26:11these are highly correlated,
  • 26:13I'm going to really need to make sure
  • 26:15I have those off diagonals that are
  • 26:17kind of more rare in order to help me
  • 26:20separate these effects an understand
  • 26:22the unique contribution of each.
  • 26:23The other is that if we might
  • 26:25have many potential moderators
  • 26:27that we're interested in,
  • 26:28and so we're going to have to
  • 26:30anticipate this in advance and think
  • 26:32carefully about sort of compromises,
  • 26:34we might need to make here.
  • 26:36But also think very carefully,
  • 26:37like we're not going to be able to expand
  • 26:40this study to have a much bigger sample.
  • 26:42So a lot of what I'm trying to
  • 26:44operate under the constraint here is,
  • 26:46let's not change the sample size
  • 26:47if we don't change the sample size,
  • 26:49but we instead change the height
  • 26:51types of units in our study,
  • 26:52how much better can we do?
  • 26:55OK,
  • 26:56so this leads to a principle found
  • 26:58in response surface models called D
  • 27:00optimality and so AD optimal design.
  • 27:03This is work from the 40s and 60 Forties,
  • 27:0750s and 60s.
  • 27:08A lot of work here by Walt Kiefer,
  • 27:11and a lot of people in
  • 27:13industrial experiments.
  • 27:14The idea is that you can instead focus
  • 27:16on the generalized variance an you want
  • 27:19to minimize the generalized variance,
  • 27:22which is the determinant.
  • 27:24So D is for determinant.
  • 27:26And so the design that meets
  • 27:27this criteria is one that also
  • 27:29conveniently minimizes the maximum
  • 27:31variance of any predicted outcome
  • 27:33based upon these covariates.
  • 27:34So this is great if what you're
  • 27:36headed for is trying to make predict
  • 27:39individual treatment effects or
  • 27:40site specific treatment effects.
  • 27:43The nice thing about a method
  • 27:44that's been around for a while
  • 27:46is that there's been algorithms
  • 27:48developed for doing this.
  • 27:49Better out Federov win algorithm
  • 27:51is widely used and variations
  • 27:53of it and that these are package that
  • 27:55there are like statistics package
  • 27:57already available that do this.
  • 27:59So in our there's something called
  • 28:01the ALG design package that is set
  • 28:04up to actually work through this.
  • 28:06So designs that we know are optimal.
  • 28:08In other contexts.
  • 28:09You know like our designs like Latin squares,
  • 28:12designs etc all become special cases of this.
  • 28:15So this is a much more general framework
  • 28:19that doesn't require as many assumptions.
  • 28:22OK, so once you start down this
  • 28:24path you realize too that there are
  • 28:27some tradeoffs here, so we have.
  • 28:29You can easily imagine that the design that
  • 28:32is optimal for an average treatment effect,
  • 28:35which might be a representative sample.
  • 28:37That sort of like a miniature of the
  • 28:39population on covariates is likely not
  • 28:42optimal for some of these standardized
  • 28:44effect size differences where we might
  • 28:46need to oversample in order to estimate,
  • 28:49estimate, estimate these,
  • 28:50and so there's another.
  • 28:52Benefit of this approach,
  • 28:53which is that you can focus on
  • 28:56augmentation approach and what that
  • 28:58means is you can actually say using
  • 29:00these algorithms better billable 30
  • 29:02sites or already for I've already got
  • 29:0530 design run so the language of this
  • 29:08is these sites become designed runs.
  • 29:11And I need to select 10 more.
  • 29:15Meaning population units, what?
  • 29:1710 units can I augment it with that will
  • 29:20improve that will make this as D optimal
  • 29:23as possible given these constraints,
  • 29:26and so instead so we're thinking
  • 29:29of population units as possible
  • 29:31design runs and sample as design
  • 29:34runs that we've chosen to use.
  • 29:36OK, so I'm just going to go through
  • 29:40an example to talk about this.
  • 29:45Don't have a ton more slides
  • 29:47I should say so success.
  • 29:49OK, so here's an example.
  • 29:51The success for all evaluation was
  • 29:53an elementary school reading program
  • 29:55evaluated between 2001 and 2003.
  • 29:57The reason I like to use this example.
  • 30:00Is that it's old enough that strangely,
  • 30:02they actually published in their
  • 30:04paper a list of schools they
  • 30:06actually named the schools in their
  • 30:08study and characteristics of them.
  • 30:09I have other data on other studies
  • 30:11where people have shared with me
  • 30:13the names of the schools involved,
  • 30:15but it's all like I have to keep it
  • 30:17secret for the for IR be reason,
  • 30:20so that the fact that this is
  • 30:22available makes it easier to use.
  • 30:23So what I did is I went back
  • 30:25and looked at the Common Core of
  • 30:28data I identified based upon the
  • 30:30study that they were.
  • 30:31In the way that they
  • 30:32talked about their study,
  • 30:34that Title One elementary schools
  • 30:35in the US at that time might be a
  • 30:38reasonable population to think that
  • 30:40they were trying to sample for.
  • 30:42Title one schools have at least 40%
  • 30:45students on free or reduced lunch and
  • 30:46meet a few other characteristics and
  • 30:48then they identified in the paper 5
  • 30:51variables that they thought were possible.
  • 30:53Moderators that would be really
  • 30:55important to include here,
  • 30:56so they talked about total school
  • 30:58enrollment being a factor,
  • 30:59racial and ethnic composition
  • 31:01of the students.
  • 31:02So I'm using that here as the
  • 31:04proportion of students that are black
  • 31:06and the proportion that are Hispanic
  • 31:08and SES meaning and a professor but
  • 31:10proportion at free and reduced lunch.
  • 31:12And they also talk about Urbanicity
  • 31:14because they tried to make sure they
  • 31:16had some urban schools in rural
  • 31:18schools and some other schools.
  • 31:19So I should say in previous work of
  • 31:21mine I've used this as an example
  • 31:23and then this study actually ends
  • 31:25up being a fairly representative
  • 31:27sample of the population,
  • 31:28which is interesting.
  • 31:29Is it because they had no real way
  • 31:31of they weren't doing it totally in
  • 31:33a way that allowed them to compare
  • 31:35this or to choose this in a way,
  • 31:37but they did a lot of work to try
  • 31:40to be representative,
  • 31:41and this is much more representative
  • 31:43sample then.
  • 31:44I take the modal study is in this domain.
  • 31:47OK,
  • 31:47So what I did for for this example
  • 31:50as I'm comparing for you the actual
  • 31:53sample that they selected,
  • 31:55so it's always these five moderators.
  • 31:57The actual sample selected a
  • 31:59representative sample selected.
  • 32:00If I instead I use something like
  • 32:03stratified random sampling.
  • 32:04The optimal sample based upon these
  • 32:06five covariates using this ALG
  • 32:08design package and then various
  • 32:10augmentation allocations.
  • 32:11And So what I would do here as I'd say.
  • 32:16So if I if I took 41, you know.
  • 32:19So if I used 36 sites that were
  • 32:21selected with random sampling with
  • 32:23stratified random sampling and
  • 32:25then I reserved five of them that
  • 32:27were selected using D optimality
  • 32:28and then I would change, you know,
  • 32:30the number of those.
  • 32:32So you could see this sort of effect.
  • 32:34You know that augmentation would
  • 32:36have and then for each of these I
  • 32:38calculated a few different statistics.
  • 32:40So you can see how this works.
  • 32:42So one of them is D.
  • 32:44This measure of the optimality.
  • 32:46And I'm going to show you relative
  • 32:48measures because it's a little easier
  • 32:51to see with relative measures.
  • 32:52I'm also including B,
  • 32:54which is generalizability
  • 32:55index that I developed.
  • 32:57It ranges from zero to one and one means
  • 32:59that the sample isn't exact miniature
  • 33:02of the population on these covariates.
  • 33:040 means they like are completely
  • 33:07orthogonal to each other and.
  • 33:10Chip in its the index is highly related
  • 33:14to measures of undercoverage and how and
  • 33:18the performance of reweighting methods.
  • 33:21And then the mean are meaning the ratio
  • 33:24between the the ratio between the
  • 33:27standard deviation in the sample and
  • 33:30population across these five covariates.
  • 33:34OK, so this is what we get out of this,
  • 33:36and so I just want to talk through this and
  • 33:38I'm happy to answer questions if there's.
  • 33:40I know there's a lot going on here.
  • 33:43Really wish I could figure out how to do a.
  • 33:47Pointer.
  • 33:49I don't think I can point out that way.
  • 33:52OK, so OK.
  • 33:52So what I have going on here is the number
  • 33:55of sites randomly selected is left to right?
  • 33:58So on the left is the
  • 33:59is the D optimal sample,
  • 34:01meaning the whole all 41 sites
  • 34:03were actually selected using a
  • 34:05D optimal algorithm on the.
  • 34:07Right is the ideal for the
  • 34:09average treatment effect.
  • 34:10We've used random sampling to stratified
  • 34:13random sampling and just like the
  • 34:15the sample an in the bar right right
  • 34:17there that like right up there.
  • 34:19This Gray vertical bar is the actual
  • 34:22study values for each of these.
  • 34:24OK so you can see the actual study
  • 34:26and then what I've got are three
  • 34:29different lines going on here.
  • 34:31So one line that's sloping down in
  • 34:33solid is the relative D optimality value,
  • 34:36so this is.
  • 34:37You know the highest value is if
  • 34:40it was a D optimal allocation.
  • 34:42This is a ratio,
  • 34:44and then I've got the B index,
  • 34:46which is the generalizability index.
  • 34:48Is the other solid line going up,
  • 34:50and so, not surprisingly,
  • 34:52that's increasing as we get
  • 34:53to stratified sampling,
  • 34:54so these are going in opposition
  • 34:57to each other.
  • 34:58Is what I'm saying and then this
  • 35:00relative average standard deviation.
  • 35:01Is this dotted bar line?
  • 35:03So what so the main message of
  • 35:05this is that these are going in
  • 35:08opposite directions right that?
  • 35:09The the sample that is optimal for the
  • 35:12average treatment effect is on the right.
  • 35:14The sample that is optimal for
  • 35:16moderate are effects is on the left,
  • 35:18and so there's there's tradeoffs
  • 35:20involved in these that what's best
  • 35:23for one is not best for the other.
  • 35:25But there's other lessons in here,
  • 35:27wow, so the the B index is,
  • 35:29which is a measure of similarity
  • 35:31between the sample and population,
  • 35:32is actually not that bad
  • 35:34for the optimal sample.
  • 35:35So these these the sample is
  • 35:36different from the population.
  • 35:38You'd have to do some re waiting,
  • 35:40but it wouldn't be a tremendous
  • 35:41amount of re waiting to be able to
  • 35:44estimate the average treatment effect.
  • 35:45And so one lesson that you
  • 35:47could think of it from.
  • 35:48This is if you actually if we designed
  • 35:50randomized trials to test moderators,
  • 35:52we'd actually be in a pretty
  • 35:54good space to test moderators.
  • 35:55And to estimate the average treatment effect,
  • 35:57it wouldn't be that far off.
  • 35:58It wouldn't be.
  • 36:00It wouldn't be terrible,
  • 36:01and that makes sense because we're
  • 36:03covering so much of the population
  • 36:06by getting her across a bunch of
  • 36:09moderators that we can do so that
  • 36:11we can re wait when in a domain in
  • 36:13which there's no act extrapolations,
  • 36:15we have positive ITI we can re wait next.
  • 36:18Another sort of I think finding here is
  • 36:21if we look over at the right hand side.
  • 36:24If we do, you know the trade off is.
  • 36:27If I do select for the average
  • 36:30treatment effect.
  • 36:31I do get a tremendously,
  • 36:32you know I can select for the average
  • 36:34treatment effect and do pretty
  • 36:36well for the average human effect,
  • 36:37but not do so well for that.
  • 36:39For the moderators,
  • 36:40and so what's ideal for average
  • 36:42is definitely not deal for
  • 36:43the moderate are tests.
  • 36:44So and then the third thing would
  • 36:46be if you look at the actual study.
  • 36:48As I was saying,
  • 36:49they actually did a pretty good job
  • 36:51in terms of representativeness.
  • 36:52You can see that that top dot,
  • 36:54but if you look at the bottom
  • 36:56at the other two dots you can
  • 36:59see they didn't do so well for.
  • 37:01Being able to test these these moderators.
  • 37:07OK, so in case that was not intuitive
  • 37:09another way you could look at this
  • 37:11is to actually just look at what
  • 37:14these samples these these features
  • 37:16of these samples would look like.
  • 37:18So in the top the top row here
  • 37:20are population distributions.
  • 37:22Of these five covariates that
  • 37:24were sort of identified,
  • 37:25and then at the bottom row is
  • 37:27actually the study that they had.
  • 37:29So what their actual sample looked like.
  • 37:33And then the middle is what AD
  • 37:35optimal sample would look like.
  • 37:36And then I've overlaid on here.
  • 37:38These are values,
  • 37:39so giving you a sense if R is greater is 1.
  • 37:43It means the sample is like the same
  • 37:46standard deviation as in the population.
  • 37:48If R is greater than one,
  • 37:50it means I've got more heterogeneity
  • 37:52in my sample than in my population,
  • 37:54which improves my ability to
  • 37:55estimate moderate are effects.
  • 37:57And So what you see are a few things.
  • 38:00One is in that the optimal sample is.
  • 38:02It pushes things towards the extremes,
  • 38:05right?
  • 38:05It's pushing them towards the
  • 38:07extremes to get endpoints which we
  • 38:09know from basic experimental design,
  • 38:11improved abilities.
  • 38:12The other nice thing though,
  • 38:14is a concern always when you're
  • 38:16doing experimental design like this
  • 38:18is that you're going to get your
  • 38:20highly focused on like a linearity
  • 38:22assumption that you're going to your.
  • 38:25Your ideal sample would have a
  • 38:27strong linearity assumption to it,
  • 38:29but because you have multiple variables an
  • 38:31because not all design runs are possible.
  • 38:34In the population,
  • 38:35you end up with these middle points
  • 38:37as well so you don't end up with
  • 38:39only things on both extremes.
  • 38:41You end up with some middle points
  • 38:43which allow you to be able to estimate
  • 38:47nonlinear relationships as well.
  • 38:48Me and a Third Point with me.
  • 38:50You can see that you would just end
  • 38:52up with a lot more variation and so
  • 38:54not surprisingly, total students,
  • 38:56which, again schools studies,
  • 38:57tend to over represent very large
  • 38:58schools and large school districts.
  • 39:00You can see this is a place where
  • 39:02there would be really a real
  • 39:04opportunity for a change that in
  • 39:05the sample this was less than one
  • 39:07an in the in the optimal sample
  • 39:09it would be greater than three.
  • 39:11But you can see this for most of
  • 39:14these variables that you could.
  • 39:16You could potentially improve your
  • 39:17power and ability to estimate things
  • 39:19related to demographics as well.
  • 39:21And in my paper I actually show that
  • 39:23because many of these are proportions,
  • 39:26you can actually also think about
  • 39:27student level moderate yrs because
  • 39:29proportions conveniently like the
  • 39:31variation in proportions at the
  • 39:33individual level as a function of
  • 39:34the proportion at the aggregate.
  • 39:36And so you can actually kind of workout
  • 39:39a way to select your samples so that you can.
  • 39:42Estimate individual affects,
  • 39:44not just cluster aggregates
  • 39:47for those variables.
  • 39:50OK, and so then the final point.
  • 39:52I just want to make is that the
  • 39:54other thing that this shows is that
  • 39:57there's real benefit to augmentation
  • 39:59so. Maybe? You know,
  • 40:00maybe I'm not going to be able to
  • 40:03convince people to go switch to selecting
  • 40:05their samples based upon extremes.
  • 40:07But maybe you can convince people
  • 40:10that they could preserve 5 or 10.
  • 40:12You know 10% or 25% of their
  • 40:15sample for D optimality.
  • 40:16So you choose.
  • 40:17In this case it would be like choose
  • 40:1930 of your sites using stratified
  • 40:22sampling to represent the population,
  • 40:24and then look for like an additional
  • 40:27class tenor 11 sites that might be
  • 40:29more extreme that allow you to make
  • 40:32sure that you can estimate these.
  • 40:34These moderate are effects
  • 40:36that you're interested in.
  • 40:37And you can see that doing so key
  • 40:39file with these little lines you can
  • 40:41see that doing so doesn't have a huge
  • 40:43effect on the average treatment effect,
  • 40:44but it does greatly improve
  • 40:46your ability to test moderators.
  • 40:50OK, so just to wrap up my take home
  • 40:53points today, I suppose would be that
  • 40:56the design of randomized trials has big
  • 40:59implications for ability to generalize.
  • 41:02And that I think we, I think what I've
  • 41:04seen over time is that people who are
  • 41:07starting to pay attention to that,
  • 41:08and they're starting to think
  • 41:10about how populations you know.
  • 41:11What are the populations I would
  • 41:13add as a side benefit of this is
  • 41:15I've I've watched as people in
  • 41:17asking people to scientists to think
  • 41:19about what the population is.
  • 41:20It actually sometimes make some change
  • 41:22with the intervention is because you kind
  • 41:24of have to realize like is this is this.
  • 41:27If this is the population,
  • 41:28is this the right intervention?
  • 41:31The second sort of point I would say,
  • 41:34is that if we want to sort of estimate
  • 41:36and test hypothesis and moderators
  • 41:38that we would be wise to actually
  • 41:41plan to do so and to think about how
  • 41:44to have better design sensitivity
  • 41:45and statistical statistical power for
  • 41:47doing so instead of waiting until
  • 41:49the end and then the last point is
  • 41:51just that this augmentation approach
  • 41:53indicates that we don't have to
  • 41:55be perfect at this like that,
  • 41:57we could just, you know,
  • 41:58use do this for part of our sample.
  • 42:01And we would be better off and then
  • 42:04I guess I would say maybe my general
  • 42:06philosophy in all of this design is that.
  • 42:08What I'm trying to do is to get people
  • 42:10to think differently and plan differently,
  • 42:13and by doing so,
  • 42:14even if you don't succeed 100%,
  • 42:15you're better off than you would
  • 42:17have been before,
  • 42:18and you're now able to be in the
  • 42:20realm in which you have positive
  • 42:22ITI and heterogeneity,
  • 42:23and you're able to actually
  • 42:25use statistical methods.
  • 42:26To get better estimators at the end.
  • 42:31Thank you, this is all my contact
  • 42:33information and this is the paper
  • 42:36that this talk is really about.
  • 42:38I'm happy to answer questions.
  • 42:41Thanks so much. Best,
  • 42:43I think that's really nice talk and
  • 42:45thank you for being so inspiring.
  • 42:47And maybe let's open to questions
  • 42:491st to see if we have any
  • 42:52questions from the audience.
  • 42:55If so, please speak up or, you know,
  • 42:58send a chat. Either one is OK.
  • 43:07And if not, I can go first.
  • 43:09'cause I do have a couple of questions.
  • 43:13So, so first of all, I think you
  • 43:15know there is a constant tension.
  • 43:17Of course, like you know when we work with
  • 43:20really large trials in the healthcare system,
  • 43:23I think there is a tension between how do we
  • 43:26better represent the population of interest?
  • 43:29Because we want to get effectiveness
  • 43:31information 'cause we're
  • 43:32spending millions of dollars.
  • 43:33But also I think there is a concern
  • 43:36on you know how to really better
  • 43:38engage these large clusters,
  • 43:40large healthcare systems or large clinics,
  • 43:42etc. And so I think.
  • 43:44People end up getting convenience
  • 43:46samples because that's reality.
  • 43:47Even though I do believe that there's
  • 43:49so much more to improve because
  • 43:51they're spending so much money right.
  • 43:53And then in the end,
  • 43:55you know they may be answering a
  • 43:57different question if they have a
  • 43:59very highly selected sample and then
  • 44:01people also worry about you know,
  • 44:03like you know there are some
  • 44:05disparities in their sample selection,
  • 44:06so that you're basically not covering
  • 44:08you know people with maybe more
  • 44:10vulnerable conditions etc in your study,
  • 44:12but you wish to answer questions about,
  • 44:14what is the population?
  • 44:15So I feel like all of this is very,
  • 44:18very relevant, at least to my work,
  • 44:21and so I really appreciate, you know, this
  • 44:24aspect of how to design trials better.
  • 44:27One of the questions I have
  • 44:29is that generally,
  • 44:30you know, we may not really know a priori
  • 44:33what the effect modifiers are in planning
  • 44:36the trial, that we may
  • 44:38not have enough knowledge about them.
  • 44:40So how does that generally come into
  • 44:43the discussion in the design stage?
  • 44:45Is it the tradition that in educational
  • 44:48studies we have a lot of prior knowledge
  • 44:51on what these effect modifiers are
  • 44:53or? No, so I think this is actually
  • 44:55one of the hardest parts, right?
  • 44:57Like I just laid out,
  • 44:58sort of, if we knew what the
  • 45:00Y(0)s and Y(1)s were,
  • 45:01this is what we would,
  • 45:02you know, this is what would be optimal,
  • 45:04but I could be wrong on
  • 45:06what those are, right?
  • 45:09And I don't know.
  • 45:10I mean, I think so.
  • 45:12There's sort of what I call the
  • 45:14usual suspects in education,
  • 45:15which are like race, class, and gender,
  • 45:18which are really more of concerns
  • 45:19about disparity or about closing
  • 45:21achievement gaps in various ways.
  • 45:23And to those,
  • 45:24urbanicity, I would add, seems to be
  • 45:27something that people often, like,
  • 45:28add into that as a characteristic.
  • 45:30Those are the ones that
  • 45:32people most often use.
  • 45:33And those are
  • 45:35available in population data,
  • 45:36which is the other thing;
  • 45:39a real limiter is what is available
  • 45:41in the population, sure.
  • 45:43What I gather is more likely to be a moderator
  • 45:46is something like baseline achievement,
  • 45:49right?
  • 45:49So if my outcome is achievement then I would,
  • 45:52I would think that what the
  • 45:55baseline achievement is in any
  • 45:56of these places would matter.
  • 45:58That's harder to get in education,
  • 46:00I mean, that information from places,
  • 46:02so there's been some work trying to
  • 46:05equate tests across states.
  • 46:06I guess that they do.
  • 46:08Sometimes
  • 46:08they use gain scores just to subtract off
  • 46:11that baseline achievement, right? They
  • 46:13do. Yeah, exactly,
  • 46:14but the problem is that like if you
  • 46:16wanted to use state tests or something,
  • 46:19there are different tests in every state,
  • 46:21and so there's all of these
  • 46:23equating issues that go in with it.
  • 46:25My guess is that implementation is another
  • 46:27one that people often come up with is
  • 46:29like something with implementation.
  • 46:31Now this is tricky because implementation
  • 46:33is coming after assignment and
  • 46:34so it's really like a mediator.
  • 46:36But if you think about it, often,
  • 46:38if you think implementation
  • 46:39may be part of what is leading
  • 46:41to treatment effect variation,
  • 46:42then you can kind of think, well, what
  • 46:45affects implementation? And so
  • 46:46people can sometimes think a little
  • 46:48more carefully about what affects
  • 46:50implementation, like, oh well,
  • 46:51it's probably,
  • 46:52you know, it's probably easier to implement
  • 46:54this in schools that are like this
  • 46:57than schools that are not like that.
  • 47:01You might try to find various
  • 47:03measures of this for the
  • 47:04implementation. That sounds more like,
  • 47:08it's sort of a version of multiple
  • 47:10treatments, and it's a violation of
  • 47:11the SUTVA condition, probably.
  • 47:12Yeah, yeah, exactly yeah.
  • 47:13So I mean, so it gets,
  • 47:15it gets tenuous. Yeah, I don't,
  • 47:17I don't have, this is, you know,
  • 47:19this is like, when I first started
  • 47:21doing this work I was like,
  • 47:23well, assume moderators and assume a
  • 47:25population, and move on, as a statistician.
  • 47:26But actually those are the two
  • 47:28hardest things when working with
  • 47:29people in planning these trials:
  • 47:31thinking about what they are.
  • 47:32I'll give you an example though.
  • 47:34Uh, like a positive case, which was,
  • 47:37I was part of designing something called
  • 47:39the National Study of Learning Mindsets,
  • 47:41which is we randomly sampled
  • 47:43100 high schools in the US,
  • 47:45and then the students,
  • 47:48there were
  • 47:49ninth graders in the study,
  • 47:51and so the ninth graders were randomly
  • 47:53assigned, using a computer-
  • 47:55based intervention, to either a growth
  • 47:57mindset intervention or something
  • 47:59that was not growth mindset, that was
  • 48:02just sort of a control condition.
  • 48:05And in doing that, the social
  • 48:07psychologists I was working with had
  • 48:09a lot of questions, like we had a
  • 48:11lot of hard questions about these
  • 48:13moderators, and they had a lot of
  • 48:15theories about what they might be.
  • 48:17So we oversampled, like, we
  • 48:19looked at, for example,
  • 48:22the proportion of students that are
  • 48:25minorities in the school, and then,
  • 48:28when we started, we wanted to
  • 48:30stratify on that as well as
  • 48:32on a measure of sort of school
  • 48:34achievement,
  • 48:35and so we needed to be able to
  • 48:38cross these in a way,
  • 48:41in order to de-alias these trends, so
  • 48:43that they could estimate one,
  • 48:46you know,
  • 48:46estimate one effect separated
  • 48:48from the other.
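
To make the crossing/de-aliasing idea concrete, here is a minimal sketch, not the study's actual procedure, of stratifying a population frame on two school-level variables at once and sampling from every cell of the cross. The frame, the tercile cut points, the per-cell count, and the names `prop_minority` and `achievement` are all illustrative assumptions:

```python
# Minimal sketch: cross two school-level stratifiers and sample from
# every cell so the two factors vary independently in the sample.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical population frame of schools (illustrative data)
frame = pd.DataFrame({
    "school_id": range(10_000),
    "prop_minority": rng.uniform(0, 1, 10_000),
    "achievement": rng.normal(0, 1, 10_000),
})

# Cut each stratifier into terciles, then cross them into 3x3 cells
frame["minority_stratum"] = pd.qcut(frame["prop_minority"], 3, labels=False)
frame["achieve_stratum"] = pd.qcut(frame["achievement"], 3, labels=False)

# Sampling schools from every cell de-aliases the two moderators:
# neither is confounded with the other in the resulting sample
sample = frame.groupby(["minority_stratum", "achieve_stratum"]).sample(
    n=11, random_state=0)
print(sample.groupby(["minority_stratum", "achieve_stratum"]).size())
```

Because every cell of the cross is filled, a moderator effect for one stratifier can be estimated separately from the other, which is the de-aliasing being described.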
  • 48:50So I mean,
  • 48:51in some places people are much better,
  • 48:53much better theoretically.
  • 48:54Thinking about this,
  • 48:55I think some fields are better at
  • 48:57thinking about these mechanisms than
  • 48:59other fields are, but yeah, it's really hard.
  • 49:02It's really hard.
  • 49:02So my, my other, other than my,
  • 49:04like, standard set, you know,
  • 49:06race, class, and gender,
  • 49:08I often ask people to think about
  • 49:11what variables might just be
  • 49:13related to other things, right?
  • 49:14That if you can,
  • 49:15if you can think of it as, like, I
  • 49:17ultimately want to test moderators
  • 49:19that I don't really know exactly
  • 49:21what they are,
  • 49:22but I need to get variation in them,
  • 49:24and that means probably by getting
  • 49:26variation in something else.
  • 49:27I'm going to get variation in those as well,
  • 49:29so.
  • 49:31The size of your site, you know,
  • 49:33I think,
  • 49:34is one place where, you know, in education,
  • 49:37you can see that everybody's in
  • 49:39very large sites.
  • 49:40And so, what if we increase the variation,
  • 49:43the variation of district size
  • 49:44and school size?
  • 49:45It seems like it has to increase
  • 49:47variation of some other things as
  • 49:49well. Agreed, agreed.
  • 49:50Yeah, I think another aspect of why I so
  • 49:53appreciate, like, the aspect of effect
  • 49:55modifiers is that it really is a way to
  • 49:58move forward with information from
  • 50:00covariates. And then when we talk
  • 50:02about the ATE in a randomized study,
  • 50:05we often ignore covariates and
  • 50:06then just hold that the unadjusted
  • 50:09analysis provides unbiased estimates,
  • 50:11even though that may come with
  • 50:13a larger variance.
  • 50:14So by really talking about effect modifiers,
  • 50:17we somehow incorporate that information,
  • 50:19perhaps even in the estimation
  • 50:21of the average effect,
  • 50:22which can increase precision.
  • 50:24So yeah.
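
A minimal sketch of the precision point just made, assuming a simple simulated trial: the unadjusted difference in means is unbiased, but regression adjustment with a centered covariate and its treatment interaction (in the spirit of Lin, 2013) typically shrinks the standard error of the average-effect estimate. Data and names here are illustrative, not from any study discussed:

```python
# Sketch: unadjusted vs. covariate-adjusted ATE estimation in a
# randomized trial with a prognostic covariate / effect modifier.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 2_000
x = rng.normal(0, 1, n)        # baseline covariate (effect modifier)
z = rng.binomial(1, 0.5, n)    # randomized treatment assignment
y = 1.0 + 0.5 * z + 1.5 * x + 0.4 * z * x + rng.normal(0, 1, n)

# Unadjusted: simple difference in means via OLS on treatment only
unadj = sm.OLS(y, sm.add_constant(z)).fit()

# Adjusted: treatment, centered covariate, and their interaction,
# so the treatment coefficient still estimates the average effect
xc = x - x.mean()
adj = sm.OLS(y, sm.add_constant(np.column_stack([z, xc, z * xc]))).fit()

print(f"unadjusted ATE: {unadj.params[1]:.3f} (SE {unadj.bse[1]:.3f})")
print(f"adjusted ATE:   {adj.params[1]:.3f} (SE {adj.bse[1]:.3f})")
```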
  • 50:27Yeah.
  • 50:30Yeah, I have a question;
  • 50:32so I actually have two questions,
  • 50:35so it seems you're interested
  • 50:37in both individual-level
  • 50:39and cluster-level moderators, right? When
  • 50:41you have cluster level moderators,
  • 50:44how does that work with
  • 50:46the augmentation design?
  • 50:47'cause you mentioned that
  • 50:49in the augmentation design,
  • 50:51you might want to pick, like,
  • 50:5410 or 30% of the sites,
  • 50:56and kind of, like, choose
  • 50:58the samples from those.
  • 50:59But how do you choose those 30%?
  • 51:02Do you choose those 30 percent with
  • 51:04respect to the cluster-level moderators?
  • 51:06You could do it with respect to either.
  • 51:09You can do it with respect to either,
  • 51:12because it depends on the way you
  • 51:14enter them into the model. So,
  • 51:18you can work out that if,
  • 51:20if I'm interested in an individual-level
  • 51:23moderator, then what I need
  • 51:25to do is, I actually
  • 51:27need to include as a covariate the
  • 51:30interaction between, like, X and (1 − X).
  • 51:32That's what I included here
  • 51:34as the covariates,
  • 51:35because I want to increase the
  • 51:37variation within sites, right?
  • 51:38And so you could do it either way,
  • 51:41because what it's doing,
  • 51:42what the augmentation approach does
  • 51:44is it assesses how much variation
  • 51:46you have in those 30 sites already,
  • 51:49and then it looks for possible design runs,
  • 51:52meaning other samples,
  • 51:54other places, that would greatly improve that.
  • 51:58And it just does it algorithmically,
  • 51:59which is nice.
  • 52:00I would say, I should add, an
  • 52:02extra benefit of this, concerned
  • 52:03with all of this sample recruitment,
  • 52:05is that there's non-response.
  • 52:07You're never going to get,
  • 52:08you know, it's not like I can just say, like,
  • 52:11here's your,
  • 52:11here's your, like, 40 sites, go ask
  • 52:13them and they're going to say yes.
  • 52:15But with the augmentation approach, if
  • 52:16somebody says no, you can, like, throw
  • 52:18that out and then go look for,
  • 52:20like, what's the next best alternative,
  • 52:21so you can keep kind of iterating.
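
A rough sketch of the iterative search just described, under loose assumptions: start from the covariates of the sites already recruited, then greedily add whichever candidate site from a hypothetical population frame most improves the D-criterion det(X'X); a refusal just drops the site from the pool before re-searching. The data, dimensions, and function names are all illustrative, not the paper's actual algorithm:

```python
# Sketch of greedy sample augmentation using a D-optimality criterion.
# Only covariates enter the criterion -- no outcomes are needed,
# which is why this can all be done at the design stage.
import numpy as np

rng = np.random.default_rng(2)
recruited = rng.normal(0, 0.3, size=(30, 3))    # 30 homogeneous sites
candidates = rng.normal(0, 1.0, size=(500, 3))  # population frame

def d_criterion(X):
    """D-criterion: determinant of the information matrix X'X."""
    Xd = np.column_stack([np.ones(len(X)), X])  # add an intercept
    return np.linalg.det(Xd.T @ Xd)

def augment(current, candidates, n_add):
    """Greedily add the n_add candidates that most improve det(X'X)."""
    chosen, pool = [], list(range(len(candidates)))
    for _ in range(n_add):
        best = max(pool, key=lambda i: d_criterion(
            np.vstack([current, candidates[i]])))
        chosen.append(best)
        current = np.vstack([current, candidates[best]])
        pool.remove(best)  # a refusal would also just be removed here
    return chosen

print(augment(recruited, candidates, n_add=10))
```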
  • 52:24So,
  • 52:25so in our current application,
  • 52:26I think the attributes are all cluster-level
  • 52:30information, right? Summary statistics?
  • 52:31Yeah, yeah, well, that's what
  • 52:33I have right here.
  • 52:34That's in the slides. But what
  • 52:36I didn't include in here, though,
  • 52:38it's, but it's
  • 52:40in the paper, is you could also do
  • 52:43this with individual level only,
  • 52:44for proportions, because
  • 52:46the proportion works out that you can get,
  • 52:48You can think about this with
  • 52:50the same statistics you would
  • 52:52get at the cluster level.
  • 52:53You can't get the variation you can with
  • 52:55a normal, like a continuous, variable.
  • 52:57I can't get the,
  • 52:59I don't have the standard deviation
  • 53:01within sites; I can't do that.
  • 53:03Right, right?
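
A tiny sketch of why a site-level proportion is enough for a binary individual-level moderator (echoing the X and (1 − X) covariate mentioned earlier): the within-site variance of a 0/1 variable is fully determined by its proportion p as p(1 − p), whereas a continuous moderator would require a within-site standard deviation that population summaries typically don't report. The proportions below are made up:

```python
# Sketch: within-site variance of a binary moderator from its
# site-level proportion alone -- no individual-level data needed.
import numpy as np

p = np.array([0.10, 0.35, 0.50, 0.80])  # hypothetical site proportions
within_site_var = p * (1 - p)           # Bernoulli variance, p(1 - p)
print(within_site_var)                  # largest at p = 0.5
```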
  • 53:05Also, the other question
  • 53:06is about so it seems
  • 53:08like all these designs are
  • 53:09under the assumption that you're
  • 53:11interested in all the moderators,
  • 53:13like, equally, meaning that
  • 53:15you don't have, like, primary
  • 53:17moderators that you're interested in
  • 53:19estimating the moderator effect for, and
  • 53:20then you have a couple of them that...
  • 53:23I mean, if you want, you
  • 53:25can. So, I mean, what's great,
  • 53:27I mean, I think, about this, like, this area,
  • 53:29is that it's been so richly developed
  • 53:32in this other sort of design world
  • 53:34that you can actually add weights.
  • 53:36So you can say, like, I'm more
  • 53:39interested in this variable than
  • 53:41that variable, and it will focus,
  • 53:43you know, it will focus on one
  • 53:45variable over the other.
  • 53:47Because, can you imagine, like,
  • 53:49that X'X, like that D-criterion matrix,
  • 53:50the determinant of it?
  • 53:52You could just add weights into that.
  • 53:54So if you add weights into that
  • 53:56then you can start looking at the
  • 53:58determinant of that weighted version.
  • 54:00Right, so you would add weights
  • 54:02in that matrix and optimize that.
  • 54:03Yeah exactly,
  • 54:04if you add weights so that some
  • 54:06of the covariates are getting
  • 54:07more weight than others.
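
A sketch of one way such weights could enter the criterion. One caveat worth noting: simply rescaling the columns of X multiplies det(X'X) by a constant across all candidate designs, so a literal column-weighted determinant would not change which design wins; the stand-in below instead weights the (log) variances of the individual coefficient estimates. This is one plausible formulation, not necessarily the one in the paper being discussed, and all numbers are illustrative:

```python
# Sketch: a weighted design criterion that cares more about the
# precision of some moderator coefficients than others.
import numpy as np

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(40), rng.normal(size=(40, 2))])

def weighted_criterion(X, w):
    """Weighted sum of log coefficient variances; smaller is better."""
    cov = np.linalg.inv(X.T @ X)  # proportional to Var(beta-hat)
    return float(np.sum(w * np.log(np.diag(cov))))

w = np.array([0.0, 2.0, 1.0])  # twice the weight on the first moderator
print(weighted_criterion(X, w))
```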
  • 54:10So I guess just maybe more precisely,
  • 54:13I think, for the D-optimality criterion,
  • 54:16shouldn't that be X
  • 54:18transpose V-inverse X in general,
  • 54:20just because you're working with
  • 54:22cluster randomized studies, so
  • 54:23that the outcome correlation is
  • 54:25somehow included in that variance?
  • 54:27Is that what the algorithm is
  • 54:29trying to get at, in general?
  • 54:32Yeah yeah.
  • 54:34Inverse, yeah, it's the (X'X) inverse,
  • 54:37which is the covariance. Yeah,
  • 54:38but really, no, so you don't,
  • 54:41you don't need to have the
  • 54:42variance matrix of the outcome.
  • 54:46Exactly, you don't need to have the outcome,
  • 54:48it's all about the inputs, right?
  • 54:50But that's, which is why you
  • 54:51can do it in advance, right?
  • 54:53So it's all about the design. And, nicely,
  • 54:56you can leverage population
  • 54:57data that you have. Totally.
  • 54:58And again, I assume in all of this
  • 55:01that like there's measurement error
  • 55:02and that you know you can just
  • 55:04sort of assume that like you're
  • 55:06not going to get it exactly right,
  • 55:08but my baseline comparison is always
  • 55:10what are we doing now versus what could
  • 55:12we be doing an like frankly anything.
  • 55:15Any you know it looks to me like we
  • 55:17have fairly homogeneous samples and
  • 55:19that any effort we can make to increase
  • 55:22that heterogeneity is an improvement.
  • 55:28So, well, I think we're about at the hour,
  • 55:31but let's see if we have any
  • 55:33final questions from the audience.
  • 55:40Alrighty, if not, I think, you know,
  • 55:43I'm sure if you have any questions
  • 55:45for Beth, she will be happy to
  • 55:47answer them offline by email.
  • 55:48So thanks so much again, Beth.
  • 55:50It's really nice to have you and thanks to
  • 55:53everybody for attending. We'll see all of you,
  • 55:55hopefully, after the break, so have
  • 55:58a great holiday. See you later.
  • 56:02Alright,
  • 56:04thanks again.
  • 56:05Talk to you later. Bye, take care.