YSPH Biostatistics Seminar: “A Sequential Basket Trial Design Based on Multi-Source Exchangeability with Predictive Probability Monitoring”
November 17, 2022
Dr. Alex Kaizer, Assistant Professor, Center for Innovative Design and Analysis (CIDA), Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus
- 00:00<v Wayne>We introduce Dr. Alex Kaizer.</v>
- 00:04Dr. Kaizer is an assistant professor
- 00:07in the Department of Biostatistics and Informatics,
- 00:10and he's a faculty member
- 00:12in the Center for Innovative Design and Analysis
- 00:14at the University of Colorado Medical Campus.
- 00:19He's passionate about translational research
- 00:21and the development of novel
- 00:24and adaptive clinical trial designs
- 00:25that are more efficient
- 00:27and effectively utilize available resources
- 00:31including past trials and past studies.
- 00:34And Dr. Kaizer strives to translate
- 00:37(indistinct) topics into understandable material
- 00:41that is more than just math
- 00:42and something we can appreciate
- 00:44and utilize in our daily lives and research.
- 00:46Now let's welcome Dr. Kaizer.
- 00:53<v Alex>Thank you Wayne.</v>
- 00:54So apologies for my own technical difficulties today,
- 00:57but I'm going to be presenting on
- 00:59this idea of a sequential basket trial design
- 01:02based on multi-source exchangeability
- 01:04with predictive probability monitoring.
- 01:06And that is admittedly quite the mouthful
- 01:09and I'm hoping throughout this presentation
- 01:11to break down each of these concepts
- 01:13and ideas building upon them sort of until we have this
- 01:17cumulative effect that represents this title today.
- 01:21Before jumping into everything though,
- 01:23I do wanna make a few acknowledgements.
- 01:25This paper was actually published
- 01:26just at the end of this past summer in PLOS ONE,
- 01:28and so if you're interested in more of the technical details
- 01:31or additional simulation examples
- 01:33and things beyond what I present today,
- 01:35I include this paper here
- 01:36and we'll also have it up again at the very end of my talk
- 01:39just for reference.
- 01:40Also acknowledgement to Dr. Nan Chen
- 01:42who helped with some of the initial coding
- 01:44of some of these methods and approaches.
- 01:50So to set the context for my seminar today,
- 01:52I want to think here about
- 01:54this move towards precision medicine generally,
- 01:56but especially in the context of oncology.
- 01:59And so within oncology, like many other disciplines,
- 02:02when we design research studies,
- 02:03we often design these for a particular,
- 02:05what we might call a histology
- 02:07or an indication or a disease.
- 02:09So for example, we might say,
- 02:11"Well, I have a treatment or intervention
- 02:13which I hope or think will work in lung cancer,
- 02:15therefore I'm going to design
- 02:18and enroll in the study for lung cancer."
- 02:21Now this represents a very standard way
- 02:24that we do clinical trial design where we try to
- 02:26really rigorously define
- 02:27and narrowly define what our scope is.
- 02:31Now within oncology,
- 02:32we've had some exciting scientific developments
- 02:34over the past few decades.
- 02:36So now instead of seeing cancer as just
- 02:38based on the site like you have a lung cancer
- 02:40or a prostate cancer,
- 02:42we actually have identified that we can partition cancers
- 02:45into many small molecular subtypes.
- 02:48And further, we've actually been able to
- 02:49leverage this information by being able to say that
- 02:52what we thought of as a holistic lung cancer
- 02:54isn't just one type of disease,
- 02:57we can actually develop therapies that we hope to
- 02:59target some of these differences in genetic alterations.
- 03:02And this really gets to that idea of precision medicine that
- 03:05instead of throwing a treatment at someone
- 03:07where we think it should work
- 03:08or it has worked in some people on average,
- 03:10hopefully we can really target the intervention
- 03:13based off of some signal
- 03:14or some indication like a biomarker or a genotype
- 03:17that we actually hope could respond more ideally
- 03:20to that intervention.
- 03:22Now what's really interesting about this as well
- 03:26is that there could be a potential for heterogeneity
- 03:28in this treatment benefit by indication.
- 03:30And what I mean by that is once we've identified
- 03:33that there's these different genetic alterations,
- 03:35we've actually discovered that these alterations
- 03:37aren't necessarily unique to one site of cancer.
- 03:41For example, we may identify a genetic alteration
- 03:43in the lung that also is present in the prostate, liver,
- 03:46and kidney in some of those types of cancer.
- 03:49Now the challenge here though is that
- 03:50even though we have the same driver hypothetically
- 03:53based on our clinical or scientific hypothesis
- 03:56of that potential benefit for a treatment we've designed
- 03:59to address it,
- 04:00there still may be important differences
- 04:01that we don't know about
- 04:02or have yet to account for based off of each site.
- 04:05So what may have worked actually really well in the lung
- 04:07for one given mutation,
- 04:09even for that same mutation, let's say present in the liver,
- 04:12may not work as well.
- 04:13And that's that idea of heterogeneity in treatment benefit.
- 04:16That we can have different levels of response
- 04:18across different sites or groups of individuals.
- 04:23Now the cool thing I think here
- 04:24from the statistical perspective is that the scientific
- 04:27and clinical advancements
- 04:28have also led to a revolution in statistical
- 04:31and clinical design challenges and approaches.
- 04:34And of course that's the sweet spot that I work at.
- 04:36I know many of you
- 04:37and especially students are training
- 04:39and studying to work in this area
- 04:40to collaborate with scientific and clinical researchers
- 04:43and leaders to translate those results
- 04:45in statistically meaningful ways
- 04:47and to potentially design trials or studies
- 04:49that really target these questions and hypotheses.
- 04:53Now specifically in this talk today,
- 04:56I'm going to focus on this idea of
- 04:58a master protocol design.
- 05:01And these provide a flexible approach
- 05:02to the design of trials with multiple indications,
- 05:05but they do have their own unique challenges
- 05:06that I'm gonna highlight a few of here in a second.
- 05:09But there are a variety of master protocols out there
- 05:11in case you've heard some of these buzzwords.
- 05:13I'll be focusing on basket trials today,
- 05:16but you may have also heard of things like umbrella trials
- 05:18or even more generally platform trial designs.
- 05:24And so one example of what this looks like here is
- 05:26this is a graphic from a paper in the
- 05:28New England Journal by Dr. Woodcock and LaVange,
- 05:31Dr. Woodcock being a clinician,
- 05:32and Dr. Lisa LaVange being a past president of
- 05:35The American Statistical Association,
- 05:37where they actually tried to put to rest some of the
- 05:39confusion surrounding some of these design types
- 05:42because it turns out,
- 05:43up until 2017 when we discussed these designs
- 05:46across even statistical communities
- 05:48and with clinical researchers,
- 05:50we tend to use these terms fairly interchangeably
- 05:53even though we are really getting at
- 05:55very different concepts.
- 05:57So for example,
- 05:58in the top here we have this idea of an umbrella trial
- 06:02and this is really the context of a single disease
- 06:04like lung cancer,
- 06:05but we actually then will screen for
- 06:07those genetic alterations
- 06:08and have different therapies that we're trying to
- 06:10target a different biomarker or genetic alteration for.
- 06:14This contrasts to what we're focusing on today below
- 06:17of a basket trial,
- 06:18we actually have different diseases or indications,
- 06:20but they share a common target or genetic alteration
- 06:23which we wish to target.
- 06:25And in this sense we can think of it potentially
- 06:26as them sharing a basket
- 06:28or sharing a sort of that commonality there.
- 06:32Now, this is a fairly broad general idea of these designs.
- 06:35And so I think for the sake of
- 06:37what we're gonna talk about today
- 06:38and some of the statistical considerations
- 06:40it can be helpful to do a bit of an
- 06:42oversimplification of what a design might look like here.
- 06:46And so on the slide that I've presented,
- 06:48I have this kind of naive graphic of actual baskets
- 06:53and we're going to assume that in each column
- 06:54we have a different indication or site of cancer
- 06:56that has that common genetic alteration.
- 06:59So for example, basket one may represent the lung,
- 07:02basket two may represent the liver and so on.
- 07:05Now when we're in the case of designing
- 07:07or the design stage of a study,
- 07:09we tend to make oversimplifying assumptions
- 07:11to address these potential calculations for
- 07:13power, sample size,
- 07:15and quantities that we're usually interested in
- 07:17for study design.
- 07:19So here on this graph,
- 07:20we are gonna make an assumption that
- 07:22there's only two possible responses in this planning stage.
- 07:25One is that the baskets have no response or a null basket,
- 07:29that's the blue colored solid baskets on the screen.
- 07:32The other case would be an alternative response
- 07:35where there is some hopeful benefit to the treatment
- 07:37and those are the open orange colored baskets
- 07:41we see on the screen here.
- 07:43Now, one of the challenges I think with basket trial design
- 07:46that can be overlooked sometimes,
- 07:48even in this design stage,
- 07:49is that for a standard two arm trial,
- 07:51we do have to make this assumption of,
- 07:52what is our null hypothesis or response?
- 07:55What's our alternative hypothesis or response?
- 07:57We really only have to do that for one configuration
- 08:00or combination because we have two arms.
- 08:03In the case of a single arm basket trial here,
- 08:05we actually see that
- 08:06just by having five baskets in a study
- 08:08and many actual trials are implemented with
- 08:10far more baskets,
- 08:12we actually see a range of just six possible
- 08:14binary combinations of the basket works
- 08:17or it doesn't work,
- 08:18ranging from at the extremes a global null
- 08:21where unfortunately the treatment does not work
- 08:23in any basket down to the sort of dream scenario
- 08:26where the basket is actually,
- 08:28or the drug actually works across all baskets.
- 08:31There is this homogeneous response
- 08:34in a positive direction for the sort of clinical outcome.
- 08:39More realistically,
- 08:40we actually will probably encounter
- 08:42something that we see falls in the middle here,
- 08:44scenarios two through five,
- 08:46where there's some mixture of baskets
- 08:48that actually do show a response
- 08:49and some that for whatever reason we might not know yet,
- 08:51it just doesn't appear to have any effect
- 08:53and is a null response.
- 08:55So this can make it challenging
- 08:56for some of the considerations of
- 08:58what analysis strategy you plan to use in practice.
- 09:03And so to just, at a high level,
- 09:05highlight some of these challenges
- 09:07before we jump into the methods for today's talk.
- 09:11In practice, each of these baskets within the trial
- 09:13often has what we call a small n,
- 09:14or small sample size, for each of those indications.
- 09:18It turns out
- 09:18once we actually have this idea of precision medicine
- 09:20and we can be fairly precise
- 09:22for who qualifies for a trial,
- 09:23we actually have a much smaller potential sample
- 09:26or population to enroll.
- 09:27This means that even though we might have a treatment
- 09:29that works really well,
- 09:30it can be challenging to find individuals who qualify
- 09:33or are eligible to enroll
- 09:34or they may have competing trials or demands
- 09:36for other studies or care to consider.
- 09:41As I've also alluded to earlier,
- 09:43we also have this challenge of potential indication
- 09:44or subgroup heterogeneity, and that may be likely.
- 09:47In other words,
- 09:48we might not expect the same response
- 09:50across all those baskets.
- 09:50And that gets back to the previous graphic
- 09:52on that last slide
- 09:53where we might have something like two null baskets
- 09:56and three alternative baskets.
- 09:57And that can make it really challenging in the presence
- 09:59of a small n to determine how do we
- 10:02appropriately analyze that data
- 10:03so we capture the potentially efficacious baskets
- 10:06and can move those forward so patients benefit
- 10:09while not carrying forward null baskets
- 10:11where there is no response for those patients.
- 10:16Statistically speaking,
- 10:17we also have these ideas of operating characteristics
- 10:19and in the context of a trial,
- 10:21what we mean by that is things like power
- 10:22and type one error
- 10:24and we just have additional considerations with respect to
- 10:26how do we summarize these?
- 10:28Do we summarize them within each basket or each column
- 10:31on that graphic on the previous slide,
- 10:32essentially treating it as a bunch of
- 10:34standalone independent one arm trials
- 10:36just under one overall study design or idea?
- 10:40Or do we try to account for the fact that we have
- 10:42five baskets enrolling like on the graphic before
- 10:45and we might wanna consider something like a
- 10:47family wise type one error rate
- 10:48where any false positive would be a negative outcome
- 10:52if we're trying to correctly predict
- 10:53or identify associations?
- 10:57Now the focus of today's talk,
- 10:58and I could talk about these other points
- 11:00till the cows come home,
- 11:02but I'm gonna focus today on
- 11:04depending on that research stage we're at,
- 11:06if it's a phase one, two or three trial,
- 11:08we may wish to terminate early for some reason like
- 11:10efficacy or futility.
- 11:12And specifically for time today,
- 11:13I'm gonna focus on the idea of stopping for futility
- 11:16where we don't wanna keep enrolling baskets
- 11:18that are poorly performing both for ethical reasons.
- 11:20In other words,
- 11:21patients may benefit from other trials or treatments
- 11:24that are out there and we don't wanna subject them to
- 11:26treatments that have no benefit.
- 11:28But also from a resource consideration perspective.
- 11:31You can imagine that running a study or trial is expensive
- 11:34and can be complicated.
- 11:36And especially if we're doing something like a basket trial
- 11:38where we're having to enroll across multiple baskets,
- 11:40it may be ideal to be able to drop baskets early on
- 11:43that don't show promise
- 11:44so we can reallocate those resources to
- 11:46either different studies, research projects,
- 11:49or trials that we're trying to implement or run.
- 11:55So the motivation for today's talk
- 11:57building off of these ideas is that
- 11:58I want to demonstrate that a design that's very popular
- 12:01called Simon's two-stage design is
- 12:04generally speaking suboptimal
- 12:05compared to the multitude of alternative methods
- 12:08and designs that are out there.
- 12:10And then this is especially true in our context of
- 12:12a basket trial where within the single study
- 12:15we actually are simultaneously enrolling
- 12:17multiple one arm trials in our case today.
- 12:20Then the second point I'd like to highlight is
- 12:22we can identify when methods for sharing information
- 12:24across baskets could be beneficial to further improve
- 12:27the efficiency of our clinical trials.
- 12:31And so to highlight this,
- 12:32I wanna first just build us through
- 12:34and sort of illustrate or introduce these designs
- 12:36and the general concepts behind them
- 12:37because I know if you don't work in this space
- 12:40it may be sort of just vague ideas.
- 12:43So I wanna start with the Simon two-stage design,
- 12:45that comparator that people are commonly using.
- 12:48So Richard Simon, and this is back in 1989,
- 12:51introduced what he called optimal two-stage designs
- 12:54for phase two clinical trials.
- 12:56And this was specifically in the context
- 12:57that we're focusing on today for a one sample trial
- 12:59to evaluate the success of a binary outcome.
- 13:02So for oncology we might think of this as a yes no outcome
- 13:05for is there a reduction in tumor size
- 13:07or a survival to some predefined time point.
- 13:13Now specifically what Dr. Simon was motivated by
- 13:16was phase two trials
- 13:18as it says in the title of his paper,
- 13:20and just to kind of
- 13:22give a common lay of the land for everyone,
- 13:23the purpose generally speaking of a phase two trial
- 13:26is to identify if the intervention
- 13:28warrants further development
- 13:30while collecting additional safety data.
- 13:32Generally speaking,
- 13:33we will have already completed what we call
- 13:35a phase one trial where we collect preliminary safety data
- 13:38to make sure that the drug is not toxic
- 13:40or at least has expected side effects
- 13:44that we are willing to tolerate for that
- 13:45potential gain in efficacy.
- 13:48And then in phase two here we're actually trying to say,
- 13:50"You know, is there some benefit?
- 13:51Is it worth potentially moving this drug
- 13:53on either for approval
- 13:54or some larger confirmatory study
- 13:57to identify if it truly works or doesn't?"
- 14:01Now the motivation for Dr. Simon is that
- 14:03we would like to terminate studies earlier,
- 14:05as I mentioned before,
- 14:06for both ethical and resource considerations
- 14:08that they appear futile.
- 14:09In other words, it's not a great use of our resources
- 14:11and we should try in some
- 14:12rigorous statistical way to address this.
- 14:17If you do go back and look at Simon's 1989 paper
- 14:20or you just Google this
- 14:21and there's various calculators that people have
- 14:22put out there,
- 14:23there are two flavors of this design that exist
- 14:26from this original paper.
- 14:27One is an optimal
- 14:28and one is called a minimax design.
- 14:31Within clinical trials,
- 14:32once we introduce this idea of stopping early potentially
- 14:36or have the chance to stop early based on our data,
- 14:39we now have this idea that there's this expected sample size
- 14:42because we could enroll the entire sample size
- 14:44that we planned for or we could potentially stop early.
- 14:47And since we could stop early or go the whole way
- 14:49and we don't know what our choice will be
- 14:51until we actually collect the data and do the study,
- 14:53we now have sample size as a random variable,
- 14:56something that we can calculate an expectation
- 14:58or an average for.
- 14:59And so Simon's optimal design tries to
- 15:01minimize what that average sample size might be in theory.
- 15:06In contrast, the minimax design
- 15:08tries to minimize whatever that largest sample size would be
- 15:11if we didn't stop early.
- 15:13So if we kept enrolling
- 15:14and we never stopped at any of our interim looks,
- 15:16how much data would we need to collect?
- 15:18We choose a design that minimizes that,
- 15:20at the expense of the benefit of potentially stopping early.
- 15:25I think this is most helpful to see the
- 15:27sort of elegance of this design
- 15:29and why it's I think so popular
- 15:30by just introducing an example
- 15:31that will also motivate our simulations
- 15:33here that we're gonna talk about in a minute.
- 15:36We're gonna consider a study where
- 15:37the null response rate is 10%.
- 15:40And we're going to consider a target
- 15:42for an alternative response rate of 30%.
- 15:44So this isn't a situation where
- 15:45we're looking for necessarily a curative drug,
- 15:48but something that does show what we think of
- 15:49as a clinically meaningful benefit from 10 to 30%,
- 15:52let's say survival or tumor response.
- 15:55Now if we have these two parameters
- 15:57and we wanna do a Simon two-stage minimax design
- 16:00to minimize that maximum possible sample size
- 16:03we would enroll,
- 16:04we would have to also define
- 16:06the type one error rate or alpha
- 16:08that controls the chance of a false positive.
- 16:10Here we're going to set 10% for this phase two design
- 16:12and we also wish to target a 90% power
- 16:15to detect that treatment effect of 30% if it truly exists.
- 16:20So we put all of this into our calculator
- 16:22for Simon's framework and we turn that statistical crank.
- 16:25What we see is that
- 16:26it gives us this approach where in stage one
- 16:29we would enroll 16 participants
- 16:31and we would terminate the trial or this study arm
- 16:34for futility if one or fewer responses are observed.
- 16:37Now if we observe two or more responses,
- 16:41we would continue enrollment
- 16:43to the overall maximum sample size that we plan for
- 16:45of 25 in the second stage.
- 16:48And at this point if four or fewer responses are observed,
- 16:51no further investigation is warranted
- 16:53or we can think of this as a situation where
- 16:55our P value would be larger than our defined alpha 0.1.
- 17:00Now, the nice thing here is that it is quite simple.
- 17:03In fact, after we turn that statistical crank
- 17:05and we have this decision rule,
- 17:06you in theory don't even need a statistician
- 17:08because you can count the number of responses
- 17:10for your binary outcome on your hand
- 17:12and determine should I stop early, should I continue?
- 17:15And if I continue,
- 17:16do I have some benefit potentially
- 17:18that says it's worth either doing a future study
- 17:21or, if I did a statistical test,
- 17:23I would find that the P value meets the threshold
- 17:25I set for significance.
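
To make that decision rule concrete, here is a minimal sketch (added for readers of this transcript, not code from the talk; names are illustrative) that checks the operating characteristics of the minimax rule just described using exact binomial probabilities:

```python
# Minimal sketch (not from the talk): verify the Simon minimax rule above.
# Stage 1: n1 = 16, stop for futility if <= 1 response.
# Overall: n = 25, declare benefit only if 5 or more total responses.
from scipy.stats import binom

n1, r1 = 16, 1          # stage-1 size and futility boundary
n_max, r = 25, 4        # max size; "success" requires more than r responses
n2 = n_max - n1         # stage-2 enrollment

def reject_prob(p):
    """P(pass stage 1 AND total responses > r) when the true rate is p."""
    return sum(binom.pmf(x1, n1, p) * binom.sf(r - x1, n2, p)
               for x1 in range(r1 + 1, n1 + 1))  # sf of a negative value is 1

pet = binom.cdf(r1, n1, 0.10)                     # P(early stop | null rate)
print(f"type I error: {reject_prob(0.10):.3f}")   # ~0.10, as designed
print(f"power:        {reject_prob(0.30):.3f}")   # ~0.90, as designed
print(f"P(early stop | null): {pet:.3f}")         # ~0.51
print(f"E[N | null]: {n1 + (1 - pet) * n2:.1f}")  # ~20 per null basket
```

The last two quantities, a stopping probability of roughly half and an expected sample size of about 20 under the null, come up again in the simulation results later in the talk.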
- 17:29Now, of course,
- 17:31it wouldn't be a great talk if I stopped there and said,
- 17:33"You know, this is everything.
- 17:34It's perfect. There's nothing to change."
- 17:36There are some potential limitations
- 17:37and of course some solutions I think
- 17:39that we could address in this talk.
- 17:42The first thing to note is that
- 17:43this is extremely restrictive in when it could terminate
- 17:46and it may continue to the maximum sample size
- 17:48even if a null effect is present.
- 17:50And we're gonna see this come to fruition
- 17:52in the simulation studies,
- 17:54but it's worth noting here it only looks once.
- 17:55It's a two stage design.
- 17:58And depending on the criteria you plug in,
- 18:00it might not look for quite some time.
- 18:0216 out of 25 total participants enrolled
- 18:05is still a pretty large sample size
- 18:07relative to where we expect to be.
- 18:11One solution that we could look at
- 18:12and that I'm going to propose today
- 18:14is that we could use Bayesian methods instead
- 18:16for more frequent interim monitoring.
- 18:18And this could use quantities that we think of
- 18:20as the posterior or the predictive probabilities
- 18:23of our data.
- 18:26Another limitation that we wish to address as well is that
- 18:28in designs like a basket trial
- 18:30that have multiple indications
- 18:31or multiple arms that have the same entry criteria,
- 18:35Simon's two-stage design is going to
- 18:36fail to take advantage of the potential
- 18:38what we call exchangeability across baskets.
- 18:41In other words, if baskets appear to have the same response,
- 18:45whether it's let's say that null
- 18:46or that alternative response,
- 18:48it would be great if we could
- 18:49informatively pool them together into meta subgroups
- 18:52so we can increase the sample size
- 18:54and start to address that challenge of the small n
- 18:56that I mentioned earlier for these basket trial designs.
- 18:59And specifically today we're going to examine the use of
- 19:02what we call multi-source exchangeability models
- 19:05to share information across baskets when appropriate.
- 19:08And I'll walk through a very high level sort of
- 19:10conceptual idea of what these models
- 19:12and how they work and what they look like.
- 19:17Before we get into that though,
- 19:18I wanna just briefly mention the idea of posterior
- 19:20and predictive probabilities
- 19:22and give some definitions here
- 19:23so we can conceptually envision what we mean
- 19:25and especially if you haven't had the chance
- 19:27to work with a lot of Bayesian methods,
- 19:29this can help give us an idea
- 19:31of some of the analogs to maybe a frequentist approach
- 19:33or what we're trying to do here
- 19:34that you may be familiar with.
- 19:36Now I will mention,
- 19:37I'm not the first person to propose looking at
- 19:39Bayesian interim stopping rules.
- 19:41I have a couple citations here by Dmitrienko
- 19:43and Wang and Saville et al.
- 19:45and they do a lot of extensive work in addition to
- 19:47hundreds of other papers considering
- 19:49Bayesian interim monitoring.
- 19:51But specifically to motivate this
- 19:53we have these two concepts that commonly come up
- 19:56in Bayesian analysis,
- 19:57a posterior probability or a predictive probability.
- 20:01The posterior probability
- 20:03is very much analogous to kinda like a P value
- 20:05in a frequentist significance test.
- 20:06It says, "Based on the posterior distribution
- 20:09we arrive at through a Bayesian analysis,
- 20:11we're gonna calculate the probability
- 20:13that our proportion exceeds the null response rate
- 20:15we wish to beat."
- 20:16So in our case, we're basically saying,
- 20:18"What's the probability based on our data
- 20:20and a prior we've given that the response is 10% or higher."
- 20:25So this covers a lot of ground
- 20:26'cause anything, you know, from 10.1% up to 100%
- 20:29would meet this criteria being better than 10%.
- 20:32But it does quantify,
- 20:34based on the evidence we've observed so far,
- 20:37what the data suggests the
- 20:40benefit may be with respect to that null.
- 20:42So in the case of let's say
- 20:44an interim look for futility at the data, we could say,
- 20:47if we just use Simon's two-stage design as our motivating
- 20:51ground to consider, we might say,
- 20:53"Okay, we have 16 people so far,
- 20:55what's the probability based on these 16 people
- 20:58that I could actually say
- 20:59there's no chance or limited chance
- 21:00I'm going to detect something in the trial here
- 21:03based on the data I've seen so far?"
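
As a rough illustration of that interim posterior question (a sketch added here, assuming the Beta(0.5, 0.5) prior mentioned later in the talk; the function name is my own):

```python
# Sketch: the posterior probability described above, for one basket.
# With a Beta(a, b) prior and y responses in n patients, the posterior
# is Beta(a + y, b + n - y), and we ask P(response rate > p0 | data).
from scipy.stats import beta

def posterior_prob(y, n, p0=0.10, a=0.5, b=0.5):
    """P(true response rate > p0 | y responses in n patients)."""
    return beta.sf(p0, a + y, b + n - y)  # sf = 1 - cdf of the posterior

# An interim look at 16 patients, as in the Simon example:
print(posterior_prob(1, 16))  # probability the rate beats 10% given 1/16
print(posterior_prob(6, 16))  # the same quantity given 6/16 responses
```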
- 21:05Now the challenge here is that
- 21:07it is based off of the data we've seen so far
- 21:09and it doesn't take into account the fact that we still have
- 21:12another nine potential participants to enroll
- 21:15to get to that maximum sample size of 25.
- 21:18That's where this idea of what we call a
- 21:20predictive probability comes in.
- 21:22Considering our accumulated data
- 21:24and the priors we've specified in our Bayesian context,
- 21:27it's the probability that we will have observed
- 21:30a significant result once we've
- 21:32enrolled up to our maximum sample size.
- 21:36In other words, I think it's a very natural place to be
- 21:38for interim monitoring
- 21:39because it says based on the data I've seen so far,
- 21:41i.e., the posterior probability,
- 21:43if I use that to help identify what are likely futures
- 21:46to observe or likely sample sizes
- 21:48I will continue enrolling to get to that maximum of 25,
- 21:51what's the probability at the end of the day
- 21:53when I do hit that sample size of 25,
- 21:56I will have a significant conclusion?
- 21:58And if it's a really low predictive probability,
- 22:00if I say there's only a 5% chance
- 22:02of you actually declaring significance if you
- 22:04keep enrolling participants,
- 22:06that can be really informative both statistically
- 22:08and for clinical partners to say
- 22:10it doesn't seem very likely that we're gonna hit our target.
- 22:13That being said,
- 22:15a lot of people are very happy to continue trials going
- 22:17with low chances or low probability
- 22:19because you're saying there's still a chance
- 22:21I may detect something that could be
- 22:23significant enough to be worth it.
- 22:25So we'll see that across a range of these thresholds,
- 22:28the performance of these models may change.
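
A compact sketch of that calculation (again added for this transcript; the final "significance" rule here, a posterior probability above 0.95, is an illustrative placeholder, not the paper's calibrated cutoff):

```python
# Sketch: predictive probability of eventual "success" at n = 25, given
# interim data. Future responses follow a beta-binomial predictive
# distribution; "success" at the end is illustrated here as a final
# posterior probability P(rate > p0 | data) above an assumed 0.95 cutoff.
from scipy.stats import beta, betabinom

def predictive_prob(y, n, n_max=25, p0=0.10, cutoff=0.95, a=0.5, b=0.5):
    m = n_max - n                        # patients still to enroll
    a_now, b_now = a + y, b + n - y      # current posterior parameters
    pp = 0.0
    for y_fut in range(m + 1):           # every possible future outcome
        final = beta.sf(p0, a_now + y_fut, b_now + m - y_fut)
        if final > cutoff:               # trial would end in "success"
            pp += betabinom.pmf(y_fut, m, a_now, b_now)
    return pp

# After 16 patients with only 1 response, the chance of eventual success
# is small, which is the kind of signal used to stop for futility:
print(predictive_prob(1, 16))
```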
- 22:32Now this brings us to a brief recap
- 22:34of sort of our motivation.
- 22:35I just spent a few minutes
- 22:37introducing that popular Simon two-stage design,
- 22:39the idea behind it,
- 22:40what it might look like in practice,
- 22:42as well as some alternatives with a Bayesian flair.
- 22:45The next part I wanna briefly address is that
- 22:47we can also now look at this idea
- 22:50of sharing information across baskets
- 22:52to further improve that trial efficiency
- 22:54'cause so far both Simon's design
- 22:56and just using a posterior or predictive probability
- 22:59for interim monitoring will still treat each basket
- 23:02as its own little one arm trial.
- 23:07Now specifically today I'm gonna focus on this idea
- 23:10we call multi-source exchangeability models or MEMs.
- 23:13This is a general Bayesian framework
- 23:15to enable the incorporation of independent sources
- 23:18of supplemental information
- 23:20and it's original work that I developed
- 23:22during my dissertation at the University of Minnesota.
- 23:25In this case,
- 23:26the amount of borrowing is determined by
- 23:27the exchangeability of our data,
- 23:29which in our context is really,
- 23:31how equivalent are the response rates?
- 23:33If two baskets have the exact same response rate,
- 23:35we may think that there's a higher probability
- 23:38that the true underlying populations
- 23:40we are trying to estimate are truly exchangeable.
- 23:42We wish to combine that data as much as we possibly can.
- 23:46Versus if, again, we see something that is like a
- 23:4810% response rate for one basket
- 23:50and a 30% response rate for another basket,
- 23:53we likely don't want to combine that data because
- 23:55those are not very equivalent response rates.
- 23:57In fact, we seem to have identified
- 23:59two different subgroups
- 24:00and performances in those two baskets.
- 24:04One of the advantages of MEMs relative to
- 24:07a host of other statistical methods that are out there
- 24:09that include things like power priors, commensurate priors,
- 24:12meta analytic priors, and so forth,
- 24:15is that we've been able to demonstrate that
- 24:16in their most basic iteration without
- 24:18any extra bells or whistles,
- 24:20MEMs are able to actually account for this heterogeneity
- 24:23across different potential response rates
- 24:26and appropriately down weight non-exchangeable sources.
- 24:29Whereas we show through simulation
- 24:30and earlier work some of these other methods without
- 24:33newer advancements to them
- 24:35actually either naively pool everything together
- 24:38even if there's non-exchangeable groups
- 24:41or they're afraid of the sort of presence of
- 24:43non-exchangeability, and if anything seems amiss,
- 24:45they quickly go to an independence analysis
- 24:48that doesn't leverage this potential sharing
- 24:51of information across meta subgroups that are exchangeable.
- 24:56Now again, I don't wanna get too much into the weeds
- 24:59of the math behind the MEMs,
- 25:00but I will have a few formulas in a couple slides
- 25:02but I do think it's helpful to
- 25:03conceptualize it with graphics.
- 25:05And so here I just want to illustrate a very simplified case
- 25:08where we're gonna assume that we have a three basket trial
- 25:11and for the sake of doing an analysis with MEMs,
- 25:14I think it's helpful to also think of it as
- 25:16we're looking at the perspective of the analysis
- 25:18from one particular basket.
- 25:20So here on this slide here we see that we have this
- 25:24theta P circle in the middle
- 25:25and that's the parameter or parameters of interest
- 25:28we wish to estimate.
- 25:29In our case, that would be that
- 25:31binary outcome in each basket.
- 25:34Now, for this graphic we're using each of these circles here
- 25:37to represent a different data source.
- 25:40We're gonna say Y sub P is that primary basket
- 25:42that we're interested in or the perspective
- 25:44we're looking at for this example
- 25:46and Y sub one and Y sub two
- 25:48are two of the other baskets enrolled within the trial.
- 25:51Now a standard analysis
- 25:53without any information sharing across baskets
- 25:55would only use the observed data from that basket.
- 26:00I mean this is sort of the unexciting
- 26:01or unsurprising analysis
- 26:03where we basically are analyzing the data we have
- 26:05for the one basket that actually represents that group.
- 26:10However, we could imagine if we wish
- 26:11to pool together data from these other sources,
- 26:14we have different ways we could add arrows to this figure
- 26:17to represent different combinations of these groups.
- 26:21And this brings us to
- 26:22that multi-source exchangeability framework.
- 26:25So we see here on this slide,
- 26:26I now have a graphic showing four different combinations
- 26:29of exchangeability when we have these two other baskets
- 26:32that compare to our one basket of interest right now.
- 26:36And from top left to the bottom left
- 26:38in sort of a clockwise fashion,
- 26:39we see that we're making different assumptions from
- 26:42that standard analysis with no borrowing
- 26:44in the top right here where I'm drawing that arrow.
- 26:46So it is possible that
- 26:48none of our data sources are exchangeable
- 26:49and we should be doing an analysis that
- 26:51doesn't share information.
- 26:53On the right hand side, we might envision that
- 26:55well maybe the first basket or Y1 is exchangeable.
- 26:58So we wanna pool that with Y2, or excuse me, with Yp,
- 27:01but Y2 is not.
- 27:03In the bottom right, this capital omega two,
- 27:05we actually assume that Y2 is exchangeable
- 27:07but Y1 is not.
- 27:09And in the bottom left we assume in this case
- 27:10that all the data is exchangeable
- 27:12and we should just pool it all together.
- 27:15So at this stage we've actually
- 27:17proposed all the pairwise configurations we can
- 27:20for combining these different data sources with Y sub P.
- 27:23And we know that these are now fitting four different models
- 27:26based off of the data
- 27:27because for example in the top left, that standard analysis,
- 27:30there is no extra information from those other baskets
- 27:33versus like in the bottom left,
- 27:35we basically have combined everything
- 27:36and we think there's some common effect.
- 27:39Now this leads to two challenges on its own
- 27:40if we just stopped here with the framework.
- 27:43One would be that we'd have this idea of maybe
- 27:44cherry picking or trying to pick whichever combination
- 27:47best suits your prior hypotheses clinically.
- 27:50And so that would be a big no-go.
- 27:51We don't like cherry picking
- 27:52or fishing for things like P values
- 27:54or significance in our statistical analyses.
- 27:57The other challenge also is that
- 27:59all of these configurations are just assumptions
- 28:01of how we could combine data
- 28:03but we know underlying everything in the population is the
- 28:05true state of exchangeability:
- 28:07are these baskets or groups truly combinable or not?
- 28:11And we're just approximating that with our sample.
- 28:13And so right now if we have four separate models
- 28:15and potentially four separate conclusions,
- 28:18we need some way of combining these models
- 28:20to make inference.
- 28:21And in this case we propose
- 28:23leveraging a Bayesian model averaging framework
- 28:26where we calculate in this case
- 28:28and in our formulas here,
- 28:29the q's represent a posterior distribution
- 28:32where I've drawn this little arrow
- 28:33and I'm underlining right now,
- 28:35that reflects each square's configuration of
- 28:39exchangeability for our estimates.
- 28:41And through this process
- 28:42we estimate these lower case omega model weights
- 28:45that try to estimate the appropriateness
- 28:47of exchangeability with the ultimate goal of
- 28:50having an average posterior that we can use
- 28:53for statistical inference
- 28:54to draw a conclusion about the potential efficacy
- 28:57or lack thereof of a treatment.
- 29:02Now very briefly,
- 29:03because this is a Bayesian model averaging framework,
- 29:06just one of the few formulas I have in the presentation,
- 29:08we just see over here that we have
- 29:10the way we calculate these posterior model weights
- 29:13as the prior on each model
- 29:15multiplied by an integrated marginal likelihood.
- 29:18Essentially, we can think of that as saying
- 29:20based off of that square we saw on the previous slide
- 29:22and combining those different data sources,
- 29:25what is that estimate of the effect
- 29:26with those different combinations?
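
In symbols (a reconstruction from this verbal description, so the notation is approximate rather than the slide's exact formula), the posterior weight for exchangeability configuration \(\Omega_k\) is the prior on that model times its integrated marginal likelihood, normalized across configurations:

\[
\omega_k \;=\; \frac{\pi(\Omega_k)\, p(Y \mid \Omega_k)}{\sum_{j} \pi(\Omega_j)\, p(Y \mid \Omega_j)},
\qquad
p(Y \mid \Omega_k) \;=\; \int L(\theta \mid Y, \Omega_k)\, \pi(\theta)\, d\theta,
\]

with the model-averaged posterior used for inference, as described a moment ago, being \( q(\theta \mid Y) = \sum_k \omega_k\, q_k(\theta \mid Y, \Omega_k) \).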
- 29:29One unique thing about the MEM framework
- 29:31that differs from Bayesian model averaging though is that
- 29:34we actually specify priors with respect to these sources.
- 29:37And in the case of this example
- 29:39with only two supplemental sources for our graphic,
- 29:42it's not a great cost savings,
- 29:45but we can imagine that if we have more and more sources,
- 29:47there are actually two to the P, if P is the number of sources,
- 29:50combinations of exchangeability
- 29:52that we have to consider and model.
- 29:54And that quickly can become overwhelming if we have
- 29:56multiple sources that we have to define
- 29:58for each one of those squares,
- 29:59what's my prior that each combination of exchangeability
- 30:02is potentially true.
- 30:04Versus if we define it with respect to the source,
- 30:06we now go from two to the P priors to just P priors
- 30:09we have to specify for exchangeability.
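
As a toy illustration of this framework (added here; a simplified sketch with made-up data and a flat 0.5 source-inclusion prior, not the published implementation), one can enumerate the two-to-the-P configurations for binary outcomes with conjugate Beta priors:

```python
# Toy sketch of MEM posterior weights: one primary basket, two supplemental
# baskets, binary outcomes, Beta(0.5, 0.5) priors. Data and the 0.5
# source-inclusion prior are made up for illustration.
from itertools import product
from math import comb, log, exp
from scipy.special import betaln

A, B = 0.5, 0.5

def log_marginal(y, n):
    """Log integrated marginal likelihood of y responses in n (beta-binomial)."""
    return log(comb(n, y)) + betaln(A + y, B + n - y) - betaln(A, B)

primary = (3, 25)                 # (responses, n) for the basket of interest
supp = [(2, 25), (8, 25)]         # two supplemental baskets
prior_inclusion = 0.5             # prior that each source is exchangeable

weights = {}
for config in product([0, 1], repeat=len(supp)):   # the 2^P configurations
    # pool primary data with every source flagged exchangeable in config
    y = primary[0] + sum(s[0] for s, g in zip(supp, config) if g)
    n = primary[1] + sum(s[1] for s, g in zip(supp, config) if g)
    logml = log_marginal(y, n)
    # non-exchangeable sources contribute independent marginal likelihoods
    logml += sum(log_marginal(*s) for s, g in zip(supp, config) if not g)
    # source-wise prior: product of per-source inclusion/exclusion priors
    prior = 1.0
    for g in config:
        prior *= prior_inclusion if g else 1 - prior_inclusion
    weights[config] = prior * exp(logml)

total = sum(weights.values())
for config, w in weights.items():
    print(config, round(w / total, 3))  # posterior weight per configuration
```

With these toy numbers, the configuration that pools only the similar 2-of-25 basket should tend to receive the most weight, which is the informative-pooling behavior described above.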
- 30:14A few more notes about this idea here
- 30:17and just really zooming in on
- 30:19what we're gonna focus on for today's presentation.
- 30:21We have developed both fully
- 30:23and empirically Bayesian prior approaches here,
- 30:25fully Bayesian meaning that it is defined a priori
- 30:29and is agnostic to the data you've collected,
- 30:31empirically Bayesian meaning
- 30:32we leverage the data we've collected
- 30:34to help inform that prior for what we've observed.
- 30:38Specifically there is what we call a
- 30:40non constrained, or naive,
- 30:42empirically based prior
- 30:43where we would look through all of those graphs we had
- 30:45and we would say, "Whichever one of these
- 30:47maximizes the integrated marginal likelihood
- 30:49that's the correct configuration
- 30:51and we're gonna put all of our eggs into that basket."
- 30:53Or 100% of the probability there
- 30:55and that's the only model we use for analysis.
- 30:59We know, generally speaking,
- 31:00since we went to all the work of defining
- 31:02all of these different combinations of exchangeability
- 31:04and that it's based off of samples,
- 31:06potentially small samples,
- 31:07that this can be a very strong assumption.
- 31:10And so we can also modify this prior
- 31:12to what we call a constrained EB prior,
- 31:15where instead of just giving every one of those model
- 31:18sources in that MEM that
- 31:20maximizes the likelihood 100% weight,
- 31:23we instead give it a weight of what we're calling just B.
- 31:25This is our hyper prior value here
- 31:28where if it's a value of zero or up to one,
- 31:31it'll control the amount of borrowing
- 31:32and allow other nested models of exchangeability
- 31:36to also be potentially considered for analysis.
- 31:39So for example,
- 31:40if we do set a value of one
- 31:42that actually replicates the non constrained EB prior
- 31:44and really aggressively borrows from one specific model.
- 31:48At the other extreme here, if we set a value of zero,
- 31:50we essentially recreate an independent analysis
- 31:53like a Simon two-stage design or just using those
- 31:55Bayesian methods for futility monitoring
- 31:57that doesn't share information.
- 31:59And then any value in between
- 32:00gives a little more granularity or control
- 32:03over the amount of borrowing.
- 32:06So with that background behind us,
- 32:09I'm gonna introduce the simulation setup
- 32:11and then present results for a couple
- 32:13key operating characteristics for our trial.
- 32:16In this case, we're going to assume for our simulations
- 32:18that we have a basket trial
- 32:19with 10 different baskets or indications.
- 32:21So again, that's 10 different types of cancer
- 32:23that we have enrolled that all have
- 32:25the same genetic mutation that we think is targeted
- 32:28by the therapy of interest.
- 32:31Like we had before,
- 32:32we're going to assume a null response P naught of 0.1 or 10%.
- 32:36And an alternative response rate of 30% or P1 here.
- 32:41We are gonna compare then three different designs
- 32:43that we just spent some time introducing and outlining.
- 32:46The first is a Simon minimax two-stage design
- 32:49using that exact set up that we had before
- 32:52where we will enroll 16 people,
- 32:54determine if we have one or fewer observations of success.
- 32:56If so, stop the trial.
- 32:58If not, continue on.
- 33:00In the second case,
- 33:01we're going to implement a Bayesian design
- 33:03that uses predictive probability monitoring
- 33:05but we don't use any information sharing
- 33:07just to illustrate that we can at least
- 33:09potentially improve upon the frequency
- 33:12and use of interim monitoring above a single look
- 33:14from the Simon minimax design.
- 33:17And then the third design
- 33:18will add another layer of complexity
- 33:20where we will try to share information across baskets
- 33:23that have what we estimate to be exchangeable subgroups.
- 33:28One thing to note here is that
- 33:29we are setting this hyper parameter value B at 0.1.
- 33:32This is a fairly conservative value
- 33:34and admittedly for this design
- 33:36we actually did not calibrate specifically
- 33:39for the amount of borrowing to be 0.1.
- 33:40This is actually based off of
- 33:41some other prior work we've done
- 33:43and published on basket trials that just showed that
- 33:45in the case of an empirically Bayesian prior for MEMs,
- 33:49this actually allows information sharing
- 33:51in cases where there's a high degree of exchangeability
- 33:53and low heterogeneity
- 33:55and down weight it in cases where we might be
- 33:57a little more uncertain,
- 33:58so it's a little more conservative
- 33:59but we'll see in the simulation results
- 34:01there are some potential benefits.
- 34:05For each of the scenarios we're gonna look at today,
- 34:08we will generate a thousand trials
- 34:10with a maximum sample size of 25 per basket.
- 34:14We're gonna look at two cases,
- 34:16there's a few others in the paper
- 34:17but we're gonna focus on first the global scenario
- 34:20where all the baskets are either null
- 34:21or all 10 baskets have some meaningful effect.
- 34:24And this is the setting where
- 34:25information sharing methods like MEMs
- 34:27really should outperform anything else
- 34:29because everything is truly exchangeable
- 34:31and everything could naively be pooled together
- 34:34because we're simulating them to have the same response.
- 34:37We'll then look at what happens if we actually have
- 34:39a mixed scenario,
- 34:40which I think is actually more indicative
- 34:42of what's happened in practice
- 34:43with some of the published basket trials
- 34:45and clinically what we've seen from applications
- 34:47of these types of designs.
- 34:49Specifically here, we're gonna look at the case where
- 34:52there are eight null baskets and two alternative baskets.
- 34:57A few other points just to highlight here.
- 34:59We're going to assume a Beta(0.5, 0.5) prior
- 35:02for our Bayesian models.
- 35:04This essentially for a binary outcome can be thought of as
- 35:06adding half of a response
- 35:08and half of a lack of a response to our observed data.
- 35:12We're going to look at the most extreme dream Bayesian case
- 35:16of doing futility monitoring
- 35:18or any type of interim monitoring continually.
- 35:20So after every single participant's enrolled
- 35:22we will do a calculation
- 35:24and determine if we should stop the trial.
- 35:27We will then look at the effect of this choice
- 35:29across a range of predictive probability thresholds
- 35:34ranging from 0%,
- 35:35meaning we wouldn't stop early at all,
- 35:37up to 50% saying if there's anything less
- 35:39than a 50% chance I'll find success,
- 35:41I wanna stop that trial.
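
For readers who want to see the mechanics, here is a stripped-down sketch of that continual monitoring loop for a single null basket (added for this transcript; the thresholds and the final cutoff are illustrative rather than the paper's calibrated values, and no information sharing is included):

```python
# Sketch: continual predictive-probability futility monitoring for one
# simulated null basket (true rate 10%). Illustrative cutoffs; no MEM
# information sharing in this simplified version.
import numpy as np
from scipy.stats import beta, betabinom

rng = np.random.default_rng(2022)
N_MAX, P_TRUE, P0 = 25, 0.10, 0.10
PP_STOP, FINAL_CUTOFF = 0.05, 0.95     # stop if predictive prob < 5%
A, B = 0.5, 0.5

def predictive_prob(y, n):
    m = N_MAX - n
    a, b = A + y, B + n - y
    return sum(betabinom.pmf(k, m, a, b) for k in range(m + 1)
               if beta.sf(P0, a + k, b + m - k) > FINAL_CUTOFF)

stops, sizes = 0, []
for _ in range(1000):                   # 1,000 simulated baskets
    y = 0
    for n in range(1, N_MAX + 1):       # a look after every participant
        y += rng.random() < P_TRUE
        if n < N_MAX and predictive_prob(y, n) < PP_STOP:
            stops += 1
            break
    sizes.append(n)

print("stopping rate for futility:", stops / 1000)
print("expected sample size:", np.mean(sizes))
```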
- 35:44And then finally it's worth noting
- 35:45we're actually also completely disregarding calibration
- 35:49for this interim monitoring.
- 35:51And so what we're gonna do is
- 35:52we're gonna calibrate our decision rules
- 35:54for the posterior probability at the end of the trial
- 35:57based off of a global scenario where
- 35:59we think it's ideal to share information
- 36:02and we're also not gonna account for the fact that
- 36:03we're doing interim looks at the data.
- 36:05Part of the question here was
- 36:07if we truly do all these assumptions
- 36:08and we do sort of the most naive thing,
- 36:11how badly do we actually do?
- 36:13Like is there enough reason to fear the results
- 36:16if we don't correctly calibrate for everything here?
- 36:21So I'm gonna paint some pictures here building from the
- 36:24simpler Simon design to our more complex Bayesian designs
- 36:27and then with information sharing
- 36:29just to illustrate three different properties.
- 36:31I'm gonna go fairly quickly
- 36:33'cause I know that you all have to vacate the classroom
- 36:35in about 10 minutes.
- 36:37So for the global scenario that we're looking at here,
- 36:41the light gray lines are going to represent
- 36:44the alternative basket scenario.
- 36:46So in this case, all 10 alternative baskets.
- 36:49Here we see we planned for 90% power.
- 36:51Simon's design appropriately achieves
- 36:53that rejection rate of 90%.
- 36:55Likewise, the lines at the bottom here,
- 36:58these black lines,
- 36:59are going to represent the results of null baskets.
- 37:01Here are the global null scenario
- 37:03and we see that it achieves a 10% rejection rate.
- 37:06Now, this is a flat line here
- 37:08because again Simon's design is agnostic to things like
- 37:11the predictive probability.
- 37:13Now if we do frequent Bayesian monitoring,
- 37:16we see two interesting things here with these new lines.
- 37:19We see that at the top
- 37:21and the bottom, here I add these circles
- 37:23where the predictive probability threshold is 0%.
- 37:25This does represent the actual design
- 37:27that would correspond to the actual calibration we did
- 37:29without interim monitoring.
- 37:31And we see that it is possible with Bayesian approaches
- 37:34to achieve the same frequentist operating characteristics
- 37:37that we would achieve with something like the Simon design.
- 37:40We can see though that if we want to do interim monitoring
- 37:43but we didn't calibrate
- 37:44or think of that in our calculations,
- 37:46we do see this trade off where we have our
- 37:49alternative baskets having a decreasing power
- 37:51or rejection rate as the aggressiveness of the
- 37:53predictive probability threshold increases.
- 37:56And likewise the type one error rate or the
- 37:59rejection rate of the null baskets also decreases.
- 38:02Now if we add information sharing to this design,
- 38:06we actually see some encouraging results
- 38:08in this global scenario.
- 38:09First, it's worth noting that in the case
- 38:11where we actually calibrated for,
- 38:12we actually see an increase in power from 90% to about 97%.
- 38:17And even when we actually have a
- 38:1910% predictive probability threshold for interim monitoring,
- 38:23we see that we actually still achieve 90% power
- 38:26with a corresponding reduction in that type one error rate.
- 38:30Of course, this is with the caveat that
- 38:32this is the ideal setting for sharing information
- 38:35because all of the baskets are truly exchangeable.
- 38:38Now the rejection rate relates to something we call
- 38:41the expected sample size.
- 38:42What is the average sample size we might enroll
- 38:44for each basket of our 10 baskets in the trial?
- 38:47We see here that in the case of a null basket
- 38:50the Simon design is about 20.
- 38:53If we do interim monitoring with Bayesian approaches
- 38:56and no information sharing,
- 38:57obviously if we don't do any interim looks at the data,
- 38:59we have a 0% threshold,
- 39:01we're gonna have a sample size of 25 every single time.
- 39:05I think what's encouraging though is that
- 39:06by looking fairly aggressively we see that our sample size,
- 39:09even with a very marginal
- 39:11or low 5% threshold for futility monitoring,
- 39:14drops from 20 in the Simon design to about 15
- 39:18in the Bayesian design,
- 39:19the trade-off of course being because we didn't calibrate.
- 39:22We also see a reduction in the sample size
- 39:24for the alternative baskets.
- 39:28And if we add that layer of information sharing,
- 39:30we actually see that we do slightly better than
- 39:32the design without information sharing
- 39:34while attenuating at the top here the effect
- 39:37our solid gray line has for the alternative baskets.
- 39:42Now, briefly tying this together then to the stopping rate,
- 39:45which we can kind of infer from those past results,
- 39:47we do see that on average the Simon two-stage design
- 39:50for the null baskets stopping for futility
- 39:52is only taking place a little over 50% of the time
- 39:55in this simulation.
- 39:57The advantage here though is that it is
- 39:58very rarely stopping for the alternative baskets.
- 40:02In our Bayesian approaches,
- 40:03we see that there is an over 80%
- 40:06probability of stopping at these low thresholds
- 40:09if it's a null effect.
- 40:10And this is ideal because we have 10 baskets.
- 40:12And so these potential savings or effects
- 40:14can compound themselves across these multiple baskets.
- 40:18We then see that the design adding these solid lines
- 40:21for information sharing does very similarly,
- 40:23where again the consequences of not calibrating
- 40:26are attenuated in this circumstance.
- 40:30Now the thing to note here that
- 40:31everything I presented on these few graphics
- 40:34were with respect to the global scenario,
- 40:36that ideal scenario that I actually don't think
- 40:38is super realistic in practice.
- 40:41So we see here, if we do a mixed scenario where
- 40:44we have calibrated for the global scenario,
- 40:46so we've miscalibrated with respect to this mixed case.
- 40:48We've also not calibrated for interim looks at the data.
- 40:51We can actually see that the results for
- 40:53the Simon two-stage in the Bayesian design
- 40:55without information sharing are very similar
- 40:58to what we saw before.
- 40:59That's because they don't share information.
- 41:00And so in this case with eight null baskets
- 41:02and two alternative baskets,
- 41:04they have very similar responses.
- 41:06This contrasts of course with the MEM approach
- 41:09or the information sharing approach
- 41:10where we actually see now
- 41:12many of these results are actually overlapping
- 41:15for information sharing and no information sharing.
- 41:18What this tells us is that even though we miscalibrated
- 41:21up and down the design,
- 41:23we are actually able with this more conservative prior
- 41:26to down weight borrowing
- 41:27and effectuate similar results
- 41:30that at lower thresholds for futility monitoring, for example
- 41:34at 5% can still show potential gains in efficiency relative
- 41:38to the Simon design that could likely further be improved
- 41:41with actual calibration.
- 41:44So just as a reminder,
- 41:45we demonstrated today
- 41:46and introduced the idea of Simon's two-stage design
- 41:48and some alternative methods to compete with them.
- 41:51And some just brief discussion and concluding points.
- 41:53There is no free lunch
- 41:55and this is true regardless of where we are in statistics
- 41:57that for example in our designs,
- 41:59besides the fact that we miscalibrated
- 42:01and made it a bit harder of a comparison for our methods,
- 42:04we did try to replicate
- 42:05what people might be doing in practice
- 42:07or the challenge of
- 42:07calibrating these designs in actuality.
- 42:10Simon's two-stage design does have a lot of benefits
- 42:13from its ideal characteristics
- 42:15that are easy to implement,
- 42:17but it is limited in how often it may stop.
- 42:20Our Bayesian designs,
- 42:21with or without information sharing,
- 42:22can lead to reductions in the expected sample size
- 42:24in the null basket
- 42:25and further could be improved
- 42:27if we actually incorporate calibration,
- 42:29which we further explored
- 42:30in a Statistical Methods in Medical Research paper
- 42:33published in 2020.
- 42:35And so that I have some sources here
- 42:36and I thank you for your attention
- 42:37and welcome any questions or discussion at this point.
- 42:56<v Man>Thank you so much. Any questions from the room?</v>
- 43:12<v Student>Okay, so yeah, I have questions.</v>
- 43:14So in the example you just showed,
- 43:18all the like the task becomes so, can be achievable, right?
- 43:22So if the baskets,
- 43:25they are expected to have different benefits (indistinct),
- 43:28and say the 10 basket (indistinct)
- 43:33some other basket MEMs would allow a bigger benefit,
- 43:37how will the (indistinct)
- 43:45scenarios?
- 43:48<v Alex>Yeah, well, I think,</v>
- 43:49if I understood your question correctly
- 43:51and I misheard through the phone, let me know,
- 43:54but if we have different sample sizes for baskets,
- 43:56which actually really corresponds
- 43:58to what we've seen in practice for real basket trials
- 44:01where they have a fairly
- 44:02wide range of sample sizes in each basket.
- 44:06I think what we would see,
- 44:07and let me see if I can pop back quickly to the
- 44:09mixed scenario results here just to illustrate some ideas.
- 44:13One of the concepts here that,
- 44:14so we did explicitly look at that to say like,
- 44:16"Well, what if one basket never gets beyond seven
- 44:18of the 25," let's say.
- 44:20But what we can infer is that
- 44:21if a basket stopped early for futility,
- 44:23it essentially has a smaller sample size to contribute
- 44:26to any analysis whether or not it was a
- 44:29falsely stopped basket that had a 30% effect
- 44:32or it was truly a null basket.
- 44:34And so we do see in this case that the method
- 44:36averaging over those ideas of differential sample sizes
- 44:39based off of stopped baskets
- 44:41does seem to be borrowing
- 44:43appropriately, depending on the context.
- 44:45So like the mixed scenario results here suggests
- 44:47limited borrowing in the presence of that uncertainty
- 44:50from the global scenario
- 44:51because we didn't calibrate for anything else
- 44:53it does show more of a benefit of the stopping rate
- 44:56and other properties incorporating that data
- 44:58even in small sample sizes.
- 45:00And there's also been some other work
- 45:02and illustrations done by Dr. Emily Zabor
- 45:04at the Cleveland Clinic, with whom I work,
- 45:06about some of the re-analysis of oncology trials
- 45:09that do show even in small basket sizes,
- 45:11we can move that significance evaluation
- 45:14into a more clinically meaningful realm.
- 45:26<v Wayne>Thanks, so do we have other questions?</v>
- 45:57Okay, so (indistinct) that's (indistinct).
- 46:01Okay, so since there are no questions let's stop here.
- 46:07(indistinct)
- 46:16<v Alex>Yeah. Thank you all.</v>