
YSPH Biostatistics Seminar: “A Sequential Basket Trial Design Based on Multi-Source Exchangeability with Predictive Probability Monitoring”

November 17, 2022
  • 00:00<v Wayne>We introduce Dr. Alex Kaizer.</v>
  • 00:04Dr. Kaizer is an assistant professor
  • 00:07in the Department of Biostatistics and Informatics,
  • 00:10and he's a faculty member
  • 00:12in the Center for Innovative Design and Analysis
  • 00:14at the University of Colorado Medical Campus.
  • 00:19He's passionate about translational research
  • 00:21and the development of novel
  • 00:24and adaptive clinical trial designs
  • 00:25that more efficiently
  • 00:27and effectively utilize available resources
  • 00:31including past trials and past studies.
  • 00:34And Dr. Kaizer strives to translate
  • 00:37(indistinct) topics into understandable material
  • 00:41that is more than just math
  • 00:42and something we can appreciate
  • 00:44and utilize in our daily lives and research.
  • 00:46Now let's welcome Dr. Kaizer.
  • 00:53<v Alex>Thank you Wayne.</v>
  • 00:54So apologies for my own technical difficulties today,
  • 00:57but I'm going to be presenting on
  • 00:59this idea of a sequential basket trial design
  • 01:02based on multi-source exchangeability
  • 01:04with predictive probability monitoring.
  • 01:06And that is admittedly quite the mouthful
  • 01:09and I'm hoping throughout this presentation
  • 01:11to break down each of these concepts
  • 01:13and ideas building upon them sort of until we have this
  • 01:17cumulative effect that represents this title today.
  • 01:21Before jumping into everything though,
  • 01:23I do wanna make a few acknowledgements.
  • 01:25This paper was actually published
  • 01:26just at the end of this past summer in PLOS ONE,
  • 01:28and so if you're interested in more of the technical details
  • 01:31or additional simulation examples
  • 01:33and things beyond what I present today,
  • 01:35I include this paper here
  • 01:36and we'll also have it up again at the very end of my talk
  • 01:39just for reference.
  • 01:40Also acknowledgement to Dr. Nan Chen
  • 01:42who helped with some of the initial coding
  • 01:44of some of these methods and approaches.
  • 01:50So to set the context for my seminar today,
  • 01:52I want to think here about
  • 01:54this move towards precision medicine generally,
  • 01:56but especially in the context of oncology.
  • 01:59And so within oncology, like many other disciplines,
  • 02:02when we design research studies,
  • 02:03we often design these for a particular,
  • 02:05what we might call a histology
  • 02:07or an indication or a disease.
  • 02:09So for example, we might say,
  • 02:11"Well, I have a treatment or intervention
  • 02:13which I hope or think will work in lung cancer,
  • 02:15therefore I'm going to design
  • 02:18and enroll in the study for lung cancer."
  • 02:21Now this represents a very standard way
  • 02:24that we do clinical trial design where we try to
  • 02:26really rigorously define
  • 02:27and narrowly define what our scope is.
  • 02:31Now within oncology,
  • 02:32we've had some exciting scientific developments
  • 02:34over the past few decades.
  • 02:36So now instead of seeing cancer as just
  • 02:38based on the site like you have a lung cancer
  • 02:40or a prostate cancer,
  • 02:42we actually have identified that we can partition cancers
  • 02:45into many small molecular subtypes.
  • 02:48And further, we've actually been able to
  • 02:49leverage this information by being able to say that
  • 02:52what we thought of as a holistic lung cancer
  • 02:54isn't just one type of disease,
  • 02:57we can actually develop therapies that we hope to
  • 02:59target some of these differences in genetic alterations.
  • 03:02And this really gets to that idea of precision medicine that
  • 03:05instead of throwing a treatment at someone
  • 03:07where we think it should work
  • 03:08or it has worked in some people on average,
  • 03:10hopefully we can really target the intervention
  • 03:13based off of some signal
  • 03:14or some indication like a biomarker or a genotype
  • 03:17that we actually hope could respond more ideally
  • 03:20to that intervention.
  • 03:22Now what's really interesting about this as well
  • 03:26is that there could be a potential for heterogeneity
  • 03:28in this treatment benefit by indication.
  • 03:30And what I mean by that is once we've identified
  • 03:33that there's these different genetic alterations,
  • 03:35we've actually discovered that these alterations
  • 03:37aren't necessarily unique to one site of cancer.
  • 03:41For example, we may identify a genetic alteration
  • 03:43in the lung that also is present in the prostate, liver,
  • 03:46and kidney in some of those types of cancer.
  • 03:49Now the challenge here though is that
  • 03:50even though we have the same driver hypothetically
  • 03:53based on our clinical or scientific hypothesis
  • 03:56of that potential benefit for a treatment we've designed
  • 03:59to address it,
  • 04:00there still may be important differences
  • 04:01that we don't know about
  • 04:02or have yet to account for based off of each site.
  • 04:05So what may have worked actually really well in the lung
  • 04:07for one given mutation,
  • 04:09even for that same mutation, let's say present in the liver,
  • 04:12may not work as well.
  • 04:13And that's that idea of heterogeneity in treatment benefit.
  • 04:16That we can have different levels of response
  • 04:18across different sites or groups of individuals.
  • 04:23Now the cool thing I think here
  • 04:24from the statistical perspective is that the scientific
  • 04:27and clinical advancements
  • 04:28have also led to a revolution in statistical
  • 04:31and clinical design challenges and approaches.
  • 04:34And of course that's the sweet spot that I work at.
  • 04:36I know many of you
  • 04:37and especially students are training
  • 04:39and studying to work in this area
  • 04:40to collaborate with scientific and clinical researchers
  • 04:43and leaders to translate those results
  • 04:45in statistically meaningful ways
  • 04:47and to potentially design trials or studies
  • 04:49that really target these questions and hypotheses.
  • 04:53Now specifically in this talk today,
  • 04:56I'm going to focus on this idea of
  • 04:58a master protocol design or evolution.
  • 05:01And these provide a flexible approach
  • 05:02to the design of trials with multiple indications,
  • 05:05but they do have their own unique challenges
  • 05:06that I'm gonna highlight a few of here in a second.
  • 05:09But there are a variety of master protocols out there
  • 05:11in case you've heard some of these buzzwords.
  • 05:13I'll be focusing on basket trials today,
  • 05:16but you may have also heard of things like umbrella trials
  • 05:18or even more generally platform trial designs.
  • 05:24And so one example of what this looks like here is
  • 05:26this is a graphic from a paper in the
  • 05:28New England Journal by Dr. Woodcock and LaVange,
  • 05:31Dr. Woodcock being a clinician,
  • 05:32and Dr. Lisa LaVange being a past president of
  • 05:35The American Statistical Association,
  • 05:37where they actually tried to put to rest some of the
  • 05:39confusion surrounding some of these design types
  • 05:42because it turns out,
  • 05:43up until 2017 when we discussed these designs
  • 05:46across even statistical communities
  • 05:48and with clinical researchers,
  • 05:50we tend to use these terms fairly interchangeably
  • 05:53even though we are really getting at
  • 05:55very different concepts.
  • 05:57So for example,
  • 05:58in the top here we have this idea of an umbrella trial
  • 06:02and this is really the context of a single disease
  • 06:04like lung cancer,
  • 06:05but we actually then will screen for
  • 06:07those genetic alterations
  • 06:08and have different therapies that we're trying to
  • 06:10target a different biomarker or genetic alteration for.
  • 06:14This contrasts with what we're focusing on today below,
  • 06:17the basket trial, where
  • 06:18we actually have different diseases or indications,
  • 06:20but they share a common target or genetic alteration
  • 06:23which we wish to target.
  • 06:25And in this sense we can think of it potentially
  • 06:26as them sharing a basket
  • 06:28or sharing a sort of that commonality there.
  • 06:32Now, this is a fairly broad general idea of these designs.
  • 06:35And so I think for the sake of
  • 06:37what we're gonna talk about today
  • 06:38and some of the statistical considerations
  • 06:40it can be helpful to do a bit of an
  • 06:42oversimplification of what a design might look like here.
  • 06:46And so on the slide that I've presented,
  • 06:48I have this kind of naive graphic of actual baskets
  • 06:53and we're going to assume that in each column
  • 06:54we have a different indication or site of cancer
  • 06:56that has that common genetic alteration.
  • 06:59So for example, basket one may represent the lung,
  • 07:02basket two may represent the liver and so on.
  • 07:05Now when we're in the case of designing
  • 07:07or the design stage of a study,
  • 07:09we tend to make oversimplifying assumptions
  • 07:11to address these potential calculations for
  • 07:13power, sample size,
  • 07:15and quantities that we're usually interested in
  • 07:17for study design.
  • 07:19So here on this graph,
  • 07:20we are gonna make a assumption that
  • 07:22there's only two possible responses in this planning stage.
  • 07:25One is that the baskets have no response or a null basket,
  • 07:29that's the blue colored solid baskets on the screen.
  • 07:32The other case would be a alternative response
  • 07:35where there is some hopeful benefit to the treatment
  • 07:37and those are the open orange colored baskets
  • 07:41we see on the screen here.
  • 07:43Now, one of the challenges I think with basket trial design
  • 07:46that can be overlooked sometime,
  • 07:48even in this design stage,
  • 07:49is that for a standard two arm trial,
  • 07:51we do have to make this assumption of,
  • 07:52what is our null hypothesis or response?
  • 07:55What's our alternative hypothesis or response?
  • 07:57We really only have to do that for one configuration
  • 08:00or combination because we have two arms.
  • 08:03In the case of a single arm basket trial here,
  • 08:05we actually see that
  • 08:06just by having five baskets in a study
  • 08:08and many actual trials that are implemented at
  • 08:10far more baskets,
  • 08:12we actually see a range of just six possible
  • 08:14binary combinations of the basket works
  • 08:17or it doesn't work,
  • 08:18ranging from at the extremes a global null
  • 08:21where unfortunately the treatment does not work
  • 08:23in any basket down to the sort of dream scenario
  • 08:26where the basket is actually,
  • 08:28or the drug actually works across all baskets.
  • 08:31There is this homogenous actually response
  • 08:34in a positive direction for the sort of clinical outcome.
  • 08:39More realistically,
  • 08:40we actually will probably encounter
  • 08:42something that we see falls in the middle here,
  • 08:44scenarios two through five,
  • 08:46where there's some mixture of baskets
  • 08:48that actually do show a response
  • 08:49and some that for whatever reason we might not know yet,
  • 08:51it just doesn't appear to have any effect
  • 08:53and is a null response.
  • 08:55So this can make it challenging
  • 08:56for some of the considerations of
  • 08:58what analysis strategy you plan to use in practice.
  • 09:03And so to just, at a high level,
  • 09:05highlight some of these challenges
  • 09:07before we jump into the methods for today's talk.
  • 09:11In practice, each of these baskets within the trial
  • 09:13often has what we call a small n,
  • 09:14or small sample size, for each of those indications.
  • 09:18It turns out
  • 09:18once we actually have this idea of precision medicine
  • 09:20and we can be fairly precise
  • 09:22about who qualifies for a trial,
  • 09:23we actually have a much smaller potential sample
  • 09:26or population to enroll.
  • 09:27This means that even though we might have a treatment
  • 09:29that works really well,
  • 09:30it can be challenging to find individuals who qualify
  • 09:33or are eligible to enroll
  • 09:34or they may have competing trials or demands
  • 09:36for other studies or care to consider.
  • 09:41As I've also alluded to earlier as a challenge,
  • 09:43we also have this potential for indication
  • 09:44or subgroup heterogeneity and that may be likely.
  • 09:47In other words,
  • 09:48we might not expect the same response
  • 09:50across all those baskets.
  • 09:50And that gets back to the previous graphic
  • 09:52on that last slide
  • 09:53where we might have something like two null baskets
  • 09:56and three alternative baskets.
  • 09:57And that can make it really challenging in the presence
  • 09:59of a small n to determine how do we
  • 10:02appropriately analyze that data
  • 10:03so we capture the potentially efficacious baskets
  • 10:06and can move those forward so patients benefit
  • 10:09while not carrying forward null baskets
  • 10:11where there is no response for those patients.
  • 10:16Statistically speaking,
  • 10:17we also have these ideas of operating characteristics
  • 10:19and in the context of a trial,
  • 10:21what we mean by that is things like power
  • 10:22and type one error
  • 10:24and I just have additional considerations with respect to
  • 10:26how do we summarize these?
  • 10:28Do we summarize them within each basket or each column
  • 10:31on that graphic on the previous slide,
  • 10:32essentially treating it as a bunch of
  • 10:34standalone independent one arm trials
  • 10:36just under one overall study design or idea?
  • 10:40Or do we try to account for the fact that we have
  • 10:42five baskets enrolling like on the graphic before
  • 10:45and we might wanna consider something like a
  • 10:47family wise type one error rate
  • 10:48where any false positive would be a negative outcome
  • 10:52if we're trying to correctly predict
  • 10:53or identify associations?
  • 10:57Now the focus of today's talk,
  • 10:58and I could talk about these other points
  • 11:00till the cows come home,
  • 11:02but I'm gonna focus today on
  • 11:04depending on that research stage we're at,
  • 11:06if it's a phase one, two or three trial,
  • 11:08we may wish to terminate early for some reason like
  • 11:10efficacy or futility.
  • 11:12And specifically for time today,
  • 11:13I'm gonna focus on the idea of stopping for futility
  • 11:16where we don't wanna keep enrolling baskets
  • 11:18that are poorly performing both for ethical reasons.
  • 11:20In other words,
  • 11:21patients may benefit from other trials or treatments
  • 11:24that are out there and we don't wanna subject them to
  • 11:26treatments that have no benefit.
  • 11:28But also from a resource consideration perspective.
  • 11:31You can imagine that running a study or trial is expensive
  • 11:34and can be complicated.
  • 11:36And especially if we're doing something like a basket trial
  • 11:38where we're having to enroll across multiple baskets,
  • 11:40it may be ideal to be able to drop baskets early on
  • 11:43that don't show promise
  • 11:44so we can reallocate those resources to
  • 11:46either different studies, research projects,
  • 11:49or trials that we're trying to implement or run.
  • 11:55So the motivation for today's talk
  • 11:57building off of these ideas is that
  • 11:58I want to demonstrate that a design that's very popular
  • 12:01called Simon's two-stage design is
  • 12:04generally speaking suboptimal
  • 12:05compared to the multitude of alternative methods
  • 12:08and designs that are out there.
  • 12:10And then this is especially true in our context of
  • 12:12a basket trial where within the single study
  • 12:15we actually are simultaneously enrolling
  • 12:17multiple one arm trials in our case today.
  • 12:20Then the second point I'd like to highlight is
  • 12:22we can identify when methods for sharing information
  • 12:24across baskets could be beneficial to further improve
  • 12:27the efficiency of our clinical trials.
  • 12:31And so to highlight this,
  • 12:32I wanna first just build us through
  • 12:34and sort of illustrate or introduce these designs
  • 12:36and the general concepts behind them
  • 12:37because I know if you don't work in this space
  • 12:40it may be sort of just ideas vaguely.
  • 12:43So I wanna start with the Simon two-stage design,
  • 12:45that comparator that people are commonly using.
  • 12:48So Richard Simon, and this is back in 1989,
  • 12:51introduced what he called optimal two-stage designs
  • 12:54for phase two clinical trials.
  • 12:56And this was specifically in the context
  • 12:57that we're focusing on today for a one sample trial
  • 12:59to evaluate the success of a binary outcome.
  • 13:02So for oncology we might think of this as a yes no outcome
  • 13:05for is there a reduction in tumor size
  • 13:07or a survival to some predefined time point.
  • 13:13Now specifically what Dr. Simon was motivated by
  • 13:16was phase two trials
  • 13:18as it says in the title of his paper,
  • 13:20and just to kind of
  • 13:22give a common lay of the land for everyone,
  • 13:23the purpose generally speaking of a phase two trial
  • 13:26is to identify if the intervention
  • 13:28warrants further development
  • 13:30while collecting additional safety data.
  • 13:32Generally speaking,
  • 13:33we will have already completed what we call
  • 13:35a phase one trial where we collect preliminary safety data
  • 13:38to make sure that the drug is not toxic
  • 13:40or at least has expected side effects
  • 13:44that we are willing to tolerate for that
  • 13:45potential gain in efficacy.
  • 13:48And then in phase two here we're actually trying to say,
  • 13:50"You know, is there some benefit?
  • 13:51Is it worth potentially moving this drug
  • 13:53on either for approval
  • 13:54or some larger confirmatory study
  • 13:57to identify if it truly works or doesn't?"
  • 14:01Now the motivation for Dr. Simon is that
  • 14:03we would like to terminate studies earlier,
  • 14:05as I mentioned before,
  • 14:06for both ethical and resource considerations
  • 14:08that they appear futile.
  • 14:09In other words, it's not a great use of our resources
  • 14:11and we should try in some
  • 14:12rigorous statistical way to address this.
  • 14:17If you do go back and look at Simon's 1989 paper
  • 14:20or you just Google this
  • 14:21and there's various calculators that people have
  • 14:22put out there,
  • 14:23there are two flavors of this design that exist
  • 14:26from this original paper.
  • 14:27One is an optimal
  • 14:28and one is called a minimax design.
  • 14:31Within clinical trials,
  • 14:32once we introduce this idea of stopping early potentially
  • 14:36or have the chance to stop early based on our data,
  • 14:39we now have this idea that there's this expected sample size
  • 14:42because we could enroll the entire sample size
  • 14:44that we planned for or we could potentially stop early.
  • 14:47And since we could stop early or go the whole way
  • 14:49and we don't know what our choice will be
  • 14:51until we actually collect the data and do the study,
  • 14:53we now have sample size of the random variable,
  • 14:56something that we can calculate an expectation
  • 14:58or an average for.
  • 14:59And so Simon's optimal design tries to
  • 15:01minimize what that average sample size might be in theory.
  • 15:06In contrast, the minimax design
  • 15:08tries to minimize whatever that largest sample size would be
  • 15:11if we didn't stop early.
  • 15:13So if we kept enrolling
  • 15:14and we never stopped at any of our interim looks,
  • 15:16how much data would we need to collect
  • 15:18until we choose a design that minimizes that
  • 15:20at the expense of potentially stopping early?
  • 15:25I think this is most helpful to see the
  • 15:27sort of elegance of this design
  • 15:29and why it's I think so popular
  • 15:30by just introducing example
  • 15:31that will also motivate our simulations
  • 15:33here that we're gonna talk about in a minute.
  • 15:36We're gonna consider a study where
  • 15:37the null response rate is 10%.
  • 15:40And we're going to consider a target
  • 15:42for an alternative response rate of 30%.
  • 15:44So this isn't a situation where
  • 15:45we're looking for necessarily a curative drug,
  • 15:48but something that does show what we think of
  • 15:49as a clinically meaningful benefit from 10 to 30%,
  • 15:52let's say survival or tumor response.
  • 15:55Now if we have these two parameters
  • 15:57and we wanna do a Simon two-stage minimax design
  • 16:00to minimize that maximum possible sample size
  • 16:03we would enroll,
  • 16:04we would have to also define
  • 16:06the type one error rate or alpha
  • 16:08that controls the false positive rate.
  • 16:10Here we're going to set 10% for this phase two design
  • 16:12and we also wish to target a 90% power
  • 16:15to detect that treatment of 30% if it truly exists.
  • 16:20So we put all of this into our calculator
  • 16:22to Simon's framework and we turn that statistical crank.
  • 16:25What we see is that
  • 16:26it gives us this approach where in stage one
  • 16:29we would enroll 16 participants
  • 16:31and we would terminate the trial or this study arm
  • 16:34for futility if one or fewer responses are observed.
  • 16:37Now if we observe two or more responses,
  • 16:41we would continue enrollment
  • 16:43to the overall maximum sample size that we plan for
  • 16:45of 25 in the second stage.
  • 16:48And at this point if four or fewer responses are observed,
  • 16:51no further investigation is warranted
  • 16:53or we can think of this as a situation where
  • 16:55our P value would be larger than our defined alpha 0.1.
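
For readers who want to verify these operating characteristics, here is a minimal sketch, not from the talk itself, of how the stated rule (16 patients in stage one, stop for futility at one or fewer responses, otherwise enroll to 25 and require more than four total responses) can be checked with simple binomial calculations; the function names and printed summary are illustrative assumptions only.

```python
from scipy.stats import binom

# Simon two-stage rule described in the talk: enroll n1 = 16, stop for futility
# if <= r1 = 1 responses; otherwise enroll to n = 25 and call the drug
# promising only if total responses exceed r = 4.
def reject_prob(p, n1=16, r1=1, n=25, r=4):
    """Probability of declaring the treatment promising at response rate p."""
    n2 = n - n1
    total = 0.0
    for x1 in range(r1 + 1, n1 + 1):                 # continue past stage 1
        need = r + 1 - x1                            # stage-2 responses still needed
        p2 = 1.0 if need <= 0 else 1.0 - binom.cdf(need - 1, n2, p)
        total += binom.pmf(x1, n1, p) * p2
    return total

def early_stop_prob(p, n1=16, r1=1):
    """Probability of stopping for futility after stage 1."""
    return binom.cdf(r1, n1, p)

for p in (0.10, 0.30):                               # null and alternative rates
    pet = early_stop_prob(p)
    en = 16 + (1 - pet) * (25 - 16)                  # expected sample size
    print(p, round(reject_prob(p), 3), round(pet, 3), round(en, 1))
```

Under the 10% null this recovers roughly the behavior discussed later in the talk (a type one error near 10%, early stopping a bit over half the time, and an expected sample size of about 20), while under the 30% alternative it gives roughly 90% power.
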
  • 17:00Now, the nice thing here is that it is quite simple.
  • 17:03In fact, after we turn that statistical crank
  • 17:05and we have this decision rule,
  • 17:06you in theory don't even need a statistician
  • 17:08because you can count the number of responses
  • 17:10for your binary outcome on your hand
  • 17:12and determine should I stop early, should I continue?
  • 17:15And if I continue,
  • 17:16do I have some benefit potentially
  • 17:18that says it's worth either doing a future study
  • 17:21or I did a statistical test,
  • 17:23would find that the P value meets my threshold
  • 17:25I set for significance.
  • 17:29Now, of course,
  • 17:31it wouldn't be a great talk if I stopped there and said,
  • 17:33"You know, this is everything.
  • 17:34It's perfect. There's nothing to change."
  • 17:36There are some potential limitations
  • 17:37and of course some solutions I think
  • 17:39that we could address in this talk.
  • 17:42The first thing to note is that
  • 17:43this is extremely restrictive in when it could terminate
  • 17:46and it may continue to the maximum sample size
  • 17:48even if a null effect is present.
  • 17:50And we're gonna see this come to fruition
  • 17:52in the simulation studies,
  • 17:54but it's worth noting here it only looks once.
  • 17:55It's a two stage design.
  • 17:58And depending on the criteria you plug in,
  • 18:00it might not look for quite some time.
  • 18:0216 out of 25 total participants enrolled
  • 18:05is still a pretty large sample size
  • 18:07relative to where we expect to be.
  • 18:11One solution that we could look at
  • 18:12and that I'm going to propose today
  • 18:14is that we could use Bayesian methods instead
  • 18:16for more frequent interim monitoring.
  • 18:18And this could use quantities that we think of
  • 18:20as the posterior or the predictive probabilities
  • 18:23of our data.
  • 18:26Another limitation that we wish to address as well is that
  • 18:28in designs like a basket trial
  • 18:30that have multiple indications
  • 18:31or multiple arms that have the same entry criteria,
  • 18:35Simon's two-stage design is going to
  • 18:36fail to take advantage of the potential
  • 18:38what we call exchange ability across baskets.
  • 18:41In other words, if baskets appear to have the same response,
  • 18:45whether it's let's say that null
  • 18:46or that alternative response,
  • 18:48it would be great if we could
  • 18:49informatively pool them together into meta subgroups
  • 18:52so we can increase the sample size
  • 18:54and start to address that challenge of the small n
  • 18:56that I mentioned earlier for these basket trial designs.
  • 18:59And specifically today we're going to examine the use of
  • 19:02what we call multi-source exchangeability models
  • 19:05to share information across baskets when appropriate.
  • 19:08And I'll walk through a very high level sort of
  • 19:10conceptual idea of what these models
  • 19:12and how they work and what they look like.
  • 19:17Before we get into that though,
  • 19:18I wanna just briefly mention the idea of posterior
  • 19:20and predictive probabilities
  • 19:22and give some definitions here
  • 19:23so we can conceptually envision what we mean
  • 19:25and especially if you haven't had the chance
  • 19:27to work with a lot of patient methods,
  • 19:29this can help give us an idea
  • 19:31of some of the analogs to maybe a frequentist approach
  • 19:33or what we're trying to do here
  • 19:34that you may be familiar with.
  • 19:36Now I will mention,
  • 19:37I'm not the first person to propose looking at
  • 19:39Bayesian interim stopping rules.
  • 19:41I have a couple citations here by Dmitrienko
  • 19:43and Wang and Saville et al.
  • 19:45and they do a lot of extensive work in addition to
  • 19:47hundreds of other papers considering
  • 19:49Bayesian interim monitoring.
  • 19:51But specifically to motivate this
  • 19:53we have these two concepts that commonly come up
  • 19:56in Bayesian analysis,
  • 19:57a posterior probability or a predictive probability.
  • 20:01The posterior probability
  • 20:03is very much analogous to kinda like a P value
  • 20:05in a frequentist significance test.
  • 20:06It says, "Based on the posterior distribution
  • 20:09we arrive at through a Bayesian analysis,
  • 20:11we're gonna calculate the probability
  • 20:13that our proportion exceeds the null response rate
  • 20:15we wish to beat."
  • 20:16So in our case, we're basically saying,
  • 20:18"What's the probability based on our data
  • 20:20and a prior we've given that the response is 10% or higher."
  • 20:25So this covers a lot of ground
  • 20:26'cause anything you know from 10.1 up to 100%
  • 20:29would meet this criteria being better than 10%.
  • 20:32But it does quantify,
  • 20:34based on the evidence we've observed so far,
  • 20:37how the data suggests the
  • 20:40benefit may be with respect to that null.
  • 20:42So in the case of let's say
  • 20:44an interim look for futility at the data, we could say,
  • 20:47if we just use Simon's two-stage design as our motivating
  • 20:51ground to consider, we might say,
  • 20:53"Okay, we have 16 people so far,
  • 20:55what's the probability based on these 16 people
  • 20:58that I could actually say
  • 20:59there's no chance or limited chance
  • 21:00I'm going to detect something in the trial here
  • 21:03based on the data I've seen so far?"
  • 21:05Now the challenge here is that
  • 21:07it is based off of the data we've seen so far
  • 21:09and it doesn't take into account the fact that we still have
  • 21:12another nine potential participants to enroll
  • 21:15to get to that maximum sample size of 25.
  • 21:18That's where this idea of what we call a
  • 21:20predictive probability comes in.
  • 21:22We're considering our accumulated data
  • 21:24and the priors we've specified in our Bayesian context,
  • 21:27it's the probability that we will have observed
  • 21:30a significant result if we've met
  • 21:32and enrolled up to our maximum sample size.
  • 21:36In other words, I think it's a very natural place to be
  • 21:38for interim monitoring
  • 21:39because it says based on the data I've seen so far,
  • 21:41i.e the posterior probability,
  • 21:43if I use that to help identify what are likely futures
  • 21:46to observe or likely sample sizes
  • 21:48I will continue enrolling to get to that maximum of 25,
  • 21:51what's the probability at the end of the day
  • 21:53when I do hit that sample size of 25,
  • 21:56I will have a significant conclusion?
  • 21:58And if it's a really low predictive probability,
  • 22:00if I say there's only a 5% chance
  • 22:02of you actually declaring significance if you
  • 22:04keep enrolling participants,
  • 22:06that can be really informative both statistically
  • 22:08and for clinical partners to say
  • 22:10it doesn't seem very likely that we're gonna hit our target.
  • 22:13That being said,
  • 22:15a lot of people are very happy to continue trials going
  • 22:17with low chances or low probability
  • 22:19because you're saying there's still a chance
  • 22:21I may detect something that could be
  • 22:23significant enough worth.
  • 22:25So we'll see that across a range of these thresholds,
  • 22:28the performance of these models may change.
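
To make these two quantities concrete, here is a minimal sketch, assuming the Beta(0.5, 0.5) prior mentioned later in the talk, of a posterior probability that the response rate beats the 10% null and a predictive probability of eventual success at 25 patients; the 0.95 posterior cutoff used to define final "success" is a placeholder assumption, not the calibrated threshold from the paper.

```python
from scipy.stats import beta, betabinom

def posterior_prob(x, n, p0=0.1, a=0.5, b=0.5):
    """P(response rate > p0 | x responses in n patients), Beta(a, b) prior."""
    return 1.0 - beta.cdf(p0, a + x, b + n - x)

def predictive_prob(x, n, n_max=25, p0=0.1, success_cut=0.95, a=0.5, b=0.5):
    """Probability that, after enrolling to n_max, the final posterior
    probability exceeds success_cut, averaging the not-yet-observed responses
    over the beta-binomial posterior predictive distribution."""
    m = n_max - n                                   # patients still to enroll
    pp = 0.0
    for y in range(m + 1):                          # possible future responses
        if posterior_prob(x + y, n_max, p0, a, b) > success_cut:
            pp += betabinom.pmf(y, m, a + x, b + n - x)
    return pp

# e.g. an interim look after 2 responses in the first 16 patients:
print(posterior_prob(2, 16))   # evidence so far that the rate beats 10%
print(predictive_prob(2, 16))  # chance of "success" if we enroll all 25
```
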
  • 22:32Now this brings us to a brief recap
  • 22:34of sort of our motivation.
  • 22:35I just spent a few minutes
  • 22:37introducing that popular Simon two-stage design,
  • 22:39the idea behind it,
  • 22:40what it might look like in practice,
  • 22:42as well as some alternatives with a Bayesian flair.
  • 22:45The next part I wanna briefly address is that
  • 22:47we can also now look at this idea
  • 22:50of sharing information across baskets
  • 22:52to further improve that trial efficiency
  • 22:54'cause so far both Simon's design
  • 22:56and just using a posterior predictive probability
  • 22:59for an interim monitoring will still treat each basket
  • 23:02as its own little one arm trial.
  • 23:07Now specifically today I'm gonna focus on this idea
  • 23:10we call multi-source exchangeability models or MEMs.
  • 23:13This is a general Bayesian framework
  • 23:15to enable the incorporation of independent sources
  • 23:18of supplemental information
  • 23:20and is original work that I developed
  • 23:22during my dissertation at the University of Minnesota.
  • 23:25In this case,
  • 23:26the amount of borrowing is determined by
  • 23:27the exchangeability of our data,
  • 23:29which in our context is really,
  • 23:31how equivalent are the response rates?
  • 23:33If two baskets have the exact same response rate,
  • 23:35we may think that there's a higher probability
  • 23:38that the true underlying populations
  • 23:40we are trying to estimate are truly exchangeable,
  • 23:42and we wish to combine that data as much as we possibly can.
  • 23:46Conversely, if we see something like a
  • 23:4810% response rate for one basket
  • 23:50and a 30% response rate for another basket,
  • 23:53we likely don't want to combine that data because
  • 23:55those are not very equivalent response rates.
  • 23:57In fact, we seem to have identified
  • 23:59two different subgroups
  • 24:00and performances in those two baskets.
  • 24:04One of the advantages of MEMs relative to
  • 24:07a host of other statistical methods that are out there
  • 24:09that include things like power priors, commensurate priors,
  • 24:12meta analytic priors, and so forth,
  • 24:15is that we've been able to demonstrate that
  • 24:16in their most basic iteration without
  • 24:18any extra bells or whistles,
  • 24:20MEMs are able to actually account for this heterogeneity
  • 24:23across different potential response rates
  • 24:26and appropriately downweight non-exchangeable sources.
  • 24:29Whereas we show through simulation
  • 24:30and earlier work some of these other methods without
  • 24:33newer advancements to them
  • 24:35actually either naively pool everything together
  • 24:38even if there's non-exchangeable groups
  • 24:41or they're afraid of the sort of presence of
  • 24:43non-exchangeability and if anything seems amiss,
  • 24:45they quickly go to an independence analysis
  • 24:48that doesn't leverage this potential sharing
  • 24:51of information across meta subgroups that are exchangeable.
  • 24:56Now again, I don't wanna get too much into the weeds
  • 24:59of the math behind the MEMs,
  • 25:00but I will have a few formulas in a couple slides
  • 25:02but I do think it's helpful to
  • 25:03conceptualize it with graphics.
  • 25:05And so here I just want to illustrate a very simplified case
  • 25:08where we're gonna assume that we have a three basket trial
  • 25:11and for the sake of doing an analysis with MEMs,
  • 25:14I think it's helpful to also think of it as
  • 25:16we're looking at the perspective of the analysis
  • 25:18from one particular basket.
  • 25:20So here on this slide here we see that we have this
  • 25:24theta P circle in the middle
  • 25:25and that's the parameter or parameters of interest
  • 25:28we wish to estimate.
  • 25:29In our case, that would be that
  • 25:31binary outcome in each basket.
  • 25:34Now, for this graphic we're using each of these circles here
  • 25:37to represent a different data source.
  • 25:40We're gonna say Y sub P is that primary basket
  • 25:42that we're interested in or the perspective
  • 25:44we're looking at for this example
  • 25:46and Y sub one and Y sub two
  • 25:48are two of the other baskets enrolled within the trial.
  • 25:51Now a standard analysis
  • 25:53without any information sharing across baskets
  • 25:55would only have a data pooled from the observed data.
  • 26:00I mean this is sort of the unexciting
  • 26:01or unsurprising analysis
  • 26:03where we basically are analyzing the data we have
  • 26:05for the one basket that actually represents that group.
  • 26:10However, we could imagine if we wish
  • 26:11to pool together data from these other sources,
  • 26:14we have different ways we could add arrows to this figure
  • 26:17to represent different combinations of these groups.
  • 26:21And this brings us to
  • 26:22that multi-source exchangeability framework.
  • 26:25So we see here on this slide,
  • 26:26I now of a graphic showing four different combinations
  • 26:29of exchangeability when we have these two other baskets
  • 26:32that compare to our one basket of interest right now.
  • 26:36And from top left to the bottom left
  • 26:38in sort of a clockwise fashion,
  • 26:39we see that making different assumptions from
  • 26:42that standard analysis with no borrowing
  • 26:44in the top right here where I'm drawing that arrow.
  • 26:46So it is possible that
  • 26:48none of our data sources are exchangeable
  • 26:49and we should be doing an analysis that
  • 26:51doesn't share information.
  • 26:53On the right hand side that we might envision that
  • 26:55well maybe the first basket or Y1 is exchangeable.
  • 26:58So we wanna pull that with Y2 or excuse me with Yp,
  • 27:01but Y2 is not.
  • 27:03In the bottom right, this capital omega two,
  • 27:05we actually assume that Y2 is exchangeable
  • 27:07but Y1 is not.
  • 27:09And in the bottom left we assume in this case
  • 27:10that all the data is exchangeable
  • 27:12and we should just pool it all together.
  • 27:15So at this stage we've actually
  • 27:17proposed all the configurations we can pairwise
  • 27:20of combining these different data sources with Y sub P.
  • 27:23And we know that these are fitting four now different models
  • 27:26based off of the data
  • 27:27because for example in the top left, that standard analysis,
  • 27:30there is no extra information from those other baskets
  • 27:33versus like in the bottom left,
  • 27:35we basically have combined everything
  • 27:36and we think there's some common effect.
  • 27:39Now this leads to two challenges on its own
  • 27:40if we just stopped here with the framework.
  • 27:43One would be that we'd have this idea of maybe
  • 27:44cherry picking or trying to pick whichever combination
  • 27:47best suits your prior hypotheses clinically.
  • 27:50And so that would be a big no-go.
  • 27:51We don't like cherry picking
  • 27:52or fishing for things like P values
  • 27:54or significance in our statistical analyses.
  • 27:57The other challenge also is that
  • 27:59all of these configurations are just assumptions
  • 28:01of how we could combine data
  • 28:03but we know underlying everything in the population is that
  • 28:05true assumption of exchange ability of
  • 28:07are these baskets or groups truly combinable or not?
  • 28:11And we're just approximating that with our sample.
  • 28:13And so right now if we have four separate models
  • 28:15and potentially four separate conclusions,
  • 28:18we need some way of combining these models
  • 28:20to make inference.
  • 28:21And in this case we propose
  • 28:23leveraging a Bayesian model averaging framework
  • 28:26where we calculate in this case
  • 28:28and in our formulas here,
  • 28:29the q's represent a posterior distribution
  • 28:32where I've drawn this little arrow
  • 28:33and I'm underlining right now,
  • 28:35that reflects each square's configuration of
  • 28:39exchangeability for our estimates.
  • 28:41And through this process
  • 28:42we estimate these lower case omega model weights
  • 28:45that tries to estimate the appropriateness
  • 28:47of exchangeability with the ultimate goal of
  • 28:50having an averaged posterior that we can use
  • 28:53for statistical inference
  • 28:54to draw a conclusion about the potential efficacy
  • 28:57or lack thereof of a treatment.
  • 29:02Now very briefly,
  • 29:03because this is a Bayesian model averaging framework,
  • 29:06just one of the few formulas I have in the presentation,
  • 29:08we just see over here that we have
  • 29:10the way we calculate these posterior model weights
  • 29:13as the prior on each model
  • 29:15multiplied by an integrated marginal likelihood.
  • 29:18Essentially, we can think of that as saying
  • 29:20based off of that square we saw on the previous slide
  • 29:22and combining those different data sources,
  • 29:25what is that estimate of the effect
  • 29:26with those different combinations?
  • 29:29One unique thing about the MEM framework
  • 29:31that differs from Bayesian model averaging though is that
  • 29:34we actually specify priors with respect to these sources.
  • 29:37And in the case of this example
  • 29:39with only two supplemental like sources for our graphic,
  • 29:42it's not a great cost savings,
  • 29:45but we can imagine that if we have more and more sources,
  • 29:47there's actually two to the P if P's the number of sources,
  • 29:50combinations of exchangeability
  • 29:52that we have to consider and model.
  • 29:54And that quickly can become overwhelming if we have
  • 29:56multiple sources that we have to define
  • 29:58for each one of those squares,
  • 29:59what's my prior that each combination of exchangeability
  • 30:02is potentially true.
  • 30:04Versus if we define it with respect to the source,
  • 30:06we now go from two to the P priors to just P priors
  • 30:09we have to specify for exchangeability.
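
As one way to picture the weight calculation, here is a minimal sketch, my illustration rather than the authors' software, for the two-supplemental-basket example above: each exchangeability configuration receives a weight proportional to its prior times an integrated beta-binomial marginal likelihood, with a simple fixed source-level prior standing in for the fully or empirically Bayesian priors discussed next.

```python
from itertools import product
from math import comb, log, lgamma, exp

def log_beta_fn(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal(x, n, a=0.5, b=0.5):
    """Log beta-binomial marginal likelihood of x responses in n patients."""
    return log(comb(n, x)) + log_beta_fn(a + x, b + n - x) - log_beta_fn(a, b)

def mem_weights(primary, supplements, prior_exch=0.5):
    """Posterior weight of each exchangeability configuration (a tuple of
    0/1 flags, one per supplemental basket) for the primary basket."""
    xp, np_ = primary
    configs, log_posts = [], []
    for omega in product([0, 1], repeat=len(supplements)):   # 2^P configurations
        x_pool, n_pool, ll = xp, np_, 0.0
        for flag, (xs, ns) in zip(omega, supplements):
            if flag:                       # exchangeable: pool with the primary
                x_pool += xs
                n_pool += ns
            else:                          # not exchangeable: its own parameter
                ll += log_marginal(xs, ns)
        ll += log_marginal(x_pool, n_pool)
        lp = sum(log(prior_exch if f else 1 - prior_exch) for f in omega)
        configs.append(omega)
        log_posts.append(ll + lp)
    m = max(log_posts)
    w = [exp(v - m) for v in log_posts]
    return {c: wi / sum(w) for c, wi in zip(configs, w)}

# e.g. 3/16 responses in the primary basket, 4/16 and 1/16 in the other two:
print(mem_weights((3, 16), [(4, 16), (1, 16)]))
```
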
  • 30:14A few more notes about this idea here
  • 30:17and just really zooming in on
  • 30:19what we're gonna focus on for today's presentation.
  • 30:21We have developed both fully
  • 30:23and empirically Bayesian prior approaches here,
  • 30:25fully Bayesian meaning that it is defined a priori
  • 30:29and is agnostic to the data you've collected,
  • 30:31empirically Bayesian meaning
  • 30:32we leverage the data we've collected
  • 30:34to help inform that prior for what we've observed.
  • 30:38Specifically there is what we call a
  • 30:40non-constrained, or naive,
  • 30:42empirically Bayesian prior
  • 30:43where we would look through all of those graphs we had
  • 30:45and we would say, "Whichever one of these
  • 30:47maximizes the integrated marginal likelihood
  • 30:49that's the correct configuration
  • 30:51and we're gonna put all of our eggs into that basket."
  • 30:53Or 100% of the probability there
  • 30:55and that's the only model we use for analysis.
  • 30:59We know, generally speaking,
  • 31:00since we went to all the work to defining
  • 31:02all of these different combinations of exchangeability
  • 31:04and that it's based off of samples,
  • 31:06potentially small samples,
  • 31:07that this can be a very strong assumption.
  • 31:10And so we can also modify this prior
  • 31:12to what we call a constrained EB prior,
  • 31:15where instead of just giving everyone of those model
  • 31:18sources in that MEM that
  • 31:20maximizes the likelihood 100% weight,
  • 31:23we instead give it a weight of what we're calling just B.
  • 31:25This is our hyper prior value here
  • 31:28where if it's a value of zero or up to one,
  • 31:31it'll control the amount of borrowing
  • 31:32and allow other nested models of exchangeability
  • 31:36to also be potentially considered for analysis.
  • 31:39So for example,
  • 31:40if we do set a value of one
  • 31:42that actually replicates the non constrained EB prior
  • 31:44and really aggressively borrows from one specific model.
  • 31:48At the other extreme here, if we set a value of zero,
  • 31:50we essentially recreate an independent analysis
  • 31:53like assign a two stage design or just using those
  • 31:55Bayesian methods for futility monitoring
  • 31:57that doesn't share information.
  • 31:59And then any value in between
  • 32:00gives a little more granularity or control
  • 32:03over the amount of borrowing.
  • 32:06So with that background behind us,
  • 32:09I'm gonna introduce the simulation stuff
  • 32:11and then present results for a couple
  • 32:13key operating characteristics for our trial.
  • 32:16In this case, we're going to assume for our simulations
  • 32:18that we have a basket trial
  • 32:19with 10 different baskets or indications.
  • 32:21So again, that's 10 different types of cancer
  • 32:23that we have enrolled that all have
  • 32:25the same genetic mutation that we think is targeted
  • 32:28by the therapy of interest.
  • 32:31Like we had before,
  • 32:32we're going to assume a null response P knot of 0.1 or 10%.
  • 32:36And an alternative response rate of 30% or P1 here.
  • 32:41We are gonna compare then three different designs
  • 32:43that we just spent some time introducing and outlining.
  • 32:46The first is a Simon minimax two-stage design
  • 32:49using that exact set up that we had before
  • 32:52where we will enroll 16 people,
  • 32:54determine if we have one or fewer observations of success.
  • 32:56If so, stop the trial.
  • 32:58If not, continue on.
  • 33:00In the second case,
  • 33:01we're going to implement a Bayesian design
  • 33:03that uses predictive probability monitoring
  • 33:05but we don't use any information sharing
  • 33:07just to illustrate that we can at least
  • 33:09potentially improve upon the frequency
  • 33:12and use of interim monitoring beyond a single look
  • 33:14from the Simon minimax design.
  • 33:17And then the third design
  • 33:18will add another layer of complexity
  • 33:20where we will try to share information across baskets
  • 33:23that have what we estimate to be exchangeable subgroups.
  • 33:28One thing to note here is that
  • 33:29we are setting this hyper parameter value B at 0.1.
  • 33:32This is a fairly conservative value
  • 33:34and admittedly for this design
  • 33:36we actually did not calibrate specifically
  • 33:39for the amount of borrowing to be 0.1.
  • 33:40This is actually based off of
  • 33:41some other prior work we've done
  • 33:43and published on basket trials that just showed that
  • 33:45in the case of an empirically Bayesian prior for MEMs,
  • 33:49this actually allows information sharing
  • 33:51in cases where there's a high degree of exchangeability
  • 33:53and low heterogeneity
  • 33:55and down leap it in cases where we might be
  • 33:57a little more uncertain,
  • 33:58so it's a little more conservative
  • 33:59but we'll see in the simulation results
  • 34:01there are some potential benefits.
  • 34:05For each of the scenarios we're gonna look at today,
  • 34:08we will generate a thousand trials
  • 34:10with a maximum sample size of 25 per basket.
  • 34:14We're gonna look at two cases,
  • 34:16there's a few other in the paper
  • 34:17but we're gonna focus on first the global scenario
  • 34:20where all the baskets are either null
  • 34:21or all 10 baskets have some meaningful effect.
  • 34:24And this is the setting where
  • 34:25information sharing methods like meds
  • 34:27really should outperform anything else
  • 34:29because everything is truly exchangeable
  • 34:31and everything could naively be pooled together
  • 34:34because we're simulating them to have the same response.
  • 34:37We'll then look at what happens if we actually have
  • 34:39a mixed scenario,
  • 34:40which I think is actually more indicative
  • 34:42of what's happened in practice
  • 34:43with some of the published basket trials
  • 34:45and clinically what we've seen from applications
  • 34:47of these types of designs.
  • 34:49Specifically here, we're gonna look at the case where
  • 34:52there are eight null baskets and two alternative baskets.
  • 34:57A few other points just to highlight here.
  • 34:59We're going to assume a beta 0.5 0.5 prior
  • 35:02for our Bayesian models.
  • 35:04This essentially for a binary outcome can be thought of as
  • 35:06adding half of a response
  • 35:08and half of a lack of a response to our observed data.
  • 35:12We're going to look at the most extreme dream Bayesian case
  • 35:16of doing futility monitoring
  • 35:18or any type of interim monitoring continually.
  • 35:20So after every single participant's enrolled
  • 35:22we will do a calculation
  • 35:24and determine if we should stop the trial.
  • 35:27We will then look at the effect of this choice
  • 35:29across a range of predictive probability thresholds
  • 35:34ranging from 0%,
  • 35:35meaning we wouldn't stop early at all,
  • 35:37up to 50% saying if there's anything less
  • 35:39than a 50% chance I'll find success,
  • 35:41I wanna stop that trial.
  • 35:44And then finally it's worth noting
  • 35:45we're actually also completely disregarding calibration
  • 35:49for this interim monitoring.
  • 35:51And so what we're gonna do is
  • 35:52we're gonna calibrate our decision rules
  • 35:54for the posterior probability at the end of the trial
  • 35:57based off of a global scenario where
  • 35:59we think it's ideal to share information
  • 36:02and we're all not gonna account for the fact that
  • 36:03we're doing interim looks at the data.
  • 36:05Part of the question here was
  • 36:07if we truly do all these assumptions
  • 36:08and we do sort of the most naive thing,
  • 36:11how badly do we actually do?
  • 36:13Like is there enough reason to fear the results
  • 36:16if we don't correctly calibrate for everything here?
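
Before the results, here is a minimal sketch of how one basket under continual predictive probability futility monitoring could be simulated; it reuses the posterior_prob and predictive_prob helpers sketched earlier, and the 0.95 final posterior cutoff and the per-patient futility check are assumptions for illustration, not the paper's calibrated simulation code.

```python
import numpy as np
# assumes posterior_prob and predictive_prob from the earlier sketch are defined

def simulate_basket(p_true, n_max=25, futility_cut=0.05, success_cut=0.95,
                    p0=0.1, rng=None):
    """One basket with continual predictive-probability futility monitoring.
    Returns (declared_success, sample_size_used)."""
    rng = rng or np.random.default_rng()
    x = 0
    for n in range(1, n_max + 1):
        x += rng.binomial(1, p_true)                 # enroll one more patient
        if n < n_max and predictive_prob(x, n, n_max, p0, success_cut) < futility_cut:
            return False, n                          # stop early for futility
    return posterior_prob(x, n_max, p0) > success_cut, n_max

# e.g. rejection rate and expected sample size for a null basket (p = 0.10):
sims = [simulate_basket(0.10) for _ in range(1000)]
print(np.mean([s[0] for s in sims]), np.mean([s[1] for s in sims]))
```
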
  • 36:21So I'm gonna paint some pictures here building from the
  • 36:24simpler Simon design to our more complex Bayesian designs
  • 36:27and then with information sharing
  • 36:29just to illustrate three different properties.
  • 36:31I'm gonna go fairly quickly
  • 36:33'cause I know that you all have to vacate the classroom
  • 36:35in about 10 minutes.
  • 36:37So for the global scenario that we're looking at here,
  • 36:41the light gray lines are going to represent
  • 36:44the alternative basket scenario.
  • 36:46So all, in this case, all 10 alternative baskets.
  • 36:49Here we see we plan for 90% power
  • 36:51Simon's design appropriately achieved
  • 36:53that rejection rate of 90%.
  • 36:55Likewise, the lines at the bottom here,
  • 36:58these black lines,
  • 36:59are going to represent the results of null baskets.
  • 37:01Here, under the global null scenario,
  • 37:03we see that it achieves a 10% rejection rate.
  • 37:06Now, this is a flat line here
  • 37:08because again Simon's design is agnostic to things like
  • 37:11the predictive probability.
  • 37:13Now if we do frequent Bayesian monitoring,
  • 37:16we see two interesting things here with these new lines.
  • 37:19We see that at the top
  • 37:21and the bottom, here I add these circles
  • 37:23where the predictive probability threshold is 0%.
  • 37:25This does represent the actual design
  • 37:27that would correspond to the actual calibration we did
  • 37:29without interim monitoring.
  • 37:31And we see that it is possible with Bayesian approaches
  • 37:34to achieve the same frequentist operating characteristics
  • 37:37that we would achieve with something like the Simon design.
  • 37:40We can see though that if we want to do interim monitoring
  • 37:43but we didn't calibrate
  • 37:44or think of that in our calculations,
  • 37:46we do see this trade off where we have our
  • 37:49alternative baskets having a decreasing power
  • 37:51or rejection rate as the aggressiveness of the
  • 37:53predictive probability threshold increases.
  • 37:56And likewise the type one error rate or the
  • 37:59rejection rate of the marginal baskets also decreases.
  • 38:02Now if we add information sharing to this design,
  • 38:06we actually see some encouraging results
  • 38:08in this global scenario.
  • 38:09First, it's worth noting that in the case
  • 38:11where we actually calibrated for,
  • 38:12we actually see an increase in power from 90% to about 97%.
  • 38:17And even when we actually have a
  • 38:1910% predictive probability threshold for interim monitoring,
  • 38:23we see that we actually still achieve 90% power
  • 38:26with a corresponding reduction in that type one error rate.
  • 38:30Of course, this is with the caveat that
  • 38:32this is the ideal setting for sharing information
  • 38:35because all of the baskets are truly exchangeable.
  • 38:38Now the rejection rate correlates to something we call
  • 38:41that expected sample size.
  • 38:42What is the average sample size we might enroll
  • 38:44for each basket of our 10 baskets in the trial?
  • 38:47We see here that in the case of a null basket
  • 38:50the Simon design is about 20.
  • 38:53If we do interim monitoring with Bayesian approaches
  • 38:56and no information sharing,
  • 38:57obviously if we don't do any interim looks at the data,
  • 38:59we have a 0% threshold,
  • 39:01we're gonna have a sample size of 25 every single time.
  • 39:05I think what's encouraging though is that
  • 39:06by looking fairly aggressively we see that our sample size,
  • 39:09even with a very marginal
  • 39:11or low 5% threshold for futility monitoring,
  • 39:14drops from 20 in the Simon design to about 15
  • 39:18in the Bayesian design,
  • 39:19the trade-off of course being because we didn't calibrate.
  • 39:22We also see a reduction in the sample size
  • 39:24for the alternative baskets.
  • 39:28And if we add that layer of information sharing,
  • 39:30we actually see that we do slightly better than
  • 39:32the design without information sharing
  • 39:34while attenuating at the top here the effect
  • 39:37our solid gray line has for the alternative baskets.
  • 39:42Now, briefly tying this together then to the stopping rate,
  • 39:45which we can kind of infer from those past results,
  • 39:47we do see that on average the Simon two-stage design
  • 39:50for the null baskets stopping for futility
  • 39:52is only taking place a little over 50% of the time
  • 39:55in this simulation.
  • 39:57The advantage here though is that it is
  • 39:58very rarely stopping for the alternative baskets.
  • 40:02In our Bayesian approaches,
  • 40:03we see that there is an over 80%
  • 40:06probability of stopping at these low thresholds
  • 40:09if it's a null effect.
  • 40:10And this is ideal because we have 10 baskets.
  • 40:12And so these potential savings or effects
  • 40:14can compound themselves across these multiple baskets.
  • 40:18We then see that the designs adding these solid lines
  • 40:21for information sharing do very similarly
  • 40:23where again the consequences of not calibrating
  • 40:26are attenuated in this circumstance.
  • 40:30Now the thing to note here that
  • 40:31everything I presented on these few graphics
  • 40:34were with respect to the global scenario,
  • 40:36that ideal scenario that I actually don't think
  • 40:38is super realistic in practice.
  • 40:41So we see here, if we do a mixed scenario where
  • 40:44we now have calibrated for the global scenarios,
  • 40:46we've miscalibrated with respect to that.
  • 40:48We've also not calibrated for interim looks at the data.
  • 40:51We can actually see that the results for
  • 40:53the Simon two-stage in the Bayesian design
  • 40:55without information sharing are very similar
  • 40:58to what we saw before.
  • 40:59That's because they don't share information.
  • 41:00And so in this case with eight null baskets
  • 41:02and two alternative baskets,
  • 41:04they have very similar responses.
  • 41:06This contrasts of course with the MEM approach
  • 41:09or the information sharing approach
  • 41:10where we actually see now
  • 41:12many of these results are actually overlapping
  • 41:15for information sharing and no information sharing.
  • 41:18What this tells us is that even though we miscalibrated
  • 41:21up and down the design,
  • 41:23we are actually able with this more conservative prior
  • 41:26to down weight borrowing
  • 41:27and effectuate similar results
  • 41:30that at lower thresholds for futility monitoring, for example
  • 41:34at 5% can still show potential gains in efficiency relative
  • 41:38to the Simon design that could likely further be improved
  • 41:41with actual calibration.
  • 41:44So just as a reminder,
  • 41:45we demonstrated today
  • 41:46and introduced the idea of Simon's two-stage design
  • 41:48and some alternative methods to compete with them.
  • 41:51And some just brief discussion and concluding points.
  • 41:53There is no free lunch
  • 41:55and this is true regardless of where we are in statistics
  • 41:57that for example in our designs,
  • 41:59besides the fact that we miscalibrated
  • 42:01and made it a bit harder of a comparison for our methods,
  • 42:04we did try to replicate
  • 42:05what people might be doing in practice
  • 42:07or the challenge of
  • 42:07calibrating these designs into actuality.
  • 42:10Simon's two-stage design does have a lot of benefits
  • 42:13from its ideal characteristics
  • 42:15that are easy to implement,
  • 42:17but it is limited in how often it may stop.
  • 42:20Our Bayesian designs,
  • 42:21with or without information sharing,
  • 42:22can lead to reductions in the expected sample size
  • 42:24in the null basket
  • 42:25and further could be improved
  • 42:27if we actually incorporate calibration,
  • 42:29which we further explored
  • 42:30in a Statistical Methods in Medical Research paper
  • 42:33published in 2020.
  • 42:35And so that I have some sources here
  • 42:36and I thank you for your attention
  • 42:37and welcome any questions or discussion at this point.
  • 42:56<v Man>Thank you so much. Any questions from the room?</v>
  • 43:12<v Student>Okay, so yeah, I have questions.</v>
  • 43:14So in the example you just showed,
  • 43:18all the like the task becomes so, can be achievable, right?
  • 43:22So if the baskets,
  • 43:25they are expected to have different benefits (indistinct),
  • 43:28and say the 10 basket (indistinct)
  • 43:33some other basket MEMs would allow a bigger benefit,
  • 43:37how will the (indistinct)
  • 43:45scenarios?
  • 43:48<v Alex>Yeah, well, I think,</v>
  • 43:49if I understood your question correctly
  • 43:51and I misheard through the phone, let me know,
  • 43:54but if we have different sample sizes for baskets,
  • 43:56which actually really corresponds
  • 43:58to what we've seen in practice for real basket trials
  • 44:01where they have fairly
  • 44:02wide range of sample sizes in each basket.
  • 44:06I think what we would see,
  • 44:07and let me see if I can pop back quickly to the
  • 44:09mixed scenario results here just to illustrate some ideas.
  • 44:13One of the concepts here that,
  • 44:14so we did explicitly look at that to say like,
  • 44:16"Well, what if one basket never gets beyond seven
  • 44:18of the 25," let's say.
  • 44:20But what we can infer is that
  • 44:21if a basket stopped early for futility,
  • 44:23it essentially has a smaller sample size to contribute
  • 44:26to any analysis whether or not it was a
  • 44:29falsely stopped basket that had a 30% effect
  • 44:32or it was truly a null basket.
  • 44:34And so we do see in this case that the method
  • 44:36averaging over those ideas of differential sample sizes
  • 44:39based off of stopped baskets
  • 44:41does seem to be borrowing,
  • 44:43appropriately depending on the context.
  • 44:45So like the mixed scenario results here suggests
  • 44:47limited borrowing in the presence of that uncertainty
  • 44:50from the global scenario
  • 44:51because we didn't calibrate for anything else
  • 44:53it does show more of a benefit of the stopping rate
  • 44:56and other properties incorporating that data
  • 44:58even in small sample sizes.
  • 45:00And there's also been some other work
  • 45:02and illustrations done by Dr. Emily Zabor
  • 45:04at the Cleveland Clinic with whom I work
  • 45:06about some of the re-analysis of oncology trials
  • 45:09that do show even in small basket sizes,
  • 45:11we can move that significance evaluation
  • 45:14into a more clinically meaningful realm.
  • 45:26<v Wayne>Thanks, so do we have other questions?</v>
  • 45:57Okay, so (indistinct) that's (indistinct).
  • 46:01Okay, so since there are no questions let's stop here.
  • 46:07(indistinct)
  • 46:16<v Alex>Yeah. Thank you all.</v>