
YSPH Biostatistics Seminar: "Sensitivity Analysis for Observational Studies"

September 10, 2020
  • 00:00So hello everyone.
  • 00:03My name is Qingyuan Zhao,
  • 00:05I'm currently a University Lecturer in Statistics
  • 00:10in University of Cambridge.
  • 00:13I visited Yale Biostats,
  • 00:15briefly last year in February.
  • 00:21And so it's nice to see everyone again this time.
  • 00:28And today I'll talk
  • 00:30about sensitivity analysis for observational studies,
  • 00:34looking back and moving forward.
  • 00:37So this is based on ongoing work
  • 00:39with several people: Bo Zhang, Ting Ye, and Dylan Small
  • 00:45at University of Pennsylvania,
  • 00:46and also Joe Hogan at Brown University.
  • 00:52So sensitivity analysis is really a very broad term
  • 00:59and you can find it in almost any area
  • 01:02that uses mathematical models.
  • 01:06So, broadly speaking,
  • 01:08what it tries to do is it studies how the uncertainty
  • 01:12in the output of a mathematical model or system,
  • 01:17numerical or otherwise, can be apportioned
  • 01:20to different sources of uncertainty in its inputs.
  • 01:24So it's an extremely broad concept.
  • 01:27And you can even view statistics as part
  • 01:30of sensitivity analysis in some sense.
  • 01:35But here, there can be a lot of kinds of model inputs.
  • 01:41So, in particular,
  • 01:43it can be any factor that can be changed in a model
  • 01:47prior to its execution.
  • 01:50So one example is structural
  • 01:53or epistemic sources of uncertainty.
  • 01:57And this is sort of the things we'll talk about.
  • 02:01So basically, what we'll talk about today
  • 02:03is those things that we don't really know.
  • 02:07I mean, we make a lot of assumptions
  • 02:09when proposing such a model.
  • 02:13So in the context of observational studies,
  • 02:16a very common and typical question
  • 02:20that requires sensitivity analysis is the following.
  • 02:24How do the qualitative and/or quantitative conclusions
  • 02:29of the observational study change
  • 02:31if the no unmeasured confounding assumption is violated?
  • 02:35So this is really common because essentially,
  • 02:39in the vast majority of observational studies,
  • 02:42it's essential to assume this
  • 02:45no unmeasured confounding assumption,
  • 02:47and this is an assumption that we cannot test
  • 02:50with empirical data,
  • 02:52at least with just observational data.
  • 02:55So if you do any observational study,
  • 02:59you're almost bound to be asked this question
  • 03:02that, what if this assumption doesn't hold?
  • 03:06And I'd like to point out that this question
  • 03:08is fundamentally connected to missing not at random
  • 03:12in the missing data literature.
  • 03:14So what I will do today is I'll focus
  • 03:16on sensitivity analysis for observational studies,
  • 03:20but a lot of the ideas are drawn
  • 03:22from the missing data literature.
  • 03:24And most of the ideas that I'll talk about
  • 03:28today can be also applied there
  • 03:30and to related problems as well.
  • 03:35So, currently, the state of the art of sensitivity analysis
  • 03:40for observational studies is the following.
  • 03:43There are many, many methods; maybe "gazillions
  • 03:47of methods" is an exaggeration, but certainly many, many methods
  • 03:50that are specifically designed for different
  • 03:54kinds of sensitivity analysis.
  • 03:58It often also depends on how you analyze your data
  • 04:03under the no unmeasured confounding assumption.
  • 04:06There are various forms of statistical guarantees
  • 04:09that have been proposed.
  • 04:11And oftentimes, these methods are not
  • 04:15straightforward to interpret;
  • 04:17at least for inexperienced researchers,
  • 04:20they can be quite complicated and confusing.
  • 04:26The goal of this talk is to give you a high level overview.
  • 04:31So this is not a talk where I'm gonna unveil
  • 04:34a lot of new methods.
  • 04:36This is more of an overview kind of talk
  • 04:40that just tries to go through
  • 04:42some of the main ideas in this area.
  • 04:46So in particular,
  • 04:47what I wanted to address is the following two questions.
  • 04:52What is the common structure behind
  • 04:54all these sensitivity analysis methods?
  • 04:57And what are some good principles and ideas we should follow
  • 05:02and perhaps extend when we have similar problems?
  • 05:06The perspective of this talk will be global and frequentist.
  • 05:10By that, I mean,
  • 05:12there's an area in sensitivity analysis
  • 05:14called local sensitivity analysis,
  • 05:16where you're only allowed to move your parameter
  • 05:19near its maximum likelihood estimate, usually.
  • 05:25But global sensitivity analysis refers to methods
  • 05:29where you can vary your sensitivity parameter
  • 05:31freely in a space.
  • 05:35So that's what we'll focus on today.
  • 05:38And also, I'll take a frequentist perspective.
  • 05:40So I won't talk about Bayesian sensitivity analysis,
  • 05:44which is also a big area.
  • 05:46And I'll use the prototypical setup
  • 05:50in observational studies,
  • 05:52where you have iid copies of these observed data O,
  • 05:56which has three parts: X is the covariates,
  • 06:00A is the binary treatment, Y is the outcome,
  • 06:04and this observed data
  • 06:06comes from an underlying full data, F,
  • 06:10which includes X and A
  • 06:13and the potential outcomes, Y(0) and Y(1).
  • 06:17Okay, so most of you
  • 06:19have probably seen this
  • 06:21many, many times already,
  • 06:24but if you haven't, this
  • 06:25is the most typical setup in observational studies.
  • 06:29And it kind of gets a little bit boring
  • 06:30when you see it so many times.
  • 06:32But what we're trying to do
  • 06:34is to use this as the simplest example,
  • 06:37to demonstrate the structure and ideas.
  • 06:41And hopefully, if you understand these good ideas,
  • 06:46you can apply them to your problems
  • 06:50that are maybe slightly more complicated than this.
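
To make this setup concrete, here is a minimal simulated version in Python (my own sketch, not from the talk; all data-generating values are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Full data F = (X, A, Y(0), Y(1)); here X is a single covariate.
X = rng.normal(size=n)
Y0 = X + rng.normal(size=n)                # potential outcome under control
Y1 = X + 2.0 + rng.normal(size=n)          # potential outcome under treatment
A = rng.binomial(1, 1 / (1 + np.exp(-X)))  # binary treatment depending on X

# Observed data O = (X, A, Y): consistency links O to the full data F.
Y = np.where(A == 1, Y1, Y0)

# A causal parameter such as the average treatment effect E[Y(1) - Y(0)]
# is a functional of the full data, which we never observe completely.
ate = (Y1 - Y0).mean()
```

The average treatment effect computed at the end uses both potential outcomes, which is exactly what the observed data O alone cannot give us without assumptions.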
  • 06:55So here's the outline
  • 06:57and I'll give a motivating example
  • 06:59then I'll talk about three components
  • 07:01in the sensitivity analysis.
  • 07:03They are the sensitivity model,
  • 07:04the statistical inference and the interpretation.
  • 07:10So the motivating example will sort of demonstrate
  • 07:13where these three components come from.
  • 07:16So this example is actually in the social sciences;
  • 07:21it's about child soldiering,
  • 07:24a paper by Blattman and Annan (2010)
  • 07:30in the Review of Economics and Statistics.
  • 07:34so what they studied is this period of time in Uganda,
  • 07:41from 1995 to 2004,
  • 07:44where there was a civil war
  • 07:46and about 60,000 to 80,000 youth
  • 07:49were abducted by a rebel force.
  • 07:53So the question is,
  • 07:54what is the impact of child soldiering,
  • 07:58that is, this abduction by the rebel force,
  • 08:01on various outcomes,
  • 08:04such as years of education;
  • 08:08the paper actually studied a number of outcomes.
  • 08:13The authors controlled for a variety of baseline covariates,
  • 08:17like the children's age, their household size,
  • 08:20their parental education, et cetera.
  • 08:23They were quite concerned about
  • 08:26this possible unmeasured confounder.
  • 08:28That is the child's ability to hide from the rebel.
  • 08:33So it's possible that if this child is smart,
  • 08:39and he or she knows
  • 08:41how to hide from the rebels,
  • 08:44then he's less likely to be abducted
  • 08:49and to be in this data set.
  • 08:52And he'll probably also be more likely
  • 08:55to receive a longer education, just because maybe
  • 09:00the kid is a bit smarter, let's say.
  • 09:06So in their analysis,
  • 09:07they follow the model proposed by Imbens,
  • 09:11which is the following.
  • 09:12So basically, they assume this no unmeasured confounding
  • 09:18after you condition on this unmeasured confounder U.
  • 09:22Okay, so X are all the covariates
  • 09:24that you controlled for,
  • 09:26and U, they assumed, is a binary, unmeasured confounder.
  • 09:32That's just a coin flip.
  • 09:36And then they assume the logistic model
  • 09:39for the probability of being abducted
  • 09:44and the normal linear model for the potential outcomes.
  • 09:49So notice that here these linear terms
  • 09:55depend not only on the observed covariates,
  • 09:58but also on the unmeasured covariate U.
  • 10:01And of course,
  • 10:02we don't measure this U.
  • 10:04So we cannot directly fit these models.
  • 10:09But what they did is they because they made
  • 10:12some distribution assumptions on U,
  • 10:16you can treat U as unmeasured variable.
  • 10:19And then, for example,
  • 10:21fit maximum likelihood estimate.
  • 10:25So they're treated this two parameters lambda and delta,
  • 10:29as sensitivity parameters.
  • 10:32So these are the parameters that you vary
  • 10:35in a sensitivity analysis.
  • 10:37So when they're both equal to zero,
  • 10:39that means that there is no unmeasured confounding.
  • 10:43So you can actually just ignore this confounder U.
  • 10:46So it corresponds to your primary analysis,
  • 10:48but in a sensitivity analysis,
  • 10:50you change the values of lambda and delta
  • 10:53and you see how that changes your result
  • 10:55about this parameter beta,
  • 10:57which is interpreted as a causal effect.
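
A minimal sketch of this kind of analysis (my own illustration, not the authors' code; the function and variable names are made up): for a fixed (lambda, delta), the observed-data likelihood marginalizes over the binary U, and we maximize it over the remaining parameters.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit
from scipy.stats import norm

def fit_beta(X, A, Y, lam, delta):
    """For fixed sensitivity parameters (lam, delta), compute the MLE of
    the treatment coefficient beta, marginalizing over the latent binary
    confounder U ~ Bernoulli(1/2)."""
    def negloglik(par):
        a0, a1, b0, beta, g1, log_sig = par
        sig = np.exp(log_sig)
        lik = 0.0
        for u in (0.0, 1.0):  # sum out the unmeasured confounder U
            pa = expit(a0 + a1 * X + lam * u)
            lik_a = np.where(A == 1, pa, 1 - pa)
            lik_y = norm.pdf(Y, b0 + beta * A + g1 * X + delta * u, sig)
            lik = lik + 0.5 * lik_a * lik_y
        return -np.sum(np.log(np.maximum(lik, 1e-300)))
    res = minimize(negloglik, np.zeros(6), method="BFGS")
    return res.x[3]  # the coefficient interpreted as the causal effect
```

Setting lam = delta = 0 recovers the primary analysis; varying them over a grid traces out the sensitivity analysis.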
  • 11:02Okay, so the results can be summarized in this one slide.
  • 11:06I mean they've done a lot more definitely.
  • 11:08But for the purpose of this talk, basically,
  • 11:12what they found is that the primary analysis
  • 11:15found that the average treatment effect is -0.76.
  • 11:19So remember the outcome was years of education.
  • 11:21So being abducted,
  • 11:23has a significant negative effect on education.
  • 11:30And then it did a sensitivity analysis,
  • 11:32which can be summarized in this calibration plot.
  • 11:36What is shown here is that these two axis
  • 11:40are basically the two sensitivity parameters,
  • 11:43lambda and delta.
  • 11:45So what the paper did is they transform it
  • 11:48to the increase in R-squared.
  • 11:51But that's that can be mapped to lambda and delta,
  • 11:56and then they compared
  • 11:59this curve, so this dashed curve
  • 12:03is where the values of lambda and delta such that
  • 12:07the treatment in fact is reduced by half.
  • 12:11And then they compare this curve
  • 12:13with all the measured confounders,
  • 12:15like location of birth,
  • 12:17year of birth, et cetera.
  • 12:21And then you compare it with the corresponding coefficients
  • 12:25of those variables in the model
  • 12:31and then they just plot these in the same figure.
  • 12:37What this is supposed to show is that, look,
  • 12:39this is the point where the treatment effect
  • 12:42is reduced by half,
  • 12:44and this is about the same strength
  • 12:47as location of birth alone.
  • 12:50So, if you think your unmeasured confounder is in some sense
  • 12:54as strong as the location or the year of birth,
  • 12:58then it is possible that the treatment effect,
  • 13:01is half of what it is estimated to be.
  • 13:05Okay, so it's a pretty neat way
  • 13:08to present a sensitivity analysis.
  • 13:12So in this example, you see,
  • 13:14there's three components of sensitivity analysis.
  • 13:17First is model augmentation.
  • 13:19And you need to expand the model used by the primary analysis
  • 13:24to allow for unmeasured confounding.
  • 13:26Second, you need to do statistical inference.
  • 13:30So you vary the sensitivity parameter,
  • 13:32estimate the effect,
  • 13:33and then control some statistical errors.
  • 13:36So what they did
  • 13:38is they essentially varied lambda and delta,
  • 13:42and they estimated the average treatment effect
  • 13:45under that lambda and delta.
  • 13:49And the third component is to interpret the results.
  • 13:52So this paper relied on that calibration plot
  • 13:56for that purpose.
  • 13:58But this is often quite tricky
  • 14:01because the sensitivity analysis is complicated
  • 14:05as we need to probe different directions
  • 14:07of unmeasured confounding.
  • 14:09So the interpretation is actually not always straightforward
  • 14:14and sometimes can be quite complicated.
  • 14:19There do exist two issues
  • 14:23with this analysis.
  • 14:26So this is just the same model, rewritten.
  • 14:30The first issue is that actually the sensitivity parameters
  • 14:33lambda and delta,
  • 14:34which we vary in a sensitivity analysis,
  • 14:38are identifiable from the observed data.
  • 14:41This is because this is a fully parametric model,
  • 14:44and it's not constructed in any way
  • 14:47so that lambda and delta would be non-identifiable.
  • 14:51In fact, in the next slide,
  • 14:53I'm going to show you some empirical evidence
  • 14:55that you can actually estimate these two parameters.
  • 14:59So, logically it is inconsistent for us
  • 15:02to vary the sensitivity parameter,
  • 15:05because if we truly believe in this model,
  • 15:07the data actually tell us what the values
  • 15:09of lambda and delta are.
  • 15:11So this is similar to the criticism
  • 15:13of the Heckman selection model, for example.
  • 15:20The second issue is a bit more subtle:
  • 15:23in the calibration plot,
  • 15:25what they did is they used the partial R-squared
  • 15:27as a way to measure lambda and delta
  • 15:33in a more interpretable way.
  • 15:36But actually the partial R squared for the observed
  • 15:38and unobserved confounders are not directly comparable.
  • 15:42This is because they use different reference models
  • 15:46to start with.
  • 15:48So, actually you need to be quite careful
  • 15:50about the interpretation of these calibration plots.
  • 15:56So, here is the evidence I promised, which suggests
  • 16:01that you can actually identify
  • 16:02these two sensitivity parameters lambda and delta.
  • 16:06So here the red dots
  • 16:08are the maximum likelihood estimators.
  • 16:11And then these solid curves are the regions,
  • 16:14the rejection,
  • 16:16or I should say acceptance, regions
  • 16:20for the likelihood ratio test.
  • 16:23So this is at level 0.50,
  • 16:26this is 0.10, this is 0.05.
  • 16:30There is a symmetry around the origin that's
  • 16:34because the U number is symmetric.
  • 16:37So, lambda like delta is the same
  • 16:41as minus lambda minus delta.
  • 16:43But what you see
  • 16:44is that you can actually estimate lambda and delta
  • 16:47and you can sort of estimate it
  • 16:50to be in a certain region.
  • 16:53So, something a bit interesting here
  • 16:56is that there's more you can say about Delta,
  • 17:01which is the parameter for the outcome,
  • 17:04than the parameter for the treatment lambda.
  • 17:09But in any case,
  • 17:11it didn't look like we can just vary
  • 17:13this parameter lambda delta freely in this space
  • 17:16and then expect to get different results
  • 17:19for each each point.
  • 17:23What we actually can get is some estimate
  • 17:25of this sensitivity parameters.
  • 17:28So the lesson here is that
  • 17:30if you use a parametric sensitivity models,
  • 17:32then they need to be carefully constructed
  • 17:35to avoid these kind of issues.
  • 17:40So next I'll talk about the first component
  • 17:43of the sensitivity analysis,
  • 17:44which is your sensitivity model.
  • 17:48So very generally,
  • 17:51if you think about what is the sensitivity model,
  • 17:54is essentially it's a model for the full data F,
  • 18:00that include some things that are not observed.
  • 18:03So, what we are trying to do here
  • 18:05is to infer the full data distribution
  • 18:08from some observed data, O.
  • 18:11So a sensitivity model is basically
  • 18:14a family of distributions of the full data,
  • 18:18parameterized by two parameters, theta and eta.
  • 18:23So, I'm using eta to stand for the sensitivity parameters
  • 18:27and theta is some other parameters
  • 18:29that parameterize the distribution.
  • 18:33So the sensitivity model needs to satisfy two properties.
  • 18:38So first of all,
  • 18:40if we set the sensitivity parameter eta to be equal to zero,
  • 18:44then that should correspond to our primary analysis
  • 18:48assuming no unmeasured confounders.
  • 18:49So I call this augmentation.
  • 18:51The second property is that given the value
  • 18:56of this sensitivity parameter eta,
  • 18:59we can actually identify the parameter theta
  • 19:03from the observed data.
  • 19:06So this is sort of a minimal assumption.
  • 19:08Otherwise, this model is simply too rich,
  • 19:12and so I call this model identifiability.
  • 19:15So the statistical problem in sensitivity analysis
  • 19:18is that if I give you the value of eta
  • 19:20or the range of eta,
  • 19:23can you use observed data to make inference
  • 19:26about some causal parameter that is a function
  • 19:29of theta and eta?
  • 19:32Okay, so this is a very general abstraction
  • 19:37of what we have seen in the previous example.
  • 19:43But it's a bit too general.
  • 19:45So let's make it slightly more concrete
  • 19:49by understanding these observational equivalence classes.
  • 19:55So essentially, what we're trying to do
  • 19:58is we observe some data,
  • 19:59but then we know there's an underlying full data
  • 20:02that we don't fully observe.
  • 20:05And instead of just modeling the observed data,
  • 20:08we're modeling the full data set.
  • 20:10So that makes our model quite rich,
  • 20:14because we're modeling something that is not fully observed.
  • 20:18For that purpose, it is useful to define this
  • 20:21observational equivalence relation
  • 20:24between two full data distributions,
  • 20:27which just means that their implied
  • 20:30observed data distributions are exactly the same.
  • 20:34So we write this with this approximately-equal,
  • 20:39this equivalence symbol.
  • 20:43So then we can define the equivalence class
  • 20:45of a distribution of a full data distribution,
  • 20:48which are all the other full data distributions
  • 20:51in this family that are observationally equivalent
  • 20:55to that distribution.
  • 20:58Then we can sort of classify these sensitivity models
  • 21:02based on the behavior of these equivalence classes.
  • 21:07So, what happened in the last example
  • 21:10is that the full data model
  • 21:15is not rich enough.
  • 21:16So these equivalence classes are just singletons,
  • 21:20so you can actually identify the sensitivity parameter eta
  • 21:24from the observed data.
  • 21:26So, this makes this model,
  • 21:31or the choice of sensitivity parameter, testable in some sense,
  • 21:35and this should generally be avoided in practice.
  • 21:39Then there are the global sensitivity models
  • 21:43where you can basically freely vary
  • 21:46the sensitivity parameter eta.
  • 21:48And for any eta you can always find the theta
  • 21:51such that it is observational equivalent
  • 21:54to where you started from.
  • 21:57And then even nicer models the separable model
  • 22:01where basically, this eta,
  • 22:04the sensitivity parameter doesn't change
  • 22:07the observation of the observed data distribution.
  • 22:12So for any theta and eta,
  • 22:14theta and eta is equivalent to theta and zero.
  • 22:18So these are really nice models to work with.
  • 22:22To understand the difference between global models
  • 22:26and separable models.
  • 22:28So basically, it's just that they have different shapes
  • 22:34of the equivalence classes.
  • 22:37So for separable models,
  • 22:40these equivalence classes,
  • 22:42need to be perpendicular to the theta axis.
  • 22:46But that's not needed for global sensitivity models.
  • 22:53So I've talked about what a sensitivity model means
  • 22:57and some basic properties of it,
  • 23:00but haven't talked about how to build them.
  • 23:02So generally, in this setup,
  • 23:05there's three ways to build a sensitivity model.
  • 23:08And then they essentially correspond
  • 23:09with different factorizations
  • 23:11of the full data distribution.
  • 23:13So there's a simultaneous model
  • 23:15that tries to factorize the distribution this way.
  • 23:19So it introduces an unmeasured confounder, U,
  • 23:22and then you need to model
  • 23:24these three conditional probabilities.
  • 23:27There's also the treatment model
  • 23:31that doesn't rely on this unmeasured confounder U.
  • 23:35But what you need to specify is the distribution
  • 23:39of the treatment given the potential outcomes and X.
  • 23:44And once you've specified that, you can use Bayes' formula
  • 23:46to get this part.
  • 23:50And then there's the outcome model that factorizes
  • 23:54this distribution in the other way.
  • 23:57So this is basically the propensity score
  • 24:00and the third term is what we need to specify
  • 24:03with a sensitivity parameter.
  • 24:06So in the missing data literature,
  • 24:09the second kind of model
  • 24:11is usually called a selection model.
  • 24:13And the third kind of model is usually called
  • 24:16a pattern-mixture model,
  • 24:17and there are other names that have been given to it.
  • 24:23And basically different sensitivity models,
  • 24:26they amount to different ways of specifying these
  • 24:31non-identifiable distributions,
  • 24:33which are the ones that are underlined.
  • 24:37A good review is this report by a committee
  • 24:42organized by the National Research Council.
  • 24:46This ongoing review paper that we're writing
  • 24:50also gives a comprehensive review of many models
  • 24:54that have been proposed using these factorizations.
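
As a toy illustration of the outcome-model (pattern-mixture-style) factorization, here is my own sketch, not one of the reviewed methods: the non-identifiable piece is specified directly by assuming E[Y(0) | A = 1, X] = E[Y | A = 0, X] + eta for a known shift eta, after which everything else is identified.

```python
import numpy as np

def att_pattern_mixture(X, A, Y, eta):
    """Effect on the treated under the pattern-mixture-style assumption
    E[Y(0) | A=1, X] = E[Y | A=0, X] + eta, with a linear outcome model
    fitted among the controls (both are assumptions of this sketch)."""
    Xc = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xc[A == 0], Y[A == 0], rcond=None)
    y0_treated = Xc[A == 1] @ coef + eta  # imputed counterfactual means
    return Y[A == 1].mean() - y0_treated.mean()
```

Here eta = 0 is the primary analysis, and the sensitivity parameter enters the estimate linearly, so varying it just shifts the answer.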
  • 25:00Okay, so that's about the sensitivity model.
  • 25:03The next component is statistical inference.
  • 25:11Things get a little bit tricky here,
  • 25:14because there are two kinds of inference
  • 25:17or two modes of inference we can talk about
  • 25:19in this study.
  • 25:21So, the first mode of inference is point identify inference.
  • 25:24So you only care about a fixed value
  • 25:27of the sensitivity parameter eta.
  • 25:32And the second kind of inference
  • 25:34is partial identified inference,
  • 25:36where you perform the statistical inference simultaneously
  • 25:40for a range of security parameters eta.
  • 25:44And that range H is given to you.
  • 25:50And in these different modes of inferences,
  • 25:54it comes differences to core guarantees.
  • 25:57So for point identified inference usually let's say
  • 26:03for interval estimators,
  • 26:04you want to construct confidence intervals.
  • 26:08And these confidence intervals depend on the observed theta
  • 26:12and the sensitivity parameter which
  • 26:15your last to use
  • 26:17in a point of identified inference
  • 26:20and it must cover the true parameter
  • 26:23with one minus alpha probability
  • 26:25for all the distributions in your model.
  • 26:28Okay that's the infimum.
  • 26:30But for partial identified inference,
  • 26:35you're only allowed to use an interval
  • 26:38that depends on the range, H.
  • 26:41So, it cannot depend on a specific values
  • 26:43of the sensitivity parameter,
  • 26:46because you only know eta is in this range H.
  • 26:50It need to satisfy this very similar criteria.
  • 26:56So I call this intervals that satisfy this criteria
  • 26:59in the sensitivity interval.
  • 27:01But in the literature people have also called this
  • 27:03uncertainty interval and or just confidence interval.
  • 27:08But to make it different from the first case,
  • 27:11we're calling a sensitivity interval here.
  • 27:15So you can see that these two equations,
  • 27:19two criterias look very similar,
  • 27:22besides just that this interval needs to depend on the range
  • 27:25instead of a particular value of the sensitivity parameter.
  • 27:29But actually, they're quite different.
  • 27:31This is usually much wider.
  • 27:34The reason is,
  • 27:36you can actually write an equivalent form
  • 27:37of this equation one,
  • 27:40because this only depends on the observed data
  • 27:45and the range H.
  • 27:46Then for every theta in that,
  • 27:49sorry for every eta in that range H,
  • 27:52is missing here, eta in H and also
  • 27:56that's observationally equivalent to a two distribution.
  • 28:00This interval also needs to cover
  • 28:02the corresponding theta parameter.
  • 28:07So in that sense,
  • 28:08this is a much stronger guarantee that you have.
  • 28:16So, in terms of the statistical methods,
  • 28:21point identified inference is usually quite straightforward.
  • 28:26It's very similar to our primary analysis.
  • 28:29So, primary analysis just assumes this eta equals to zero,
  • 28:32but this sensitivity analysis assumes eta is known.
  • 28:36So usually you just you can just plug in
  • 28:38this eta in some way as an offset to your model.
  • 28:42And then everything works out in almost the same way
  • 28:45as a primary analysis.
  • 28:48But for partially identified analysis,
  • 28:50things become quite more challenging.
  • 28:55And there are several methods several approaches
  • 28:58that you can take.
  • 29:00So, essentially there are two big classes of methods,
  • 29:05one is bound estimation,
  • 29:08one is combining point identified inference.
  • 29:11So, for bond estimation,
  • 29:14it tries to directly make inference about the two ends
  • 29:18of this partial identify region.
  • 29:21So, this set this is the region of the parameter beta
  • 29:26that are sort of indistinguishable,
  • 29:29if I only know this sensitivity parameter eta is in H.
  • 29:35If we can somehow directly estimate the infimum and supremum
  • 29:40of this in this set,
  • 29:44but then that gotta get us a way
  • 29:46to make partial identified inference.
  • 29:50The second method is basically
  • 29:53to try to combine the results of point identified inference.
  • 29:59The main idea is to sort of construct
  • 30:02let's say interval estimators,
  • 30:05for each individual sensitivity parameter
  • 30:08and then take a union of them.
  • 30:11So, these are the two broad approaches
  • 30:14to the partially identified inference.
  • 30:18And so, within the first approach
  • 30:20the bound estimation approach,
  • 30:22there are also several variety of,
  • 30:25there are several possible methods
  • 30:27depending on your problem.
  • 30:29So, the first problem,
  • 30:31the first method is called separable balance.
  • 30:35But before that, let's just slightly change our notation
  • 30:39and parameterize this range H by a hyper parameter gamma.
  • 30:47So, this is useful when we outline these methods.
  • 30:52And then this beta L of gamma,
  • 30:55this is the lower end of the partial identify region.
  • 31:01So the first method is called separable bounds.
  • 31:06What it tries to do is to write this lower end
  • 31:11as a function of beta star and gamma,
  • 31:15where beta star is your primary analysis estimate.
  • 31:21So let's say theta star zero
  • 31:24is what you would do in a primary analysis
  • 31:27that is observationally equivalent to the true distribution.
  • 31:32And then, if beta star is the corresponding causal effect,
  • 31:37from that model,
  • 31:39and if somehow can write this lower end
  • 31:42as a function of beta star and gamma
  • 31:46and the function is known,
  • 31:47then our life is quite easy,
  • 31:50because we already know how to make inference
  • 31:53about beta star from the primary analysis.
  • 31:55And all we need to do is just plug in
  • 31:57that beta star in this formula,
  • 31:59and then we're all done.
  • 32:02And we call this separable because it allows us
  • 32:06to separate the primary analysis
  • 32:09from the sensitivity analysis.
  • 32:11And statistical inference becomes a trivial extension
  • 32:15of the primary analysis.
  • 32:17So, some examples of this kind of method
  • 32:20include the classical cornfields bound
  • 32:26and the E-value,
  • 32:27if you have heard about them,
  • 32:29and E-value seems quite popular
  • 32:31these days at demonology.
  • 32:37The second type of bound estimation
  • 32:41is called tractable bounds.
  • 32:45So, in these cases,
  • 32:48we may derive this lower bound as a function
  • 32:52of theta star and gamma.
  • 32:54So we are not able to reduce it to just depend
  • 32:58on beta star the causal effect
  • 33:00under no unmeasured confounding,
  • 33:04but we're able to express in terms of theta star.
  • 33:07And then the function gl is also some practical functions
  • 33:11that we can compute.
  • 33:13And then this also makes our lives quite a lot easier,
  • 33:17because we can just replace this theta star,
  • 33:21which can be nonparametric can be parametric,
  • 33:25by its empirical estimate.
  • 33:28And, often in these cases,
  • 33:31we can find some central limit theorems
  • 33:35for the corresponding sample estimator,
  • 33:38such that the sample estimator of the bounds
  • 33:42converges to its truth at root and rate
  • 33:46and it follows the normal limit.
  • 33:51And then if we can estimate this standard error,
  • 33:55then we can use this central limit theorem
  • 33:58to make partial identified inference
  • 34:02because we can estimate the bounds.
  • 34:07There's some examples in the literature,
  • 34:09you're familiar with these papers.
  • 34:12But one thing to be careful about
  • 34:14these kind of tractable bounds
  • 34:16is that things that get a little bit tricky
  • 34:21with syntactic theory.
  • 34:24This is because in a syntactic theory,
  • 34:27the confidence intervals or the sensitivity intervals
  • 34:30in this case,
  • 34:32can be point wise or uniform in terms of the sample size.
  • 34:38So it's possible that if the convergence,
  • 34:45if there are statistical guarantee is point wise,
  • 34:49then you sometimes in extreme cases,
  • 34:56even with very large sample size,
  • 34:58they're still exist data distributions
  • 35:01such that your coverage is very poor.
  • 35:05So this point is discussed very heavily
  • 35:08in econometrics literature.
  • 35:10And these are some references.
  • 35:15So that's the second type of method
  • 35:18in the first broad approach.
  • 35:22The third kind of method
  • 35:25is called stochastic programming.
  • 35:28And this applies when the model is separable.
  • 35:34So and we can write this parameter we're interested in
  • 35:40as some expectation of some function
  • 35:43of the theta and the sensitivity parameter eta.
  • 35:48Okay, so in this case,
  • 35:51the bound becomes the optimal value
  • 35:54for an optimization problem,
  • 35:56which you want to minimize expectation of some function.
  • 36:01And the parameter in this function is in some set
  • 36:05as defined by U.
  • 36:08So, this is known as stochastic programming.
  • 36:11So, this type of problem is known as stochastic programming
  • 36:14in the optimization literature.
  • 36:17And what people do there
  • 36:19is they sample from the distribution,
  • 36:22and then they try to use it to solve the empirical version
  • 36:26and try to use that as approximate solution
  • 36:29to this population optimization problem,
  • 36:33which we can't directly U value evaluate.
  • 36:36And the method is called sample average approximation
  • 36:39in the optimization literature.
  • 36:42So, what is shown there.
  • 36:47And Alex Shapiro did a lot of great work on this,
  • 36:51is that nice problems with compact set age,
  • 36:57and everything is euclidean.
  • 36:59So it's finite dimensional.
  • 37:01Then you actually have a central limit theorem
  • 37:04for the sample optimal value.
  • 37:07And this link between sensitivity analysis
  • 37:12and stochastic programming is made in this paper
  • 37:16by Tudball et al.
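To make the sample average approximation idea concrete, here is a minimal Python sketch, assuming a hypothetical integrand `g` and a hypothetical data-generating model (neither is from the talk): the bound is the minimum of E[g(X; eta)] over the range H, approximated by minimizing the sample average over a grid.

```python
# Sample average approximation (SAA) sketch for a lower bound of the form
#   theta_L = min_{eta in H} E[g(X; eta)].
# The function g and the data model below are hypothetical illustrations.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=1.0, size=5000)  # observed i.i.d. sample

def g(xs, eta):
    # hypothetical function of the data and the sensitivity parameter eta
    return np.exp(-eta) * xs + eta

def saa_objective(eta):
    # sample average approximation of E[g(X; eta)]
    return float(np.mean(g(x, eta)))

# Approximate the minimum over the compact range H = [0, 2] by grid search.
H = np.linspace(0.0, 2.0, 201)
values = [saa_objective(e) for e in H]
lower_bound = min(values)
eta_star = float(H[int(np.argmin(values))])
```

Under regularity conditions (compact H, finite dimension), the sample optimal value satisfies a central limit theorem, which is what licenses treating `lower_bound` as an estimate of the population bound.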
  • 37:20Okay, so that's the first broad approach
  • 37:23of doing bound estimation.
  • 37:26The second broad approach is to combine the results
  • 37:29of point identified inference.
  • 37:32So, the first possibility is to take a union
  • 37:37of the individual confidence intervals.
  • 37:40Suppose these are the confidence intervals
  • 37:43when the sensitivity parameter eta is given.
  • 37:47Then, it is very simple to just apply a union bound
  • 37:51and to show that if you take a union
  • 37:54of these individual confidence intervals,
  • 37:57then they should satisfy the criteria
  • 38:01for a sensitivity interval.
  • 38:03So now, if you take a union, this interval only depends
  • 38:07on the range H,
  • 38:08and then you just apply the union bound
  • 38:12and get this formula.
  • 38:17And this can be slightly improved
  • 38:20to cover not just these parameters,
  • 38:23but also the entire partially identified region,
  • 38:27if the confidence intervals
  • 38:30have the same tail probabilities.
  • 38:35So we discussed this in our paper.
  • 38:39And here, all we need to do
  • 38:43is to compute this union,
  • 38:46which essentially is an optimization problem:
  • 38:49we'd like to minimize the lower bound,
  • 38:52the lower confidence point Cl of eta, over eta in H,
  • 38:59and similarly for the upper bound.
  • 39:02And usually, using asymptotic theory,
  • 39:05we can get some normal-based confidence
  • 39:09intervals for each fixed eta.
  • 39:12And then we just need to optimize
  • 39:14this confidence interval over eta.
  • 39:20But for many problems this can be
  • 39:22computationally challenging because the standard errors
  • 39:26are usually quite complicated
  • 39:30and have some very nonlinear dependence
  • 39:32on the parameter eta.
  • 39:34So optimizing this can be tricky.
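In simple cases the optimization over eta can be done by brute force over a grid. Here is a minimal sketch of the union sensitivity interval with normal-based confidence intervals; the estimator `theta_hat` and its standard error are hypothetical stand-ins, not the talk's application.

```python
# Union sensitivity interval sketch: take a normal-based CI for theta(eta)
# at each fixed eta on a grid over H, then union them (minimize the lower
# limit, maximize the upper limit). theta_hat and std_err are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(loc=2.0, scale=1.0, size=400)

def theta_hat(eta):
    # hypothetical point estimate at a fixed sensitivity parameter eta
    return y.mean() + eta

def std_err(eta):
    # hypothetical standard error (constant in eta here for simplicity)
    return y.std(ddof=1) / np.sqrt(len(y))

z = 1.96                                  # 95% normal critical value
H = np.linspace(-0.5, 0.5, 101)           # range of the sensitivity parameter

# minimize the lower confidence limit and maximize the upper one over H
lower = min(theta_hat(e) - z * std_err(e) for e in H)
upper = max(theta_hat(e) + z * std_err(e) for e in H)
```

In realistic problems the standard error depends on eta in a complicated, nonlinear way, which is exactly why this direct optimization can be hard and why the percentile bootstrap below is attractive.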
  • 39:40This is where another method, the percentile bootstrap method,
  • 39:44can greatly simplify the problem.
  • 39:47It's proposed in this paper that we wrote,
  • 39:53and what it does is, instead of using
  • 39:56the asymptotic confidence interval for fixed eta,
  • 40:01we use the percentile bootstrap interval,
  • 40:04where we take bootstrap resamples,
  • 40:06then estimate the causal effect
  • 40:11in each resample, and then take quantiles.
  • 40:15Okay, so if you use this confidence interval,
  • 40:19then there is a generalized
  • 40:25minimax inequality that allows us to construct
  • 40:29this percentile bootstrap sensitivity interval.
  • 40:33So what it does is, this thing on the inside
  • 40:37is just the union of these percentile bootstrap
  • 40:41intervals for fixed eta,
  • 40:45taken over eta in H.
  • 40:49And then this generalized minimax inequality
  • 40:51allows us to interchange the infimum with the quantile
  • 40:57and the supremum with the quantile.
  • 41:00Okay, so the infimum of the quantiles
  • 41:01is greater than or equal to the quantile of the infimum,
  • 41:05and that's always true.
  • 41:07So it's just a generalization
  • 41:09of the familiar minimax inequality.
  • 41:13Now, if you look at this outer interval,
  • 41:16this is much easier to compute,
  • 41:19because all you need to do
  • 41:20is gather a data resample,
  • 41:25then you just need to repeat method 1.3.
  • 41:29So just get the infimum of the point estimate
  • 41:34for that resample and the supremum for that resample.
  • 41:37Then you do this over many, many resamples,
  • 41:41and then you take the quantiles:
  • 41:44the lower quantile of the infimum and the upper quantile of the supremum,
  • 41:48and then you're done.
  • 41:50And because this union sensitivity interval
  • 41:53is always valid
  • 41:55if the individual confidence intervals are valid,
  • 41:58you almost got a free lunch
  • 42:02in some sense,
  • 42:03you don't need to show any heavy theory.
  • 42:06All you need to show is that
  • 42:08these percentile bootstrap intervals are valid
  • 42:11for each fixed eta,
  • 42:13which are much easier to establish in real problems.
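The resample-extrema-quantile recipe just described can be sketched in a few lines of Python; the estimator `theta_hat` below is a hypothetical stand-in for the causal effect estimate at a fixed eta.

```python
# Percentile bootstrap sensitivity interval sketch: for each bootstrap
# resample, take the extrema of the point estimate over eta in H, then
# take quantiles of those extrema. theta_hat is a hypothetical stand-in.
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(loc=2.0, scale=1.0, size=300)
H = np.linspace(-0.5, 0.5, 51)            # range of the sensitivity parameter

def theta_hat(sample, eta):
    # hypothetical estimate of the causal effect at a fixed eta
    return sample.mean() + eta

B, alpha = 1000, 0.05
infima, suprema = np.empty(B), np.empty(B)
for b in range(B):
    resample = rng.choice(y, size=len(y), replace=True)
    ests = np.array([theta_hat(resample, e) for e in H])
    infima[b], suprema[b] = ests.min(), ests.max()

# lower quantile of the infima, upper quantile of the suprema
sens_lower = np.quantile(infima, alpha / 2)
sens_upper = np.quantile(suprema, 1 - alpha / 2)
```

The generalized minimax inequality is what guarantees that this outer interval covers the union of the per-eta percentile intervals, so its validity only requires the fixed-eta bootstrap intervals to be valid.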
  • 42:23And, this is sort of a selfish comparison,
  • 42:25but I'd like to compare this idea
  • 42:27with Efron's bootstrap,
  • 42:29where what you do there
  • 42:31is that you've got a point estimator,
  • 42:33you resample your data
  • 42:35many times, and then use the bootstrap
  • 42:38to get the confidence interval.
  • 42:41For partially identified inference,
  • 42:44you need to do a bit more.
  • 42:46So for each resample you need
  • 42:48to get the extrema of the point estimator.
  • 42:52Then the minimax inequality allows you to
  • 42:55sort of transfer the intuition
  • 43:00from the bootstrap, from point identification
  • 43:02to partial identification.
  • 43:08So the third method
  • 43:11in this general approach
  • 43:14is to take the supremum of the P value.
  • 43:15And this is used in Rosenbaum's sensitivity analysis,
  • 43:18if you're familiar with that.
  • 43:22Essentially, it's a hypothesis testing analog
  • 43:24of the union confidence interval method.
  • 43:29What it does is that
  • 43:30if you have individually valid P values for each fixed eta,
  • 43:35then you just take the supremum of the P values
  • 43:38over all the etas in this range.
  • 43:41And that can be used for partially identified inference.
  • 43:46So what Rosenbaum did,
  • 43:49and Rosenbaum is really a pioneer in this area
  • 43:52of partially identified sensitivity analysis,
  • 43:56was use randomization tests
  • 43:59to construct these P values.
  • 44:03So, this is usually done for matched observational studies,
  • 44:07and the insight of this line of work
  • 44:12is that you can use these inequalities,
  • 44:16particularly Holley's inequality
  • 44:19in probabilistic combinatorics,
  • 44:22to efficiently compute the supremum of the P values.
  • 44:26So, usually what is done there is that
  • 44:30Holley's inequality gives you a way
  • 44:32to upper bound a family of distributions
  • 44:42in the stochastic dominance sense.
  • 44:45So, that is used to get the supremum of the P values.
  • 44:51And so, basically the idea is to use some theoretical tool
  • 44:59to simplify the computation.
  • 45:05Okay, so that's the statistical inference.
  • 45:08The third component
  • 45:10is the interpretation of sensitivity analysis.
  • 45:13And this is an area where we really need
  • 45:17a lot of good work at the moment.
  • 45:20So, overall, there are two good ideas that seem to work,
  • 45:26that seem to improve the interpretation
  • 45:28of sensitivity analysis.
  • 45:30The first is sensitivity value,
  • 45:32the second is the calibration using measured confounders.
  • 45:36So the sensitivity value is basically
  • 45:38the value of the sensitivity parameter
  • 45:41or the hyper parameter,
  • 45:42where some qualitative conclusions about your study change.
  • 45:48And in our motivating example,
  • 45:51this is where the estimated average treatment effect
  • 45:55is reduced by half.
  • 45:59In Rosenbaum's sensitivity analysis,
  • 46:01if you are familiar with that,
  • 46:03this is the value of the gamma in his model
  • 46:05where we can no longer reject the causal null hypothesis.
  • 46:10So, this can be seen as kind of an extension
  • 46:14of the idea of a P value.
  • 46:17The P value is used for the primary analysis,
  • 46:19assuming no unmeasured confounding,
  • 46:22and it basically measures
  • 46:30how likely it is that your false rejection is due to
  • 46:33random chance.
  • 46:36But what a sensitivity value does
  • 46:39is measure how sensitive your results are,
  • 46:51in some sense: how much deviation,
  • 46:53how much unmeasured confounding, it takes
  • 46:55to alter your conclusion.
  • 46:58And for the sensitivity value,
  • 47:01there often exists a phase transition phenomenon
  • 47:04for partially identified inference.
  • 47:07This is because if you take your hyper parameter gamma
  • 47:11to be very large,
  • 47:13then essentially your partially identified region
  • 47:15already covers the null.
  • 47:17So, no matter how large your sample size is,
  • 47:20you can never reject the null.
  • 47:23So, this is sort of an interesting phenomenon,
  • 47:28first discovered by Rosenbaum,
  • 47:32and this paper I wrote also clarified
  • 47:38some issues about the phase transition.
  • 47:44So, the second idea is the calibration
  • 47:46using measured confounders.
  • 47:49So, you have already seen an example
  • 47:51in the motivating study.
  • 47:54It's really a very necessary and practical solution
  • 47:59to quantify the sensitivity,
  • 48:01because it's not really very useful if you tell people
  • 48:05"we are sensitive at gamma equals two."
  • 48:08What does that really mean?
  • 48:09That depends on some mathematical model.
  • 48:13But if we can somehow compare that
  • 48:15with what we do observe,
  • 48:18and we use the fact that
  • 48:20often the practitioners have some good sense
  • 48:23about which confounders are important and which are not,
  • 48:27then this really gives us a way to calibrate
  • 48:31and strengthen the conclusions of a sensitivity analysis.
  • 48:35But unfortunately, although there are some good heuristics
  • 48:38about the calibration,
  • 48:40they often suffer from some subtle issues,
  • 48:44like the ones that I described
  • 48:46in the beginning of the talk.
  • 48:49If you carefully parameterize your models
  • 48:51this can become easier.
  • 48:54And this recent paper sort of explored this
  • 48:56in terms of linear models.
  • 49:01But really, there's not a unifying framework
  • 49:04that can cover more general cases,
  • 49:08and lots of work is needed.
  • 49:11And when I was writing the slides,
  • 49:13I thought maybe what we really need
  • 49:15is to somehow build this calibration
  • 49:18into the sensitivity model.
  • 49:20Because currently our workflow is that
  • 49:22we assume a sensitivity model,
  • 49:24and we see where things get changed,
  • 49:26and then we try to interpret those values
  • 49:29where things get changed.
  • 49:31But suppose we somehow build that in,
  • 49:34if we let the range H of eta be defined
  • 49:38in terms of this calibration,
  • 49:40so gamma directly means some kind of comparison
  • 49:45with the measured confounders, this would solve
  • 49:49a lot of the issues.
  • 49:50This is just a thought I came up with
  • 49:53when I was preparing for this talk.
  • 49:56Okay, so to summarize,
  • 49:59there are a number of messages
  • 50:01which I hope you can take home.
  • 50:05There are three components of a sensitivity analysis:
  • 50:08model augmentation, statistical inference
  • 50:11and the interpretation of sensitivity analysis.
  • 50:14So a sensitivity model is about parameterizing
  • 50:17the full data distribution,
  • 50:19and that's basically about over-parameterizing
  • 50:23the observed data distribution.
  • 50:25And you can understand these models
  • 50:26by the observational equivalence classes.
  • 50:30You can get different model augmentations
  • 50:33by factorizing the distribution differently
  • 50:35and specifying different models
  • 50:38for those parts that are non-identifiable.
  • 50:41And there's a difference between point identified inference
  • 50:45and partially identified inference,
  • 50:47and partially identified inference is usually much harder.
  • 50:52And there are two general approaches
  • 50:55for partially identified inference,
  • 50:57bound estimation and combining point identified inference.
  • 51:02For interpretation of sensitivity analysis,
  • 51:05there seem to be two good ideas so far,
  • 51:08to use the sensitivity value,
  • 51:10and to calibrate that sensitivity value
  • 51:13using measured confounders.
  • 51:16But overall,
  • 51:18I'd say this is still a very open area
  • 51:26where a lot of work is needed.
  • 51:28Even for this prototypical example
  • 51:31that people have studied for decades,
  • 51:33it seems there's still a lot of questions
  • 51:36that are unresolved.
  • 51:38And there are methods that need to be developed
  • 51:41for this sensitivity analysis
  • 51:45to be regularly used in practice.
  • 51:48And then there are many other related problems
  • 51:51in missing data and causal inference
  • 51:54that need to see more developments of sensitivity analysis.
  • 51:59So that's the end of my talk.
  • 52:01And there are some references that are used.
  • 52:05I'm happy to take any questions.
  • 52:08Still have about four minutes left.
  • 52:11- Thank you.
  • 52:13Yeah, thank you.
  • 52:14Thank you, I'm sorry I couldn't introduce you earlier,
  • 52:17but my connection did not work.
  • 52:21So we have time for a couple of questions.
  • 52:26You can write the question in the chat box,
  • 52:29or just unmute yourselves.
  • 52:43Any questions?
  • 52:54I guess I'll start with a question.
  • 53:00This was a great connection between, I think,
  • 53:04the sensitivity analysis literature
  • 53:06and the missing data literature,
  • 53:09which I think is kind of overlooked.
  • 53:12Even when you run a parametric sensitivity analysis,
  • 53:17it's really something, like most of the time
  • 53:20people really don't understand
  • 53:22how much information is given.
  • 53:25Like, how much information the model actually gives
  • 53:29on the sensitivity parameters.
  • 53:32And as you said,
  • 53:34like it's kind of inconsistent
  • 53:36to set the sensitivity parameters
  • 53:37when sensitivity parameters are actually identified
  • 53:40by the model.
  • 53:43So I guess my
  • 53:46clarifying question is,
  • 53:49you mentioned there are these testable models,
  • 53:54and these testable models essentially are where
  • 53:56the sensitivity model is such that
  • 54:00the sensitivity parameters are actually point identified.
  • 54:04Right?
  • 54:04- Yes.
  • 54:05So you said
  • 54:08you shouldn't use the sensitivity analysis
  • 54:11to actually set the parameters,
  • 54:14if the sensitivity parameters
  • 54:16are actually identified by the model.
  • 54:18- Yeah.
  • 54:19- Is that what you're saying?
  • 54:21All right, so and. - Yes, yeah.
  • 54:23Basically what happened there is the model is too specific,
  • 54:27and it wasn't constructed carefully.
  • 54:30So it's possible to construct parametric models
  • 54:33that are not testable, that are perfectly fine.
  • 54:37But sometimes, if you just sort of
  • 54:40write down the most natural model,
  • 54:42if you just extend the parametric model
  • 54:46you used for the observed data to also model the full data,
  • 54:52and you don't do it carefully,
  • 54:54then the entire full data distribution becomes identifiable.
  • 55:00So it doesn't make sense to treat those parameters
  • 55:02as sensitivity parameters.
  • 55:05So this is kind of reminiscent of the discussion
  • 55:08in the 80s about the Heckman selection model.
  • 55:12Because in that case,
  • 55:14Heckman has this great selection model
  • 55:18for reducing or getting rid of selection bias,
  • 55:23but it's based on very heavy parametric assumptions.
  • 55:27And you can apparently identify the selection effect
  • 55:32directly from the model, where you actually have no data
  • 55:36to support that identification.
  • 55:40Which led to some criticisms in the 80s.
  • 55:45But I think we are seeing this thing repeatedly,
  • 55:51again and again, in different areas.
  • 55:55And I think it's fine
  • 55:59to use parametric models that are testable, actually,
  • 56:05if you really believe in those models,
  • 56:07but it doesn't seem that they should be used
  • 56:09for sensitivity analysis,
  • 56:12because just logically,
  • 56:13it's a bit strange.
  • 56:15It's hard to interpret those models.
  • 56:20But sometimes I've also seen people
  • 56:24who sort of parameterize the model
  • 56:28in a way that includes enough terms,
  • 56:31so the sensitivity parameters are weakly identified
  • 56:35in a practical example.
  • 56:38So with a practical data set, maybe the
  • 56:44likelihood ratio test
  • 56:46acceptance region is very, very large.
  • 56:50So there are suggestions like that,
  • 56:53that are sort of a compromise
  • 56:58for good practice.
  • 57:02- Right, in that case you can either set the parameters
  • 57:06and derive the causal effects,
  • 57:09or kind of treat that as a partial identification problem
  • 57:13and just use bounds or the methods
  • 57:17you were mentioning, I guess.
  • 57:20- Yeah.
  • 57:21- Yep, thanks.
  • 57:26Other questions?
  • 57:34Well I guess you can read the question?
  • 57:37- It's a question from Kiel Sint.
  • 57:41Sorry if I didn't pronounce your name correctly.
  • 57:43"In the applications of observational studies ideally,
  • 57:46what confounders should be collected
  • 57:47for sensitivity analysis,
  • 57:49power sensitivity analysis for unmeasured confounding?"
  • 57:54Thank you.
  • 57:54So if I understand your question correctly,
  • 58:01basically in sensitivity analysis
  • 58:04you have an observational study
  • 58:06where you have already collected confounders
  • 58:10that you believe are important or relevant,
  • 58:13that are real confounders,
  • 58:16that affect both the treatment
  • 58:20and the outcome.
  • 58:22But often that's not enough.
  • 58:25And what sensitivity analysis does is it tries to say,
  • 58:29"based on the confounders
  • 58:33you have already collected,
  • 58:34what if there is still something missing
  • 58:37that we didn't collect?
  • 58:39And then if those things behave in a certain way,
  • 58:44does that change our results?"
  • 58:47So, I guess sensitivity analysis is always relative
  • 58:52to a primary analysis.
  • 58:54So I think you should use the same set of confounders
  • 58:58that the primary analysis uses.
  • 59:02I don't see a lot of reasons to, say,
  • 59:10use a primary analysis with more confounders,
  • 59:14but a sensitivity analysis with fewer confounders.
  • 59:21Sensitivity analysis is really a supplement
  • 59:23to what you have in the primary analysis.
  • 59:35- Just one more question, if we have time?
  • 59:41Yes.
  • 59:43- So from Ching Hou Soo,
  • 59:45"How to specify the sensitivity parameter gamma
  • 59:49in a real life question?
  • 59:51When gamma is too large, the inference results
  • 59:54will always be non-informative."
  • 59:57Yes, this is always a tricky problem,
  • 01:00:01and essentially the sensitivity value is kind of
  • 01:00:06trying to get past that.
  • 01:00:09So it tries to directly look at the value
  • 01:00:11of this sensitivity parameter that changes your conclusion.
  • 01:00:15So in some sense, you don't need to specify
  • 01:00:19a parameter a priori.
  • 01:00:21But obviously, at the end of the day,
  • 01:00:25we need some clue about what value of the sensitivity parameter
  • 01:00:30is considered large,
  • 01:00:31in a practical sense, in this application.
  • 01:00:36That's something this
  • 01:00:39calibration analysis is trying to address.
  • 01:00:44But as I said,
  • 01:00:44they're not perfect at the moment.
  • 01:00:47So for some time now, at least,
  • 01:00:52we'll have to sort of live with this,
  • 01:00:56and we'll either need to really understand
  • 01:01:01what the sensitivity model means,
  • 01:01:02and then use our domain knowledge
  • 01:01:06to set the sensitivity parameter,
  • 01:01:10or we have to rely on these
  • 01:01:15imperfect visualization tools to calibrate the analysis.
  • 01:01:28- Yeah, all right.
  • 01:01:29Thank you.
  • 01:01:30I think we need to wrap up, we've run over time.
  • 01:01:33So thank you again Qingyuan,
  • 01:01:36for sharing your work with us.
  • 01:01:38And thank you, everyone for joining.
  • 01:01:41Thank you.
  • 01:01:42Bye bye.
  • 01:01:43See you next week.
  • 01:01:44- It's a great pleasure.
  • 01:01:45Thank you.