# YSPH Biostatistics Seminar: "Sensitivity Analysis for Observational Studies"

September 10, 2020

- 00:00Seminar. So, hello everyone.
- 00:03My name is Qingyuan Zhao,
- 00:05I'm currently a University Lecturer in Statistics
- 00:10at the University of Cambridge.
- 00:13I visited Yale Biostats
- 00:15briefly last year, in February.
- 00:21And so it's nice to see everyone again, if only briefly, this time.
- 00:28And today I'll talk
- 00:30about sensitivity analysis for observational studies,
- 00:34looking back and moving forward.
- 00:37So this is based on ongoing work
- 00:39with several people: Bo Zhang, Ting Ye and Dylan Small
- 00:45at the University of Pennsylvania,
- 00:46and also Joe Hogan at Brown University.
- 00:52So sensitivity analysis is really a very broad term
- 00:59and you can find it in almost any area
- 01:02that uses mathematical models.
- 01:06So, broadly speaking,
- 01:08what it tries to do is study how the uncertainty
- 01:12in the output of a mathematical model or system,
- 01:17numerical or otherwise, can be apportioned
- 01:20to different sources of uncertainty in its inputs.
- 01:24So it's an extremely broad concept.
- 01:27And you can even view statistics as part
- 01:30of a sensitivity analysis in some sense.
- 01:35But here, there can be many kinds of model inputs.
- 01:41So, in particular,
- 01:43it can be any factor that can be changed in a model
- 01:47prior to its execution.
- 01:50So one example is structural
- 01:53or epistemic sources of uncertainty.
- 01:57And this is the kind of thing we'll talk about.
- 02:01So basically, what we'll talk about today
- 02:03are those things that we don't really know.
- 02:07I mean, we make a lot of assumptions
- 02:09when proposing such a model.
- 02:13So in the context of observational studies,
- 02:16a very common and typical question
- 02:20that requires sensitivity analysis is the following.
- 02:24How do the qualitative and/or quantitative conclusions
- 02:29of the observational study change
- 02:31if the no unmeasured confounding assumption is violated?
- 02:35So this is really common because essentially,
- 02:39in the vast majority of observational studies,
- 02:42it's necessary to make this
- 02:45no unmeasured confounding assumption,
- 02:47and this is an assumption that we cannot test
- 02:50with empirical data,
- 02:52at least with just observational data.
- 02:55So if you do any observational study,
- 02:59you're almost bound to be asked this question:
- 03:02what if this assumption doesn't hold?
- 03:06And I'd like to point out that this question
- 03:08is fundamentally connected to the missing-not-at-random problem
- 03:12in the missing data literature.
- 03:14So what I will do today is I'll focus
- 03:16on sensitivity analysis for observational studies,
- 03:20but a lot of the ideas are drawn
- 03:22from the missing data literature.
- 03:24And most of the ideas that I'll talk about
- 03:28today can be also applied there
- 03:30and to related problems as well.
- 03:35So, currently, the state of the art of sensitivity analysis
- 03:40for observational studies is the following.
- 03:43There are many, many, maybe gazillions of methods
- 03:47(that's an exaggeration, but certainly many, many methods)
- 03:50that are specifically designed for different
- 03:54kinds of sensitivity analysis.
- 03:58They often also depend on how you analyze your data
- 04:03under the no unmeasured confounding assumption.
- 04:06There are various forms of statistical guarantees
- 04:09that have been proposed.
- 04:11And oftentimes, these methods are not
- 04:15straightforward to interpret;
- 04:17at least for inexperienced researchers,
- 04:20they can be quite complicated and confusing.
- 04:26The goal of this talk is to give you a high level overview.
- 04:31So this is not a talk where I'm gonna unveil
- 04:34a lot of new methods.
- 04:36This is more of an overview kind of talk
- 04:40that just tries to go through
- 04:42some of the main ideas in this area.
- 04:46So in particular,
- 04:47what I want to address are the following two questions.
- 04:52What is the common structure behind
- 04:54all these sensitivity analysis methods?
- 04:57And what are some good principles and ideas we should follow
- 05:02and perhaps extend when we have similar problems?
- 05:06The perspective of this talk will be global and frequentist.
- 05:10By that, I mean,
- 05:12there's an area in sensitivity analysis
- 05:14called local sensitivity analysis,
- 05:16where you're only allowed to move your parameter
- 05:19near its maximum likelihood estimate, usually.
- 05:25But global sensitivity analysis refers to methods
- 05:29where you can vary your sensitivity parameter
- 05:31freely in a space.
- 05:35So that's what we'll focus on today.
- 05:38And also, I'll take a frequentist perspective.
- 05:40So I won't talk about Bayesian sensitivity analysis,
- 05:44which is also a big area.
- 05:46And I'll use this prototypical setup
- 05:50in observational studies,
- 05:52where you have iid copies of the observed data O,
- 05:56which has three parts: X are the covariates,
- 06:00A is the binary treatment, and Y is the outcome;
- 06:04and these observed data
- 06:06come from the underlying full data F,
- 06:10which include X and A
- 06:13and the potential outcomes, Y(0) and Y(1).
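To make the setup concrete, here is a small simulation sketch. The data-generating process, sample size, and effect size are arbitrary choices for illustration, not from the talk; it generates the full data F = (X, A, Y(0), Y(1)) and keeps only the observed data O = (X, A, Y), assuming the usual consistency relation Y = Y(A).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Full data F = (X, A, Y(0), Y(1)); this data-generating process is purely illustrative.
X = rng.normal(size=n)                      # baseline covariate
A = rng.binomial(1, 1 / (1 + np.exp(-X)))   # binary treatment, depends on X
Y0 = X + rng.normal(size=n)                 # potential outcome under control
Y1 = Y0 + 1.0                               # potential outcome under treatment (effect = 1)

# Observed data O = (X, A, Y): only the potential outcome of the received treatment is seen.
Y = np.where(A == 1, Y1, Y0)
observed = np.column_stack([X, A, Y])

# The individual effect Y(1) - Y(0) is never observed for any unit.
print(observed.shape)  # (1000, 3)
```

Everything a sensitivity analysis worries about lives in the columns that were dropped when forming `observed`.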
- 06:17Okay, so
- 06:19most of you have probably seen this
- 06:21many, many times already,
- 06:24but if you haven't seen it, this
- 06:25is the most typical setup in observational studies.
- 06:29And it gets a little bit boring
- 06:30when you see it so many times.
- 06:32But what we're trying to do
- 06:34is to use this as the simplest example,
- 06:37to demonstrate the structure and ideas.
- 06:41And hopefully, if you understand these good ideas,
- 06:46you can apply them to your problems
- 06:50that are maybe slightly more complicated than this.
- 06:55So here's the outline
- 06:57and I'll give a motivating example
- 06:59then I'll talk about three components
- 07:01of sensitivity analysis.
- 07:03They are the sensitivity model,
- 07:04the statistical inference, and the interpretation.
- 07:10So the motivating example will sort of demonstrate
- 07:13where these three components come from.
- 07:16So this example is actually from the social sciences;
- 07:21it's about child soldiering,
- 07:24a paper by Blattman and Annan (2010)
- 07:30in the Review of Economics and Statistics.
- 07:34What they studied is this period of time in Uganda,
- 07:41from 1995 to 2004,
- 07:44where there was a civil war
- 07:46and about 60,000 to 80,000 youth
- 07:49were abducted by a rebel force.
- 07:53So the question is,
- 07:54what is the impact of child soldiering,
- 07:58that is, this abduction by the rebel force,
- 08:01on various outcomes,
- 08:04such as years of education?
- 08:08The paper actually studies a number of outcomes.
- 08:13The authors controlled for a variety of baseline covariates,
- 08:17like the children's age, their household size,
- 08:20their parental education, et cetera.
- 08:23They were quite concerned about
- 08:26this possible unmeasured confounder.
- 08:28That is the child's ability to hide from the rebels.
- 08:33So it's possible that if a child is smart,
- 08:39and if he or she knows
- 08:41how to hide from the rebels,
- 08:44then the child is less likely to be abducted
- 08:49in this data set.
- 08:52And he or she will probably also be more likely
- 08:55to receive longer education, just because maybe
- 09:00the child is a bit smarter, let's say.
- 09:06So in their analysis,
- 09:07they follow the model proposed by Imbens,
- 09:11which is the following.
- 09:12So basically, they assume no unmeasured confounding
- 09:18after you condition on this unmeasured confounder U.
- 09:22Okay, so X are all the covariates
- 09:24that you controlled for,
- 09:26and U, they assumed, is a binary unmeasured confounder
- 09:32that's just a coin flip.
- 09:36And then they assume a logistic model
- 09:39for the probability of being abducted
- 09:44and a normal linear model for the potential outcomes.
- 09:49So notice that here these linear terms
- 09:55depend not only on the observed covariates X,
- 09:58but also on the unmeasured confounder U.
- 10:01And of course,
- 10:02we don't measure this U.
- 10:04So we cannot directly fit these models.
- 10:09But because they made
- 10:12some distributional assumptions on U,
- 10:16you can treat U as a latent variable
- 10:19and then, for example,
- 10:21compute the maximum likelihood estimate.
- 10:25So they treated these two parameters, lambda and delta,
- 10:29as sensitivity parameters.
- 10:32So these are the parameters that you vary
- 10:35in a sensitivity analysis.
- 10:37So when they're both equal to zero,
- 10:39that means that there is no unmeasured confounding.
- 10:43So you can actually just ignore this confounder U.
- 10:46So it corresponds to your primary analysis,
- 10:48but in a sensitivity analysis,
- 10:50you change the values of lambda and delta
- 10:53and you see how that changes your result
- 10:55about this parameter beta,
- 10:57which is interpreted as the causal effect.
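A minimal sketch of the observed-data log-likelihood for a model of this type. It assumes a single covariate, U ~ Bernoulli(1/2), and the hypothetical parameter names alpha0, alpha1, beta0, b1 and sigma; it is not Blattman and Annan's exact specification or code. The unmeasured U is summed out, and setting lambda = delta = 0 recovers the primary-analysis likelihood.

```python
import numpy as np

def log_sigmoid(z):
    """log(1 / (1 + exp(-z))), computed in a numerically stable way."""
    return -np.logaddexp(0.0, -z)

def normal_logpdf(y, mean, sd):
    return -0.5 * np.log(2 * np.pi * sd**2) - 0.5 * ((y - mean) / sd) ** 2

def observed_loglik(x, a, y, alpha0, alpha1, lam, beta0, beta, b1, delta, sigma):
    """Observed-data log-likelihood with a latent binary confounder U ~ Bernoulli(1/2).

    Treatment model: logit P(A=1 | X, U) = alpha0 + alpha1 * X + lam * U
    Outcome model:   Y | A, X, U ~ N(beta0 + beta * A + b1 * X + delta * U, sigma^2)
    (lam, delta) are the sensitivity parameters; beta is the causal effect.
    """
    per_u = []
    for u_val in (0.0, 1.0):
        lin_a = alpha0 + alpha1 * x + lam * u_val
        log_pa = np.where(a == 1, log_sigmoid(lin_a), log_sigmoid(-lin_a))
        mean_y = beta0 + beta * a + b1 * x + delta * u_val
        per_u.append(np.log(0.5) + log_pa + normal_logpdf(y, mean_y, sigma))
    # Marginalize U unit by unit: log(sum_u P(U=u) P(A|X,u) p(Y|A,X,u)), then sum over units
    return float(np.sum(np.logaddexp(per_u[0], per_u[1])))

# Simulated data from the model (all numbers are made up for illustration)
rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
u = rng.binomial(1, 0.5, size=n)
a = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + 0.2 * x + 1.0 * u))))
y = 1.0 - 1.0 * a + 0.5 * x + 1.0 * u + rng.normal(size=n)

# With lam = delta = 0, U drops out and this is the primary-analysis likelihood
ll_primary = observed_loglik(x, a, y, -0.5, 0.2, 0.0, 1.0, -1.0, 0.5, 0.0, 1.0)
# A sensitivity analysis evaluates (and maximizes over the other parameters) at nonzero values
ll_sens = observed_loglik(x, a, y, -0.5, 0.2, 1.0, 1.0, -1.0, 0.5, 1.0, 1.0)
```

Holding (lam, delta) fixed and maximizing over the remaining parameters gives the estimate of beta at that point of the sensitivity analysis.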
- 11:02Okay, so the results can be summarized in this one slide.
- 11:06I mean, they've definitely done a lot more.
- 11:08But for the purpose of this talk, basically,
- 11:12what they found is that the primary analysis
- 11:15found that the average treatment effect is -0.76.
- 11:19So remember the outcome was years of education.
- 11:21So being abducted
- 11:23has a significant negative effect on education.
- 11:30And then they did a sensitivity analysis,
- 11:32which can be summarized in this calibration plot.
- 11:36What is shown here is that these two axes
- 11:40are basically the two sensitivity parameters,
- 11:43lambda and delta.
- 11:45What the paper did is transform them
- 11:48to the increase in R-squared,
- 11:51but that can be mapped back to lambda and delta.
- 11:56Then they computed
- 11:59this curve; this dashed curve
- 12:03shows the values of lambda and delta such that
- 12:07the treatment effect is reduced by half.
- 12:11And then they compare this curve
- 12:13with the measured confounders,
- 12:15like location of birth,
- 12:17year of birth, et cetera,
- 12:21using the corresponding coefficients
- 12:25of those variables in the model,
- 12:31and they plot these in the same figure.
- 12:37What it is supposed to show is: look,
- 12:39this is the point where the treatment effect
- 12:42is reduced by half,
- 12:44and this is about the same strength
- 12:47as location of birth or year of birth alone.
- 12:50So, if you think your unmeasured confounder is in some sense
- 12:54as strong as the location or the year of birth,
- 12:58then it is possible that the treatment effect
- 13:01is half of what it is estimated to be.
- 13:05Okay, so it's a pretty neat way
- 13:08to present a sensitivity analysis.
- 13:12So in this example, you see,
- 13:14there are three components of sensitivity analysis.
- 13:17First is model augmentation:
- 13:19you need to expand the model used by the primary analysis
- 13:24to allow for unmeasured confounding.
- 13:26Second, you need to do statistical inference.
- 13:30So you vary the sensitivity parameter,
- 13:32estimate the effect,
- 13:33and then control some statistical errors.
- 13:36So what they did
- 13:38is they essentially varied lambda and delta,
- 13:42and they estimated the average treatment effect
- 13:45under each value of lambda and delta.
- 13:49And the third component is to interpret the results.
- 13:52So this paper relied on that calibration plot
- 13:56for that purpose.
- 13:58But this is often quite tricky
- 14:01because the sensitivity analysis is complicated,
- 14:05as we need to probe different directions
- 14:07of unmeasured confounding.
- 14:09So the interpretation is actually not always straightforward
- 14:14and sometimes can be quite complicated.
- 14:19There do exist two issues
- 14:23with this analysis.
- 14:26Here I'm just rewriting the model.
- 14:30The first issue is that the sensitivity parameters
- 14:33lambda and delta,
- 14:34which we vary in a sensitivity analysis,
- 14:38are actually identifiable from the observed data.
- 14:41This is because this is a fully parametric model,
- 14:44and it's not constructed in a way
- 14:47that makes lambda and delta non-identifiable.
- 14:51In fact, in the next slide,
- 14:53I'm going to show you some empirical evidence
- 14:55that you can actually estimate these two parameters.
- 14:59So, logically, it is inconsistent for us
- 15:02to vary the sensitivity parameters,
- 15:05because if we truly believe in this model,
- 15:07then the data actually tell us what the values
- 15:09of lambda and delta are.
- 15:11This is similar to the criticism
- 15:13of the Heckman selection model, for example.
- 15:20The second issue is a bit subtle:
- 15:23in the calibration plot,
- 15:25what they did is use the partial R-squared
- 15:27as a way to measure lambda and delta
- 15:33in a more interpretable way.
- 15:36But actually, the partial R-squared for the observed
- 15:38and unobserved confounders are not directly comparable.
- 15:42This is because they use different reference models
- 15:46to start with.
- 15:48So you need to be quite careful
- 15:50about interpreting these calibration plots.
- 15:56So, here is the evidence I promised, which suggests
- 16:01that you can actually identify
- 16:02these two sensitivity parameters, lambda and delta.
- 16:06So here the red dots
- 16:08are the maximum likelihood estimates.
- 16:11And these solid curves
- 16:14are the rejection,
- 16:16or I should say acceptance, regions
- 16:20for the likelihood ratio test.
- 16:23So this is at level 0.50,
- 16:26this is 0.10, and this is 0.05.
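Acceptance regions like these can be read off a profile likelihood. A minimal sketch, assuming you already have the profile log-likelihood (maximized over all other parameters) on a grid of (lambda, delta) values and that the usual chi-square approximation with 2 degrees of freedom applies; for 2 degrees of freedom the upper-alpha critical value has the closed form -2 log(alpha). The function names are hypothetical.

```python
import math
import numpy as np

def chi2_2df_critical(alpha):
    """Upper-alpha critical value of a chi-square with 2 df: solves 1 - exp(-q / 2) = 1 - alpha."""
    return -2.0 * math.log(alpha)

def lr_acceptance_region(profile_loglik_grid, alpha):
    """Grid points (lambda, delta) not rejected by the likelihood ratio test at level alpha.

    profile_loglik_grid[i, j] is assumed to hold the log-likelihood maximized over all other
    parameters at the (i, j)-th (lambda, delta) pair; its grid maximum stands in for the MLE.
    """
    ll = np.asarray(profile_loglik_grid, dtype=float)
    lr_stat = 2.0 * (ll.max() - ll)
    return lr_stat <= chi2_2df_critical(alpha)

for level in (0.50, 0.10, 0.05):
    print(level, round(chi2_2df_critical(level), 3))
# 0.5 1.386
# 0.1 4.605
# 0.05 5.991
```

If the model were constructed so that (lambda, delta) were not identified, the profile log-likelihood would be flat and every grid point would be accepted at every level.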
- 16:30There is a symmetry around the origin;
- 16:34that's because the distribution of U is symmetric.
- 16:37So (lambda, delta) is the same
- 16:41as (-lambda, -delta).
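A quick numerical check of this symmetry, in a stripped-down version of the model with no covariates and unit outcome variance (assumptions made to keep the sketch short). With U in {0, 1}, replacing U by 1 - U flips the signs of lambda and delta and shifts the two intercepts, and the observed-data likelihood is unchanged.

```python
import numpy as np

def loglik(a, y, a0, lam, b0, beta, delta):
    """Observed-data log-likelihood, no covariates, latent U ~ Bernoulli(1/2), unit outcome variance."""
    per_u = []
    for u in (0.0, 1.0):
        z = a0 + lam * u                                  # logit P(A = 1 | U = u)
        log_pa = np.where(a == 1, -np.logaddexp(0.0, -z), -np.logaddexp(0.0, z))
        resid = y - (b0 + beta * a + delta * u)
        per_u.append(np.log(0.5) + log_pa - 0.5 * np.log(2 * np.pi) - 0.5 * resid**2)
    return float(np.sum(np.logaddexp(per_u[0], per_u[1])))

rng = np.random.default_rng(2)
a = rng.binomial(1, 0.4, size=200)
y = rng.normal(size=200) - 0.5 * a

a0, lam, b0, beta, delta = -0.3, 1.2, 0.1, -0.5, 0.8
ll_plus = loglik(a, y, a0, lam, b0, beta, delta)
# Reflect U -> 1 - U: flip the signs of (lam, delta) and absorb the shift into the intercepts
ll_minus = loglik(a, y, a0 + lam, -lam, b0 + delta, beta, -delta)
print(np.isclose(ll_plus, ll_minus))  # True
```

So the data can at best pin down (lambda, delta) up to this reflection, which is why the estimated regions come in symmetric pairs.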
- 16:43But what you see
- 16:44is that you can actually estimate lambda and delta
- 16:47and you can sort of estimate it
- 16:50to be in a certain region.
- 16:53So, something a bit interesting here
- 16:56is that there's more you can say about delta,
- 17:01which is the parameter for the outcome,
- 17:04than the parameter for the treatment lambda.
- 17:09But in any case,
- 17:11it doesn't look like we can just vary
- 17:13the parameters lambda and delta freely in this space
- 17:16and then expect to get different results
- 17:19for each point.
- 17:23What we actually get is an estimate
- 17:25of these sensitivity parameters.
- 17:28So the lesson here is that
- 17:30if you use parametric sensitivity models,
- 17:32then they need to be carefully constructed
- 17:35to avoid these kind of issues.
- 17:40So next I'll talk about the first component
- 17:43of the sensitivity analysis,
- 17:44which is your sensitivity model.
- 17:48So very generally,
- 17:51if you think about what a sensitivity model is,
- 17:54it's essentially a model for the full data F,
- 18:00which includes some things that are not observed.
- 18:03So, what we are trying to do here
- 18:05is to infer the full data distribution
- 18:08from some observed data, O.
- 18:11So a sensitivity model is basically
- 18:14a family of distributions of the full data,
- 18:18parameterized by two parameters, theta and eta.
- 18:23So, I'm using eta to stand for the sensitivity parameters
- 18:27and theta is some other parameters
- 18:29that parameterize the distribution.
- 18:33So the sensitivity model needs to satisfy two properties.
- 18:38So first of all,
- 18:40if we set the sensitivity parameter eta to be equal to zero,
- 18:44then that should correspond to our primary analysis
- 18:48assuming no unmeasured confounders.
- 18:49So I call this augmentation.
- 18:51A second property is that, given the value
- 18:56of the sensitivity parameter eta,
- 18:59we can identify the parameter theta
- 19:03from the observed data.
- 19:06So this is sort of a minimal assumption.
- 19:08Otherwise, this model is simply too rich,
- 19:12and so I call this model identifiability.
- 19:15So the statistical problem in sensitivity analysis
- 19:18is: if I give you the value of eta
- 19:20or the range of eta,
- 19:23can you use the observed data to make inference
- 19:26about some causal parameter that is a function
- 19:29of theta and eta?
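In symbols, the two properties and the inferential target can be written as follows (the notation is our own shorthand for the verbal description, not necessarily what appears on the slides).

```latex
% A sensitivity model: a family of full-data distributions indexed by (theta, eta)
\mathcal{M} = \{ F_{\theta,\eta} \}, \qquad O \text{ is a known function of the full data } F

% (1) Augmentation: eta = 0 gives the primary analysis (no unmeasured confounding)
\{ F_{\theta,0} \} = \text{model used in the primary analysis}

% (2) Model identifiability: for fixed eta, theta is identified from the law of O
P_{\theta,\eta}(O \in \cdot) = P_{\theta',\eta}(O \in \cdot) \;\Longrightarrow\; \theta = \theta'

% Inferential target: a causal parameter such as the average treatment effect
\beta = \beta(\theta, \eta)
```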
- 19:32Okay, so this is a very general abstraction
- 19:37of what we have seen in the previous example.
- 19:43But it's a bit too general.
- 19:45So let's make it slightly more concrete
- 19:49by understanding these observational equivalence classes.
- 19:55So essentially, what we're trying to do
- 19:58is we observe some data,
- 19:59but we know there are underlying full data
- 20:02that are only partially observed.
- 20:05And instead of just modeling the observed data,
- 20:08we're modeling the full data.
- 20:10So that makes our model quite rich,
- 20:14because we're modeling things that are not observed.
- 20:18For that purpose, it is useful to define this
- 20:21observational equivalence relation
- 20:24between two full data distributions,
- 20:27which just means that their implied
- 20:30observed data distributions are exactly the same.
- 20:34So we write this with
- 20:39the equivalence symbol, ≃.
- 20:43So then we can define the equivalence class
- 20:45of a full data distribution,
- 20:48which are all the other full data distributions
- 20:51in this family that are observationally equivalent
- 20:55to that distribution.
- 20:58Then we can sort of classify these sensitivity models
- 21:02based on the behavior of these equivalence classes.
- 21:07So, what happened in the last example
- 21:10is that the full data model
- 21:15is not rich enough.
- 21:16So these equivalence classes are just singletons,
- 21:20so we can actually identify the sensitivity parameter eta
- 21:24from the observed data.
- 21:26So, this makes the model testable in some sense,
- 21:31with the choice of sensitivity parameter testable,
- 21:35and this should generally be avoided in practice.
- 21:39Then there are the global sensitivity models
- 21:43where you can basically freely vary
- 21:46the sensitivity parameter eta.
- 21:48And for any eta, you can always find a theta
- 21:51such that it is observationally equivalent
- 21:54to where you started from.
- 21:57And then even nicer are the separable models,
- 22:01where basically this eta,
- 22:04the sensitivity parameter, doesn't change
- 22:07the observed data distribution.
- 22:12So for any theta and eta,
- 22:14(theta, eta) is equivalent to (theta, 0).
- 22:18So these are really nice models to work with.
- 22:22So, to understand the difference between global models
- 22:26and separable models:
- 22:28basically, it's just that they have different shapes
- 22:34of the equivalence classes.
- 22:37So for separable models,
- 22:40these equivalence classes
- 22:42need to be perpendicular to the theta axis.
- 22:46But that's not needed for global sensitivity models.
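The classification above can be summarized in the same kind of shorthand (our own notation, written from the verbal definitions).

```latex
% Observational equivalence: the same implied distribution of the observed data O
F_{\theta,\eta} \simeq F_{\theta',\eta'} \iff P_{\theta,\eta}(O \in \cdot) = P_{\theta',\eta'}(O \in \cdot)

% Equivalence class of (theta, eta)
[\theta,\eta] = \{ (\theta',\eta') : F_{\theta',\eta'} \simeq F_{\theta,\eta} \}

% Testable model (to be avoided): every class is a singleton, so eta is identified
[\theta,\eta] = \{ (\theta,\eta) \} \quad \text{for all } (\theta,\eta)

% Global model: every value of eta is reachable within each class
\forall (\theta,\eta),\ \forall \eta' \ \exists \theta' : \ F_{\theta',\eta'} \simeq F_{\theta,\eta}

% Separable model: eta leaves the observed-data distribution unchanged
F_{\theta,\eta} \simeq F_{\theta,0} \quad \text{for all } (\theta,\eta)
```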
- 22:53So I've talked about what a sensitivity model means
- 22:57and some basic properties of it,
- 23:00but I haven't talked about how to build them.
- 23:02So generally, in this setup,
- 23:05there are three ways to build a sensitivity model,
- 23:08and they essentially correspond
- 23:09to different factorizations
- 23:11of the full data distribution.
- 23:13So there's a simultaneous model
- 23:15that factorizes the distribution this way.
- 23:19It introduces an unmeasured confounder, U,
- 23:22and then you need to model
- 23:24these three conditional probabilities.
- 23:27There's also the treatment model
- 23:31that doesn't rely on this unmeasured confounder U.
- 23:35But what you need to specify is the distribution
- 23:39of the treatment given the potential outcomes (which play the role of unmeasured confounders) and X.
- 23:44And once you've specified that, you can use Bayes' formula
- 23:46to get this part.
- 23:50And then there's the outcome model that factorizes
- 23:54this distribution the other way.
- 23:57So this term is basically the propensity score,
- 24:00and the third term is what we need to specify
- 24:03as the sensitivity parameter.
- 24:06So in the missing data literature,
- 24:09the second kind of model
- 24:11is usually called a selection model,
- 24:13and the third kind of model is usually called
- 24:16a pattern mixture model,
- 24:17and there are other names that have been given to them.
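As a concrete instance of the outcome (pattern-mixture) factorization, here is a minimal sketch of one simple sensitivity model, chosen for illustration and not taken from the talk: no covariates, and the unidentified factor is specified only through its mean, via two mean shifts eta1 and eta0. The observed quantities p, mu1 and mu0 never involve eta, so the model is separable in the sense described earlier, and the ATE works out to mu1 - mu0 - (1 - p) * eta1 - p * eta0. The function name is hypothetical.

```python
import numpy as np

def ate_pattern_mixture(a, y, eta1, eta0):
    """ATE under a mean-shift pattern-mixture sensitivity model (no covariates).

    Assumed sensitivity model (illustrative):
        E[Y(1) | A=0] = E[Y(1) | A=1] - eta1
        E[Y(0) | A=1] = E[Y(0) | A=0] + eta0
    eta1 = eta0 = 0 recovers the primary analysis (no unmeasured confounding).
    """
    a = np.asarray(a)
    y = np.asarray(y, dtype=float)
    p = a.mean()                 # P(A = 1): identified from the observed data
    mu1 = y[a == 1].mean()       # E[Y | A = 1] = E[Y(1) | A = 1]: identified
    mu0 = y[a == 0].mean()       # E[Y | A = 0] = E[Y(0) | A = 0]: identified
    ey1 = p * mu1 + (1 - p) * (mu1 - eta1)   # E[Y(1)], mixing over the two treatment groups
    ey0 = (1 - p) * mu0 + p * (mu0 + eta0)   # E[Y(0)]
    return ey1 - ey0

a_demo = [1, 1, 0, 0, 0]
y_demo = [3.0, 5.0, 1.0, 2.0, 3.0]
print(ate_pattern_mixture(a_demo, y_demo, 0.0, 0.0))  # 2.0 (primary analysis)
print(ate_pattern_mixture(a_demo, y_demo, 0.5, 0.5))  # 1.5
```

The third factor of the factorization (the distribution of Y(a) among units with A = 1 - a) enters only through eta, which is exactly what cannot be learned from O.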
- 24:23And basically, different sensitivity models
- 24:26amount to different ways of specifying these
- 24:31non-identifiable distributions,
- 24:33which are the ones that are underlined.
- 24:37A good review is this report by a committee
- 24:42organized by the National Research Council.
- 24:46This ongoing review paper that we're writing
- 24:50also gives a comprehensive review of many models
- 24:54that have been proposed using these factorizations.
- 25:00Okay, so that's about the sensitivity model.
- 25:03The next component is statistical inference.
- 25:11Things get a little bit tricky here,
- 25:14because there are two kinds of inference
- 25:17or two modes of inference we can talk about
- 25:19in this study.
- 25:21So, the first mode of inference is point-identified inference.
- 25:24You only care about a fixed value
- 25:27of the sensitivity parameter eta.
- 25:32And the second kind of inference
- 25:34is partially identified inference,
- 25:36where you perform the statistical inference simultaneously
- 25:40for a range of sensitivity parameters eta.
- 25:44And that range H is given to you.
- 25:50And these different modes of inference
- 25:54come with different statistical guarantees.
- 25:57So for point-identified inference, let's say
- 26:03for interval estimators,
- 26:04you want to construct confidence intervals.
- 26:08And these confidence intervals depend on the observed data
- 26:12and the sensitivity parameter, which
- 26:15you're allowed to use
- 26:17in point-identified inference,
- 26:20and they must cover the true parameter
- 26:23with probability at least one minus alpha
- 26:25for all the distributions in your model.
- 26:28Okay, that's the infimum.
- 26:30But for partially identified inference,
- 26:35you're only allowed to use an interval
- 26:38that depends on the range H.
- 26:41So it cannot depend on a specific value
- 26:43of the sensitivity parameter,
- 26:46because you only know eta is in this range H.
- 26:50It needs to satisfy a very similar criterion.
- 26:56I call intervals that satisfy this criterion
- 26:59sensitivity intervals.
- 27:01In the literature, people have also called them
- 27:03uncertainty intervals or just confidence intervals.
- 27:08But to distinguish them from the first case,
- 27:11we call them sensitivity intervals here.
- 27:15So you can see that these two
- 27:19criteria look very similar,
- 27:22except that this interval needs to depend on the range
- 27:25instead of a particular value of the sensitivity parameter.
- 27:29But actually, they're quite different.
- 27:31The sensitivity interval is usually much wider.
- 27:34The reason is,
- 27:36you can write an equivalent form
- 27:37of this equation (1):
- 27:40because the interval only depends on the observed data
- 27:45and the range H,
- 27:46then for every theta in that,
- 27:49sorry, for every eta in that range H
- 27:52(that's missing here: eta in H),
- 27:56such that (theta, eta) is observationally equivalent to the true distribution,
- 28:00this interval also needs to cover
- 28:02the corresponding causal parameter.
- 28:07So in that sense,
- 28:08this is a much stronger guarantee.
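The two criteria, and the equivalent form of the second one, can be written as follows (notation reconstructed from the verbal description; C_L and C_U denote the interval endpoints).

```latex
% Point-identified inference: eta is fixed and known, and the interval may use it
\inf_{\theta,\eta} \; P_{\theta,\eta}\big( \beta(\theta,\eta) \in [C_L(O;\eta),\, C_U(O;\eta)] \big) \ge 1-\alpha

% Partially identified inference: only eta \in H is known (sensitivity interval), equation (1)
\inf_{\theta,\ \eta \in H} \; P_{\theta,\eta}\big( \beta(\theta,\eta) \in [C_L(O;H),\, C_U(O;H)] \big) \ge 1-\alpha

% Equivalent form of (1): the interval depends on the data only through O, so under each truth
% (theta_0, eta_0) it must cover beta(theta, eta) for every observationally equivalent (theta, eta) with eta in H
\inf_{\theta_0,\ \eta_0 \in H} \;\; \inf_{\substack{\eta \in H \\ F_{\theta,\eta} \simeq F_{\theta_0,\eta_0}}} \; P_{\theta_0,\eta_0}\big( \beta(\theta,\eta) \in [C_L(O;H),\, C_U(O;H)] \big) \ge 1-\alpha
```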
- 28:16So, in terms of statistical methods,
- 28:21point-identified inference is usually quite straightforward.
- 28:26It's very similar to our primary analysis.
- 28:29The primary analysis just assumes eta equals zero,
- 28:32and this sensitivity analysis assumes eta is known.
- 28:36So usually you can just plug in
- 28:38this eta in some way, as an offset in your model.
- 28:42And then everything works out in almost the same way
- 28:45as in the primary analysis.
- 28:48But for partially identified inference,
- 28:50things become much more challenging.
- 28:55And there are several approaches
- 28:58that you can take.
- 29:00So, essentially there are two big classes of methods:
- 29:05one is bound estimation,
- 29:08and the other is combining point-identified inference.
- 29:11So, bound estimation
- 29:14tries to directly make inference about the two ends
- 29:18of the partially identified region.
- 29:21This is the set of values of the parameter beta
- 29:26that are indistinguishable
- 29:29if I only know that the sensitivity parameter eta is in H.
- 29:35If we can somehow directly estimate the infimum and supremum
- 29:40of this set,
- 29:44then that gives us a way
- 29:46to make partially identified inference.
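In the same shorthand, the partially identified region and its two ends can be written as follows (notation ours).

```latex
% Partially identified region for beta when the truth is (theta_0, eta_0) and only eta \in H is assumed
\mathcal{B}(H) = \{ \beta(\theta,\eta) : \eta \in H,\ F_{\theta,\eta} \simeq F_{\theta_0,\eta_0} \}

% Bound estimation targets the two ends of this set
\beta_L(H) = \inf \mathcal{B}(H), \qquad \beta_U(H) = \sup \mathcal{B}(H)
```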
- 29:50The second method is basically
- 29:53to combine the results of point-identified inference.
- 29:59The main idea is to construct,
- 30:02let's say, interval estimators
- 30:05for each individual sensitivity parameter
- 30:08and then take a union of them.
- 30:11So, these are the two broad approaches
- 30:14to partially identified inference.
- 30:18And so, within the first approach
- 30:20the bound estimation approach,
- 30:22there are several
- 30:25possible methods,
- 30:27depending on your problem.
- 30:29So, the first
- 30:31method is called separable bounds.
- 30:35But before that, let's slightly change our notation
- 30:39and parameterize this range H by a hyperparameter gamma.
- 30:47This is useful when we outline these methods.
- 30:52And then this beta L of gamma
- 30:55is the lower end of the partially identified region.
- 31:01So the first method is called separable bounds.
- 31:06What it tries to do is to write this lower end
- 31:11as a function of beta star and gamma,
- 31:15where beta star is the causal effect targeted by your primary analysis.
- 31:21So let's say (theta star, 0)
- 31:24is what you would fit in a primary analysis,
- 31:27and it is observationally equivalent to the true distribution.
- 31:32And then, if beta star is the corresponding causal effect
- 31:37from that model,
- 31:39and if we can somehow write this lower end
- 31:42as a function of beta star and gamma
- 31:46and the function is known,
- 31:47then our life is quite easy,
- 31:50because we already know how to make inference
- 31:53about beta star from the primary analysis.
- 31:55And all we need to do is just plug in
- 31:57that beta star in this formula,
- 31:59and then we're all done.
- 32:02And we call this separable because it allows us
- 32:06to separate the primary analysis
- 32:09from the sensitivity analysis.
- 32:11And statistical inference becomes a trivial extension
- 32:15of the primary analysis.
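A minimal sketch of how a separable bound turns a primary-analysis confidence interval into a sensitivity interval. It assumes, purely for illustration, a mean-shift model in which beta(eta) = beta_star - (1 - p) * eta1 - p * eta0 with |eta1|, |eta0| <= gamma, so beta_L(gamma) = beta_star - gamma and beta_U(gamma) = beta_star + gamma whatever p is. If the primary interval covers beta_star with probability 1 - alpha, the shifted interval covers every beta(eta) in that range with at least that probability. The helper names and the example interval are hypothetical.

```python
def separable_sensitivity_interval(primary_ci, gamma):
    """Shift a primary-analysis CI outward by gamma (valid under the assumed separable bound)."""
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    lower, upper = primary_ci
    return lower - gamma, upper + gamma

def gamma_to_include_null(primary_ci, null=0.0):
    """Smallest gamma at which the shifted interval first contains the null value."""
    lower, upper = primary_ci
    if lower <= null <= upper:
        return 0.0
    return lower - null if lower > null else null - upper

# Hypothetical primary-analysis 95% CI for beta_star
ci = (-1.2, -0.4)
print(separable_sensitivity_interval(ci, 0.3))   # about (-1.5, -0.1)
print(gamma_to_include_null(ci))                  # 0.4
```

No new estimation is needed: all the statistical work was already done in the primary analysis, which is what makes this kind of bound separable.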
- 32:17So, some examples of this kind of method
- 32:20include the classical Cornfield bound
- 32:26and the E-value,
- 32:27if you have heard about them,
- 32:29and the E-value seems quite popular
- 32:31these days in epidemiology.
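For reference, the E-value for an observed risk ratio RR > 1 is RR + sqrt(RR * (RR - 1)), with protective ratios inverted first (VanderWeele and Ding, 2017). It depends only on the primary-analysis estimate, which is what makes it a separable bound. A minimal sketch; for a confidence interval, the same formula is applied to the limit closest to the null.

```python
import math

def e_value(rr):
    """E-value for an observed risk ratio rr (VanderWeele and Ding, 2017)."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    rr = max(rr, 1.0 / rr)               # protective effects: invert the ratio first
    return rr + math.sqrt(rr * (rr - 1.0))

def e_value_ci(lower, upper):
    """E-value for the confidence limit closest to the null; 1 if the interval contains 1."""
    if lower <= 1.0 <= upper:
        return 1.0
    return e_value(lower if lower > 1.0 else upper)

print(round(e_value(2.0), 3))     # 3.414
print(round(e_value(0.5), 3))     # 3.414
```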
- 32:37The second type of bound estimation
- 32:41is called tractable bounds.
- 32:45So, in these cases,
- 32:48we may derive this lower bound as a function
- 32:52of theta star and gamma.
- 32:54So we are not able to reduce it to depend only
- 32:58on beta star, the causal effect
- 33:00under no unmeasured confounding,
- 33:04but we're able to express it in terms of theta star.
- 33:07And the function g_L is also some tractable function
- 33:11that we can compute.
- 33:13And then this also makes our lives quite a lot easier,
- 33:17because we can just replace this theta star,
- 33:21which can be nonparametric or parametric,
- 33:25by its empirical estimate.
- 33:28And, often in these cases,
- 33:31we can find a central limit theorem
- 33:35for the corresponding sample estimator,
- 33:38such that the sample estimator of the bound
- 33:42converges to its true value at the root-n rate
- 33:46with a normal limit.
- 33:51And then if we can estimate the standard error,
- 33:55we can use this central limit theorem
- 33:58to make partially identified inference,
- 34:02because we can estimate the bounds.
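A minimal sketch of a tractable bound with CLT-based inference, using an illustrative model that is not from the talk: only the treated-arm counterfactual mean is shifted, E[Y(1) | A = 0] = E[Y(1) | A = 1] - eta1 with |eta1| <= gamma. Then beta_L = mu1 - mu0 - (1 - p) * gamma depends on p as well as on beta_star, so it is a function of theta_star = (mu1, mu0, p) rather than of beta_star alone. Standard errors come from influence functions, and the resulting interval is only pointwise valid, in the sense discussed next. The function name is hypothetical.

```python
import numpy as np
from statistics import NormalDist

def tractable_bounds_interval(a, y, gamma, alpha=0.05):
    """CLT-based sensitivity interval for [beta_L, beta_U] = mu1 - mu0 -/+ (1 - p) * gamma.

    Illustrative model (assumption): E[Y(1) | A=0] = E[Y(1) | A=1] - eta1 with |eta1| <= gamma,
    and no shift for Y(0). The bounds depend on theta_star = (mu1, mu0, p), not only on mu1 - mu0.
    """
    a = np.asarray(a, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    p = a.mean()
    mu1 = y[a == 1].mean()
    mu0 = y[a == 0].mean()
    # Influence function of the difference in means mu1 - mu0
    if_diff = a * (y - mu1) / p - (1 - a) * (y - mu0) / (1 - p)
    # Each end uses the two-sided critical value, which is conservative
    z = NormalDist().inv_cdf(1 - alpha / 2)
    ends = []
    for sign in (-1.0, 1.0):              # -1: lower bound, +1: upper bound
        bound = mu1 - mu0 + sign * (1 - p) * gamma
        # Chain rule: d/dp [sign * (1 - p) * gamma] = -sign * gamma; influence function of p-hat is A - p
        if_bound = if_diff - sign * gamma * (a - p)
        se = if_bound.std(ddof=1) / np.sqrt(n)
        ends.append(bound + sign * z * se)  # widen outward at each end
    return ends[0], ends[1]

rng = np.random.default_rng(3)
a_obs = rng.binomial(1, 0.4, size=400)
y_obs = 1.0 + 0.5 * a_obs + rng.normal(size=400)
print(tractable_bounds_interval(a_obs, y_obs, gamma=0.0))  # ordinary Wald CI for mu1 - mu0
print(tractable_bounds_interval(a_obs, y_obs, gamma=0.5))  # wider sensitivity interval
```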
- 34:07There are some examples in the literature,
- 34:09if you're familiar with these papers.
- 34:12But one thing to be careful about
- 34:14with these kinds of tractable bounds
- 34:16is that things get a little bit tricky
- 34:21with asymptotic theory.
- 34:24This is because in asymptotic theory,
- 34:27the confidence intervals, or the sensitivity intervals
- 34:30in this case,
- 34:32can be pointwise or uniform over the model as the sample size grows.
- 34:38So it's possible that if the convergence,
- 34:45if the statistical guarantee, is only pointwise,
- 34:49then sometimes, in extreme cases,
- 34:56even with a very large sample size,
- 34:58there still exist data distributions
- 35:01such that your coverage is very poor.
- 35:05This point is discussed very heavily
- 35:08in the econometrics literature.
- 35:10And these are some references.
- 35:15So that's the second type of method
- 35:18in the first broad approach.
- 35:22The third kind of method
- 35:25is called stochastic programming.
- 35:28And this applies when the model is separable,
- 35:34and we can write the parameter we're interested in
- 35:40as an expectation of some function
- 35:43of theta and the sensitivity parameter eta.
- 35:48Okay, so in this case,
- 35:51the bound becomes the optimal value
- 35:54of an optimization problem,
- 35:56in which you want to minimize the expectation of some function,
- 36:01and the parameter in this function is in some set
- 36:05defined by H.
- 36:11This type of problem is known as stochastic programming
- 36:14in the optimization literature.
- 36:17And what people do there
- 36:19is they sample from the distribution,
- 36:22and then they solve the empirical version
- 36:26and use that as an approximate solution
- 36:29to this population optimization problem,
- 36:33which we can't evaluate directly.
- 36:36And the method is called sample average approximation
- 36:39in the optimization literature.
- 36:42So, what is shown there,
- 36:47and Alex Shapiro did a lot of great work on this,
- 36:51is that for nice problems with a compact set H,
- 36:57where everything is Euclidean,
- 36:59so it's finite-dimensional,
- 37:01you actually have a central limit theorem
- 37:04for the sample optimal value.
- 37:07And the link between sensitivity analysis
- 37:12and stochastic programming is made in this paper
- 37:16by Tudball et al.
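A minimal sketch of sample average approximation on a toy problem. The objective f(O, eta) = (O - eta)^2 and the interval for eta are assumptions chosen so the answer can be checked by hand; this is not a sensitivity model. The SAA value replaces the expectation by a sample mean, and the standard error uses the CLT for the optimal value, which assumes a unique population minimizer. The function name is hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def saa_optimal_value(sample, f, bounds):
    """Sample average approximation of v* = min over eta in [lo, hi] of E[f(O, eta)].

    Returns (v_hat, eta_hat, se), where se = sd of f(O_i, eta_hat) / sqrt(n). This standard error
    relies on the CLT for the optimal value, which assumes the population minimizer is unique.
    """
    sample = np.asarray(sample, dtype=float)
    res = minimize_scalar(lambda eta: np.mean(f(sample, eta)), bounds=bounds, method="bounded")
    values = f(sample, res.x)
    se = values.std(ddof=1) / np.sqrt(len(sample))
    return float(res.fun), float(res.x), float(se)

# Toy problem: f(O, eta) = (O - eta)^2 with eta restricted to [-0.5, 0.5]
rng = np.random.default_rng(4)
obs = rng.normal(loc=1.0, size=1000)
v_hat, eta_hat, se = saa_optimal_value(obs, lambda o, eta: (o - eta) ** 2, bounds=(-0.5, 0.5))
# Closed form for comparison: the empirical minimizer is the sample mean clipped to the interval
eta_exact = float(np.clip(obs.mean(), -0.5, 0.5))
print(round(eta_hat, 3), round(eta_exact, 3))
```

In a sensitivity analysis, eta would be the sensitivity parameter, the constraint set would be H, and the optimal value would be one end of the partially identified region.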
- 37:20Okay, so that's the first broad approach
- 37:23of doing bound estimation.
- 37:26The second broad approach is to combine the results
- 37:29of point-identified inference.
- 37:32So, the first possibility is to take a union
- 37:37of the individual confidence intervals.
- 37:40Suppose these are the confidence intervals
- 37:43when the sensitivity parameter eta is given.
- 37:47Then, it is very simple to just apply a union bound
- 37:51and to show that if you take a union
- 37:54of these individual confidence intervals,
- 37:57then it should satisfy the criterion
- 38:01for a sensitivity interval.
- 38:03So now, if you take a union, this interval only depends
- 38:07on the range H,
- 38:08and then you just apply the union bound
- 38:12and get this formula from the first one.
- 38:17And this can be slightly improved
- 38:20to cover not just these parameters,
- 38:23but also the entire partially identified region,
- 38:27if the confidence intervals
- 38:30have the same tail probabilities.
- 38:35We discussed this in our paper.
- 38:39And here, all we need to do
- 38:43is to compute this union,
- 38:46which is essentially an optimization problem:
- 38:49we'd like to minimize the lower bound,
- 38:52that is, the lower confidence limit C_L(eta), over eta in H,
- 38:59and similarly for the upper bound.
- 39:02And usually, using asymptotic theory,
- 39:05we can get some normal-based confidence
- 39:09intervals for each fixed eta.
- 39:12And then we just need to optimize
- 39:14this confidence interval over eta.
- 39:20But for many problems this can be
- 39:22computationally challenging because the standard errors
- 39:26are usually quite complicated
- 39:30and they have some very nonlinear dependence
- 39:32on the parameter eta.
- 39:34So optimizing this can be tricky.
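The sketch below computes the union interval by brute-force grid search over H rather than a continuous optimizer. The sensitivity model is a hypothetical placeholder, not one from the talk: the point estimate is beta_hat(eta) = 1 - eta, and the standard error se(eta) = 0.1 * (1 + eta**2) depends nonlinearly on eta, mimicking the difficulty just described.

```python
from statistics import NormalDist

def union_sensitivity_interval(beta_hat, se, etas, alpha=0.05):
    """Union over a grid of eta of normal-based (1 - alpha) intervals.

    beta_hat(eta) and se(eta) return the point estimate and standard
    error for a fixed sensitivity parameter eta.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lower = min(beta_hat(eta) - z * se(eta) for eta in etas)  # inf of C_L(eta)
    upper = max(beta_hat(eta) + z * se(eta) for eta in etas)  # sup of C_U(eta)
    return lower, upper

# Hypothetical toy model; H = [-0.5, 0.5] approximated by a grid.
etas = [-0.5 + i / 100 for i in range(101)]
lower, upper = union_sensitivity_interval(
    beta_hat=lambda eta: 1.0 - eta,
    se=lambda eta: 0.1 * (1 + eta**2),
    etas=etas,
)
print(lower, upper)
```

A grid is only practical for a low-dimensional H; with a richer sensitivity parameter one would need a numerical optimizer, which is where the nonlinear standard error becomes a real obstacle.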
- 39:40This is where another method, the percentile bootstrap method,
- 39:44can greatly simplify the problem.
- 39:47It was proposed in this paper that we wrote,
- 39:53and what it does is instead of using
- 39:56the asymptotic confidence interval for fixed eta,
- 40:01we use the percentile bootstrap interval.
- 40:04where we take data resamples,
- 40:06and then you estimate the causal effect beta
- 40:11in each resample and then take quantiles.
- 40:15Okay, so if you use this confidence interval,
- 40:19then there is a
- 40:25generalized minimax inequality that allows us to construct
- 40:29this percentile bootstrap sensitivity interval.
- 40:33So what it does is: this thing on the inside
- 40:37is just the union of these percentile bootstrap
- 40:41intervals for fixed eta,
- 40:45taken over eta in H.
- 40:49And then this generalized minimax inequality
- 40:51allows us to interchange the infimum with the quantile
- 40:57and the supremum with the quantile.
- 41:00Okay, so the infimum of a quantile
- 41:01is greater than or equal to the quantile of the infimum,
- 41:05and that's always true.
- 41:07So it's just a generalization
- 41:09of the familiar minimax inequality.
- 41:13Now, if you look at this outer interval,
- 41:16this is much easier to compute,
- 41:19because all it needs to do
- 41:20is take a data resample,
- 41:25then you just need to repeat method 1.3.
- 41:29So just get the infimum of the point estimate
- 41:34for that resample and the supremum for that resample.
- 41:37Then you do this over many, many resamples
- 41:41and then you take the quantiles:
- 41:44the lower quantile of the infimum and the upper quantile of the supremum,
- 41:48and then you're done.
- 41:50And this union sensitivity interval
- 41:53is always valid
- 41:55if the individual confidence intervals are valid.
- 41:58So you get a free lunch
- 42:02in some sense,
- 42:03you don't need to show any heavy theory.
- 42:06All you need to show is that
- 42:08these percentile bootstrap intervals are valid
- 42:11for each fixed eta,
- 42:13which are much easier to establish in real problems.
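A minimal sketch of the percentile bootstrap sensitivity interval, assuming a hypothetical sensitivity model in which the causal effect equals the observed difference in means minus eta, with eta in an interval H approximated by a grid. For each resample it computes the infimum and supremum of the point estimate over H, then takes the lower alpha/2 quantile of the infima and the upper 1 - alpha/2 quantile of the suprema. The model, the simulated data, and the function names are illustrative, not the estimators from the paper.

```python
import numpy as np

def point_estimates(y1, y0, etas):
    # Hypothetical sensitivity model: effect(eta) = mean(y1) - mean(y0) - eta.
    return (y1.mean() - y0.mean()) - etas

def percentile_bootstrap_si(y1, y0, etas, alpha=0.05, n_boot=2000, seed=0):
    """Percentile bootstrap sensitivity interval for the toy model above."""
    rng = np.random.default_rng(seed)
    infima = np.empty(n_boot)
    suprema = np.empty(n_boot)
    for b in range(n_boot):
        # Resample each treatment arm with replacement.
        b1 = rng.choice(y1, size=y1.size, replace=True)
        b0 = rng.choice(y0, size=y0.size, replace=True)
        est = point_estimates(b1, b0, etas)
        infima[b] = est.min()   # infimum over H for this resample
        suprema[b] = est.max()  # supremum over H for this resample
    return np.quantile(infima, alpha / 2), np.quantile(suprema, 1 - alpha / 2)

rng = np.random.default_rng(1)
y1 = rng.normal(1.0, 1.0, size=500)  # simulated treated outcomes
y0 = rng.normal(0.0, 1.0, size=500)  # simulated control outcomes
etas = np.linspace(-0.2, 0.2, 41)    # grid over H = [-0.2, 0.2]
lo, hi = percentile_bootstrap_si(y1, y0, etas)
print(lo, hi)
```

Each resample only needs the inner optimization over H (trivial in this toy model), so no standard error has to be derived or optimized as a function of eta.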
- 42:23And, somewhat selfishly,
- 42:25I'd like to compare this idea
- 42:27with Efron's bootstrap,
- 42:29where the idea there
- 42:31is that you've got a point estimator,
- 42:33you resample your data
- 42:35many times, and then use the bootstrap
- 42:38to get the confidence interval.
- 42:41For partially identified inference,
- 42:44you need to do a bit more.
- 42:46So for each resample you need
- 42:48to get the extrema, the optimal estimators.
- 42:52Then the minimax inequality allows you to
- 42:55transfer the intuition of the bootstrap
- 43:00from point identification
- 43:02to partial identification.
- 43:08So the third method in this general approach
- 43:14is to take the supremum of the P values.
- 43:15And this is used in Rosenbaum's sensitivity analysis,
- 43:18if you're familiar with that.
- 43:22Essentially it's a hypothesis testing analog
- 43:24of the union confidence interval method.
- 43:29What it does is that
- 43:30if you have individually valid P values for a fixed eta,
- 43:35then you just take the supremum of the P values
- 43:38over all the etas in this range.
- 43:41And that can be used for partially identified inference.
- 43:46So what Rosenbaum did,
- 43:49and Rosenbaum is really a pioneer in this area
- 43:52in partially identified sensitivity analysis.
- 43:56So what he did was use randomization tests
- 43:59to construct these P values.
- 44:03So, this is usually done for matched observational studies
- 44:07and the insight of this line of work
- 44:12is that you can use these inequalities
- 44:16particularly Holley's inequality
- 44:19in probabilistic combinatorics
- 44:22to efficiently compute the supremum of the P values.
- 44:26So, usually what is done there is that
- 44:30the Holley's inequality gives you a way
- 44:32to upper bound a family of distributions
- 44:42in the stochastic dominance sense.
- 44:45So, that is used to get the supremum of the P values.
- 44:51And so, basically the idea is to use some theoretical tool
- 44:59to simplify the computation.
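For the simplest case, the sign test in matched pairs under Rosenbaum's model with parameter gamma, the probability under the null of no treatment effect that the treated unit has the higher outcome in a discordant pair can lie anywhere in [1/(1 + gamma), gamma/(1 + gamma)]. The sketch below takes the supremum of the one-sided p-value by brute force over a grid of a common probability p in that range. Restricting to a common p loses nothing here, because raising any single pair's probability can only increase the binomial-type upper tail (an elementary stochastic-dominance argument), so the supremum sits at the upper endpoint. The counts used are made up.

```python
from math import exp, lgamma, log

def binom_upper_tail(t, d, p):
    """P(Binomial(d, p) >= t), summed in log space to avoid overflow."""
    log_terms = [
        lgamma(d + 1) - lgamma(k + 1) - lgamma(d - k + 1)
        + k * log(p) + (d - k) * log(1 - p)
        for k in range(t, d + 1)
    ]
    m = max(log_terms)
    return min(1.0, exp(m) * sum(exp(v - m) for v in log_terms))

def sup_sign_test_pvalue(t, d, gamma, n_grid=101):
    """Supremum over a grid of allowed p of the one-sided sign-test p-value.

    t: discordant pairs in which the treated unit has the higher outcome.
    d: number of discordant pairs (0 <= t <= d).
    """
    lo, hi = 1 / (1 + gamma), gamma / (1 + gamma)
    grid = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    return max(binom_upper_tail(t, d, p) for p in grid)

# Made-up data: 60 of 80 discordant pairs favour treatment.
for gamma in (1.0, 2.0, 3.0):
    print(gamma, sup_sign_test_pvalue(60, 80, gamma))
```

At gamma = 1 this reduces to the usual sign test; as gamma grows, the worst-case p-value increases.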
- 45:05Okay, so that's the statistical inference.
- 45:08The third part, the third component
- 45:10is interpretation of sensitivity analysis.
- 45:13And this is the area where we really need
- 45:17a lot of good work at the moment.
- 45:20So, overall, there are two good ideas that seem to work,
- 45:26that seem to improve the interpretation
- 45:28of sensitivity analysis.
- 45:30The first is sensitivity value,
- 45:32the second is the calibration using measured confounders.
- 45:36So the sensitivity value is basically
- 45:38the value of the sensitivity parameter
- 45:41or the hyperparameter,
- 45:42where some qualitative conclusions about your study change.
- 45:48And in our motivating example,
- 45:51this is where the estimated average treatment effect
- 45:55is reduced by half. In Rosenbaum's sensitivity analysis,
- 45:59if you are familiar with that,
- 46:01this is the value of gamma
- 46:03in his model,
- 46:05where we can no longer reject the causal null hypothesis.
- 46:10So, this can be seen as kind of an extension
- 46:14of the idea of a P value.
- 46:17So P value is used for primary analysis,
- 46:22so assuming no unmeasured confounding,
- 46:22and then for sensitivity analysis,
- 46:24you can use the sensitivity value. Sorry,
- 46:30the P value basically measures
- 46:33how likely your results,
- 46:36your false rejection, are due to
- 46:39random chance.
- 46:44But what a sensitivity value does
- 46:46is measure how sensitive your results are,
- 46:51in some sense: how much deviation
- 46:53from no unmeasured confounding it takes
- 46:55to alter your conclusion.
- 46:58And for the sensitivity value,
- 47:01there often exists a phase transition phenomenon
- 47:04for partially identified inference.
- 47:07This is because if you take your hyperparameter gamma
- 47:11to be very large,
- 47:13then essentially your partially identified region
- 47:15already covers the null.
- 47:17So, no matter how large your sample size is,
- 47:20you can never reject the null.
- 47:23So, this is sort of an interesting phenomenon,
- 47:28first discovered by Rosenbaum,
- 47:32and this paper I wrote also clarified
- 47:38some issues about the phase transition.
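Continuing with a hypothetical matched-pair sign test, the sensitivity value can be found by bisection as the smallest gamma at which the worst-case p-value reaches alpha, using the fact that the worst-case p-value P(Binomial(d, gamma/(1 + gamma)) >= t) increases with gamma. Holding the fraction of pairs favouring treatment fixed at q = 0.75 (made-up numbers) while the number of discordant pairs grows illustrates the phase transition: the sensitivity value rises with sample size but stays below q/(1 - q) = 3, because for larger gamma the worst-case success probability gamma/(1 + gamma) exceeds q, and then the null cannot be rejected however many pairs there are.

```python
from math import exp, lgamma, log

def worst_case_pvalue(t, d, gamma):
    """Upper bound on the one-sided sign-test p-value:
    P(Binomial(d, p) >= t) with p = gamma / (1 + gamma), summed in log space."""
    p = gamma / (1 + gamma)
    log_terms = [
        lgamma(d + 1) - lgamma(k + 1) - lgamma(d - k + 1)
        + k * log(p) + (d - k) * log(1 - p)
        for k in range(t, d + 1)
    ]
    m = max(log_terms)
    return min(1.0, exp(m) * sum(exp(v - m) for v in log_terms))

def sensitivity_value(t, d, alpha=0.05, gamma_max=100.0, tol=1e-6):
    """Smallest gamma (up to tol) at which the worst-case p-value reaches alpha.

    Returns about gamma_max if the result survives every gamma up to gamma_max.
    """
    if worst_case_pvalue(t, d, 1.0) >= alpha:
        return 1.0  # not significant even with no unmeasured confounding
    lo, hi = 1.0, gamma_max
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if worst_case_pvalue(t, d, mid) < alpha:
            lo = mid  # still rejected at mid
        else:
            hi = mid  # no longer rejected at mid
    return hi

# Same proportion q = 0.75 of pairs favouring treatment, growing number of pairs.
values = [sensitivity_value(3 * d // 4, d) for d in (200, 2000, 20000)]
print(values)
```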
- 47:44So, the second idea is the calibration
- 47:46using measured confounders.
- 47:49So, you have already seen an example
- 47:51in a motivating study.
- 47:54It's really a very necessary and practical solution
- 47:59to quantify the sensitivity,
- 48:01because it's not really very useful if you tell people,
- 48:05we are sensitive at gamma equals two,
- 48:08what does that really mean?
- 48:09That depends on some mathematical model.
- 48:13But if we can somehow compare that
- 48:15with what we do observe,
- 48:18and practitioners often have some good sense
- 48:23about which confounders are important and which are not,
- 48:27Then this really gives us a way to calibrate
- 48:31and strengthen the conclusions of a sensitivity analysis.
- 48:35But unfortunately, although there are some good heuristics
- 48:38about the calibration,
- 48:40they often suffer from some subtle issues,
- 48:44like the ones that I described
- 48:46in the beginning of the talk.
- 48:49If you carefully parameterize your models
- 48:51this can become easier.
- 48:54And this recent paper sort of explored this
- 48:56in terms of linear models.
- 49:01But really there's not a unifying framework
- 49:04that can cover more general cases,
- 49:08and lots of work is needed.
- 49:11And when I was writing the slides,
- 49:13I thought maybe what we really need
- 49:15is to somehow build this calibration
- 49:18into the sensitivity model.
- 49:20Because currently our workflow is that
- 49:22we assume a sensitivity model,
- 49:24and we see where things get changed,
- 49:26and then we try to interpret those values
- 49:29where things get changed.
- 49:31But suppose we somehow build that in:
- 49:34if we let the range H of eta be defined
- 49:38in terms of this calibration,
- 49:40so that perhaps gamma directly means some kind of comparison
- 49:45with measured confounders, this would solve
- 49:49a lot of the issues.
- 49:50This is just a thought I came up with
- 49:53when I was preparing for this talk.
- 49:56Okay, so to summarize,
- 49:59there are a number of messages,
- 50:01which I hope you can take home.
- 50:05There are three components of a sensitivity analysis.
- 50:08Model augmentations, statistical inference
- 50:11and the interpretation of sensitivity analysis.
- 50:14So a sensitivity model is about parameterizing
- 50:17the full data distribution.
- 50:19And that's basically about overparameterizing
- 50:23the observed data distribution.
- 50:25And you can understand these models
- 50:26by the observational equivalence classes.
- 50:30You can get different model augmentations
- 50:33by factorizing the distribution differently
- 50:35and specify different models
- 50:38for the parts that are unidentifiable.
- 50:41And there's a difference between point identified inference
- 50:45and partially identified inference,
- 50:47and partially identified inference is usually much harder.
- 50:52And there are two general approaches
- 50:55for partially identified inference,
- 50:57bound estimation and combining point identified inference.
- 51:02For interpretation of sensitivity analysis,
- 51:05there seem to be two good ideas so far,
- 51:08to use the sensitivity value,
- 51:10and to calibrate that sensitivity value
- 51:13using measured confounders.
- 51:16But overall,
- 51:18I'd say this is still a very open area
- 51:26where a lot of work is needed.
- 51:28Even for this prototypical example
- 51:31that people have studied for decades,
- 51:33it seems there's still a lot of questions
- 51:36that are unresolved.
- 51:38And there are methods that need to be developed
- 51:41for this sensitivity analysis
- 51:45to be regularly used in practice.
- 51:48And then there are many other related problems
- 51:51in missing data and causal inference
- 51:54that need to see more developments of sensitivity analysis.
- 51:59So that's the end of my talk.
- 52:01And here are some references that I used.
- 52:05I'm happy to take any questions.
- 52:08We still have about four minutes left.
- 52:11- Thank you.
- 52:13Yeah, thank you.
- 52:14Thank you. I'm sorry I couldn't introduce you earlier,
- 52:17but my connection did not work.
- 52:21So we have time for a couple of questions.
- 52:26You can write the question in the chat box,
- 52:29or just unmute yourselves.
- 52:43Any questions?
- 52:54I guess I'll start with a question.
- 53:00This was a great connection, I think, between
- 53:04the sensitivity analysis literature
- 53:06and the missing data literature,
- 53:09which I think is kind of overlooked.
- 53:12Even when you run a parametric sensitivity analysis,
- 53:17most of the time
- 53:20people really don't understand
- 53:22how much information is given.
- 53:25Like, how much information the model actually gives
- 53:29on the sensitivity parameters.
- 53:32And as you said,
- 53:34like it's kind of inconsistent
- 53:36to set the sensitivity parameters
- 53:37when sensitivity parameters are actually identified
- 53:40by the model.
- 53:43So I guess my question,
- 53:46a clarifying question, is:
- 53:49you mentioned there are these testable models;
- 53:54these testable models are essentially ones where
- 53:56the sensitivity model is such that
- 54:00the sensitivity parameters are actually point identified.
- 54:04Right?
- 54:04- Yes.
- 54:05So you said
- 54:08you shouldn't use the sensitivity analysis
- 54:11to actually set the parameters
- 54:14if the sensitivity parameters
- 54:16are actually identified by the model.
- 54:18- Yeah.
- 54:19- Is that what you're saying?
- 54:21All right, so... - Yes, yeah.
- 54:23Basically what happened there is the model is too specific,
- 54:27and it wasn't constructed carefully.
- 54:30So it's possible to construct parametric models
- 54:33that are not testable and are perfectly fine.
- 54:37But sometimes, if you just sort of
- 54:40write down the most natural model,
- 54:42if you just extend the parametric model
- 54:46you used for the observed data to also model the full data,
- 54:52and you don't do it carefully,
- 54:54then the entire full data distribution becomes identifiable.
- 55:00So it doesn't make sense to treat those parameters
- 55:02as sensitivity parameters.
- 55:05So this is kind of reminiscent of the discussion
- 55:08in the '80s about the Heckman selection model.
- 55:12Because in that case,
- 55:14Heckman had this great selection model
- 55:18for reducing or getting rid of selection bias,
- 55:23but it's based on very heavy parametric assumptions.
- 55:27And you can essentially identify the selection effect
- 55:32directly from the model, where you actually have no data
- 55:36to support that identification.
- 55:40This led to some criticisms in the '80s.
- 55:45But I think we are seeing these things repeatedly
- 55:51again and again in different areas.
- 55:55I think it's fine
- 55:59to use parametric models that are testable, actually,
- 56:05if you really believe in those models,
- 56:07but it doesn't seem that they should be used
- 56:09for sensitivity analysis,
- 56:12because just logically,
- 56:13it's a bit strange.
- 56:15It's hard to interpret those models.
- 56:20But sometimes I've also seen people
- 56:24who parameterize the model
- 56:28in a way that includes enough terms
- 56:31that the sensitivity parameters are only weakly identified
- 56:35in a practical example.
- 56:38So with a practical data set, maybe the
- 56:44likelihood ratio test
- 56:46acceptance region is very, very large.
- 56:50So there are suggestions like that;
- 56:53it's sort of a compromise
- 56:58for good practice.
- 57:02- Right, in that case you could either set the parameters
- 57:06and track the causal effects,
- 57:09or treat that as a partial identification problem
- 57:13and just use bounds or the methods
- 57:17you were mentioning, I guess.
- 57:20- Yeah.
- 57:21- Yep, thanks.
- 57:26Other questions?
- 57:34Well I guess you can read the question?
- 57:37- It's a question from Kiel Sint.
- 57:41Sorry if I didn't pronounce your name correctly.
- 57:43"In the applications of observational studies ideally,
- 57:46what confounders should be collected
- 57:47for sensitivity analysis,
- 57:49or sensitivity analysis for unmeasured confounding?"
- 57:54Thank you.
- 57:54So if I understand your question correctly,
- 58:01basically what sensitivity analysis does
- 58:04is: you have an observational study,
- 58:06where you have already collected confounders
- 58:10that you believe are important or relevant,
- 58:13that are real confounders,
- 58:16that affect the treatment
- 58:20and the outcome.
- 58:22But often that's not enough.
- 58:25And what sensitivity analysis does is it tries to say,
- 58:29"based on the confounders
- 58:33you have already collected,
- 58:34what if there is still something missing
- 58:37that we didn't collect?
- 58:39And then if those things behave in a certain way,
- 58:44does that change our results?"
- 58:47So, I guess sensitivity analysis is always relative
- 58:52to a primary analysis.
- 58:54So I think you should use the same set of confounders
- 58:58that the primary analysis uses.
- 59:02I don't see many reasons to, say,
- 59:10use a primary analysis with more confounders,
- 59:14but a sensitivity analysis with fewer confounders.
- 59:21Sensitivity analysis is really a supplement
- 59:23to what you have in the primary analysis.
- 59:35- Just one more question, if we have any?
- 59:40There not.
- 59:41Yes.
- 59:43- So from Ching Hou Soo,
- 59:45"How to set the sensitivity parameter gamma
- 59:49in real-life questions?
- 59:51When gamma is too large, the inference results
- 59:54will always be non-informative."
- 59:57Yes, this is always a tricky problem,
- 01:00:01and essentially the sensitivity value is kind of
- 01:00:06trying to get past that.
- 01:00:09So it tries to directly look at the value
- 01:00:11of this sensitivity parameter that changes your conclusion.
- 01:00:15So in some sense, you don't need to specify
- 01:00:19a parameter a priori.
- 01:00:21But obviously, at the end of the day,
- 01:00:25we need some clue about what value of sensitivity parameter
- 01:00:30is considered large.
- 01:00:31In a practical sense, in this application.
- 01:00:36That's something this calibration analysis
- 01:00:39is trying to address.
- 01:00:44But as I said,
- 01:00:44they're not perfect at the moment.
- 01:00:47So for some time now, at least,
- 01:00:52we'll have to live with this, and
- 01:00:56we will either need to really understand
- 01:01:01what the sensitivity model means
- 01:01:02and then use our domain knowledge
- 01:01:06to set the sensitivity parameter,
- 01:01:10or we'll have to rely on these
- 01:01:15imperfect visualization tools to calibrate the analysis.
- 01:01:28- Yeah, all right.
- 01:01:29Thank you.
- 01:01:30I think we need to wrap up; we've run over time.
- 01:01:33So thank you again Qingyuan,
- 01:01:36for sharing your work with us.
- 01:01:38And thank you, everyone for joining.
- 01:01:41Thank you.
- 01:01:42Bye bye.
- 01:01:43See you next week.
- 01:01:44- It's a great pleasure.
- 01:01:45Thank you.