# YSPH Biostatistics Seminar: "Sensitivity Analysis for Observational Studies"

September 10, 2020
• 00:00- Seminar, so hello everyone.
• 00:03My name is Qingyuan Zhao,
• 00:05I'm currently a University Lecturer in Statistics
• 00:10in University of Cambridge.
• 00:13I visited Yale Biostats,
• 00:15briefly last year in February.
• 00:21And so it's nice to see everyone again, if only briefly, this time.
• 00:28And today I'll talk
• 00:30about sensitivity analysis for observational studies,
• 00:34looking back and moving forward.
• 00:37So this is based on ongoing work
• 00:39with several people Bo Zhang, Ting Ye and Dylan Small
• 00:45at University of Pennsylvania,
• 00:46and also Joe Hogan at Brown University.
• 00:52So sensitivity analysis is really a very broad term
• 00:59and you can find in almost any area
• 01:02that uses mathematical models.
• 01:08What it tries to do is study how the uncertainty
• 01:12in the output of a mathematical model or system,
• 01:17numerical or otherwise, can be apportioned
• 01:20to different sources of uncertainty in its inputs.
• 01:24So it's an extremely broad concept.
• 01:27And you can even fit statistics as part
• 01:30of a sensitivity analysis in some sense.
• 01:35But here, there can be a lot of kinds of model inputs.
• 01:41So, in particular,
• 01:43it can be any factor that can be changed in a model
• 01:47prior to its execution.
• 01:50So one example is structural
• 01:53or epistemic sources of uncertainty.
• 01:57And this is sort of the things we'll talk about.
• 02:01So basically, what our talk about today
• 02:03is those things that we don't really know.
• 02:07I mean, we make a lot of assumptions
• 02:09when proposing such a model.
• 02:13So in the context of observational studies,
• 02:16a very common and typical question
• 02:20that requires sensitivity analysis is the following.
• 02:24How do the qualitative and/or quantitative conclusions
• 02:29of the observational study change
• 02:31if the no unmeasured confounding assumption is violated?
• 02:35So this is really common because essentially,
• 02:39in the vast majority of observational studies,
• 02:42it's essential to assume this
• 02:45no unmeasured confounding assumption,
• 02:47and this is an assumption that we cannot test
• 02:50with empirical data,
• 02:52at least with just observational data.
• 02:55So if you do any observational study,
• 02:59you're almost bound to be asked this question:
• 03:02what if this assumption doesn't hold?
• 03:06And I'd like to point out that this question
• 03:08is fundamentally connected to missing not at random
• 03:12in the missing data literature.
• 03:14So what I will do today is I'll focus
• 03:16on sensitivity analysis for observational studies,
• 03:20but a lot of the ideas are drawn
• 03:22from the missing data literature.
• 03:24And most of the ideas that I'll talk about
• 03:28today can be also applied there
• 03:30and to related problems as well.
• 03:35So, currently, a state of the art of sensitivity analysis
• 03:40for observational studies is the following.
• 03:43There are many, many, gazillions of methods,
• 03:47maybe that's an exaggeration, but certainly many, many methods
• 03:50that are specifically designed for different
• 03:54kinds of sensitivity analysis.
• 03:58It often also depends on how you analyze your data
• 04:03under the no unmeasured confounding assumption.
• 04:06There are various forms of statistical guarantees
• 04:09that have been proposed.
• 04:11And oftentimes, these methods are not
• 04:15straightforward to interpret;
• 04:17at least for inexperienced researchers,
• 04:20they can be quite complicated and confusing.
• 04:26The goal of this talk is to give you a high level overview.
• 04:31So this is not a talk where I'm gonna unveil
• 04:34a lot of new methods.
• 04:36This is more of an overview kind of talk
• 04:40that just to try to go through
• 04:42some of the main ideas in this area.
• 04:46So in particular,
• 04:47what I wanted to address is the following two questions.
• 04:52What is the common structure behind
• 04:54all these sensitivity analysis methods?
• 04:57And what are some good principles and ideas we should follow
• 05:02and perhaps extend when we have similar problems?
• 05:06The perspective of this talk will be global and frequentist.
• 05:10By that, I mean,
• 05:12there's an area in sensitivity analysis
• 05:14called local sensitivity analysis,
• 05:16where you're only allowed to move your parameter
• 05:19near its maximum likelihood estimate, usually.
• 05:25But global sensitivity analysis refers to methods
• 05:29where you can vary your sensitivity parameter
• 05:31freely in a space.
• 05:35So that's what we'll focus on today.
• 05:38And also, I'll take a frequentist perspective.
• 05:40So I won't talk about Bayesian sensitivity analysis,
• 05:44which is also a big area.
• 05:46And I'll use this prototypical setup
• 05:50in observational studies,
• 05:52where you have iid copies of these observed data O,
• 05:56which has three parts: X is the covariates,
• 06:00A the binary treatment, Y is the outcome,
• 06:04and this observed data
• 06:06that come from underlying full data, F,
• 06:10which includes X and A
• 06:13and the potential outcomes, Y(0) and Y(1).
• 06:17Okay, so this is,
• 06:19most of you have probably seen this,
• 06:24but if you haven't, this
• 06:25is the most typical setup in observational studies.
• 06:29And it kind of gets a little bit boring
• 06:30when you see it so many times.
• 06:32But what we're trying to do
• 06:34is to use this as the simplest example,
• 06:37to demonstrate the structure and ideas.
• 06:41And hopefully, if you understand these good ideas,
• 06:46you can apply them to your problems
• 06:50that are maybe slightly more complicated than this.
• 06:55So here's the outline
• 06:57and I'll give a motivating example
• 06:59then I'll talk about three components
• 07:01in the sensitivity analysis.
• 07:03There the sensitivity model,
• 07:04the statistical inference and the interpretation.
• 07:10So the motivating example will sort of demonstrate
• 07:13where these three components come from.
• 07:16So this example is actually in the social sciences,
• 07:24a paper by Blattman and Annan, 2010,
• 07:30in the Review of Economics and Statistics.
• 07:34So what they studied is this period of time in Uganda,
• 07:41from 1995 to 2004,
• 07:44where there was a civil war
• 07:46and about 60,000 to 80,000 youth
• 07:49were abducted by a rebel force.
• 07:53So the question is,
• 07:54what is the impact of child soldiering
• 07:58sort of this abduction by the rebel force,
• 08:01on various outcomes,
• 08:04such as years of education,
• 08:08and this paper actually studied a number of outcomes.
• 08:13The authors controlled for a variety of baseline covariates,
• 08:17like the children's age, their household size,
• 08:20their parental education, et cetera.
• 08:23They were quite concerned about
• 08:26this possible unmeasured confounder.
• 08:28That is the child's ability to hide from the rebel.
• 08:33So it's possible that if this child is smart,
• 08:39and if he or she knows
• 08:41how to hide from the rebels,
• 08:44then he's less likely to be abducted
• 08:49and to be in this data set.
• 08:52And he'll probably also be more likely
• 08:55to receive longer education, just because maybe
• 09:00the kid is a bit smarter, let's say.
• 09:06So in their analysis,
• 09:07they follow the model proposed by Imbens,
• 09:11which is the following.
• 09:12So basically, they assume no unmeasured confounding
• 09:18after you condition on this unmeasured confounder U.
• 09:22Okay, so X are all the covariates
• 09:24that they controlled for,
• 09:26and U, they assumed, is a binary unmeasured confounder.
• 09:32That's just a coin flip.
• 09:36And then they assume the logistic model
• 09:39for the probability of being abducted
• 09:44and the normal linear model for the potential outcomes.
• 09:49So notice that here these linear terms
• 09:55depend on not only the observed covariates,
• 09:58but also the unmeasured confounder U.
• 10:01And of course,
• 10:02we don't measure this U.
• 10:04So we cannot directly fit these models.
• 10:09But what they did is, because they made
• 10:12some distributional assumptions on U,
• 10:16you can treat U as a latent variable
• 10:19and then, for example,
• 10:21compute the maximum likelihood estimate.
• 10:25So they treated these two parameters, lambda and delta,
• 10:29as sensitivity parameters.
• 10:32So these are the parameters that you vary
• 10:35in a sensitivity analysis.
• 10:37So when they're both equal to zero,
• 10:39that means that there is no unmeasured confounding.
• 10:43So you can actually just ignore this confounder U.
• 10:46So it corresponds to your primary analysis,
• 10:48but in a sensitivity analysis,
• 10:50you change the values of lambda and delta
• 10:53and you see how that changes your result
• 10:55for this parameter beta,
• 10:57which is interpreted as a causal effect.
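The Imbens-style model just described can be fit, for a fixed pair (lambda, delta), by maximizing the likelihood with the binary confounder U summed out. Here is a minimal sketch of that idea; the variable names, the simulated data, and the optimizer choice are my own assumptions for illustration, not the authors' code:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit
from scipy.stats import norm

def neg_loglik(params, X, A, Y, lam, delta):
    """Negative log-likelihood with the binary U marginalized out.
    U ~ Bernoulli(1/2); lam and delta are held fixed as sensitivity parameters."""
    p = X.shape[1]
    alpha = params[:p]                  # treatment model coefficients
    beta_x = params[p:2 * p]            # outcome model coefficients
    tau = params[2 * p]                 # treatment effect (the causal parameter)
    sigma = np.exp(params[2 * p + 1])   # outcome noise scale, log-parameterized
    lik = np.zeros(len(Y))
    for u in (0, 1):  # sum over the two values of the latent confounder
        pa = expit(X @ alpha + lam * u)
        f_a = np.where(A == 1, pa, 1 - pa)
        f_y = norm.pdf(Y, X @ beta_x + tau * A + delta * u, sigma)
        lik += 0.5 * f_a * f_y
    return -np.sum(np.log(lik))

# Simulated data with no unmeasured confounding and true effect tau = 1
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
A = rng.binomial(1, expit(0.5 * x))
Y = 1.0 * A + x + rng.normal(size=n)

# Fit at one fixed (lambda, delta); a sensitivity analysis re-runs this over a grid
res = minimize(neg_loglik, np.zeros(2 * X.shape[1] + 2),
               args=(X, A, Y, 0.0, 0.0), method="BFGS")
tau_hat = res.x[2 * X.shape[1]]
```

Re-running the fit over a grid of (lambda, delta) values and recording tau_hat at each point is exactly the "vary the sensitivity parameters and see how the result changes" loop described above.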
• 11:02Okay, so the results can be summarized in this one slide.
• 11:06I mean they've done a lot more definitely.
• 11:08But for the purpose of this talk, basically,
• 11:12what they found is that the primary analysis
• 11:15found that the average treatment effect is -0.76.
• 11:19So remember the outcome was years of education.
• 11:21So being abducted,
• 11:23has a significant negative effect on education.
• 11:30And then they did a sensitivity analysis,
• 11:32which can be summarized in this calibration plot.
• 11:36What is shown here is that these two axes
• 11:40are basically the two sensitivity parameters,
• 11:43lambda and delta.
• 11:45So what the paper did is they transformed these
• 11:48to the increase in R-squared,
• 11:51but that can be mapped back to lambda and delta,
• 11:56and then they compared
• 11:59this curve, so this dashed curve
• 12:03is where the values of lambda and delta are such that
• 12:07the treatment effect is reduced by half.
• 12:11And then they compare this curve
• 12:13with all the measured confounders,
• 12:15like location of birth,
• 12:17year of birth, et cetera.
• 12:21And then you compare it with the corresponding coefficients
• 12:25of those variables in the model
• 12:31and then they just plot these in the same figure.
• 12:37What this is supposed to show is that, look,
• 12:39this is the point where the treatment effect
• 12:42is reduced by half,
• 12:44and this is about the same strength
• 12:47as location of birth alone.
• 12:50So, if you think your unmeasured confounder is in some sense
• 12:54as strong as the location or the year of birth,
• 12:58then it is possible that the treatment effect
• 13:01is half of what it is estimated to be.
• 13:05Okay, so it's a pretty neat way
• 13:08to present a sensitivity analysis.
• 13:12So in this example, you see,
• 13:14there's three components of sensitivity analysis.
• 13:17First is model augmentation.
• 13:19And you need to expand the model used by primary analysis
• 13:24to allow for unmeasured confounding.
• 13:26Second, you need to do statistical inference.
• 13:30So you vary the sensitivity parameter,
• 13:32estimate the effect,
• 13:33and then control some statistical errors.
• 13:36So what they did
• 13:38is, it's they essentially varied lambda and delta,
• 13:42and they estimated the average treatment effect
• 13:45under that lambda and delta.
• 13:49And the third component is to interpret the results.
• 13:52So this paper relied on that calibration plot
• 13:56for that purpose.
• 13:58But this is often quite tricky,
• 14:01because the sensitivity analysis is complicated
• 14:05as we need to probe different directions
• 14:07of unmeasured confounding.
• 14:09So the interpretation is actually not always straightforward
• 14:14and sometimes can be quite complicated.
• 14:19But there do exist two issues
• 14:23with this analysis.
• 14:26So this is just the same model, rewritten.
• 14:30The first issue is that actually the sensitivity parameters
• 14:33lambda and delta,
• 14:34where we vary in a sensitivity analysis
• 14:38are identifiable from the observed data.
• 14:41This is because this is a fully parametric model.
• 14:44And then it's not constructed in any way
• 14:47so that these lambda and delta are not identifiable.
• 14:51In fact, in the next slide,
• 14:53I'm going to show you some empirical evidence
• 14:55that you can actually estimate these two parameters.
• 14:59So, logically, it is inconsistent for us
• 15:02to vary the sensitivity parameters,
• 15:05because if we truly believe in this model,
• 15:07the data actually tell us what the values
• 15:09of lambda and delta are.
• 15:11So this is similar to the criticism
• 15:13of the Heckman selection model, for example.
• 15:20The second issue, which is a bit subtle,
• 15:23is that in a calibration plot,
• 15:25what they did is they use the partial R squared
• 15:27as a way to express lambda and delta
• 15:33in a more interpretable way.
• 15:36But actually the partial R squared for the observed
• 15:38and unobserved confounders are not directly comparable.
• 15:42This is because they use different reference models.
• 15:48So, actually, you need to be quite careful
• 15:50about the interpretation of these calibration plots.
• 15:56So, here is the evidence I promised, suggesting
• 16:01that you can actually identify
• 16:02these two sensitivity parameters lambda and delta.
• 16:06So here the red dots
• 16:08are the maximum likelihood estimators.
• 16:11And then these solid curves, these regions,
• 16:14are the rejection,
• 16:16or I should say acceptance, regions
• 16:20for the likelihood ratio test.
• 16:23So this is at level 0.50,
• 16:26this is 0.10, this is 0.05.
• 16:30There is a symmetry around the origin; that's
• 16:34because the model for U is symmetric.
• 16:37So, lambda and delta give the same fit
• 16:41as minus lambda and minus delta.
• 16:43But what you see
• 16:44is that you can actually estimate lambda and delta
• 16:47and you can sort of estimate it
• 16:50to be in a certain region.
• 16:53So, something a bit interesting here
• 16:56is that there's more you can say about Delta,
• 17:01which is the parameter for the outcome,
• 17:04than the parameter for the treatment lambda.
• 17:09But in any case,
• 17:11it doesn't look like we can just vary
• 17:13these parameters lambda and delta freely in this space
• 17:16and then expect to get different results
• 17:19for each point.
• 17:23What we actually can get is some estimate
• 17:25of these sensitivity parameters.
• 17:28So the lesson here is that
• 17:30if you use parametric sensitivity models,
• 17:32then they need to be carefully constructed
• 17:35to avoid these kind of issues.
• 17:40So next I'll talk about the first component
• 17:43of the sensitivity analysis,
• 17:44which is your sensitivity model.
• 17:48So very generally,
• 17:51if you think about what is the sensitivity model,
• 17:54is essentially it's a model for the full data F,
• 18:00that include some things that are not observed.
• 18:03So, what we are trying to do here
• 18:05is to infer the full data distribution
• 18:08from some observed data, O.
• 18:11So a sensitivity model is basically
• 18:14a family of distributions of the full data,
• 18:18is parameterized by two parameters theta and eta.
• 18:23So, I'm using eta to stand for the sensitivity parameters
• 18:27and theta is some other parameters
• 18:29that parameterize the distribution.
• 18:33So the sensitivity model needs to satisfy two properties.
• 18:38So first of all,
• 18:40if we set the sensitivity parameter eta to be equal to zero,
• 18:44then that should correspond to our primary analysis
• 18:48assuming no unmeasured confounders.
• 18:49So I call this augmentation.
• 18:51A second property is that, given the value
• 18:56of this sensitivity parameter eta,
• 18:59we can actually identify the parameter theta
• 19:03from the observed data.
• 19:06So this is sort of a minimal assumption.
• 19:08Otherwise, this model is simply too rich,
• 19:12and so I call this model identifiability.
• 19:15So the statistical problem in sensitivity analysis
• 19:18is that if I give you the value of eta
• 19:20or the range of eta,
• 19:23can you use observed data to make inference
• 19:26about some causal parameter that is a function
• 19:29of the theta and eta.
• 19:32Okay, so this is a very general abstraction
• 19:37of what we have seen in the previous example.
• 19:43But it's a bit too general.
• 19:45So let's make it slightly more concrete
• 19:49by understanding these observational equivalence classes.
• 19:55So essentially, what we're trying to do
• 19:58is we observe some data,
• 19:59but then we know there's an underlying full data
• 20:02some of which we don't observe.
• 20:05And instead of just modeling the observed data,
• 20:08we're modeling the full data set.
• 20:10So that makes our model quite rich,
• 20:14because we're modeling something that is not all observed.
• 20:18For that purpose, it's useful to define this
• 20:21observational equivalence relation
• 20:24between two full data distributions,
• 20:27which just means that their implied
• 20:30observed data distributions are exactly the same.
• 20:34So we write this as this approximate equal
• 20:39to this equivalence symbol.
• 20:43So then we can define the equivalence class
• 20:45of a distribution of a full data distribution,
• 20:48which are all the other full data distributions
• 20:51in this family that are observationally equivalent
• 20:55to that distribution.
• 20:58Then we can sort of classify these sensitivity models
• 21:02based on the behavior of these equivalence classes.
• 21:07So, what happened in the last example
• 21:10is that the full data model
• 21:15is not rich enough.
• 21:16So these equivalence classes are just singletons,
• 21:20so you can actually identify the sensitivity parameter eta
• 21:24from the observed data.
• 21:26So, this makes the model, or rather
• 21:31the choice of sensitivity parameter, testable in some sense,
• 21:35and this should generally be avoided in practice.
• 21:39Then there are the global sensitivity models
• 21:43where you can basically freely vary
• 21:46the sensitivity parameter eta.
• 21:48And for any eta you can always find the theta
• 21:51such that it is observationally equivalent
• 21:54to where you started from.
• 21:57And then there are even nicer models, the separable models,
• 22:01where basically, this eta,
• 22:04the sensitivity parameter, doesn't change
• 22:07the observed data distribution.
• 22:12So for any theta and eta,
• 22:14theta and eta is equivalent to theta and zero.
• 22:18So these are really nice models to work with.
• 22:22So to understand the difference between global models
• 22:26and separable models.
• 22:28So basically, it's just that they have different shapes
• 22:34of the equivalence classes.
• 22:37So for separable models,
• 22:40these equivalence classes
• 22:42need to be perpendicular to the theta axis.
• 22:46But that's not needed for global sensitivity models.
• 22:53So I've talked about what a sensitivity model means
• 22:57and some basic properties of it,
• 23:00but haven't talked about how to build them.
• 23:02So generally, in this setup,
• 23:05there's three ways to build a sensitivity model.
• 23:08And then they essentially correspond
• 23:09with different factorizations
• 23:11of the full data distribution.
• 23:13So there's a simultaneous model
• 23:15that tries to factorize distribution this way.
• 23:19So introduces unmeasured confounder, U,
• 23:22and then you need to model
• 23:24these three conditional probabilities.
• 23:27There's also the treatment model
• 23:31that doesn't rely on this unmeasured confounder U.
• 23:35But whether you need to specify is the distribution
• 23:39of the treatment given the unmeasured cofounders and x.
• 23:44And once you've specified that you can use Bayes formula
• 23:46to get this part.
• 23:50And then there's the outcome model that factorizes
• 23:54this distribution in the other way.
• 23:57So this is basically the propensity score
• 24:00and the third term is what we need to specify
• 24:03with a sensitivity parameter.
• 24:06So in the missing data literature,
• 24:09the second kind of model
• 24:11is usually called a selection model,
• 24:13and the third kind of model is usually called
• 24:16a pattern mixture model,
• 24:17and there are other names that have been given to it.
• 24:23And basically different sensitivity models,
• 24:26they amount to different ways of specifying these
• 24:31non-identifiable distributions,
• 24:33which are the ones that are underlined.
• 24:37A good review is this report by a committee
• 24:42organized by the National Research Council.
• 24:46This ongoing review paper that we're writing
• 24:50also gives a comprehensive review of many models
• 24:54that have been proposed using these factorizations.
• 25:00Okay, so that's about the sensitivity model.
• 25:03The next component is statistical inference.
• 25:11Things get a little bit tricky here,
• 25:14because there are two kinds of inference
• 25:17or two modes of inference we can talk about
• 25:19in this study.
• 25:21So, the first mode of inference is point identified inference.
• 25:24So you only care about a fixed value
• 25:27of the sensitivity parameter eta.
• 25:32And the second kind of inference
• 25:34is partially identified inference,
• 25:36where you perform the statistical inference simultaneously
• 25:40for a range of sensitivity parameters eta.
• 25:44And that range H is given to you.
• 25:50And these different modes of inference
• 25:54come with different statistical guarantees.
• 25:57So for point identified inference, usually, let's say
• 26:03for interval estimators,
• 26:04you want to construct confidence intervals.
• 26:08These confidence intervals depend on the observed data
• 26:12and the sensitivity parameter eta, which
• 26:17is known in point identified inference,
• 26:20and they must cover the true parameter
• 26:23with one minus alpha probability
• 26:25for all the distributions in your model.
• 26:28Okay, that's what the infimum means.
• 26:30But for partially identified inference,
• 26:35you're only allowed to use an interval
• 26:38that depends on the range, H.
• 26:41So, it cannot depend on a specific value
• 26:43of the sensitivity parameter,
• 26:46because you only know eta is in this range H.
• 26:50It needs to satisfy a very similar criterion.
• 26:56So I call intervals that satisfy this criterion
• 26:59sensitivity intervals.
• 27:01But in the literature people have also called this
• 27:03an uncertainty interval or just a confidence interval.
• 27:08But to make it different from the first case,
• 27:11we're calling it a sensitivity interval here.
• 27:15So you can see that these two equations,
• 27:19two criterias look very similar,
• 27:22besides just that this interval needs to depend on the range
• 27:25instead of a particular value of the sensitivity parameter.
• 27:29But actually, they're quite different.
• 27:31This one is usually much wider.
• 27:34The reason is,
• 27:36you can actually write an equivalent form
• 27:37of this equation one,
• 27:40because this only depends on the observed data
• 27:45and the range H.
• 27:46Then for every theta in that,
• 27:49sorry, for every eta in that range H,
• 27:52and every theta
• 27:56that's observationally equivalent to the true distribution,
• 28:00this interval also needs to cover
• 28:02the corresponding causal parameter.
• 28:07So in that sense,
• 28:08this is a much stronger guarantee that you have.
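In symbols, here is one way to write the two coverage criteria just described; the notation CI, SI, and beta(theta, eta) is my reconstruction from the talk, not a verbatim slide:

```latex
% Point identified: eta is fixed and known, and the interval may depend on it.
\inf_{\theta}\;
\mathbb{P}_{(\theta,\eta)}\!\left\{\, \beta(\theta,\eta) \in \mathrm{CI}_{1-\alpha}(O;\eta) \,\right\}
\;\ge\; 1-\alpha .

% Partially identified: only the range H is known, and the sensitivity
% interval may depend on H but not on eta itself.
\inf_{\theta,\;\eta \in H}\;
\mathbb{P}_{(\theta,\eta)}\!\left\{\, \beta(\theta,\eta) \in \mathrm{SI}_{1-\alpha}(O;H) \,\right\}
\;\ge\; 1-\alpha .
```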
• 28:16So, in terms of the statistical methods,
• 28:21point identified inference is usually quite straightforward.
• 28:26It's very similar to our primary analysis.
• 28:29So, primary analysis just assumes this eta equals to zero,
• 28:32but this sensitivity analysis assumes eta is known.
• 28:36So usually you can just plug in
• 28:38this eta in some way as an offset to your model.
• 28:42And then everything works out in almost the same way
• 28:45as a primary analysis.
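A minimal sketch of this plug-in idea, using a simple shift-type (pattern-mixture style) sensitivity model that I chose purely for illustration, not necessarily the one the speaker has in mind; setting eta = 0 recovers the primary analysis:

```python
import numpy as np

def ate_given_eta(X, A, Y, eta):
    """Outcome-regression ATE treating a known eta as an offset.
    Illustrative shift model (an assumption of this sketch): untreated units'
    unobserved Y(1) is eta lower than the treated-arm regression predicts,
    and treated units' unobserved Y(0) is eta higher."""
    Xd = np.column_stack([np.ones(len(Y)), X])  # add an intercept column
    # fit linear outcome regressions separately in each treatment arm
    b1, *_ = np.linalg.lstsq(Xd[A == 1], Y[A == 1], rcond=None)
    b0, *_ = np.linalg.lstsq(Xd[A == 0], Y[A == 0], rcond=None)
    # impute counterfactuals with the eta offset, then average
    ey1 = np.mean(np.where(A == 1, Y, Xd @ b1 - eta))
    ey0 = np.mean(np.where(A == 0, Y, Xd @ b0 + eta))
    return ey1 - ey0
```

Varying eta over a range and re-running this estimator is the "vary and re-estimate" loop; the inference for each fixed eta works essentially like the primary analysis.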
• 28:48But for partially identified analysis,
• 28:50things become quite a bit more challenging.
• 28:55And there are several approaches
• 28:58that you can take.
• 29:00So, essentially there are two big classes of methods,
• 29:05one is bound estimation,
• 29:08one is combining point identified inference.
• 29:11So, for bound estimation,
• 29:14it tries to directly make inference about the two ends
• 29:18of this partially identified region.
• 29:21So, this set is the region of the parameter beta
• 29:26that are sort of indistinguishable,
• 29:29if I only know this sensitivity parameter eta is in H.
• 29:35If we can somehow directly estimate the infimum and supremum
• 29:40of this set,
• 29:44then that gives us a way
• 29:46to make partially identified inference.
• 29:50The second method is basically
• 29:53to try to combine the results of point identified inference.
• 29:59The main idea is to sort of construct
• 30:02let's say interval estimators,
• 30:05for each individual sensitivity parameter
• 30:08and then take a union of them.
• 30:11So, these are the two broad approaches
• 30:14to the partially identified inference.
• 30:18And so, within the first approach
• 30:20the bound estimation approach,
• 30:22there are also several,
• 30:25there are several possible methods.
• 30:29So, the first,
• 30:31the first method is called separable bounds.
• 30:35But before that, let's just slightly change our notation
• 30:39and parameterize this range H by a hyper parameter gamma.
• 30:47So, this is useful when we outline these methods.
• 30:52And then this beta L of gamma,
• 30:55this is the lower end of the partially identified region.
• 31:01So the first method is called separable bounds.
• 31:06What it tries to do is to write this lower end
• 31:11as a function of beta star and gamma,
• 31:15where beta star is your primary analysis estimate.
• 31:21So let's say theta star zero
• 31:24is what you would do in a primary analysis
• 31:27that is observationally equivalent to the true distribution.
• 31:32And then, if beta star is the corresponding causal effect,
• 31:37from that model,
• 31:39and if somehow can write this lower end
• 31:42as a function of beta star and gamma
• 31:46and the function is known,
• 31:47then our life is quite easy,
• 31:50because we already know how to make inference
• 31:53about beta star from the primary analysis.
• 31:55And all we need to do is just plug in
• 31:57that beta star in this formula,
• 31:59and then we're all done.
• 32:02And we call this separable because it allows us
• 32:06to separate the primary analysis
• 32:09from the sensitivity analysis.
• 32:11And statistical inference becomes a trivial extension
• 32:15of the primary analysis.
• 32:17So, some examples of this kind of method
• 32:20include the classical Cornfield bounds
• 32:26and the E-value,
• 32:27if you have heard about them,
• 32:29and the E-value seems quite popular
• 32:31these days in epidemiology.
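For concreteness, the point-estimate E-value has a simple closed form for a risk ratio RR, namely RR + sqrt(RR(RR - 1)); here is a tiny helper (the published method also has versions for confidence limits and other effect scales, which this sketch omits):

```python
import math

def e_value(rr):
    """E-value for a risk-ratio point estimate (VanderWeele and Ding):
    the minimum strength of association, on the risk-ratio scale, that an
    unmeasured confounder would need with both treatment and outcome
    to fully explain away the observed association."""
    rr = max(rr, 1.0 / rr)  # treat protective effects (RR < 1) symmetrically
    return rr + math.sqrt(rr * (rr - 1.0))
```

This is "separable" in exactly the sense above: the bound is a known function of the primary-analysis estimate alone, so no new inferential machinery is needed.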
• 32:37The second type of bound estimation
• 32:41is called tractable bounds.
• 32:45So, in these cases,
• 32:48we may derive this lower bound as a function
• 32:52of theta star and gamma.
• 32:54So we are not able to reduce it to just depend
• 32:58on beta star the causal effect
• 33:00under no unmeasured confounding,
• 33:04but we're able to express in terms of theta star.
• 33:07And then the function gl is also some practical functions
• 33:11that we can compute.
• 33:13And then this also makes our lives quite a lot easier,
• 33:17because we can just replace this theta star,
• 33:21which can be nonparametric can be parametric,
• 33:25by its empirical estimate.
• 33:28And, often in these cases,
• 33:31we can find some central limit theorems
• 33:35for the corresponding sample estimator,
• 33:38such that the sample estimator of the bounds
• 33:42converges to its truth at a root-n rate
• 33:46and follows a normal limit.
• 33:51And then if we can estimate this standard error,
• 33:55then we can use this central limit theorem
• 33:58to make partially identified inference
• 34:02because we can estimate the bounds.
• 34:07There's some examples in the literature,
• 34:09you're familiar with these papers.
• 34:12But one thing to be careful about
• 34:14these kind of tractable bounds
• 34:16is that things can get a little bit tricky
• 34:21with asymptotic theory.
• 34:24This is because in asymptotic theory,
• 34:27the confidence intervals or the sensitivity intervals
• 34:30in this case,
• 34:32can be pointwise or uniform in terms of the sample size.
• 34:38So it's possible that, if the convergence,
• 34:45if the statistical guarantee, is pointwise,
• 34:49then sometimes, in extreme cases,
• 34:56even with a very large sample size,
• 34:58there still exist data distributions
• 35:01such that your coverage is very poor.
• 35:05So this point is discussed very heavily
• 35:08in the econometrics literature.
• 35:10And these are some references.
• 35:15So that's the second type of method
• 35:18in the first broad approach.
• 35:22The third kind of method
• 35:25is called stochastic programming.
• 35:28And this applies when the model is separable.
• 35:34and we can write the parameter we're interested in
• 35:40as some expectation of some function
• 35:43of theta and the sensitivity parameter eta.
• 35:48Okay, so in this case,
• 35:51the bound becomes the optimal value
• 35:54for an optimization problem,
• 35:56which you want to minimize expectation of some function.
• 36:01And the parameter in this function is in some set
• 36:05as defined by H.
• 36:08So, this is known as,
• 36:11this type of problem is known as stochastic programming
• 36:14in the optimization literature.
• 36:17And what people do there
• 36:19is they sample from the distribution,
• 36:22and then they try to use it to solve the empirical version
• 36:26and try to use that as an approximate solution
• 36:29to this population optimization problem,
• 36:33which we can't directly evaluate.
• 36:36And the method is called sample average approximation
• 36:39in the optimization literature.
• 36:42So, what is shown there,
• 36:47and Alex Shapiro did a lot of great work on this,
• 36:51is that for nice problems with a compact set H,
• 36:57where everything is Euclidean,
• 36:59so finite dimensional,
• 37:01Then you actually have a central limit theorem
• 37:04for the sample optimal value.
• 37:12The connection to stochastic programming is made in this paper
• 37:16by Tudball et al.
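The sample average approximation idea can be sketched in a few lines: replace the population expectation by the empirical average and optimize that over the compact set H. The quadratic objective below is a toy of my own choosing (its population minimizer is the mean and the optimal value is the variance), purely to show the mechanics:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def saa_optimal_value(samples, f, lo, hi):
    """Sample average approximation: approximate
    min over eta in [lo, hi] of E[f(O, eta)]
    by optimizing the empirical average over the observed samples."""
    def empirical_objective(eta):
        return np.mean(f(samples, eta))
    res = minimize_scalar(empirical_objective, bounds=(lo, hi), method="bounded")
    return res.x, res.fun

# Toy illustration: f(o, eta) = (o - eta)^2
rng = np.random.default_rng(0)
obs = rng.normal(0.3, 1.0, size=5000)
eta_hat, val_hat = saa_optimal_value(obs, lambda o, eta: (o - eta) ** 2, -2.0, 2.0)
```

The central limit theorem mentioned above is for val_hat, the sample optimal value, under regularity conditions on f and H.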
• 37:20Okay, so that's the first broad approach
• 37:23of doing bound estimation.
• 37:26The second broad approach is to combine the results
• 37:29of point identified inference.
• 37:32So, the first possibility is to take a union
• 37:37of the individual confidence intervals.
• 37:40Suppose these are the confidence intervals
• 37:43when the sensitivity parameter eta is given.
• 37:47Then, it is very simple to just apply a union bound
• 37:51and to show that if you take a union
• 37:54of these individual confidence intervals,
• 37:57then they should satisfy the criterion
• 38:01for a sensitivity interval.
• 38:03So now, if you take a union, this interval only depends
• 38:07on the range H,
• 38:08and then you just apply the union bound
• 38:12and get this guarantee from the first one.
• 38:17And this can be slightly improved
• 38:20to cover not just these parameters,
• 38:23but also the entire partially identified region
• 38:27if the intervals if the confidence intervals
• 38:30have the same tail probabilities.
• 38:35So we discussed this in our paper.
• 38:39And here, so, all we need to do
• 38:43is to compute this union.
• 38:46So this essentially is an optimization problem:
• 38:49we'd like to minimize the lower bound,
• 38:52the lower confidence limit Cl of eta, over eta in H,
• 38:59and similarly for the upper bound.
• 39:02And usually, using asymptotic theory,
• 39:05we can get some normal-based confidence
• 39:09intervals for each fixed eta.
• 39:12And then we just need to optimize
• 39:14this confidence interval over eta.
• 39:20But for many problems this can be
• 39:22computationally challenging because the standard errors
• 39:26are usually quite complicated
• 39:30and have some very nonlinear dependence
• 39:32on the parameter eta.
• 39:34So optimizing this can be tricky.
• 39:40This is where another method, the percentile bootstrap method,
• 39:44can greatly simplify the problem.
• 39:47It's proposed by this paper that we wrote,
• 39:53and what it does is, instead of using
• 39:56the asymptotic confidence interval for fixed eta,
• 40:01we use the percentile bootstrap interval,
• 40:04where we take bootstrap resamples of the data,
• 40:06estimate the causal effect beta
• 40:11in each resample, and then take quantiles.
• 40:15Okay, so if you use this confidence interval,
• 40:19then there is a generalized
• 40:25minimax inequality that allows us to construct
• 40:29this percentile bootstrap sensitivity interval.
• 40:33So what it does is, this thing on the inside
• 40:37is just the union of these percentile bootstrap
• 40:41intervals for fixed eta,
• 40:45taken over eta in H.
• 40:49And then this generalized minimax inequality
• 40:51allows us to interchange the infimum with the quantile
• 40:57and the supremum with the quantile.
• 41:00Okay, so the infimum of a quantile
• 41:01is greater than or equal to the quantile of the infimum,
• 41:05and that's always true.
• 41:07So it's just a generalization
• 41:09of the familiar minimax inequality.
• 41:13Now, if you look at this outer interval,
• 41:16this is much easier to compute,
• 41:19because all you need to do
• 41:20is take a bootstrap resample of the data,
• 41:25then repeat method 1.3.
• 41:29So just get the infimum of the point estimate
• 41:34for that resample and the supremum for that resample.
• 41:37Then you do this over many, many resamples,
• 41:41and then you take the quantiles:
• 41:44the lower quantile of the infima and the upper quantile of the suprema,
• 41:48and then you're done.
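The steps just described can be sketched in a few lines. This is a toy version, not the actual estimator from the paper: the data, the fixed-eta estimator (a simple mean shift), and the range H are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, B = 500, 1000
y = rng.normal(1.0, 1.0, size=n)         # made-up outcome data

# Hypothetical fixed-eta estimator: shift the sample mean by eta, with the
# sensitivity parameter eta ranging over H = [-0.5, 0.5].
etas = np.linspace(-0.5, 0.5, 21)

infima, suprema = [], []
for _ in range(B):
    resample = rng.choice(y, size=n, replace=True)
    betas = resample.mean() + etas        # beta_hat(eta) on this resample
    infima.append(betas.min())            # infimum over eta in H
    suprema.append(betas.max())           # supremum over eta in H

# Percentile bootstrap sensitivity interval: lower quantile of the infima
# and upper quantile of the suprema (2.5% and 97.5% for a 95% interval).
L = np.quantile(infima, 0.025)
U = np.quantile(suprema, 0.975)
print(L, U)
```

The appeal is exactly what the talk emphasizes: each resample only requires point estimation at the extremes, with no standard-error formula depending on eta.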
• 41:50And this union sensitivity interval
• 41:53is always valid
• 41:55if the individual confidence intervals are valid.
• 41:58So you've got a free lunch
• 42:02in some sense:
• 42:03you don't need to prove any heavy theory.
• 42:06All you need to show is that
• 42:08these percentile bootstrap intervals are valid
• 42:11for each fixed eta,
• 42:13which is much easier to establish in real problems.
• 42:23And, a bit self-servingly,
• 42:25I'd like to compare this idea
• 42:27with Efron's bootstrap,
• 42:29where what you do
• 42:31is compute a point estimator,
• 42:35resample many times, and then use the bootstrap
• 42:38to get the confidence interval.
• 42:41For partially identified inference,
• 42:44you need to do a bit more.
• 42:46So for each resample you need
• 42:48to compute the extrema of the point estimator.
• 42:52Then the minimax inequality lets you
• 42:55transfer the intuition
• 43:00from the bootstrap for point identification
• 43:02to partial identification.
• 43:08So the third method
• 43:11in this general approach
• 43:14is to take the supremum of the P value.
• 43:15And this is used in Rosenbaum's sensitivity analysis,
• 43:18if you're familiar with that.
• 43:22Essentially it's a hypothesis testing analog
• 43:24of the union confidence interval method.
• 43:29What it does is that
• 43:30if you have individually valid P values for a fixed eta,
• 43:35then you just take the supremum of the P values
• 43:38over all the etas in this range.
• 43:41And that can be used for partially identified inference.
• 43:46So what Rosenbaum did,
• 43:49and Rosenbaum is really a pioneer in this area
• 43:52of partially identified sensitivity analysis,
• 43:56was use randomization tests
• 43:59to construct these P values.
• 44:03So, this is usually done for matched observational studies
• 44:07and the insight of this line of work
• 44:12is that you can use these inequalities
• 44:16particularly Holley's inequality
• 44:19in probabilistic combinatorics
• 44:22to efficiently compute these supremum of the P values.
• 44:26So, usually what is done there is that
• 44:30Holley's inequality gives you a way
• 44:39to upper bound a family of distributions
• 44:42in the stochastic dominance sense,
• 44:45and that is used to get the supremum of the P values.
• 44:51And so, basically the idea is to use some theoretical tool
• 44:59to simplify the computation.
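A crude grid version of the supremum-of-P-values idea can be sketched as follows. This is not Rosenbaum's randomization test; it is an invented toy where the sensitivity model is a simple mean shift and each fixed-eta test is a z-test, just to show the mechanics.

```python
import numpy as np
from math import erf, sqrt

def two_sided_p(z):
    """Two-sided P value for a standard normal test statistic."""
    return 2.0 * (1.0 - 0.5 * (1.0 + erf(abs(z) / sqrt(2.0))))

rng = np.random.default_rng(2)
y = rng.normal(0.5, 1.0, size=400)        # made-up data with true mean 0.5
se = y.std(ddof=1) / np.sqrt(len(y))

# Hypothetical bias model: under sensitivity parameter eta in H = [-0.2, 0.2],
# the null value of the mean is shifted by eta, so we test the null mean = eta
# for each eta on a grid and report the supremum of the P values over H.
etas = np.linspace(-0.2, 0.2, 41)
p_sup = max(two_sided_p((y.mean() - eta) / se) for eta in etas)
print(p_sup)
```

If p_sup stays below the significance level, the rejection is robust to every bias in the range H; Holley's inequality plays the role of making this worst case computable without brute force in the matched-study setting.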
• 45:05Okay, so that's the statistical inference.
• 45:08The third part, the third component
• 45:10is interpretation of sensitivity analysis.
• 45:13And this is the area that really needs
• 45:17a lot of good work at the moment.
• 45:20So, overall, there are two good ideas that seem to work,
• 45:26that seem to improve the interpretation
• 45:28of sensitivity analysis.
• 45:30The first is sensitivity value,
• 45:32the second is the calibration using measured confounders.
• 45:36So the sensitivity value is basically
• 45:38the value of the sensitivity parameter
• 45:41or the hyperparameter at which the conclusion changes.
• 45:48And in our motivating example,
• 45:51this is where the estimated average treatment effect
• 45:55is reduced by half. In Rosenbaum's sensitivity analysis,
• 45:59if you are familiar with that,
• 46:01this is the value of the gamma
• 46:03in his model
• 46:05where we can no longer reject the causal null hypothesis.
• 46:10So this can be seen as kind of an extension
• 46:14of the idea of a P value.
• 46:17The P value is used for the primary analysis,
• 46:19assuming no unmeasured confounding,
• 46:22and it basically measures
• 46:36how likely a false rejection is due to
• 46:39random chance.
• 46:44What the sensitivity value does
• 46:46is measure how sensitive your results are,
• 46:51in some sense: how much deviation
• 46:53from no unmeasured confounding it takes to change the conclusion.
• 46:58And for the sensitivity value,
• 47:01there often exists a phase transition phenomenon
• 47:04for partially identified inference.
• 47:07This is because if you take your hyperparameter gamma
• 47:11to be very large,
• 47:13then essentially your partially identified region becomes so wide that,
• 47:17no matter how large your sample size is,
• 47:20you can never reject the null.
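This threshold behavior can be illustrated with a toy sensitivity value computation (all numbers and the bias model are invented, not from the talk): we grow gamma, the half-width of the bias range H, until the worst-case P value first exceeds 0.05.

```python
import numpy as np
from math import erf, sqrt

def two_sided_p(z):
    """Two-sided P value for a standard normal test statistic."""
    return 2.0 * (1.0 - 0.5 * (1.0 + erf(abs(z) / sqrt(2.0))))

rng = np.random.default_rng(3)
y = rng.normal(0.3, 1.0, size=400)        # made-up data with true mean 0.3
se = y.std(ddof=1) / np.sqrt(len(y))

# Hypothetical bias model: gamma is the half-width of a mean-shift range,
# H = [-gamma, gamma]. For each gamma, compute the supremum of the P values
# over H; the sensitivity value is the smallest gamma where it exceeds 0.05.
def sup_p(gamma):
    etas = np.linspace(-gamma, gamma, 41)
    return max(two_sided_p((y.mean() - eta) / se) for eta in etas)

gammas = np.linspace(0.0, 0.5, 51)
sensitivity_value = next(g for g in gammas if sup_p(g) > 0.05)
print(sensitivity_value)
```

Beyond this gamma, no sample size rescues the rejection in this toy model, which is the phase transition flavor of the phenomenon Rosenbaum studied.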
• 47:23So this is sort of an interesting phenomenon,
• 47:28first discovered by Rosenbaum.
• 47:32A paper I wrote also clarified
• 47:38some issues about the phase transition.
• 47:44So, the second idea is the calibration
• 47:46using measured confounders.
• 47:49So, you have already seen an example
• 47:51in a motivating study.
• 47:54It's really a very necessary and practical solution
• 47:59to quantify the sensitivity,
• 48:01because it's not really very useful if you tell people,
• 48:05we are sensitive at gamma equals two,
• 48:08what does that really mean?
• 48:09That depends on some mathematical model.
• 48:13But if we can somehow compare that
• 48:15with what we do observe,
• 48:18and often
• 48:20the practitioners have some good sense
• 48:23about which confounders are important and which are not,
• 48:27Then this really gives us a way to calibrate
• 48:31and strengthen the conclusions of a sensitivity analysis.
• 48:35But unfortunately, although there are some good heuristics,
• 48:40they often suffer from some subtle issues,
• 48:44like the ones that I described
• 48:46in the beginning of the talk.
• 48:49If you carefully parameterize your models
• 48:51this can become easier.
• 48:54And this recent paper sort of explored this
• 48:56in terms of linear models.
• 49:01But really there's not a unifying framework yet
• 49:04that can cover more general cases,
• 49:08and a lot of work is needed.
• 49:11And when I was writing the slides,
• 49:13I thought maybe what we really need
• 49:15is to somehow build this calibration
• 49:18into the sensitivity model.
• 49:20Because currently our workflow is that
• 49:22we assume a sensitivity model,
• 49:24and we see where things get changed,
• 49:26and then we try to interpret those values
• 49:29where things get changed.
• 49:31But suppose we somehow build that in:
• 49:34if we let the range H of eta be defined
• 49:38in terms of this calibration,
• 49:40so that gamma directly means some kind of comparison
• 49:45with the measured confounders, this would solve
• 49:49a lot of the issues.
• 49:50This is just a thought I came up with
• 49:53when I was preparing for this talk.
• 49:56Okay, so to summarize,
• 49:59so there are a number of messages,
• 50:01which I hope you can take home.
• 50:05There are three components of a sensitivity analysis.
• 50:08Model augmentations, statistical inference
• 50:11and the interpretation of sensitivity analysis.
• 50:14So a sensitivity model is about parameterizing
• 50:17the full data distribution.
• 50:19And that's basically about over parameterizing
• 50:23the observed data distribution.
• 50:25And you can understand these models
• 50:26by the observational equivalence classes.
• 50:30You can get different model augmentations
• 50:33by factorizing the distribution differently
• 50:35and specifying different models
• 50:38for the parts that are unidentifiable.
• 50:41And there's a difference between point identified inference
• 50:45and partially identified inference,
• 50:47and partially identified inference is usually much harder.
• 50:52And there are two general approaches
• 50:55for partially identified inference,
• 50:57bound estimation and combining point identified inference.
• 51:02For interpretation of sensitivity analysis,
• 51:05there seem to be two good ideas so far,
• 51:08to use the sensitivity value,
• 51:10and to calibrate that sensitivity value
• 51:13using measured confounders.
• 51:16But overall,
• 51:18I'd say this is still
• 51:23a very open area
• 51:26where a lot of work is needed.
• 51:28Even for this prototypical example
• 51:31that people have studied for decades,
• 51:33it seems there's still a lot of questions
• 51:36that are unresolved.
• 51:38And there are methods that need to be developed
• 51:41for this sensitivity analysis
• 51:45to be regularly used in practice.
• 51:48And then there are many other related problems
• 51:51in missing data and causal inference
• 51:54that need to see more development of sensitivity analysis.
• 51:59So that's the end of my talk.
• 52:01And here are some references that I used.
• 52:05I'm happy to take any questions.
• 52:08Still have about four minutes left.
• 52:11- Thank you.
• 52:13Yeah, thank you.
• 52:14Thank you, I'm sorry I couldn't introduce you earlier,
• 52:17but my connection did not work.
• 52:21So we have time for a couple of questions.
• 52:26You can write the question in the chat box,
• 52:29or just unmute yourselves.
• 52:43Any questions?
• 53:00This was a great connection between, I think,
• 53:04the sensitivity analysis literature
• 53:06and the missing data literature,
• 53:09which I think is kind of overlooked.
• 53:12Even when you run a parametric sensitivity analysis,
• 53:17it's really something, like most of the time
• 53:20people really don't understand
• 53:22how much information is given.
• 53:25Like, how much information the model actually gives
• 53:29on the sensitivity parameters.
• 53:32And as you said,
• 53:34like it's kind of inconsistent
• 53:36to set the sensitivity parameters
• 53:37when sensitivity parameters are actually identified
• 53:40by the model.
• 53:43So I guess my clarifying
• 53:46question is,
• 53:49you mentioned there are these testable models,
• 53:54and these testable models essentially are where
• 53:56the sensitivity model is such that
• 54:00the sensitivity parameters are actually point identified.
• 54:04Right?
• 54:04- Yes.
• 54:05So you said,
• 54:08you shouldn't use the sensitivity analysis
• 54:11to actually set the parameters
• 54:14if the sensitivity parameters
• 54:16are actually identified by the model.
• 54:18- Yeah.
• 54:19- Is that what you're saying?
• 54:21All right, so and. - Yes, yeah.
• 54:23Basically what happened there is the model is too specific,
• 54:27and it wasn't constructed carefully.
• 54:30So it's possible to construct parametric models
• 54:33that are not testable that are perfectly fine.
• 54:37But sometimes, if you just sort of
• 54:40write down the most natural model,
• 54:42if you just extend the parametric model
• 54:46you used for the observed data to also model the full data,
• 54:52and you don't do it carefully,
• 54:54then the entire full data distribution becomes identifiable.
• 55:00So it doesn't make sense to treat those parameters
• 55:02as sensitivity parameters.
• 55:05So this is kind of reminiscent of the discussion
• 55:08in the 80s about the Heckman selection model.
• 55:12Because in that case,
• 55:14Heckman had this great selection model
• 55:18for reducing or getting rid of selection bias,
• 55:23but it's based on very heavy parametric assumptions.
• 55:27And you can actually identify the selection effect
• 55:32directly from the model even where you have no data
• 55:36to support that identification,
• 55:40which led to some criticisms in the 80s.
• 55:45But I think we are seeing these things repeatedly,
• 55:51again and again, in different areas.
• 55:55And I think it's fine
• 55:59to use parametric models that are testable, actually,
• 56:05if you really believe in those models,
• 56:07but it doesn't seem that they should be used
• 56:09for sensitivity analysis,
• 56:12because just logically,
• 56:13it's a bit strange.
• 56:15It's hard to interpret those models.
• 56:20But sometimes I've also seen people
• 56:24who parameterize the model
• 56:28in a way that includes enough terms,
• 56:31so the sensitivity parameters are only weakly identified
• 56:35in a practical example.
• 56:38So with a practical data set, maybe the
• 56:44Likelihood Ratio Test
• 56:46acceptance region is very, very large.
• 56:50So there are suggestions like that,
• 56:53which are sort of a compromise
• 56:58for good practice.
• 57:02- Right, in that case you can either set the parameters
• 57:06and derive the causal effects,
• 57:09or kind of treat that as a partial identification problem
• 57:13and just use bounds or the methods
• 57:17you were mentioning, I guess.
• 57:20- Yeah.
• 57:21- Yep, thanks.
• 57:26Other questions?
• 57:34Well I guess you can read the question?
• 57:37- It's a question from Kiel Sint.
• 57:41Sorry if I didn't pronounce your name correctly.
• 57:43"In the applications of observational studies ideally,
• 57:46what confounders should be collected
• 57:47for sensitivity analysis,
• 57:49power sensitivity analysis for unmeasured confounding?"
• 57:54Thank you.
• 57:54So if I understand your question correctly,
• 58:01basically what sensitivity analysis does
• 58:04is you have an observational study
• 58:06where you have already collected confounders
• 58:10that you believe are important or relevant,
• 58:13that are real confounders,
• 58:16affecting both the treatment
• 58:20and the outcome.
• 58:22But often that's not enough.
• 58:25And what sensitivity analysis does is it tries to say,
• 58:29"based on the confounders we already have,
• 58:34what if there is still something missing
• 58:37that we didn't collect?
• 58:39And then if those things behave in a certain way,
• 58:44does that change our results?"
• 58:47So, I guess sensitivity analysis is always relative
• 58:52to a primary analysis.
• 58:54So I think you should use the same set of confounders
• 58:58that the primary analysis uses.
• 59:02I don't see a lot of reason to, say,
• 59:10use a primary analysis with more confounders
• 59:14but a sensitivity analysis with fewer confounders.
• 59:21Sensitivity analysis is really a supplement
• 59:23to what you have in the primary analysis.
• 59:35- Just one more question, if we have one?
• 59:40There not.
• 59:41Yes.
• 59:43- So from Ching Hou Soo,
• 59:45"How to specify the setup sensitivity parameter gamma
• 59:49in the real life question?
• 59:51When gamma is too large the inference results
• 59:54will always be non informative?"
• 59:57Yes, this is always a tricky problem,
• 01:00:01and essentially the sensitivity value is kind of
• 01:00:06trying to get past that.
• 01:00:09So it tries to directly look at the value
• 01:00:11of this sensitivity parameter that changes your conclusion.
• 01:00:15So in some sense, you don't need to specify
• 01:00:19a parameter a priori.
• 01:00:21But obviously, in the end of the day,
• 01:00:25we need some clue about what value of sensitivity parameter
• 01:00:30is considered large.
• 01:00:31In a practical sense, in this application.
• 01:00:36That's something this calibration analysis
• 01:00:39is trying to address.
• 01:00:44But as I said,
• 01:00:44they're not perfect at the moment.
• 01:00:47So for some time at least,
• 01:00:52we'll have to sort of live with this:
• 01:00:56we will either need to really understand
• 01:01:01what the sensitivity model means,
• 01:01:02and then use domain knowledge
• 01:01:06to set the sensitivity parameter,
• 01:01:10or we will have to rely on these
• 01:01:15imperfect calibration tools.
• 01:01:28- Yeah, all right.
• 01:01:29Thank you.
• 01:01:30I think we need to wrap up; we've run over time.
• 01:01:33So thank you again Qingyuan,
• 01:01:36for sharing your work with us.
• 01:01:38And thank you, everyone for joining.
• 01:01:41Thank you.
• 01:01:42Bye bye.
• 01:01:43See you next week.
• 01:01:44- It's a great pleasure.
• 01:01:45Thank you.