YSPH Biostatistics Seminar: "Sensitivity Analysis for Observational Studies"

September 10, 2020

Information

Qingyuan Zhao, University of Cambridge

ID5568


00:00- Seminar, so hello everyone.
00:03My name is Qingyuan Zhao,
00:05I'm currently a University Lecturer in Statistics
00:10at the University of Cambridge.
00:13I visited Yale Biostats
00:15briefly last year, in February.
00:21So it's nice to see everyone again, virtually this time.
00:28And today I'll talk
00:30about sensitivity analysis for observational studies,
00:34looking back and moving forward.
00:37So this is based on ongoing work
00:39with several people: Bo Zhang, Ting Ye and Dylan Small
00:45at the University of Pennsylvania,
00:46and also Joe Hogan at Brown University.
00:52So sensitivity analysis is really a very broad term,
00:59and you can find it in almost any area
01:02that uses mathematical models.
01:06So, broadly speaking,
01:08what it tries to do is study how the uncertainty
01:12in the output of a mathematical model or system,
01:17numerical or otherwise, can be apportioned
01:20to different sources of uncertainty in its inputs.
01:24So it's an extremely broad concept.
01:27And you can even see statistics as part
01:30of a sensitivity analysis in some sense.
01:35But here, there can be many kinds of model inputs.
01:41So, in particular,
01:43it can be any factor that can be changed in a model
01:47prior to its execution.
01:50So one example is structural
01:53or epistemic sources of uncertainty,
01:57and these are the sort of things we'll talk about.
02:01So basically, what we'll talk about today
02:03are those things that we don't really know.
02:07I mean, we make a lot of assumptions
02:09when proposing such a model.
02:13So in the context of observational studies,
02:16a very common and typical question
02:20that requires sensitivity analysis is the following.
02:24How do the qualitative and/or quantitative conclusions
02:29of the observational study change
02:31if the no unmeasured confounding assumption is violated?
02:35So this is really common because essentially,
02:39in the vast majority of observational studies,
02:42it's essential to assume this
02:45no unmeasured confounding assumption,
02:47and this is an assumption that we cannot test
02:50with empirical data,
02:52at least with just observational data.
02:55So if you do any observational study,
02:59you're almost bound to be asked this question:
03:02what if this assumption doesn't hold?
03:06And I'd like to point out that this question
03:08is fundamentally connected to missing not at random
03:12in the missing data literature.
03:14So what I will do today is I'll focus
03:16on sensitivity analysis for observational studies,
03:20but a lot of the ideas are drawn
03:22from the missing data literature.
03:24And most of the ideas that I'll talk about
03:28today can also be applied there
03:30and to related problems as well.
03:35So, currently, the state of the art of sensitivity analysis
03:40for observational studies is the following.
03:43There are many, many, gazillions of methods,
03:47that's an exaggeration, but certainly many, many methods
03:50that are specifically designed for different
03:54kinds of sensitivity analysis.
03:58They often also depend on how you analyze your data
04:03under the no unmeasured confounding assumption.
04:06There are various forms of statistical guarantees
04:09that have been proposed.
04:11And oftentimes, these methods are not always
04:15straightforward to interpret,
04:17at least for inexperienced researchers,
04:20they can be quite complicated and confusing.
04:26The goal of this talk is to give you a high level overview.
04:31So this is not a talk where I'm gonna unveil
04:34a lot of new methods.
04:36This is more of an overview kind of talk
04:40that just tries to go through
04:42some of the main ideas in this area.
04:46So in particular,
04:47what I want to address are the following two questions.
04:52What is the common structure behind
04:54all these sensitivity analysis methods?
04:57And what are some good principles and ideas we should follow
05:02and perhaps extend when we have similar problems?
05:06The perspective of this talk will be global and frequentist.
05:10By that, I mean,
05:12there's an area in sensitivity analysis
05:14called local sensitivity analysis,
05:16where you're only allowed to move your parameter
05:19near its maximum likelihood estimate, usually.
05:25But global sensitivity analysis refers to methods
05:29where you can vary your sensitivity parameter
05:31freely in a space.
05:35So that's what we'll focus on today.
05:38And also, I'll take a frequentist perspective.
05:40So I won't talk about Bayesian sensitivity analysis,
05:44which is also a big area.
05:46And I'll use this prototypical setup
05:50in observational studies,
05:52where you have iid copies of the observed data O,
05:56which has three parts: X are the covariates,
06:00A is the binary treatment, Y is the outcome,
06:04and these observed data
06:06come from the underlying full data F,
06:10which includes X and A
06:13and the potential outcomes, Y(0) and Y(1).
06:17Okay, so
06:19most of you have probably seen this
06:21many, many times already,
06:24but if you haven't seen it, this
06:25is the most typical setup in observational studies.
06:29And it kind of gets a little bit boring
06:30when you see it so many times.
06:32But what we're trying to do
06:34is to use this as the simplest example,
06:37to demonstrate the structure and ideas.
06:41And hopefully, if you understand these good ideas,
06:46you can apply them to your problems
06:50that are maybe slightly more complicated than this.
06:55So here's the outline
06:57and I'll give a motivating example
06:59then I'll talk about three components
07:01in the sensitivity analysis.
07:03They are the sensitivity model,
07:04the statistical inference and the interpretation.
07:10So the motivating example will sort of demonstrate
07:13where these three components come from.
07:16This example is actually from the social sciences;
07:21it's about child soldiering,
07:24a paper by Blattman and Annan (2010)
07:30in The Review of Economics and Statistics.
07:34So what they studied is this period of time in Uganda,
07:41from 1995 to 2004,
07:44where there was a civil war
07:46and about 60,000 to 80,000 youth
07:49were abducted by a rebel force.
07:53So the question is,
07:54what is the impact of child soldiering,
07:58that is, abduction by the rebel force,
08:01on various outcomes,
08:04such as years of education?
08:08The paper actually studies a number of outcomes.
08:13The authors controlled for a variety of baseline covariates,
08:17like the children's age, their household size,
08:20their parental education, et cetera.
08:23They were quite concerned about
08:26a possible unmeasured confounder,
08:28that is, the child's ability to hide from the rebels.
08:33So it's possible that if a child is smart,
08:39and he or she knows
08:41how to hide from the rebels,
08:44then he or she is less likely to be abducted
08:49in this data set.
08:52And he or she will probably also be more likely
08:55to receive a longer education, just because maybe
09:00the kid is a bit smarter, let's say.
09:06So in their analysis,
09:07they follow the model proposed by Imbens,
09:11which is the following.
09:12Basically, they assume no unmeasured confounding
09:18after conditioning on an unmeasured confounder U.
09:22Okay, so X are all the covariates
09:24that they controlled for,
09:26and U, they assumed, is a binary unmeasured confounder
09:32that is just a coin flip.
09:36And then they assume the logistic model
09:39for the probability of being abducted
09:44and the normal linear model for the potential outcomes.
09:49So notice that here the linear predictors
09:55depend not only on the observed covariates,
09:58but also on the unmeasured confounder U.
10:01And of course,
10:02we don't measure this U,
10:04so we cannot directly fit these models.
10:09But because they made
10:12some distributional assumptions on U,
10:16you can treat U as a latent variable
10:19and then, for example,
10:21compute the maximum likelihood estimate.
10:25So they treated these two parameters, lambda and delta,
10:29as sensitivity parameters.
10:32These are the parameters that you vary
10:35in a sensitivity analysis.
10:37When they're both equal to zero,
10:39there is no unmeasured confounding,
10:43so you can actually just ignore this confounder U.
10:46That corresponds to your primary analysis,
10:48but in a sensitivity analysis,
10:50you change the values of lambda and delta
10:53and you see how that changes your result
10:55about this parameter beta,
10:57which is interpreted as a causal effect.
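A minimal sketch of how a model of this kind can be fitted with the sensitivity parameters held fixed, assuming a single scalar covariate and a parameterization of our own (the names `imbens_loglik`, `fit_beta`, `kappa0`, `kappa1`, `mu0`, `mu1` are hypothetical, and this is not the authors' code). The binary confounder U ~ Bernoulli(1/2) is integrated out of the likelihood, and for each (lambda, delta) the remaining parameters, including beta, are estimated by maximum likelihood.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp
from scipy.stats import norm


def imbens_loglik(params, lam, delta, x, a, y):
    """Observed-data log-likelihood with a latent binary confounder U ~ Bernoulli(1/2).

    Hypothetical parameterization (scalar covariate x):
      logit P(A = 1 | X, U) = kappa0 + kappa1 * X + lam * U
      Y | A, X, U ~ Normal(mu0 + beta * A + mu1 * X + delta * U, sigma^2)
    """
    kappa0, kappa1, mu0, beta, mu1, log_sigma = params
    sigma = np.exp(log_sigma)
    terms = []
    for u in (0, 1):
        lin = kappa0 + kappa1 * x + lam * u
        # log P(A = a | X, U = u) computed stably as -log(1 + exp(-(2a - 1) * lin))
        log_pa = -np.logaddexp(0.0, -(2 * a - 1) * lin)
        log_fy = norm.logpdf(y, loc=mu0 + beta * a + mu1 * x + delta * u, scale=sigma)
        terms.append(np.log(0.5) + log_pa + log_fy)
    # Marginalize U for each unit: log sum_u 0.5 * P(A | X, u) * f(Y | A, X, u)
    return logsumexp(np.stack(terms), axis=0).sum()


def fit_beta(lam, delta, x, a, y):
    """Maximize the likelihood over the other parameters with (lam, delta) held fixed."""
    start = np.array([0.0, 0.0, y.mean(), 0.0, 0.0, np.log(y.std())])
    res = minimize(lambda p: -imbens_loglik(p, lam, delta, x, a, y), start, method="BFGS")
    return res.x[3]  # the coefficient beta on the treatment
```

With lam = delta = 0, U drops out and `fit_beta` should reproduce the least-squares coefficient of A in a regression of Y on (1, A, X). Relabeling U as 1 - U maps (lambda, delta) to (-lambda, -delta) once the two intercepts are shifted, which is one way to see why the likelihood is symmetric in these parameters.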
11:02Okay, so the results can be summarized in this one slide.
11:06I mean they've done a lot more definitely.
11:08But for the purpose of this talk, basically,
11:12what they found is that the primary analysis
11:15found that the average treatment effect is -0.76.
11:19So remember the outcome was years of education.
11:21So being abducted
11:23has a significant negative effect on education.
11:30And then they did a sensitivity analysis,
11:32which can be summarized in this calibration plot.
11:36What is shown here is that these two axes
11:40are basically the two sensitivity parameters,
11:43lambda and delta.
11:45What the paper did is transform them
11:48into the increase in R-squared,
11:51but that can be mapped back to lambda and delta.
11:56And then they compared
11:59this curve, so this dashed curve
12:03is where the values of lambda and delta are such that
12:07the treatment effect is reduced by half.
12:11And then they compared this curve
12:13with the measured confounders,
12:15like location of birth,
12:17year of birth, et cetera,
12:21using the corresponding coefficients
12:25of those variables in the model,
12:31and then they just plot these in the same figure.
12:37What this is supposed to show is: look,
12:39this is the point where the treatment effect
12:42is reduced by half,
12:44and this is about the same strength
12:47as location or year of birth alone.
12:50So, if you think your unmeasured confounder is in some sense
12:54as strong as the location or the year of birth,
12:58then it is possible that the treatment effect
13:01is half of what it is estimated to be.
13:05Okay, so it's a pretty neat way
13:08to present a sensitivity analysis.
13:12So in this example, you see,
13:14there are three components of sensitivity analysis.
13:17First is model augmentation:
13:19you need to expand the model used by the primary analysis
13:24to allow for unmeasured confounding.
13:26Second, you need to do statistical inference.
13:30So you vary the sensitivity parameter,
13:32estimate the effect,
13:33and then control some statistical errors.
13:36So what they did
13:38is essentially vary lambda and delta
13:42and estimate the average treatment effect
13:45under each value of lambda and delta.
13:49And the third component is to interpret the results.
13:52So this paper relied on that calibration plot
13:56for that purpose.
13:58But this is often quite tricky
14:01because the sensitivity analysis is complicated
14:05as we need to probe different directions
14:07of unmeasured confounding.
14:09So the interpretation is actually not always straightforward
14:14and sometimes can be quite complicated.
14:19There do exist two issues
14:23with this analysis.
14:26Here I'm just rewriting the model.
14:30The first issue is that the sensitivity parameters
14:33lambda and delta,
14:34which we vary in a sensitivity analysis,
14:38are actually identifiable from the observed data.
14:41This is because this is a fully parametric model,
14:44and it's not constructed in any way
14:47that makes lambda and delta non-identifiable.
14:51In fact, in the next slide,
14:53I'm going to show you some empirical evidence
14:55that you can actually estimate these two parameters.
14:59So, logically, it is inconsistent for us
15:02to vary the sensitivity parameters,
15:05because if we truly believe in this model,
15:07then the data actually tell us what the values
15:09of lambda and delta are.
15:11This is similar to the criticism
15:13of the Heckman selection model, for example.
15:20The second issue is a bit subtle.
15:23In the calibration plot,
15:25they use the partial R squared
15:27as a way to express lambda and delta
15:33in a more interpretable way.
15:36But actually, the partial R squared for the observed
15:38and unobserved confounders are not directly comparable,
15:42because they use different reference models
15:46to start with.
15:48So you need to be quite careful
15:50about interpreting these calibration plots.
15:56So, here is the evidence I promised, which suggests
16:01that you can actually identify
16:02these two sensitivity parameters, lambda and delta.
16:06Here the red dots
16:08are the maximum likelihood estimators,
16:11and the regions enclosed by these solid curves
16:14are the rejection,
16:16or I should say acceptance, regions
16:20for the likelihood ratio test,
16:23at levels 0.50,
16:260.10 and 0.05.
16:30There is a symmetry around the origin
16:34because the distribution of U is symmetric,
16:37so (lambda, delta) is equivalent
16:41to (minus lambda, minus delta).
16:43But what you see
16:44is that you can actually estimate lambda and delta,
16:47and you can sort of estimate them
16:50to lie in a certain region.
16:53So, something a bit interesting here
16:56is that you can say more about delta,
17:01which is the parameter for the outcome,
17:04than about lambda, the parameter for the treatment.
17:09But in any case,
17:11it doesn't look like we can just vary
17:13the parameters lambda and delta freely in this space
17:16and then expect to get different results
17:19for each point.
17:23What we actually get is some estimate
17:25of these sensitivity parameters.
17:28So the lesson here is that
17:30if you use parametric sensitivity models,
17:32then they need to be carefully constructed
17:35to avoid this kind of issue.
17:40So next I'll talk about the first component
17:43of the sensitivity analysis,
17:44which is your sensitivity model.
17:48So very generally,
17:51if you think about what a sensitivity model is,
17:54it's essentially a model for the full data F,
18:00which includes some things that are not observed.
18:03So, what we are trying to do here
18:05is to infer the full data distribution
18:08from the observed data O.
18:11So a sensitivity model is basically
18:14a family of distributions of the full data
18:18that is parameterized by two parameters, theta and eta.
18:23I'm using eta to stand for the sensitivity parameters,
18:27and theta stands for the other parameters
18:29that parameterize the distribution.
18:33The sensitivity model needs to satisfy two properties.
18:38First of all,
18:40if we set the sensitivity parameter eta equal to zero,
18:44then that should correspond to our primary analysis
18:48assuming no unmeasured confounders.
18:49I call this augmentation.
18:51The second property is that, given the value
18:56of the sensitivity parameter eta,
18:59we can actually identify the parameter theta
19:03from the observed data.
19:06This is sort of a minimal assumption;
19:08otherwise, the model is simply too rich.
19:12I call this model identifiability.
19:15So the statistical problem in sensitivity analysis
19:18is: if I give you the value of eta
19:20or the range of eta,
19:23can you use the observed data to make inference
19:26about some causal parameter that is a function
19:29of theta and eta?
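In symbols, the abstraction might be written as follows. The notation (parameter spaces $\Theta$ and $H$, and $P^{O}_{\theta,\eta}$ for the distribution of $O = (X, A, Y)$ implied by the full-data distribution $P_{\theta,\eta}$) is our own shorthand and may differ from the slides.

```latex
\begin{aligned}
&\text{Sensitivity model:} && \{P_{\theta,\eta} : \theta \in \Theta,\ \eta \in H\}
  \ \text{for the full data } F = (X, A, Y(0), Y(1)),\\
&\text{Augmentation:} && \{P_{\theta,0} : \theta \in \Theta\}
  \ \text{is the primary model (no unmeasured confounding)},\\
&\text{Model identifiability:} && P^{O}_{\theta,\eta} = P^{O}_{\theta',\eta}
  \implies \theta = \theta' \quad \text{for every fixed } \eta,\\
&\text{Target:} && \beta = \beta(\theta,\eta), \ \text{e.g. } \beta = E[Y(1) - Y(0)].
\end{aligned}
```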
19:32Okay, so this is a very general abstraction
19:37of what we have seen in the previous example.
19:43But it's a bit too general.
19:45So let's make it slightly more concrete
19:49by understanding observational equivalence classes.
19:55So essentially, what we're doing
19:58is we observe some data,
19:59but we know there's underlying full data,
20:02some of which is not observed.
20:05And instead of just modeling the observed data,
20:08we're modeling the full data.
20:10That makes our model quite rich,
20:14because we're modeling something that is not all observed.
20:18For that purpose, it is useful to define an
20:21observational equivalence relation
20:24between two full data distributions,
20:27which just means that their implied
20:30observed data distributions are exactly the same.
20:34We write this with the equivalence symbol,
20:39which looks like an approximately equal sign.
20:43Then we can define the equivalence class
20:45of a full data distribution,
20:48which consists of all the other full data distributions
20:51in this family that are observationally equivalent
20:55to that distribution.
20:58Then we can classify these sensitivity models
21:02based on the behavior of these equivalence classes.
21:07So, what happened in the last example
21:10is that the full data model
21:15is not rich enough.
21:16These equivalence classes are just singletons,
21:20so we can actually identify the sensitivity parameter eta
21:24from the observed data.
21:26This makes the model testable in some sense,
21:31or rather the choice of sensitivity parameter testable,
21:35and this should generally be avoided in practice.
21:39Then there are the global sensitivity models,
21:43where you can basically freely vary
21:46the sensitivity parameter eta:
21:48for any eta you can always find a theta
21:51such that (theta, eta) is observationally equivalent
21:54to where you started from.
21:57And then there are even nicer models, the separable models,
22:01where basically this eta,
22:04the sensitivity parameter, doesn't change
22:07the observed data distribution at all.
22:12So for any theta and eta,
22:14(theta, eta) is equivalent to (theta, 0).
22:18These are really nice models to work with.
22:22To understand the difference between global models
22:26and separable models:
22:28basically, they have different shapes
22:34of equivalence classes.
22:37For separable models,
22:40these equivalence classes
22:42need to be perpendicular to the theta axis,
22:46but that's not needed for global sensitivity models.
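The equivalence relation and the three kinds of models might be summarized as below. The notation is ours, with $P^{O}_{\theta,\eta}$ denoting the observed-data distribution implied by $P_{\theta,\eta}$.

```latex
\begin{aligned}
(\theta,\eta) \simeq (\theta',\eta') \;&\iff\; P^{O}_{\theta,\eta} = P^{O}_{\theta',\eta'},\\
[\theta,\eta] \;&=\; \{(\theta',\eta') : (\theta',\eta') \simeq (\theta,\eta)\},\\
\text{testable (avoid):}\;& [\theta,\eta] = \{(\theta,\eta)\}
  \quad \text{(singletons, so } \eta \text{ is identified)},\\
\text{global:}\;& \forall\, \eta' \in H\ \exists\, \theta' :\ (\theta',\eta') \simeq (\theta,\eta),\\
\text{separable:}\;& (\theta,\eta) \simeq (\theta,0) \quad \text{for all } \theta, \eta.
\end{aligned}
```

Every separable model is global: given $\eta'$, simply take $\theta' = \theta$.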
22:53So I've talked about what a sensitivity model means
22:57and some basic properties of it,
23:00but haven't talked about how to build one.
23:02So generally, in this setup,
23:05there are three ways to build a sensitivity model,
23:08and they essentially correspond
23:09to different factorizations
23:11of the full data distribution.
23:13There's the simultaneous model,
23:15which factorizes the distribution this way:
23:19it introduces an unmeasured confounder U,
23:22and then you need to model
23:24these three conditional probabilities.
23:27There's also the treatment model,
23:31which doesn't rely on an unmeasured confounder U.
23:35Instead, what you need to specify is the distribution
23:39of the treatment given the potential outcomes and X.
23:44Once you've specified that, you can use Bayes' formula
23:46to get this part.
23:50And then there's the outcome model, which factorizes
23:54this distribution the other way.
23:57This is basically the propensity score,
24:00and the third term is what we need to specify
24:03with a sensitivity parameter.
24:06In the missing data literature,
24:09the second kind of model
24:11is usually called a selection model,
24:13and the third kind of model is usually called
24:16a pattern-mixture model,
24:17and there are other names that have been given to them.
24:23Basically, different sensitivity models
24:26amount to different ways of specifying these
24:31non-identifiable distributions,
24:33which are the ones that are underlined.
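Written out, the three factorizations look roughly like this. This is our reading of the verbal description: we underline the factors that involve unobserved quantities and must therefore be pinned down through sensitivity parameters, which may not match the slide's underlining exactly.

```latex
\begin{aligned}
\text{Simultaneous:}\quad & p(x)\,\underline{p(u \mid x)}\,\underline{p(a \mid x, u)}\,
  \underline{p(y(0), y(1) \mid x, u)} \quad (u \text{ then integrated out})\\
\text{Treatment (selection):}\quad & p(x)\,p(y(0), y(1) \mid x)\,\underline{p(a \mid x, y(0), y(1))}\\
\text{Outcome (pattern-mixture):}\quad & p(x)\,p(a \mid x)\,\underline{p(y(0), y(1) \mid x, a)}
\end{aligned}
```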
24:37A good review is this report by a committee
24:42organized by the National Research Council.
24:46This ongoing review paper that we're writing
24:50also gives a comprehensive review of many models
24:54that have been proposed using these factorizations.
25:00Okay, so that's about the sensitivity model.
25:03The next component is statistical inference.
25:11Things get a little bit tricky here,
25:14because there are two kinds
25:17or two modes of inference we can talk about
25:19in this setting.
25:21The first mode of inference is point-identified inference,
25:24where you only care about a fixed value
25:27of the sensitivity parameter eta.
25:32And the second kind of inference
25:34is partially identified inference,
25:36where you perform statistical inference simultaneously
25:40for a range of sensitivity parameters eta,
25:44and that range H is given to you.
25:50These different modes of inference
25:54come with different statistical guarantees.
25:57For point-identified inference, let's say
26:03for interval estimators,
26:04you want to construct confidence intervals.
26:08These confidence intervals depend on the observed data
26:12and the sensitivity parameter, which
26:15you're allowed to use
26:17in point-identified inference,
26:20and they must cover the true parameter
26:23with probability at least one minus alpha
26:25for all the distributions in your model.
26:28That's the infimum.
26:30But for partially identified inference,
26:35you're only allowed to use an interval
26:38that depends on the range H.
26:41It cannot depend on a specific value
26:43of the sensitivity parameter,
26:46because you only know eta is in this range H.
26:50It needs to satisfy a very similar criterion.
26:56I call intervals that satisfy this criterion
26:59sensitivity intervals.
27:01In the literature, people have also called them
27:03uncertainty intervals or just confidence intervals,
27:08but to distinguish them from the first case,
27:11we call them sensitivity intervals here.
27:15So you can see that these two
27:19criteria look very similar,
27:22except that this interval needs to depend on the range
27:25instead of a particular value of the sensitivity parameter.
27:29But actually, they're quite different.
27:31The sensitivity interval is usually much wider.
27:34The reason is,
27:36you can actually write an equivalent form
27:37of equation one:
27:40because this interval only depends on the observed data
27:45and the range H,
27:46for every eta in that range H
27:49(the "eta in H" is missing on the slide)
27:52such that (theta, eta) is
27:56observationally equivalent to the true distribution,
28:00this interval also needs to cover
28:02the corresponding causal parameter.
28:07So in that sense,
28:08this is a much stronger guarantee.
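The two coverage criteria, and the equivalent form of the sensitivity-interval criterion, can be written as follows. The endpoint notation $C_L$, $C_U$ and the labels are our assumptions.

```latex
\begin{aligned}
&\text{Point-identified CI:} &&
  \inf_{\theta,\ \eta}\ P_{\theta,\eta}\big\{\beta(\theta,\eta) \in [C_L(O;\eta),\, C_U(O;\eta)]\big\} \ge 1-\alpha,\\
&\text{Sensitivity interval (1):} &&
  \inf_{\theta,\ \eta \in H}\ P_{\theta,\eta}\big\{\beta(\theta,\eta) \in [C_L(O;H),\, C_U(O;H)]\big\} \ge 1-\alpha,\\
&\text{Equivalent form of (1):} &&
  \inf_{\theta,\ \eta \in H}\ \inf_{\substack{(\theta',\eta') \simeq (\theta,\eta)\\ \eta' \in H}}
  P_{\theta,\eta}\big\{\beta(\theta',\eta') \in [C_L(O;H),\, C_U(O;H)]\big\} \ge 1-\alpha.
\end{aligned}
```

The equivalence holds because $O$ has the same distribution under $(\theta,\eta)$ and $(\theta',\eta')$ whenever the two are observationally equivalent.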
28:16So, in terms of statistical methods,
28:21point-identified inference is usually quite straightforward.
28:26It's very similar to our primary analysis.
28:29The primary analysis assumes eta equals zero,
28:32while this sensitivity analysis assumes eta is known.
28:36So usually you can just plug in
28:38eta in some way, for example as an offset in your model,
28:42and then everything works out in almost the same way
28:45as in the primary analysis.
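As a hypothetical illustration of plugging eta in as an offset, suppose the sensitivity model says that, within levels of X, the treated units' mean untreated outcome differs from the controls' by eta: E[Y(0) | X, A = 1] = E[Y(0) | X, A = 0] + eta. This specific model and the function name `att_with_offset` are our assumptions, chosen because eta then enters the ATT estimate exactly as an offset.

```python
import numpy as np


def att_with_offset(x, a, y, eta):
    """ATT estimate under the hypothetical model E[Y(0)|X,A=1] = E[Y(0)|X,A=0] + eta.

    eta = 0 reproduces the primary (no unmeasured confounding) regression estimate.
    """
    x = np.asarray(x, dtype=float).reshape(len(y), -1)   # allow 1-D or 2-D covariates
    design = np.column_stack([np.ones(len(y)), x])
    # Outcome regression fitted on controls only: estimates E[Y(0) | X, A = 0].
    coef, *_ = np.linalg.lstsq(design[a == 0], y[a == 0], rcond=None)
    # Impute Y(0) for treated units and add the sensitivity parameter as an offset.
    y0_treated = design[a == 1] @ coef + eta
    return np.mean(y[a == 1] - y0_treated)
```

Every fixed-eta estimate is the primary estimate shifted by -eta, and confidence intervals for fixed eta can be built exactly as in the primary analysis.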
28:48But for partially identified analysis,
28:50things become much more challenging,
28:55and there are several approaches
28:58that you can take.
29:00Essentially, there are two big classes of methods:
29:05one is bound estimation,
29:08the other is combining point-identified inference.
29:11Bound estimation
29:14tries to directly make inference about the two ends
29:18of the partially identified region.
29:21This set is the region of values of the parameter beta
29:26that are indistinguishable
29:29if I only know that the sensitivity parameter eta is in H.
29:35If we can somehow directly estimate the infimum and supremum
29:40of this set,
29:44then that gives us a way
29:46to make partially identified inference.
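With $(\theta_0, \eta_0)$ denoting the true parameter value, the partially identified region and its two ends might be written as follows (our notation). Bound estimation targets $\beta_L$ and $\beta_U$ directly.

```latex
\begin{aligned}
\mathrm{PIR}(H) &= \big\{\beta(\theta,\eta) : \eta \in H,\ (\theta,\eta) \simeq (\theta_0,\eta_0)\big\},\\
\beta_L &= \inf \mathrm{PIR}(H), \qquad \beta_U = \sup \mathrm{PIR}(H).
\end{aligned}
```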
29:50The second approach is basically
29:53to try to combine the results of point-identified inference.
29:59The main idea is to construct,
30:02let's say, interval estimators
30:05for each individual value of the sensitivity parameter
30:08and then take a union of them.
30:11So these are the two broad approaches
30:14to partially identified inference.
30:18And within the first approach,
30:20the bound estimation approach,
30:22there are several possible methods,
30:27depending on your problem.
30:35But before describing them, let's slightly change our notation
30:39and parameterize this range H by a hyperparameter gamma.
30:47This is useful when we outline these methods.
30:52And then beta L of gamma
30:55is the lower end of the partially identified region.
31:01So the first method is called separable bounds.
31:06What it tries to do is to write this lower end
31:11as a function of beta star and gamma,
31:15where beta star is the estimand of your primary analysis.
31:21So let's say (theta star, zero)
31:24is what you would use in a primary analysis
31:27that is observationally equivalent to the true distribution.
31:32And if beta star is the corresponding causal effect
31:37from that model,
31:39and if we can somehow write this lower end
31:42as a function of beta star and gamma,
31:46and the function is known,
31:47then our life is quite easy,
31:50because we already know how to make inference
31:53about beta star from the primary analysis.
31:55All we need to do is plug in
31:57the estimate of beta star into this formula,
31:59and then we're done.
32:02We call this separable because it allows us
32:06to separate the primary analysis
32:09from the sensitivity analysis,
32:11and statistical inference becomes a trivial extension
32:15of the primary analysis.
32:17Some examples of this kind of method
32:20include the classical Cornfield bound
32:26and the E-value,
32:27if you have heard about them,
32:29and the E-value seems quite popular
32:31in epidemiology these days.
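The E-value illustrates the separable structure. As we recall the Ding and VanderWeele bounding-factor result, if an unmeasured confounder has risk ratio at most RR_EU with the treatment and at most RR_UD with the outcome, then for an observed risk ratio RR_obs > 1 the true risk ratio is at least RR_obs / B, where B = RR_EU * RR_UD / (RR_EU + RR_UD - 1). The bound depends only on the primary-analysis quantity RR_obs and on gamma = (RR_EU, RR_UD); the E-value is the common value RR_EU = RR_UD at which B reaches RR_obs. The function names below are ours.

```python
import math


def bounding_factor(rr_eu, rr_ud):
    """Bounding factor for confounder strengths rr_eu, rr_ud (both >= 1)."""
    return rr_eu * rr_ud / (rr_eu + rr_ud - 1.0)


def e_value(rr):
    """E-value for an observed risk ratio; protective ratios (< 1) are inverted first."""
    rr = 1.0 / rr if rr < 1.0 else rr
    return rr + math.sqrt(rr * (rr - 1.0))


def lower_bound_rr(rr_obs, rr_eu, rr_ud):
    """Separable bound: smallest true risk ratio compatible with confounding of this strength (rr_obs > 1)."""
    return rr_obs / bounding_factor(rr_eu, rr_ud)
```

To carry sampling uncertainty through, the same bound can be applied to a confidence limit for RR_obs from the primary analysis, which is exactly the "plug in the primary analysis" step.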
32:37The second type of bound estimation
32:41is called tractable bounds.
32:45In these cases,
32:48we may derive the lower bound as a function
32:52of theta star and gamma.
32:54We are not able to reduce it to depend only
32:58on beta star, the causal effect
33:00under no unmeasured confounding,
33:04but we're able to express it in terms of theta star,
33:07and the function g_L is some tractable function
33:11that we can compute.
33:13This also makes our lives quite a lot easier,
33:17because we can just replace theta star,
33:21which can be nonparametric or parametric,
33:25by its empirical estimate.
33:28And often in these cases,
33:31we can find central limit theorems
33:35for the corresponding sample estimator,
33:38such that the sample estimator of the bounds
33:42converges to its true value at the root-n rate
33:46with a normal limit.
33:51And then, if we can estimate the standard error,
33:55we can use this central limit theorem
33:58to make partially identified inference,
34:02because we can estimate the bounds.
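A hypothetical illustration of a tractable bound (the model and names are ours, not from the talk): for a positive outcome, suppose E[Y(1) | A = 0] lies within a multiplicative factor gamma of E[Y | A = 1], and E[Y(0) | A = 1] within a factor gamma of E[Y | A = 0]. The ATE bounds then depend on theta* = (P(A = 1), E[Y | A = 1], E[Y | A = 0]), not only on their difference beta*, and can be estimated by plug-in. The sketch uses a nonparametric bootstrap for the standard errors in place of an analytic CLT variance.

```python
import numpy as np
from scipy.stats import norm


def ate_bounds(a, y, gamma):
    """Plug-in ATE bounds under a hypothetical multiplicative sensitivity model.

    Assumes a positive outcome and that E[Y(a) | A = 1 - a] lies within a factor
    gamma >= 1 of the observable mean E[Y | A = a]. Covariates are ignored for brevity.
    """
    p = a.mean()                                  # P(A = 1)
    m1, m0 = y[a == 1].mean(), y[a == 0].mean()   # E[Y | A = 1], E[Y | A = 0]
    lower = (p * m1 + (1 - p) * m1 / gamma) - ((1 - p) * m0 + p * m0 * gamma)
    upper = (p * m1 + (1 - p) * m1 * gamma) - ((1 - p) * m0 + p * m0 / gamma)
    return lower, upper


def sensitivity_interval(a, y, gamma, alpha=0.05, n_boot=2000, seed=0):
    """Widen each estimated bound by a normal quantile times its bootstrap standard error."""
    rng = np.random.default_rng(seed)
    n = len(y)
    boot = np.empty((n_boot, 2))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        boot[b] = ate_bounds(a[idx], y[idx], gamma)
    se_lower, se_upper = boot.std(axis=0, ddof=1)
    lower, upper = ate_bounds(a, y, gamma)
    z = norm.ppf(1 - alpha / 2)
    return lower - z * se_lower, upper + z * se_upper
```

At gamma = 1 both bounds collapse to the difference in means. The normal approximation here is a pointwise asymptotic argument, so its behavior near degenerate cases deserves care.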
34:07There are some examples in the literature,
34:09if you're familiar with these papers.
34:12But one thing to be careful about
34:14with these kinds of tractable bounds
34:16is that things get a little bit tricky
34:21with asymptotic theory.
34:24This is because in asymptotic theory,
34:27the confidence intervals, or the sensitivity intervals
34:30in this case,
34:32can be valid pointwise or uniformly as the sample size grows.
34:38So if the convergence,
34:45that is, the statistical guarantee, is only pointwise,
34:49then in some extreme cases,
34:56even with a very large sample size,
34:58there still exist data distributions
35:01under which your coverage is very poor.
35:05This point is discussed very heavily
35:08in the econometrics literature,
35:10and these are some references.
35:15So that's the second type of method
35:18in the first broad approach.
35:22The third kind of method
35:25is called stochastic programming,
35:28and this applies when the model is separable
35:34and we can write the parameter we're interested in
35:40as the expectation of some function
35:43of the data and the sensitivity parameter eta.
35:48In this case,
35:51the bound becomes the optimal value
35:54of an optimization problem,
35:56in which you want to minimize the expectation of some function,
36:01and the parameter in this function lies in the set
36:05defined by H.
36:11This type of problem is known as stochastic programming
36:14in the optimization literature.
36:17And what people do there
36:19is sample from the distribution,
36:22solve the empirical version of the problem,
36:26and use that as an approximate solution
36:29to the population optimization problem,
36:33which we can't evaluate directly.
36:36The method is called sample average approximation
36:39in the optimization literature.
36:42What is shown there,
36:47and Alex Shapiro did a lot of great work on this,
36:51is that for nice problems, with a compact set H
36:57and everything Euclidean,
36:59so finite-dimensional,
37:01you actually have a central limit theorem
37:04for the sample optimal value.
37:07The link between sensitivity analysis
37:12and stochastic programming is made in this paper
37:16by Tudball et al.
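A minimal sketch of sample average approximation for a one-dimensional compact set H = [h_low, h_high]: the expectation is replaced by a sample mean and the resulting deterministic problem is solved numerically. The helper name `saa_lower_bound` and the toy objective in the usage below are hypothetical stand-ins for the causal functional; with a multi-dimensional H one would swap in a constrained optimizer.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def saa_lower_bound(sample, f, h_low, h_high):
    """Sample average approximation of  min_{eta in [h_low, h_high]} E[f(O, eta)].

    `f(sample, eta)` must return one value per observation; the expectation is
    replaced by the sample mean and the scalar problem is solved on the interval.
    """
    res = minimize_scalar(lambda eta: np.mean(f(sample, eta)),
                          bounds=(h_low, h_high), method="bounded")
    return res.fun, res.x  # (estimated optimal value, estimated minimizer)
```

For instance, with f(O, eta) = (O - eta)^2 and H = [-1, 1], the population optimum is Var(O) whenever E[O] lies in H, and the SAA value is the plug-in sample variance.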
37:20Okay, so that's the first broad approach,
37:23doing bound estimation.
37:26The second broad approach is to combine the results
37:29of point-identified inference.
37:32The first possibility is to take a union
37:37of the individual confidence intervals.
37:40Suppose these are the confidence intervals
37:43when the sensitivity parameter eta is given.
37:47Then it is very simple to apply a union bound
37:51and show that if you take a union
37:54of these individual confidence intervals,
37:57it satisfies the criterion
38:01for a sensitivity interval.
38:03If you take the union, the interval only depends
38:07on the range H,
38:08and then you just apply the union bound
38:12to go from the fixed-eta guarantee to this criterion.
38:17And this can be slightly improved
38:20to cover not just these parameters,
38:23but also the entire partially identified region,
38:27if the confidence intervals
38:30have the same tail probabilities.
38:35We discuss this in our paper.
38:39And here, all we need to do
38:43is to compute this union,
38:46which is essentially an optimization problem:
38:49we'd like to minimize the lower bound,
38:52that is, the lower confidence limit C_L of eta, over eta in H,
38:59and similarly maximize the upper bound.
39:02Usually, using asymptotic theory,
39:05we can get some normal-based confidence
39:09intervals for each fixed eta,
39:12and then we just need to optimize
39:14this confidence interval over eta.
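A toy sketch of the union interval, assuming a hypothetical offset model in which the causal effect equals the difference in means minus eta, and approximating H by a grid. In this toy the standard error does not depend on eta, so the union simply stretches the primary Wald interval by the width of H; the function names are ours.

```python
import numpy as np
from scipy.stats import norm


def wald_ci(a, y, eta, alpha=0.05):
    """Normal-based CI for a difference in means shifted by a hypothetical offset eta."""
    y1, y0 = y[a == 1], y[a == 0]
    est = y1.mean() - y0.mean() - eta
    se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
    z = norm.ppf(1 - alpha / 2)
    return est - z * se, est + z * se


def union_interval(a, y, eta_grid, alpha=0.05):
    """Union of the fixed-eta intervals over a grid approximating the range H."""
    cis = np.array([wald_ci(a, y, eta, alpha) for eta in eta_grid])
    return cis[:, 0].min(), cis[:, 1].max()   # smallest lower limit, largest upper limit
```

In realistic models the standard error varies with eta in a nonlinear way, and this grid search becomes a genuine optimization over H.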
39:20But for many problems this can be
39:22computationally challenging, because the standard errors
39:26are usually quite complicated
39:30and have some very nonlinear dependence
39:32on the parameter eta,
39:34so optimizing this can be tricky.
39:40This is where another method, the percentile bootstrap,
39:44can greatly simplify the problem.
39:47It's proposed in this paper that we wrote,
39:53and what it does is, instead of using
39:56the asymptotic confidence interval for fixed eta,
40:01we use the percentile bootstrap interval,
40:04where we take data resamples,
40:06estimate the causal effect beta
40:11in each resample, and then take quantiles.
40:15Okay, so if you use this confidence interval,
40:19then there is a general,
40:25generalized minimax inequality that allows us to construct
40:29this percentile bootstrap sensitivity interval.
40:33So what it does is this thing in the inside
40:37is just the union of these percentile construct
40:41intervals for fixed eta,
40:45taken over eta in H.
40:49And then this generalized minimax inequality
40:51allows us to interchange the infimum with quanto
40:57and the supremum of a quanto.
41:00Okay, so the infimum of a quanto
41:01is greater than equal to the quanto of infimum
41:05and that it's always true.
41:07So it's just a generalization
41:09of the familia minimax inequality.
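The direction of this inequality can be checked numerically on any table of bootstrap estimates. In this sketch the table is simulated noise, standing in for a hypothetical matrix whose rows are resamples and whose columns are grid values of eta. The check holds for every such table: each row's minimum is no larger than any entry in that row, and empirical quantiles are monotone, so the quantile of the row minima cannot exceed any column's quantile (and symmetrically for maxima).

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulated stand-in for bootstrap output: rows are resamples,
# columns are eta values on a grid over H.
est = rng.normal(size=(500, 25)) + np.linspace(-1.0, 1.0, 25)
q = 0.025

inf_of_quantiles = np.quantile(est, q, axis=0).min()      # inf over eta of the q-quantile
quantile_of_inf = np.quantile(est.min(axis=1), q)          # q-quantile of the inf over eta
sup_of_quantiles = np.quantile(est, 1 - q, axis=0).max()  # sup over eta of the (1 - q)-quantile
quantile_of_sup = np.quantile(est.max(axis=1), 1 - q)      # (1 - q)-quantile of the sup over eta

# Hence [quantile_of_inf, quantile_of_sup] contains the union of the
# fixed-eta percentile intervals [q-quantile, (1 - q)-quantile].
assert quantile_of_inf <= inf_of_quantiles
assert sup_of_quantiles <= quantile_of_sup
print(quantile_of_inf, inf_of_quantiles, sup_of_quantiles, quantile_of_sup)
```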
41:13Now, if you look at this outer interval,
41:16this is much easier to compute,
41:19because all you need to do
41:20is, for each data resample,
41:25you just repeat method 1.3.
41:29So you just get the infimum of the point estimate
41:34for that resample and the supremum for that resample.
41:37Then you do this over many, many resamples,
41:41and then you take the quantiles:
41:44the lower quantile of the infimum and the upper quantile of the supremum,
41:48and then you're done.
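The recipe above fits in a few lines. This is a minimal sketch that assumes the same hypothetical toy model as before (control-group mean scaled by 1 + eta) and a grid approximation of H; `point_estimate` and `percentile_bootstrap_si` are illustrative names, and resampling each treatment arm separately is a design choice made here, not something stated in the talk.

```python
import numpy as np


def point_estimate(y1, y0, eta):
    # Hypothetical toy sensitivity model: control-group mean scaled by (1 + eta).
    return y1.mean() - (1 + eta) * y0.mean()


def percentile_bootstrap_si(y1, y0, eta_grid, alpha=0.05, n_boot=1000, seed=0):
    """Percentile bootstrap sensitivity interval with H approximated by a grid."""
    rng = np.random.default_rng(seed)
    inf_est = np.empty(n_boot)
    sup_est = np.empty(n_boot)
    for b in range(n_boot):
        # Resample each treatment arm with replacement.
        y1b = rng.choice(y1, size=len(y1), replace=True)
        y0b = rng.choice(y0, size=len(y0), replace=True)
        # Infimum and supremum of the point estimate over H for this resample.
        ests = [point_estimate(y1b, y0b, eta) for eta in eta_grid]
        inf_est[b], sup_est[b] = min(ests), max(ests)
    # Lower quantile of the infima and upper quantile of the suprema.
    return np.quantile(inf_est, alpha / 2), np.quantile(sup_est, 1 - alpha / 2)


rng = np.random.default_rng(42)
y1 = rng.normal(2.0, 1.0, size=200)  # simulated treated outcomes
y0 = rng.normal(1.0, 1.0, size=200)  # simulated control outcomes
H = np.linspace(-0.2, 0.2, 21)       # hypothetical range for eta
lo, hi = percentile_bootstrap_si(y1, y0, H)
print(f"percentile bootstrap sensitivity interval: [{lo:.3f}, {hi:.3f}]")
```

Each resample only needs the extrema of a point estimate, so no standard-error formula and no optimization of a confidence limit over eta is required.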
41:50And this union sensitivity interval
41:53is always valid
41:55if the individual confidence intervals are valid.
41:58So you get a free lunch
42:02in some sense:
42:03you don't need to show any heavy theory.
42:06All you need to show is that
42:08these percentile bootstrap intervals are valid
42:11for each fixed eta,
42:13which are much easier to establish in real problems.
42:23And this is sort of selfish,
42:25but I'd like to compare this idea
42:27with Efron's bootstrap,
42:29where what is done there
42:31is that you've got a point estimator,
42:33you resample your data
42:35many times, and then use the bootstrap
42:38to get the confidence interval.
42:41For partially identified inference,
42:44you need to do a bit more.
42:46So for each resample you need
42:48to compute the extrema of the point estimator.
42:52Then the minimax inequality allows you to
42:55sort of transfer the intuition of the bootstrap
43:00from point identification
43:02to partial identification.
43:08So the third method
43:11in this general approach
43:14is to take the supremum of the P value.
43:15This is used in Rosenbaum's sensitivity analysis,
43:18if you're familiar with that.
43:22Essentially it's a hypothesis testing analog
43:24of the union confidence interval method.
43:29What it does is that
43:30if you have individually valid P values for a fixed eta,
43:35then you just take the supremum of the P values
43:38over all the etas in this range.
43:41And that can be used for partially identified inference.
43:46So what Rosenbaum did,
43:49and Rosenbaum is really a pioneer in this area
43:52in partially identified sensitivity analysis.
43:56So what he did was use randomization tests
43:59to construct these P values.
44:03So, this is usually done for matched observational studies,
44:07and the insight of this line of work
44:12is that you can use inequalities,
44:16particularly Holley's inequality
44:19from probabilistic combinatorics,
44:22to efficiently compute the supremum of the P values.
44:26So, usually what is done there is that
44:30Holley's inequality gives you a way
44:32to upper bound the distribution of a test statistic,
44:39to upper bound a family of distributions
44:42in the stochastic dominance sense.
44:45That is used to get the supremum of the P values.
44:51And so, basically the idea is to use some theoretical tool
44:59to simplify the computation.
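In the simplest matched-pairs case, the sign test, the worst case over Rosenbaum's model has a closed form: with sensitivity parameter gamma, the number of positive pair differences is stochastically bounded above by a Binomial(n, gamma/(1 + gamma)) variable, so the supremum of the one-sided P value is a binomial tail probability. The sketch below covers only this special case, not the general Holley-inequality machinery; the differences are made-up numbers and the function names are illustrative.

```python
import math


def binom_sf(t, n, p):
    """P(Binomial(n, p) >= t), summed in log space for numerical stability."""
    if t <= 0:
        return 1.0
    if t > n:
        return 0.0
    log_terms = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                 + k * math.log(p) + (n - k) * math.log1p(-p)
                 for k in range(t, n + 1)]
    m = max(log_terms)
    return min(1.0, math.exp(m) * sum(math.exp(v - m) for v in log_terms))


def sign_test_upper_pvalue(diffs, gamma):
    """Worst-case one-sided sign-test P value for matched-pair differences
    when Rosenbaum's model allows hidden bias up to gamma (gamma >= 1)."""
    nonzero = [d for d in diffs if d != 0]  # ties carry no sign information
    n = len(nonzero)
    t = sum(1 for d in nonzero if d > 0)    # observed number of positive differences
    p_plus = gamma / (1 + gamma)            # largest allowed chance of a positive sign
    return binom_sf(t, n, p_plus)


# Hypothetical treated-minus-control differences from 21 matched pairs.
diffs = [0.9, 1.4, -0.3, 2.1, 0.5, 0.0, 1.1, -0.7, 0.6, 1.8, 0.2,
         -0.4, 1.3, 0.7, 2.4, 0.1, -1.0, 0.8, 1.5, 0.3, -0.2]
for gamma in (1.0, 1.25, 1.5, 2.0):
    print(gamma, round(sign_test_upper_pvalue(diffs, gamma), 4))
```

At gamma = 1 this reduces to the ordinary sign test; as gamma grows, the worst-case P value increases.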
45:05Okay, so that's the statistical inference.
45:08The third part, the third component
45:10is the interpretation of sensitivity analysis.
45:13And this is the area where we really need
45:17a lot of good work at the moment.
45:20So, overall, there are two good ideas that seem to work,
45:26that seem to improve the interpretation
45:28of sensitivity analysis.
45:30The first is the sensitivity value,
45:32the second is the calibration using measured confounders.
45:36So the sensitivity value is basically
45:38the value of the sensitivity parameter
45:41or the hyperparameter
45:42at which some qualitative conclusion about your study changes.
45:48In our motivating example,
45:51this is where the estimated average treatment effect
45:55is reduced by half. In Rosenbaum's sensitivity analysis,
45:59if you are familiar with that,
46:01this is the value of gamma
46:03in his model
46:05at which we can no longer reject the causal null hypothesis.
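Continuing with the sign test as a special case of Rosenbaum's model, the sensitivity value can be found by bisection, because the worst-case P value increases with gamma. This is a sketch under stated assumptions (one-sided sign test, level 0.05, hypothetical counts of 15 positive differences among 20 untied pairs); `sensitivity_value` is an illustrative name.

```python
import math


def binom_sf(t, n, p):
    """P(Binomial(n, p) >= t), summed in log space for numerical stability."""
    if t <= 0:
        return 1.0
    if t > n:
        return 0.0
    log_terms = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                 + k * math.log(p) + (n - k) * math.log1p(-p)
                 for k in range(t, n + 1)]
    m = max(log_terms)
    return min(1.0, math.exp(m) * sum(math.exp(v - m) for v in log_terms))


def sensitivity_value(t, n, alpha=0.05, gamma_max=1e3, tol=1e-6):
    """Largest gamma at which the worst-case sign-test P value is still <= alpha,
    for t positive differences among n untied pairs. Returns None if the test
    does not reject even at gamma = 1 (no hidden bias)."""
    def upper_p(gamma):
        return binom_sf(t, n, gamma / (1 + gamma))

    if upper_p(1.0) > alpha:
        return None
    if upper_p(gamma_max) <= alpha:
        return gamma_max  # still rejecting at the edge of the search range
    lo, hi = 1.0, gamma_max
    while hi - lo > tol:  # bisection: upper_p is increasing in gamma
        mid = 0.5 * (lo + hi)
        if upper_p(mid) <= alpha:
            lo = mid
        else:
            hi = mid
    return lo


gamma_star = sensitivity_value(15, 20)
print(f"sensitivity value: {gamma_star:.3f}")
```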
46:10So, this can be seen as kind of an extension
46:14of the idea of a P value.
46:17The P value is used for the primary analysis,
46:19assuming no unmeasured confounding,
46:22and then for sensitivity analysis,
46:24you can use the sensitivity value. Sorry,
46:30the P value basically measures
46:33how likely it is that your result
46:36is a false rejection due to
46:39random chance.
46:44But what a sensitivity value does
46:46is measure how sensitive your result is
46:51in some sense, so how much deviation
46:53from no unmeasured confounding it takes
46:55to alter your conclusion.
46:58And for the sensitivity value,
47:01there often exists a phase transition phenomenon
47:04for partially identified inference.
47:07This is because if you take your hyperparameter gamma
47:11to be very large,
47:13then essentially your partially identified region
47:15already covers the null.
47:17So, no matter how large your sample size is,
47:20you can never reject the null.
47:23So, this is sort of an interesting phenomenon,
47:28first discovered by Rosenbaum,
47:32and this paper I wrote also clarified some
47:38issues about the phase transition.
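For the sign-test special case of Rosenbaum's model, a normal approximation shows where the phase transition comes from. This is an illustrative derivation, assuming a fixed fraction q of the n untied pair differences is positive, with p_Gamma = Gamma/(1 + Gamma) the worst-case probability of a positive sign and Phi the standard normal CDF:

```latex
\bar{p}_\Gamma = \Pr\{\mathrm{Bin}(n, p_\Gamma) \ge nq\}
\approx 1 - \Phi\!\left(\frac{\sqrt{n}\,(q - p_\Gamma)}{\sqrt{p_\Gamma(1 - p_\Gamma)}}\right)
\xrightarrow[n \to \infty]{}
\begin{cases}
0, & \text{if } q > p_\Gamma, \text{ i.e. } \Gamma < q/(1-q),\\
1, & \text{if } q < p_\Gamma, \text{ i.e. } \Gamma > q/(1-q).
\end{cases}
```

So in this example the sensitivity value tends to q/(1 - q) instead of growing without bound: for any gamma above that threshold, no sample size is large enough to reject. With q = 0.75, for instance, the threshold is gamma = 3.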
47:44So, the second idea is the calibration
47:46using measured confounders.
47:49So, you have already seen an example
47:54in the motivating study.
47:54It's really a very necessary and practical solution
47:59to quantify the sensitivity,
48:01because it's not really very useful if you tell people,
48:05"we are sensitive at gamma equals two."
48:08What does that really mean?
48:09That depends on some mathematical model.
48:13But if we can somehow compare that
48:15with what we do observe,
48:18and often
48:20the practitioners have some good sense
48:23about which confounders are important and which are not,
48:27then this really gives us a way to calibrate
48:31and strengthen the conclusions of a sensitivity analysis.
48:35But unfortunately, although there are some good heuristics
48:38about the calibration,
48:40they often suffer from some subtle issues,
48:44like the ones that I described
48:46in the beginning of the talk.
48:49If you carefully parameterize your models
48:51this can become easier.
48:54And this recent paper sort of explored this
48:56in terms of linear models.
49:01But really there's not a unifying framework
49:04that can cover more general cases,
49:08and a lot of work is needed.
49:11And when I was writing the slides,
49:13I thought maybe what we really need
49:15is to somehow build this calibration
49:18into the sensitivity model.
49:20Because currently our workflow is that
49:22we assume a sensitivity model,
49:24and we see where things get changed,
49:26and then we try to interpret those values
49:29where things get changed.
49:31But suppose we somehow build that in,
49:34if we let the range H of eta be defined
49:38in terms of this calibration,
49:40so perhaps gamma directly means some kind of comparison
49:45with measured confounders; this would solve
49:49a lot of the issues.
49:50This is just a thought I came up
49:53when I was preparing for this talk.
49:56Okay, so to summarize,
49:59there are a number of messages,
50:01which I hope you can take home.
50:05There are three components of a sensitivity analysis:
50:08model augmentation, statistical inference,
50:11and the interpretation of sensitivity analysis.
50:14So the sensitivity model is about parameterizing
50:17the full data distribution,
50:19and that's basically about overparameterizing
50:23the observed data distribution.
50:25And you can understand these models
50:26by the observational equivalence classes.
50:30You can get different model augmentations
50:33by factorizing the distribution differently
50:35and specifying different models
50:38for the parts that are unidentifiable.
50:41And there's a difference between point identified inference
50:45and partially identified inference,
50:47and partially identified inference is usually much harder.
50:52And there are two general approaches
50:55for partially identified inference,
50:57bound estimation and combining point identified inference.
51:02For interpretation of sensitivity analysis,
51:05there seem to be two good ideas so far,
51:08to use the sensitivity value,
51:10and to calibrate that sensitivity value
51:13using measured confounders.
51:16But overall,
51:18I'd say this is still
51:23a very open area
51:26where a lot of work is needed.
51:28Even for this prototypical example
51:31that people have studied for decades,
51:33it seems there are still a lot of questions
51:36that are unresolved.
51:38And there are methods that need to be developed
51:41for sensitivity analysis
51:45to be regularly used in practice.
51:48And then there are many other related problems
51:51in missing data and causal inference
51:54that need more development of sensitivity analysis.
51:59So that's the end of my talk.
52:01And here are some references that I used.
52:05I'm happy to take any questions.
52:08We still have about four minutes left.
52:11- Thank you.
52:13Yeah, thank you.
52:14Thank you. I'm sorry I couldn't introduce you earlier,
52:17but my connection did not work.
52:21So we have time for a couple of questions.
52:26You can write the question in the chat box,
52:29or just unmute yourselves.
52:43Any questions?
52:54I guess I'll start with a question.
53:00This was a great connection, I think, between
53:04the sensitivity analysis literature
53:06and the missing data literature,
53:09which I think is kind of overlooked.
53:12Even when you run a parametric sensitivity analysis,
53:17most of the time
53:20people really don't understand
53:22how much information is given,
53:25like how much information the model actually gives
53:29on the sensitivity parameters.
53:32And as you said,
53:34it's kind of inconsistent
53:36to set the sensitivity parameters
53:37when the sensitivity parameters are actually identified
53:40by the model.
53:43So I guess my question,
53:46a clarifying question, is:
53:49you mentioned these testable models.
53:54These testable models are essentially ones where
53:56the sensitivity model is such that
54:00the sensitivity parameters are actually point identified.
54:04Right?
54:04- Yes.
54:05- So you said
54:08you shouldn't use the sensitivity analysis
54:11to actually set the parameters
54:14if the sensitivity parameters
54:16are actually identified by the model.
54:18- Yeah.
54:19- Is that what you're trying to say?
54:21All right, so... - Yes, yeah.
54:23Basically what happened there is the model is too specific,
54:27and it wasn't constructed carefully.
54:30So it's possible to construct parametric models
54:33that are not testable, that are perfectly fine.
54:37But sometimes, if you just sort of
54:40write down the most natural model,
54:42if you just extend the parametric model
54:46you used for the observed data to also model the full data,
54:52and you don't do it carefully,
54:54then the entire full data distribution becomes identifiable.
55:00So it doesn't make sense to treat those parameters
55:02as sensitivity parameters.
55:05So this is kind of reminiscent of the discussion
55:08in the 80s about the Heckman selection model.
55:12Because in that case,
55:14Heckman has this great selection model
55:18for reducing or getting rid of selection bias,
55:23but it's based on very heavy parametric assumptions,
55:27and you can identify the selection effect
55:32directly from the model when you actually have no data
55:36to support that identification,
55:40which led to some criticisms in the 80s.
55:45And I think we are seeing these things
55:51again and again in different areas.
55:55And I think it's fine
55:59to use parametric models that are testable, actually,
56:05if you really believe in those models,
56:07but it doesn't seem that they should be used
56:09for sensitivity analysis,
56:12because, just logically,
56:13it's a bit strange.
56:15It's hard to interpret those models.
56:20But sometimes I've also seen people
56:24who sort of parameterize the model
56:28in a way that includes enough terms
56:31so the sensitivity parameters are weakly identified
56:35in a practical example.
56:38So with a practical data set,
56:44the likelihood ratio test's acceptance region
56:46is very, very large.
56:50So there are suggestions like that,
56:53and that's sort of a compromise
56:58for good practice.
57:02- Right, in that case you could either set the parameters
57:06and track the causal effects,
57:09or kind of treat that as a partial identification problem
57:13and just use bounds or the methods
57:17you were mentioning, I guess.
57:20- Yeah.
57:21- Yep, thanks.
57:26Other questions?
57:34Well I guess you can read the question?
57:37- It's a question from Kiel Sint.
57:41Sorry if I didn't pronounce your name correctly.
57:43"In applications of observational studies, ideally,
57:46what confounders should be collected
57:47for sensitivity analysis,
57:49for sensitivity analysis for unmeasured confounding?"
57:54Thank you.
57:54So if I understand your question correctly,
58:01basically what sensitivity analysis does
58:04is: you have an observational study
58:06where you have already collected confounders
58:10that you believe are important or relevant,
58:13that are real confounders,
58:16that affect both the treatment
58:20and the outcome.
58:22But often that's not enough.
58:25And what sensitivity analysis does is it tries to say,
58:29"based on the confounders
58:33you have already collected,
58:34what if there is still something missing
58:37that we didn't collect?
58:39And then if those things behave in a certain way,
58:44does that change our results?"
58:47So, I guess sensitivity analysis is always relative
58:52to a primary analysis.
58:54So I think you should use the same set of confounders
58:58that the primary analysis uses.
59:02I don't see many reasons to vary it, to say,
59:10use a primary analysis with more confounders
59:14but a sensitivity analysis with fewer confounders.
59:21Sensitivity analysis is really a supplement
59:23to what you have in the primary analysis.
59:35- Just one more question, if we have one?
59:40If not...
59:41Yes.
59:43- So from Ching Hou Soo,
59:45"How to set up the sensitivity parameter gamma
59:49in a real-life problem?
59:51When gamma is too large, the inference results
59:54will always be non-informative."
59:57Yes, this is always a tricky problem,
01:00:01and essentially the sensitivity value is kind of
01:00:06trying to get past that.
01:00:09So it tries to directly look at the value
01:00:11of this sensitivity parameter that changes your conclusion.
01:00:15So in some sense, you don't need to specify
01:00:19a parameter a priori.
01:00:21But obviously, at the end of the day,
01:00:25we need some clue about what value of the sensitivity parameter
01:00:30is considered large
01:00:31in a practical sense, in this application.
01:00:36That's something this calibration analysis
01:00:39is trying to address.
01:00:44But as I said,
01:00:44they're not perfect at the moment.
01:00:47So for some time now, at least,
01:00:52we'll have to sort of live with this,
01:00:56and we will either need to really understand
01:01:01what the sensitivity model means
01:01:02and then use domain knowledge
01:01:06to set the sensitivity parameter,
01:01:10or we have to rely on these
01:01:15imperfect visualization tools to calibrate the analysis.
01:01:28- Yeah, all right.
01:01:29Thank you.
01:01:30I think we need to wrap up; we've run over time.
01:01:33So thank you again Qingyuan,
01:01:36for sharing your work with us.
01:01:38And thank you, everyone for joining.
01:01:41Thank you.
01:01:42Bye bye.
01:01:43See you next week.
01:01:44- It's a great pleasure.
01:01:45Thank you.