Climate Change and Health Seminar Series: “Critical Window Variable Selection for Mixtures: Estimating the Impact of Multiple Air Pollutants on Stillbirth”

May 17, 2022

Information

April 25, 2022
Dr. Josh Warren joined the Center on Climate Change and Health to discuss his work on air pollution and health outcomes, including preterm birth and low birth weight.

ID7850


00:00<v ->Let's get started</v>
00:01and thank you everyone for coming today.
00:03And this will be our final seminar
00:07for this semester for the (indistinct) the house seminar.
00:09And we are very, very pleased
00:11to have our very own affiliate faculty,
00:16Dr. Josh Warren joining us.
00:19Dr. Warren is an associate professor
00:21at the Biostatistics Department here,
00:24and his research focuses on statistical methods
00:28in public health with an emphasis
00:30on environmental health problems,
00:32and much of his work involves introducing spatial
00:36and spatiotemporal models in the Bayesian setting
00:39to learn about the association
00:41between environmental exposures,
00:42such as air pollution and various health outcomes,
00:46including stillbirth, which is why we are here today.
00:50He's also interested in applying and developing
00:52some spatiotemporal models in collaborative settings,
00:56such as the infectious disease work
00:58we've been considering during the COVID pandemic.
01:02So without further ado, Josh,
01:04the floor is yours, thank you.
01:06<v ->Thank you, Kai, for the introduction.</v>
01:08Can everyone hear me?
01:10<v Kai>Yes.</v>
01:11<v ->All right, perfect.</v>
01:14And thanks to Kai for the invitation
01:15and Mulholland for setting all of this up
01:17and allowing me to do this virtually.
01:19It's nice to be here talking about something
01:22other than COVID.
01:23And I guess more recently in my past,
01:26I've been doing a lot of infectious disease work,
01:28so it's kind of nice to be back into something
01:30that I'm still passionate about
01:32and still working heavily on.
01:34And so hopefully some of this today
01:36will be a little bit of review of what we've done
01:38and really a current project
01:40that we've just completed and published,
01:43but hopefully there are some elements in here
01:45that you can find overlap within your own work.
01:48And so if you have,
01:50if you see something that rings a bell,
01:52just please reach out and we can kind of talk.
01:54My goal in all of this work
01:56is to kind of develop user friendly methods
01:59that are useful for people outside
02:01of statistics and biostatistics.
02:02So the epi community at large, usually.
02:06So, yeah, just feel free to reach out afterwards,
02:08and I can share more information,
02:10but today we're gonna be talking about
02:12critical window variable selection for mixtures
02:15and particularly air pollution and stillbirth.
02:17So we'll go ahead and jump into it.
02:21I think probably most people here will know air pollution,
02:24reproductive outcomes.
02:25There's a pretty substantial literature at this point
02:29that suggests exposure to ambient air pollution
02:32during pregnancy is associated
02:33with a number of adverse birth outcomes,
02:35including preterm pregnancy, low birth weight,
02:38congenital heart defects, stillbirth, and others.
02:42These are some of the main ones.
02:43Stillbirth is a more recently
02:45kind of emerging outcome of study.
02:48Traditionally, it's been pre-term birth
02:49and low birth weight have gotten a lot of attention,
02:52but these associations are stable, robust,
02:55and have been observed across a number of different study
02:57settings, designs, pollutants
02:59and there are a number of good review papers
03:01if you're interested in a lot of the epi literature
03:03on this topic.
03:06I would kind of summarize a number
03:09of the previous epi studies
03:10by saying they like to use pollution exposures
03:14that are summarized kind of a priori,
03:18so they wanna focus on a trimester,
03:20they wanna focus on the entire pregnancy,
03:22like, what is the exposure across the entire pregnancy?
03:25What impact does that have with respect to this outcome?
03:27So these are usually pre-specified averaging periods
03:30and they're explored separately
03:33in these different usually kind
03:35of traditional statistical models like logistic regression
03:38or (indistinct) if you're using some kind of count model.
03:41And so lots of different pollutants
03:43are floating around in these analyses,
03:44lots of different averaging periods
03:46in terms of the exposure, the relevant exposure period.
03:50Luckily working with pregnancy,
03:52we have a relatively stable idea
03:56of when exposure potentially affects the fetus.
04:02So lots of models floating around lots of pollutants
04:05and exposure weeks,
04:07but this method is inefficient
04:09and doesn't allow for a joint identification
04:11of more kind of specific periods
04:13across the entire pregnancy in a continuous manner.
04:16So more recently there has been a focus on
04:19critical window estimation and identification.
04:22So this is where I have done quite a bit of work, I think,
04:25in this world.
04:27And then even more recently, I would say,
04:29and I know a number of people I work with even here
04:31at Yale, pollution mixtures are becoming a really big deal.
04:35So in this talk,
04:36we're trying to combine both of these things,
04:38things that we know really well
04:39or that my group knows really well,
04:40critical window estimation and identification,
04:42and then pollution mixtures,
04:43things that we're getting into more and more it seems.
04:48So starting with critical windows of exposure
04:50and exactly what am I talking about
04:52when I'm talking about critical windows?
04:55So there's an increasing interest in identifying
04:57more specific periods of increased vulnerability.
05:00Usually we're thinking about pregnancy,
05:01but this can go for really any health outcome
05:04that you're interested in,
05:06but more vulnerable periods of the pregnancy
05:08to environmental exposures
05:10and doing this within a single modeling framework.
05:12So estimation of these effects,
05:14we're calling critical windows
05:15or windows of susceptibility.
05:17The NIEHS included this identification of critical windows
05:21as a part of its strategic goals back in 2012.
05:24And the focus has remained since then.
05:27So understanding like specific timing of exposure
05:31with respect to outcome development
05:33has a number of features but importantly,
05:35it could lead to improved mechanistic explanations
05:38of disease development,
05:40and ultimately focused guidelines for protection
05:42of the unborn child.
05:45So we have, like I mentioned,
05:46we've done a lot of methods work here,
05:50trying to understand variability in these windows
05:53essentially, and how to estimate them appropriately.
05:56So you'll start to see, I show some pictures,
05:58some figures here, that the models become really,
06:02there are lots of parameters in these models.
06:03So it really becomes an estimation challenge.
06:06Like how do you,
06:07the model makes sense, you can write it down,
06:09but can you actually fit these models?
06:10So we've done these or consider these models
06:14in a number of different settings,
06:15including the spatiotemporal setting,
06:17survival statistics setting, semiparametric,
06:20nonparametric Bayes, with multivariate outcomes,
06:23and then more recently variable selection.
06:26And so inference is typically carried out
06:28in the Bayesian setting where I do most of my work
06:31due to increased computational flexibility
06:33and importantly incorporation
06:35of stabilizing prior structure.
06:38So not only have these been done on the method side
06:42where a lot of my time is spent,
06:44but I really like seeing them translated
06:46to actual practice too.
06:48So these methods and kind of variants
06:51of these methods
06:53have successfully identified these critical windows
06:56in a number of outcomes and settings
06:58and different populations,
06:59for pre-term birth, low birth weight,
07:02CHDs, so across a number of studies now.
07:04So they're getting good traction in other studies
07:07as well, not just in the stat literature,
07:08which is nice to see.
07:11To give you a more kind of practical view
07:13of what I'm talking about,
07:14this is one of the first studies we published on
07:18way back in 2012.
07:19And this is for Harris County Texas,
07:22home of Houston, Texas.
07:24And on the left two panels,
07:25you'll see output from our newly developed method;
07:29on the right two panels,
07:30you'll see output from more of a naive approach
07:32that we were considering at the time.
07:35So what we're talking about with these critical windows
07:37is exactly what you're seeing.
07:39Maybe you can see my mouse here,
07:41but these periods where these risk ratios
07:45in this case kind of exclude zero
07:49or these risk parameters,
07:50they're not on any particular scale
07:52that's easily interpreted in this case, unfortunately,
07:54but this means that elevated exposure
07:57during pregnancy week 10, for example,
07:59leads to an increase in, in this case it
08:01was preterm birth, preterm birth risk.
08:04So during the mid-first
08:07and early second trimester of pregnancy,
08:09we were noticing some interesting elevated risk to PM 2.5.
08:14And what we've seen across a number of studies now
08:16is that these windows vary by pollutant, by outcome;
08:20they're very different.
08:22There's lots of variability. For ozone, for example,
08:24it seemed to be early on in the first trimester.
08:28So this new methodology allows us to kind of hone in
08:31on the signal and reduce some of this noise.
08:35So if you try to basically imagine your data set,
08:38you have lots of pregnant women in your study,
08:41and you have linked with that pollution exposure
08:44for the first 36 weeks of pregnancy.
08:46A really naive thing to do would be,
08:47let's just throw all of those
08:49into a multiple regression model,
08:50some binary regression model,
08:52all at the same time.
08:53Clearly there's going to be correlation across time
08:55because exposure week one looks
08:57like exposure week two, et cetera.
08:58And if you do that, you can expect multicollinearity,
09:01which is jumping around of point estimates,
09:04increased variability,
09:05which is exactly what you see here.
09:07So our new methodology,
09:08which relied on like Gaussian processes
09:11and other smoothing techniques
09:13allowed us to in a data driven way,
09:16kind of tease out signal
09:17that you could almost make out by eye here.
09:19So if you look hard enough,
09:20you can see kind of a similar shape in both cases,
09:24but we were able to see a better shape here.
09:26So this is what we generally in the past
09:28have been talking about with critical window estimation
09:30and identification.
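
Editor's sketch of the multicollinearity point above: a minimal simulation, not the speaker's data or method. It assumes weekly exposures follow a simple AR(1) pattern (the name `simulate_weekly_exposures` and the value `rho=0.9` are illustrative assumptions) and shows how strongly adjacent weeks are correlated, along with the variance inflation factor that would apply if only those two weeks were entered together in a regression.

```python
import math
import random


def simulate_weekly_exposures(n_people, n_weeks, rho, rng):
    """Simulate AR(1) exposures so each week resembles the previous one."""
    data = []
    for _ in range(n_people):
        x = [rng.gauss(0.0, 1.0)]
        for _ in range(n_weeks - 1):
            noise = math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
            x.append(rho * x[-1] + noise)
        data.append(x)
    return data


def pearson(a, b):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(2022)  # fixed seed so the sketch is reproducible
exposures = simulate_weekly_exposures(n_people=500, n_weeks=36, rho=0.9, rng=rng)

week1 = [row[0] for row in exposures]
week2 = [row[1] for row in exposures]
r12 = pearson(week1, week2)

# Variance inflation factor for a model containing only these two weeks.
vif = 1.0 / (1.0 - r12 ** 2)
print(f"corr(week 1, week 2) = {r12:.2f}, VIF = {vif:.1f}")
```

With all 36 correlated weeks in one unsmoothed model the inflation is typically worse, which is the instability the naive panels illustrate.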
09:33I mentioned that we worked on the survival outcome,
09:36we started to think about preterm birth
09:38instead of just a binary outcome yes or no.
09:41We wanted to consider it as a survival outcome.
09:43So what's the probability you make it
09:44to week 35 of your pregnancy,
09:46given that you've made it to 34 for example.
09:49So what this opened up was,
09:50well, maybe there are different exposure windows
09:53given different outcome weeks.
09:55So you can think of outcome week on the X axis
09:58here and exposure week on the Y axis.
10:01So if you gave birth at week 27,
10:04you only had 27 weeks of exposure, for example.
10:07So people were leaving the set as pregnancy happened.
10:10And so we introduced methodology
10:12that not only kind of smoothed in the exposure direction,
10:15but also smoothed across the outcome direction.
10:17And so these darker areas indicate
10:21exposure weeks and outcome weeks
10:23where elevated exposure more adversely impacts
10:27like the risk of preterm birth in this case.
10:29So there was a distinct difference in this early preterm
10:32and then this late preterm,
10:33which kind of was impacted by exposures later
10:35in the pregnancy.
10:38And so underlying all of these kind of simplified plots
10:42I'm showing you were
10:43these individual outcome week specific critical window plots
10:47that we kind of are more accustomed to interpreting.
10:52So more recently we got into the spatial world noticing
10:57that, well, we started noticing that
10:59when we applied these methods
11:00to different data sets in different areas,
11:03we were seeing different shapes, different windows,
11:05different pollutants, emerging as important.
11:08And so we began to think,
11:09well, is there spatial variability even at a local scale?
11:13And so we developed new methodology
11:15that can kind of tease out
11:17not only temporal changes in exposure risk,
11:21but also spatial variability as well.
11:22So there's a spatial correlation component here along
11:26with kind of these critical windows floating around as well.
11:29So this was 11 counties in North Carolina,
11:31including Wake County and the county that houses Charlotte,
11:35and this was a low birth weight study.
11:37So there's methodology around that can do this.
11:41So we were working on these for a number of years
11:44and we got approached basically with a question,
11:47how are you actually defining
11:49a critical pregnancy window?
11:51And it seemed obvious at first,
11:52but then we started to really question
11:54the assumptions we had been making,
11:56but obviously what we had been doing,
11:58if I go back a few slides here is just looking,
12:01at when these individual week
12:03or time specific parameters excluded the critical value
12:07of zero in this case.
12:08And we were calling that a critical window,
12:13but we started to worry that
12:16this might not be getting exactly what we're hoping.
12:18Is it capturing the true set?
12:20Is this doing a good job?
12:22In particular, we were worried about over smoothing
12:25with something like a Gaussian process
12:27and specifically with the endpoints.
12:29So if you can imagine, I'll go back one more time,
12:32sorry to scroll.
12:34Imagine the end points here and here,
12:37we began to worry that the over smoothing
12:41could be pulling some of these actually null results
12:45into the critical set or vice versa,
12:48kind of pulling some important ones down to the null set.
12:51So we were very concerned about the endpoints here
12:54when we started working on this more recent work.
12:57So our solution to this
12:59was critical window variable selection.
13:01So we like the smoothness, we like the plots that emerge.
13:04We like how we can interpret these things,
13:06but a variable selection component
13:07would allow us to turn some of these effects off,
13:10even if they appear to be significant in the plots.
13:14And so what this meant is,
13:15we introduced like a Bayesian variable selection technique
13:20called critical window variable selection,
13:22where basically you still have the critical window plots
13:25that you know and love, and you know how to interpret,
13:28but underlying each effect now,
13:30you actually have this binary exclusion
13:33or inclusion variable
13:34that tells you whether this thing should be included.
13:37This particular weekly effect should be included
13:39in the critical window set.
13:40And what we found is that there are a number of times,
13:44not in this particular real case study in North Carolina,
13:47but through simulation,
13:48we noticed that there were times
13:50when exactly what we had worried was happening
13:53had been happening so effects
13:54near the border here were being pulled into the set,
13:58but luckily they were not being included
14:00in the variable selection component.
14:01So to be in the variable selection set now,
14:05you had to have a posterior inclusion probability bigger
14:08than 0.5, so bigger than this line,
14:11and your individual weekly effects
14:13had to exclude zero with a 95% credible interval.
14:16So with these two kind of definitions,
14:19we were doing a much better job of recovering
14:23the true set of critical windows in simulation, at least.
14:27So this really outperformed
14:29what we had been doing previously.
14:30So we've been moving forward
14:32with this variable selection concept since then.
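
Editor's sketch of the two-part rule just described (posterior inclusion probability above 0.5 and a 95% credible interval that excludes zero). It is a minimal illustration on made-up posterior draws, not the speaker's software. The names `credible_interval` and `critical_window_set` are hypothetical, and the talk does not say exactly which quantity the interval is computed on, so the draws are simply treated as given inputs.

```python
import random


def credible_interval(draws, level=0.95):
    """Equal-tailed interval from posterior draws using simple empirical quantiles."""
    s = sorted(draws)
    lo_idx = int((1.0 - level) / 2.0 * (len(s) - 1))
    hi_idx = int((1.0 + level) / 2.0 * (len(s) - 1))
    return s[lo_idx], s[hi_idx]


def critical_window_set(effect_draws, inclusion_draws, pip_cut=0.5):
    """Return the 1-based weeks that satisfy both criteria."""
    weeks = []
    pairs = zip(effect_draws, inclusion_draws)
    for week, (effects, inclusions) in enumerate(pairs, start=1):
        pip = sum(inclusions) / len(inclusions)  # posterior inclusion probability
        lo, hi = credible_interval(effects)
        excludes_zero = lo > 0.0 or hi < 0.0
        if pip > pip_cut and excludes_zero:
            weeks.append(week)
    return weeks


rng = random.Random(0)
n_draws = 2000

# Toy posterior draws for three exposure weeks (all values are invented).
effect_draws = [
    [rng.gauss(0.0, 0.1) for _ in range(n_draws)],  # week 1: effect near zero
    [rng.gauss(0.5, 0.1) for _ in range(n_draws)],  # week 2: clear effect
    [rng.gauss(0.3, 0.1) for _ in range(n_draws)],  # week 3: interval excludes zero
]
inclusion_draws = [
    [1 if rng.random() < 0.10 else 0 for _ in range(n_draws)],  # rarely included
    [1 if rng.random() < 0.95 else 0 for _ in range(n_draws)],  # almost always included
    [1 if rng.random() < 0.30 else 0 for _ in range(n_draws)],  # usually switched off
]

selected = critical_window_set(effect_draws, inclusion_draws)
print(selected)
```

Week 3 in this toy input mirrors the border weeks from the talk: its interval excludes zero, but its low inclusion probability keeps it out of the critical window set.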
14:36All right, so we like critical window variable selection,
14:38we like a lot of these other methods.
14:39The problem is that as I know,
14:42a number of you are aware,
14:43the literature has really moved, the science
14:46has moved towards pollution mixtures
14:48and multiple exposures.
14:50And a lot of these methodologies were developed
14:52with one pollutant in mind at the most two to three,
14:58but they were not generally meant
15:00for pollution mixtures for example.
15:02So our goal in this work
15:04was to extend what we liked, CWVS,
15:07critical window variable selection, to accommodate mixtures.
15:10And so when we started thinking about mixtures,
15:12when you have time varying exposures
15:14and time varying effects,
15:16it became relatively conceptually complicated
15:19because you have lots of parameters floating around.
15:22So we wanted something that could do
15:23like a dimension reduction essentially.
15:25So what we thought is a nice solution,
15:28like in a single pollutant context, or I'm sorry,
15:32in a single exposure time period context
15:34is this weighted quantile sum regression,
15:36which I know a lot of you are familiar with,
15:38'cause I've helped write pieces of grants
15:40that have discussed weighted quantile sum regression here,
15:44but it offers a nice interpretable solution
15:46for estimating the impact of a mixture on an outcome.
15:49And it has this really nice sum to one constraint
15:53on the regression parameters.
15:55And so you get in the end,
15:56you have 20 pollutants for example,
15:58and you get to see the relative contribution
16:01of each of these pollutants in terms of the entire mixture.
16:04So you have these little sum-to-one, between zero
16:06and one probabilities or proportions
16:09that describe the role of individual pollutants.
16:12And then you have this global regression parameter
16:15that describes the impact of that mixture
16:17as defined by those weights on the health outcome.
16:21So it does a little two-stage process: estimate weights
16:24and then the global regression parameter,
16:26not important for this talk.
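
Editor's sketch of how a weighted quantile sum index is formed once weights are in hand. It only illustrates the sum-to-one weighting and the single global coefficient described above; it does not reproduce how WQS estimates the weights. All exposure values, the weights, and `beta_mix` are made-up illustrative numbers, and the function names are hypothetical.

```python
def quantile_scores(values, n_quantiles=4):
    """Map each value to a quantile bin 0..n_quantiles-1 using ranks (ties broken by order)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    scores = [0] * len(values)
    for rank, i in enumerate(order):
        scores[i] = rank * n_quantiles // len(values)
    return scores


def wqs_index(exposures, weights, n_quantiles=4):
    """Weighted sum of quantile-scored pollutants; weights must be >= 0 and sum to one."""
    if min(weights) < 0.0 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to one")
    scored = [quantile_scores(column, n_quantiles) for column in exposures]
    n_people = len(exposures[0])
    return [sum(w * s[i] for w, s in zip(weights, scored)) for i in range(n_people)]


# Toy exposures for eight people and three pollutants (invented numbers).
pm25 = [8.1, 12.4, 9.7, 15.2, 7.3, 11.0, 13.8, 10.2]
ozone = [31.0, 45.5, 38.2, 29.4, 50.1, 41.7, 36.3, 33.9]
no2 = [14.2, 18.9, 11.5, 22.3, 16.0, 12.8, 20.1, 17.4]

weights = [0.5, 0.3, 0.2]  # hypothetical relative contributions
index = wqs_index([pm25, ozone, no2], weights)

beta_mix = 0.15  # hypothetical global regression parameter for the mixture
linear_predictor_part = [beta_mix * value for value in index]
print(index)
```

Because the weights sum to one, the index stays on the quantile scale (0 to 3 here), and each weight reads directly as a pollutant's share of the mixture.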
16:29More recently in 2020, this was extended
16:32to the lag weighted quantile sum regression.
16:35And yeah, it extended WQS to the multiple exposure time period setting
16:42in a really, I think of it as a relatively ad hoc solution,
16:47but basically WQS is fit at each exposure week separately.
16:51The weights are estimated,
16:53the mixtures are combined based on those weights.
16:55And then those kind of packaged mixtures
16:58are thrown into like a distributed lag model
17:00to estimate similar curves
17:02to what I've been showing you so far.
17:03So the estimation of the weights
17:05and their relative importance in the mixture
17:07are done separately outside of kind of the estimation
17:11of the regression parameters as well.
17:13So this is, again, more of a two-stage approach.
17:16All right, so we like WQS
17:19because of its relative simplicity and its interpretability,
17:22we liked critical window variable selection.
17:24So the goals here were to combine
17:26that estimation identification ability of CWVS
17:29with the interpretability and shrinkage properties
17:31of WQS within a unified modeling framework and extending
17:37oh yeah, so WQS is nice.
17:39It has sum-to-one components
17:42that are between zero and one,
17:44but you don't actually get a sense of variable selection
17:47when doing this.
17:48So none of the weights can exactly equal zero.
17:50We wanted a more sparse solution
17:53and so we introduced also a way
17:54to make these weights exactly zero.
17:57So you can get a better sense of
17:58which pollutants are the main players in the mixture.
18:02And so what we're calling this is CWVS for mixtures
18:06or CWVS mix.
18:09And so some features before we get
18:11into a little bit of the details of the model,
18:13these are like the high,
18:14just if you take nothing else away from like
18:16what this model does, this is,
18:18I think the important slide here is that,
18:20we have main effects and first order interactions
18:23between the pollutants during each exposure period.
18:25So week one of pregnancy,
18:27week two of pregnancy, all of these interactions,
18:30all of these main effects are included.
18:31So there's lots of parameters you can already imagine
18:34are floating around here.
18:35We still hold onto these sum-to-one mixture weights
18:38at each exposure week separately.
18:41But we want to account for the fact that,
18:43what's happening in exposure week one
18:44may be similar to exposure week two to three to four,
18:48with this correlation dying out as you get further apart
18:50in exposure time.
18:51So we want these weights not to have to be estimated
18:55kind of independently at each exposure week.
18:57We want to enforce some smoothness,
19:00data driven smoothness preferably to estimate these weights.
19:04And as I mentioned, we want these weights
19:06to have a variable selection component.
19:08So we can actually identify individual elements
19:10of the mixture and we still have this global risk parameter,
19:14and this is going to follow the CWVS model
19:17so that we can estimate
19:18these critical windows more accurately.
19:22All right, so the goals of this study
19:24before we jump into some of the methodology here
19:26are to develop CWVS mix.
19:28As I mentioned,
19:30simulation is really important in this world.
19:32I wanna make sure that what we're doing
19:34is not just duplicating other efforts
19:36and that it's actually offering something new,
19:38something helpful to the literature
19:40that we can point to.
19:41I think I know the shortcomings of something like lag
19:45weighted quantile sum regression,
19:46but until I see it actually happen in simulation
19:49it's just kind of hypothetical.
19:51So finally we wanna investigate the impact
19:53using this new methodology
19:55of multiple ambient air pollutants on stillbirth risk.
19:58And in this case,
19:59we're focusing on New Jersey from 2005 to 2014.
20:03And actually we have really nice output
20:06from a novel data fusion model.
20:08There are lots of data fusion models floating
20:10around right now, but this is one from 2019,
20:12from our collaborator at Georgia Tech and at Emory
20:16that provided 12 pollutants,
20:1812 kilometer grid cell size across the entire US,
20:22daily, no missingness, things like that.
20:26So for these particular pollutants.
20:29All right, so let's talk a little bit about
20:31the model and what it does
20:33and some of the intuitive features
20:34that I think it has and why it might work well.
20:37So yeah, we're starting with some outcome,
20:42it could be some adverse health outcome
20:44like preterm pregnancy or not,
20:46or stillbirth or not, some Bernoulli outcome
20:49where this p_i describes kind of the probability
20:52that person i experiences this outcome.
20:55We model this probability using logistic regression
20:58as we normally would,
21:00these green, I'm kind of trying to differentiate,
21:02I'm trying to draw people's attention
21:04to the parameters
21:05and how I'm mentally grouping them as well.
21:08So these green represent these typical like demographics.
21:12We know there are certain risk factors
21:13for different health outcomes,
21:16particularly pregnancy outcomes being over 35 for example,
21:19with preterm pregnancy, alcohol, smoking, et cetera.
21:23So this would go into this x_i transpose beta,
21:26this vector here. Where a lot of our work came in
21:31is on these blue parameters,
21:33which are the weights that I've been talking about.
21:35So these weights, these blue parameters
21:37actually sum to one at each exposure week.
21:42So at each exposure week t
21:44we basically have a vector of lambdas
21:47that are weights between zero and one,
21:49that could actually be equal to zero exactly.
21:52And they sum to one at each exposure week separately,
21:55you notice they're indexed by t
21:57because we're allowing the possibility
21:59that the exposure profile changes across the pregnancy.
22:03So early on in the pregnancy,
22:05maybe the risk is primarily driven by pollutant A
22:10but later on in the pregnancy,
22:12perhaps that shifts.
22:13And so the weights would shift as well,
22:16but we expect this shift to be smoother
22:19rather than complete choppiness across the exposure weeks.
22:23And so what these weights do is
22:25they kind of multiply here with the main effects
22:29and these first order interactions.
22:31And if you think about taking this sum across
22:33main effects and interactions,
22:34you have this package of weighted exposure essentially.
22:38And the alpha here tells us whether at exposure period T
22:42this package has any impact on
22:44your ultimate probability of developing the outcome.
22:48So we have this nice sense that the weights
22:50help us describe what's happening with the mixture profiles.
22:53But the alpha keeps us honest
22:55and keeps us able to say,
22:57well, you know, this mixture's interesting,
23:00but it has no impact on the health outcome of interest here.
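
Editor's note: the model described verbally above might be written as follows. This is a reconstruction from the spoken description, not a copy of the slide. The symbols $m$ (number of exposure weeks), $q$ (number of pollutants), and $z_{itj}$ (person $i$'s exposure to pollutant $j$ in week $t$) are notational assumptions, as is the reading that main-effect and interaction weights jointly sum to one in each week.

```latex
Y_i \mid p_i \sim \text{Bernoulli}(p_i), \qquad
\text{logit}(p_i) = \mathbf{x}_i^{\top}\boldsymbol{\beta}
  + \sum_{t=1}^{m} \alpha_t \left[
      \sum_{j=1}^{q} \lambda_{tj}\, z_{itj}
      + \sum_{j<k} \tilde{\lambda}_{tjk}\, z_{itj}\, z_{itk}
    \right],
\qquad
\lambda_{tj} \ge 0,\quad \tilde{\lambda}_{tjk} \ge 0,\quad
\sum_{j=1}^{q} \lambda_{tj} + \sum_{j<k} \tilde{\lambda}_{tjk} = 1
\ \text{ for each } t.
```

Here $\mathbf{x}_i^{\top}\boldsymbol{\beta}$ carries the demographic covariates, the bracketed term is the weighted exposure "package" for week $t$, and $\alpha_t$ determines whether that package affects the outcome.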
23:06So how do we do these mixture weights?
23:09As I mentioned, two features that we're interested in
23:11the ability to actually equal zero
23:14and smoothness across time.
23:15And so the first point is,
23:19well, we introduce these latent weight parameters
23:22that I'm calling lambda star,
23:23don't get too caught up in them.
23:25Basically they're continuously varying parameters
23:28that as soon as they cross the zero threshold,
23:31they turn on in our model.
23:33So that's what this maximum is doing.
23:35So they turn on and they give you some weight
23:37and then as soon as they cross into negative territory,
23:40they go to zero.
23:40So this is how we're getting actual zeros in these weights.
23:43So the lambdas and the lambda tildes
23:45can actually equal zero
23:47based on these underlying latent weight parameters.
23:52All right, so we keep them summing to one
23:55by dividing by the sum of the numerator, essentially.
23:58So whatever weights are positive get summed
24:00and we're dividing by,
24:01we're basically self kind of correcting here
24:04so that the weights always come to one,
24:07these weights combined.
24:08For the interactions, we don't want the case,
24:12we prefer a sparse model,
24:14particularly as the number of pollutants gets really large.
24:17So the number of interactions will grow.
24:19So what we want are interactions
24:21that are only turned on essentially
24:24when the main effects are turned on.
24:25So you can see these two indicators I've added
24:27basically say if the main effects themselves
24:30aren't both turned on,
24:31this interaction effect gets zeroed out already.
24:34So the interaction has kind of a higher bar to clear,
24:37this strict hierarchy basically
24:40where both main effects have to be on
24:43and the interaction latent variable has to be on.
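
Editor's sketch of the weight construction just described: latent values are thresholded at zero, interactions are kept only when both parent main effects are on, and everything is divided by the total so the weights sum to one. This is a minimal illustration for a single exposure week. The function name, the toy latent values, and the choice to raise an error when every weight is switched off are assumptions, not details from the talk.

```python
def mixture_weights(lam_star, lam_tilde_star):
    """Turn latent values into sparse weights that jointly sum to one.

    lam_star: list of latent main-effect values, one per pollutant.
    lam_tilde_star: dict mapping a pollutant pair (j, k) to its latent interaction value.
    """
    # Main effects switch on only when the latent value is positive.
    main_raw = [max(value, 0.0) for value in lam_star]

    # Strict hierarchy: an interaction needs both main effects on AND its own latent value positive.
    inter_raw = {}
    for (j, k), value in lam_tilde_star.items():
        both_parents_on = lam_star[j] > 0.0 and lam_star[k] > 0.0
        inter_raw[(j, k)] = max(value, 0.0) if both_parents_on else 0.0

    total = sum(main_raw) + sum(inter_raw.values())
    if total == 0.0:
        # Sketch-only choice: the talk does not say how an all-zero week is handled.
        raise ValueError("no component is switched on for this week")

    main = [value / total for value in main_raw]
    inter = {pair: value / total for pair, value in inter_raw.items()}
    return main, inter


# Toy latent values for three pollutants in one exposure week (invented numbers).
lam_star = [1.2, -0.4, 0.6]
lam_tilde_star = {(0, 1): 0.9, (0, 2): 0.2, (1, 2): -0.3}

main, inter = mixture_weights(lam_star, lam_tilde_star)
print(main, inter)
```

In this toy input pollutant 1 is switched off, so the (0, 1) interaction is zeroed out even though its own latent value is positive.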
24:46So there's the zero component. Now,
24:48how do we do smoothness across time?
24:50Well, it's all about this correlation structure.
24:52So these latent Lambda star parameters
24:55that control the weights are actually modeled
24:57as a multivariate Gaussian process.
24:59And I think the key thing to focus on here is that
25:01there's this underlying correlation structure
25:04that tells us, as two exposure time points get further apart,
25:09this exponential of a negative number
25:11will get closer to zero.
25:12So correlation dies out as exposure time gets further apart.
25:16Now, as they get closer together,
25:17this correlation is gonna be higher.
25:19And the main parameter that controls this level
25:22of correlation is this fee parameter.
25:23And we actually put prior distributions on this
25:27to allow the data, to drive the inference,
25:29rather than like our view of what we expect
25:32this smoothness to look like.
25:33So yeah, this is data driven
25:35kind of smoothness across exposure time.
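
Editor's sketch of the exponential correlation structure described above, in which the correlation between two exposure weeks is the exponential of a negative number that grows with their distance. The form exp(-phi * |t - s|) is an assumption consistent with that description, and the two phi values are arbitrary. In the model phi receives a prior and is estimated from the data.

```python
import math


def exponential_correlation(n_weeks, phi):
    """Correlation matrix with entries exp(-phi * |t - s|) for weeks t and s."""
    return [
        [math.exp(-phi * abs(t - s)) for s in range(n_weeks)]
        for t in range(n_weeks)
    ]


smooth = exponential_correlation(n_weeks=36, phi=0.1)  # slow decay: smoother weights over time
rough = exponential_correlation(n_weeks=36, phi=1.0)   # fast decay: weeks nearly independent

for lag in (1, 5, 10):
    print(f"lag {lag}: phi=0.1 -> {smooth[0][lag]:.3f}, phi=1.0 -> {rough[0][lag]:.3f}")
```

A smaller phi keeps distant weeks correlated, which enforces smoother weights across the pregnancy. A larger phi lets each week's weights move almost independently.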
25:39All right, so now, so we've got the weights handled
25:42they have both properties that we care about.
25:43Now let's talk about the mixture impact itself.
25:46So this alpha recall tells us whether the mixture
25:48that we observe at time point T
25:50or that we estimated at exposure time point T
25:53is actually relevant to the health outcome.
25:55So we want, again,
25:57we want this variable selection here
25:59because we've noticed the problem with the end points
26:01that I described earlier.
26:02So to do this, we decompose this effect into two pieces,
26:06a continuously varying piece.
26:08And then this binary piece
26:09that I mentioned earlier on in the talk,
26:11the binary pieces are just independent
26:13Bernoulli random variables.
26:15But we imagine that if you're in the critical window set
26:20at time one, then you may be in it at time two
26:22and may be more likely to be in it at time three.
26:25So there may be some sense of correlation
26:27across exposure time here as well.
26:29So while we model these things as independent,
26:31the probabilities that underlie these zero
26:34and one variables are actually smoothly varying
26:37and correlated across time.
26:39So again,
26:40we use this kind of exponential correlation structure.
26:43We allow for cross correlation between the continuous
26:46and the binary piece.
26:48Not important to get into here,
26:49you can kind of read back over.
26:52I can share a paper with you if you want to,
26:54or talk more about it offline,
26:56but essentially there's some cross correlation
26:58there's correlation across time,
26:59but this allows for smoothness in the effects
27:02and the kind of the regression parameter effects
27:04that we've been looking at,
27:05but also in the variable selection as well.
27:08And these, both of these things come together to kind
27:11of define the critical window variable selection model.
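
Editor's sketch of the decomposition described above, where each weekly effect is a continuous piece multiplied by a binary inclusion indicator whose probability varies smoothly over time. This is a simplified simulation, not the model's actual prior. It uses AR(1) paths (which have exponential correlation on an evenly spaced grid), a logistic link for the inclusion probabilities, and it omits the cross-correlation between the two pieces. All three choices, and every name and number here, are assumptions made for illustration.

```python
import math
import random


def ar1_path(n_weeks, rho, rng, sd=1.0):
    """Stationary AR(1) path; the correlation between weeks t and s is rho ** |t - s|."""
    z = [rng.gauss(0.0, 1.0)]
    for _ in range(n_weeks - 1):
        z.append(rho * z[-1] + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0))
    return [sd * value for value in z]


rng = random.Random(7)
n_weeks = 36

theta = ar1_path(n_weeks, rho=0.9, rng=rng, sd=0.3)  # continuous piece of the effect
eta = ar1_path(n_weeks, rho=0.9, rng=rng)            # latent process behind the probabilities

# Smoothly varying inclusion probabilities (logistic link is an assumption).
prob = [1.0 / (1.0 + math.exp(-value)) for value in eta]

# Given the probabilities, the indicators are independent Bernoulli draws.
gamma = [1 if rng.random() < p else 0 for p in prob]

# Weekly mixture effect: exactly zero whenever the indicator is off.
alpha = [t * g for t, g in zip(theta, gamma)]

print(gamma)
```

Because the probabilities move smoothly, included weeks tend to cluster into runs, which matches the intuition that a week next to a critical window is more likely to belong to it.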
27:14To finish the model, recall everything's in the Bayes setting,
27:17so really weakly informative prior distributions,
27:21kind of standard prior distributions when possible,
27:24nothing too interesting here.
27:26So the model you may be looking at on this previous slide
27:29and thinking there's a lot of parameters floating
27:31around here.
27:32There's a lot of output that you're going to be estimating.
27:35So how do you make sense of this as a practitioner,
27:38someone who actually wants to know if a mixture
27:40is having an impact on your health?
27:42Well, luckily we still have relatively nice
27:46and estimable kind of effects here,
27:50associations that we can talk about.
27:52So for example,
27:53for a change in the log odds for a one unit increase
27:55in each pollutant during a particular exposure period,
27:59this would be the quantity
28:01that you would make (indistinct).
28:02You would exponentiate this,
28:03and you would have like an odds ratio, for example.
28:05Now recall, for any model that includes interactions,
28:08the interpretation is always increasingly complicated
28:12because it matters where you start
28:14when you have interactions.
28:15So if you're already at a high level,
28:18so the values themselves of exposure have to come into play,
28:22but nonetheless, you can still get nice quantities
28:25to estimate in the end.
28:27And if you're only interested in what happens
28:29if pollutant A increases during exposure period T,
28:32you can write down actually
28:34what that looks like as well.
28:35So you can estimate both of these things relatively easily
28:38from our output, from our model.
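A minimal sketch of the odds ratio calculation described here, assuming the log odds contribution at exposure period t is a risk magnitude parameter (alpha_t) multiplied by a weighted sum of pollutant main effects and pairwise interactions. The function names, weights, and exposure values are hypothetical and chosen only for illustration; they are not taken from the paper or its software.

```python
import math

def mixture_index(z, lam, lam_tilde):
    """Weighted mixture for one exposure period: main effects plus pairwise interactions."""
    main = sum(l * x for l, x in zip(lam, z))
    inter = sum(w * z[j] * z[k] for (j, k), w in lam_tilde.items())
    return main + inter

def odds_ratio(alpha_t, z_start, delta, lam, lam_tilde):
    """Odds ratio for moving exposures from z_start to z_start + delta."""
    z_end = [x + d for x, d in zip(z_start, delta)]
    change = mixture_index(z_end, lam, lam_tilde) - mixture_index(z_start, lam, lam_tilde)
    return math.exp(alpha_t * change)

# Hypothetical values: two pollutants and one interaction term.
lam = [0.5, 0.3]              # main effect weights (assumed)
lam_tilde = {(0, 1): 0.2}     # interaction weight for pollutants 0 and 1 (assumed)
alpha_t = 0.1                 # risk magnitude at this exposure period (assumed)

# One unit increase in each pollutant, from a low and from a higher starting level.
or_low = odds_ratio(alpha_t, [0.0, 0.0], [1.0, 1.0], lam, lam_tilde)
or_high = odds_ratio(alpha_t, [1.0, 1.0], [1.0, 1.0], lam, lam_tilde)
# One unit increase in the first pollutant only.
or_single = odds_ratio(alpha_t, [0.0, 0.0], [1.0, 0.0], lam, lam_tilde)
print(or_low, or_high, or_single)
```

With an interaction term in the mixture, the same one unit increase gives a larger odds ratio from the higher starting level, which is the sense in which it matters where you start.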
28:41Alright, so we have a model
28:42that kind of checked all the boxes,
28:44at least in my head when I was writing it down
28:46and we can, I tested it, we can fit it,
28:49it seems to work and that it's converging
28:53and it's producing things that look reasonable,
28:56but the simulation study really allows us to dig deeper
28:59and say, is there anything, this it's obviously new,
29:02but is there anything beneficial to what we're doing?
29:04Or should we just be doing something simpler
29:06that already exists?
29:08So we wanted particularly to ask,
29:10how does CWVS mix compare
29:13to some of these existing approaches
29:15for three different factors that we're interested in?
29:17So first identifying the true critical window set,
29:20obviously probably the most important part
29:22of critical window research here is like,
29:25let's get the critical window set right
29:28when we're estimating and identifying these parameters.
29:31But obviously when you're talking about mixtures,
29:33we also care about these weights.
29:35We want to know that the mixture profile we're looking at
29:38on a certain exposure period actually is,
29:43reflective of the true mixture profile
29:45that makes sense here.
29:46So how well do we do at estimating these lambda
29:49and lambda tilde parameters
29:52that describe the effects of main effects and interactions,
29:55and then finally,
29:56how well do we do at estimating the magnitude of risk,
30:00these alpha T parameters.
30:02We wanna make sure we're getting these right as well.
30:04And as a side issue, I guess,
30:06just more of our curiosity,
30:07how well does this variable selection process work
30:10for the weights that we've introduced?
30:13So now we need to think about
30:15what are competing methods in this space.
30:17There aren't a lot of methods out there
30:19that aim to estimate critical windows with
30:24time-varying exposures and multiple pollutants,
30:27and the ones that are out there
30:30give different enough output
30:31that it's hard to compare one model to the next,
30:34but here are three approaches that we kind of came up with.
30:38One is the most naive kind of
30:40where I would always start
30:42as a practitioner with a new data set,
30:44this equal weights approach.
30:45So maybe just averaging all of the exposures for a person
30:49on a given exposure week and including that average
30:55and the interactions with the other exposure periods
30:59in a framework, a distributed lag framework.
31:04So yeah, this is called equal weights or EW.
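A minimal sketch of the equal weights idea, assuming each person has one value per pollutant per exposure week and that the pairwise products of those values stand in for the interactions. The function names and numbers are hypothetical, and in practice the exposures would likely be standardized first.

```python
from itertools import combinations

def mixture_terms(week):
    """Main effects followed by all pairwise products for one exposure week."""
    return list(week) + [a * b for a, b in combinations(week, 2)]

def equal_weight_index(week):
    """Average of all main effect and interaction terms (every term gets the same weight)."""
    terms = mixture_terms(week)
    return sum(terms) / len(terms)

# Hypothetical exposures for one person: two weeks, three pollutants each.
weekly = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.5]]
ew_series = [equal_weight_index(w) for w in weekly]
print(ew_series)
```

The resulting single value per week is what would then be passed on to the distributed lag model as the exposure.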
31:08A PCA approach also makes sense.
31:10So let's allow the data to determine
31:11the correct weights of these Lambdas,
31:15but let's focus it only on the exposure period,
31:18only the exposure data.
31:19So at each exposure period,
31:20fit a PCA to the person specific exposures
31:24and generate these weights.
31:27That kind of describe the relative contribution
31:29of the different interactions and main effects in a mixture,
31:34and then weight the mixtures in that way
31:36and throw that weighted value
31:38into the distributed regression model.
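A rough sketch of deriving weights from a PCA of the exposure data at one exposure period. It assumes the weights come from the absolute loadings of the first principal component, rescaled to sum to one; the talk does not spell out that rescaling, so treat it as an assumption. The first component is found by power iteration on the sample covariance matrix, and all names and data are hypothetical.

```python
import math

def first_pc_weights(rows, iters=200):
    """Weights from the first principal component of rows (people x mixture terms)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix of the mixture terms.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Power iteration; an uneven start avoids an exactly orthogonal starting vector.
    v = [1.0 / (j + 1) for j in range(p)]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    total = sum(abs(x) for x in v)
    # Assumed convention: absolute loadings rescaled to sum to one.
    return [abs(x) / total for x in v]

# Hypothetical data: the first two terms move together, the third never varies.
rows = [[1.0, 1.0, 5.0], [2.0, 2.0, 5.0], [3.0, 3.0, 5.0]]
pca_weights = first_pc_weights(rows)
print(pca_weights)
```

Note that these weights use only the exposure data; the outcome plays no part in choosing them, which is the key difference from estimating the weights inside the health model.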
31:41So for all of these methods,
31:42we're using the original CWVS,
31:44so that we're comparable so that the method
31:47so that the results are actually comparable across.
31:49And that the only thing that is changing essentially
31:52is how we define the weights.
31:54And then finally, the most sophisticated approach
31:56at that time was this lagged
31:57weighted quantile sum regression
31:58that we talked a little bit about,
32:01where we applied weighted quantile sum regression
32:03separately to each exposure period,
32:06let that estimate the weights,
32:07create the little package of exposure,
32:09and then throw those packages
32:10into the regression model using CWVS.
32:13So once you have the weights,
32:15like once you condition on the weights
32:17and you know the weights,
32:18you basically have one exposure
32:19and that exposure is the package,
32:22the mixture package that you've made.
32:24So the model,
32:25the modeling becomes much simpler in that case.
32:29So how did we go about to test these different methods?
32:35Well, we started very simply.
32:36So these represent the weights across exposure periods.
32:41In this case, I'm pretending like there's only five weeks
32:43in the exposure set.
32:45In reality, I let that vary for each data set
32:47the length and the start time of the exposure window changed
32:51but for this case,
32:52we assumed it started at pregnancy week one
32:55and went to week five.
32:56And so in the simplest case,
32:57we had just assumed there was one pollutant at play
33:00and it stayed constant across the exposure period.
33:02This is really simple.
33:03One pollutant is driving the entire risk that we're seeing.
33:08In another setting, we assumed that there were two,
33:11but there was no changes over time.
33:13They were always static across time
33:15and three, there were three that were coming into play
33:18at four, four, and then five, five of them,
33:21obviously as more come online
33:23and become important players in the mixture.
33:26The weights generally go down
33:27because all of lots of these have to be non zero.
33:31In setting B,
33:32we wanted to allow for some variability
33:34among the important pollutants.
33:35So we still allow for the same important pollutants
33:38to be important at each exposure period,
33:42but we allowed their relative contribution
33:43to change across time.
33:44So early on in pregnancy, this one was important,
33:47but then its contribution went down
33:49and it was kind of surpassed by number two here
33:53at pollutant two,
33:54and then they can keep swapping in and out
33:56across the exposure.
33:57And in setting C it was complete chaos essentially
34:02different pollutants could come online
34:04and then leave and become important
34:05or not important go to zero.
34:07We don't anticipate this would ever,
34:10or this would be the case,
34:11but it would be nice to know if our model
34:13can somehow collapse and kind of accommodate this reckless,
34:17this wild behavior, I guess.
34:20So, yeah,
34:21this is something that kind of testing the extreme
34:23of all these methods is what we were trying to do here.
34:27So we'll jump right into the results.
34:28Just to give you a sense of what happened
34:31when we tested these models
34:32with lots of simulated data sets,
34:34CWVS mix continuously and kind of consistently
34:40was able to get the critical windows set
34:43more accurately than the other methods,
34:45which struggled kind of in varying degrees
34:48across these different settings,
34:50in terms of estimating the weight parameters.
34:54Generally, CWVS mix has a lower mean squared error,
34:59so it's doing a better job of estimating these parameters,
35:02as you would expect, like with equal weights,
35:04if you assume each weight,
35:05each pollutant and interaction is playing
35:09an equal part in the story,
35:10you can be very bad off a lot of times,
35:13which is given, which is why these weights
35:16these values are so high for some of these methods.
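A small sketch of why equal weights can score badly on mean squared error for the weight parameters. The true weight vectors are hypothetical, loosely following the simulation settings described earlier (one important pollutant versus all five equally important).

```python
def mean_squared_error(estimated, truth):
    """Average squared difference between estimated and true weights."""
    return sum((e - t) ** 2 for e, t in zip(estimated, truth)) / len(truth)

equal_weights = [0.2] * 5                    # what the equal weights method assumes
one_driver = [1.0, 0.0, 0.0, 0.0, 0.0]       # hypothetical truth: one pollutant drives risk
all_important = [0.2] * 5                    # hypothetical truth: all five matter equally

mse_one_driver = mean_squared_error(equal_weights, one_driver)
mse_all_important = mean_squared_error(equal_weights, all_important)
print(mse_one_driver, mse_all_important)
```

The error is large when a single pollutant drives the mixture and vanishes when every pollutant truly contributes equally, which matches the intuition that equal weights become harder to beat as more pollutants become important.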
35:20And finally,
35:21with the estimation of the regression parameters
35:23that describe the magnitude of risk.
35:26Generally, we're seeing improved performance with CWVS mix,
35:31but interestingly,
35:32at least at the time when we first saw this
35:34is that the equal weights method does a pretty good job
35:38of estimating these risk magnitude parameters
35:43as the number of important pollutants increases.
35:46So if you tell me that every one of your pollutants
35:48are important,
35:49then it's going to be hard to beat that something
35:53that gives all of the pollutants equal weight.
35:55So that's kind of the intuition behind it.
35:57As more pollutants become important,
35:58giving everything equal weight is not such a bad idea,
36:01almost it's just averaging away some of that error,
36:04but generally, we're still doing well.
36:07And specifically in comparison
36:09to the lagged weighted quantile sum regression,
36:10that's really important,
36:12'cause at the time this was the kind
36:13of the main method out there
36:15that aimed to do the same thing we were doing.
36:17So in summary here with a simulation study,
36:21we did really well in critical in terms of accuracy, sorry,
36:26weight parameter estimation,
36:28and even in the risk magnitude parameter estimation.
36:32So models that don't have,
36:34that they don't actually estimate weights are more efficient
36:36when the complexity
36:37or the number of important pollutants grows
36:40and a little bit about the variable selection
36:42that we introduced with these latent variables.
36:45It appeared to do really well again,
36:47as the number of important pollutants was relatively small.
36:51So if you have lots of pollutants that are important
36:54and their interactions are important,
36:56it was hard for the variable selection process
36:59to kind of tease out
36:59when something's included or excluded.
37:01It tended to just say everything was included.
37:04So something to keep in mind,
37:06I guess, as a limitation perhaps of this approach.
37:10All right, so now onto the real data application
37:13that we had,
37:14and this is part of a larger kind of climate change,
37:18heat preterm birth study,
37:21we collected lots of state specific data birth records
37:26for all the way back to 1990 for maybe 12, 14 states.
37:31And so this one was set in New Jersey,
37:34but we focused on stillbirth given
37:36their really strong stillbirth data collection
37:40kind of methodology that New Jersey was using.
37:43So stillbirth the death or loss of a baby,
37:46at least 20 weeks of pregnancy affects about
37:48one in 160 births in the US.
37:51There are some known maternal risk factors,
37:54black mother, 35 years of age or more,
38:00low SES, smoking, et cetera.
38:03And recent literature review meta analysis suggest that,
38:06PM 2.5 CO2 and O3 are associated with increased risk.
38:11This was really recent,
38:13but that more studies are definitely needed.
38:14There's not a lot as in comparison
38:17to some of the other adverse birth outcomes,
38:18there's not as much done with stillbirth, at least.
38:21However a majority of these previous studies
38:23have focused on again, single pollutant approaches,
38:27wide exposure periods like the entire relevant pregnancy
38:31before the delivery.
38:35So there is a need
38:36for kind of multiple pollutant critical window
38:37methods in this setting.
38:38So this is what kind of made us think about
38:43developing this methodology,
38:44but also applying it in this case study.
38:48So a little bit about the data we had access to.
38:50We had live birth and fetal death records
38:52from New Jersey from 2005 to 14.
38:55We included singletons with gestational age
38:57of at least 20 weeks,
38:58no birth defects, conception date in 2005 to 2013.
39:05We ran a case control analysis here
39:07where five live births were linked
39:10with each stillbirth matching only on race ethnicity.
39:13And we actually ended up running these analyses separately
39:16for each group non-Hispanic black,
39:17non-Hispanic white and Hispanic.
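A minimal sketch of the matching step, assuming each stillbirth is linked to five live births drawn at random from the same race/ethnicity group. Whether live births can be reused across stillbirths is not stated in the talk; this sketch allows reuse across cases but not within a case. The record layout and function name are hypothetical.

```python
import random

def match_controls(cases, controls, ratio=5, seed=0):
    """Link each case to `ratio` controls sampled from the same group."""
    rng = random.Random(seed)
    pools = {}
    for control in controls:
        pools.setdefault(control["group"], []).append(control)
    matched = {}
    for case in cases:
        pool = pools.get(case["group"], [])
        if len(pool) < ratio:
            raise ValueError("not enough controls in group " + case["group"])
        # Sample without replacement within a case; reuse across cases is allowed.
        matched[case["id"]] = rng.sample(pool, ratio)
    return matched

# Hypothetical records: two stillbirths and twenty live births in two groups.
cases = [{"id": "s1", "group": "A"}, {"id": "s2", "group": "B"}]
controls = [{"id": "l%d" % i, "group": "A" if i % 2 == 0 else "B"} for i in range(20)]
matched = match_controls(cases, controls)
for case_id, linked in matched.items():
    print(case_id, [c["id"] for c in linked])
```

Because the analyses were then run separately for each group, the matched sets can simply be split by group before model fitting.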
39:19And in terms of what our exposures,
39:22we included weekly pollution exposures
39:24through gestational week 20 were included in this analysis.
39:30All right, a little bit about the pollutants
39:32I mentioned we relied on a data fusion model
39:35that gave us kind of fine scale spatially
39:40and temporally estimates of 12 pollutants across New Jersey
39:46across the US actually, but focusing here on New Jersey.
39:49So you can see the pollutants listed here
39:50and we linked each woman's residence at delivery
39:54with the closest grid be where data were available
39:57or the estimates and predictions were available
39:59and assigned weekly exposures across
40:00the first 20 weeks of gestation.
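A minimal sketch of the exposure assignment step, assuming each residence is linked to the nearest grid point and takes that grid point's weekly series over the first 20 weeks of gestation. Squared distance in latitude and longitude is used here only to keep the example short; the actual distance measure and data layout are not described in the talk, and all names and coordinates are hypothetical.

```python
def nearest_grid_index(residence, grid_points):
    """Index of the grid point closest to the residence (simple squared distance)."""
    lat, lon = residence
    return min(
        range(len(grid_points)),
        key=lambda i: (grid_points[i][0] - lat) ** 2 + (grid_points[i][1] - lon) ** 2,
    )

def assign_weekly_exposure(residence, grid_points, grid_series, n_weeks=20):
    """Weekly exposures for the first n_weeks, taken from the nearest grid point."""
    i = nearest_grid_index(residence, grid_points)
    return grid_series[i][:n_weeks]

# Hypothetical grid with three points and 25 weeks of estimates at each.
grid_points = [(40.0, -74.0), (40.5, -74.5), (41.0, -75.0)]
grid_series = [[float(i)] * 25 for i in range(3)]
exposure = assign_weekly_exposure((40.4, -74.4), grid_points, grid_series)
print(len(exposure), exposure[0])
```

In a real analysis the weekly series would also be aligned to each pregnancy's conception date, which this sketch leaves out.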
40:02I know there's always a lot of pushback
40:04in these birth records
40:05because we don't have residential mobility,
40:08we don't have sense of like how often people move.
40:10And we know moving is differential
40:13by socioeconomic status for example,
40:15there are a lot of factors
40:16that influence moving during pregnancy,
40:18but if maybe this will make you feel somewhat better,
40:22but we did a study in 2019,
40:25to kind of assess the robustness
40:27of these critical window methods more generally
40:29to lots of different sources of error,
40:32including residential mobility
40:34and the results were actually very promising.
40:36I thought so the findings are robust generally
40:39to kind of this exposure misclassification
40:43or exposure error that's introduced through mobility.
40:47All right, so in summary,
40:49I guess for the data we had around 1300 non-Hispanic black,
40:53stillbirths in this time 928 Hispanic,
40:56and 1100 non-Hispanic white.
40:59Our covariates that we included were a year of conception,
41:02season of conception to control for this kind of seasonality
41:05and long term time trends and pollution exposure,
41:09tobacco use indicator, age category, education.
41:13We had the sex of the fetus,
41:14and to control for spatial kind of residual correlation,
41:18we actually included latitude, longitude
41:21of the residence at delivery and their interaction term.
41:24As a pre-screening,
41:26because we had 12 pollutants to work with,
41:28we didn't wanna introduce a lot of noise if possible,
41:30into the new framework.
41:31So we did a pre-run of the original critical window variable
41:35selection on each pollutant individually,
41:37as most analyses would do anyway,
41:40and identified a subset across all
41:43of the different data sets and by different data sets.
41:45I mean the non-Hispanic black,
41:47non-Hispanic white, and Hispanic.
41:49So all of the relevant and kind of significant exposures
41:54that came up and during any exposure period
41:57were included as a subset into this bigger framework.
41:59And so in total, we had PM 2.5 sulfate, nitrogen oxide,
42:03ammonium, and nitrate that kind
42:05of made this pre-screening period into the final subset.
42:10So here is some of the output
42:12that we thought was interesting.
42:14There's a lot of output
42:15that can be shown as you already know.
42:17I guess now there's weight at each exposure period,
42:20there's regression parameters,
42:22there's just a lot that can happen here
42:24and there's interactions, there's main effects,
42:26but first let's focus on the first column here,
42:29and this is at least something we can hold onto
42:31that we understand from previous work in this space.
42:35So what we can see for the non-Hispanic black population
42:40that we were working with in New Jersey during this time,
42:42that elevated exposure,
42:44I'm not gonna say to what yet but elevated exposure
42:46to some combination of these five pollutants
42:50during pregnancy week two,
42:51and then later on in the pregnancy, 16, 17,
42:54and 20 actually led to increased odds.
42:59So these are odds ratios being presented, excuse me,
43:03of stillbirth.
43:04And so we can kind of take these in and say,
43:07we get a sense of the critical windows
43:09that are identified.
43:10We also get a sense of the variable selection component
43:13that I mentioned
43:14and in this case, they line up pretty perfectly.
43:17These are consistently in the model actually included
43:19in the Bayesian variable selection model,
43:21but also, when they are in the model, they're positive.
43:25So the risk is in the right direction.
43:27So more pollution during these pregnancy windows,
43:32more risk of stillbirth in this population.
43:34Now the question becomes,
43:35well, what are you talking about
43:36when you talk about the exposure?
43:38Like, what is the mixture that you're talking about
43:40in week two, for example?
43:42Because we have five pollutants
43:43and their interactions floating around.
43:45So focusing first, so now let's move to the second column.
43:49This represents the interactions,
43:51this top part and the bottom part represents, I'm sorry,
43:53this is main effects.
43:54And the bottom part represents interactions.
43:57So you can see ammonium is playing a big role throughout
44:01until week 16,
44:02which is dominated sharply by nitrogen oxides.
44:06And then ammonium comes back into play here
44:11in terms of the interactions that are important,
44:12it looks like PM 2.5 and ammonium early on.
44:16And then later on it's nitrogen oxides and ammonium
44:20kind of come into play.
44:21So a lot of this is noise.
44:23I did not show you the variable selection component,
44:25but it probably would be nice
44:28to kind of gray these out
44:29if they're not selected in the model.
44:32But a lot of these actually are selected in the model
44:34with our variable selection.
44:35So while these look to be non zero weights,
44:37some of them are actually exactly zero essentially
44:41because of the variable selection component.
44:44But there's so much output,
44:45it's hard to figure out what exactly
44:47to show in a digestible way.
44:48So this is where we landed.
44:50So, interesting results you get to see how
44:52the exposure kind of the mixture transitions
44:55across exposure time,
44:57you get to see what impact that has
44:59on the actual risk of the outcome that you're talking about.
45:03So a nice, I think coherent story can come,
45:06can be told, if you're picturing your own analysis here,
45:10you get to talk about the risk overall
45:12to the mixture kind of combination or profile,
45:14but also then dig deeper into individual weeks
45:16and talk about which ones are important,
45:18which interactions are important for example.
45:21For the non-Hispanic white,
45:22there was very little indication
45:25that these pollutants were playing a role,
45:28I guess, in the kind of development of stillbirth
45:30or the risk of stillbirth in this population
45:33and for the Hispanic population,
45:36it looked like there potentially was some uptick here
45:38at the end, but nothing significantly jumped out either.
45:42And so at this point, it's almost...
45:45You don't start to investigate
45:47and over interpret these weight parameters,
45:50given that you're not seeing anything here.
45:52So I kind of consider this to be noise essentially
45:57for the Hispanic and non-Hispanic white results for example.
46:01So a little brief kind of wrapping up here,
46:04summary of our findings is that,
46:06for the non-Hispanic black data set
46:08and variable selection results
46:10PM 2.5 and its chemical constituents
46:13are primary drivers of risk.
46:15And this was actually changing across exposure week.
46:17So driven in week two by a lot of interactions
46:21and kind of individual pieces.
46:23Week 16, mainly heavily driven by nitrogen oxides
46:27and then week 17,
46:29one or two pollutants and their interactions.
46:31So all the other interactions
46:33that are not listed here among the five variables
46:36were actually not significantly important here.
46:39So no nothing kind of nothing seen
46:46for the non-Hispanic white and Hispanic populations.
46:50And I guess in conclusion,
46:51we introduce CWVS mix, which combines smooth
46:56Bayesian variable selection in the weights
46:58and the regression parameters
46:59with interpretable weighted quantile sum regression
47:02shrinkage to identify critical windows,
47:04but also kind of understand
47:06and kind of dig deeper into the mixture itself.
47:10And importantly, at least from our perspective
47:13is that CWVS mix seemed to offer something
47:17that the existing methods didn't,
47:19which so consistently outperforming these other methods
47:22for identifying the true critical window set,
47:25estimating weight parameters,
47:26which is really important for interpreting the mixtures
47:29and then estimating the risk magnitude parameters as well.
47:32And our stillbirth results from New Jersey
47:34were in qualitative agreement with those in the literature,
47:37in that PM 2.5 consistent signal across many studies
47:42while developing kind of gaining new insights
47:45regarding the exposure timing in this particular study,
47:49obviously more work is needed.
47:50And so I guess before jumping to this,
47:53we were working on extending this framework.
47:57So I'm working with the group at Emory here
48:00on extending this framework to allow the windows themselves
48:03to vary by something like socioeconomic status
48:07or race ethnicity, or other individual level factors.
48:10So there's this effect modification floating around now
48:13plus the mixtures.
48:15So it's becoming a really big task to kind of do all of this
48:19in a single framework,
48:20but we're trying to take baby steps, essentially.
48:23We like where we're at now, we think it works well,
48:25it's robust, it fits well
48:28and can we extend it next to the questions
48:30that are being asked?
48:31So again, if you're someone who is asking similar questions,
48:34please, we can talk.
48:35And I really like enjoy sitting down with collaborators
48:38and trying to figure out,
48:40develop new methods that can answer the questions
48:41that they have.
48:43But if you find that,
48:45your setting can already be answered
48:47by some of these methods that I've discussed today
48:50on my website and on my GitHub site,
48:52I keep a lot of these packages that I've created
48:55with help documentation
48:56and then you are always free to reach out to me as well.
48:59But if you're looking to do this original Gaussian process,
49:03critical window estimation,
49:04we have a package for that.
49:06Howard Chang at Emory, go through my website again,
49:08you'll find this his survival version
49:11of the model up there as well.
49:13CWVS in this original form is there for download
49:16the spatial version,
49:17which hopefully we're thinking about extending in
49:22soon to account for something like oxidative potential
49:25of these pollutants that's also there.
49:28And then the newly developed methodology
49:30is also there for download and for use as well.
49:33And this obviously could not have happened
49:35without collaborators, including Howard Chang at Emory,
49:38Lauren at RTI did a lot of data management,
49:42Matthew Strickland, and Lindsey
49:45at University of Nevada Reno,
49:47and then James for providing the,
49:50or helping with the data fusion output as well.
49:53And here, this grant support here
49:56that I mentioned in extreme heat duration,
49:58and then data integration methods
49:59for environmental exposures.
50:01So yeah, please feel free to reach out
50:05if you have any questions.
50:06This work that I went over today
50:09is in press at Annals of Applied Statistics,
50:12not on their website yet,
50:12but should be really soon.
50:15But I think there's a version on arXiv
50:16if you're interested
50:17or if you want the most up to date version.
50:19I actually think I sent it tomorrow
50:20who may have passed it out to the class,
50:22but yeah, definitely feel free to reach out
50:24if there are any questions or anything I can help with.
50:27Yeah, that's it.
50:30<v ->Thank you so much.</v>
50:31(applause)
50:35Our students were impressed with this
50:39heavy quantitative focused lecture.
50:44We already collected some questions
50:45from our students already,
50:47but for folks who are joining online,
50:50if you do have questions,
50:51please feel free to put in the chat box.
50:54So the first question,
50:56one of the students is observing that
50:58in your study, you found the elevated risk
51:01was found in week two of the pregnancy,
51:05which is very early.
51:06So perhaps many pregnant women are not aware
51:11of the pregnancy at that time.
51:13So in terms of the intervention
51:16at this early stage of pregnancy,
51:19what's the kind of policy implications that we'll find?
51:22<v ->Now that's a really great point.</v>
51:24And this is something we've tried to,
51:27we haven't figured out how to deal with either,
51:29but we've run into a number of interesting results
51:35that we've seen early in the pregnancy.
51:38We've particularly,
51:38we've seen protective effects at some points
51:42for like PM 2.5 exposure and pre-term pregnancy
51:45very early on in the pregnancy.
51:47And we believe it could be due to the exactly
51:51what we're talking about.
51:51People who don't actually know they're pregnant
51:53at that point.
51:54And so miscarriage is an issue
51:57that isn't well kind of documented by a lot of these states.
52:00There could be just fetal loss in general,
52:03that we're not capturing in the birth records.
52:05And so there's this population
52:08that we're not even including in a lot of our analysis
52:12that are lurking around
52:13and kind of could be biasing
52:14some of these early week results.
52:16In terms of policy implications
52:20it's a really good question.
52:21I don't know other than if I guess it really,
52:25if you're trying to get pregnant,
52:27if you know you're on that, in that stage,
52:30I mean, maybe it's helpful for you,
52:31but if you're someone who doesn't know unanticipated
52:35there's only so much that can go into outside
52:40of just cleaner air altogether.
52:42Which is something everyone can kind of agree on.
52:45But I think it may only affect a subset of people
52:48who are either attempting to get pregnant
52:50or kind of really regimented and like,
52:52know their schedule for example.
52:55But there's this whole other issue about people
52:57who aren't in our data set.
52:58That's a really great point
52:59and we have not figured out how to solve that yet.
53:02<v Kai>Yeah, tough question.</v>
53:04Thanks, Josh.
53:05We do have another question
53:06from actually two students read this.
53:09They really appreciate your talk about this new metrics.
53:13And we realize this is the package.
53:16The R package is available from your GitHub website.
53:19So anyone who's interested in applying that
53:21you can download the R package and run,
53:25but the students are wondering like
53:26beyond this time-varying air pollution mixtures
53:30a lot other mixtures in terms of (indistinct)
53:33like temperature, green space, other things.
53:36So how does your approach this
53:39the CWVS mix apply to a broader setting
53:43of environment exposures?
53:45<v ->I think, my push, and if you read the paper,</v>
53:48you'll notice that I really push for people
53:50to think about that in their own setting.
53:54Cause I think it's generally applicable to any,
53:57it doesn't have to be a pregnancy outcome.
53:59It doesn't have to be air pollution.
54:01What it does have to be is consistently measured
54:04across some exposure period.
54:06So I'll often get questions that,
54:08I have two time periods measured,
54:10in the first trimester and then in the third trimester,
54:14can I fit your methodology?
54:16Well, we need more fine grained exposure information.
54:19That's consistent across the individuals
54:21in order to estimate these critical windows.
54:23So I think the only barrier for entry
54:25is that you have consistently estimated
54:27kind of exposures for the population of interest.
54:30It doesn't matter so much what the exposure is now.
54:32I say that, but if you're bringing binary exposures
54:36and you have limit of detection issues,
54:38there are obviously some issues
54:40that will need to be sorted out,
54:41but the framework itself should work really well.
54:44The other caveat is, you'll notice that
54:46a lot of my work has been focused on pregnancy outcomes
54:49and that's because the exposure period is so well defined
54:52if you're working with something like cancer for example,
54:55well, how far do you extend back in time,
55:00the exposures like how you could go years and years back.
55:03So there's this cumulative idea as well.
55:07That's really hard to understand
55:08and these distributed lag models are great.
55:10As long as you can a priori tell me
55:13what the relevant exposure period is.
55:14I can tell you if any of the interior parts
55:17of that exposure period are important,
55:19but if you're telling me you don't know
55:21when the exposure period potentially started
55:23or it's a completely different conversation.
55:25So your outcome has to have,
55:27or preferably would have some type
55:29of relevant exposure period.
55:32It's actually even better
55:32for something like cardiac heart defects,
55:35which we know the heart forms between like weeks three
55:37and eight of pregnancy.
55:39So you can really focus in on something like daily
55:41or even sub daily
55:42if you had that type of exposure information.
55:45So yeah, those are the two,
55:46generally it should work,
55:47but just make sure you have a good sense
55:48of the exposure period.
55:52<v Kai>Very good point, thanks Josh.</v>
55:53And we do have one comment from our online audience.
55:57So I read Dr. Warren
55:59could you please share your thought on applying
56:01the critical window analysis?
56:04(mutters)
56:07<v ->Sorry, with what?</v>
56:09(overlapping conversation)
56:10That's a really great point.
56:11So over, so I'm actually on sabbatical right now,
56:14which is why I couldn't be there in person with you guys,
56:17but over the sabbatical
56:19I've developed the framework and the code
56:22to account for binary outcomes,
56:25continuous outcomes and count outcomes as well.
56:28Luckily if you've taken my (indistinct) course
56:31or you're gonna take it next fall,
56:32you'll see how all of these connect
56:34and lend themselves really nicely
56:36to kind of full conditional distribution updates
56:40that make the model fitting process
56:41really kind of slick and nice.
56:44So you can we have a negative binomial regression,
56:47for example, that can do the same thing.
56:49You just have count outcome data,
56:51if you have a continuous measure, for example,
56:53so I'm really aiming this.
56:55I hope this method doesn't just pop up
56:57and then disappear, I want people to use it,
56:59I want it to be useful.
57:00And so that's why I'm trying to extend it
57:02and trying to get people to use it in different contexts.
57:04So, yeah, definitely I love those types of questions.
57:08<v Kai>Thanks Josh.</v>
57:09Because we actually have another (speech distorted)
57:13So we have to end early
57:14and we do have a lot of students questions
57:17and I'm sure contact you for just once.
57:21So thanks again, Josh for wonderful talk.
57:24<v ->No, yeah thanks for being here.</v>