Climate Change and Health Seminar Series: “Critical Window Variable Selection for Mixtures: Estimating the Impact of Multiple Air Pollutants on Stillbirth”

May 17, 2022
  • 00:00<v ->Let's get started</v>
  • 00:01and thank you everyone for coming today.
  • 00:03And this will be your final seminar
  • 00:07for this semester for the Climate Change and Health seminar.
  • 00:09And we are very, very pleased
  • 00:11to have our very own affiliate faculty,
  • 00:16Dr. Josh Warren joining us.
  • 00:19Dr. Warren is an associate professor
  • 00:21at the Biostatistics Department here,
  • 00:24and his research focuses on statistical methods
  • 00:28in public health with an emphasis
  • 00:30on environmental health problems,
  • 00:32and much of his work involves introducing spatial
  • 00:36and spatiotemporal models in the Bayesian setting
  • 00:39to learn about the association
  • 00:41between environmental exposures,
  • 00:42such as air pollution and various health outcomes,
  • 00:46including stillbirth, which we are here for today.
  • 00:50He's also interested in applying and developing
  • 00:52spatiotemporal models in collaborative settings,
  • 00:56such as the infectious disease work
  • 00:58we have been considering during the COVID pandemic.
  • 01:02So without further ado, Josh,
  • 01:04the floor is yours, thank you.
  • 01:06<v ->Thank you, Kai, for the introduction.</v>
  • 01:08Can everyone hear me?
  • 01:10<v Kai>Yes.</v>
  • 01:11<v ->All right, perfect.</v>
  • 01:14And thanks to Kai for the invitation
  • 01:15and Mulholland for setting all of this up
  • 01:17and allowing me to do this virtually.
  • 01:19It's nice to be here talking about something
  • 01:22other than COVID.
  • 01:23And I guess more recently in my past,
  • 01:26I've been doing a lot of infectious disease work,
  • 01:28so it's kind of nice to be back into something
  • 01:30that I'm still passionate about
  • 01:32and still working heavily on.
  • 01:34And so hopefully some of this today
  • 01:36will be a little bit of review of what we've done
  • 01:38and really a current project
  • 01:40that we've just completed and published,
  • 01:43but hopefully there are some elements in here
  • 01:45that you can find overlap within your own work.
  • 01:48And so if you have,
  • 01:50if you see something that rings a bell,
  • 01:52just please reach out and we can kind of talk.
  • 01:54My goal in all of this work
  • 01:56is to kind of develop user-friendly methods
  • 01:59that are useful for people outside
  • 02:01of statistics and biostatistics.
  • 02:02So the epi community at large, usually.
  • 02:06So, yeah, just feel free to reach out afterwards,
  • 02:08and I can share more information,
  • 02:10but today we're gonna be talking about
  • 02:12critical window variable selection for mixtures
  • 02:15and particularly air pollution and stillbirth.
  • 02:17So we'll go ahead and jump into it.
  • 02:21I think probably most people here will know about air pollution
  • 02:24and reproductive outcomes.
  • 02:25There's a pretty substantial literature at this point
  • 02:29that suggests exposure to ambient air pollution
  • 02:32during pregnancy is associated
  • 02:33with a number of adverse birth outcomes,
  • 02:35including preterm pregnancy, low birth weight,
  • 02:38congenital heart defects, stillbirth, and others.
  • 02:42These are some of the main ones.
  • 02:43Stillbirth is a more recently
  • 02:45kind of emerging outcome of study.
  • 02:48Traditionally, it's been pre-term birth
  • 02:49and low birth weight have gotten a lot of attention,
  • 02:52but these associations are stable, robust,
  • 02:55and have been observed across a number of different study
  • 02:57settings, designs, pollutants
  • 02:59and there are a number of good review papers,
  • 03:01if you're interested in a lot of the epi literature
  • 03:03on this topic.
  • 03:06I would kind of summarize a number
  • 03:09of the previous epi studies
  • 03:10by saying they like to use pollution exposures
  • 03:14that are summarized kind of a priori,
  • 03:18so they wanna focus on a trimester,
  • 03:20they wanna focus on the entire pregnancy,
  • 03:22like, what is the exposure across the entire pregnancy?
  • 03:25What impact does that have with respect to this outcome?
  • 03:27So these are usually pre-specified averaging periods
  • 03:30and they're explored separately
  • 03:33in these different usually kind
  • 03:35of traditional statistical models like logistic regression
  • 03:38or Poisson regression if you're using some kind of count model.
  • 03:41And so lots of different pollutants
  • 03:43are floating around in these analyses,
  • 03:44lots of different averaging periods
  • 03:46in terms of the exposure, the relevant exposure period.
  • 03:50Luckily working with pregnancy,
  • 03:52we have a relatively stable idea
  • 03:56of when exposure potentially affects the fetus.
  • 04:02So lots of models floating around lots of pollutants
  • 04:05and exposure weeks,
  • 04:07but this method is inefficient
  • 04:09and doesn't allow for a joint identification
  • 04:11of more kind of specific periods
  • 04:13across the entire pregnancy in a continuous manner.
  • 04:16So more recently there has been a focus on
  • 04:19critical window estimation and identification.
  • 04:22So this is where I have done quite a bit of work, I think,
  • 04:25in this world.
  • 04:27And then even more recently, I would say,
  • 04:29and I know a number of people I work with even here
  • 04:31at Yale, pollution mixtures are becoming a really big deal.
  • 04:35So in this talk,
  • 04:36we're trying to combine both of these things,
  • 04:38things that we know really well
  • 04:39or that my group knows really well,
  • 04:40critical window estimation and identification,
  • 04:42and then pollution mixtures,
  • 04:43things that we're getting into more and more it seems.
  • 04:48So starting with critical windows of exposure
  • 04:50and exactly what am I talking about
  • 04:52when I'm talking about critical windows?
  • 04:55So there's an increasing interest in identifying
  • 04:57more specific periods of increased vulnerability.
  • 05:00Usually we're thinking about pregnancy,
  • 05:01but this can go for really any health outcome
  • 05:04that you're interested in,
  • 05:06but more vulnerable periods of the pregnancy
  • 05:08to environmental exposures
  • 05:10and doing this within a single modeling framework.
  • 05:12So estimation of these effects,
  • 05:14we're calling critical windows
  • 05:15or windows of susceptibility.
  • 05:17The NIEHS included this identification of critical windows
  • 05:21as a part of its strategic goals back in 2012.
  • 05:24And the focus has remained since then.
  • 05:27So understanding like specific timing of exposure
  • 05:31with respect to outcome development
  • 05:33has a number of features but importantly,
  • 05:35it could lead to improved mechanistic explanations
  • 05:38of disease development,
  • 05:40and ultimately focus guidelines for protection
  • 05:42of the unborn child.
  • 05:45So we have, like I mentioned,
  • 05:46we've done a lot of methods work here,
  • 05:50trying to understand variability in these windows
  • 05:53essentially, and how to estimate them appropriately.
  • 05:56So you'll start to see, as I show some pictures,
  • 05:58some figures here, that the models really have
  • 06:02lots of parameters in them.
  • 06:03So it really becomes an estimation challenge.
  • 06:06Like how do you,
  • 06:07the model makes sense, you can write it down,
  • 06:09but can you actually fit these models?
  • 06:10So we've considered these models
  • 06:14in a number of different settings,
  • 06:15including the spatiotemporal setting,
  • 06:17the survival analysis setting, semiparametric
  • 06:20and nonparametric Bayes, with multivariate outcomes,
  • 06:23and then more recently variable selection.
  • 06:26And so inference is typically carried out
  • 06:28in the Bayesian setting where I do most of my work
  • 06:31due to increased computational flexibility
  • 06:33and importantly incorporation
  • 06:35of stabilizing prior structure.
  • 06:38So not only have these been done on the method side
  • 06:42where a lot of my time is spent,
  • 06:44but I really like seeing them translated
  • 06:46to actual practice too.
  • 06:48So these methods and kind of variants
  • 06:51of these methods
  • 06:53have successfully identified these critical windows
  • 06:56in a number of outcomes and settings
  • 06:58and different populations,
  • 06:59for preterm birth, low birth weight,
  • 07:02and CHDs, so across a number of studies now.
  • 07:04So they're getting good traction in other studies as well,
  • 07:07not just in the stat literature,
  • 07:08which is nice to see.
  • 07:11To give you a more kind of practical view
  • 07:13of what I'm talking about,
  • 07:14this is one of the first studies we published on
  • 07:18way back in 2012.
  • 07:19And this is for Harris County Texas,
  • 07:22home of Houston, Texas.
  • 07:24And on the left two panels,
  • 07:25you'll see output from our newly developed method
  • 07:29on the right two panels,
  • 07:30you'll see output from more of a naive approach
  • 07:32that was that we were considering at the time.
  • 07:35So what we're talking about these critical windows
  • 07:37are exactly what you're seeing.
  • 07:39Maybe you can see my mouse here,
  • 07:41but these periods where these risk ratios
  • 07:45in this case kind of exclude zero
  • 07:49or these risk parameters,
  • 07:50they're not on any particular scale.
  • 07:52that's easily interpreted in this case, unfortunately,
  • 07:54but this means that elevated exposure
  • 07:57during pregnancy week 10, for example,
  • 07:59leads to an increase in, in this case
  • 08:01it was preterm birth, preterm birth risk.
  • 08:04So during your early kind of mid first
  • 08:07and early second trimester pregnancy,
  • 08:09we were noticing some interesting elevated risk to PM 2.5.
  • 08:14And what we've seen across a number of studies now
  • 08:16is that these windows vary by pollutant and by outcome;
  • 08:20they're very different.
  • 08:22There's lots of variability. For ozone, for example,
  • 08:24it seemed to be early on in the first trimester.
  • 08:28So this new methodology allows us to kind of hone in
  • 08:31on the signal and reduce some of this noise.
  • 08:35So if you try to basically imagine your data set,
  • 08:38you have lots of pregnant women in your study,
  • 08:41and you have linked with that pollution exposure
  • 08:44for the first 36 weeks of pregnancy.
  • 08:46A really naive thing to do would be,
  • 08:47let's just throw all of those
  • 08:49into a multiple regression model,
  • 08:50some binary regression model,
  • 08:52all at the same time.
  • 08:53Clearly there's going to be correlation across time
  • 08:55because exposure week one looks
  • 08:57like exposure week two, et cetera.
  • 08:58And if you do that, you can expect multicollinearity,
  • 09:01which shows up as point estimates jumping around
  • 09:04and increased variability,
  • 09:05which is exactly what you see here.
  • 09:07So our new methodology,
  • 09:08which relied on things like Gaussian processes
  • 09:11and other smoothing techniques,
  • 09:13allowed us, in a data-driven way,
  • 09:16to kind of tease out signal
  • 09:17that you could almost make out by eye here.
  • 09:19So if you look hard enough,
  • 09:20you can see kind of a similar shape in both cases,
  • 09:24but we were able to see a better shape here.
  • 09:26So this is what we're generally in the past
  • 09:28have been talking about with critical window estimation
  • 09:30and identification.
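To make the multicollinearity point concrete, here is a minimal sketch that simulates 36 correlated weekly exposures and computes variance inflation factors for them. The AR(1) exposure process, the value `rho = 0.95`, and the sample size are illustrative assumptions, not quantities from the study.

```python
import numpy as np

rng = np.random.default_rng(0)
n_women, n_weeks = 2000, 36
rho = 0.95  # assumed week-to-week exposure correlation, for illustration only

# Simulate weekly exposures where week t looks a lot like week t - 1.
z = np.empty((n_women, n_weeks))
z[:, 0] = rng.standard_normal(n_women)
for t in range(1, n_weeks):
    noise = rng.standard_normal(n_women)
    z[:, t] = rho * z[:, t - 1] + np.sqrt(1.0 - rho**2) * noise

# Variance inflation factors are the diagonal of the inverse correlation matrix.
# A VIF of v means the variance of that week's coefficient is inflated by a
# factor of v relative to uncorrelated predictors.
corr = np.corrcoef(z, rowvar=False)
vif = np.diag(np.linalg.inv(corr))
print(vif.round(1))
```

Large VIFs for every week are what produce the jumpy point estimates and wide intervals when all weeks enter one unsmoothed regression.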
  • 09:33We mentioned that we worked on the survival outcome,
  • 09:36we started to think about preterm birth
  • 09:38instead of just a binary outcome yes or no.
  • 09:41We wanted to consider it as a survival outcome.
  • 09:43So what's the probability you make it
  • 09:44to week 35 of your pregnancy,
  • 09:46given that you've made it to 34 for example.
  • 09:49So what this opened up was,
  • 09:50well, maybe there are different exposure windows
  • 09:53given different outcome weeks.
  • 09:55So you can think of outcome week on the X axis
  • 09:58here and exposure week on the Y axis.
  • 10:01So if you gave birth at week 27,
  • 10:04you only had 27 weeks of exposure, for example.
  • 10:07So people were leaving the set as pregnancy happened.
  • 10:10And so we introduced methodology
  • 10:12that not only kind of smoothed in the exposure direction,
  • 10:15but also smoothed across the outcome direction.
  • 10:17And so these darker areas indicate weeks
  • 10:21and outcome weeks, exposure weeks and outcome weeks
  • 10:23where elevated exposure more adversely impacts
  • 10:27like the risk of preterm birth in this case.
  • 10:29So there was a distinct difference in this early preterm
  • 10:32and then this late preterm,
  • 10:33which kind of was impacted by exposures later
  • 10:35in the pregnancy.
  • 10:38And so underlying all of these kind of simplified plots
  • 10:42I'm showing you were
  • 10:43these individual outcome week specific critical window plots
  • 10:47that we kind of are more accustomed to interpreting.
  • 10:52So more recently we got into the spatial world, noticing
  • 10:57that, well, we started noticing that
  • 10:59when we applied these methods
  • 11:00to different data sets in different areas,
  • 11:03we were seeing different shapes, different windows,
  • 11:05different pollutants emerging as important.
  • 11:08And so we began to think,
  • 11:09well, is there spatial variability, even at a local scale?
  • 11:13And so we developed new methodology
  • 11:15that can kind of tease out
  • 11:17not only temporal changes in exposure risk,
  • 11:21but also spatial variability as well.
  • 11:22So there's a spatial correlation component here along
  • 11:26with kind of these critical windows floating around as well.
  • 11:29So this was 11 counties in North Carolina,
  • 11:31including Wake County and the county that houses Charlotte,
  • 11:35and this was a low birth weight study.
  • 11:37So there's methodology around that can do this.
  • 11:41So we were working on these for a number of years
  • 11:44and we got approached basically with a question,
  • 11:47how are you actually defining
  • 11:49a critical pregnancy window?
  • 11:51And it seemed obvious at first,
  • 11:52but then we started to really question
  • 11:54the assumptions we had been making,
  • 11:56but obviously what we had been doing,
  • 11:58if I go back a few slides here is just looking,
  • 12:01when did these individual week
  • 12:03or time-specific parameters exclude the critical value,
  • 12:07zero in this case?
  • 12:08And we were calling that a critical window
  • 12:13but we started to worry that
  • 12:16this might not be getting exactly what we're hoping for.
  • 12:18Is it capturing the true set?
  • 12:20Is this doing a good job?
  • 12:22In particular, we were worried about oversmoothing
  • 12:25with something like a Gaussian process
  • 12:27and specifically with the endpoints.
  • 12:29So if you can imagine, I'll go back one more time,
  • 12:32sorry to scroll.
  • 12:34Imagine the endpoints here and here,
  • 12:37we began to worry that the oversmoothing
  • 12:41could be pulling some of these actually null results
  • 12:45into the critical set or vice versa,
  • 12:48kind of pulling some important ones down to the null set.
  • 12:51So we were very concerned about the endpoints here
  • 12:54when we started working on this more recent work.
  • 12:57So our solution to this
  • 12:59was critical window variable selection.
  • 13:01So we like the smoothness, we like the plots that emerge.
  • 13:04We like how we can interpret these things,
  • 13:06but a variable selection component
  • 13:07would allow us to turn some of these effects off,
  • 13:10even if they appear to be significant in the plots.
  • 13:14And so what this meant is,
  • 13:15we introduced a Bayesian variable selection technique
  • 13:20called critical window variable selection,
  • 13:22where basically you still have the critical window plots
  • 13:25that you know and love, and you know how to interpret,
  • 13:28but underlying each effect now,
  • 13:30you actually have this binary exclusion
  • 13:33or inclusion variable
  • 13:34that tells you whether this thing should be included.
  • 13:37This particular weekly effect should be included
  • 13:39in the critical window set.
  • 13:40And what we found is that there are a number of times,
  • 13:44not in this particular real case study in North Carolina,
  • 13:47but through simulation,
  • 13:48we noticed that there were times
  • 13:50when exactly what we had worried was happening
  • 13:53had been happening, so effects
  • 13:54near the border here were being pulled into the set,
  • 13:58but luckily they were not being included
  • 14:00in the variable selection component.
  • 14:01So to be in the variable selection set now,
  • 14:05you had to have a posterior inclusion probability bigger
  • 14:08than 0.5, so bigger than this line,
  • 14:11and your individual weekly effects
  • 14:13had to exclude zero with a 95% credible interval.
  • 14:16So with these two kind of definitions,
  • 14:19we were getting a much better kind of recovery of
  • 14:23the true set of critical windows in simulation, at least.
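The two-part rule just described (posterior inclusion probability above 0.5, and a 95% credible interval that excludes zero) can be sketched from posterior draws as follows. The function name `critical_window_set`, the array layout, and the toy draws are hypothetical. Taking the interval on the product effect, rather than on the continuous piece alone, is also an assumption.

```python
import numpy as np

def critical_window_set(alpha_draws, gamma_draws, pip_cutoff=0.5, level=0.95):
    """Flag exposure weeks that satisfy both selection criteria.

    alpha_draws: (n_draws, n_weeks) posterior draws of the weekly effects.
    gamma_draws: (n_draws, n_weeks) posterior draws of the 0/1 inclusion variables.
    """
    pip = gamma_draws.mean(axis=0)  # posterior inclusion probability per week
    tail = (1.0 - level) / 2.0
    lower = np.quantile(alpha_draws, tail, axis=0)
    upper = np.quantile(alpha_draws, 1.0 - tail, axis=0)
    excludes_zero = (lower > 0) | (upper < 0)
    return (pip > pip_cutoff) & excludes_zero

# Toy posterior draws (made up): weeks 3 and 4 carry a real effect.
rng = np.random.default_rng(1)
n_draws, n_weeks = 4000, 6
true_effect = np.array([0.0, 0.0, 0.5, 0.5, 0.0, 0.0])
include_prob = np.where(true_effect > 0, 0.99, 0.10)
gamma_draws = rng.random((n_draws, n_weeks)) < include_prob
theta_draws = true_effect + 0.1 * rng.standard_normal((n_draws, n_weeks))
alpha_draws = gamma_draws * theta_draws

selected = critical_window_set(alpha_draws, gamma_draws)
print(selected)
```

Because excluded draws contribute exact zeros to the effect, the interval criterion on the product can only be met when very few draws are excluded, which makes it the stricter of the two conditions in this sketch.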
  • 14:27So this really outperformed
  • 14:29what we had been doing previously.
  • 14:30So we've been moving forward
  • 14:32with this variable selection concept since then.
  • 14:36All right, so we like critical window variable selection,
  • 14:38we like a lot of these other methods.
  • 14:39The problem is that, as I know
  • 14:42a number of you are aware,
  • 14:43the literature has really moved, the science
  • 14:46has moved, towards pollution mixtures
  • 14:48and multiple exposures.
  • 14:50And a lot of these methodologies were developed
  • 14:52with one pollutant in mind at the most two to three,
  • 14:58but they were not generally meant
  • 15:00for pollution mixtures for example.
  • 15:02So our goal in this work
  • 15:04was to extend what we liked, CWVS,
  • 15:07critical window variable selection, to accommodate mixtures.
  • 15:10And so when we started thinking about mixtures,
  • 15:12when you have time varying exposures
  • 15:14and time varying effects,
  • 15:16it became relatively conceptually complicated
  • 15:19because you have lots of parameters floating around.
  • 15:22So we wanted something that could do
  • 15:23like a dimension reduction essentially.
  • 15:25So what we thought is a nice solution,
  • 15:28like in a single pollutant context, or I'm sorry,
  • 15:32in a single exposure time period context
  • 15:34is this weighted quantile sum regression,
  • 15:36which I know a lot of you are familiar with,
  • 15:38'cause I've helped write pieces of grants
  • 15:40that have discussed weighted quantile sum regression here,
  • 15:44but it offers a nice interpretable solution
  • 15:46for estimating the impact of a mixture on an outcome.
  • 15:49And it has this really nice sum to one constraint
  • 15:53on the regression parameters.
  • 15:55And so you get in the end,
  • 15:56you have 20 pollutants for example,
  • 15:58and you get to see the relative contribution
  • 16:01of each of these pollutants in terms of the entire mixture.
  • 16:04So you have these little sum to one between zero
  • 16:06and one probabilities or proportions
  • 16:09that describe the role of individual pollutants.
  • 16:12And then you have this global regression parameter
  • 16:15that describes the impact of that mixture
  • 16:17as defined by those weights on the health outcome.
  • 16:21So it does a little two-stage process: estimate weights,
  • 16:24and then the global regression parameter,
  • 16:26not important for this talk.
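As a rough illustration of the weighted quantile sum idea described above, the sketch below builds a mixture index from quantile-scored pollutants and sum-to-one weights, then passes it through a logistic link with a single global parameter. The weights are taken as given here rather than estimated. The helper names, the quartile scoring, and the values of `beta0` and `beta1` are assumptions for illustration.

```python
import numpy as np

def quantile_scores(exposures, n_quantiles=4):
    """Score each pollutant column 0..n_quantiles-1 by its empirical quantile bin."""
    scores = np.empty_like(exposures, dtype=float)
    inner = np.linspace(0.0, 1.0, n_quantiles + 1)[1:-1]
    for j in range(exposures.shape[1]):
        cuts = np.quantile(exposures[:, j], inner)
        scores[:, j] = np.searchsorted(cuts, exposures[:, j], side="right")
    return scores

def wqs_index(exposures, weights, n_quantiles=4):
    """Weighted sum of quantile scores; weights must be nonnegative and sum to one."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be nonnegative and sum to one")
    return quantile_scores(exposures, n_quantiles) @ weights

rng = np.random.default_rng(4)
exposures = rng.lognormal(size=(500, 3))  # 500 people, 3 pollutants (simulated)
weights = [0.6, 0.3, 0.1]                 # relative contribution of each pollutant
index = wqs_index(exposures, weights)

beta0, beta1 = -2.0, 0.4                  # hypothetical intercept and global mixture effect
prob = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * index)))
```

The weights say which pollutants drive the mixture, while `beta1` says whether the mixture as a whole matters for the outcome.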
  • 16:29More recently in 2020, this was extended
  • 16:32to the lagged weighted quantile sum regression.
  • 16:35And yeah, it extended WQS to the multiple pollutants setting
  • 16:42in a really, I think of it as a relatively ad hoc solution,
  • 16:47but basically WQS is fit at each exposure week separately.
  • 16:51The weights are estimated,
  • 16:53the mixtures are combined based on those weights.
  • 16:55And then those kind of packaged mixtures
  • 16:58are thrown into like a distributed lag model
  • 17:00to estimate similar curves
  • 17:02to what I've been showing you so far.
  • 17:03So the estimation of the weights
  • 17:05and their relative importance in the mixture
  • 17:07are done separately outside of kind of the estimation
  • 17:11of the regression parameters as well.
  • 17:13So this is, again, more of a two-stage approach.
  • 17:16All right, so we like WQS
  • 17:19because of its relative simplicity and its interpretability,
  • 17:22we liked critical window variable selection.
  • 17:24So the goals here were to combine
  • 17:26that estimation and identification ability of CWVS
  • 17:29with the interpretability and shrinkage properties
  • 17:31of WQS within a unified modeling framework, and extending...
  • 17:37oh yeah, so WQS is nice.
  • 17:39It has sum-to-one components
  • 17:42that are between zero and one,
  • 17:44but you don't actually get a sense of variable selection
  • 17:47when doing this.
  • 17:48So none of the weights can exactly equal zero.
  • 17:50We wanted a more sparse solution
  • 17:53and so we introduced also a way
  • 17:54to make these weights exactly zero.
  • 17:57So you can get a better sense of
  • 17:58which pollutants are the main players in the mixture.
  • 18:02And so what we're calling this is CWVS for mixtures
  • 18:06or CWVS mix.
  • 18:09And so some features before we get
  • 18:11into a little bit of the details of the model,
  • 18:13these are like the highlights.
  • 18:14If you take nothing else away from
  • 18:16what this model does, this is,
  • 18:18I think, the important slide here:
  • 18:20we have main effects and first order interactions
  • 18:23between the pollutants during each exposure period.
  • 18:25So week one of pregnancy,
  • 18:27week two of pregnancy, all of these interactions,
  • 18:30all of these main effects are included.
  • 18:31So there's lots of parameters you can already imagine
  • 18:34are floating around here.
  • 18:35We still hold onto these sum-to-one mixture weights
  • 18:38at each exposure week separately.
  • 18:41But we want to account for the fact that,
  • 18:43what's happening in exposure week one
  • 18:44may be similar to exposure week two to three to four,
  • 18:48with this correlation dying out as you get further apart
  • 18:50in exposure time.
  • 18:51So we want these weights not to have to be estimated
  • 18:55kind of independently at each exposure week.
  • 18:57We want to enforce some smoothness,
  • 19:00data driven smoothness preferably to estimate these weights.
  • 19:04And as I mentioned, we want these weights
  • 19:06to have a variable selection component.
  • 19:08So we can actually identify individual elements
  • 19:10of the mixture and we still have this global risk parameter,
  • 19:14and this is going to follow the CWVS model
  • 19:17so that we can estimate
  • 19:18these critical windows more accurately.
  • 19:22All right, so the goals of this study
  • 19:24before we jump into some of the methodology here
  • 19:26are to develop CWVS mix.
  • 19:28As I mentioned,
  • 19:30simulation is really important in this world.
  • 19:32I wanna make sure that what we're doing
  • 19:34is not just duplicating other efforts
  • 19:36and that it's actually offering something new,
  • 19:38something helpful to the literature
  • 19:40that we can point to.
  • 19:41I think I know the shortcomings of something like lagged
  • 19:45weighted quantile sum regression,
  • 19:46but until I see it actually happen in simulation
  • 19:49it's just kind of hypothetical.
  • 19:51So finally we wanna investigate the impact
  • 19:53using this new methodology
  • 19:55of multiple ambient air pollutants on stillbirth risk.
  • 19:58And in this case,
  • 19:59we're focusing on New Jersey from 2005 to 2014.
  • 20:03And actually we have really nice output
  • 20:06from a novel data fusion model.
  • 20:08There are lots of data fusion models floating
  • 20:10around right now, but this is one from 2019,
  • 20:12from our collaborator at Georgia Tech and at Emory
  • 20:16that provided 12 pollutants,
  • 20:1812 kilometer grid cell size across the entire US,
  • 20:22daily, no missingness, things like that.
  • 20:26So for these particular pollutants.
  • 20:29All right, so let's talk a little bit about
  • 20:31the model and what it does
  • 20:33and some of the intuitive features
  • 20:34that I think it has and why it might work well.
  • 20:37So yeah, we're starting with some outcome,
  • 20:42it could be some adverse health outcome
  • 20:44like preterm pregnancy or not,
  • 20:46or stillbirth or not, some Bernoulli outcome
  • 20:49where this p_i describes kind of the probability
  • 20:52that person i experiences this outcome.
  • 20:55We model this probability using logistic regression
  • 20:58as we normally would,
  • 21:00these green ones, I'm kind of trying to differentiate.
  • 21:02I'm trying to keep people's attention
  • 21:04on the parameters
  • 21:05and how I'm mentally grouping them as well.
  • 21:08So these green ones represent these typical demographics.
  • 21:12We know there are certain risk factors
  • 21:13for different health outcomes,
  • 21:16particularly pregnancy outcomes being over 35 for example,
  • 21:19with preterm pregnancy, alcohol, smoking, et cetera.
  • 21:23So this would go into this x_i transpose beta,
  • 21:26this vector here. Where a lot of our work came in
  • 21:31is on these blue parameters,
  • 21:33which are the weights that I've been talking about.
  • 21:35So these weights, these blue parameters
  • 21:37actually sum to one at each exposure week.
  • 21:42So at each exposure week t
  • 21:44we basically have a vector of lambdas
  • 21:47that are weights between zero and one,
  • 21:49and could actually be equal to zero exactly.
  • 21:52And they sum to one at each exposure week separately.
  • 21:55You notice they're indexed by t
  • 21:57because we're allowing the possibility
  • 21:59that the exposure profile changes across the pregnancy.
  • 22:03So early on in the pregnancy,
  • 22:05maybe the risk is primarily driven by pollutant A,
  • 22:10but later on in the pregnancy,
  • 22:12perhaps that shifts.
  • 22:13And so the weights would shift as well,
  • 22:16but we expect this shift to be smoother
  • 22:19rather than complete choppiness across the exposure weeks.
  • 22:23And so what these weights do is
  • 22:25they kind of multiply here with the main effects
  • 22:29and these first order interactions.
  • 22:31And if you think about taking this sum across
  • 22:33main effects and interactions,
  • 22:34you have this package of weighted exposure essentially.
  • 22:38And the alpha here tells us whether at exposure period T
  • 22:42this package has any impact on
  • 22:44your ultimate probability of developing the outcome.
  • 22:48So we have this nice sense that the weights
  • 22:50help us describe what's happening with the mixture profiles.
  • 22:53But the alpha keeps us honest
  • 22:55and keeps us able to say,
  • 22:57well, you know, this mixture's interesting,
  • 23:00but it has no impact on the health outcome of interest here.
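The slide itself is not reproduced in the transcript, so the following is a reconstruction of the model as described verbally, not a copy of the speaker's equation. The symbols $z_{itj}$ (exposure of person $i$ to pollutant $j$ in week $t$), $q$ (number of pollutants), $T$ (number of exposure weeks), and $\tilde{\lambda}_{tjk}$ (interaction weight) are notational assumptions.

```latex
\begin{aligned}
Y_i \mid p_i &\sim \text{Bernoulli}(p_i), \\
\operatorname{logit}(p_i) &= \mathbf{x}_i^{\top}\boldsymbol{\beta}
  + \sum_{t=1}^{T} \alpha_t \Big( \sum_{j=1}^{q} \lambda_{tj}\, z_{itj}
  + \sum_{j<k} \tilde{\lambda}_{tjk}\, z_{itj}\, z_{itk} \Big), \\
\sum_{j=1}^{q} \lambda_{tj} + \sum_{j<k} \tilde{\lambda}_{tjk} &= 1,
  \qquad \lambda_{tj} \ge 0,\ \tilde{\lambda}_{tjk} \ge 0 .
\end{aligned}
```

Here $\mathbf{x}_i^{\top}\boldsymbol{\beta}$ carries the demographic covariates, the term in parentheses is the weighted exposure package at week $t$, and $\alpha_t$ determines whether that package affects the outcome.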
  • 23:06So how do we do these mixture weights?
  • 23:09As I mentioned, two features that we're interested in
  • 23:11the ability to actually equal zero
  • 23:14and smoothness across time.
  • 23:15And so the first point is,
  • 23:19well, we introduce these latent weight parameters
  • 23:22that I'm calling lambda star,
  • 23:23don't get too caught up in them.
  • 23:25Basically they're continuously varying parameters
  • 23:28that as soon as they cross the zero threshold,
  • 23:31they turn on in our model.
  • 23:33So that's what this maximum is doing.
  • 23:35So they turn on and they give you some weight
  • 23:37and then as soon as they cross into negative territory,
  • 23:40they go to zero.
  • 23:40So this is how we're getting actual zeros in these weights.
  • 23:43So the lambdas and the lambda tildes
  • 23:45can actually equal zero
  • 23:47based on these underlying latent weight parameters.
  • 23:52All right, so we keep them summing to one
  • 23:55by dividing by the sum of the numerator, essentially.
  • 23:58So whatever weights are positive get summed
  • 24:00and we're dividing by that,
  • 24:01we're basically kind of self-correcting here
  • 24:04so that the weights always sum to one,
  • 24:07these weights combined.
  • 24:08For the interactions, we don't want the case.
  • 24:12We prefer a sparse model,
  • 24:14particularly as the number of pollutants gets really large.
  • 24:17So the number of interactions will grow.
  • 24:19So what we want is interactions
  • 24:21that are only turned on essentially
  • 24:24when the main effects are turned on.
  • 24:25So you can see these two indicators I've added
  • 24:27basically say if the main effects themselves
  • 24:30aren't both turned on,
  • 24:31this interaction effect gets zeroed out already.
  • 24:34So the interaction has kind of a higher bar to clear,
  • 24:37this strict hierarchy basically,
  • 24:40where both main effects have to be on
  • 24:43and the interaction latent variable has to be on.
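The thresholding, strict hierarchy, and normalization just described can be sketched for a single exposure week as follows. The function name `mixture_weights`, the pair ordering, and the choice to raise an error when every latent value is nonpositive are illustrative assumptions; the talk does not say how that edge case is handled.

```python
import numpy as np
from itertools import combinations

def mixture_weights(lam_star, lam_star_int):
    """Turn latent values into sparse, sum-to-one weights for one exposure week.

    lam_star:     (q,) latent main-effect values.
    lam_star_int: latent interaction values, ordered as combinations(range(q), 2).
    """
    lam_star = np.asarray(lam_star, dtype=float)
    lam_star_int = np.asarray(lam_star_int, dtype=float)
    pairs = list(combinations(range(len(lam_star)), 2))

    main_raw = np.maximum(lam_star, 0.0)  # a weight turns on only above zero
    on = lam_star > 0
    # Strict hierarchy: an interaction needs both main effects on AND its own
    # latent value above zero.
    both_on = np.array([on[j] and on[k] for j, k in pairs], dtype=float)
    int_raw = np.maximum(lam_star_int, 0.0) * both_on

    total = main_raw.sum() + int_raw.sum()
    if total == 0.0:
        raise ValueError("all latent values are nonpositive; weights undefined")
    return main_raw / total, int_raw / total, pairs

# Pollutant 1 is off, so both interactions involving it are zeroed out even
# though their latent values (3.0 and 0.7) are positive.
main_w, int_w, pairs = mixture_weights([1.0, -0.5, 2.0], [3.0, 1.0, 0.7])
print(main_w, int_w)
```

In this example the main weights come out as 0.25, 0, and 0.5, and only the interaction between pollutants 0 and 2 keeps a weight (0.25), so the combined weights sum to one.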
  • 24:46So there's the zero component. Now,
  • 24:48how do we do smoothness across time?
  • 24:50Well, it's all about this correlation structure.
  • 24:52So these latent Lambda star parameters
  • 24:55that control the weights are actually modeled
  • 24:57as a multivariate Gaussian process.
  • 24:59And I think the key thing to focus on here is that
  • 25:01there's this underlying correlation structure
  • 25:04that tells us, as two exposure time points get further apart,
  • 25:09this exponential of a negative number
  • 25:11will get closer to zero.
  • 25:12So correlation dies out as exposure time gets further apart.
  • 25:16Now, as they get closer together,
  • 25:17this correlation is gonna be higher.
  • 25:19And the main parameter that controls this level
  • 25:22of correlation is this phi parameter.
  • 25:23And we actually put prior distributions on this
  • 25:27to allow the data to drive the inference,
  • 25:29rather than like our view of what we expect
  • 25:32this smoothness to look like.
  • 25:33So yeah, this is data driven
  • 25:35kind of smoothness across exposure time.
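A minimal sketch of the exponential correlation structure described above, with one smooth latent path drawn from it. The value `phi = 0.3` and the unit spacing between weeks are illustrative assumptions; in the model phi gets a prior and is learned from the data.

```python
import numpy as np

def exp_corr(n_weeks, phi):
    """Correlation exp(-phi * |t - t'|) between exposure weeks t and t'."""
    weeks = np.arange(1, n_weeks + 1)
    return np.exp(-phi * np.abs(weeks[:, None] - weeks[None, :]))

n_weeks = 36
R = exp_corr(n_weeks, phi=0.3)  # phi = 0.3 is an arbitrary illustrative value

# Neighboring weeks are highly correlated; distant weeks are nearly independent.
print(R[0, 1], R[0, 10])

# One draw of a latent weight path for a single pollutant across the pregnancy.
rng = np.random.default_rng(2)
lam_star_path = rng.multivariate_normal(np.zeros(n_weeks), R)
```

A larger phi makes the correlation die out faster, giving choppier paths; a smaller phi gives smoother ones.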
  • 25:39All right, so now we've got the weights handled;
  • 25:42they have both properties that we care about.
  • 25:43Now let's talk about the mixture impact itself.
  • 25:46So this alpha recall tells us whether the mixture
  • 25:48that we observe at time point T
  • 25:50or that we estimate at exposure time point T
  • 25:53is actually relevant to the health outcome.
  • 25:55So we want, again,
  • 25:57we want this variable selection here
  • 25:59because we've noticed the problem with the end points
  • 26:01that I described earlier.
  • 26:02So to do this, we decompose this effect into two pieces,
  • 26:06a continuously varying piece.
  • 26:08And then this binary piece
  • 26:09that I mentioned earlier on in the talk,
  • 26:11the binary pieces are just independent
  • 26:13Bernoulli random variables.
  • 26:15But we imagine that if you're in the critical window set
  • 26:20at time one, then you may be in it at time two
  • 26:22and may be more likely to be in at time three.
  • 26:25So there may be some sense of correlation
  • 26:27across exposure time here as well.
  • 26:29So while we model these things as independent,
  • 26:31the probabilities that underlie these zero
  • 26:34and one variables are actually smoothly varying
  • 26:37and correlated across time.
  • 26:39So again,
  • 26:40we use this kind of exponential correlation structure.
  • 26:43We allow for cross correlation between the continuous
  • 26:46and the binary piece.
  • 26:48Not important to get into here,
  • 26:49you can kind of read back over.
  • 26:52I can share a paper with you if you want to,
  • 26:54or talk more about it offline,
  • 26:56but essentially there's some cross correlation
  • 26:58there's correlation across time,
  • 26:59but this allows for smoothness in the effects
  • 27:02and the kind of the regression parameter effects
  • 27:04that we've been looking at,
  • 27:05but also in the variable selection as well.
  • 27:08And these, both of these things come together to kind
  • 27:11of define the critical window variable selection model.
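
The decomposition described above can be sketched as a small simulation: a smooth continuous piece multiplied by a binary inclusion indicator whose probabilities themselves vary smoothly over time. This is only an illustration under stated assumptions: the probit link, the decay value 0.3, and all variable names are hypothetical, and the cross-correlation between the two pieces mentioned in the talk is left out for simplicity.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
num_weeks = 20
weeks = np.arange(num_weeks)

# Exponential correlation across exposure weeks (0.3 is an arbitrary decay value).
corr = np.exp(-0.3 * np.abs(weeks[:, None] - weeks[None, :]))
chol = np.linalg.cholesky(corr + 1e-10 * np.eye(num_weeks))

theta = chol @ rng.standard_normal(num_weeks)  # continuously varying piece
eta = chol @ rng.standard_normal(num_weeks)    # latent process behind the inclusion probabilities

# Assumed probit link: smooth latent values become smooth probabilities.
pi = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in eta])

gamma = rng.binomial(1, pi)  # independent Bernoulli draws, given the probabilities
alpha = theta * gamma        # week-specific effect; exactly zero outside the window set
```

Because neighboring weeks share similar inclusion probabilities, weeks in the critical window set tend to cluster together even though the Bernoulli draws are conditionally independent.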
  • 27:14To finish the model, recall everything's in the Bayesian setting,
  • 27:17so really weakly informative prior distributions,
  • 27:21kind of standard prior distributions when possible,
  • 27:24nothing too interesting here.
  • 27:26So the model you may be looking at on this previous slide
  • 27:29and thinking there's a lot of parameters floating
  • 27:31around here.
  • 27:32There's a lot of output that you're going to be estimating.
  • 27:35So how do you make sense of this as a practitioner,
  • 27:38someone who actually wants to know if a mixture
  • 27:40is having an impact on your health?
  • 27:42Well, luckily we still have relatively nice
  • 27:46and estimable kind of effects here,
  • 27:50associations that we can talk about.
  • 27:52So for example,
  • 27:53for a change in the log odds for a one unit increase
  • 27:55in each pollutant during a particular exposure period,
  • 27:59this would be the quantity
  • 28:01that you would make (indistinct).
  • 28:02You would exponentiate this,
  • 28:03and you would have like an odds ratio, for example.
  • 28:05Now recall, for any model that includes interactions,
  • 28:08the interpretation is always increasingly complicated
  • 28:12because it matters where you start
  • 28:14when you have interactions.
  • 28:15So if you're already at a high level,
  • 28:18so the values themselves of exposure have to come into play,
  • 28:22but nonetheless, you can still get nice quantities
  • 28:25to estimate in the end.
  • 28:27And if you're only interested in what happens
  • 28:29if pollutant A increased during exposure period T
  • 28:32you can write down actually
  • 28:34what that looks like as well.
  • 28:35So you can estimate both of these things relatively easily
  • 28:38from our output, from our model.
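
To illustrate why the starting exposure level matters once interactions are in the model, here is a hedged sketch of an odds ratio for a one-unit increase in every pollutant during a single exposure period. The functional form (weighted main effects plus weighted pairwise products, scaled by a period-specific effect) and every name and number below are assumptions made for illustration; the exact expression used in the talk is on the slide and is not reproduced here.

```python
from itertools import combinations
import numpy as np

def mixture_value(x, main_w, inter_w):
    """Weighted main effects plus weighted pairwise interactions (assumed form)."""
    x = np.asarray(x, dtype=float)
    pairs = combinations(range(len(x)), 2)
    interactions = sum(w * x[j] * x[k] for w, (j, k) in zip(inter_w, pairs))
    return float(np.dot(main_w, x)) + float(interactions)

def odds_ratio_unit_increase(x, alpha_t, main_w, inter_w):
    """Odds ratio when every pollutant rises by one unit from baseline x."""
    x = np.asarray(x, dtype=float)
    delta = mixture_value(x + 1.0, main_w, inter_w) - mixture_value(x, main_w, inter_w)
    return float(np.exp(alpha_t * delta))

# Hypothetical values: two pollutants, one interaction, one exposure period.
main_w, inter_w, alpha_t = [0.5, 0.3], [0.2], 0.1
or_low = odds_ratio_unit_increase([0.0, 0.0], alpha_t, main_w, inter_w)
or_high = odds_ratio_unit_increase([1.0, 1.0], alpha_t, main_w, inter_w)
print(or_low, or_high)
```

The same one-unit increase gives a larger odds ratio from the higher baseline, because the interaction term grows with the exposure values themselves.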
  • 28:41Alright, so we have a model
  • 28:42that kind of checked all the boxes,
  • 28:44at least in my head when I was writing it down
  • 28:46and we can, I tested it, we can fit it,
  • 28:49it seems to work in that it's converging
  • 28:53and it's producing things that look reasonable,
  • 28:56but the simulation study really allows us to dig deeper
  • 28:59and say, this is obviously new,
  • 29:02but is there anything beneficial to what we're doing?
  • 29:04Or should we just be doing something simpler
  • 29:06that already exists?
  • 29:08So we wanted particularly to ask,
  • 29:10how does CWVS mix compare
  • 29:13to some of these existing approaches
  • 29:15for three different factors that we're interested in?
  • 29:17So first identifying the true critical window set,
  • 29:20obviously probably the most important part
  • 29:22of critical window research here is like,
  • 29:25let's get the critical window set right
  • 29:28when we're estimating and identifying these parameters.
  • 29:31But obviously when you're talking about mixtures,
  • 29:33we also care about these weights.
  • 29:35We want to know that the mixture profile we're looking at
  • 29:38on a certain exposure period actually is,
  • 29:43reflective of the true mixture profile
  • 29:45that makes sense here.
  • 29:46So how well do we do at estimating these Lambdas
  • 29:49and lambda-tilde parameters
  • 29:52that describe the effects of main effects and interactions,
  • 29:55and then finally,
  • 29:56how well do we do at estimating the magnitude of risk,
  • 30:00these alpha T parameters.
  • 30:02We wanna make sure we're getting these right as well.
  • 30:04And as a side issue, I guess,
  • 30:06just more of our curiosity,
  • 30:07how well does this variable selection process work
  • 30:10for the weights that we've introduced?
  • 30:13So now we need to think about
  • 30:15what are competing methods in this space.
  • 30:17There aren't a lot of methods out there
  • 30:19that aim to estimate critical windows with
  • 30:24time-varying exposures and multiple pollutants,
  • 30:27and the ones that are out there
  • 30:30give different enough output
  • 30:31that it's hard to compare one model to the next,
  • 30:34but here are three approaches that we kind of came up with.
  • 30:38One is the most naive kind of
  • 30:40where I would always start
  • 30:42as a practitioner with a new data set,
  • 30:44this equal weights approach.
  • 30:45So maybe just averaging all of the exposures for a person
  • 30:49on a given exposure week and including that average
  • 30:55and the interactions with the other exposure periods
  • 30:59in a framework, a distributed lag framework.
  • 31:04So yeah, this is called equal weights or EW.
  • 31:08A PCA approach also makes sense.
  • 31:10So let's allow the data to determine
  • 31:11the correct weights of these Lambdas,
  • 31:15but let's focus it only on the exposure period,
  • 31:18only the exposure data.
  • 31:19So at each exposure period,
  • 31:20fit a PCA to the person specific exposures
  • 31:24and generate these weights
  • 31:27that kind of describe the relative contribution
  • 31:29of the different interactions and main effects in a mixture,
  • 31:34and then weight the mixtures in that way
  • 31:36and throw that weighted value
  • 31:38into the distributed lag regression model.
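
The two simpler competitors described above can be sketched for a single exposure period as follows. This is an illustration only: the data are simulated, only main effects are weighted (the talk also weights interactions), and turning the first principal component's absolute loadings into weights that sum to one is an assumption about how a PCA-based weighting might be done, not a description of the authors' code.

```python
import numpy as np

def equal_weights(num_pollutants: int) -> np.ndarray:
    """Every pollutant contributes equally to the mixture."""
    return np.full(num_pollutants, 1.0 / num_pollutants)

def pca_weights(exposures: np.ndarray) -> np.ndarray:
    """Weights from the first principal component of the exposure data only."""
    centered = exposures - exposures.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = np.abs(vt[0])          # first principal direction
    return loadings / loadings.sum()  # rescale so the weights sum to one

rng = np.random.default_rng(1)
exposures_week_t = rng.normal(size=(200, 5))  # simulated: 200 people x 5 pollutants

w_equal = equal_weights(5)
w_pca = pca_weights(exposures_week_t)
index_equal = exposures_week_t @ w_equal  # the plain average across pollutants
index_pca = exposures_week_t @ w_pca      # PCA-weighted mixture value
```

Either weighted value would then enter the regression model as a single exposure for that period. Note that neither set of weights uses the health outcome, which is the key difference from approaches that estimate the weights jointly with the risk.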
  • 31:41So for all of these methods,
  • 31:42we're using the original CWVS,
  • 31:44so that we're comparable, so that
  • 31:47the results are actually comparable across methods,
  • 31:49and the only thing that is changing essentially
  • 31:52is how we define the weights.
  • 31:54And then finally, the most sophisticated approach
  • 31:56at that time was this lagged
  • 31:57weighted quantile sum regression
  • 31:58that we talked a little bit about,
  • 32:01where we applied weighted quantile sum regression
  • 32:03separately to each exposure period,
  • 32:06let that estimate the weights,
  • 32:07create the little package of exposure,
  • 32:09and then throw those packages
  • 32:10into the regression model using CWVS.
  • 32:13So once you have the weights,
  • 32:15like once you condition on the weights
  • 32:17and you know the weights,
  • 32:18you basically have one exposure
  • 32:19and that exposure is the package,
  • 32:22the mixture package that you've made.
  • 32:24So the model,
  • 32:25the modeling becomes much simpler in that case.
  • 32:29So how did we go about to test these different methods?
  • 32:35Well, we started very simply.
  • 32:36So these represent the weights across exposure periods.
  • 32:41In this case, I'm pretending like there's only five weeks
  • 32:43in the exposure set.
  • 32:45In reality, I let that vary for each data set;
  • 32:47the length and the start time of the exposure window changed,
  • 32:51but for this case,
  • 32:52we assumed it started at pregnancy week one
  • 32:55and went to week five.
  • 32:56And so in the simplest case,
  • 32:57we had just assumed there was one pollutant at play
  • 33:00and it stayed constant across the exposure period.
  • 33:02This is really simple.
  • 33:03One pollutant is driving the entire risk that we're seeing.
  • 33:08In another setting, we assumed that there were two,
  • 33:11but there were no changes over time.
  • 33:13They were always static across time
  • 33:15and in three, there were three that were coming into play,
  • 33:18at four, four, and then at five, five of them.
  • 33:21Obviously, as more come online
  • 33:23and become important players in the mixture,
  • 33:26the weights generally go down
  • 33:27because lots of these have to be nonzero.
  • 33:31In setting B,
  • 33:32we wanted to allow for some variability
  • 33:34among the important pollutants.
  • 33:35So we still allow for the same important pollutants
  • 33:38to be important at each exposure period,
  • 33:42but we allowed their relative contribution
  • 33:43to change across time.
  • 33:44So early on in pregnancy, this one was important,
  • 33:47but then its contribution went down
  • 33:49and it was kind of surpassed by number two here
  • 33:53at pollutant two,
  • 33:54and then they can keep swapping in and out
  • 33:56across the exposure.
  • 33:57And in setting C, it was complete chaos, essentially:
  • 34:02different pollutants could come online
  • 34:04and then leave and become important
  • 34:05or not important go to zero.
  • 34:07We don't anticipate this would ever,
  • 34:10or this would be the case,
  • 34:11but it would be nice to know if our model
  • 34:13can somehow collapse and kind of accommodate this reckless,
  • 34:17this wild behavior, I guess.
  • 34:20So, yeah,
  • 34:21kind of testing the extremes
  • 34:23of all these methods is what we were trying to do here.
  • 34:27So we'll jump right into the results.
  • 34:28Just to give you a sense of what happened
  • 34:31when we tested these models
  • 34:32with lots of simulated data sets,
  • 34:34CWVS mix continuously and kind of consistently
  • 34:40was able to get the critical window set
  • 34:43more accurately than the other methods,
  • 34:45which struggled kind of in varying degrees
  • 34:48across these different settings.
  • 34:50In terms of estimating the weight parameters,
  • 34:54generally CWVS mix has a lower mean squared error,
  • 34:59so it's doing a better job of estimating these parameters,
  • 35:02as you would expect, like with equal weights,
  • 35:04if you assume each weight,
  • 35:05each pollutant and interaction is playing
  • 35:09an equal part in the story,
  • 35:10you can be very badly off a lot of times,
  • 35:13which is why
  • 35:16these values are so high for some of these methods.
  • 35:20And finally,
  • 35:21with the estimation of the regression parameters
  • 35:23that describe the magnitude of risk,
  • 35:26generally, we're seeing improved performance with CWVS mix,
  • 35:31but interestingly,
  • 35:32at least at the time when we first saw this
  • 35:34is that the equal weights method does a pretty good job
  • 35:38of estimating these risk magnitude parameters
  • 35:43as the number of important pollutants increases.
  • 35:46So if you tell me that every one of your pollutants
  • 35:48is important,
  • 35:49then it's going to be hard to beat something
  • 35:53that gives all of the pollutants equal weight.
  • 35:55So that's kind of the intuition behind it.
  • 35:57As more pollutants become important,
  • 35:58giving everything equal weight is not such a bad idea;
  • 36:01it's almost just averaging away some of that error,
  • 36:04but generally, we're still doing well.
  • 36:07And specifically in comparison
  • 36:09to the lagged weighted quantile sum regression,
  • 36:10that's really important,
  • 36:12'cause at the time this was kind
  • 36:13of the main method out there
  • 36:15that aimed to do the same thing we were doing.
  • 36:17So in summary here with a simulation study,
  • 36:21we did really well in terms of critical window accuracy,
  • 36:26weight parameter estimation,
  • 36:28and even in the risk magnitude parameter estimation.
  • 36:32So models
  • 36:34that don't actually estimate weights are more efficient
  • 36:36when the complexity
  • 36:37or the number of important pollutants grows.
  • 36:40And a little bit about the variable selection
  • 36:42that we introduced with these latent variables.
  • 36:45It appeared to do really well, again,
  • 36:47when the number of important pollutants was relatively small.
  • 36:51So if you have lots of pollutants that are important
  • 36:54and their interactions are important,
  • 36:56it was hard for the variable selection process
  • 36:59to kind of tease out
  • 36:59when something's included or excluded.
  • 37:01It tended to just say everything was included.
  • 37:04So something to keep in mind,
  • 37:06I guess, as a limitation perhaps of this approach.
  • 37:10All right, so now onto the real data application
  • 37:13that we had,
  • 37:14and this is part of a larger kind of climate change,
  • 37:18heat, and preterm birth study.
  • 37:21We collected lots of state-specific birth records
  • 37:26going all the way back to 1990 for maybe 12 to 14 states.
  • 37:31And so this one was set in New Jersey,
  • 37:34but we focused on stillbirth given
  • 37:36their really strong stillbirth data collection
  • 37:40kind of methodology that New Jersey was using.
  • 37:43So stillbirth, the death or loss of a baby
  • 37:46after at least 20 weeks of pregnancy, affects about
  • 37:48one in 160 births in the US.
  • 37:51There are some known maternal risk factors,
  • 37:54black mother, 35 years of age or more,
  • 38:00low SES, smoking, et cetera.
  • 38:03And a recent literature review and meta-analysis suggests that
  • 38:06PM 2.5, CO2, and O3 are associated with increased risk.
  • 38:11This was really recent,
  • 38:13but that more studies are definitely needed.
  • 38:14There's not a lot in comparison
  • 38:17to some of the other adverse birth outcomes,
  • 38:18there's not as much done with stillbirth, at least.
  • 38:21However, a majority of these previous studies
  • 38:23have focused on again, single pollutant approaches,
  • 38:27wide exposure periods like the entire relevant pregnancy
  • 38:31before the delivery.
  • 38:35So there is a need
  • 38:36for kind of multiple pollutant critical window
  • 38:37methods in this setting.
  • 38:38So this is what kind of made us think about
  • 38:43developing this methodology,
  • 38:44but also applying it in this case study.
  • 38:48So a little bit about the data we had access to.
  • 38:50We had live birth and fetal death records
  • 38:52from New Jersey from 2005 to 2014.
  • 38:55We included singletons with gestational age
  • 38:57of at least 20 weeks,
  • 38:58no birth defects, and conception date in 2005 to 2013.
  • 39:05We ran a case-control analysis here
  • 39:07where five live births were linked
  • 39:10with each stillbirth, matching only on race/ethnicity.
  • 39:13And we actually ended up running these analyses separately
  • 39:16for each group non-Hispanic black,
  • 39:17non-Hispanic white and Hispanic.
  • 39:19And in terms of our exposures,
  • 39:22weekly pollution exposures
  • 39:24through gestational week 20 were included in this analysis.
  • 39:30All right, a little bit about the pollutants.
  • 39:32I mentioned we relied on a data fusion model
  • 39:35that gave us kind of fine-scale spatial
  • 39:40and temporal estimates of 12 pollutants across New Jersey,
  • 39:46across the US actually, but focusing here on New Jersey.
  • 39:49So you can see the pollutants listed here
  • 39:50and we linked each woman's residence at delivery
  • 39:54with the closest grid cell where data were available
  • 39:57or the estimates and predictions were available
  • 39:59and assigned weekly exposures across
  • 40:00the first 20 weeks of gestation.
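
The linkage step described above can be sketched as a nearest-neighbor lookup from each residence to the closest grid location, followed by copying over that location's weekly series. Everything here is hypothetical: the coordinates and exposure values are made up, and plain Euclidean distance on latitude and longitude is used as a rough stand-in for whatever distance the study actually used.

```python
import numpy as np

def nearest_grid_index(residences: np.ndarray, grid_centroids: np.ndarray) -> np.ndarray:
    """Index of the closest grid centroid for each residence (brute force)."""
    diffs = residences[:, None, :] - grid_centroids[None, :, :]
    sq_dist = (diffs ** 2).sum(axis=2)  # squared Euclidean distance on (lat, lon)
    return sq_dist.argmin(axis=1)

# Made-up grid centroids and residences as (latitude, longitude) pairs.
grid_centroids = np.array([[40.0, -74.0], [40.5, -74.5], [39.5, -75.0]])
residences = np.array([[40.1, -74.1], [39.6, -74.9]])

# Made-up weekly exposure estimates: one row per grid cell, 20 gestational weeks.
weekly_grid_exposure = np.arange(3 * 20, dtype=float).reshape(3, 20)

cell = nearest_grid_index(residences, grid_centroids)
assigned = weekly_grid_exposure[cell]  # one 20-week exposure series per residence
```

In a real analysis the 20 weeks would also need to be aligned to each pregnancy's own conception date, which this sketch ignores.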
  • 40:02I know there's always a lot of pushback
  • 40:04in these birth records
  • 40:05because we don't have residential mobility,
  • 40:08we don't have a sense of like how often people move.
  • 40:10And we know moving is differential
  • 40:13by socioeconomic status for example,
  • 40:15there are a lot of factors
  • 40:16that influence moving during pregnancy,
  • 40:18but maybe this will make you feel somewhat better:
  • 40:22we did a study in 2019
  • 40:25to kind of assess the robustness
  • 40:27of these critical window methods more generally
  • 40:29to lots of different sources of error,
  • 40:32including residential mobility
  • 40:34and the results were actually very promising,
  • 40:36I thought. So the findings are robust generally
  • 40:39to kind of this exposure misclassification
  • 40:43or exposure error that's introduced through mobility.
  • 40:47All right, so in summary,
  • 40:49I guess for the data we had around 1300 non-Hispanic black
  • 40:53stillbirths in this time, 928 Hispanic,
  • 40:56and 1100 non-Hispanic white.
  • 40:59Our covariates that we included were year of conception,
  • 41:02season of conception to control for this kind of seasonality
  • 41:05and long term time trends and pollution exposure,
  • 41:09tobacco use indicator, age category, education.
  • 41:13We had the sex of the fetus,
  • 41:14and to control for spatial kind of residual correlation,
  • 41:18we actually included latitude, longitude
  • 41:21of the residence at delivery, and their interaction term.
  • 41:24As a pre-screening,
  • 41:26because we had 12 pollutants to work with,
  • 41:28we didn't wanna introduce a lot of noise if possible,
  • 41:30into the new framework.
  • 41:31So we did a pre-run of the original critical window variable
  • 41:35selection on each pollutant individually,
  • 41:37as most analyses would do anyway,
  • 41:40and identified a subset across all
  • 41:43of the different data sets, and by different data sets
  • 41:45I mean the non-Hispanic black,
  • 41:47non-Hispanic white, and Hispanic.
  • 41:49So all of the relevant and kind of significant exposures
  • 41:54that came up during any exposure period
  • 41:57were included as a subset into this bigger framework.
  • 41:59And so in total, we had PM 2.5, sulfate, nitrogen oxides,
  • 42:03ammonium, and nitrate that kind
  • 42:05of made it through this pre-screening into the final subset.
  • 42:10So here is some of the output
  • 42:12that we thought was interesting.
  • 42:14There's a lot of output
  • 42:15that can be shown, as you already know,
  • 42:17I guess. Now there are weights at each exposure period,
  • 42:20there's regression parameters,
  • 42:22there's just a lot that can happen here
  • 42:24and there's interactions, there's main effects,
  • 42:26but first let's focus on the first column here,
  • 42:29and this is at least something we can hold onto
  • 42:31that we understand from previous work in this space.
  • 42:35So what we can see for the non-Hispanic black population
  • 42:40that we were working with in New Jersey during this time,
  • 42:42is that elevated exposure,
  • 42:44I'm not gonna say to what yet but elevated exposure
  • 42:46to some combination of these five pollutants
  • 42:50during pregnancy week two,
  • 42:51and then later on in the pregnancy, 16, 17,
  • 42:54and 20 actually led to increased odds.
  • 42:59So these are odds ratios being presented, excuse me,
  • 43:03of stillbirth.
  • 43:04And so we can kind of take these in and say,
  • 43:07we get a sense of the critical windows
  • 43:09that are identified.
  • 43:10We also get a sense of the variable selection component
  • 43:13that I mentioned
  • 43:14and in this case, they line up pretty perfectly.
  • 43:17These are consistently in the model actually included
  • 43:19in the Bayesian variable selection model,
  • 43:21but also, when they are in the model, they're positive.
  • 43:25So this risk is in the right direction.
  • 43:27So more pollution during these pregnancy windows,
  • 43:32more risk of stillbirth in this population.
  • 43:34Now the question becomes,
  • 43:35well, what are you talking about
  • 43:36when you talk about the exposure?
  • 43:38Like, what is the mixture that you're talking about
  • 43:40in week two, for example?
  • 43:42Because we have five pollutants
  • 43:43and their interactions floating around.
  • 43:45So now let's move to the second column.
  • 43:49This top part represents
  • 43:51the main effects, I'm sorry,
  • 43:53not the interactions.
  • 43:54And the bottom part represents interactions.
  • 43:57So you can see ammonium is playing a big role throughout
  • 44:01until week 16,
  • 44:02which is dominated sharply by nitrogen oxides.
  • 44:06And then ammonium comes back into play here.
  • 44:11In terms of the interactions that are important,
  • 44:12it looks like PM 2.5 and ammonium early on.
  • 44:16And then later on it's nitrogen oxides and ammonium
  • 44:20kind of come into play.
  • 44:21So a lot of this is noise.
  • 44:23I did not show you the variable selection component,
  • 44:25but it probably would be nice
  • 44:28to kind of gray these out
  • 44:29if they're not selected in the model.
  • 44:32But a lot of these actually are selected in the model
  • 44:34with our variable selection.
  • 44:35So while these look to be non zero weights,
  • 44:37some of them are actually exactly zero essentially
  • 44:41because of the variable selection component.
  • 44:44But there's so much output,
  • 44:45it's hard to figure out what exactly
  • 44:47to show in a digestible way.
  • 44:48So this is where we landed.
  • 44:50So, interesting results: you get to see how
  • 44:52the exposure kind of the mixture transitions
  • 44:55across exposure time,
  • 44:57you get to see what impact that has
  • 44:59on the actual risk of the outcome that you're talking about.
  • 45:03So a nice, I think coherent story can come,
  • 45:06can be told, if you're picturing your own analysis here,
  • 45:10you get to talk about the risk overall
  • 45:12to the mixture kind of combination or profile,
  • 45:14but also then dig deeper into individual weeks
  • 45:16and talk about which ones are important,
  • 45:18which interactions are important, for example.
  • 45:21For the non-Hispanic white,
  • 45:22there was very little indication
  • 45:25that these pollutants were playing a role,
  • 45:28I guess, in the kind of development of stillbirth
  • 45:30or the risk of stillbirth in this population
  • 45:33and for the Hispanic population,
  • 45:36it looked like there potentially was some uptick here
  • 45:38at the end, but nothing significantly jumped out either.
  • 45:42And so at this point, it's almost...
  • 45:45You don't start to investigate
  • 45:47and overinterpret these weight parameters,
  • 45:50given that you're not seeing anything here.
  • 45:52So I kind of consider this to be noise essentially
  • 45:57for the Hispanic and non-Hispanic white results for example.
  • 46:01So a little brief kind of wrapping up here,
  • 46:04summary of our findings is that,
  • 46:06for the non-Hispanic black data set
  • 46:08and variable selection results
  • 46:10PM 2.5 and its chemical constituents
  • 46:13are primary drivers of risk.
  • 46:15And this was actually changing across exposure week.
  • 46:17So driven in week two by a lot of interactions
  • 46:21and kind of individual pieces.
  • 46:23Week 16, mainly heavily driven by nitrogen oxides
  • 46:27and then week 17,
  • 46:29one or two pollutants and their interactions.
  • 46:31So all the other interactions
  • 46:33that are not listed here among the five variables
  • 46:36were actually not significantly important here.
  • 46:39So nothing was seen
  • 46:46for the non-Hispanic white and Hispanic populations.
  • 46:50And I guess in conclusion,
  • 46:51we introduced CWVS mix, which combines smooth
  • 46:56Bayesian variable selection in the weights
  • 46:58and the regression parameters
  • 46:59with interpretable weighted quantile sum regression
  • 47:02shrinkage to identify critical windows,
  • 47:04but also kind of understand
  • 47:06and kind of dig deeper into the mixture itself.
  • 47:10And importantly, at least from our perspective
  • 47:13is that CWVS mix seemed to offer something
  • 47:17that the existing methods didn't,
  • 47:19consistently outperforming these other methods
  • 47:22for identifying the true critical window set,
  • 47:25estimating weight parameters,
  • 47:26which is really important for interpreting the mixtures
  • 47:29and then estimating the risk magnitude parameters as well.
  • 47:32And our stillbirth results from New Jersey
  • 47:34were in qualitative agreement with those in the literature,
  • 47:37in that PM 2.5 is a consistent signal across many studies,
  • 47:42while gaining new insights
  • 47:45regarding the exposure timing in this particular study.
  • 47:49Obviously more work is needed.
  • 47:50And so I guess before jumping to this,
  • 47:53we were working on extending this framework.
  • 47:57So I'm working with the group at Emory here
  • 48:00on extending this framework to allow the windows themselves
  • 48:03to vary by something like socioeconomic status
  • 48:07or race ethnicity, or other individual level factors.
  • 48:10So there's this effect modification floating around now
  • 48:13plus the mixtures.
  • 48:15So it's becoming a really big task to kind of do all of this
  • 48:19in a single framework,
  • 48:20but we're trying to take baby steps, essentially.
  • 48:23We like where we're at now, we think it works well,
  • 48:25it's robust, it fits well
  • 48:28and can we extend it next to the questions
  • 48:30that are being asked?
  • 48:31So again, if you're someone who is asking similar questions,
  • 48:34please, we can talk.
  • 48:35And I really enjoy sitting down with collaborators
  • 48:38and trying to figure out,
  • 48:40develop new methods that can answer the questions
  • 48:41that they have.
  • 48:43But if you find that,
  • 48:45your setting can already be answered
  • 48:47by some of these methods that I've discussed today,
  • 48:50on my website and on my GitHub site,
  • 48:52I keep a lot of these packages that I've created
  • 48:55with help documentation
  • 48:56and then you are always free to reach out to me as well.
  • 48:59But if you're looking to do this original Gaussian process,
  • 49:03critical window estimation,
  • 49:04we have a package for that.
  • 49:06Howard Chang at Emory, go through my website again,
  • 49:08you'll find his survival version
  • 49:11of the model up there as well.
  • 49:13CWVS in its original form is there for download.
  • 49:16The spatial version,
  • 49:17which we're thinking about extending
  • 49:22soon to account for something like oxidative potential
  • 49:25of these pollutants, is also there.
  • 49:28And then the newly developed methodology
  • 49:30is also there for download and for use as well.
  • 49:33And this obviously could not have happened
  • 49:35without collaborators, including Howard Chang at Emory,
  • 49:38Lauren at RTI did a lot of data management,
  • 49:42Matthew Strickland, and Lindsey
  • 49:45at University of Nevada Reno,
  • 49:47and then James for providing the,
  • 49:50or helping with the data fusion output as well.
  • 49:53And here, this grant support here
  • 49:56that I mentioned in extreme heat duration,
  • 49:58and then data integration methods
  • 49:59for environmental exposures.
  • 50:01So yeah, please feel free to reach out
  • 50:05if you have any questions.
  • 50:06This work that I went over today
  • 50:09is in press at Annals of Applied Statistics,
  • 50:12not on their website yet,
  • 50:12but should be really soon.
  • 50:15But I think there's a version on arXiv
  • 50:16if you're interested
  • 50:17or if you want the most up to date version.
  • 50:19I actually think I sent it tomorrow
  • 50:20who may have passed it out to the class,
  • 50:22but yeah, definitely feel free to reach out
  • 50:24if there are any questions or anything I can help with.
  • 50:27Yeah, that's it.
  • 50:30<v ->Thank you so much.</v>
  • 50:31(applause)
  • 50:35Our students were impressed with this
  • 50:39heavily quantitative lecture.
  • 50:44We collected some questions
  • 50:45from our students already,
  • 50:47but for folks who are joining online,
  • 50:50if you do have questions,
  • 50:51please feel free to put them in the chat box.
  • 50:54So the first question,
  • 50:56one of the students is observing that
  • 50:58in your study, the elevated risk
  • 51:01was found in week two of the pregnancy,
  • 51:05which is very early.
  • 51:06So perhaps many pregnant women are not aware
  • 51:11of the pregnancy at that time.
  • 51:13So in terms of the intervention
  • 51:16at this early stage of pregnancy,
  • 51:19what are the kind of policy implications that we'll find?
  • 51:22<v ->Now that's a really great point.</v>
  • 51:24And this is something we've tried to,
  • 51:27we haven't figured out how to deal with either,
  • 51:29but we've run into a number of interesting results
  • 51:35that we've seen early in the pregnancy.
  • 51:38In particular,
  • 51:38we've seen protective effects at some points
  • 51:42for like PM 2.5 exposure and pre-term pregnancy
  • 51:45very early on in the pregnancy.
  • 51:47And we believe it could be due to exactly
  • 51:51what we're talking about.
  • 51:51People who don't actually know they're pregnant
  • 51:53at that point.
  • 51:54And so miscarriage is an issue
  • 51:57that isn't well kind of documented by a lot of these states.
  • 52:00There could be just fetal loss in general,
  • 52:03that we're not capturing in the birth records.
  • 52:05And so there's this population
  • 52:08that we're not even including in a lot of our analysis
  • 52:12that are lurking around
  • 52:13and kind of could be biasing
  • 52:14some of these early week results.
  • 52:16In terms of policy implications
  • 52:20it's a really good question.
  • 52:21I don't know other than if I guess it really,
  • 52:25if you're trying to get pregnant,
  • 52:27if you know you're on that, in that stage,
  • 52:30I mean, maybe it's helpful for you,
  • 52:31but if you're someone who doesn't know, if it's unanticipated,
  • 52:35there's only so much that can be done outside
  • 52:40of just cleaner air altogether.
  • 52:42Which is something everyone can kind of agree on.
  • 52:45But I think it may only affect a subset of people
  • 52:48who are either attempting to get pregnant
  • 52:50or kind of really regimented and like,
  • 52:52know their schedule for example.
  • 52:55But there's this whole other issue about people
  • 52:57who aren't in our data set.
  • 52:58That's a really great point
  • 52:59and we have not figured out how to solve that yet.
  • 53:02<v Kai>Yeah, tough question.</v>
  • 53:04Thanks, Josh.
  • 53:05We do have another question
  • 53:06from actually two students read this.
  • 53:09They really appreciate your talk about this new metrics.
  • 53:13And we realize
  • 53:16the R package is available from your GitHub website.
  • 53:19So anyone who's interested in applying that
  • 53:21can download the R package and run it,
  • 53:25but the students are wondering,
  • 53:26beyond these time-varying air pollution mixtures,
  • 53:30there are a lot of other mixtures in terms of (indistinct)
  • 53:33like temperature, green space, other things.
  • 53:36So how does your approach,
  • 53:39the CWVS mix, apply to a broader setting
  • 53:43of environment exposures?
  • 53:45<v ->I think, my push, and if you read the paper,</v>
  • 53:48you'll notice that I really push for people
  • 53:50to think about that in their own setting.
  • 53:54Cause I think it's generally applicable to any,
  • 53:57it doesn't have to be a pregnancy outcome.
  • 53:59It doesn't have to be air pollution.
  • 54:01What it does have to be is consistently measured
  • 54:04across some exposure period.
  • 54:06So I'll often get questions that,
  • 54:08I have two time periods measured,
  • 54:10in the first trimester and then in the third trimester,
  • 54:14can I fit your methodology?
  • 54:16Well, we need more fine-grained exposure information
  • 54:19that's consistent across the individuals
  • 54:21in order to estimate these critical windows.
  • 54:23So I think the only barrier for entry
  • 54:25is that you have consistently estimated
  • 54:27kind of exposures for the population of interest.
  • 54:30It doesn't matter so much what the exposure is.
  • 54:32Now, I say that, but if you're bringing binary exposures
  • 54:36and you have limit of detection issues,
  • 54:38there are obviously some issues
  • 54:40that will need to be sorted out,
  • 54:41but the framework itself should work really well.
  • 54:44The other caveat is, you'll notice that
  • 54:46a lot of my work has been focused on pregnancy outcomes
  • 54:49and that's because the exposure period is so well defined.
  • 54:52If you're working with something like cancer, for example,
  • 54:55well, how far do you extend the exposures back in time?
  • 55:00You could go years and years back.
  • 55:03So there's this cumulative idea as well
  • 55:07that's really hard to understand,
  • 55:08and these distributed lag models are great
  • 55:10as long as you can a priori tell me
  • 55:13what the relevant exposure period is.
  • 55:14I can tell you if any of the interior parts
  • 55:17of that exposure period are important,
  • 55:19but if you're telling me you don't know
  • 55:21when the exposure period potentially started,
  • 55:23it's a completely different conversation.
  • 55:25So your outcome has to have,
  • 55:27or preferably would have some type
  • 55:29of relevant exposure period.
  • 55:32It's actually even better
  • 55:32for something like cardiac heart defects,
  • 55:35which we know the heart forms between like weeks three
  • 55:37and eight of pregnancy.
  • 55:39So you can really focus in on something like daily
  • 55:41or even sub daily
  • 55:42if you had that type of exposure information.
  • 55:45So yeah, those are the two,
  • 55:46generally it should work,
  • 55:47but just make sure you have a good sense
  • 55:48of the exposure period.
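The two requirements described above (a pre-specified exposure period, and exposures measured consistently across that period for every individual) can be checked before fitting any critical window model. The following is a minimal sketch, not part of the CWVS mix package: the function name `split_by_coverage`, the record layout `(person_id, week, value)`, and the example values are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (person_id, week_of_exposure_period, exposure_value)
Record = Tuple[str, int, float]


def split_by_coverage(
    records: List[Record], n_weeks: int
) -> Tuple[Dict[str, List[float]], Dict[str, List[int]]]:
    """Separate individuals with a full weekly exposure series from those without.

    n_weeks is the a priori exposure period (weeks 1..n_weeks).
    Returns (complete, incomplete): complete maps each person to the ordered
    weekly exposures; incomplete maps each person to the weeks that are missing.
    """
    by_person: Dict[str, Dict[int, float]] = {}
    for person_id, week, value in records:
        weeks = by_person.setdefault(person_id, {})
        if 1 <= week <= n_weeks:  # ignore measurements outside the exposure period
            weeks[week] = value

    complete: Dict[str, List[float]] = {}
    incomplete: Dict[str, List[int]] = {}
    for person_id, weeks in by_person.items():
        missing = [w for w in range(1, n_weeks + 1) if w not in weeks]
        if missing:
            incomplete[person_id] = missing
        else:
            complete[person_id] = [weeks[w] for w in range(1, n_weeks + 1)]
    return complete, incomplete


# Person "a" has every week; person "b" has only two time points,
# like the "first trimester and third trimester" situation in the question.
records: List[Record] = [
    ("a", 1, 8.1), ("a", 2, 7.9), ("a", 3, 9.4), ("a", 4, 10.2),
    ("b", 1, 6.5), ("b", 3, 7.2),
]
complete, incomplete = split_by_coverage(records, n_weeks=4)
print(complete)
print(incomplete)
```

Individuals in `incomplete` would need their exposures estimated on the finer grid (or be handled some other way) before a critical window analysis is sensible; how to do that is outside the scope of this sketch.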
  • 55:52<v Kai>Very good point, thanks Josh.</v>
  • 55:53And we do have one comment from our audience.
  • 55:57So I'll read it: Dr. Warren,
  • 55:59could you please share your thought on applying
  • 56:01the critical window analysis?
  • 56:04(mutters)
  • 56:07<v ->Sorry, with what?</v>
  • 56:09(overlapping conversation)
  • 56:10That's a really great point.
  • 56:11So over, so I'm actually on sabbatical right now,
  • 56:14which is why I couldn't be there in person with you guys,
  • 56:17but over the sabbatical
  • 56:19I've developed the framework and the code
  • 56:22to account for binary outcomes,
  • 56:25continuous outcomes and count outcomes as well.
  • 56:28Luckily if you've taken my (indistinct) course
  • 56:31or you're gonna take it next fall,
  • 56:32you'll see how all of these connect
  • 56:34and lend themselves really nicely
  • 56:36to kind of full conditional distribution updates
  • 56:40that make the model fitting process
  • 56:41really kind of slick and nice.
  • 56:44So we have a negative binomial regression,
  • 56:47for example, that can do the same thing
  • 56:49if you just have count outcome data,
  • 56:51or if you have a continuous measure, for example.
  • 56:53So I'm really aiming this.
  • 56:55I hope this method doesn't just pop up
  • 56:57and then disappear, I want people to use it,
  • 56:59I want it to be useful.
  • 57:00And so that's why I'm trying to extend it
  • 57:02and trying to get people to use it in different contexts.
  • 57:04So, yeah, definitely I love those types of questions.
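As a rough illustration of how the binary, continuous, and count outcomes mentioned above can share one template, here is a generic distributed lag regression. It is a sketch under stated assumptions, not the CWVS mix specification from the paper: the symbols $Y_i$, $\mathbf{x}_i$, $\boldsymbol{\beta}$, $z_{it}$, $\alpha_t$, and $T$ are notation introduced here, and the logit link for the binary case is only one common choice.

```latex
% Generic distributed lag regression (assumed notation, for illustration only).
% Y_i: outcome for individual i; x_i: covariates; z_{it}: exposure in period t
% of a pre-specified exposure window of length T; alpha_t: lag-specific effect.
\begin{align*}
  g\bigl(\mathrm{E}[Y_i]\bigr) &= \mathbf{x}_i^{\top}\boldsymbol{\beta}
      + \sum_{t=1}^{T} z_{it}\,\alpha_t, \\[4pt]
  \text{binary outcome:}     &\quad Y_i \sim \text{Bernoulli}(p_i),
      & g(p_i) &= \log\frac{p_i}{1 - p_i}, \\
  \text{count outcome:}      &\quad Y_i \sim \text{Negative binomial with mean } \mu_i,
      & g(\mu_i) &= \log \mu_i, \\
  \text{continuous outcome:} &\quad Y_i \sim \text{Normal}(\mu_i, \sigma^2),
      & g(\mu_i) &= \mu_i .
\end{align*}
```

In this template, a critical window corresponds to the periods $t$ whose $\alpha_t$ are judged to be nonzero, which is why $T$ has to be fixed in advance.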
  • 57:08<v Kai>Thanks Josh.</v>
  • 57:09Because we actually have another (speech distorted)
  • 57:13So we have to end early
  • 57:14and we do have a lot of student questions,
  • 57:17and I'm sure they'll contact you about those.
  • 57:21So thanks again, Josh, for a wonderful talk.
  • 57:24<v ->No, yeah thanks for being here.</v>