Climate Change and Health Seminar Series: “Critical Window Variable Selection for Mixtures: Estimating the Impact of Multiple Air Pollutants on Stillbirth”
May 17, 2022April 25, 2022
Dr. Josh Warren joined the Center on Climate Change and Health to discuss his work on air pollution and health outcomes, including preterm birth and low birth weight.
Information
- ID
- 7850
Transcript
- 00:00<v ->Let's get started</v>
- 00:01and thank you everyone for coming today.
- 00:03And this will be our final seminar
- 00:07for this semester for the (indistinct) and Health seminar.
- 00:09And we are very, very pleased
- 00:11to have our very own affiliate faculty,
- 00:16Dr. Josh Warren joining us.
- 00:19Dr. Warren is an associate professor
- 00:21at the Biostatistics Department here,
- 00:24and his research focuses on statistical methods
- 00:28in public health with an emphasis
- 00:30on environmental health problems,
- 00:32and much of his work involves introducing spatial
- 00:36and spatiotemporal models in the Bayesian setting
- 00:39to learn about the association
- 00:41between environmental exposures,
- 00:42such as air pollution and various health outcomes,
- 00:46including the stillbirth work that we are here for today.
- 00:50He's also interested in applying and developing
- 00:52spatiotemporal models in collaborative settings,
- 00:56such as the infectious disease work
- 00:58we have been considering during the COVID pandemic.
- 01:02So without further ado, Josh,
- 01:04the floor is yours, thank you.
- 01:06<v ->Thank you, Kai, for the introduction.</v>
- 01:08Can everyone hear me?
- 01:10<v Kai>Yes.</v>
- 01:11<v ->All right, perfect.</v>
- 01:14And thanks to Kai for the invitation
- 01:15and Mulholland for setting all of this up
- 01:17and allowing me to do this virtually.
- 01:19It's nice to be here talking about something
- 01:22other than COVID.
- 01:23And I guess more recently in my past,
- 01:26I've been doing a lot of infectious disease work,
- 01:28so it's kind of nice to be back into something
- 01:30that I'm still passionate about
- 01:32and still working heavily on.
- 01:34And so hopefully some of this today
- 01:36will be a little bit of review of what we've done
- 01:38and a really current project
- 01:40that we've just completed and published,
- 01:43but hopefully there are some elements in here
- 01:45that you can find overlap within your own work.
- 01:48And so if you have,
- 01:50if you see something that rings a bell,
- 01:52just please reach out and we can kind of talk.
- 01:54My goal in all of this work
- 01:56is to kind of develop user friendly methods
- 01:59that are useful for people outside
- 02:01of statistics and biostatistics.
- 02:02So the epi community at large, usually.
- 02:06So, yeah, just feel free to reach out afterwards,
- 02:08and I can share more information,
- 02:10but today we're gonna be talking about
- 02:12critical window variable selection for mixtures
- 02:15and particularly air pollution and stillbirth.
- 02:17So we'll go ahead and jump into it.
- 02:21I think probably most people here will know about air pollution
- 02:24and reproductive outcomes.
- 02:25There's a pretty substantial literature at this point
- 02:29that suggests exposure to ambient air pollution
- 02:32during pregnancy is associated
- 02:33with a number of adverse birth outcomes,
- 02:35including preterm pregnancy, low birth weight,
- 02:38congenital heart defects, stillbirth, and others.
- 02:42These are some of the main ones.
- 02:43Stillbirth is a more recently
- 02:45kind of emerging outcome of study.
- 02:48Traditionally, it's been pre-term birth
- 02:49and low birth weight have gotten a lot of attention,
- 02:52but these associations are stable, robust,
- 02:55and have been observed across a number of different study
- 02:57settings, designs, pollutants
- 02:59and there are a number of good review papers,
- 03:01if you're interested in a lot of the epi literature
- 03:03on this topic.
- 03:06I would kind of summarize a number
- 03:09of the previous epi studies
- 03:10by saying they like to use pollution exposures
- 03:14that are summarized kind of a priori,
- 03:18so they wanna focus on a trimester,
- 03:20they wanna focus on the entire pregnancy,
- 03:22like, what is the exposure across the entire pregnancy?
- 03:25What impact does that have with respect to this outcome?
- 03:27So these are usually pre-specified averaging periods
- 03:30and they're explored separately
- 03:33in these different usually kind
- 03:35of traditional statistical models like logistic regression
- 03:38or (indistinct) if you're using some kind of count model.
- 03:41And so lots of different pollutants
- 03:43are floating around in these analyses,
- 03:44lots of different averaging periods
- 03:46in terms of the exposure, the relevant exposure period.
- 03:50Luckily working with pregnancy,
- 03:52we have a relatively stable idea
- 03:56of when exposure potentially affects the fetus.
- 04:02So lots of models floating around, lots of pollutants
- 04:05and exposure weeks,
- 04:07but this method is inefficient
- 04:09and doesn't allow for a joint identification
- 04:11of more kind of specific periods
- 04:13across the entire pregnancy in a continuous manner.
- 04:16So more recently there has been a focus on
- 04:19critical window estimation and identification.
- 04:22So this is where I have done quite a bit of work, I think,
- 04:25in this world.
- 04:27And then even more recently, I would say,
- 04:29and I know a number of people I work with even here
- 04:31at Yale, pollution mixtures are becoming a really big deal.
- 04:35So in this talk,
- 04:36we're trying to combine both of these things,
- 04:38things that we know really well
- 04:39or that my group knows really well,
- 04:40critical window estimation and identification,
- 04:42and then pollution mixtures,
- 04:43things that we're getting into more and more it seems.
- 04:48So starting with critical windows of exposure
- 04:50and exactly what am I talking about
- 04:52when I'm talking about critical windows?
- 04:55So there's an increasing interest in identifying
- 04:57more specific periods of increased vulnerability.
- 05:00Usually we're thinking about pregnancy,
- 05:01but this can go for really any health outcome
- 05:04that you're interested in,
- 05:06but more vulnerable periods of the pregnancy
- 05:08to environmental exposures
- 05:10and doing this within a single modeling framework.
- 05:12So estimation of these effects,
- 05:14we're calling critical windows
- 05:15or windows of susceptibility.
- 05:17The NIEHS included this identification of critical windows
- 05:21as a part of its strategic goals back in 2012.
- 05:24And the focus has remained since then.
- 05:27So understanding like specific timing of exposure
- 05:31with respect to outcome development
- 05:33has a number of features but importantly,
- 05:35it could lead to improved mechanistic explanations
- 05:38of disease development,
- 05:40and ultimately focused guidelines for protection
- 05:42of the unborn child.
- 05:45So we have, like I mentioned,
- 05:46we've done a lot of methods work here,
- 05:50trying to understand variability in these windows
- 05:53essentially, and how to estimate them appropriately.
- 05:56So you'll start to see, I show some pictures,
- 05:58some figures here that the models become really
- 06:02lots of parameters in these models.
- 06:03So you, it really becomes an estimation challenge.
- 06:06Like how do you,
- 06:07the model makes sense, you can write it down,
- 06:09but can you actually fit these models?
- 06:10So we've done these or considered these models
- 06:14in a number of different settings,
- 06:15including the spatiotemporal setting,
- 06:17survival statistics setting, semiparametric,
- 06:20nonparametric Bayes, with multivariate outcomes,
- 06:23and then more recently variable selection.
- 06:26And so inference is typically carried out
- 06:28in the Bayesian setting where I do most of my work
- 06:31due to increased computational flexibility
- 06:33and importantly incorporation
- 06:35of stabilizing prior structure.
- 06:38So not only have these been done on the method side
- 06:42where a lot of my time is spent,
- 06:44but I really like seeing them translated
- 06:46to actual practice too.
- 06:48So these methods and kind of variants
- 06:51of these methods have been,
- 06:53have successfully identified these critical windows
- 06:56in a number of outcomes and settings
- 06:58and different populations,
- 06:59but pre-term birth, low birth weight,
- 07:02CHDs so across a number of studies now.
- 07:04So they're getting good traction in other studies
- 07:07as well, not just in the stat literature,
- 07:08which is nice to see.
- 07:11To give you a more kind of practical view
- 07:13of what I'm talking about,
- 07:14this is one of the first studies we published on
- 07:18way back in 2012.
- 07:19And this is for Harris County Texas,
- 07:22home of Houston, Texas.
- 07:24And on the left two panels,
- 07:25you'll see output from our newly developed method
- 07:29on the right two panels,
- 07:30you'll see output from more of a naive approach
- 07:32that was that we were considering at the time.
- 07:35So what we're talking about these critical windows
- 07:37are exactly what you're seeing.
- 07:39Maybe you can see my mouse here,
- 07:41but these periods where these risk ratios
- 07:45in this case kind of exclude zero
- 07:49or these risk parameters,
- 07:50they're not on any particular scale
- 07:52that's easily interpreted in this case, unfortunately,
- 07:54but this means that elevated exposure
- 07:57during pregnancy week 10 for example,
- 07:59leads to an increase in this case,
- 08:01was preterm birth, a preterm birth risk.
- 08:04So during your early kind of mid first
- 08:07and early second trimester pregnancy,
- 08:09we were noticing some interesting elevated risk to PM 2.5.
- 08:14And what we've seen across a number of studies now
- 08:16is that these windows vary by pollutant and by outcome;
- 08:20they're very different.
- 08:22There's lots of variability. For ozone, for example,
- 08:24it seemed to be early on in the first trimester.
- 08:28So this new methodology allows us to kind of hone in
- 08:31on the signal and reduce some of this noise.
- 08:35So if you try to basically imagine your data set,
- 08:38you have lots of pregnant women in your study,
- 08:41and you have linked with that pollution exposure
- 08:44for the first 36 weeks of pregnancy.
- 08:46A really naive thing to do would be,
- 08:47let's just throw all of those
- 08:49into a multiple regression model,
- 08:50some binary regression model,
- 08:52all at the same time.
- 08:53Clearly there's going to be correlation across time
- 08:55because exposure week one looks
- 08:57like exposure week two, et cetera.
- 08:58And if you do that, you can expect multicollinearity,
- 09:01which is jumping around of point estimates,
- 09:04increased variability,
- 09:05which is exactly what you see here.
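The instability described here can be illustrated with a small simulation. This is only a sketch under stated assumptions: the weekly exposures are simulated as an AR(1) series with correlation 0.95 (an invented value, not taken from the study), and the variance inflation factor (VIF) is used as a stand-in diagnostic for how much each weekly coefficient's variance is inflated when all 36 weeks enter one regression.

```python
import numpy as np

rng = np.random.default_rng(0)
n, weeks, rho = 2000, 36, 0.95  # rho is an assumed week-to-week correlation

# AR(1) exposures: week t looks a lot like week t - 1
z = np.empty((n, weeks))
z[:, 0] = rng.standard_normal(n)
for t in range(1, weeks):
    z[:, t] = rho * z[:, t - 1] + np.sqrt(1 - rho**2) * rng.standard_normal(n)

# Independent exposures for comparison
z_ind = rng.standard_normal((n, weeks))

def vif_from(x):
    """Variance inflation factors: diagonal of the inverse correlation matrix."""
    return np.diag(np.linalg.inv(np.corrcoef(x, rowvar=False)))

vif = vif_from(z)          # large values: unstable weekly estimates
vif_ind = vif_from(z_ind)  # close to 1: no inflation
print(vif.round(1))
print(vif_ind.round(2))
```

With correlated weeks the VIFs are well above the common rule-of-thumb threshold of 5 to 10, which is the "jumping around" of point estimates that smoothing is meant to tame.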
- 09:07So our new methodology,
- 09:08which relied on like Gaussian processes
- 09:11and other smoothing techniques
- 09:13allowed us to in a data driven way,
- 09:16kind of tease out signal
- 09:17that you could almost make out by eye here.
- 09:19So if you look hard enough,
- 09:20you can see kind of a similar shape in both cases,
- 09:24but we were able to see a better shape here.
- 09:26So this is what we're generally in the past
- 09:28have been talking about with critical window estimation
- 09:30and identification.
- 09:33We mentioned that we worked on the survival outcome,
- 09:36we started to think about preterm birth
- 09:38instead of just a binary outcome yes or no.
- 09:41We wanted to consider it as a survival outcome.
- 09:43So what's the probability you make it
- 09:44to week 35 of your pregnancy,
- 09:46given that you've made it to 34 for example.
- 09:49So what this opened up was,
- 09:50well, maybe there are different exposure windows
- 09:53given different outcome weeks.
- 09:55So you can think of outcome week on the X axis
- 09:58here and exposure week on the Y axis.
- 10:01So if you gave birth at week 27,
- 10:04you only had 27 weeks of exposure, for example.
- 10:07So people were leaving the set as pregnancy happened.
- 10:10And so we introduced methodology
- 10:12that not only kind of smoothed in the exposure direction,
- 10:15but also smoothed across the outcome direction.
- 10:17And so these darker areas indicate weeks
- 10:21and outcome weeks, exposure weeks and outcome weeks
- 10:23where elevated exposure more adversely impacts
- 10:27like the risk of preterm birth in this case.
- 10:29So there was a distinct difference in this early preterm
- 10:32and then this late preterm,
- 10:33which kind of was impacted by exposures later
- 10:35in the pregnancy.
- 10:38And so underlying all of these kind of simplified plots
- 10:42I'm showing you were
- 10:43these individual outcome week specific critical window plots
- 10:47that we kind of are more accustomed to interpreting.
- 10:52So more recently we got into the spatial world, noticing
- 10:57that, well, we started noticing that
- 10:59when we applied these methods
- 11:00to different data sets in different areas,
- 11:03we were seeing different shapes, different windows,
- 11:05different pollutants, emerging as important.
- 11:08And so we began to think,
- 11:09well, is there spatial variability, even at a local scale?
- 11:13And so we developed new methodology
- 11:15that can kind of tease out
- 11:17not only temporal changes in exposure risk,
- 11:21but also spatial variability as well.
- 11:22So there's a spatial correlation component here along
- 11:26with kind of these critical windows floating around as well.
- 11:29So this was 11 counties in North Carolina,
- 11:31including Wake County and the county that houses Charlotte,
- 11:35and this was a low birth weight study.
- 11:37So there's methodology around that can do this.
- 11:41So we were working on these for a number of years
- 11:44and we got approached basically with a question,
- 11:47how are you actually defining
- 11:49a critical pregnancy window?
- 11:51And it seemed obvious at first,
- 11:52but then we started to really question
- 11:54the assumptions we had been making,
- 11:56but obviously what we had been doing,
- 11:58if I go back a few slides here is just looking,
- 12:01when did these individual week
- 12:03or time specific parameters exclude the critical value
- 12:07of zero in this case?
- 12:08And we were calling that a critical window
- 12:13but we started to worry that
- 12:16this might not be getting exactly what we're hoping for.
- 12:18Is it capturing the true set?
- 12:20Is this doing a good job?
- 12:22In particular, we were worried about over smoothing
- 12:25with something like a Gaussian process
- 12:27and specifically with the endpoint.
- 12:29So if you can imagine, I'll go back one more time,
- 12:32sorry to scroll.
- 12:34Imagine the end points here and here,
- 12:37we began to worry that the over smoothness
- 12:41could be pulling some of these actually null results
- 12:45into the critical set or vice versa,
- 12:48kind of pulling some important ones down to the null set.
- 12:51So we were very concerned about the endpoints here
- 12:54when we started working on this more recent work.
- 12:57So our solution to this
- 12:59was critical window variable selection.
- 13:01So we like the smoothness, we like the plots that emerge.
- 13:04We like how we can interpret these things,
- 13:06but a variable selection component
- 13:07would allow us to turn some of these effects off,
- 13:10even if they appear to be significant in the plots.
- 13:14And so what this meant is,
- 13:15we introduced like a Bayesian variable selection technique
- 13:20called critical window variable selection,
- 13:22where basically you still have the critical window plots
- 13:25that you know and love, and you know how to interpret,
- 13:28but underlying each effect now,
- 13:30you actually have this binary exclusionary,
- 13:33or inclusion variable
- 13:34that tells you whether this thing should be included.
- 13:37This particular weekly effect should be included
- 13:39in the critical window set.
- 13:40And what we found is that there are a number of times,
- 13:44not in this particular real case study in North Carolina,
- 13:47but through simulation,
- 13:48we noticed that there were times
- 13:50when exactly what we had worried was happening
- 13:53had been happening so effects
- 13:54near the border here were being pulled into the set,
- 13:58but luckily they were not being included
- 14:00in the variable selection component.
- 14:01So to be in the variable selection set now,
- 14:05you had to have a posterior inclusion probability bigger
- 14:08than 0.5, so bigger than this line,
- 14:11and your individual weekly effects
- 14:13had to exclude zero with a 95% credible interval.
- 14:16So with these two kind of definitions we were doing,
- 14:19we were getting a much better kind of recovering
- 14:23the true set of critical windows in simulation, at least.
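The two-part rule just described (posterior inclusion probability above 0.5 and a 95% credible interval that excludes zero) can be sketched as a small function over posterior draws. The function name, the array layout, and the toy draws below are assumptions for illustration; they are not output from the fitted model.

```python
import numpy as np

def critical_window(effect_draws, inclusion_draws, pip_cut=0.5, level=0.95):
    """Flag exposure weeks that satisfy both criteria.

    effect_draws:    (n_draws, n_weeks) posterior draws of the weekly effect
    inclusion_draws: (n_draws, n_weeks) 0/1 draws of the inclusion indicator
    """
    pip = inclusion_draws.mean(axis=0)  # posterior inclusion probability
    tail = (1 - level) / 2
    lower, upper = np.quantile(effect_draws, [tail, 1 - tail], axis=0)
    excludes_zero = (lower > 0) | (upper < 0)
    return (pip > pip_cut) & excludes_zero

# Toy draws for three weeks (invented numbers):
# week 0: clear effect, usually included
# week 1: interval excludes zero, but rarely included (the "edge" case)
# week 2: null effect
rng = np.random.default_rng(1)
n = 4000
effects = np.column_stack([rng.normal(0.5, 0.1, n),
                           rng.normal(0.3, 0.1, n),
                           rng.normal(0.0, 0.1, n)])
included = np.column_stack([rng.random(n) < 0.95,
                            rng.random(n) < 0.30,
                            rng.random(n) < 0.05])
flags = critical_window(effects, included)
print(flags)
```

Week 1 in the toy example is the situation the speaker worried about: the smoothed effect alone would have put it in the critical set, while the inclusion probability keeps it out.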
- 14:27So this really outperformed
- 14:29what we had been doing previously.
- 14:30So we've been moving forward
- 14:32with this variable selection concept since then.
- 14:36All right, so we like critical window variable selection,
- 14:38we like a lot of these other methods.
- 14:39The problem is that as I know,
- 14:42a number of you are aware,
- 14:43the literature has really moved, the science
- 14:46has moved, towards pollution mixtures
- 14:48and multiple exposures.
- 14:50And a lot of these methodologies were developed
- 14:52with one pollutant in mind, at the most two to three,
- 14:58but they were not generally meant
- 15:00for pollution mixtures for example.
- 15:02So our goal in this work
- 15:04was to extend what we liked, CWVS,
- 15:07critical window variable selection, to accommodate mixtures.
- 15:10And so when we started thinking about mixtures,
- 15:12when you have time varying exposures
- 15:14and time varying effects,
- 15:16it became relatively conceptually complicated
- 15:19because you have lots of parameters floating around.
- 15:22So we wanted something that could do
- 15:23like a dimension reduction essentially.
- 15:25So what we thought is a nice solution,
- 15:28like in a single pollutant context, or I'm sorry,
- 15:32in a single exposure time period context
- 15:34is this weighted quantile sum regression,
- 15:36which I know a lot of you are familiar with,
- 15:38'cause I've helped write pieces of grants
- 15:40that have discussed weighted quantile sum regression here,
- 15:44but it offers a nice interpretable solution
- 15:46for estimating the impact of a mixture on an outcome.
- 15:49And it has this really nice sum to one constraint
- 15:53on the regression parameters.
- 15:55And so you get in the end,
- 15:56you have 20 pollutants for example,
- 15:58and you get to see the relative contribution
- 16:01of each of these pollutants in terms of the entire mixture.
- 16:04So you have these little sum to one between zero
- 16:06and one probabilities or proportions
- 16:09that describe the role of individual pollutants.
- 16:12And then you have this global regression parameter
- 16:15that describes the impact of that mixture
- 16:17as defined by those weights on the health outcome.
- 16:21So it does a little two stage process estimate weights
- 16:24and then global regression parameter,
- 16:26not important for this talk.
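A minimal sketch of the weighted index at the heart of weighted quantile sum regression, assuming the usual construction in which each pollutant is first scored into quantiles and the scores are combined with non-negative weights that sum to one. The function names, the quartile scoring, and the example weights are illustrative assumptions; in practice the weights are estimated rather than supplied.

```python
import numpy as np

def quantile_scores(x, n_quantiles=4):
    """Score each column of x into 0..n_quantiles-1 by its empirical quantiles."""
    probs = np.linspace(0, 1, n_quantiles + 1)[1:-1]
    cuts = np.quantile(x, probs, axis=0)  # shape (n_quantiles - 1, n_pollutants)
    cols = [np.searchsorted(cuts[:, j], x[:, j], side="right")
            for j in range(x.shape[1])]
    return np.stack(cols, axis=1)

def wqs_index(x, weights, n_quantiles=4):
    """Weighted sum of quantile scores; weights must be >= 0 and sum to one."""
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to one")
    return quantile_scores(x, n_quantiles) @ w

rng = np.random.default_rng(4)
x = rng.lognormal(size=(500, 3))           # three simulated pollutants
index = wqs_index(x, [0.6, 0.3, 0.1])      # hypothetical weights
# The outcome model then uses a single global slope on this index:
#   logit(p_i) = beta_0 + beta_1 * index_i + covariates
print(index[:5])
```

The weights give each pollutant's relative contribution to the mixture, and the single slope on the index plays the role of the global regression parameter.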
- 16:29More recently in 2020, this was extended
- 16:32to the lagged weighted quantile sum regression.
- 16:35And yeah, it extended WQS to the multiple pollutants setting
- 16:42in a really, I think of it as a relatively ad hoc solution,
- 16:47but basically WQS is fit at each exposure week separately.
- 16:51The weights are estimated,
- 16:53the mixtures are combined based on those weights.
- 16:55And then those kind of packaged mixtures
- 16:58are thrown into like a distributed lag model
- 17:00to estimate similar curves
- 17:02to what I've been showing you so far.
- 17:03So the estimation of the weights
- 17:05and their relative importance in the mixture
- 17:07are done separately outside of kind of the estimation
- 17:11of the regression parameters as well.
- 17:13So this is, again, more of a two stage approach.
- 17:16All right, so we like WQS
- 17:19because of its relative simplicity and its interpretability,
- 17:22we liked critical window variable selection.
- 17:24So the goals here were to combine
- 17:26that estimation identification ability of CWVS
- 17:29with the interpretability and shrinkage properties
- 17:31of WQS within a unified modeling framework and extending
- 17:37oh yeah, so WQS is nice.
- 17:39It has sum-to-one components
- 17:42that are between zero and one,
- 17:44but you don't actually get a sense of variable selection
- 17:47when doing this.
- 17:48So none of the weights can exactly equal zero.
- 17:50We wanted a more sparse solution
- 17:53and so we introduced also a way
- 17:54to make these weights exactly zero.
- 17:57So you can get a better sense of
- 17:58which pollutants are the main players in the mixture.
- 18:02And so what we're calling this is CWVS for mixtures
- 18:06or CWVS mix.
- 18:09And so some features before we get
- 18:11into a little bit of the details of the model,
- 18:13these are like the high,
- 18:14just if you take nothing else away from like
- 18:16what this model does, this is,
- 18:18I think the important slide here is that,
- 18:20we have main effects and first order interactions
- 18:23between the pollutants during each exposure period.
- 18:25So week one of pregnancy,
- 18:27week two of pregnancy, all of these interactions,
- 18:30all of these main effects are included.
- 18:31So there's lots of parameters you can already imagine
- 18:34are floating around here.
- 18:35We still hold onto this sum to one mixture weights
- 18:38at each exposure week separately.
- 18:41But we want to account for the fact that,
- 18:43what's happening in exposure week one
- 18:44may be similar to exposure week two to three to four,
- 18:48with this correlation dying out as you get further apart
- 18:50in exposure time.
- 18:51So we want these weights not to have to be estimated
- 18:55kind of independently at each exposure week.
- 18:57We want to enforce some smoothness,
- 19:00data driven smoothness preferably to estimate these weights.
- 19:04And as I mentioned, we want these weights
- 19:06to have a variable selection component.
- 19:08So we can actually identify individual elements
- 19:10of the mixture and we still have this global risk parameter,
- 19:14and this is going to follow the CWVS model
- 19:17so that we can estimate
- 19:18these critical windows more accurately.
- 19:22All right, so the goals of this study
- 19:24before we jump into some of the methodology here
- 19:26are to develop CWVS mix.
- 19:28As I mentioned,
- 19:30simulation is really important in this world.
- 19:32I wanna make sure that what we're doing
- 19:34is not just duplicating other efforts
- 19:36and that it's actually offering something new,
- 19:38something helpful to the literature
- 19:40that we can point to.
- 19:41I think I know the shortcomings of something like lagged
- 19:45weighted quantile sum regression,
- 19:46but until I see it actually happen in simulation
- 19:49it's just kind of hypothetical.
- 19:51So finally we wanna investigate the impact
- 19:53using this new methodology
- 19:55of multiple ambient air pollutants on stillbirth risk.
- 19:58And in this case,
- 19:59we're focusing on New Jersey from 2005 to 2014.
- 20:03And actually we have really nice output
- 20:06from a novel data fusion model.
- 20:08There are lots of data fusion models floating
- 20:10around right now, but this is one from 2019,
- 20:12from our collaborator at Georgia Tech and at Emory
- 20:16that provided 12 pollutants,
- 20:1812 kilometer grid cell size across the entire US,
- 20:22daily, no missingness, things like that.
- 20:26So for these particular pollutants.
- 20:29All right, so let's talk a little bit about
- 20:31the model and what it does
- 20:33and some of the intuitive features
- 20:34that I think it has and why it might work well.
- 20:37So yeah, we're starting with some outcome,
- 20:42it could be some adverse health outcome
- 20:44like preterm pregnancy or not,
- 20:46or stillbirth or not, some Bernoulli outcome
- 20:49where this p_i describes kind of the probability
- 20:52that person i experiences this outcome.
- 20:55We model this probability using logistic regression
- 20:58as we normally would,
- 21:00these green I'm kind of trying to different.
- 21:02I'm trying to keep people's attention
- 21:04to the parameters
- 21:05and how I'm mentally grouping them as well.
- 21:08So these green represent these typical like demographics.
- 21:12We know there are certain risk factors
- 21:13for different health outcomes,
- 21:16particularly pregnancy outcomes being over 35 for example,
- 21:19with preterm pregnancy, alcohol, smoking, et cetera.
- 21:23So this would go into this x_i transpose beta,
- 21:26this vector here. Where a lot of our work came in
- 21:31is on these blue parameters,
- 21:33which are the weights that I've been talking about.
- 21:35So these weights, these blue parameters
- 21:37actually sum to one at each exposure week.
- 21:42So at each exposure week t
- 21:44we basically have a vector of lambdas
- 21:47that are weights between zero and one
- 21:49that could actually be equal to zero exactly.
- 21:52And they sum to one at each exposure week separately,
- 21:55you notice they're indexed by t
- 21:57because we're allowing the possibility
- 21:59that the exposure profile changes across the pregnancy.
- 22:03So early on in the pregnancy,
- 22:05maybe the risk is primarily driven by pollutant A
- 22:10but later on in the pregnancy,
- 22:12perhaps that shifts.
- 22:13And so the weights would shift as well,
- 22:16but we expect this shift to be smoother
- 22:19rather than complete choppiness across the exposure weeks.
- 22:23And so what these weights do is
- 22:25they kind of multiply here with the main effects
- 22:29and these first order interactions.
- 22:31And if you think about taking this sum across
- 22:33main effects and interactions,
- 22:34you have this package of weighted exposure essentially.
- 22:38And the alpha here tells us whether at exposure period T
- 22:42this package has any impact on
- 22:44your ultimate probability of developing the outcome.
- 22:48So we have this nice sense that the weights
- 22:50help us describe what's happening with the mixture profiles,
- 22:53but the alpha keeps us honest
- 22:55and keeps us able to say,
- 22:57well, you know, this mixture's interesting,
- 23:00but it has no impact on the health outcome of interest here.
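Written out, the structure described in this part of the talk might look like the following. The notation is an assumption for illustration and is reconstructed from the spoken description rather than copied from the slides: $z_{ij}(t)$ is person $i$'s exposure to pollutant $j$ in week $t$, $q$ is the number of pollutants, $m$ is the number of exposure weeks, and the sum-to-one constraint is taken to cover main-effect and interaction weights combined.

```latex
\begin{align*}
Y_i \mid p_i &\sim \text{Bernoulli}(p_i), \\
\operatorname{logit}(p_i) &= \mathbf{x}_i^{\top}\boldsymbol{\beta}
  + \sum_{t=1}^{m} \alpha(t) \left[ \sum_{j=1}^{q} \lambda_j(t)\, z_{ij}(t)
  + \sum_{j<k} \tilde{\lambda}_{jk}(t)\, z_{ij}(t)\, z_{ik}(t) \right], \\
\sum_{j=1}^{q} \lambda_j(t) + \sum_{j<k} \tilde{\lambda}_{jk}(t) &= 1
  \quad \text{for each exposure week } t .
\end{align*}
```

Here the bracketed term is the weighted "package" of exposure at week $t$, and $\alpha(t)$ decides whether that package matters for the outcome.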
- 23:06So how do we do these mixture weights?
- 23:09As I mentioned, two features that we're interested in
- 23:11the ability to actually equal zero
- 23:14and smoothness across time.
- 23:15And so first point is to,
- 23:19well, we introduce these latent weight parameters
- 23:22that I'm calling Lambda star,
- 23:23don't get too caught up in them.
- 23:25Basically they're continuously varying parameters
- 23:28that as soon as they cross the zero threshold,
- 23:31they turn on in our model.
- 23:33So that's what this maximum is doing.
- 23:35So they turn on and they give you some weight
- 23:37and then as soon as they cross into negative territory,
- 23:40they go to zero.
- 23:40So this is how we're getting actual zeros in these weights.
- 23:43So the lambdas and the lambda tildes
- 23:45can actually equal zero
- 23:47based on these underlying latent weight parameters.
- 23:52All right, so we keep them summing to one
- 23:55by dividing by the sum of the numerator, essentially.
- 23:58So whatever weights are positive get summed
- 24:00and we're dividing by,
- 24:01we're basically self kind of correcting here
- 24:04so that the weights always sum to one,
- 24:07these weights combined.
- 24:08For the interactions, we don't want the case.
- 24:12We prefer a sparse model,
- 24:14particularly as the number of pollutants gets really large.
- 24:17So the number of interactions will grow.
- 24:19So what we want are interactions
- 24:21that are only turned on essentially
- 24:24when the main effects are turned on.
- 24:25So you can see these two indicators I've added
- 24:27basically say if the main effect themselves
- 24:30aren't both turned on,
- 24:31this interaction effect gets zeroed out already.
- 24:34So the interaction has kind of a higher bar to clear,
- 24:37this strict hierarchy basically
- 24:40where both main effects have to be on
- 24:43and the interaction latent variable has to be on.
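The thresholding, normalization, and strict hierarchy described above can be sketched for a single exposure week as follows. The function name, the dictionary layout for interaction terms, and the choice to raise an error when every latent value is non-positive are assumptions made for this illustration; the talk does not say how that corner case is handled.

```python
import numpy as np

def mixture_weights(lam_star, lam_star_int):
    """Turn latent values into sum-to-one weights at one exposure week.

    lam_star:     (q,) latent main-effect values (the "lambda star")
    lam_star_int: dict {(j, k): latent interaction value}, with j < k
    """
    lam_star = np.asarray(lam_star, dtype=float)
    # A main effect turns on only when its latent value is positive
    main = np.maximum(lam_star, 0.0)
    # Strict hierarchy: an interaction needs its own latent value positive
    # AND both parent main effects turned on
    inter = {
        (j, k): max(v, 0.0) * float(lam_star[j] > 0) * float(lam_star[k] > 0)
        for (j, k), v in lam_star_int.items()
    }
    total = main.sum() + sum(inter.values())
    if total == 0:
        raise ValueError("all latent weights are non-positive")  # assumed handling
    return main / total, {jk: v / total for jk, v in inter.items()}

# Hypothetical latent values for three pollutants
w_main, w_int = mixture_weights(
    [1.0, -0.5, 3.0],
    {(0, 1): 2.0, (0, 2): 1.0, (1, 2): -1.0},
)
print(w_main)  # pollutant 1 is exactly zero
print(w_int)   # (0, 1) is zeroed out because pollutant 1 is off
```

In this example the (0, 1) interaction has a positive latent value but is still zero, because one of its main effects is off.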
- 24:46So there's the zero component now,
- 24:48how do we do smoothness across time?
- 24:50Well, it's all about this correlation structure.
- 24:52So these latent Lambda star parameters
- 24:55that control the weights are actually modeled
- 24:57as a multi Gaussian process.
- 24:59And I think the key thing to focus on here is that
- 25:01there's this underlying correlation structure
- 25:04that tells us, as two exposure time points get further apart,
- 25:09this exponential of a negative number
- 25:11will get closer to zero.
- 25:12So correlation dies out as exposure times get further apart.
- 25:16Now, as they get closer together,
- 25:17this correlation is gonna be higher.
- 25:19And the main parameter that controls this level
- 25:22of correlation is this phi parameter.
- 25:23And we actually put prior distributions on this
- 25:27to allow the data to drive the inference,
- 25:29rather than like our view of what we expect
- 25:32this smoothness to look like.
- 25:33So yeah, this is data driven
- 25:35kind of smoothness across exposure time.
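A minimal sketch of an exponential correlation structure across exposure weeks, assuming the form exp(-phi * |t - t'|). The value of phi below is fixed and invented purely for illustration; in the model described here it receives a prior and is learned from the data.

```python
import numpy as np

def exp_corr(n_weeks, phi):
    """Correlation between weeks t and t' decays as exp(-phi * |t - t'|)."""
    t = np.arange(1, n_weeks + 1)
    return np.exp(-phi * np.abs(t[:, None] - t[None, :]))

rng = np.random.default_rng(2)
R = exp_corr(36, phi=0.2)  # phi = 0.2 is an assumed value

# One draw of a smooth latent trajectory across the 36 exposure weeks
latent = rng.multivariate_normal(np.zeros(36), R)

print(R[0, 1], R[0, 10])  # neighbouring weeks vs. weeks ten apart
```

A smaller phi gives slower decay and smoother trajectories; a larger phi approaches independent weeks.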
- 25:39All right, so now, so we've got the weights handled
- 25:42they have both properties that we care about.
- 25:43Now let's talk about the mixture impact itself.
- 25:46So this alpha recall tells us whether the mixture
- 25:48that we observe at time point T
- 25:50or that we estimate at exposure time point T
- 25:53is actually relevant to the health outcome.
- 25:55So we want, again,
- 25:57we want this variable selection here
- 25:59because we've noticed the problem with the end points
- 26:01that I described earlier.
- 26:02So to do this, we decompose this effect into two pieces,
- 26:06a continuously varying piece.
- 26:08And then this binary piece
- 26:09that I mentioned earlier on in the talk.
- 26:11The binary pieces are just independent
- 26:13Bernoulli random variables.
- 26:15But we imagine that if you're in the critical window set
- 26:20at time one, then you may be in it at time two
- 26:22and may be more likely to be in at time three.
- 26:25So there may be some sense of correlation
- 26:27across exposure time here as well.
- 26:29So while we model these things as independent,
- 26:31the probabilities that underlie these zero
- 26:34and one variables are actually smoothly varying
- 26:37and correlated across time.
- 26:39So again,
- 26:40we use this kind of exponential correlation structure.
- 26:43We allow for cross correlation between the continuous
- 26:46and the binary piece.
- 26:48Not important to get into here,
- 26:49you can kind of read back over.
- 26:52I can share a paper with you if you want to,
- 26:54or talk more about it offline,
- 26:56but essentially there's some cross correlation
- 26:58there's correlation across time,
- 26:59but this allows for smoothness in the effects
- 27:02and the kind of the regression parameter effects
- 27:04that we've been looking at,
- 27:05but also in the variable selection as well.
- 27:08And these, both of these things come together to kind
- 27:11of define the critical window variable selection model.
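The decomposition described here (a continuous piece times a binary piece, with smoothly varying and cross-correlated inclusion probabilities) can be simulated as a rough sketch. Everything specific below is an assumption for illustration: the probit link for the inclusion probabilities, the mixing coefficients that induce cross-correlation, and the value of phi are not taken from the talk.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(3)
m, phi = 36, 0.15  # number of exposure weeks; assumed decay parameter
t = np.arange(m)
R = np.exp(-phi * np.abs(t[:, None] - t[None, :]))
L = np.linalg.cholesky(R + 1e-10 * np.eye(m))  # small jitter for stability

# Two independent smooth processes, mixed so that the continuous piece
# (theta) and the inclusion process (eta) are cross-correlated
d1 = L @ rng.standard_normal(m)
d2 = L @ rng.standard_normal(m)
theta = 1.0 * d1             # continuously varying effect
eta = 0.5 * d1 + 1.0 * d2    # drives the inclusion probabilities

# Assumed probit link: pi(t) = standard normal CDF of eta(t)
pi = 0.5 * (1.0 + np.vectorize(erf)(eta / sqrt(2.0)))

# Independent Bernoulli indicators given the smooth probabilities
gamma = rng.random(m) < pi
alpha = theta * gamma        # weekly mixture effect, exactly zero when off

print(np.round(alpha, 2))
```

The indicators are conditionally independent, yet runs of neighbouring weeks tend to be on or off together because the probabilities behind them vary smoothly.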
- 27:14To finish the model, recall everything's in the Bayes setting,
- 27:17so really weakly informative prior distributions,
- 27:21kind of standard prior distributions when possible,
- 27:24nothing too interesting here.
- 27:26So the model you may be looking at on this previous slide
- 27:29and thinking there's a lot of parameters floating
- 27:31around here.
- 27:32There's a lot of output that you're going to be estimating.
- 27:35So how do you make sense of this as a practitioner,
- 27:38someone who actually wants to know if a mixture
- 27:40is having an impact on your health?
- 27:42Well, luckily we still have relatively nice
- 27:46and estimable kind of effects here,
- 27:50associations that we can talk about.
- 27:52So for example,
- 27:53for a change in the log odds for a one unit increase
- 27:55in each pollutant during a particular exposure period,
- 27:59this would be the quantity
- 28:01that you would make (indistinct).
- 28:02You would exponentiate this,
- 28:03and you would have like an odds ratio, for example.
- 28:05Now recall, for any model that includes interactions,
- 28:08the interpretation is always increasingly complicated
- 28:12because it matters where you start
- 28:14when you have interactions.
- 28:15So if you're already at a high level,
- 28:18so the values themselves of exposure have to come into play,
- 28:22but nonetheless, you can still get nice quantities
- 28:25to estimate in the end.
- 28:27And if you're only interested in what happens
- 28:29if pollutant A increased during exposure period T
- 28:32you can write down actually
- 28:34what that looks like as well.
- 28:35So you can estimate both of these things relatively easily
- 28:38from our output, from our model.
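To make the point about interactions concrete, here is a minimal sketch of why the odds ratio depends on the starting exposure levels. It assumes, for illustration only, that the log odds contribution in week t is `alpha_t` times a weighted sum of main effects and pairwise products. The function names and the example numbers are hypothetical and do not come from the paper.

```python
import numpy as np
from itertools import combinations


def mixture_index(z, lam, lam_tilde):
    """Weighted mixture: main effects plus pairwise interactions (assumed form)."""
    z = np.asarray(z, dtype=float)
    main = float(np.dot(lam, z))
    pairs = combinations(range(len(z)), 2)  # (0, 1), (0, 2), ... in a fixed order
    inter = sum(w * z[j] * z[k] for w, (j, k) in zip(lam_tilde, pairs))
    return main + inter


def odds_ratio(z_start, delta, alpha_t, lam, lam_tilde):
    """Odds ratio for moving exposures from z_start to z_start + delta in one week."""
    z_start = np.asarray(z_start, dtype=float)
    z_end = z_start + np.asarray(delta, dtype=float)
    change = mixture_index(z_end, lam, lam_tilde) - mixture_index(z_start, lam, lam_tilde)
    return float(np.exp(alpha_t * change))


# Two pollutants, one interaction term; all values are made up.
lam, lam_tilde, alpha_t = [0.5, 0.3], [0.2], 0.1
or_from_low = odds_ratio([0, 0], [1, 1], alpha_t, lam, lam_tilde)
or_from_high = odds_ratio([1, 1], [1, 1], alpha_t, lam, lam_tilde)
```

With a nonzero interaction weight, the same one unit increase gives a larger odds ratio from the higher starting point. Setting `lam_tilde` to zero makes the two odds ratios equal.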
- 28:41Alright, so we have a model
- 28:42that kind of checked all the boxes,
- 28:44at least in my head when I was writing it down
- 28:46and we can, I tested it, we can fit it,
- 28:49it seems to work and that it's converging
- 28:53and it's producing things that look reasonable,
- 28:56but the simulation study really allows us to dig deeper
- 28:59and say, this is obviously new,
- 29:02but is there anything beneficial to what we're doing?
- 29:04Or should we just be doing something simpler
- 29:06that already exists?
- 29:08So we wanted particularly to ask,
- 29:10how does CWVS mix compare
- 29:13to some of these existing approaches
- 29:15for three different factors that we're interested in?
- 29:17So first identifying the true critical window set,
- 29:20obviously probably the most important part
- 29:22of critical window research here is like,
- 29:25let's get the critical window set right
- 29:28when we're estimating and identifying these parameters.
- 29:31But obviously when you're talking about mixtures,
- 29:33we also care about these weights.
- 29:35We want to know that the mixture profile we're looking at
- 29:38on a certain exposure period actually is,
- 29:43reflective of the true mixture profile
- 29:45that makes sense here.
- 29:46So how well do we do at estimating these lambda
- 29:49and lambda tilde parameters
- 29:52that describe the effects of main effects and interactions,
- 29:55and then finally,
- 29:56how well do we do at estimating the magnitude of risk,
- 30:00these alpha T parameters.
- 30:02We wanna make sure we're getting these right as well.
- 30:04And as a side issue, I guess,
- 30:06just more out of curiosity,
- 30:07how well does this variable selection process work
- 30:10for the weights that we've introduced?
- 30:13So now we need to think about
- 30:15what are competing methods in this space.
- 30:17There aren't a lot of methods out there
- 30:19that aim to estimate critical windows with
- 30:24time-varying exposures and multiple pollutants,
- 30:27and the ones that are out there
- 30:30give different enough output
- 30:31that it's hard to compare one model to the next,
- 30:34but here are three approaches that we kind of came up with.
- 30:38One is the most naive kind of
- 30:40where I would always start
- 30:42as a practitioner with a new data set,
- 30:44this equal weights approach.
- 30:45So maybe just averaging all of the exposures for a person
- 30:49on a given exposure week and including that average
- 30:55and the interactions with the other exposure periods
- 30:59in a framework, a distributed lag framework.
- 31:04So yeah, this is called equal weights or EW.
- 31:08A PCA approach also makes sense.
- 31:10So let's allow the data to determine
- 31:11the correct weights of these Lambdas,
- 31:15but let's focus it only on the exposure period,
- 31:18only the exposure data.
- 31:19So at each exposure period,
- 31:20fit a PCA to the person specific exposures
- 31:24and generate these weights
- 31:27that kind of describe the relative contribution
- 31:29of the different interactions and main effects in a mixture,
- 31:34and then weight the mixtures in that way
- 31:36and throw that weighted value
- 31:38into the distributed lag regression model.
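The two simpler comparison approaches can be sketched as follows. This is only an illustration under stated assumptions: the talk does not say how PCA output is turned into weights, so normalising the absolute loadings of the first principal component to sum to one is a guess, and the function names are hypothetical.

```python
import numpy as np


def equal_weights(n_terms):
    """Equal weights (EW): every pollutant and interaction term counts the same."""
    return np.full(n_terms, 1.0 / n_terms)


def pca_weights(X):
    """Weights from the first principal component of one week's exposures.

    X has shape (n_people, n_terms). Taking absolute loadings and rescaling
    them to sum to one is an assumption made for this sketch.
    """
    Xc = X - X.mean(axis=0)                       # centre each column
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = np.abs(vt[0])                      # first principal direction
    return loadings / loadings.sum()


def weighted_exposure(X, w):
    """Collapse the mixture into a single exposure per person for that week."""
    return X @ w


rng = np.random.default_rng(1)
X_week = rng.normal(size=(50, 3))                 # made-up exposures for one week
z_ew = weighted_exposure(X_week, equal_weights(3))
z_pca = weighted_exposure(X_week, pca_weights(X_week))
```

Once the weights are fixed, each week contributes a single exposure value per person, which is what gets passed on to the critical window model.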
- 31:41So for all of these methods,
- 31:42we're using the original CWVS,
- 31:44so that we're comparable,
- 31:47so that the results are actually comparable across methods,
- 31:49and so that the only thing that is changing essentially
- 31:52is how we define the weights.
- 31:54And then finally, the most sophisticated approach
- 31:56at that time was this lagged
- 31:57weighted quantile sum regression
- 31:58that we talked a little bit about
- 32:01where we applied weighted quantile sum regression
- 32:03separately to each exposure period,
- 32:06let that estimate the weights,
- 32:07create the little package of exposure,
- 32:09and then throw those packages
- 32:10into the regression model using CWVS.
- 32:13So once you have the weights,
- 32:15like once you condition on the weights
- 32:17and you know the weights,
- 32:18you basically have one exposure
- 32:19and that exposure is the package,
- 32:22the mixture package that you've made.
- 32:24So the model,
- 32:25the modeling becomes much simpler in that case.
- 32:29So how did we go about to test these different methods?
- 32:35Well, we started very simply.
- 32:36So these represent the weights across exposure periods.
- 32:41In this case, I'm pretending like there's only five weeks
- 32:43in the exposure set.
- 32:45In reality, I let that vary: for each data set,
- 32:47the length and the start time of the exposure window changed,
- 32:51but for this case,
- 32:52we assumed it started at pregnancy week one
- 32:55and went to week five.
- 32:56And so in the simplest case,
- 32:57we had just assumed there was one pollutant at play
- 33:00and it stayed constant across the exposure period.
- 33:02This is really simple.
- 33:03One pollutant is driving the entire risk that we're seeing.
- 33:08In another setting, we assumed that there were two,
- 33:11but there was no changes over time.
- 33:13They were always static across time
- 33:15In setting three, there were three that were coming into play,
- 33:18in four, four, and then in five, five of them.
- 33:21Obviously, as more come online
- 33:23and become important players in the mixture,
- 33:26the weights generally go down
- 33:27because lots of these have to be nonzero.
- 33:31In setting B,
- 33:32we wanted to allow for some variability
- 33:34among the important pollutants.
- 33:35So we still allow for the same important pollutants
- 33:38to be important at each exposure period,
- 33:42but we allowed their relative contribution
- 33:43to change across time.
- 33:44So early on in pregnancy, this one was important,
- 33:47but then its contribution went down
- 33:49and it was kind of surpassed by number two here
- 33:53at pollutant two,
- 33:54and then they can keep swapping in and out
- 33:56across the exposure.
- 33:57And in setting C it was complete chaos essentially
- 34:02different pollutants could come online
- 34:04and then leave and become important
- 34:05or not important go to zero.
- 34:07We don't anticipate this would ever,
- 34:10or this would be the case,
- 34:11but it would be nice to know if our model
- 34:13can somehow collapse and kind of accommodate this reckless,
- 34:17this wild behavior, I guess.
- 34:20So, yeah,
- 34:21this is something that kind of testing the extreme
- 34:23of all these methods is what we were trying to do here.
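The three kinds of simulation settings can be mimicked with a few lines of code. This is a sketch under stated assumptions, not the simulation code from the paper: it only generates main effect weights (no interactions), uses equal weights among the important pollutants in the static setting, and draws time-varying weights from a Dirichlet distribution. All function names are hypothetical.

```python
import numpy as np


def setting_a(n_weeks, n_pollutants, n_important):
    """Static: the same pollutants matter every week with constant weights."""
    w = np.zeros((n_weeks, n_pollutants))
    w[:, :n_important] = 1.0 / n_important  # equal split is an assumption
    return w


def setting_b(n_weeks, n_pollutants, n_important, rng):
    """Same important pollutants each week, but their contributions change."""
    w = np.zeros((n_weeks, n_pollutants))
    w[:, :n_important] = rng.dirichlet(np.ones(n_important), size=n_weeks)
    return w


def setting_c(n_weeks, n_pollutants, rng):
    """Pollutants enter and leave the mixture freely from week to week."""
    w = np.zeros((n_weeks, n_pollutants))
    for t in range(n_weeks):
        k = int(rng.integers(1, n_pollutants + 1))       # how many are active this week
        active = rng.choice(n_pollutants, size=k, replace=False)
        w[t, active] = rng.dirichlet(np.ones(k))
    return w


rng = np.random.default_rng(0)
weights_a = setting_a(5, 5, 2)
weights_b = setting_b(5, 5, 2, rng)
weights_c = setting_c(5, 5, rng)
```

Each row is one exposure week and sums to one, so adding important pollutants necessarily pushes the individual weights down.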
- 34:27So we'll jump right into the results.
- 34:28Just to give you a sense of what happened
- 34:31when we tested these models
- 34:32with lots of simulated data sets,
- 34:34CWVS mix continuously and kind of consistently
- 34:40was able to get the critical window set
- 34:43more accurately than the other methods,
- 34:45which struggled kind of in varying degrees
- 34:48across these different settings.
- 34:50In terms of estimating the weight parameters,
- 34:54generally CWVS mix has a lower mean squared error,
- 34:59so it's doing a better job of estimating these parameters,
- 35:02as you would expect, like with equal weights,
- 35:04if you assume each weight,
- 35:05each pollutant and interaction is playing
- 35:09an equal part in the story,
- 35:10you can be very badly off a lot of the time,
- 35:13which is why
- 35:16these values are so high for some of these methods.
- 35:20And finally,
- 35:21with the estimation of the regression parameters
- 35:23that describe the magnitude of risk.
- 35:26Generally, we're seeing improved performance with CWVS mix,
- 35:31but interestingly,
- 35:32at least at the time when we first saw this
- 35:34is that the equal weights method does a pretty good job
- 35:38of estimating these risk magnitude parameters
- 35:43as the number of important pollutants increases.
- 35:46So if you tell me that every one of your pollutants
- 35:48is important,
- 35:49then it's going to be hard to beat something
- 35:53that gives all of the pollutants equal weight.
- 35:55So that's kind of the intuition behind it.
- 35:57As more pollutants become important,
- 35:58giving everything equal weight is not such a bad idea,
- 36:01almost it's just averaging away some of that error,
- 36:04but generally, we're still doing well.
- 36:07And specifically in comparison
- 36:09to the lagged weighted quantile sum regression,
- 36:10that's really important,
- 36:12'cause at the time this was the kind
- 36:13of the main method out there
- 36:15that aimed to do the same thing we were doing.
- 36:17So in summary here with a simulation study,
- 36:21we did really well in terms of critical window accuracy,
- 36:26weight parameter estimation,
- 36:28and even in the risk magnitude parameter estimation.
- 36:32So models that
- 36:34don't actually estimate weights are more efficient
- 36:36when the complexity
- 36:37or the number of important pollutants grows,
- 36:40and a little bit about the variable selection
- 36:42that we introduced with these latent variables.
- 36:45It appeared to do really well, again,
- 36:47when the number of important pollutants was relatively small.
- 36:51So if you have lots of pollutants that are important
- 36:54and their interactions are important,
- 36:56it was hard for the variable selection process
- 36:59to kind of tease out
- 36:59when something's included or excluded.
- 37:01It tended to just say everything was included.
- 37:04So something to keep in mind,
- 37:06I guess, as a limitation perhaps of this approach.
- 37:10All right, so now onto the real data application
- 37:13that we had,
- 37:14and this is part of a larger kind of climate change,
- 37:18heat preterm birth study,
- 37:21we collected lots of state specific data birth records
- 37:26for all the way back to 1990 for maybe 12, 14 states.
- 37:31And so this one was set in New Jersey,
- 37:34but we focused on stillbirth given
- 37:36their really strong stillbirth data collection
- 37:40kind of methodology that New Jersey was using.
- 37:43So stillbirth, the death or loss of a baby
- 37:46at or after 20 weeks of pregnancy, affects about
- 37:48one in 160 births in the US.
- 37:51There are some known maternal risk factors,
- 37:54black mother, 35 years of age or more,
- 38:00low SES, smoking, et cetera.
- 38:03And a recent literature review and meta-analysis suggests that
- 38:06PM 2.5, CO2, and O3 are associated with increased risk.
- 38:11This was really recent,
- 38:13but that more studies are definitely needed.
- 38:14There's not a lot in comparison
- 38:17to some of the other adverse birth outcomes,
- 38:18there's not as much done with stillbirth, at least.
- 38:21However a majority of these previous studies
- 38:23have focused on again, single pollutant approaches,
- 38:27wide exposure periods like the entire relevant pregnancy
- 38:31before the delivery.
- 38:35So there is a need
- 38:36for kind of multiple pollutant critical window
- 38:37methods in this setting.
- 38:38So this is what kind of made us think about
- 38:43developing this methodology,
- 38:44but also applying it in this case study.
- 38:48So a little bit about the data we had access to.
- 38:50We had live birth and fetal death records
- 38:52from New Jersey from 2005 to 2014.
- 38:55We included singletons with gestational age
- 38:57of at least 20 weeks,
- 38:58no birth defects, conception date in 2005 to 2013.
- 39:05We ran a case-control analysis here
- 39:07where five live births were linked
- 39:10with each stillbirth, matching only on race ethnicity.
- 39:13And we actually ended up running these analyses separately
- 39:16for each group non-Hispanic black,
- 39:17non-Hispanic white and Hispanic.
- 39:19And in terms of our exposures,
- 39:22weekly pollution exposures
- 39:24through gestational week 20 were included in this analysis.
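The 1-to-5 matching described here could be sketched as below. This is an illustrative guess at the mechanics, not the study's actual procedure: the record fields `id` and `race_ethnicity`, the function name, and the choice to sample controls without reuse across cases are all assumptions.

```python
import random
from collections import defaultdict


def match_controls(cases, controls, n_controls=5, seed=0):
    """Link each stillbirth to n_controls live births of the same race/ethnicity."""
    rng = random.Random(seed)
    pool = defaultdict(list)
    for record in controls:
        pool[record["race_ethnicity"]].append(record)
    for group in pool.values():
        rng.shuffle(group)                    # random order within each group
    matched = {}
    for case in cases:
        group = pool[case["race_ethnicity"]]
        if len(group) < n_controls:
            raise ValueError("not enough unmatched controls in this group")
        # Pop controls so that no live birth is used for two cases (assumption).
        matched[case["id"]] = [group.pop()["id"] for _ in range(n_controls)]
    return matched


# Tiny made-up example with two groups labelled "A" and "B".
cases = [{"id": "s0", "race_ethnicity": "A"}, {"id": "s1", "race_ethnicity": "B"}]
controls = [
    {"id": f"{g}{i}", "race_ethnicity": g} for g in ("A", "B") for i in range(12)
]
links = match_controls(cases, controls)
```

Because matching is only on race/ethnicity, other factors such as year and season of conception would still need to be handled as covariates in the model.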
- 39:30All right, a little bit about the pollutants
- 39:32I mentioned. We relied on a data fusion model
- 39:35that gave us kind of fine-scale spatial
- 39:40and temporal estimates of 12 pollutants across New Jersey,
- 39:46across the US actually, but focusing here on New Jersey.
- 39:49So you can see the pollutants listed here
- 39:50and we linked each woman's residence at delivery
- 39:54with the closest grid cell where data were available
- 39:57or the estimates and predictions were available
- 39:59and assigned weekly exposures across
- 40:00the first 20 weeks of gestation.
- 40:02I know there's always a lot of pushback
- 40:04in these birth records
- 40:05because we don't have residential mobility,
- 40:08we don't have a sense of like how often people move.
- 40:10And we know moving is differential
- 40:13by socioeconomic status for example,
- 40:15there are a lot of factors
- 40:16that influence moving during pregnancy,
- 40:18but maybe this will make you feel somewhat better:
- 40:22we did a study in 2019
- 40:25to kind of assess the robustness
- 40:27of these critical window methods more generally
- 40:29to lots of different sources of error,
- 40:32including residential mobility
- 40:34and the results were actually very promising,
- 40:36I thought. So the findings are robust generally
- 40:39to kind of this exposure misclassification
- 40:43or exposure error that's introduced through mobility.
- 40:47All right, so in summary,
- 40:49I guess for the data, we had around 1300 non-Hispanic black
- 40:53stillbirths in this time, 928 Hispanic,
- 40:56and 1100 non-Hispanic white.
- 40:59Our covariates that we included were year of conception,
- 41:02season of conception to control for this kind of seasonality
- 41:05and long term time trends and pollution exposure,
- 41:09tobacco use indicator, age category, education.
- 41:13We had the sex of the fetus,
- 41:14and to control for spatial kind of residual correlation,
- 41:18we actually included latitude, longitude
- 41:21of the residence at delivery and their interaction term.
- 41:24As a pre-screening,
- 41:26because we had 12 pollutants to work with,
- 41:28we didn't wanna introduce a lot of noise if possible,
- 41:30into the new framework.
- 41:31So we did a pre-run of the original critical window variable
- 41:35selection on each pollutant individually,
- 41:37as most analyses would do anyway,
- 41:40and identified a subset across all
- 41:43of the different data sets, and by different data sets
- 41:45I mean the non-Hispanic black,
- 41:47non-Hispanic white, and Hispanic.
- 41:49So all of the relevant and kind of significant exposures
- 41:54that came up during any exposure period
- 41:57were included as a subset into this bigger framework.
- 41:59And so in total, we had PM 2.5 sulfate, nitrogen oxide,
- 42:03ammonium, and nitrate that kind
- 42:05of made it through this pre-screening into the final subset.
- 42:10So here is some of the output
- 42:12that we thought was interesting.
- 42:14There's a lot of output
- 42:15that can be shown as you already know.
- 42:17I guess now there's weight at each exposure period,
- 42:20there's regression parameters,
- 42:22there's just a lot that can happen here
- 42:24and there's interactions, there's main effects,
- 42:26but first let's focus on the first column here,
- 42:29and this is at least something we can hold onto
- 42:31that we understand from previous work in this space.
- 42:35So what we can see for the non-Hispanic black population
- 42:40that we were working with in New Jersey during this time,
- 42:42that elevated exposure,
- 42:44I'm not gonna say to what yet but elevated exposure
- 42:46to some combination of these five pollutants
- 42:50during pregnancy week two,
- 42:51and then later on in the pregnancy, 16, 17,
- 42:54and 20 actually led to increased odds.
- 42:59So these are odds ratios being presented, excuse me,
- 43:03of stillbirth.
- 43:04And so we can kind of take these in and say,
- 43:07we get a sense of the critical windows
- 43:09that are identified.
- 43:10We also get a sense of the variable selection component
- 43:13that I mentioned
- 43:14and in this case, they line up pretty perfectly.
- 43:17These are consistently in the model actually included
- 43:19in the Bayesian variable selection model,
- 43:21but also, when they are in the model, they're positive.
- 43:25So the risk is in the right direction.
- 43:27So more pollution during these pregnancy windows,
- 43:32more risk of stillbirth in this population.
- 43:34Now the question becomes,
- 43:35well, what are you talking about
- 43:36when you talk about the exposure?
- 43:38Like, what is the mixture that you're talking about
- 43:40in week two, for example?
- 43:42Because we have five pollutants
- 43:43and their interactions floating around.
- 43:45So focusing first, so now let's move to the second column.
- 43:49This represents the interactions,
- 43:51this top part and the bottom part represents, I'm sorry,
- 43:53this is main effects.
- 43:54And the bottom part represents interactions.
- 43:57So you can see ammonium is playing a big role throughout
- 44:01until week 16,
- 44:02which is dominated sharply by nitrogen oxides.
- 44:06And then ammonium comes back into play here
- 44:11in terms of the interactions that are important,
- 44:12it looks like PM 2.5 and ammonium early on.
- 44:16And then later on it's nitrogen oxides and ammonium
- 44:20kind of come into play.
- 44:21So a lot of this is noise.
- 44:23I did not show you the variable selection component,
- 44:25but it probably would be nice
- 44:28to kind of gray these out
- 44:29if they're not selected in the model.
- 44:32But a lot of these actually are selected in the model
- 44:34with our variable selection.
- 44:35So while these look to be non zero weights,
- 44:37some of them are actually exactly zero essentially
- 44:41because of the variable selection component.
- 44:44But there's so much output,
- 44:45it's hard to figure out what exactly
- 44:47to show in a digestible way.
- 44:48So this is where we landed.
- 44:50So, interesting results you get to see how
- 44:52the exposure kind of the mixture transitions
- 44:55across exposure time,
- 44:57you get to see what impact that has
- 44:59on the actual risk of the outcome that you're talking about.
- 45:03So a nice, I think coherent story can come,
- 45:06can be told, if you're picturing your own analysis here,
- 45:10you get to talk about the risk overall
- 45:12to the mixture kind of combination or profile,
- 45:14but also then dig deeper into individual weeks
- 45:16and talk about which ones are important,
- 45:18which interactions are important, for example.
- 45:21For the non-Hispanic white,
- 45:22there was very little indication
- 45:25that these pollutants were playing a role,
- 45:28I guess, in the kind of development of stillbirth
- 45:30or the risk of stillbirth in this population
- 45:33and for the Hispanic population,
- 45:36it looked like there potentially was some uptick here
- 45:38at the end, but nothing significantly jumped out either.
- 45:42And so at this point, it's almost...
- 45:45You don't start to investigate
- 45:47and overinterpret these weight parameters,
- 45:50given that you're not seeing anything here.
- 45:52So I kind of consider this to be noise essentially
- 45:57for the Hispanic and non-Hispanic white results for example.
- 46:01So a little brief kind of wrapping up here,
- 46:04summary of our findings is that,
- 46:06for the non-Hispanic black data set
- 46:08and variable selection results
- 46:10PM 2.5 and its chemical constituents
- 46:13are primary drivers of risk.
- 46:15And this was actually changing across exposure week.
- 46:17So driven in week two by a lot of interactions
- 46:21and kind of individual pieces.
- 46:23Week 16, mainly heavily driven by nitrogen oxides
- 46:27and then week 17,
- 46:29one or two pollutants and their interactions.
- 46:31So all the other interactions
- 46:33that are not listed here among the five variables
- 46:36were actually not significantly important here.
- 46:39So nothing, kind of, was seen
- 46:46for the non-Hispanic white and Hispanic populations.
- 46:50And I guess in conclusion,
- 46:51we introduced CWVS mix, which combines smooth
- 46:56Bayesian variable selection in the weights
- 46:58and the regression parameters
- 46:59with interpretable weighted quantile sum regression
- 47:02shrinkage to identify critical windows,
- 47:04but also kind of understand
- 47:06and kind of dig deeper into the mixture itself.
- 47:10And importantly, at least from our perspective
- 47:13is that CWVS mix seemed to offer something
- 47:17that the existing methods didn't,
- 47:19so consistently outperforming these other methods
- 47:22for identifying the true critical window set,
- 47:25estimating weight parameters,
- 47:26which is really important for interpreting the mixtures
- 47:29and then estimating the risk magnitude parameters as well.
- 47:32And our stillbirth results from New Jersey
- 47:34were in qualitative agreement with those in the literature,
- 47:37in that PM 2.5 is a consistent signal across many studies,
- 47:42while kind of gaining new insights
- 47:45regarding the exposure timing in this particular study,
- 47:49obviously more work is needed.
- 47:50And so I guess before jumping to this,
- 47:53we were working on extending this framework.
- 47:57So I'm working with the group at Emory here
- 48:00on extending this framework to allow the windows themselves
- 48:03to vary by something like socioeconomic status
- 48:07or race ethnicity, or other individual level factors.
- 48:10So there's this effect modification floating around now
- 48:13plus the mixtures.
- 48:15So it's becoming a really big task to kind of do all of this
- 48:19in a single framework,
- 48:20but we're trying to take baby steps, essentially.
- 48:23We like where we're at now, we think it works well,
- 48:25it's robust, it fits well
- 48:28and can we extend it next to the questions
- 48:30that are being asked?
- 48:31So again, if you're someone who is asking similar questions,
- 48:34please, we can talk.
- 48:35And I really enjoy sitting down with collaborators
- 48:38and trying to figure out,
- 48:40develop new methods that can answer the questions
- 48:41that they have.
- 48:43But if you find that,
- 48:45your setting can already be answered
- 48:47by some of these methods that I've discussed today
- 48:50on my website and on my GitHub site,
- 48:52I keep a lot of these packages that I've created
- 48:55with help documentation
- 48:56and then you are always free to reach out to me as well.
- 48:59But if you're looking to do this original Gaussian process,
- 49:03critical window estimation,
- 49:04we have a package for that.
- 49:06Howard Chang at Emory, go through my website again,
- 49:08you'll find his survival version
- 49:11of the model up there as well.
- 49:13CWVS in this original form is there for download
- 49:16the spatial version,
- 49:17which hopefully we're thinking about extending
- 49:22soon to account for something like oxidative potential
- 49:25of these pollutants that's also there.
- 49:28And then the newly developed methodology
- 49:30is also there for download and for use as well.
- 49:33And this obviously could not have happened
- 49:35without collaborators, including Howard Chang at Emory,
- 49:38Lauren at RTI did a lot of data management,
- 49:42Matthew Strickland, and Lindsey
- 49:45at University of Nevada Reno,
- 49:47and then James for providing the,
- 49:50or helping with the data fusion output as well.
- 49:53And here, this grant support here
- 49:56that I mentioned in extreme heat duration,
- 49:58and then data integration methods
- 49:59for environmental exposures.
- 50:01So yeah, please feel free to reach out
- 50:05if you have any questions.
- 50:06This work that I went over today
- 50:09is in press at Annals of Applied Statistics,
- 50:12not on their website yet,
- 50:12but should be really soon.
- 50:15But I think there's a version on arXiv
- 50:16if you're interested
- 50:17or if you want the most up to date version.
- 50:19I actually think I sent it to Kai,
- 50:20who may have passed it out to the class,
- 50:22but yeah, definitely feel free to reach out
- 50:24if there are any questions or anything I can help with.
- 50:27Yeah, that's it.
- 50:30<v ->Thank you so much.</v>
- 50:31(applause)
- 50:35Our students were impressed with this
- 50:39heavily quantitative lecture.
- 50:44We collected some questions
- 50:45from our students already,
- 50:47but for folks who are joining online,
- 50:50if you do have questions,
- 50:51please feel free to put them in the chat box.
- 50:54So the first question,
- 50:56one of the students is observing that
- 50:58in your study, the elevated risk
- 51:01was found in week two of the pregnancy,
- 51:05which is very early.
- 51:06So perhaps many pregnant women are not aware
- 51:11of the pregnancy at that time.
- 51:13So in terms of the intervention
- 51:16at this early stage of pregnancy,
- 51:19what are the kind of policy implications that we'll find?
- 51:22<v ->Now that's a really great point.</v>
- 51:24And this is something we've tried to,
- 51:27we haven't figured out how to deal with either,
- 51:29but we've run into a number of interesting results
- 51:35that we've seen early in the pregnancy.
- 51:38We've particularly,
- 51:38we've seen protective effects at some points
- 51:42for like PM 2.5 exposure and pre-term pregnancy
- 51:45very early on in the pregnancy.
- 51:47And we believe it could be due to exactly
- 51:51what we're talking about.
- 51:51People who don't actually know they're pregnant
- 51:53at that point.
- 51:54And so miscarriage is an issue
- 51:57that isn't well kind of documented by a lot of these states.
- 52:00There could be just fetal loss in general,
- 52:03that we're not capturing in the birth records.
- 52:05And so there's this population
- 52:08that we're not even including in a lot of our analysis
- 52:12that are lurking around
- 52:13and kind of could be biasing
- 52:14some of these early week results.
- 52:16In terms of policy implications
- 52:20it's a really good question.
- 52:21I don't know other than if I guess it really,
- 52:25if you're trying to get pregnant,
- 52:27if you know you're on that, in that stage,
- 52:30I mean, maybe it's helpful for you,
- 52:31but if you're someone who doesn't know, unanticipated,
- 52:35there's only so much that can be done outside
- 52:40of just cleaner air altogether.
- 52:42Which is something everyone can kind of agree on.
- 52:45But I think it may only affect a subset of people
- 52:48who are either attempting to get pregnant
- 52:50or kind of really regimented and like,
- 52:52know their schedule for example.
- 52:55But there's this whole other issue about people
- 52:57who aren't in our data set.
- 52:58That's a really great point
- 52:59and we have not figured out how to solve that yet.
- 53:02<v Kai>Yeah, tough question.</v>
- 53:04Thanks, Josh.
- 53:05We do have another question
- 53:06from actually two students read this.
- 53:09They really appreciate your talk about this new metrics.
- 53:13And we realize there is the package.
- 53:16The R package is available from your GitHub website.
- 53:19So anyone who's interested in applying that,
- 53:21you can download the R package and run it,
- 53:25but the students are wondering like
- 53:26beyond these time-varying air pollution mixtures,
- 53:30there are a lot of other mixtures in terms of (indistinct)
- 53:33like temperature, green space, other things.
- 53:36So how does your approach,
- 53:39the CWVS mix, apply to a broader setting
- 53:43of environmental exposures?
- 53:45<v ->I think, my push, and if you read the paper,</v>
- 53:48you'll notice that I really push for people
- 53:50to think about that in their own setting.
- 53:54Cause I think it's generally applicable to any,
- 53:57it doesn't have to be a pregnancy outcome.
- 53:59It doesn't have to be air pollution.
- 54:01What it does have to be is consistently measured
- 54:04across some exposure period.
- 54:06So I'll often get questions that,
- 54:08I have two time periods measured,
- 54:10in the first trimester and then in the third trimester,
- 54:14can I fit your methodology?
- 54:16Well, we need more fine-grained exposure information
- 54:19that's consistent across the individuals
- 54:21in order to estimate these critical windows.
- 54:23So I think the only barrier for entry
- 54:25is that you have consistently estimated
- 54:27kind of exposures for the population of interest.
- 54:30It doesn't matter so much what the exposure is. Now,
- 54:32I say that, but if you're bringing binary exposures
- 54:36and you have limit of detection issues,
- 54:38there are obviously some issues
- 54:40that will need to be sorted out,
- 54:41but the framework itself should work really well.
- 54:44The other caveat is, you'll notice that
- 54:46a lot of my work has been focused on pregnancy outcomes
- 54:49and that's because the exposure period is so well defined
- 54:52if you're working with something like cancer for example,
- 54:55well, how far do you extend back in time,
- 55:00the exposures like how you could go years and years back.
- 55:03So there's this cumulative idea as well.
- 55:07That's really hard to understand
- 55:08and these distributed lag models are great
- 55:10as long as you can a priori tell me
- 55:13what the relevant exposure period is.
- 55:14I can tell you if any of the interior parts
- 55:17of that exposure period are important,
- 55:19but if you're telling me you don't know
- 55:21when the exposure period potentially started
- 55:23or it's a completely different conversation.
- 55:25So your outcome has to have,
- 55:27or preferably would have some type
- 55:29of relevant exposure period.
- 55:32It's actually even better
- 55:32for something like cardiac heart defects,
- 55:35which we know the heart forms between like weeks three
- 55:37and eight of pregnancy.
- 55:39So you can really focus in on something like daily
- 55:41or even sub daily
- 55:42if you had that type of exposure information.
- 55:45So yeah, those are the two,
- 55:46generally it should work,
- 55:47but just make sure you have a good sense
- 55:48of the exposure period.
- 55:52<v Kai>Very good point, thanks Josh.</v>
- 55:53And we do have one comment from our online audience.
- 55:57So I'll read it: Dr. Warren,
- 55:59could you please share your thought on applying
- 56:01the critical window analysis?
- 56:04(mutters)
- 56:07<v ->Sorry, with what?</v>
- 56:09(overlapping conversation)
- 56:10That's a really great point.
- 56:11So over, so I'm actually on sabbatical right now,
- 56:14which is why I couldn't be there in person with you guys,
- 56:17but over the sabbatical
- 56:19I've developed the framework and the code
- 56:22to account for binary outcomes,
- 56:25continuous outcomes and count outcomes as well.
- 56:28Luckily if you've taken my (indistinct) course
- 56:31or you're gonna take it next fall,
- 56:32you'll see how all of these connect
- 56:34and lend themselves really nicely
- 56:36to kind of full conditional distribution updates
- 56:40that make the model fitting process
- 56:41really kind of slick and nice.
- 56:44So we have a negative binomial regression,
- 56:47for example, that can do the same thing
- 56:49if you just have count outcome data,
- 56:51if you have a continuous measure, for example,
- 56:53so I'm really aiming this.
- 56:55I hope this method doesn't just pop up
- 56:57and then disappear, I want people to use it,
- 56:59I want it to be useful.
- 57:00And so that's why I'm trying to extend it
- 57:02and trying to get people to use it in different contexts.
- 57:04So, yeah, definitely I love those types of questions.
- 57:08<v Kai>Thanks Josh.</v>
- 57:09Because we actually have another (speech distorted)
- 57:13So we have to end early
- 57:14and we do have a lot of students questions
- 57:17and I'm sure they will contact you.
- 57:21So thanks again, Josh, for a wonderful talk.
- 57:24<v ->No, yeah thanks for being here.</v>