# YSPH Biostatistics Seminar: “Estimation and Inference for Networks of Multi-Experiment Point Processes”

October 6, 2022

## Information

Speaker: Ali Shojaie, PhD, Professor of Biostatistics, Adjunct Professor of Statistics, University of Washington

October 4, 2022


- 00:00<v ->Today it's my pleasure to introduce,</v>
- 00:02Professor Ali Shojaie.
- 00:05Professor Shojaie holds master's degrees
- 00:07in industrial engineering, statistics,
- 00:10applied math, and human genetics.
- 00:13He earned his PhD in statistics
- 00:14from the University of Michigan.
- 00:17His research focuses on high-dimensional data,
- 00:19longitudinal data, computational biology,
- 00:23network analysis, and neuroimaging.
- 00:26Professor Shojaie is a 2022 fellow
- 00:29of the American Statistical Association
- 00:32and 2022 winner of their Leo Breiman Award.
- 00:36He's a full professor of biostatistics,
- 00:38adjunct professor of statistics,
- 00:40and the associate chair for strategic research affairs
- 00:43in the department of biostatistics
- 00:45at the University of Washington.
- 00:47Let's welcome Professor Shojaie.
- 00:52<v ->Thanks for having me.</v>
- 00:54Sometimes I get carried away with the volume of my voice.
- 00:57You guys, can you hear me at the back okay?
- 01:00I'm not gonna use the microphone yet,
- 01:01and I'd rather not use the microphone at all.
- 01:06Well, it's a pleasure to be here
- 01:08and to talk to you about some work that I've been doing
- 01:12for the past couple of years.
- 01:15I'm using machine learning tools on different types of data
- 01:21so that we can understand better how the brain works.
- 01:29The question really is how do we process
- 01:32information in our brains?
- 01:34How is that information processed?
- 01:41The brain processes information through neurons;
- 01:43we know that neurons interact with each other.
- 01:46Neurons do process information.
- 01:51This is of course related to my broader interests
- 01:54in networks and understanding how things interact
- 01:57with each other.
- 01:59Naturally I was drawn into this part here,
- 02:03but when I talk to scientist colleagues,
- 02:06then a lot of times I'm asked,
- 02:08what is the goal of understanding that network?
- 02:10How do we use it?
- 02:11How do we
- 02:15take advantage of that network that we learned?
- 02:17Here's an example of some recent work that we've been doing
- 02:21that indicates that learning something about these networks
- 02:26is actually important.
- 02:30I should say that this is joint work
- 02:32with a bunch of colleagues at the University of Washington,
- 02:38including the biomedical engineering lab
- 02:43that is the main group that has been running these experiments.
- 02:47And then I'm collaborating with E. Shea-Brown,
- 02:49who's a computational scientist,
- 02:51and Z. Harchaoui, a computer scientist slash statistician,
- 02:56who's been working on this project.
- 02:59The lab behind this project
- 03:02is interested in neurostimulation; that's what they do.
- 03:05What they wanna do is to see if they could stimulate
- 03:08different regions of the brain to make, in this case,
- 03:12a monkey do certain things
- 03:14or to restore function that the monkey might have lost.
- 03:18And it's a really interesting platform
- 03:22that they've developed.
- 03:24It's basically small implants that they put
- 03:28in a region of the brain on these monkeys.
- 03:31And the implant has two areas where the laser
- 03:35beams shine, and, in this case, about 96
- 03:41electrodes that collect data
- 03:43in that small region of the brain.
- 03:47This is made possible by optogenetics,
- 03:51meaning that they made the neurons sensitive to these lasers.
- 03:55When neurons
- 04:00receive the laser, then they basically get excited,
- 04:03get activated.
- 04:05The goal in this research eventually
- 04:08is to see how the activation of neurons,
- 04:11through plasticity, would change
- 04:14the connectivity of the neurons,
- 04:18and would later on result in changing function.
- 04:23That's the eventual goal of this.
- 04:24This research is at the very beginning of that.
- 04:28We are not there yet in terms of understanding function,
- 04:32understanding the link between connectivity and function.
- 04:35The collaboration with this lab started
- 04:37when they wanted to predict how the connectivity changes
- 04:41as a result of this activation.
- 04:44We wanted to understand whether by changing various factors
- 04:49in the experiments, the distance between the two lasers,
- 04:52the duration of the laser,
- 04:54they could accurately predict the change in connectivity.
- 05:01The way that the experiment is set up
- 05:02is that they basically have these periods where they have
- 05:07activation, and then the latency period,
- 05:10and then followed by observation.
- 05:12They basically observe the activity of these brain regions,
- 05:20those 96
- 05:22electrodes in this main region, over time.
- 05:25That's the data that they collect.
- 05:31Here's a look at this functional connectivity,
- 05:35and that's what they were trying to predict.
- 05:40Basically the heat map shows
- 05:46the links between the various brain regions,
- 05:50all 96 of them.
- 05:56The connectivity is defined based on coherence,
- 06:01which is basically a correlation measure in the frequency domain,
- 06:05and we have coherence in four different frequency bands.
- 06:08These are the standard bands in signal processing,
- 06:11and they're thought to measure activity
- 06:14at different spatial resolutions.
- 06:16We have the theta band, the beta band, the gamma band,
- 06:18and the high gamma band.
- 06:20And we wanna see how the connectivity
- 06:22in these different bands changes
- 06:25as the effect of stimulating these neurons.
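Band-limited coherence of the kind just described can be sketched in a few lines. This is a minimal illustration, not the lab's actual pipeline; the Welch-style averaging details, band edges, and all parameter values below are my own assumptions.

```python
import numpy as np

def band_coherence(x, y, fs, band, nperseg=256):
    """Magnitude-squared coherence between signals x and y, averaged over
    a frequency band, via Welch-style averaging of windowed segments."""
    step = nperseg // 2
    win = np.hanning(nperseg)
    sxx = syy = sxy = 0.0
    for start in range(0, len(x) - nperseg + 1, step):
        fx = np.fft.rfft(win * x[start:start + nperseg])
        fy = np.fft.rfft(win * y[start:start + nperseg])
        sxx = sxx + np.abs(fx) ** 2       # accumulate auto-spectra
        syy = syy + np.abs(fy) ** 2
        sxy = sxy + fx * np.conj(fy)      # accumulate cross-spectrum
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    coh = np.abs(sxy) ** 2 / (sxx * syy)  # coherence per frequency bin
    lo, hi = band
    return coh[(freqs >= lo) & (freqs < hi)].mean()

# Example: two channels sharing a 10 Hz rhythm are highly coherent around
# that frequency (band edges here are illustrative, not the talk's bands).
rng = np.random.default_rng(0)
t = np.arange(0, 10, 1 / 1000)            # 10 s sampled at 1 kHz
shared = np.sin(2 * np.pi * 10 * t)
x = shared + 0.2 * rng.standard_normal(t.size)
y = shared + 0.2 * rng.standard_normal(t.size)
print(band_coherence(x, y, fs=1000, band=(5, 15)))  # close to 1
```

Independent channels, by contrast, give coherence near zero in every band, which is what makes the band-wise comparison informative.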
- 06:31And what...
- 06:37This is not working.
- 06:38The clicker stopped working.
- 06:40We'll figure that out.
- 06:51Let's go on full screen again to see where this goes.
- 07:00What basically we have
- 07:01is that we have the baseline connectome
- 07:03and we have these experimental protocols,
- 07:07and we're trying to predict how the connectivity changes.
- 07:10What the lab was doing before was that
- 07:12they were looking at trying to predict connectivity
- 07:14based on experimental protocols.
- 07:18And what they were getting
- 07:19was actually really bad prediction.
- 07:22These are test R-squareds.
- 07:26And what they were getting was about a 5% test R-squared
- 07:30when they were using these protocol features
- 07:32to predict how the connectivity changes.
- 07:34And the first thing that we understood,
- 07:36and you see it there,
- 07:38is that that's the prediction,
- 07:39and if that's the prediction that you're getting,
- 07:41then it's a really bad prediction.
- 07:43The first thing that we noticed in this research
- 07:46was that it's actually important to incorporate
- 07:50the features of the current state of connectivity
- 07:53in order to make the predictions useful.
- 07:56What we did was that in addition to those protocol features,
- 07:59we added some network features,
- 08:01the current state of the network in order to predict
- 08:03how it's gonna change.
- 08:04And this is, to me, this is really interesting
- 08:06because it basically says that our prediction
- 08:10has to be subject specific
- 08:13depending on the current state of each monkey's
- 08:14connectivity; how their connectivity
- 08:18is going to change will be different.
- 08:21And what we saw was that when we incorporated
- 08:24these network features, we were able to improve quite a bit
- 08:28in terms of prediction.
- 08:29We're still not doing hugely well,
- 08:33we're only getting a test R-squared of, what, 25%.
- 08:36But what you see is that the prediction
- 08:38of how the connectivity changes
- 08:41is now much better.
- 08:43And also in terms of the pictures, you see that going from,
- 08:46so say this is the true,
- 08:48the first part in d is the true change in connectivity,
- 08:52e is what you would get from just the protocol features,
- 08:56and you see that prediction is really bad,
- 08:57and f is what you get when you combine protocol features
- 09:01and the network features.
- 09:03That prediction is closer to the true
- 09:09change in connectivity than just using the protocol features.
- 09:12This was the first thing that we learned from this research.
- 09:15The second part of what we learned is that
- 09:18it also matters which approach you use for the prediction.
- 09:21What they had done was that they were using some simple
- 09:24like linear model for prediction.
- 09:26And then we realized that we need to use something more
- 09:30expressive and then we sort of ended up using
- 09:32these non-linear additive models
- 09:34that we had previously developed,
- 09:36partly because while they have a lot of expressive power,
- 09:40they're still easy to interpret.
- 09:43Interpretation for these additive models is still easy
- 09:46and particularly we see what the shapes of
- 09:51these functions basically are.
- 09:52For example, with the distance we see how the function
- 09:55changes, and that helps with the design of these experiments.
- 09:58I'm not gonna spend too much time
- 10:00talking about the details of this
- 10:01given that we only have 50 minutes
- 10:03and I wanna get to the main topic,
- 10:05but basically these additive models
- 10:08are built by combining these features.
- 10:11Think of a Taylor expansion in a very simple sense
- 10:14that you have a linear term, you have a quadratic term,
- 10:17you have a cubic term.
- 10:18And the way that we form these additive models
- 10:21is that we automatically select the degree of complexity
- 10:26of each additive feature,
- 10:28whether it's linear, or quadratic, or cubic, et cetera.
- 10:32We also allow some features to be present in the model,
- 10:36and other features not to be present.
- 10:37What we end up with are these patterns
- 10:41where some features are really complex and others are simple,
- 10:43and that's automatically decided from the data.
- 10:47This model does well at this prediction
- 10:51and it allows us to come up with these sets of predictions.
- 10:53We see now that, for example, for coherence difference,
- 10:58which is the network feature,
- 10:59that's the coherence difference,
- 11:01and network distance, that's the distance
- 11:03between the two laser points,
- 11:05We get these two patterns estimated
- 11:07and then when we combine them, we get this surface basically
- 11:10that determines how the change in
- 11:15connectivity could be predicted
- 11:17based on these two features.
- 11:18And all of this is done automatically based on data.
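The sparse additive idea can be sketched with a polynomial basis per feature and a group-lasso penalty that drops whole features; the actual method in the talk additionally selects each feature's degree hierarchically, which this toy proximal-gradient version does not do. Function names and all tuning values are illustrative.

```python
import numpy as np

def sparse_additive_fit(X, y, lam=0.2, degree=3, n_iter=500, step=0.1):
    """Toy sparse additive model via proximal gradient with a group lasso:
    each original feature gets a polynomial basis group (x, x^2, x^3), and
    whole groups are soft-thresholded to zero, selecting features."""
    n, p = X.shape
    B = np.concatenate(
        [np.column_stack([X[:, j] ** d for d in range(1, degree + 1)])
         for j in range(p)], axis=1)
    B = (B - B.mean(0)) / B.std(0)            # standardize basis columns
    yc = y - y.mean()
    beta = np.zeros(p * degree)
    for _ in range(n_iter):
        beta -= step * B.T @ (B @ beta - yc) / n          # gradient step
        for j in range(p):                                # proximal step
            g = slice(j * degree, (j + 1) * degree)
            nrm = np.linalg.norm(beta[g])
            beta[g] = 0.0 if nrm <= step * lam else beta[g] * (1 - step * lam / nrm)
    return beta.reshape(p, degree)            # row j: coefficients for feature j

# Feature 0 enters nonlinearly; features 1-4 are noise and get zeroed out.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 5))
y = X[:, 0] + 0.5 * X[:, 0] ** 2 + 0.1 * rng.standard_normal(500)
norms = np.linalg.norm(sparse_additive_fit(X, y), axis=1)
print(norms)  # feature 0's group norm dominates; the others shrink to zero
```

The group structure is what gives the "present or absent" behavior described above: the penalty acts on the norm of each feature's whole coefficient block, not on individual basis terms.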
- 11:23This approach, again, sort of the key feature of it
- 11:25is that it combines the network features
- 11:28of the current state of connectivity with protocol features
- 11:30in order to do a better job of prediction.
- 11:33This is research that we just started
- 11:36and we will continue this research
- 11:39for the next at least five years.
- 11:42But the goal of it is eventually to see
- 11:44if we could predict the function
- 11:46and ultimately if we could build a controller
- 11:49that we could determine how to change function
- 11:52based on various features of the experiment.
- 11:57I mentioned all of this to say that knowing
- 11:59and learning the network matters.
- 12:01We need to learn the current state of connectivity,
- 12:04for example, in this work in order to be able to design
- 12:07experiments that would hopefully help
- 12:12and restore function.
- 12:15Now in this particular work,
- 12:17what we did was that we used a very simple
- 12:20notion of connectivity.
- 12:21We used coherence, which is basically correlation,
- 12:24but we know that that's not always the best
- 12:28way to define connectivity between regions.
- 12:32And so what I wanna talk about for the remaining
- 12:3640 minutes or so is how do we learn connectivity
- 12:40between neurons?
- 12:42And this is using a different type of data
- 12:45that I hadn't thought about before,
- 12:46and I'm hoping that I could show you this clip,
- 12:51which shows the actual raw data.
- 12:55The data is actually a video.
- 12:58And this is activity of individual neurons
- 13:00in a small region of the brain.
- 13:03These dots that you see popping up,
- 13:04these are individual neurons firing over time.
- 13:10And you see that sort of one neuron fires
- 13:12and another neuron fires, et cetera, et cetera.
- 13:15That's the raw data that we're getting.
- 13:18And the goal is to understand
- 13:21based on this pattern of activation of neurons,
- 13:24how neurons talk to each other basically.
- 13:27Now I'm gonna go back here.
- 13:34And so the data of that video that I showed you,
- 13:38basically, here's some snapshot of that data.
- 13:41Here's one frame.
- 13:43And there are a lot of steps in getting this data
- 13:46to this point; I'll go over them a bit more quickly.
- 13:50We're not gonna talk about this,
- 13:52but sort of we need to first identify where the neurons are.
- 13:55No one tells us where the neurons are in that video.
- 13:58We need to first identify where the neurons are.
- 14:00We need to identify when they spike, when they fire.
- 14:03No one tells us that either.
- 14:05There are a lot of pre-processing steps that happen.
- 14:09The first task is called segmentation,
- 14:11identifying where the neurons are,
- 14:13then spike detection, when the neurons fire over time,
- 14:15when each individual neuron fires over time.
- 14:17And none of these is a trivial task.
- 14:19And then a lot of smart people are working on these,
- 14:22including some of my colleagues.
- 14:25After a lot of pre-processing,
- 14:26so for each individual neuron,
- 14:28you end up with a data set like this
- 14:31that basically has these ticks
- 14:35whenever the neuron has fired.
- 14:39For a given neuron, you have, over time, when the neuron fired,
- 14:42like this.
- 14:45These are the time points when the neuron spiked.
- 14:47Now, you can do something fancier,
- 14:49you can look at the magnitude of
- 14:51the signal that you're detecting at each neuron.
- 14:53You could deal with that, but for now we're ignoring that.
- 14:55We're just looking at when they fire.
- 14:58This is called the spike train for each neuron.
- 15:01That's the data that we're using.
- 15:05These are the neurons' firing times.
- 15:07And if we combine them, in this cartoon,
- 15:09we get something like this.
- 15:10We get a sequence of activation pattern.
- 15:13This is color coded based on that sort of five neuron
- 15:16sort of cartoon network.
- 15:18And you see that different neurons activate
- 15:19at different times.
- 15:23And what I'll talk about is a notion of connectivity
- 15:25that tries to predict the activation pattern of one neuron
- 15:29from a network, basically.
- 15:31That is, maybe neuron one tells us something
- 15:34about sort of activation patterns in neuron two,
- 15:36that if we knew when neuron one activated or fired,
- 15:39we could predict when neuron two fires,
- 15:41and maybe neuron two will tell us something
- 15:43about activations of neurons three and four, et cetera.
- 15:46And that's the notion of connectivity that we're
- 15:49after, since we're trying to estimate those edges
- 15:51in this network.
- 15:53Now, please.
- 15:55<v ->Could you say just a few words informally</v>
- 15:57about the direction of connectivity?
- 15:58<v ->Yeah.</v>
- 15:59<v ->Maybe drawing arrow forward in time.</v>
- 16:00<v ->Yes.</v>
- 16:01I'll get to this, maybe in the next two slides.
- 16:06The framework that we're gonna work with
- 16:09is called the Hawkes process.
- 16:11This goes back to seminal work by Alan Hawkes
- 16:14in the '70s, where he looked at spectral properties
- 16:19of point processes.
- 16:20What are point processes? Basically, activations
- 16:23over time.
- 16:24Zeros and ones over time.
- 16:26Poisson processes are one example.
- 16:29What the Hawkes process does in particular
- 16:31is that it uses the past history of one neuron
- 16:37to predict the future.
- 16:39And this goes back to Forest's question,
- 16:42that sort of, what is that edge in this case?
- 16:44This notion is closely related, as a special case,
- 16:48to what is known to econometricians as Granger causality,
- 16:52that sort of using the past to predict the future.
- 16:55And that's the notion of connectivity
- 16:57that we're after in this particular case.
- 17:03And what makes this Hawkes process
- 17:05convenient for this is that
- 17:07it's already set up to do this.
- 17:08I'm gonna present the Hawkes process in
- 17:10its simplest form; this is the linear Hawkes process.
- 17:13And what it is, is that it's a counting process.
- 17:17It's just counting the events.
- 17:20And so that's the event process N.
- 17:25And that event process has an intensity lambda i
- 17:31for each neuron i,
- 17:33which is a combination of two terms:
- 17:37a nu i, that's the baseline intensity of that neuron.
- 17:40That means that if you had nothing else,
- 17:43this neuron would fire at this rate, basically at random;
- 17:47it would fire at a random rate;
- 17:51plus the effect that that neuron
- 17:53gets from the other neurons.
- 17:55Every time that there's an activation in a neuron,
- 17:58any neuron j from one to p, including neuron i itself,
- 18:03depending on how long it's been since that activation,
- 18:05the gap between the current time t
- 18:08and the time of the previous neuron's
- 18:09firing,
- 18:11some weight function determines how much influence
- 18:15that neuron i gets.
- 18:17This has a flavor of causality,
- 18:20which is why econometricians call it Granger causality.
- 18:24This is work by Granger,
- 18:29but it's really not causality.
- 18:30We know that there's more to it,
- 18:32and so there's a lot of work on this
- 18:33showing that it's only causality
- 18:34under rather restrictive assumptions,
- 18:37which I won't talk about in general,
- 18:38but nonetheless it predicts the future.
- 18:41It's a prediction of the future.
- 18:43And again, sort of in this case this dN i
- 18:47is our point process, lambda i is our intensity process,
- 18:52which is stochastic itself.
- 18:54Nu i is the background intensity
- 18:56and the t jk's are the times when the other neurons
- 19:01fired in the past.
- 19:03And this omega ij is the transfer function.
- 19:06It determines how much information is passed
- 19:09from the firing of one neuron
- 19:11to the firing of other neurons in the future.
- 19:14And usually you think that sort of the further
- 19:16you go in the past, the less information is carried over.
- 19:19Usually the types of functions that you consider,
- 19:21these transfer functions, decay,
- 19:23and have a decaying form,
- 19:25so that, if you go too far in the past,
- 19:27there's no information, there's no useful information.
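Putting the verbal description together, the linear Hawkes intensity can be written as follows (my transcription of the talk's notation, not a formula copied from the slides):

```latex
\lambda_i(t) \;=\; \nu_i \;+\; \sum_{j=1}^{p}\; \sum_{k \,:\, t_{jk} < t} \omega_{ij}\!\left(t - t_{jk}\right)
```

where \(\nu_i\) is the baseline intensity of neuron \(i\), \(t_{jk}\) is the \(k\)-th firing time of neuron \(j\), and \(\omega_{ij}\) is the decaying transfer function.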
- 19:30Any questions on the basics of this linear Hawkes process,
- 19:33because I'm not gonna present the more complicated version,
- 19:38but I think this will suffice for our conversation.
- 19:41I wanna make sure that we're all good
- 19:43with this simple version.
- 19:48Okay, so no question on this.
- 19:51But if we agree with this, then this Hawkes process
- 19:55gives us a very convenient way
- 19:56of defining that connectivity.
- 19:59What is meant by connectivity now basically is
- 20:02that if this function omega ij is non zero,
- 20:06then that means that there's an edge
- 20:07between neuron j and neuron i.
- 20:09And that's basically what I was showing you
- 20:11in that bigger network.
- 20:13It all comes down to estimating
- 20:15whether omega ij is zero or not for this Hawkes process.
- 20:21Okay.
- 20:23Let me show you a very simple example
- 20:25with two neurons.
- 20:26In this case, neuron one has no other influence.
- 20:32It's only its past history and baseline intensity.
- 20:36Neuron two has an edge on neuron one.
- 20:40Let's see what we would expect for the intensity
- 20:43of neuron one.
- 20:44If we think about neuron one,
- 20:47then it's basically a baseline intensity, that nu one.
- 20:51And it's gonna fire at random times, per some process.
- 20:56It's gonna fire at random times with the same intensity.
- 20:59The intensity is not gonna change because it's fixed;
- 21:02we could allow that intensity to be time varying, et cetera,
- 21:05make it more complicated, but in its simplest form
- 21:08that neuron is just gonna fire randomly,
- 21:11whenever it sort of wants.
- 21:15Now, neuron two would have a different story,
- 21:19because neuron two depends on the activation of neuron one.
- 21:22Any time that neuron one fires, the intensity of neuron two
- 21:28goes from, let's say the baseline is zero for neuron two,
- 21:31but every time that neuron one fires,
- 21:33the intensity of neuron two becomes non zero
- 21:36because it got excitement from neuron one.
- 21:38It responds to that.
- 21:40Neuron two would respond to it, and then when you have,
- 21:42like, three activations, you can get
- 21:45the convolution of effects that would make neuron two
- 21:48more likely to activate as well, or to spike as well.
- 21:54And so this is the pattern; basically
- 21:56what we are doing here is that we're taking
- 21:58this to be our omega
- 22:02two one, and you see there's a decay form,
- 22:05and these get convolved: if you have more activations
- 22:09of neuron one, that sort of increases the intensity
- 22:12of neuron two, meaning that we have more of a chance
- 22:16for neuron two to fire as well.
- 22:20So in this simple example, this could be the intensity
- 22:23of neuron two.
- 22:24And in fact, all we observe in this case
- 22:29are these two spike trains for neuron one and neuron two.
- 22:32We don't observe the network,
- 22:35in this case there are four possible edges.
- 22:37One of them is the right edge.
- 22:38We don't observe the intensity processes.
- 22:41All we observe is just the point process, the spikes.
- 22:45And the goal is to estimate the network
- 22:47based on that spike train.
- 22:49And in fact,
- 22:53as part of that, we also need to estimate that process.
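The two-neuron example can be simulated with Ogata-style thinning. This is a sketch under my own assumptions (exponential transfer functions, nonnegative effects so that the current total intensity bounds the future one between events); all parameter values are illustrative.

```python
import numpy as np

def simulate_hawkes(nu, beta, decay, T, rng):
    """Simulate a linear Hawkes process on [0, T] by Ogata-style thinning,
    assuming exponential transfer functions
    omega_ij(s) = beta[i, j] * decay * exp(-decay * s), beta >= 0
    (so intensities only decay between events, making the current total
    intensity a valid thinning bound)."""
    p = len(nu)
    events = [[] for _ in range(p)]
    exc = np.zeros(p)                        # current excitation of each neuron
    t = 0.0
    while True:
        lam_bar = nu.sum() + exc.sum()       # upper bound on total intensity
        w = rng.exponential(1.0 / lam_bar)   # candidate waiting time
        t += w
        if t > T:
            break
        exc = exc * np.exp(-decay * w)       # excitation decays over the wait
        lam = nu + exc
        if rng.uniform() * lam_bar < lam.sum():       # accept candidate?
            i = rng.choice(p, p=lam / lam.sum())      # which neuron fired
            events[i].append(t)
            exc = exc + beta[:, i] * decay   # firing excites its targets
    return events

# The example from the talk: neuron 1 is baseline-driven, and there is a
# single edge from neuron 1 to neuron 2 (rates here are made up).
rng = np.random.default_rng(0)
nu = np.array([0.5, 0.05])
beta = np.array([[0.0, 0.0],
                 [0.8, 0.0]])                # beta[1, 0]: edge 1 -> 2
ev = simulate_hawkes(nu, beta, decay=1.0, T=200.0, rng=rng)
print(len(ev[0]), len(ev[1]))  # neuron 2 fires far more than its baseline (~10) alone would give
```

Only the two spike-time lists come out, mirroring the point made above: the intensities and the edge are latent, and estimation has to recover them from the spikes alone.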
- 23:01That estimation problem is not actually that complicated.
- 23:06If you think of it, it's trying to predict
- 23:10the present based on the past.
- 23:13We could do prediction.
- 23:14We could use basically penalized regression.
- 23:18It's a penalized Poisson regression.
- 23:20Something along those lines.
- 23:21A little bit more complicated,
- 23:22but basically it's a penalized Poisson regression
- 23:24and we could use the approach similar
- 23:27to what is known as neighborhood selection.
- 23:28Basically, meaning that we regress each neuron
- 23:31on the past of all other neurons,
- 23:33including that neuron itself.
- 23:34These are simple regression problems.
- 23:36And then we use regularization to select a subset of them
- 23:39that are more informative, et cetera.
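A toy version of this neighborhood-selection idea: discretize time, build an exponentially filtered history of each neuron's past spikes, and regress each neuron's spike indicator on those histories with an l1 penalty. For simplicity this sketch uses penalized least squares rather than the penalized Poisson likelihood mentioned in the talk; the filter rate, penalty level, and simulated example are all my own choices.

```python
import numpy as np

def neighborhood_select(spikes, rho=0.6, lam=0.01, n_iter=2000, step=0.05):
    """Toy neighborhood selection: regress each neuron's spike indicator on
    exponentially filtered past spikes of all neurons (lasso via proximal
    gradient). spikes: (T_bins, p) 0/1 array. Returns beta where
    beta[i, j] estimates the influence of neuron j on neuron i."""
    Tb, p = spikes.shape
    H = np.zeros((Tb, p))
    for t in range(1, Tb):                    # causal: H[t] uses spikes before t
        H[t] = rho * H[t - 1] + spikes[t - 1]
    Hc = H - H.mean(0)                        # centering absorbs baseline rates
    beta = np.zeros((p, p))
    for i in range(p):
        yc = spikes[:, i] - spikes[:, i].mean()
        b = np.zeros(p)
        for _ in range(n_iter):
            b -= step * Hc.T @ (Hc @ b - yc) / Tb                     # gradient
            b = np.sign(b) * np.maximum(np.abs(b) - step * lam, 0.0)  # shrink
        beta[i] = b
    return beta

# Simulated ground truth: the only edge is from neuron 0 to neuron 1.
rng = np.random.default_rng(2)
Tb, p = 5000, 3
spikes = np.zeros((Tb, p))
h0 = 0.0
for t in range(Tb):
    spikes[t, 0] = rng.uniform() < 0.10                        # independent
    spikes[t, 2] = rng.uniform() < 0.05                        # independent
    spikes[t, 1] = rng.uniform() < min(0.02 + 0.3 * h0, 1.0)   # driven by 0
    h0 = 0.6 * h0 + spikes[t, 0]                               # history of 0
beta = neighborhood_select(spikes)
print(beta[1, 0])   # the 0 -> 1 edge stands out; spurious entries shrink to zero
```

Because each neuron's regression is fit separately, the network is recovered one "neighborhood" (row of beta) at a time, exactly in the spirit of the approach described above.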
- 23:42And there's been quite a bit of work on this,
- 23:45including some work that we've done.
- 23:47The work that we've done was focused more
- 23:49on extending the theory of these Hawkes processes
- 23:55to a setting that is more useful
- 23:58for neuroscience applications.
- 24:00In particular, the theory that existed was focused mostly
- 24:06on the simple linear functions, but also on the case
- 24:11where we had non-negative transfer functions.
- 24:14And this was purely an artifact
- 24:17of the theoretical analysis approach that Hawkes had taken,
- 24:22using what are known as cluster representations.
- 24:28What Hawkes and Oakes had done was that they were
- 24:33representing each neuron as a sum of, sorry,
- 24:39representing the
- 24:42activation pattern of each neuron
- 24:44as a sum of homogeneous Poisson processes.
- 24:46And because there was a sum, that could not allow
- 24:48for the omega ij's to be negative,
- 24:51'cause they would cancel out and we would lose the representation.
- 24:56What we did, and this was the work of my former student,
- 25:00Chen Chang, who's at Davis, was to
- 25:06come up with an alternative framework,
- 25:09theoretical framework motivated by the fact that
- 25:10we know that neuronal activations are not just positive,
- 25:15they're not all excitatory,
- 25:18there are also inhibitions happening.
- 25:21In neuroscience, and in any other biological system really,
- 25:24we can't have biological systems being stable
- 25:28without negative feedback.
- 25:29These negative feedback loops are critical.
- 25:32We wanted to allow for negative effects
- 25:36or the effects of inhibition.
- 25:38And so we came up with a different representation
- 25:40based on what is known as thinning process representation
- 25:44that then allowed us to get a concentration
- 25:48result in general.
- 25:48I won't go into details of this,
- 25:50that basically we get something that we can show
- 25:53that for any sort of function,
- 25:59we get a concentration around its mean, in a sense.
- 26:03And so using this as an application,
- 26:06then you could show that sort of with high probability,
- 26:08we get to estimate the network correctly
- 26:11using this neighborhood selection type approach.
- 26:16This is estimation but we don't really
- 26:20have any sense of whether...
- 26:27Let's skip over this for the sake of time.
- 26:29You don't really have any sense of whether
- 26:31the edges that we estimate are true edges or not.
- 26:33We don't have a measure of uncertainty.
- 26:35We have theory that shows that
- 26:37sort of the estimates should be correct,
- 26:39but we wanna maybe get a sense of uncertainty about this.
- 26:43And so the work that we've been doing more recently
- 26:48focused on trying to quantify the uncertainty
- 26:50of these estimates.
- 26:52And so there's been a lot of work over the past
- 26:55almost 10 years on trying to develop inference
- 26:59for these regularized estimation procedures.
- 27:03And so we're building on this
- 27:05existing work; in particular,
- 27:06we're building on work on
- 27:11inference for vector autoregressive processes.
- 27:14However, there are some differences,
- 27:17most importantly that vector autoregressive processes capture a fixed
- 27:24and pre-specified lag, whereas in the Hawkes process case,
- 27:28we basically have dependence over the entire history.
- 27:34We don't have a fixed lag that's all pre-specified.
- 27:38And also another difference
- 27:40is that a vector autoregressive process
- 27:42is observed over discrete time,
- 27:45whereas the Hawkes process is observed
- 27:48over continuous time.
- 27:50It's a continuous-time process
- 27:50and that adds a little bit of challenge,
- 27:52but nonetheless, so we use this de-correlated
- 27:56score testing work
- 27:57which is based on the work of Ning and Liu.
- 28:01And what I'm gonna talk about in the next couple of slides
- 28:07is an inference framework for these Hawkes processes.
- 28:11Again, what I showed you before,
- 28:14the simple form of linear Hawkes process
- 28:16and motivated by our neuroscience applications,
- 28:19what we can consider is something quite simple,
- 28:22although, we could generalize that.
- 28:24And that generalization is in the paper
- 28:26but the simple case is to consider something like omega ij
- 28:30as beta ij times some known function,
- 28:34where that function is simply a decay function over time.
- 28:40It's like an exponentially decaying function,
- 28:43a classic decay function.
- 28:46That's called a transition function in neuroscience applications.
- 28:49And so if we go with this framework then that
- 28:54beta ij coefficient determines the connectivity for us,
- 28:58that this beta ij, if it's positive,
- 29:01that means that sort of there's an excitatory effect.
- 29:03If it's negative, there's an inhibition effect,
- 29:05and if it's zero, there's no influence from one neuron on the other.
- 29:08All we need to do really is to develop inference
- 29:11for this beta ij.
- 29:14And so that is our goal.
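In symbols, the parametrization just described can be written as follows (my notation; the exact decay kernel used in the paper may differ, and the exponential form with a rate parameter \(\tau\) is an assumption here):

```latex
\omega_{ij}(\Delta) \;=\; \beta_{ij}\, \kappa(\Delta), \qquad \kappa(\Delta) = e^{-\Delta/\tau}, \quad \Delta \ge 0,
```

so an edge from neuron \(j\) to neuron \(i\) corresponds to \(\beta_{ij} \neq 0\), and testing connectivity reduces to inference on the scalar \(\beta_{ij}\).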
- 29:17And to do that, I'll go into a little bit of technicalities
- 29:23and detail, but not too much.
- 29:25Please stop me if there are any questions.
- 29:27The first thing we do is that we realize
- 29:29that we can represent that linear Hawkes process
- 29:34as a form of basically a regression almost.
- 29:38The first thing we do is we turn it into this
- 29:44integrated stochastic process.
- 29:46We integrate over all the past;
- 29:49that form that sort of seemed ugly,
- 29:51we integrate it so that it becomes
- 29:53a little bit more compact.
- 29:55And then once we do that, we can write it pretty similarly
- 29:59to a regression.
- 29:59We do a change of variables basically.
- 30:01We write that point process dNi as our outcome Yi
- 30:07and then we write epsilon i to be Yi minus lambda i;
- 30:11we add and subtract lambda i, in a sense.
- 30:15And that allows us to write things
- 30:18as a simple form of regression.
- 30:22Now this is something that's easy
- 30:24and we're able to deal with.
- 30:25The main complication is that this is a regression
- 30:28with heteroscedastic noise.
- 30:32The variance sigma it squared depends on the past
- 30:36and also on the time period;
- 30:38it depends on beta and lambda.
- 30:42Okay, so once we do this
- 30:49then to develop a test for beta ij,
- 30:53we could develop a test for beta ij
- 30:55and then this could also be extended to testing multiple betas,
- 31:00sort of allowing for basis expansions, et cetera,
- 31:03and even a nonstationary baseline,
- 31:06but the test is basically
- 31:09now based on this de-correlated score test.
- 31:11Once we write it in this regression form,
- 31:13we can take this de-correlated score test,
- 31:15and I'll skip over the details here,
- 31:19but basically we form this set of orthogonal columns
- 31:23and define a score test based on this
- 31:26that looks something like this,
- 31:28where you're looking at the effect of the de-correlated column j
- 31:32with basically the noise term, epsilon i.
- 31:36Both of these are derived from data based on some parameters,
- 31:40but once you have this, this Sij
- 31:43then you could actually now define a test
- 31:47that basically looks at the magnitude of that Sij.
- 31:53And that's the score that we could use.
- 31:59And under the null, we can show that this test statistic
- 32:02converges to a chi-square distribution
- 32:05and we could use that for testing.
- 32:08In practice, you need to estimate these parameters.
- 32:10We estimate them, we ensure that things still work
- 32:13with the estimated parameters
- 32:15and that you still converge to a chi-squared.
- 32:19And you can also do confidence intervals and all of that.
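The de-correlated score construction can be illustrated in the plain linear-model case. The talk applies the analogous construction to the Hawkes regression form, with regularized nuisance estimates and heteroscedastic noise; this low-dimensional sketch uses ordinary least squares for the nuisance fits and is only meant to show the mechanics.

```python
import numpy as np

def decorrelated_score_stat(X, y, j):
    """De-correlated score statistic for H0: beta_j = 0 in y = X beta + eps.
    Nuisance coefficients are fit under the null; x_j is projected off the
    remaining columns so the score is insensitive to nuisance-fit error."""
    n, p = X.shape
    rest = [k for k in range(p) if k != j]
    Xr = X[:, rest]
    theta = np.linalg.lstsq(Xr, y, rcond=None)[0]        # nuisance fit under H0
    resid = y - Xr @ theta
    gamma = np.linalg.lstsq(Xr, X[:, j], rcond=None)[0]  # de-correlation fit
    w = X[:, j] - Xr @ gamma                             # de-correlated column
    sigma2 = resid @ resid / n                           # noise variance estimate
    score = w @ resid / n
    return n * score ** 2 / (sigma2 * (w @ w) / n)       # ~ chi2(1) under H0

rng = np.random.default_rng(3)
n, p = 200, 5
X = rng.standard_normal((n, p))
y_null = X[:, 1] + rng.standard_normal(n)           # beta_0 = 0: H0 true
y_alt = X[:, 0] + X[:, 1] + rng.standard_normal(n)  # beta_0 = 1: H0 false
print(decorrelated_score_stat(X, y_null, 0))  # moderate, chi-square(1) scale
print(decorrelated_score_stat(X, y_alt, 0))   # large, so H0 is rejected
```

Comparing the statistic to a chi-square(1) quantile gives the test; inverting the same quantity over a grid of null values gives the confidence intervals mentioned above.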
- 32:24Maybe I'll just briefly mention
- 32:26that this also has the usual power that we expect
- 32:29that you can study the power of this against local alternatives.
- 32:35And this gives us basically the power that we would expect.
- 32:41And in simulation it also behaves very close
- 32:45to the oracle procedure that knows which neurons
- 32:47interact with each other.
- 32:50What we've done here is that
- 32:51we've looked at increasing sample size
- 32:54or the length of the sequence from 200 to 2,000
- 32:58and then we see that sort of type one error
- 33:01becomes pretty well controlled as time increases.
- 33:05The pink here is oracle.
- 33:06The blue is our procedure.
- 33:08The power also increases as the sample size increases.
- 33:14And we also look at the coverage of the confidence intervals.
- 33:18Both for the zeros and non zeros,
- 33:21the coverage also seems to be well behaved.
- 33:26This is a simple simulation setting, but it looks like
- 33:32it's not too far, actually, from the application
- 33:35that we've also looked at.
- 33:38And in particular we've looked at some data
- 33:42from a paper that was published in 2018 in Nature,
- 33:45where they had looked at activation patterns of neurons
- 33:50and how they would change with and without laser.
- 33:54And at the time this was the largest:
- 33:57they had multiple devices that they had looked at,
- 34:00and the largest region
- 34:02that they had looked at had 25 neurons.
- 34:04The technology has improved quite a bit.
- 34:06Now there's a couple of hundred neurons
- 34:08that they could measure,
- 34:09but this was 25 neurons.
- 34:10And then what I'm showing you are the activation patterns
- 34:14without laser and with laser,
- 34:16and I'm not showing the edges that are common
- 34:19between the two networks.
- 34:20I'm just showing the edges that are different
- 34:21between these networks.
- 34:23And we see that these betas,
- 34:25some of them are clearly different.
- 34:28In one condition the confidence interval covers zero,
- 34:32and in the other condition it does not.
- 34:33And that's why you're seeing these differences in networks.
- 34:36And that's similar to what they had observed
- 34:39based on basically correlations: as you activate the laser,
- 34:43there's more connectivity among these neurons.
- 34:49Now in the actual experiments,
- 34:51and this is maybe the last 15 minutes or so of my talk,
- 34:57in the actual experiments, they don't do just a simple
- 35:00one shot experiment because they have to implant
- 35:03this device.
- 35:06This is data of a mouse.
- 35:08They have to implant this device on the mouse's brain.
- 35:11And so what they do is that,
- 35:13once they do that, now with that camera,
- 35:16they just measure the activities of neurons.
- 35:18But once they do that, they actually run
- 35:20a sequence of experiments.
- 35:23It's never just a single experiment or two experiments.
- 35:25What they do is that they, for example,
- 35:28they show different images, the mouse
- 35:31and they see the activation patterns of neurons
- 35:34as the mouse processes different images.
- 35:36And what they usually do is that they show an image
- 35:38with one orientation and then they have a washout period.
- 35:42They show an image with different orientation,
- 35:44they have a washout period.
- 35:45They show an image with a different orientation
- 35:47and then they might use laser
- 35:50in combination with these different images, et cetera.
- 35:53What they ended up doing
- 35:54is that they have many, many experiments.
- 35:56And what we expect is that the networks
- 35:59in these different experiments
- 36:00to be different from each other
- 36:02but maybe share some commonalities as well.
- 36:04We don't expect completely different networks
- 36:06but we expect somewhat related networks.
- 36:09And over different time segments
- 36:13the network might change.
- 36:15In one segment it might be one network, and in the next segment
- 36:19it might change to something different,
- 36:20but maybe some parts of the network structure are alike.
- 36:25What this does is that it sort of motivates us
- 36:27to think about jointly estimating these networks,
- 36:29because each one of these time segments
- 36:31might not have enough observations to estimate accurately.
- 36:35And this goes back to the simulation results
- 36:36that I showed you: in order to get good control
- 36:41of type one error and good power,
- 36:43we need to have a decent number of observations,
- 36:45and each one of these time segments
- 36:47might not have enough observations.
- 36:50In order to make sure that we get high quality estimates
- 36:54and valid inference,
- 36:57we may need to join the estimations
- 37:00to get better quality estimates and inference.
- 37:11That's the idea of the second part
- 37:13of what I wanna talk about: going beyond
- 37:17the single experiment and trying to do estimation
- 37:19and inference in multiple similar experiments.
- 37:22And in fact in the case of this paper by Bolding and Franks,
- 37:26they had, for every single mouse,
- 37:30they had 80 different experimental setups
- 37:33with laser and different durations
- 37:35and different strengths.
- 37:37It's not a single experiment for each mouse.
- 37:39It's 80 different experiments for each mouse.
- 37:42And you would expect that many of these experiments
- 37:44are similar to each other
- 37:45and they might have different degrees of similarity
- 37:47with each other that we might need to take into account.
- 37:53Then the goal of the second part is to do joint estimation
- 37:56and inference for settings where we have multiple experiments
- 37:59and not just a single experiment.
- 38:02To do this, we went back to basically
- 38:05the estimation that we had,
- 38:07and previously what we had was a sparsity-type penalty.
- 38:11What we do now is that we add
- 38:12a fusion-type penalty,
- 38:14so now we combine the estimates in different experiments.
- 38:19And this is based on past work that I had done,
- 38:22but the main difference
- 38:24in this work is that
- 38:28now we wanna allow these estimates
- 38:32to be similar to each other
- 38:33based on a data-driven notion of similarity.
- 38:36We don't know which experiments
- 38:37are more similar to each other.
- 38:40And we basically want the data to tell us which experiments
- 38:43should be more similar to each other, should be combined
- 38:46and not necessarily fix that a priori,
- 38:51since we usually don't have that information.
- 38:53These data-driven weights are critical here,
- 38:57and we derive these data-driven weights
- 38:59based on just simple correlations.
- 39:01We calculate simple correlations,
- 39:02and in the first step we look to see which of these conditions
- 39:05are more correlated with each other,
- 39:09more similar to each other,
- 39:11based on these correlations.
- 39:13And we use these cross-correlations to then define weights
- 39:17for which experiments should be more closely fused
- 39:20with each other,
- 39:21and the estimates of those experiments
- 39:22are then fused more closely.
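A rough sketch of how such correlation-based fusion weights might be computed (all names here are hypothetical, and the paper's actual weighting scheme may differ):

```python
import math

def fusion_weights(experiment_series):
    """Hypothetical sketch: derive pairwise fusion weights from simple
    correlations between experiments' activity traces.  A higher
    correlation between experiments m and m' yields a larger weight,
    so their network estimates are fused more strongly in a penalty of
    the rough form
        sum_m ||beta_m||_1 + sum_{m<m'} w[m,m'] * ||beta_m - beta_m'||.
    """
    def corr(x, y):
        # plain Pearson correlation between two equal-length traces
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        vx = sum((a - mx) ** 2 for a in x)
        vy = sum((b - my) ** 2 for b in y)
        return cov / math.sqrt(vx * vy)

    M = len(experiment_series)
    w = {}
    for m in range(M):
        for mp in range(m + 1, M):
            w[(m, mp)] = abs(corr(experiment_series[m], experiment_series[mp]))
    return w

# toy usage: experiments 0 and 1 are near-identical, experiment 2 differs,
# so the (0, 1) pair gets the larger fusion weight
w = fusion_weights([[1, 2, 3, 4], [1.1, 2, 3.2, 4], [4, 1, 3, 2]])
```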
- 39:25And I'll leave out the details,
- 39:29but in a similar setting
- 39:32as what I had explained before
- 39:34in terms of the experimental setup,
- 39:37I'm sorry, in terms of the simulation setup,
- 39:39there are 50 neurons in the network,
- 39:42with three different experiments in this case
- 39:44of three different lengths,
- 39:45and we use different estimators.
- 39:48And what we see is that when we do this fusion,
- 39:51we do better in terms of the number of true positives
- 39:54for any given number of estimated edges,
- 39:57compared to separately estimating,
- 39:59or compared to other types of fusion
- 40:02that one might consider.
- 40:06Now, estimation is somewhat easy.
- 40:10The main challenge was to come up
- 40:12with these data-driven weights.
- 40:14The main issue is that if you wanted to come up with
- 40:19valid inference in these settings,
- 40:21when we have many, many experiments,
- 40:24then we would have very low power if we're adjusting
- 40:27for all comparisons using FDR or FWER,
- 40:31false discovery rate or family-wise error rate:
- 40:35we have p squared times M tests,
- 40:37and so we have low power.
- 40:40To deal with this setting, what we have done
- 40:42is that we've come up with a hierarchical testing procedure
- 40:45that avoids testing
- 40:50all these p squared times M coefficients.
- 40:52And the idea is this,
- 40:53the idea is that if you have a sense of which conditions
- 40:57are more similar to each other,
- 40:59we construct a very specific type of binary tree,
- 41:03which basically always has a single node
- 41:07on the left side in this case.
- 41:09And then we start at the top of that tree
- 41:11and test each coefficient.
- 41:13We first test over all the experiments.
- 41:16If we don't reject, then we stop there.
- 41:18If we reject, then we test experiment one,
- 41:22and experiments two, three, and four separately.
- 41:25If we reject one, then we've identified
- 41:28the non-zero edge.
- 41:30If we reject two, three, four, then we go down.
- 41:34If we don't reject two, three, four, we stop there.
- 41:36This way we stop at the level that is appropriate
- 41:39based on data.
- 41:42And this ends up, especially in sparse networks,
- 41:44saving us a lot of tests
- 41:49and giving us a significant improvement in power.
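The descent through the tree just described can be sketched as follows (a simplified illustration, with a hypothetical `reject` oracle standing in for the actual level-alpha tests):

```python
def hierarchical_test(experiments, reject):
    """Top-down hierarchical test of one coefficient across experiments.

    `reject(group)` is a hypothetical oracle returning True when the
    joint test of the coefficient over `group` rejects.  We first test
    all experiments together; on rejection we split off a single
    experiment on the left (the tree shape described in the talk) and
    recurse on the rest, stopping as soon as a group fails to reject.
    """
    flagged = []                      # experiments with a non-zero edge
    group = list(experiments)
    while group:
        if not reject(group):         # fail to reject: stop this branch
            break
        head, rest = group[0], group[1:]
        if reject([head]):            # single-experiment test on the left
            flagged.append(head)
        group = rest                  # descend into the remaining experiments
    return flagged

# toy usage: only experiment 3 truly has a non-zero coefficient,
# so the oracle rejects exactly for groups that contain it
truth = {3}
flagged = hierarchical_test([1, 2, 3, 4], reject=lambda g: bool(truth & set(g)))
```

With a sparse network most branches stop at the first "fail to reject", which is where the savings in the number of tests (and the power gain) come from.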
- 41:51And that's shown in the simulation:
- 41:53if you don't do this,
- 41:57your power decreases as the number of experiments increases.
- 42:01And in this case we've gone up to 50 experiments,
- 42:04as I mentioned.
- 42:04The Bolding and Franks paper has about 80.
- 42:07Whereas if you do this
- 42:09and your network is sparse,
- 42:12you see that by combining experiments
- 42:15you actually gain power,
- 42:16because you're incorporating more data.
- 42:19And this is all while controlling the family-wise error rate.
- 42:22Both methods control the family-wise error rate.
- 42:25We haven't developed anything for FDR.
- 42:27We haven't developed theory for FDR
- 42:29but the method also seems to be controlling FDR
- 42:32in a very stringent way actually.
- 42:35But we just don't have theory for FDR control
- 42:38'cause that becomes more complicated.
- 42:46I'm going very fast because of time
- 42:47but I'll pause for a minute.
- 42:49Any questions?
- 42:53Please.
- 42:54<v ->What do you think about stationarity</v>
- 42:56of the Hawkes process in this context?
- 42:58With the exogenous experimental forcing,
- 43:01over what timescale does that happen,
- 43:03and is stationarity reasonable?
- 43:04<v ->Yeah, that's a really good question.</v>
- 43:11To be honest, I think these Hawkes processes
- 43:13are most likely non-stationary.
- 43:14There are two mechanisms of non-stationarity that could happen.
- 43:20One, we try to account for it.
- 43:22I skipped over it but we tried to account
- 43:25for one aspect of it by allowing the baseline rate
- 43:28to be time varying.
- 43:38Basically we allow this nu i to be a function of time:
- 43:43the baseline rate for each neuron varies over time.
- 43:48And the hope is that, that would capture
- 43:49some of the exogenous factors that might influence overall activity.
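As a loose illustration of such a time-varying baseline, here is a sketch of a multivariate Hawkes conditional intensity with an exponential kernel (hypothetical names and kernel choice; not the speaker's actual model):

```python
import math

def hawkes_intensity(t, history, nu, beta, decay=1.0):
    """Sketch of the conditional intensity of neuron i in a multivariate
    Hawkes process with a time-varying baseline:

        lambda_i(t) = nu_i(t)
                      + sum_j beta_ij * sum_{s in H_j, s < t} exp(-decay*(t - s))

    nu      : callable t -> baseline rate at time t (absorbs exogenous drift)
    history : dict mapping neuron j to its list of past spike times
    beta    : dict mapping neuron j to the excitation coefficient beta_ij
    """
    excitation = sum(
        beta[j] * sum(math.exp(-decay * (t - s)) for s in spikes if s < t)
        for j, spikes in history.items()
    )
    return max(nu(t) + excitation, 0.0)  # intensities must be non-negative

# toy usage: a slowly rising baseline plus excitation
# from one earlier spike of a neighboring neuron j
lam = hawkes_intensity(2.0, {"j": [1.0]}, nu=lambda t: 0.1 * t, beta={"j": 0.5})
```

Letting `nu` depend on `t` is the piece that absorbs slow exogenous drift; abrupt regime changes would instead require piecewise-stationary segments, as discussed next.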
- 43:56It could also be that the dynamics are changing over time;
- 44:00that we haven't done. Or it could in fact be that
- 44:06we have abrupt changes
- 44:10in patterns of either activation or the baseline over time,
- 44:15where all of a sudden something completely changes.
- 44:17We have piecewise stationarity: not continuous change,
- 44:22not globally stationary,
- 44:24but piecewise.
- 44:26In an experiment something is happening,
- 44:28and then all of a sudden
- 44:30something else is happening.
- 44:31This would eventually maybe capture plasticity
- 44:35in these neurons, neuroplasticity, to some extent,
- 44:39allowing for changes of activity over time,
- 44:42but beyond that we haven't done anything.
- 44:45There's actually one paper that has looked
- 44:47at piecewise stationarity for these Hawkes processes for neurons.
- 44:52It becomes a computationally very, very difficult problem;
- 44:56the inference especially becomes very difficult.
- 44:59But I think it's a very good question.
- 45:03Aside from that one paper, not much else has been done.
- 45:11<v ->Hi, thank you, professor, for sharing.</v>
- 45:13I have a question regarding the segmentation
- 45:17'cause on the video you showed us,
- 45:19the image is generally very shaky.
- 45:23In the computer vision perspective,
- 45:25it's very hard to isolate which neuron actually fired
- 45:28and make sure that it's that same neuron fires over time.
- 45:32And also, the second question is that the mouse
- 45:36olfactory model you've mentioned has like 20 neurons,
- 45:39but in the picture you showed us there are probably
- 45:42thousands of neurons.
- 45:42How do you identify which 20 neurons to look at?
- 45:46<v ->Very good questions.</v>
- 45:48First of all, before they even get to segmentation,
- 45:51they need to do what is known as registration,
- 45:55and this is actually common in
- 45:59time series and imaging (indistinct).
- 46:07What this means is that you first need to register
- 46:09the images so that they're aligned correctly.
- 46:13Then you can do segmentation.
- 46:14If you remember the first slides,
- 46:17there were a couple of steps shown
- 46:20before getting to segmentation.
- 46:21There are a couple of steps that need to happen
- 46:23before we even get to segmentation.
- 46:25And part of that is registration.
- 46:27Registration is actually a nontrivial task,
- 46:29to make sure that the locations don't change.
- 46:32You have to get that right; otherwise the algorithm
- 46:36will get confused.
- 46:37First there's some background correction,
- 46:41handling the noise correctly and everything,
- 46:45and then there's the registration that needs to happen.
- 46:47And then after that you can do segmentation,
- 46:49identifying neurons.
- 46:50Now, the data that I showed you earlier was actually
- 46:52from the calcium imaging video; it's different from
- 46:56this Bolding and Franks data that I showed you here.
- 47:00This one had 25 neurons.
- 47:03This is an older technology.
- 47:04It's an older paper, so they only had 25 neurons;
- 47:07they had smaller regions that they were capturing.
- 47:10With the newer technologies, they're capturing
- 47:11larger regions, a couple hundred neurons.
- 47:14I think the most I've seen
- 47:16was about a thousand or so neurons.
- 47:17I haven't seen more than a thousand neurons.
- 47:20<v ->Thank you.</v>
- 47:25<v ->Okay, so I'm close to the end of my time.</v>
- 47:29Maybe in the remaining minute or so
- 47:34I'll basically mention that
- 47:37we have applied this joint estimation
- 47:42to the data from Bolding and Franks.
- 47:43And then we also see, something that is perhaps
- 47:48not surprising, that the no-laser condition's
- 47:51network is more different
- 47:53from the networks for the two different
- 47:55magnitudes of laser.
- 48:02You see that those two are more similar to each other
- 48:05than to the no-laser condition.
- 48:10And I'm probably gonna stop here
- 48:12and sort of leave a couple of minutes
- 48:14for additional questions, but I'll mention that
- 48:15the last part I didn't talk about was to see if we could
- 48:19go beyond prediction.
- 48:20Could we use this? I mentioned that Granger causality
- 48:23is not really causality; it's prediction.
- 48:27Could we go beyond prediction
- 48:31and get a sense of which neurons are impacting other neurons?
- 48:35And I'll briefly mention that there are two issues
- 48:39in general in going beyond prediction to causality.
- 48:45We have a review paper that talks about this.
- 48:47One issue is subsampling,
- 48:48that you don't have enough resolution.
- 48:51And the other issue is that you might have
- 48:53latent processes that make it difficult
- 48:55to answer causal questions.
- 48:57Fortunately the issue of subsampling,
- 49:00which is a difficult issue in general,
- 49:04is not very prominent in these calcium
- 49:09imaging data,
- 49:10because you have continuous-time videos.
- 49:14So subsampling should not be a big deal in this case.
- 49:19However, we observe a tiny fraction
- 49:23of the connections of the brain.
- 49:25The question is, can we somehow account
- 49:27for all the other neurons that we don't see?
- 49:31The last part of this work is about that.
- 49:34And I'll sort of jump to the end,
- 49:38because I'll put up a reference to that work.
- 49:41That one is published, in case you're interested:
- 49:43a paper that looks at
- 49:49whether we could go beyond prediction,
- 49:51whether we can actually identify causal links
- 49:54between particular neurons.
- 49:56And I think I'm gonna stop here and thank you guys
- 50:00and I'm happy to take more questions.
- 50:17<v ->Naive question.</v>
- 50:19Biologically, what is a network connection here?
- 50:24Because they're not, I'm assuming they're not
- 50:27growing synapses based on the laser.
- 50:33(indistinct)
- 50:36(group chattering)