# YSPH Biostatistics Seminar: “Motif Expansion of Global Information Flow in Spike Train Data”

September 15, 2022

## Information

Alexander Strang, Kruskal Instructor, Statistics and Applied Math, University of Chicago

September 12, 2022


- 00:00<v Robert>Good afternoon.</v>
- 00:01In respect for everybody's time today,
- 00:04let's go ahead and get started.
- 00:07So today it is my pleasure to introduce,
- 00:09Dr. Alexander Strang.
- 00:12Dr. Strang earned his bachelor's in Mathematics and Physics,
- 00:16as well as his PhD in Applied Mathematics
- 00:19from Case Western Reserve University in Cleveland, Ohio.
- 00:24Born in Ohio, so representing.
- 00:27He studies variational inference problems,
- 00:29noise propagation in biological networks,
- 00:32self-organizing edge flows,
- 00:34and functional form game theory
- 00:36at the University of Chicago,
- 00:38where he is a William H. Kruskal Instructor of Statistics,
- 00:42and Applied Mathematics.
- 00:43Today he is going to talk to us about motivic expansion
- 00:47of global information flow in spike train data.
- 00:50Let's welcome our speaker.
- 00:54<v ->Okay, thank you very much.</v>
- 00:56Thank you first for the kind invite,
- 00:59and for the opportunity to speak here in your seminar.
- 01:03So I'd like to start with some acknowledgements.
- 01:06This is very much a work in progress.
- 01:09Part of what I'm going to be showing you today
- 01:11is really the work of a master's student
- 01:12that I've been working with this summer, that's Bowen.
- 01:15And really I'd like to thank Bowen
- 01:16for a lot of the simulation
- 01:18and a lot of the TE calculation I'll show you later.
- 01:21This project more generally was born
- 01:22out of conversations with Brent Doiron
- 01:24and Lek-Heng Lim here at Chicago.
- 01:27Brent really was the inspiration for
- 01:30starting to venture into computational neuroscience.
- 01:33I'll really say that I am new to this world,
- 01:35it's a world that's exciting to me,
- 01:37but really is a world that I am actively exploring
- 01:41and learning about.
- 01:42So I look forward to conversations afterwards
- 01:44to learn more here.
- 01:46My background was much more inspired by Lek-Heng's work
- 01:49in computational topology.
- 01:52And some of what I'll be presenting today
- 01:54is really inspired by conversations with him.
- 01:58So let's start with some introduction and motivation.
- 02:01The motivation personally for this talk,
- 02:05so it goes back really to work that I started
- 02:06when I was a graduate student,
- 02:08I've had this sort of long standing interest
- 02:10in the interplay between structure and dynamics,
- 02:12in particular on networks.
- 02:14I've pursued interesting questions like,
- 02:16how does the structure of a network determine dynamics
- 02:18of processes on that network?
- 02:21And in turn, how do processes on that network
- 02:24give rise to structure?
- 02:30On the biological side,
- 02:32today's talk I'm going to be focusing on
- 02:34sort of applications of this question
- 02:36within neural networks.
- 02:38And I think that this sort of world of
- 02:39computational neuroscience is really exciting
- 02:41if you're interested in this interplay
- 02:42between structure and dynamics
- 02:44because neural networks encode, transmit
- 02:46and process information via dynamical processes.
- 02:49For example, the dynamical process
- 02:53of a neural network is directed by the wiring patterns,
- 02:56by the structure of that network.
- 02:58And moreover, if you're talking
- 03:00about some sort of learning process,
- 03:01then those wiring patterns can change
- 03:03and adapt during the learning process,
- 03:05so they are themselves dynamic.
- 03:08In this area I've been interested in questions like,
- 03:10how is the flow of information governed
- 03:12by the wiring pattern?
- 03:14How do patterns of information flow
- 03:16present themselves in data?
- 03:17And can they be inferred from that data?
- 03:19And what types of wiring patterns
- 03:21might develop during learning?
- 03:24Answering questions of this type requires
- 03:26a couple of things.
- 03:26Sort of very big picture, it requires a language
- 03:29for describing structures and patterns.
- 03:31It requires having a dynamical process,
- 03:33some sort of model for the neural net,
- 03:35and it requires a generating model
- 03:38that generates initial structure
- 03:40and links the structure to dynamics.
- 03:42Alternatively, if we don't generate things using a model,
- 03:45if we have some sort of observable or data,
- 03:47then we can try to work in the other direction
- 03:49and go from dynamics to structure.
- 03:52Today during this talk,
- 03:53I'm gonna be focusing really on this first piece,
- 03:55a language for describing structures and patterns.
- 03:57And on the second piece on sort of an observable
- 04:00that I've been working on trying to compute to use,
- 04:04to try to connect these three points together.
- 04:08So to get started, a little bit of biology.
- 04:10Really I was inspired in this project
- 04:12by a paper from K.G. Mura.
- 04:14Here he was looking at a coupled oscillator model,
- 04:17this is a Kuramoto model for neural activity
- 04:20where the firing dynamics interact with the wiring.
- 04:22So the wiring that couples the oscillators
- 04:25would adapt on a slower time scale
- 04:29than the oscillators themselves.
- 04:31And that adaptation could represent
- 04:34different types of learning processes.
- 04:36For example, the fire-together wire-together rules
- 04:39or Hebbian learning,
- 04:41you can look at causal learning rules,
- 04:43or anti-Hebbian learning rules.
- 04:45And this is just an example I've run of this system.
- 04:48This system of ODEs is sort of interesting
- 04:50because it can generate all sorts of different patterns.
- 04:52You can see synchronized firing,
- 04:54you can see traveling waves,
- 04:55you can see chaos,
- 04:57these occur at different sort of critical boundaries.
- 04:59So you can see phase transitions
- 05:01when you have large collections of these oscillators.
- 05:04And depending on how they're coupled together,
- 05:05it behaves differently.
- 05:07In particular some of what's interesting here is
- 05:09that starting from some random seed topology,
- 05:13the dynamics play forward from that initial condition,
- 05:16and that random seed topology produces some ensemble of
- 05:19wiring patterns that are of themselves random.
- 05:22You can think about the ensemble of wiring patterns
- 05:24as being chaotic,
- 05:25sort of realizations of some random initialization.
- 05:29That said, you can also observe structures
- 05:32within the systems of coupled oscillators.
- 05:33So you could see large scale cyclic structures
- 05:36representing organized causal firing patterns
- 05:38in certain regimes.
- 05:40So this is a nice example where graph structure
- 05:42and learning parameters can determine dynamics,
- 05:44and in turn where those dynamics can determine structure.
- 05:48On the other side, you can also think
- 05:49about a data-driven side instead of a model-driven side.
- 05:53If we empirically observe sample trajectories
- 05:56of some observables, for example, neuron recordings,
- 05:58then we might hope to infer something
- 05:59about the connectivity that generates them.
- 06:01And so here instead of starting by posing a model,
- 06:04and then simulating it and studying how it behaves,
- 06:06we can instead study data or try to study structure in data.
- 06:10Often that data comes in the form of covariance matrices
- 06:12representing firing rates.
- 06:14And these covariance matrices may be auto-covariance matrices
- 06:17with some sort of time lag.
- 06:19Here there are a couple of standard structural approaches,
- 06:22so there is a motivic expansion approach.
- 06:25This was at least introduced by Brent Doiron's lab
- 06:28with his student, Gay Walker.
- 06:30Here the idea is that you define some graph motifs,
- 06:34and then you can study the dynamics
- 06:36in terms of those graph motifs.
- 06:38For example, if you build a power series in those motifs,
- 06:41then you can try to represent your covariance matrices
- 06:44in terms of that power series.
- 06:45And this is something we're gonna talk
- 06:46about quite a bit today.
- 06:47This is really part of why I was inspired by this work is,
- 06:49I had been working separately on the idea of
- 06:51looking at covariance matrices
- 06:53in terms of these power series expansions.
- 06:56This is also connected to topological data analysis,
- 06:59and this is where the conversations with Lek-Heng
- 07:01played a role in this work.
- 07:03Topological data analysis aims to construct graphs
- 07:07representing dynamical systems.
- 07:08For example, you might look at the dynamical similarity
- 07:11of firing patterns of certain neurons,
- 07:13and then study the topology of those graphs.
- 07:18Again, this sort of leads to similar questions,
- 07:20but we can be a little bit more precise here
- 07:21for thinking in neuroscience.
- 07:24We can say more precisely, for example,
- 07:25how is information processing and transfer represented,
- 07:29both in these covariance matrices and the structures
- 07:32that we hope to extract from them.
- 07:33In particular, can we try
- 07:35and infer causality from firing patterns?
- 07:39And this is fundamentally an information theoretic question.
- 07:42Really we're asking, can we study
- 07:43the directed exchange of information from trajectories?
- 07:47Here one approach, I mean, in some sense
- 07:49you can never tell causality without some underlying model,
- 07:53without some underlying understanding of the mechanism.
- 07:56So if all we can do is observe,
- 07:58then we need to define what we mean by causality.
- 08:01A reasonable sort of standard definition here
- 08:03is Wiener Causality,
- 08:04which says that two times series share a causal relation,
- 08:06so we say X causes Y,
- 08:08if the history of X informs a future of Y.
- 08:12Note that here "cause" put in quotes,
- 08:14really means forecasts.
- 08:15That means that the past or the present of X
- 08:18carries relevant information about the future of Y.
- 08:22A natural measure of that information is transfer entropy.
- 08:26Transfer entropy was introduced by Schreiber in 2000,
- 08:30and it's the expected KL divergence
- 08:32between the distribution of the future of Y
- 08:35given the history of X
- 08:38and the marginal distribution of the future of Y.
- 08:41So essentially it's how much predictive information
- 08:43does X carry about Y?
- 08:46This is a nice quantity for a couple of reasons.
- 08:48First, it's zero when two trajectories are independent.
- 08:51Second, since it's just defined in terms of
- 08:53these conditional distributions, it's model free.
- 08:56So I put "no" with a star here because
- 08:59the generative assumptions actually do matter
- 09:01when you go to try and compute it,
- 09:02but in principle it's defined independent of the model.
- 09:05Again, unlike some other effective causality measures,
- 09:07it doesn't require introducing some time lag to define.
- 09:11It's a naturally directed quantity, right?
- 09:13Transfer entropy is defined in terms of
- 09:17the future of Y conditioned on the past of X and Y.
- 09:23And that quantity is directed because reversing X and Y,
- 09:27it does not sort of symmetrically change this statement.
- 09:30This is different than quantities
- 09:31like mutual information or correlation
- 09:32that are also often used
- 09:34to try to measure effective connectivity in networks,
- 09:37which are fundamentally symmetric quantities.
- 09:41Transfer entropy is also less corrupted
- 09:43by measurement noise, linear mixing of signals,
- 09:46or shared coupling to external sources.
- 09:50Lastly, and maybe most interestingly,
- 09:52if we think in terms of correlations,
- 09:54correlations are really moments,
- 09:56correlations are really about covariances, right?
- 09:57Second order moments.
- 09:59Transfer entropies, these are about entropies,
- 10:01these are sort of logs of distributions,
- 10:04and so they depend on the full shape of these distributions,
- 10:06which means that transfer entropy can capture coupling
- 10:10that is maybe not apparent or not obvious,
- 10:13just looking at a second order moment type analysis.
- 10:17So transfer entropy has been applied pretty broadly.
- 10:20It's been applied to spiking cortical networks
- 10:22and calcium imaging,
- 10:24to MEG data in motor tasks and auditory discrimination.
- 10:29It's been applied to motion recognition,
- 10:31precious metal prices
- 10:32and multivariate time series forecasting,
- 10:34and more recently to accelerate learning
- 10:36in different artificial neural nets.
- 10:38So you can look at feedforward architectures,
- 10:40convolution architectures, even recurrent neural nets,
- 10:42and transfer entropy has been used
- 10:44to accelerate learning in those frameworks.
- 10:49For this part of the talk,
- 10:50I'd like to focus really on two questions.
- 10:52First, how do we compute transfer entropy?
- 10:55And then second, if we could compute transfer entropy
- 10:58and build a graph out of that,
- 11:00how would we study the structure of that graph?
- 11:01Essentially, how is information flow structured?
- 11:05We'll start with computing transfer entropy.
- 11:09To compute transfer entropy,
- 11:10we actually need to write down an equation.
- 11:13So transfer entropy was originally introduced
- 11:14for discrete time arbitrary order Markov processes.
- 11:18So suppose we have two Markov processes: X and Y,
- 11:21and we'll let X of N denote the state of
- 11:23process X at time N,
- 11:25and XNK where the K is in superscript to denote the sequence
- 11:29starting from N minus K plus one going up to N.
- 11:32So that's sort of the last K states
- 11:35that the process X visited,
- 11:37then the transfer entropy from Y to X,
- 11:40they're denoted T, Y arrow to X
- 11:45is the entropy of the future of X conditioned on its past
- 11:50minus the entropy of the future of X conditioned on its past
- 11:54and the past of the trajectory Y.
- 11:56So here you can think the transfer entropy
- 11:57is essentially the reduction in entropy
- 11:59of the future states of X
- 12:00when incorporating the past of Y.
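In symbols, the definition just described (a standard rendering of Schreiber's formula, using $X_n^{(k)}$ for the last $k$ states of $X$, as in the talk) can be sketched as:

```latex
T_{Y \to X}
  = H\!\left(X_{n+1} \mid X_n^{(k)}\right)
  - H\!\left(X_{n+1} \mid X_n^{(k)},\, Y_n^{(l)}\right)
  = \sum p\!\left(x_{n+1}, x_n^{(k)}, y_n^{(l)}\right)
    \log \frac{p\!\left(x_{n+1} \mid x_n^{(k)}, y_n^{(l)}\right)}
              {p\!\left(x_{n+1} \mid x_n^{(k)}\right)}
```

The second form makes it visible as the expected KL divergence mentioned earlier: it is zero exactly when Y's past adds no predictive information about X's future.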
- 12:03This means that computing transfer entropy
- 12:05reduces to estimating essentially these entropies.
- 12:07That means we need to be able to estimate
- 12:08essentially the conditional distributions
- 12:10inside of these parentheses.
- 12:14That's easy for certain processes.
- 12:15So for example, if X and Y are Gaussian processes,
- 12:19then really what we're trying to compute
- 12:20is conditional mutual information.
- 12:22And there are nice equations
- 12:23for conditional mutual information
- 12:25when you have Gaussian random variables.
- 12:26So if I have three Gaussian random variables: X, Y, Z,
- 12:29possibly multivariate with joint covariance sigma,
- 12:33then the conditional mutual information
- 12:35between these variables,
- 12:36so the mutual information between X and Y conditioned on Z
- 12:39is just given by this ratio of log determinants
- 12:42of those covariances.
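As a sketch of that log-determinant formula (variable names and indexing are mine, not the slide's), the conditional mutual information for jointly Gaussian variables is I(X;Y|Z) = ½[log det Σ_XZ + log det Σ_YZ − log det Σ_Z − log det Σ_XYZ], which can be computed directly from a joint covariance:

```python
import numpy as np

def gaussian_cmi(cov, ix, iy, iz):
    """Conditional mutual information I(X;Y|Z) in nats for jointly
    Gaussian variables, via the log-determinant formula. cov is the
    joint covariance; ix, iy, iz are index lists for each block."""
    def logdet(idx):
        return np.linalg.slogdet(cov[np.ix_(idx, idx)])[1]
    return 0.5 * (logdet(ix + iz) + logdet(iy + iz)
                  - logdet(iz) - logdet(ix + iy + iz))

# Toy covariance: Z ~ N(0,1), X = Z + noise, Y = Z + noise, with
# independent unit-variance noise, so X and Y are independent given Z.
cov = np.array([[2.0, 1.0, 1.0],   # X
                [1.0, 2.0, 1.0],   # Y
                [1.0, 1.0, 1.0]])  # Z
print(gaussian_cmi(cov, [0], [1], [2]))  # ~ 0: no information beyond Z
```

Because all the dependence runs through Z in this toy covariance, the conditional mutual information comes out to zero, which is a handy sanity check for any estimator.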
- 12:45In particular, a common test model used
- 12:48in sort of the transfer entropy literature
- 12:51are linear auto-regressive processes
- 12:53because a linear auto-regressive process
- 12:55when perturbed by Gaussian noise
- 12:57produces a Gaussian process.
- 12:58All of the different
- 12:59joint marginal conditional distributions are all Gaussian,
- 13:02which means that we can compute
- 13:03these covariances analytically,
- 13:05which then means that you can compute
- 13:06the transfer entropy analytically.
- 13:07So these linear auto-regressive processes
- 13:09are nice test cases
- 13:10because you can do everything analytically.
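To illustrate that "everything analytically" claim, here is a small sketch (my own parameter values, not the talk's) of the exact transfer entropy for a bivariate Gaussian AR(1) process z_{n+1} = A z_n + noise, with history lengths k = l = 1, built from the stationary and lag-1 covariances:

```python
import numpy as np

A = np.array([[0.5, 0.4],    # x is driven by y (illustrative values)
              [0.0, 0.5]])   # y evolves on its own
Q = np.eye(2)                # innovation noise covariance

# Stationary covariance solves Sigma = A Sigma A^T + Q (fixed-point iteration).
S = np.eye(2)
for _ in range(500):
    S = A @ S @ A.T + Q

C = A @ S                    # lag-1 covariance Cov(z_{n+1}, z_n)
# Joint covariance of (x_{n+1}, x_n, y_n).
M = np.array([[S[0, 0], C[0, 0], C[0, 1]],
              [C[0, 0], S[0, 0], S[0, 1]],
              [C[0, 1], S[0, 1], S[1, 1]]])

def H(idx):
    # Gaussian entropy up to additive constants, which cancel in the
    # difference of the two conditional entropies below.
    return 0.5 * np.linalg.slogdet(M[np.ix_(idx, idx)])[1]

te_yx = (H([0, 1]) - H([1])) - (H([0, 1, 2]) - H([1, 2]))
print(te_yx)   # positive: y's past informs x's future
```

With the off-diagonal driving term A[0, 1] set to zero instead, the same computation returns zero, matching the Granger-causality equivalence in the Gaussian case.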
- 13:12They're also somewhat disappointing or somewhat limiting
- 13:15because in this linear auto-regressive case,
- 13:17transfer entropy is the same as Granger causality.
- 13:22And in this Gaussian case,
- 13:24essentially what we've done is we've reduced
- 13:26transfer entropy to a study of time-lagged correlations,
- 13:29so this becomes the same
- 13:30as sort of a correlation based analysis,
- 13:32we can't incorporate information beyond the second moments,
- 13:34if we restrict ourselves to Gaussian processes
- 13:36or Gaussian approximations.
- 13:39The other thing to note is this is strongly model-dependent
- 13:41because this particular formula
- 13:43for computing mutual information
- 13:44depends on having Gaussian distributions.
- 13:50In a more general setting or a more empirical setting,
- 13:53you might observe some data.
- 13:55You don't know if that data
- 13:56comes from some particular process,
- 13:58so you can't necessarily assume
- 13:59that conditional distributions are Gaussian,
- 14:01but we would still like to estimate transfer entropy,
- 14:03which leads to the problem of estimating transfer entropy
- 14:06given an observed time series.
- 14:08We would like to do this again, sans some model assumption,
- 14:10so we don't wanna assume Gaussianity.
- 14:13This is sort of trivial,
- 14:14again, I star that in discrete state spaces
- 14:17because essentially it amounts to counting occurrences,
- 14:20but it becomes difficult whenever the state spaces are large
- 14:23and/or high dimensional as they often are.
- 14:26This leads to a couple of different approaches.
- 14:28So as a first example, let's consider spike train data.
- 14:32So spike train data consists essentially of
- 14:35binning the state of a neuron into either on or off.
- 14:39So neurons, you can think either in the state zero or one,
- 14:41and then a pair wise calculation for transfer entropy
- 14:44only requires estimating a joint probability distribution
- 14:48over four to the K plus L states where K plus L,
- 14:51K is the history of X that we remember,
- 14:54and L is the history of Y.
- 14:57So if sort of the Markov process
- 14:59generating the spike train data is not of high order,
- 15:02does not have a long memory,
- 15:04then these K and L can be small,
- 15:06and this state space is fairly small,
- 15:08so this falls into that first category
- 15:10when we're looking at a discrete state space,
- 15:12and it's not too difficult.
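A minimal sketch of that counting approach (my own implementation, with history lengths k = l = 1): the plug-in estimate of T_{Y→X} for binary spike trains just tallies (x_{n+1}, x_n, y_n) triples and plugs the empirical probabilities into the definition.

```python
import numpy as np
from collections import Counter

def transfer_entropy_binary(x, y):
    """Plug-in estimate of the transfer entropy T_{Y->X} (in nats)
    for binary spike trains with history lengths k = l = 1, obtained
    by counting occurrences of (x_{n+1}, x_n, y_n) triples."""
    triples = Counter(zip(x[1:], x[:-1], y[:-1]))
    n = len(x) - 1
    te = 0.0
    for (x1, x0, y0), c in triples.items():
        p_xxy = c / n                                    # p(x1, x0, y0)
        p_xy = sum(v for (_, b, d), v in triples.items() if (b, d) == (x0, y0)) / n
        p_xx = sum(v for (a, b, _), v in triples.items() if (a, b) == (x1, x0)) / n
        p_x = sum(v for (_, b, _), v in triples.items() if b == x0) / n
        te += p_xxy * np.log(p_xxy * p_x / (p_xy * p_xx))
    return te

# x copies y with a one-step delay, so y's past should forecast x.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 5001)
x = np.concatenate(([0], y[:-1]))
print(transfer_entropy_binary(x, y))   # close to log 2 ~ 0.693
print(transfer_entropy_binary(y, x))   # close to 0
```

The asymmetry of the two printed values is the directedness property from earlier: the copy relation is recovered in one direction only.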
- 15:15In a more general setting,
- 15:16if we don't try to bin the states
- 15:18of the neurons to on or off,
- 15:19for example, maybe we're looking at a firing rate model
- 15:22where we wanna look at the firing rates of the neurons,
- 15:24and that's a continuous random variable,
- 15:27then we need some other types of estimators.
- 15:29So the common estimator used here
- 15:31is a kernel density estimator, a KSG estimator,
- 15:34and this is designed for large continuous
- 15:36or high dimensional state spaces,
- 15:37e.g. sort of these firing rate models.
- 15:40Typically the approach is to employ a Takens delay map,
- 15:43which embeds your high dimensional data
- 15:45in some sort of lower dimensional space
- 15:48that tries to capture the intrinsic dimension
- 15:50of the attractor that your dynamic process settles onto.
- 15:55And then you try to estimate an unknown density
- 15:57based on this delay map using a k-nearest
- 15:59neighbor kernel density estimate.
- 16:01The advantage of this sort of
- 16:03k-nearest neighbor kernel density is
- 16:05that it dynamically adapts the width of the kernel,
- 16:07giving your sample density.
- 16:09And this has been implemented in some open source toolkits,
- 16:11these are the toolkits that we've been working with.
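The Takens delay map itself is simple to write down; here is a sketch (function name and layout are mine) of the embedding step that precedes the k-nearest-neighbor density estimate:

```python
import numpy as np

def delay_embed(x, dim, tau):
    """Takens delay map: turn a scalar time series into points
    [x_t, x_{t+tau}, ..., x_{t+(dim-1)*tau}] in R^dim. Rows are
    delay vectors; dim and tau are chosen to match the intrinsic
    dimension and time scale of the attractor."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[j * tau : j * tau + n] for j in range(dim)])

E = delay_embed(np.arange(10.0), dim=3, tau=2)
print(E.shape)   # (6, 3); first row is [0., 2., 4.]
```

The density estimator then works in this dim-dimensional point cloud rather than in the raw, possibly much higher-dimensional, state space.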
- 16:15So we've tested this in a couple of different models,
- 16:18and really I'd say this work,
- 16:19this is still very much work in progress,
- 16:20this is work that Bowen was developing over the summer.
- 16:23And so we developed a couple different models to test.
- 16:26The first were these Linear Auto-Regressive Networks,
- 16:29and we just used these to test
- 16:31the accuracy of the estimators
- 16:32because everything here is Gaussian,
- 16:33so you can compute the necessary
- 16:35transfer entropies analytically.
- 16:37The next more interesting class of networks
- 16:39are Threshold Linear Networks or TLNs.
- 16:42These are a firing rate model where your rate R
- 16:44obeys this sarcastic differential equation.
- 16:47So the rate of change and the rate, DR of T is,
- 16:51so you have sort of a leak term, negative RFT,
- 16:53and then plus here, this is essentially a coupling.
- 16:57All of this is inside here, the brackets with a plus,
- 17:00this is like a ReLU function,
- 17:02so this is just taking the positive part
- 17:04of what's on the inside.
- 17:05Here B is an activation threshold,
- 17:08W is a wiring matrix,
- 17:09and then R are those rates, again.
- 17:11And then C here, that's essentially covariance
- 17:13for some noise term perturbing this process,
- 17:17we use these TLNs to test the sensitivity
- 17:19of our transfer entropy estimators
- 17:21to common and private noise sources as you change C,
- 17:24as well as sort of how well the entropy network
- 17:26agrees with the wiring matrix.
- 17:31A particular class of TLNs
- 17:32that were really nice for these experiments
- 17:35are what are called Combinatorial Threshold Linear Networks.
- 17:37These are really pretty new,
- 17:38these were introduced by Carina Curto's lab this year,
- 17:42and really this was inspired by a talk
- 17:45I'd seen her give at FACM in May.
- 17:49These are threshold linear networks
- 17:51where the weight matrix here, W,
- 17:52representing the wiring of the neurons
- 17:55is determined by a directed graph G.
- 17:58So you start with some directed graph G,
- 18:00that's what's shown here on the left.
- 18:01This figure is adapted from Carina's paper,
- 18:03this is a very nice paper
- 18:04if you'd like to take a look at it.
- 18:07And if I and J are not connected,
- 18:10then the weight matrix is assigned one value;
- 18:12and if they are connected, then it's assigned another value,
- 18:14and the wiring is zero if I equals J.
- 18:18These networks are nice
- 18:20if we wanna test structural hypotheses
- 18:22because it's very easy to predict from the input graph
- 18:25how the output dynamics of the network should behave,
- 18:28and there is really beautiful analysis
- 18:30that Carina does in this paper to show
- 18:32that you can produce all these different
- 18:33interlocking patterns of limit cycles
- 18:35and multistable states,
- 18:36and chaos, and all these nice patterns,
- 18:38and you can design them by picking these nice
- 18:41sort of directed graphs.
- 18:44The last class of networks that we've built to test
- 18:46are Leaky-Integrate and Fire Networks.
- 18:48So here we're using a Leaky-Integrate and Fire model
- 18:51where our wiring matrix, W, is drawn randomly.
- 18:54It's block stochastic, which means
- 18:57that it's (indistinct) between blocks.
- 19:00And it's a balanced network,
- 19:02so we have excitatory and inhibitory neurons
- 19:04that talk to each other,
- 19:06and maintain a sort of a balance in the dynamics here.
- 19:09The hope is to pick a large enough scale network
- 19:11that we see properly chaotic dynamics
- 19:13using this Leaky-Integrate and Fire model.
- 19:17These tests have yielded fairly mixed results,
- 19:21so the simple tests behave sort of as expected.
- 19:24So the estimators that are used are biased,
- 19:27and the bias typically decays slower
- 19:29than the variance in estimation,
- 19:30which means that you do need fairly long trajectories
- 19:32to try to properly estimate the transfer entropy.
- 19:36That said, transfer entropy does correctly identify
- 19:38causal relationships in simple graphs,
- 19:40and transfer entropy matches the underlying structure
- 19:44used in a Combinatorial Threshold Linear Network, so CTLN.
- 19:49Unfortunately, these results did not carry over as cleanly
- 19:52to the Leaky-Integrate and Fire models
- 19:54or to sort of larger models.
- 19:56So what I'm showing you on the right here,
- 19:58this is a matrix where we've calculated
- 20:00the pairwise transfer entropy between all neurons
- 20:03in a 150 neuron balanced network.
- 20:06This is shown in absolute value, on a log scale.
- 20:09And the main thing I wanna highlight,
- 20:11taking a look at this matrix,
- 20:12is that it's very hard to see exactly what the structure is.
- 20:15You see this banding,
- 20:17that's because neurons tend to be highly predictive
- 20:20if they fire a lot.
- 20:21So there's a strong correlation
- 20:22between the transfer entropy, between X and Y,
- 20:25and just the activity level of X,
- 20:29but it's hard to distinguish block-wise differences,
- 20:31for example, between inhibitory neurons
- 20:33and excitatory neurons, and that really takes plotting out.
- 20:36So here this box-and-whisker plot on the bottom,
- 20:39this is showing you if we group entries of this matrix
- 20:43by the type of connection,
- 20:44so maybe excitatory to excitatory,
- 20:46or inhibitory to excitatory, or so on,
- 20:48that the distributions of realized transfer entropy
- 20:50really are different,
- 20:52but they're different in sort of subtle ways.
- 20:54So in this sort of larger scale balanced network,
- 20:58it's much less clear whether transfer entropy
- 21:02effectively is like equated in some way
- 21:05with the true connectivity or wiring.
- 21:09In some ways, this is not a surprise
- 21:10because the behavior of the balanced networks
- 21:12is inherently balanced,
- 21:13and (indistinct) inherently unstructured,
- 21:16but there are ways in which these experiments
- 21:18have sort of revealed confounding factors
- 21:20that are conceptual factors
- 21:22that make transfer entropy
- 21:24not as ideal a measure,
- 21:25or maybe not as ideal as it seemed
- 21:28at the start of this talk.
- 21:29So for example, suppose two trajectories:
- 21:33X and Y are both strongly driven by a third trajectory, Z,
- 21:36but X responds to Z first.
- 21:39Well, then the present information about X
- 21:40or the present state of X carries information
- 21:42about the future of Y, so X is predictive of Y,
- 21:45so X forecasts Y.
- 21:46So in the transfer entropy or Wiener causality setting,
- 21:48we would say X causes Y,
- 21:51even if X and Y are only both responding to Z.
- 21:54So here in this example,
- 21:56suppose you have a directed tree where information
- 21:59or sort of dynamics propagate down the tree.
- 22:02If you look at this node here, PJ and I,
- 22:07PJ will react
- 22:08to essentially information traveling down this tree
- 22:12before I does, so PJ would be predictive for I.
- 22:15So we would observe an effective connection
- 22:19where PJ forecasts I,
- 22:21which means that neurons that are not directly connected
- 22:23may influence each other,
- 22:24and that this transfer entropy
- 22:26really you should think of in terms of forecasting,
- 22:29not in terms of being a direct analog to the wiring matrix.
- 22:33One way around this is to condition on the state
- 22:35of the rest of the network
- 22:37before you start doing some averaging.
- 22:39This leads to some other notions of entropy.
- 22:41So for example, causation entropy,
- 22:42and this is sort of a promising direction,
- 22:44but we haven't had time to explore it yet.
- 22:47So that's the estimation side,
- 22:49those are the tools for estimating transfer entropy.
- 22:52Now let's switch gears
- 22:53and talk about that second question I had introduced,
- 22:55which is essentially, how do we analyze structure?
- 22:57Suppose we could calculate a transfer entropy graph,
- 23:00how would we extract structural information from that graph?
- 23:04And here, I'm going to be introducing some tools
- 23:06that I've worked on for awhile
- 23:08for describing sort of random structures and graphs.
- 23:11These are tied back to some work I'd really done
- 23:15as a graduate student in conversations with Lek-Heng.
- 23:18So we start in a really simple context,
- 23:19which is the graph or network.
- 23:21This could be directed or undirected,
- 23:22however, we're gonna require that does not have self-loops,
- 23:24then it's finite.
- 23:26We'll let V here be the number of vertices
- 23:28and E be the number of edges.
- 23:30Then the object of study that we'll introduce
- 23:33is something called an edge flow.
- 23:34An edge flow is essentially a function
- 23:35on the edges of the graph.
- 23:37So this is a function that accepts pairs of endpoints
- 23:40and returns a real number,
- 23:42and this is an alternating function.
- 23:43So if I had to take F of IJ,
- 23:45that's negative F of JI
- 23:47because you can think of F of IJ as being some flow,
- 23:49like a flow of material between I and J,
- 23:52hence this name, edge flow.
- 23:54This is analogous to a vector field
- 23:56because this is like the analogous construction
- 23:58to a vector field in the graph,
- 23:59and represents some sort of flow between nodes.
- 24:02Edge flows are really sort of generic things,
- 24:04so you can take this idea of an edge flow
- 24:07and apply it in a lot of different areas
- 24:09because really all you need
- 24:10is some alternating function
- 24:12on the edges of the graph.
- 24:13So I've sort of read papers
- 24:16and worked in a bunch of these different areas,
- 24:19particularly I've focused on applications of this
- 24:21in game theory, in pairwise and social choice settings,
- 24:25in biology and Markov chains.
- 24:26And a lot of this project has been attempting
- 24:28to take this experience working with edge flows in,
- 24:31for example, say non-equilibrium thermodynamics
- 24:34or looking at pairwise preference data,
- 24:36and looking at a different application area
- 24:38here to neuroscience.
- 24:40Really you could think about the edge flow
- 24:42or a relevant edge flow in neuroscience,
- 24:43you might be asking about asymmetries and wiring patterns,
- 24:46or differences in directed influence or causality,
- 24:49or really you could think about these
- 24:50transfer entropy quantities.
- 24:51This is why I was excited about transfer entropy.
- 24:53Transfer entropy is inherently directed notion
- 24:56of information flow,
- 24:57so it's natural to think
- 24:59that if you can calculate things like a transfer entropy,
- 25:01then really what you're studying
- 25:03is some sort of edge flow on a graph.
- 25:06Edge flows often are subject to
- 25:08sort of the same set of common questions.
- 25:10So if I wanna analyze the structure of an edge flow,
- 25:12there's some really big global questions
- 25:14that I would often ask,
- 25:15that get asked in all these different application areas.
- 25:19One common question is,
- 25:20well, does the flow originate somewhere and end somewhere?
- 25:23Are there sources and sinks in the graph?
- 25:25Another is, does it circulate?
- 25:26And if it does circulate, on what scales and where?
- 25:31If you have a network that's connected
- 25:33to a whole exterior network,
- 25:34for example, if you're looking at some small subsystem
- 25:37that's embedded in a much larger system
- 25:38as is almost always the case in neuroscience,
- 25:41then you also need to think about,
- 25:42what passes through the network?
- 25:43So is there a flow or a current that moves
- 25:46through the boundary of the network?
- 25:47Is there information that flows through the network
- 25:51that you're studying?
- 25:52And in particular if we have these different types of flow,
- 25:55if flow can originate and source and end in sinks,
- 25:57if it can circulate, if it can pass through,
- 25:59can we decompose the flow into pieces that do each of these,
- 26:03and ask how much of the flow does one, two, or three?
- 26:07Those questions lead to a decomposition.
- 26:11So here we're going to start with this simple idea,
- 26:13we're going to decompose an edge flow
- 26:15by projecting it onto orthogonal subspaces
- 26:17associated with some graph operators.
- 26:20Generically if we consider two linear operators: A and B,
- 26:24where the product A times B equals zero,
- 26:27then the range of B must be contained
- 26:29in the null space of A,
- 26:31which means that I can express
- 26:33essentially any vector of real numbers,
- 26:35and you can think of this as being the vector space
- 26:38of possible edge flows, as a direct sum of the range of B,
- 26:43the range of A transpose,
- 26:45and the intersection of the null space of B transpose
- 26:47and the null space of A.
- 26:48This blue subspace, this is called the harmonic space,
- 26:53and this is trivial
- 26:56in many applications
- 26:58if you choose A and B correctly.
- 27:00So there's often settings where you can pick A and B,
- 27:02so that these two null spaces have no intersection,
- 27:06and then this decomposition boils down
- 27:08to just separating a vector space
- 27:10into the range of B and the range of A transpose.
- 27:16In the graph setting,
- 27:17our goal is essentially to pick these operators
- 27:19to be meaningful things.
- 27:20That is, to pick graph operators
- 27:22so that these subspaces carry meaning
- 27:26in the structural context.
- 27:30So let's think a little bit about graph operators here,
- 27:33so let's look at two different classes of operators.
- 27:35So we can consider matrices that have E rows and N columns,
- 27:40or matrices that have L rows and E columns where,
- 27:44again, E is the number of edges in this graph.
- 27:48If I have a matrix with E rows,
- 27:50then each column of the matrix has as many entries
- 27:53as there are edges in the graph,
- 27:55so it can be thought of as itself an edge flow.
- 27:57So you could think that this matrix is composed
- 27:59of a set of columns where each column is some particular
- 28:02sort of motivic flow or flow motif.
- 28:05In contrast if I look at a matrix where I have E columns,
- 28:09then each row of the matrix is a flow motif,
- 28:11so products against M
- 28:14evaluate inner products against specific flow motifs.
- 28:18That means that in this context,
- 28:20if I look at the range of this matrix,
- 28:21this is really a linear combination
- 28:23of a specific subset of flow motifs.
- 28:25And in this context,
- 28:26if I look at the null space of the matrix,
- 28:28I'm looking at all edge flows orthogonal
- 28:30to that set of flow motifs.
- 28:32So here if I look at the range of a matrix with E rows,
- 28:36that subspace essentially models behavior
- 28:39similar to the motifs.
- 28:40So if I pick a set of motifs that flow out of a node
- 28:44or flow into a node,
- 28:45then this range is going to be a subspace of edge flows
- 28:48that tend to originate in sources and end in sinks.
- 28:51In contrast here, the null space of M,
- 28:54that's all edge flows orthogonal to the flow motifs,
- 28:57so it models behavior distinct from the motifs.
- 28:59Essentially this space asks, what doesn't the flow do?
- 29:02Whereas this space asks, what does the flow do?
- 29:07Here is a simple, sort of very classical example.
- 29:09And really this goes all the way back to,
- 29:11you could think like Kirchhoff electric circuit theory.
- 29:14We can define two operators.
- 29:15Here G, this is essentially a gradient operator.
- 29:18And if you've taken some graph theory,
- 29:20you might know this as the edge incidence matrix.
- 29:22This is a matrix which essentially records
- 29:25the endpoints of an edge
- 29:26and evaluates differences across it.
- 29:29So, for example, if I look at this first row of G,
- 29:33this corresponds to edge one in the graph,
- 29:35and if I had a function defined on the nodes in the graph,
- 29:39products with G would evaluate differences across this edge.
- 29:43If you look at its columns,
- 29:44each column here is a flow motif.
- 29:46So, for example, this highlighted second column,
- 29:49this is entries: one, negative one, zero, negative one.
- 29:52If you carry those back to the edges,
- 29:53that corresponds to this specific flow motif.
- 29:56So here this gradient,
- 29:58it's adjoint to essentially a divergence operator,
- 30:00which means that the flow motifs are unit inflows
- 30:03or unit outflows from specific nodes,
- 30:05like what's shown here.
- 30:07You can also introduce something like a curl operator.
- 30:10The curl operator evaluates path sums around loops.
- 30:13So this row here, for example, this is a flow motif
- 30:16corresponding to the loop labeled A in this graph.
- 30:20You could certainly imagine
- 30:21other operators built out of other motifs,
- 30:23but these operators are particularly nice
- 30:25because they define principled subspaces.
- 30:28So if we apply that generic decomposition,
- 30:31then we could say that the space
- 30:32of possible edge flows, R^E,
- 30:34can be decomposed into the range of the gradient operator,
- 30:37the range of the curl transpose,
- 30:39and the intersection of their null spaces,
- 30:42this harmonic space.
- 30:44This is nice because the range of the gradient, that's flows
- 30:46that start and end somewhere.
- 30:48Those are flows that are associated with
- 30:50like motion down a potential.
- 30:52So if you're thinking physics,
- 30:53you might say that these are sort of conservative,
- 30:55these are like flows generated by a voltage
- 30:57if you're looking at an electric circuit.
- 30:59These cyclic flows, well, these are the flows
- 31:01in the range of the curl transpose,
- 31:03and then this harmonic space,
- 31:04those are flows that enter and leave the network
- 31:06without either starting in a source or ending
- 31:09in a sink, or circulating.
- 31:11So you can think that really this decomposes
- 31:13the space of edge flows into flows that start
- 31:16and end somewhere inside the network.
- 31:17Flows that circulate within the network,
- 31:19and flows that do neither,
- 31:20i.e. flows that enter and leave the network.
- 31:22So this accomplishes that initial decomposition
- 31:25I'd set out at the start.
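The gradient/curl decomposition just described can be sketched concretely. The 4-node graph, its edge orientations, and the sample flow below are hypothetical stand-ins for the talk's slide example, not the actual figures:

```python
import numpy as np

# A minimal sketch on a hypothetical 4-node, 5-edge graph.
# Edges are oriented (tail, head):
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
N, E = 4, len(edges)

# Gradient / edge-incidence matrix: (G @ phi)[e] = phi[head] - phi[tail].
G = np.zeros((E, N))
for e, (t, h) in enumerate(edges):
    G[e, t], G[e, h] = -1.0, 1.0

# Curl: one row per independent loop, entries +/-1 as the loop
# traverses each edge with or against its orientation.
C = np.array([[1.0, 1.0, 0.0, 0.0, -1.0],    # loop 0 -> 1 -> 2 -> 0
              [0.0, 0.0, 1.0, 1.0,  1.0]])   # loop 0 -> 2 -> 3 -> 0
assert np.allclose(C @ G, 0)                  # curl of a gradient is zero

# Decompose an arbitrary edge flow into gradient + cyclic + harmonic.
P_grad = G @ np.linalg.pinv(G)                # flows down a potential
P_cyc = C.T @ np.linalg.pinv(C.T)             # circulating flows
P_harm = np.eye(E) - P_grad - P_cyc           # through-flows (trivial here)

f = np.array([2.0, -1.0, 0.5, 3.0, 1.0])
f_grad, f_cyc = P_grad @ f, P_cyc @ f
assert np.allclose(f_grad + f_cyc + P_harm @ f, f)
# For this graph E = (N - 1) + L, so the harmonic space is trivial.
assert np.allclose(P_harm, 0, atol=1e-9)
```

The squared norms of `f_grad` and `f_cyc` are exactly the "how much starts and ends somewhere" versus "how much circulates" measures discussed next.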
- 31:28Once we have this decomposition, then we can evaluate
- 31:31the sizes of the components of decomposition to measure
- 31:34how much of the flow starts and ends somewhere,
- 31:38how much circulates and so on.
- 31:39So we can introduce these generic measures
- 31:41where, given some operator M,
- 31:44we decompose the space of edge flows
- 31:46into the range of M and the null space of M transpose,
- 31:49which means we can project F onto these subspaces,
- 31:52and then just evaluate the sizes of these components.
- 31:55And that's a way of measuring
- 31:57how much of the flow behaves like
- 31:59the flow motifs contained in this operator,
- 32:01and how much it doesn't.
- 32:04So, yeah, so that lets us answer this question,
- 32:07and this is the tool that we're going to be using
- 32:09sort of as our measurable.
- 32:12Now that's totally easy to do,
- 32:16if you're given a fixed edge flow and a fixed graph
- 31:17because if you have a fixed graph,
- 31:18you can build your operators, you choose the motifs,
- 31:20you have a fixed edge flow, you just project the edge flow
- 32:23onto the subspaces spanned by those operators,
- 32:25and you're done.
- 32:27However, there are many cases where it's worth thinking
- 32:31about a distribution of edge flows,
- 32:33and then expected structures given that distribution.
- 32:37So here we're going to be considering random edge flows,
- 32:39for example, an edge flow capital F,
- 32:41here I'm using capital letters to denote random quantities
- 32:43sampled from an edge flow distribution.
- 32:45This is a distribution of possible edge flows.
- 32:46And this is worth thinking about
- 32:48because many generative models are stochastic.
- 32:51They may involve some random seed,
- 32:53or they may, for example, like that neural model
- 32:55or a lot of these sort of neural models be chaotic.
- 32:58So even if they are deterministic generative models,
- 33:01the output data behaves
- 33:03as if it was sampled from a distribution.
- 33:05On the empirical side, for example,
- 33:07when we're estimating transfer entropy
- 33:09or estimating some information flow,
- 33:11then there's always some degree of measurement error
- 33:13or uncertainty in that estimate,
- 33:15which really means sort of from a Bayesian perspective,
- 33:18we should be thinking that our estimator
- 33:21is a point estimate drawn from some
- 33:23posterior distribution of edge flows,
- 33:24and then we're back in the setting where,
- 33:25again, we need to talk about a distribution.
- 33:28Lastly, this random edge flow setting is also
- 33:31really important if we wanna compare to null hypotheses
- 33:35because often if you want to compare
- 33:37to some sort of null hypothesis,
- 33:38it's helpful to have an ensemble of edge flows
- 33:41to compare against, which means that we would like
- 33:44to be able to talk about expected structure
- 33:46under varying distributional assumptions.
- 33:50If we can talk meaningfully about random edge flows,
- 33:54then really what we can start doing is
- 33:56we can start bridging the expected structure
- 33:59back to the distribution.
- 34:00So what we're looking for is a way of explaining
- 34:03sort of generic expectations
- 34:05of what the structure will look like
- 34:07as we vary this distribution of edge flows.
- 34:10You could think that a particular dynamical system
- 34:13generates a wiring pattern,
- 34:17that generates firing dynamics,
- 34:19those firing dynamics determine
- 34:21some sort of information flow graph.
- 34:23And then that information flow graph
- 34:25is really a sample from that generative model.
- 34:28And we would like to be able to talk about,
- 34:30what would we expect about the global structure
- 34:33if we knew the distribution of edge flows?
- 34:35That is, we'd like to bridge global structure
- 34:37back to this distribution,
- 34:39and then ideally you would bridge that distribution back
- 34:41to the generative mechanism.
- 34:42This is a project for a future work,
- 34:45obviously this is fairly ambitious.
- 34:47However, this first point is something that you can do
- 34:51really in fairly explicit detail.
- 34:53And that's what I'd like to spell out
- 34:54with the end of this talk is
- 34:55how do you bridge global structure
- 34:58back to a distribution of edge flows?
- 35:02So, yeah, so that's the main question,
- 35:05how does the choice of distribution
- 35:06influence the expected global flow structure?
- 35:12So first, we start with the Lemma.
- 35:15Suppose that we have a distribution of edge flows
- 35:17with some expectation F bar and some covariance,
- 35:20here I'm using double bar V to denote covariance.
- 35:24We'll let S contained in the set of,
- 35:26or S be a subspace
- 35:29contained within the vector space of edge flows,
- 35:31and we'll let Ps of S be the orthogonal projector onto S.
- 35:35Then Fs of S, that's the projection F onto this subspace S,
- 35:40the expectation of its norm squared
- 35:43is the norm of the expected flow projected onto S squared.
- 35:48So this first term is essentially the measure
- 35:53evaluated at the expected sample.
- 35:56And then plus a term that involves an inner product
- 35:58between the projector onto the subspace,
- 36:00and the covariance matrix for the edge flows.
- 36:02Here this denotes the matrix inner product,
- 36:04so this is just the sum over all IJ entries.
- 36:09What's nice about this formula
- 36:10is at least in terms of expectation, it reduces the study
- 36:16of the bridge between distribution
- 36:18and network structure to a study of moments, right?
- 36:22Because we've replaced the distributional problem here
- 36:24with a linear algebra problem
- 36:27that's posed in terms of this projector,
- 36:29the projector under the subspace S,
- 36:31which is determined by the topology of the network,
- 36:33and the variance in that edge flow
- 36:36which is determined by your generative model.
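A quick sanity check of that lemma (again an editorial sketch, with a made-up subspace and a Gaussian edge-flow distribution): the Monte Carlo average of the squared projection should match the two-term formula, mean term plus the matrix inner product of the projector with the covariance:

```python
import numpy as np

rng = np.random.default_rng(1)

# Check E ||P_S F||^2 = ||P_S Fbar||^2 + <P_S, V> on random ingredients.
E = 6
M = rng.normal(size=(E, 2))
P = M @ np.linalg.pinv(M)                  # orthogonal projector onto S

f_bar = rng.normal(size=E)                 # mean edge flow
R = rng.normal(size=(E, E))
V = R @ R.T                                # a covariance matrix

# <P, V> is the entrywise matrix inner product, sum_ij P_ij V_ij,
# which for symmetric P equals trace(P @ V).
predicted = np.linalg.norm(P @ f_bar)**2 + np.sum(P * V)

# Monte Carlo estimate of the left-hand side.
samples = f_bar + rng.normal(size=(200_000, E)) @ np.linalg.cholesky(V).T
estimate = np.mean(np.sum((samples @ P.T)**2, axis=1))

assert abs(estimate - predicted) / predicted < 0.05
```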
- 36:40Well, you might say, okay, well, (laughs) fine,
- 36:42this is a matrix inner product, we can just stop here,
- 36:44we could compute this projector,
- 36:45we could sample a whole bunch of edge flows,
- 36:47compute this covariance.
- 36:48So you can do this matrix inner product,
- 36:50but I sort of disagree
- 36:51because I suspect that you can really do more
- 36:55with this sort of inner product.
- 36:57So I'd like to highlight some challenges
- 37:00associated with this inner product.
- 37:03So first, let's say, I asked you to design a distribution
- 37:06with tunable global structure.
- 37:07So for example, I said, I want you to pick
- 37:09a generative model or design a distribution of edge flows
- 37:12that when I sample edge flows from it,
- 37:14their expected structures match some given expectation.
- 37:18It's not obvious how to do that given this formula,
- 37:22it's not obvious in particular
- 37:23because these projectors,
- 37:24like the projector on the subspace S typically depend
- 37:27in fairly non-trivial ways on the graph topology.
- 37:30So small changes in the graph topology
- 37:32can completely change this projector.
- 37:34In essence, it's hard to isolate topology from distribution.
- 37:37You can think that this inner product,
- 37:39if I think about it in terms of the IJ entries,
- 37:43while easy to compute, it's not easy to interpret
- 37:47because I and J are somewhat arbitrary indexing.
- 37:49And obviously really the topology of the graph,
- 37:51it's not encoded in the indexing,
- 37:53that's encoded in the structure of these matrices.
- 37:56So in some ways what we really need is a better basis
- 37:59for computing this inner product.
- 38:01In addition, computing this inner product
- 38:03just may not be empirically feasible
- 38:05because it might not be feasible
- 38:07to estimate all these covariances.
- 38:08There's lots of settings
- 38:09where if you have a random edge flow,
- 38:11it becomes very expensive to try to estimate
- 38:13all the covariances in this graph,
- 38:14err, sorry, in this matrix
- 38:16because this matrix has as many entries
- 38:19as there are pairs of edges in the graph.
- 38:22And typically that number of edges grows fairly quickly
- 38:26in the number of nodes of the graph.
- 38:27So in the worst case,
- 38:29the size of these matrices
- 38:31grows not as the square of the number of nodes of the graph,
- 38:33but as the number of nodes of the graph to the fourth,
- 38:35so this becomes very expensive very fast.
- 38:37Again, we could try to address this problem
- 38:41if we had a better basis for performing this inner product
- 38:43because we might hope to be able to truncate
- 38:46somewhere in that basis,
- 38:47and use a lower dimensional representation.
- 38:50So to build there, I'm gonna show you
- 38:52a particular family of covariances.
- 38:55We're going to start with a very simple generative model,
- 38:58so let's suppose that each node of the graph
- 39:00is assigned some set of attributes,
- 39:02here a random vector X sampled from a...
- 39:04So you can think of trait space,
- 39:05a space of possible attributes,
- 39:07and these are sampled i.i.d.
- 39:09In addition, we'll assume
- 39:10that there exists an alternating function F,
- 39:13which accepts pairs of attributes and returns a real number.
- 39:17So this is something that I can evaluate
- 39:19on the endpoints of an edge,
- 39:21and return an edge flow value.
- 39:24In this setting,
- 39:26everything that I'd shown you before simplifies.
- 39:29So if my edge flow F is drawn by first sampling
- 39:33a set of attributes,
- 39:34and then plugging those attributes
- 39:35into functions on the edges, then the
- 39:40mean edge flow is zero, so that F bar goes away,
- 39:44and the covariance reduces to this form.
- 39:46So you have a standard form where the covariance
- 39:48in the edge flow is a function of two scalar quantities,
- 39:52that's sigma squared and rho.
- 39:53These are both statistics associated with this function
- 39:56and the distribution of traits.
- 39:59And then some matrices,
- 40:00so we have an identity matrix,
- 40:02and we have this gradient matrix showing up again.
- 40:05This is really nice because when you plug it back in
- 40:07to try to compute say the expected sizes of the components,
- 40:13this matrix inner product
- 40:15that I was complaining about before,
- 40:17this whole matrix inner product simplifies.
- 40:19So when you have a variance
- 40:21that's in this nice, simple canonical form,
- 40:23then the expected overall size of the edge flow,
- 40:26that's just sigma squared,
- 40:27the expected size projected onto that
- 40:30sort of conservative subspace
- 40:32that breaks into this combination
- 40:35of sigma squared and rho.
- 40:37Again, those are some simple statistics.
- 40:39And then V, E, and L,
- 40:41those are just sort of
- 40:42essentially dimension counting on the network.
- 40:43So that's the number of vertices, the number of edges,
- 40:47and the number of loops,
- 40:48where the number of loops is the number of edges
- 40:49minus the number of vertices plus one.
- 40:52And similarly, the expected cyclic size
- 40:55or size of the cyclic component reduces to,
- 40:57again, this sort of scalar factor
- 40:59in terms of some simple statistics
- 41:01and some dimension counting sort of
- 41:03topology related quantities.
- 41:07So this is very nice because this allows us
- 41:11to really separate the role of topology
- 41:13from the role of the generative model.
- 41:14The generative model determines sigma and rho,
- 41:17and topology determines these dimensions.
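This trait-based generative model is easy to simulate. The sketch below is editorial, with an invented alternating function f(x, y) = xy² − x²y on the complete graph K4 (not the talk's example); it illustrates the claimed structure: mean flow zero, constant variance sigma squared on every edge, covariance of size rho only on edges sharing an endpoint (up to orientation sign), and independence on disjoint edges:

```python
import numpy as np

rng = np.random.default_rng(2)

# Trait model: i.i.d. attributes on nodes, an alternating function on
# edges. f is a made-up example satisfying f(y, x) = -f(x, y).
def f(x, y):
    return x * y**2 - x**2 * y

edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]  # complete K4
n_samples = 200_000
X = rng.normal(size=(n_samples, 4))       # i.i.d. standard normal traits

F = np.column_stack([f(X[:, u], X[:, v]) for u, v in edges])
V = np.cov(F, rowvar=False)

assert np.allclose(F.mean(axis=0), 0, atol=0.05)   # mean flow ~ 0

# For these traits, sigma^2 = Var f = 6 and rho = 1 (computable by hand).
assert np.allclose(np.diag(V), 6.0, atol=0.5)      # constant edge variance
assert abs(V[0, 5]) < 0.1                 # (0,1) and (2,3) are disjoint
assert abs(abs(V[0, 1]) - 1.0) < 0.3      # (0,1) and (0,2) share node 0
```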
- 41:22It turns out that the same thing is true,
- 41:26even if you don't sample the edge flow
- 41:29using this sort of trait approach,
- 41:31but the graph is complete.
- 41:33So if your graph is complete,
- 41:34then no matter how you sample your edge flow,
- 41:37for any edge flow distribution,
- 41:38exactly the same formulas hold,
- 41:40you just replace those simple statistics
- 41:43with estimators for those statistics given your sample flow.
- 41:47And this is sort of a striking result
- 41:49because this says that this conclusion,
- 41:51which was linked to some specific generative model
- 41:54with some very sort of specific assumptions, right,
- 41:56we assumed it was i.i.d., extends to all complete graphs,
- 41:59regardless of the actual distribution that we sampled from.
- 42:05Up until this point,
- 42:06this is kind of just an algebra miracle.
- 42:09And one of the things I'd like to do
- 42:10at the end of this talk is explain why this is true,
- 42:13and show how to generalize these results.
- 42:16So to build there,
- 42:17let's emphasize some of the advantages of this.
- 42:19So first, the advantages of this model,
- 42:22it's mechanistically plausible in certain settings,
- 42:24it cleanly separates the role of topology and distribution.
- 42:28And these coefficients that had to do with the topology,
- 42:30these are just dimensions,
- 42:31these are non-negative quantities.
- 42:34So it's easy to work out monotonic relationships
- 42:36between expected structure
- 42:38and simple statistics of the edge flow distribution.
- 42:44The fact that you can do that enables more general analysis.
- 42:47So what I'm showing you on the right here,
- 42:48this is from a different application area.
- 42:51This was an experiment where we trained
- 42:53a set of agents to play a game using a genetic algorithm,
- 42:58and then we looked at the expected sizes of sort of cyclic
- 43:01and acyclic components in a tournament among those agents.
- 43:05And you could actually predict these curves
- 43:08using this sort of type of structural analysis
- 43:10because it was possible to predict the dynamics
- 43:13of the simple statistics, this sigma and this rho.
- 43:17So this is a really powerful analytical tool,
- 43:20but it is limited to this particular model.
- 43:23In particular, it only models unstructured cycles.
- 43:26So if you look at the cyclic component
- 43:27generated by this model, it just looks like random noise
- 43:30that's been projected onto the range of the curl transpose.
- 43:34It's limited to correlations on adjacent edges,
- 43:36so we only generate correlations on edges
- 43:38that share an endpoint
- 43:39because you could think that all of the original
- 43:41random information comes from the endpoints.
- 43:45And then it's in some ways not general enough,
- 43:47so it lacks some expressivity.
- 43:48We can't parameterize all possible expected structures
- 43:51by picking a sigma and a rho.
- 43:54And we lack some notion of sufficiency,
- 43:56i.e. if the graph is not complete,
- 43:58then this nice algebraic property
- 44:01that it actually didn't matter what the distribution was,
- 44:03this fails to hold.
- 44:04So if the graph is not complete,
- 44:06then projection onto the family of covariances
- 44:09parameterized in this fashion
- 44:11changes the expected global structure.
- 44:15So we would like to address these limitations.
- 44:17And so our goal for the next part of this talk
- 44:19is to really generalize these results.
- 44:21To generalize, we're going to
- 44:23switch our perspective a little bit.
- 44:25So I'll recall this formula
- 44:27that if we generate our edge flow
- 44:30by sampling quantities on the endpoints,
- 44:32and then plugging them into functions on the edges,
- 44:34then you necessarily get a covariance
- 44:35that's in this two parameter family
- 44:37where I have two scalar quantities
- 44:39associated with the statistics of the edge flow.
- 44:41That's the sigma and this rho.
- 44:42And then I have some matrices that are associated
- 44:44with the topology of the network
- 44:45in the subspaces I'm projecting onto.
- 44:48These are related to a different way
- 44:51of looking at the graph.
- 44:52So I can start with my original graph
- 44:54and then I can convert it to an edge graph
- 44:57where I have one node per edge in the graph,
- 45:00and nodes are connected if they share an endpoint.
- 45:04You can then assign essentially signs to these edges
- 45:07based on whether the edge direction chosen
- 45:11in the original graph is consistent
- 45:12or inconsistent at the node that links the two edges.
- 45:16So for example, edges one and two both point into this node,
- 45:20so there's an edge that's linking
- 45:21one and two in the edge graph with a positive sign.
- 45:25This essentially tells you that the influence of
- 45:29random information assigned on this node linking one and two
- 45:33would positively correlate the sample edge flow
- 45:36on edges one and two.
- 45:38Then this form, what this form
- 45:41sort of for covariance matrices says,
- 45:43is that we're looking at families of edge flows
- 45:46that have correlations on edges sharing an endpoint.
- 45:49So edges at distance one in this edge graph,
- 45:51and non-adjacent edges are
- 45:52entirely independent of each other.
- 45:56Okay?
- 45:58So that's essentially what
- 45:59the trait performance model is doing,
- 46:01is it's parameterizing a family of covariance matrices
- 46:04where we're modeling correlations at distance one,
- 46:06but not further in the edge graph.
- 46:08So then the natural thought
- 46:09for how to generalize these results is to ask,
- 46:11can we model longer distance correlations
- 46:13through this graph?
- 46:15To do so, let's think a little bit about
- 46:17what this matrix that's showing up inside the covariance is.
- 46:21So we have a gradient, tons of gradient transpose.
- 46:24This is an effect of Laplacian for that edge graph.
- 46:30And you can do this for other motifs.
- 46:32If you think about different sort of motif constructions,
- 46:35essentially if you take a product of M transpose times M,
- 46:38that will generate something that looks like a Laplacian
- 46:41or an adjacency matrix for a graph
- 46:44where I'm assigning nodes to be motifs
- 46:47and looking at the overlap of motifs.
- 46:50And if I look at M times M transpose,
- 46:52then I'm looking at the overlap of edges via shared motifs.
- 46:55So these operators you can think
- 46:56about as being Laplacians for some sort of graph
- 46:59that's generated from the original graph motifs.
- 47:04Like any adjacency matrix,
- 47:06powers of something like GG transpose minus 2I,
- 47:11that will model connections along longer paths
- 47:14along longer distances in these graphs
- 47:16associated with motifs,
- 47:17in this case with the edge graph.
- 47:20So our thought is maybe,
- 47:21well, we could extend this trait performance family
- 47:23of covariance matrices by instead of only looking at
- 47:27a linear combination of an identity matrix, and this matrix,
- 47:31we could look at a power series.
- 47:32So we could consider combining powers of this matrix.
- 47:37And this will generate this family of matrices
- 47:39that are parameterized by some set of coefficients-
- 47:41<v Robert>Dr. Strang?</v>
- 47:43<v ->Ah, yes?</v> <v ->I apologize (mumbles)</v>
- 47:44I just wanna remind you
- 47:46that we have a rather tight time limit,
- 47:48approximately a couple of minutes.
- 47:50<v ->Yes, of course.</v>
- 47:52So here, the idea is to parameterize this family of matrices
- 47:57by introducing a set of polynomials with coefficients alpha,
- 48:00and then plugging into the polynomial,
- 48:03the Laplacian that's generated by sort of the,
- 48:06or the adjacency matrix
- 48:08generated by the graph motifs we're interested in.
- 48:11And that trait performance result,
- 48:12that was really just looking at the first order case here,
- 48:14that was looking at a linear polynomial
- 48:17with these chosen coefficients.
- 48:20This power series model is really nice analytically,
- 48:24so if we start with some graph operator M,
- 48:28and we consider the family of covariance matrices
- 48:31generated by plugging M, M transpose
- 48:34into some polynomial and power series,
- 48:36then this family of matrices is contained
- 48:39within the span of powers of M, M transpose.
- 48:45You can talk about this family
- 48:47sort of in terms of combinatorics.
- 48:48So for example, if we use that gradient
- 48:50times gradient transpose minus twice the identity,
- 48:52then powers of this is essentially, again, paths counting.
- 48:55So this is counting paths of length N.
- 48:58You can also look at things like the trace of these powers.
- 49:00So if you look at the trace series,
- 49:02that's the sequence where you look at the trace
- 49:04of powers of these,
- 49:06essentially these adjacency matrices.
- 49:09This is doing some sort of loop count
- 49:11where we're counting loops of different length.
- 49:14And you could think that this trace series
- 49:15in some sense is controlling amplification
- 49:17of self-correlations within the sampled edge flow.
- 49:22Depending on the generative model,
- 49:23we might wanna use different operators
- 49:25for generating this family.
- 49:26So, for example, going back to that
- 49:28synaptic plasticity model with coupled oscillators,
- 49:31in this case using the gradient to generate
- 49:34the family of covariance matrices
- 49:35isn't really the right structure
- 49:37because the dynamics of the model
- 49:39sort of have these natural cyclic connections.
- 49:43So it's better to build the power series using the curl.
- 49:46So depending on your model,
- 49:47you can adapt this power series family
- 49:49by plugging in a different graph operator.
- 49:53Let's see now, what happens if we try to compute
- 49:55the expected sizes of some components
- 49:58using a power series of this form?
- 50:00So if the variance or covariance matrix
- 50:04for our edge flow is a power series in,
- 50:06for example, the gradient, gradient transpose,
- 50:08then the expected sizes of the measures
- 50:12can all be expressed as
- 50:13linear combinations of this trace series
- 50:16and the coefficients of the original polynomial.
- 50:19For example, the expected cyclic size of the flow
- 50:21is just the polynomial evaluated at negative two
- 50:24multiplied by the number of the loops in the graph.
- 50:26And this really generalizes that trait performance result
- 50:29because the trait performance result is given
- 50:31by restricting these polynomials to be linear.
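That expected-cyclic-size identity can be verified directly. In this editorial sketch the graph is K4 and the polynomial coefficients alpha are made up; since every cyclic flow lies in the null space of G transpose, the matrix GG transpose minus 2I acts on it as multiplication by minus two, so the matrix inner product of the cyclic projector with p(GG transpose minus 2I) collapses to p(-2) times the number of independent loops:

```python
import numpy as np

# Hypothetical complete graph K4: N = 4, E = 6, L = E - N + 1 = 3.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
N, E = 4, len(edges)
G = np.zeros((E, N))
for e, (t, h) in enumerate(edges):
    G[e, t], G[e, h] = -1.0, 1.0

M = G @ G.T - 2 * np.eye(E)
alpha = [1.0, 0.3, 0.2]                      # made-up polynomial coefficients
V = alpha[0] * np.eye(E) + alpha[1] * M + alpha[2] * M @ M

P_cyc = np.eye(E) - G @ np.linalg.pinv(G)    # projector onto null(G^T)
L = E - N + 1                                # number of independent loops

p_minus2 = alpha[0] - 2 * alpha[1] + 4 * alpha[2]     # p(-2) = 1.2
assert np.isclose(np.sum(P_cyc * V), p_minus2 * L)    # 1.2 * 3 = 3.6
```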
- 50:34Okay?
- 50:36This you can extend sort of to other bases,
- 50:41but really what this accomplishes is
- 50:43by generalizing trait performance,
- 50:45we achieve these sort of generic properties
- 50:50that it failed to have.
- 50:52So in particular, if I have an edge flow subspace S
- 50:56spanned by a set of flow motifs stored in some operator M,
- 50:59then this power series family of covariance
- 51:01is associated with the Laplacian,
- 51:03that is M times M transpose is both expressive
- 51:07in the sense that for any non-negative A and B,
- 51:11I can pick some alpha and beta,
- 51:13so that the expected size of the projection of F
- 51:16onto the subspace is A,
- 51:18and the projected size of F
- 51:19onto the subspace orthogonal to S is B
- 51:23using some covariance in this power series family.
- 51:27And it's sufficient in the sense
- 51:29that for any edge flow distribution
- 51:31with mean zero and covariance V,
- 51:35if C is the matrix nearest to V in Frobenius norm
- 51:38restricted to the power series family,
- 51:40then these inner products computed in terms of C
- 51:44are exactly the same as the inner products
- 51:46computed in terms of V,
- 51:47so they directly predict the structure,
- 51:49which means that if I use this power series family,
- 51:51discrepancies off of this family
- 51:54don't change the expected structure.
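The sufficiency property can also be checked numerically. In this editorial sketch, V is an arbitrary covariance on K4, C is its Frobenius-nearest member of the span of powers of GG transpose (found by vectorized least squares), and the predicted sizes agree because the projector itself is a polynomial in GG transpose, so the discrepancy V minus C is orthogonal to it:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical complete graph K4, same incidence convention as before.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
E = len(edges)
G = np.zeros((E, 4))
for e, (t, h) in enumerate(edges):
    G[e, t], G[e, h] = -1.0, 1.0

W = G @ G.T
basis = [np.eye(E), W, W @ W]                # powers of G G^T
B = np.column_stack([b.ravel() for b in basis])

R = rng.normal(size=(E, E))
V = R @ R.T                                  # an arbitrary covariance
coef, *_ = np.linalg.lstsq(B, V.ravel(), rcond=None)
C = (B @ coef).reshape(E, E)                 # nearest matrix in the family

P = G @ np.linalg.pinv(G)                    # projector onto gradient flows
# P lies in the span of the powers, so V - C is orthogonal to it and
# the predicted expected sizes are unchanged by the projection.
assert np.isclose(np.sum(P * V), np.sum(P * C))
assert np.isclose(np.sum((np.eye(E) - P) * V), np.sum((np.eye(E) - P) * C))
```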
- 51:57Okay?
- 51:57So I know I'm short on time here,
- 51:59so I'd like to skip then just to the end of this talk.
- 52:03There are further things you can do with this,
- 52:04this is sort of really nice.
- 52:06Mathematically you can build an approximation theory
- 52:08out of this and study for different random graph families,
- 52:12how many terms in these power series you need?
- 52:15And those terms define some nice,
- 52:17sort of simple minimal set of statistics
- 52:19to try to sort of estimate structure,
- 52:22but I'd like to really just get to the end here
- 52:25and emphasize the takeaways from this talk.
- 52:28So the first half of this talk
- 52:30was focused on information flow.
- 52:32What we saw is that information flow is a non-trivial,
- 52:35but well studied, estimation problem.
- 52:37And this is something that at least on my side
- 52:38sort of is a work in progress with students.
- 52:41Here in some ways, the conclusion of that first half
- 52:43would be that causation entropy
- 52:45may be a more appropriate measure than TE
- 52:47when trying to build these flow graphs
- 52:49to apply these structural measures to.
- 52:51Then on the structural side,
- 52:53we can say that power series family,
- 52:55this is a nice family of covariance matrices.
- 52:57It has nice properties that are useful empirically
- 52:59because they let us build global correlation structures
- 53:02from a sequence of local correlations
- 53:03from that power series.
- 53:06If you plug this back into the expected measures,
- 53:08you can recover monotonic relations,
- 53:10like in that limited trait performance case.
- 53:12And truncation of these power series
- 53:14reduces the number of quantities
- 53:16that you would actually need to measure.
- 53:19Actually to a number of quantities
- 53:20that can be quite small relative to the graph,
- 53:22and that's where this approximation theory comes in.
- 53:25One way, sort of maybe, to summarize this entire approach:
- 53:28what we've done, by looking at these power series
- 53:31built in terms of the graph operators,
- 53:33is provide a way to study
- 53:35inherently heterogeneous connections,
- 53:38or covariances, or edge flow distributions
- 53:41using a homogeneous correlation model
- 53:43that's built sort of at multiple scales
- 53:45by starting at the local scale, and then looking at powers.
- 53:49In some ways this is a comment
- 53:50that I ended a previous version of this talk with.
- 53:53I still think that this structural analysis is in some ways
- 53:56a hammer seeking a nail,
- 53:57and that this information flow construction,
- 53:59this is work in progress to try to build that nail.
- 54:02So thank you all for your attention,
- 54:04I'll turn it now over to questions.
- 54:09<v Robert>(mumbles) really appreciate it.</v>
- 54:14Unfortunately, for those of you on Zoom,
- 54:16you're welcome to keep up the conversation,
- 54:17so (mumbles) unfortunately have to clear the room.
- 54:20So I do apologize (mumbles)
- 54:25Dr. Steinman?
- 54:27It might be interesting, yeah. (laughs)
- 54:28(students laugh)
- 54:30Dr. Strang? <v ->Oh, yes, yeah.</v>
- 54:33<v Robert>Okay, do you mind if people...?</v>
- 54:35Yeah, we have to clear the room,
- 54:36do you mind if people email you if they have questions?
- 54:40<v ->I'm sorry, I couldn't hear the end of the question.</v>
- 54:42Do I mind if...?
- 54:45<v Robert>We have to clear the room,</v>
- 54:47do you mind if people email you if they have questions?
- 54:49<v ->No, not at all.</v>
- 54:50<v Robert>(mumbles) may continue the conversation,</v>
- 54:52so I do apologize, they are literally
- 54:54just stepping in the room right now.
- 54:57<v ->Okay, no, yeah, that's totally fine.</v>
- 54:59<v Robert>Thank you, thank you.</v>
- 55:01And thanks again for a wonderful talk.
- 55:03<v ->Thank you.</v>