YSPH Biostatistics Seminar: “Motif Expansion of Global Information Flow in Spike Train Data”
September 15, 2022
Alexander Strang, Kruskal Instructor, Statistics and Applied Math, University of Chicago
September 12, 2022
- 00:00<v Robert>Good afternoon.</v>
- 00:01Out of respect for everybody's time today,
- 00:04let's go ahead and get started.
- 00:07So today it is my pleasure to introduce,
- 00:09Dr. Alexander Strang.
- 00:12Dr. Strang earned his bachelor's in Mathematics and Physics,
- 00:16as well as his PhD in Applied Mathematics
- 00:19from Case Western Reserve University in Cleveland, Ohio.
- 00:24Born in Ohio, so he's representing.
- 00:27He studies variational inference problems,
- 00:29noise propagation in biological networks,
- 00:32self-organizing edge flows,
- 00:34and functional form game theory
- 00:36at the University of Chicago,
- 00:38where he is a William H. Kruskal Instructor of Statistics,
- 00:42and Applied Mathematics.
- 00:43Today he is going to talk to us about motif expansion
- 00:47of global information flow in spike train data.
- 00:50Let's welcome our speaker.
- 00:54<v ->Okay, thank you very much.</v>
- 00:56Thank you first for the kind invite,
- 00:59and for the opportunity to speak here in your seminar.
- 01:03So I'd like to start with some acknowledgements.
- 01:06This is very much a work in progress.
- 01:09Part of what I'm going to be showing you today
- 01:11is really the work of a master's student
- 01:12that I've been working with this summer, that's Bowen.
- 01:15And really I'd like to thank Bowen
- 01:16for a lot of the simulation
- 01:18and a lot of the TE calculation I'll show you later.
- 01:21This project more generally was born
- 01:22out of conversations with Brent Doiron
- 01:24and Lek-Heng Lim here at Chicago.
- 01:27Brent really was the inspiration for
- 01:30starting to venture into computational neuroscience.
- 01:33I'll really say that I am new to this world,
- 01:35it's a world that's exciting to me,
- 01:37but really is a world that I am actively exploring
- 01:41and learning about.
- 01:42So I look forward to conversations afterwards
- 01:44to learn more here.
- 01:46My background was much more inspired by Lek-Heng's work
- 01:49in computational topology.
- 01:52And some of what I'll be presenting today
- 01:54is really inspired by conversations with him.
- 01:58So let's start with some introduction and motivation.
- 02:01The motivation personally for this talk,
- 02:05so it goes back really to work that I started
- 02:06when I was a graduate student,
- 02:08I've had this sort of long standing interest
- 02:10in the interplay between structure and dynamics,
- 02:12in particular on networks.
- 02:14I've looked at interesting questions like,
- 02:16how does the structure of a network determine dynamics
- 02:18of processes on that network?
- 02:21And in turn, how do processes on that network
- 02:24give rise to structure?
- 02:30On the biological side,
- 02:32today's talk I'm going to be focusing on
- 02:34sort of applications of this question
- 02:36within neural networks.
- 02:38And I think that this sort of world of
- 02:39computational neuroscience is really exciting
- 02:41if you're interested in this interplay
- 02:42between structure and dynamics
- 02:44because neural networks encode, transmit
- 02:46and process information via dynamical processes.
- 02:49For example, the dynamical process
- 02:53of a neural network is directed by the wiring patterns
- 02:56by the structure of that network.
- 02:58And moreover, if you're talking
- 03:00about some sort of learning process,
- 03:01then those wiring patterns can change
- 03:03and adapt during the learning process,
- 03:05so they are themselves dynamic.
- 03:08In this area I've been interested in questions like,
- 03:10how is the flow of information governed
- 03:12by the wiring pattern?
- 03:14How do patterns of information flow
- 03:16present themselves in data?
- 03:17And can they be inferred from that data?
- 03:19And what types of wiring patterns
- 03:21might develop during learning?
- 03:24Answering questions of this type requires
- 03:26a couple of things.
- 03:26Sort of very big picture, it requires a language
- 03:29for describing structures and patterns.
- 03:31It requires having a dynamical process,
- 03:33some sort of model for the neural net,
- 03:35and it requires a generating model
- 03:38that generates initial structure
- 03:40and links the structure to dynamics.
- 03:42Alternatively, if we don't generate things using a model,
- 03:45if we have some sort of observable or data,
- 03:47then we can try to work in the other direction
- 03:49and go from dynamics to structure.
- 03:52Today during this talk,
- 03:53I'm gonna be focusing really on this first piece,
- 03:55a language for describing structures and patterns.
- 03:57And on the second piece on sort of an observable
- 04:00that I've been working on trying to compute to use,
- 04:04to try to connect these three points together.
- 04:08So to get started, a little bit of biology.
- 04:10Really I was inspired in this project
- 04:12by a paper from K.G. Mura.
- 04:14Here he was looking at a coupled oscillator model,
- 04:17this is a Kuramoto model for neural activity
- 04:20where the firing dynamics interact with the wiring.
- 04:22So the wiring in the couples,
- 04:25the oscillators would adapt on a slower time scale
- 04:29than the oscillators themselves.
- 04:31And that adaptation could represent
- 04:34different types of learning processes.
- 04:36For example, the fire-together wire-together rules
- 04:39or Hebbian learning,
- 04:41you can look at causal learning rules,
- 04:43or anti-Hebbian learning rules.
- 04:45And this is just an example I've run of this system.
- 04:48This system of ODEs is sort of interesting
- 04:50because it can generate all sorts of different patterns.
- 04:52You can see synchronized firing,
- 04:54you can see traveling waves,
- 04:55you can see chaos,
- 04:57these occur at different sort of critical boundaries.
- 04:59So you can see phase transitions
- 05:01when you have large collections of these oscillators.
- 05:04And depending on how they're coupled together,
- 05:05it behaves differently.
- 05:07In particular some of what's interesting here is
- 05:09that starting from some random seed topology,
- 05:13the dynamics play forward from that initial condition,
- 05:16and that random seed topology produces some ensemble of
- 05:19wiring patterns that are of themselves random.
- 05:22You can think about the ensemble of wiring patterns
- 05:24as being chaotic,
- 05:25sort of realizations of some random initialization.
- 05:29That said, you can also observe structures
- 05:32within the systems of coupled oscillators.
- 05:33So you could see large scale cyclic structures
- 05:36representing organized causal firing patterns
- 05:38in certain regimes.
- 05:40So this is a nice example where graph structure
- 05:42and learning parameters can determine dynamics,
- 05:44and in turn where those dynamics can determine structure.
- 05:48On the other side, you can also think
- 05:49about a data-driven side instead of a model-driven side.
- 05:53If we empirically observe sample trajectories
- 05:56of some observables, for example, neuron recordings,
- 05:58then we might hope to infer something
- 05:59about the connectivity that generates them.
- 06:01And so here instead of starting by posing a model,
- 06:04and then simulating it and studying how it behaves,
- 06:06we can instead study data or try to study structure in data.
- 06:10Often that data comes in the form of covariance matrices
- 06:12representing firing rates.
- 06:14And these covariance matrices may be auto-covariance matrices
- 06:17with some sort of time lag.
- 06:19Here there are a couple of standard structural approaches,
- 06:22so there is a motivic expansion approach.
- 06:25This was at least introduced by Brent Doiron's lab
- 06:28with his student, Gay Walker.
- 06:30Here the idea is that you define some graph motifs,
- 06:34and then you can study the dynamics
- 06:36in terms of those graph motifs.
- 06:38For example, if you build a power series in those motifs,
- 06:41then you can try to represent your covariance matrices
- 06:44in terms of that power series.
- 06:45And this is something we're gonna talk
- 06:46about quite a bit today.
- 06:47This is really part of why I was inspired by this work is,
- 06:49I had been working separately on the idea of
- 06:51looking at covariance matrices
- 06:53in terms of these power series expansions.
- 06:56This is also connected to topological data analysis,
- 06:59and this is where the conversations with Lek-Heng
- 07:01played a role in this work.
- 07:03Topological data analysis aims to construct graphs,
- 07:07representing dynamical sort of systems.
- 07:08For example, you might look at the dynamical similarity
- 07:11of firing patterns of certain neurons,
- 07:13and then tries to study the topology of those graphs.
- 07:18Again, this sort of leads to similar questions,
- 07:20but we can be a little bit more precise here
- 07:21for thinking in neuroscience.
- 07:24We can say more precisely, for example,
- 07:25how is information processing and transfer represented,
- 07:29both in these covariance matrices and the structures
- 07:32that we hope to extract from them.
- 07:33In particular, can we try
- 07:35and infer causality from firing patterns?
- 07:39And this is fundamentally an information theoretic question.
- 07:42Really we're asking, can we study
- 07:43the directed exchange of information from trajectories?
- 07:47Here one approach, I mean, in some sense
- 07:49you can never tell causality without some underlying model,
- 07:53without some underlying understanding of the mechanism.
- 07:56So if all we can do is observe,
- 07:58then we need to define what we mean by causality.
- 08:01A reasonable sort of standard definition here
- 08:03is Wiener Causality,
- 08:04which says that two times series share a causal relation,
- 08:06so we say X causes Y,
- 08:08if the history of X informs a future of Y.
- 08:12Note that here "cause" put in quotes,
- 08:14really means forecasts.
- 08:15That means that the past or the present of X
- 08:18carries relevant information about the future of Y.
- 08:22A natural measure of that information is transfer entropy.
- 08:26Transfer entropy was introduced by Schreiber in 2000,
- 08:30and it's the expected KL divergence
- 08:32between the distribution of the future of Y
- 08:35given the history of X
- 08:38and the marginal distribution of the future of Y.
- 08:41So essentially it's how much predictive information
- 08:43does X carry about Y?
- 08:46This is a nice quantity for a couple of reasons.
- 08:48First, it's zero when two trajectories are independent.
- 08:51Second, since it's just defined in terms of
- 08:53these conditional distributions, it's model free.
- 08:56So I put a star here because
- 08:59the generative assumptions actually do matter
- 09:01when you go to try and compute it,
- 09:02but in principle it's defined independent of the model.
- 09:05Again, unlike some other effective causality measures,
- 09:07it doesn't require introducing some time lag to define.
- 09:11It's a naturally directed quantity, right?
- 09:13We can say that the future of Y
- 09:15conditioned on the past of X and...
- 09:17That transfer entropy is defined in terms of
- 09:20the future of Y conditioned on the past of X and Y.
- 09:23And that quantity is directed because reversing X and Y,
- 09:27it does not sort of symmetrically change this statement.
- 09:30This is different than quantities
- 09:31like mutual information or correlation
- 09:32that are also often used
- 09:34to try to measure effective connectivity in networks,
- 09:37which are fundamentally symmetric quantities.
- 09:41Transfer entropy is also less corrupted
- 09:43by measurement noise, linear mixing of signals,
- 09:46or shared coupling to external sources.
- 09:50Lastly, and maybe most interestingly,
- 09:52if we think in terms of correlations,
- 09:54correlations are really moments,
- 09:56correlations are really about covariances, right?
- 09:57Second order moments.
- 09:59Transfer entropies, these are about entropies,
- 10:01these are sort of logs of distributions,
- 10:04and so they depend on the full shape of these distributions,
- 10:06which means that transfer entropy can capture coupling
- 10:10that is maybe not apparent or not obvious,
- 10:13just looking at a second order moment type analysis.
- 10:17So transfer entropy has been applied pretty broadly.
- 10:20It's been applied to spiking cortical networks
- 10:22and calcium imaging,
- 10:24to MEG data in motor tasks and auditory discrimination.
- 10:29It's been applied to motion recognition,
- 10:31precious metal prices
- 10:32and multivariate time series forecasting,
- 10:34and more recently to accelerate learning
- 10:36in different artificial neural nets.
- 10:38So you can look at feedforward architectures,
- 10:40convolution architectures, even recurrent neural nets,
- 10:42and transfer entropy has been used
- 10:44to accelerate learning in those frameworks.
- 10:49For this part of the talk,
- 10:50I'd like to focus really on two questions.
- 10:52First, how do we compute transfer entropy?
- 10:55And then second, if we could compute transfer entropy
- 10:58and build a graph out of that,
- 11:00how would we study the structure of that graph?
- 11:01Essentially, how is information flow structured?
- 11:05We'll start with computing transfer entropy.
- 11:09To compute transfer entropy,
- 11:10we actually need to write down an equation.
- 11:13So transfer entropy was originally introduced
- 11:14for discrete time arbitrary order Markov processes.
- 11:18So suppose we have two Markov processes: X and Y,
- 11:21and we'll let X of N denote the state of
- 11:23process X at time N,
- 11:25and XNK where the K is in superscript to denote the sequence
- 11:29starting from N minus K plus one going up to N.
- 11:32So that's sort of the last K states
- 11:35that the process X visited,
- 11:37then the transfer entropy from Y to X,
- 11:40they're denoted T, Y arrow to X
- 11:45is the entropy of the future of X conditioned on its past
- 11:50minus the entropy of the future of X conditioned on its past
- 11:54and the past of the trajectory Y.
- 11:56So here you can think the transfer entropy
- 11:57is essentially the reduction in entropy
- 11:59of the future states of X
- 12:00when incorporating the past of Y.
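Written out, the definition just described, with X_n^(k) denoting the last k states of X and Y_n^(l) the last l states of Y, is:

```latex
T_{Y \to X} \;=\; H\!\left(X_{n+1} \mid X_n^{(k)}\right) \;-\; H\!\left(X_{n+1} \mid X_n^{(k)},\, Y_n^{(l)}\right)
```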
- 12:03This means that computing transfer entropy
- 12:05reduces to estimating essentially these entropies.
- 12:07That means we need to be able to estimate
- 12:08essentially the conditional distributions
- 12:10inside of these parentheses.
- 12:14That's easy for certain processes.
- 12:15So for example, if X and Y are Gaussian processes,
- 12:19then really what we're trying to compute
- 12:20is conditional mutual information.
- 12:22And there are nice equations
- 12:23for conditional mutual information
- 12:25when you have Gaussian random variables.
- 12:26So if I have three Gaussian random variables: X, Y, Z,
- 12:29possibly multivariate with joint covariance sigma,
- 12:33then the conditional mutual information
- 12:35between these variables,
- 12:36so the mutual information between X and Y conditioned on Z
- 12:39is just given by this ratio of log determinants
- 12:42of those covariances.
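One standard way to write that ratio of log determinants for jointly Gaussian X, Y, Z (assuming this is the expression on the slide) is:

```latex
I(X; Y \mid Z) \;=\; \frac{1}{2}\,\log\frac{\det \Sigma_{XZ}\,\det \Sigma_{YZ}}{\det \Sigma_{Z}\,\det \Sigma_{XYZ}},
```

where Sigma_{XZ} denotes the joint covariance of (X, Z), and similarly for the other blocks.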
- 12:45In particular, a common test model used
- 12:48in sort of the transfer entropy literature
- 12:51is the linear auto-regressive process
- 12:53because a linear auto-regressive process
- 12:55when perturbed by Gaussian noise
- 12:57produces a Gaussian process.
- 12:58All of the different
- 12:59joint, marginal, and conditional distributions are all Gaussian,
- 13:02which means that we can compute
- 13:03these covariances analytically,
- 13:05which then means that you can compute
- 13:06the transfer entropy analytically.
- 13:07So these linear auto-regressive processes
- 13:09are nice test cases
- 13:10because you can do everything analytically.
- 13:12They're also somewhat disappointing or somewhat limiting
- 13:15because in this linear auto-regressive case,
- 13:17transfer entropy is the same as Granger causality.
- 13:22And in this Gaussian case,
- 13:24essentially what we've done is we've reduced
- 13:26transfer entropy to a study of time-lagged correlations,
- 13:29so this becomes the same
- 13:30as sort of a correlation based analysis,
- 13:32we can't incorporate information beyond the second moments,
- 13:34if we restrict ourselves to Gaussian processes
- 13:36or Gaussian approximations.
- 13:39The other thing to note is this is strongly model-dependent
- 13:41because this particular formula
- 13:43for computing mutual information
- 13:44depends on having Gaussian distributions.
- 13:50In a more general setting or a more empirical setting,
- 13:53you might observe some data.
- 13:55You don't know if that data
- 13:56comes from some particular process,
- 13:58so you can't necessarily assume
- 13:59that conditional distributions are Gaussian,
- 14:01but we would still like to estimate transfer entropy,
- 14:03which leads to the problem of estimating transfer entropy
- 14:06given an observed time series.
- 14:08We would like to do this again, sans some model assumption,
- 14:10so we don't wanna assume Gaussianity.
- 14:13This is sort of trivial,
- 14:14again, I star that in discrete state spaces
- 14:17because essentially it amounts to counting occurrences,
- 14:20but it becomes difficult whenever the state spaces are large
- 14:23and/or high dimensional as they often are.
- 14:26This leads to a couple of different approaches.
- 14:28So as a first example, let's consider spike train data.
- 14:32So spike train data consists essentially of
- 14:35binning the state of a neuron into either on or off.
- 14:39So neurons, you can think either in the state zero or one,
- 14:41and then a pair wise calculation for transfer entropy
- 14:44only requires estimating a joint probability distribution
- 14:48over four to the K plus L states where K plus L,
- 14:51K is the history of X that we remember,
- 14:54and L is the history of Y.
- 14:57So if sort of the Markov process
- 14:59generating the spike train data is not of high order,
- 15:02does not have a long memory,
- 15:04then these K and L can be small,
- 15:06and this state space is fairly small,
- 15:08so this falls into that first category
- 15:10when we're looking at a discrete state space,
- 15:12and it's not too difficult.
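As a minimal sketch of that counting approach for binary spike trains (a hypothetical helper written for illustration, not the estimator from the toolkits mentioned below):

```python
import numpy as np
from collections import Counter

def transfer_entropy_binary(x, y, k=1, l=1):
    """Plug-in estimate of T_{Y -> X} for binary spike trains (sequences of 0s and 1s).

    Histograms joint occurrences of (future of X, last k states of X, last l states of Y)
    and returns H(X_future | X_past) - H(X_future | X_past, Y_past), in bits.
    """
    counts = Counter()
    for t in range(max(k, l) - 1, len(x) - 1):
        xf = int(x[t + 1])                                  # future of X
        xp = tuple(int(v) for v in x[t - k + 1:t + 1])      # last k states of X
        yp = tuple(int(v) for v in y[t - l + 1:t + 1])      # last l states of Y
        counts[(xf, xp, yp)] += 1

    def entropy(hist):
        p = np.array(list(hist.values()), dtype=float)
        p /= p.sum()
        return -np.sum(p * np.log2(p))

    # Marginal histograms needed for the two conditional entropies.
    c_xf_xp, c_xp, c_xp_yp = Counter(), Counter(), Counter()
    for (xf, xp, yp), c in counts.items():
        c_xf_xp[(xf, xp)] += c
        c_xp[xp] += c
        c_xp_yp[(xp, yp)] += c

    # H(A | B) = H(A, B) - H(B)
    h_future_given_past = entropy(c_xf_xp) - entropy(c_xp)
    h_future_given_both = entropy(counts) - entropy(c_xp_yp)
    return h_future_given_past - h_future_given_both
```

As discussed later in the talk, plug-in estimates like this are biased, so fairly long recordings are needed before the estimate settles down.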
- 15:15In a more general setting,
- 15:16if we don't try to bin the states
- 15:18of the neurons to on or off,
- 15:19for example, maybe we're looking at a firing rate model
- 15:22where we wanna look at the firing rates of the neurons,
- 15:24and that's a continuous random variable,
- 15:27then we need some other types of estimators.
- 15:29So the common estimator used here
- 15:31is a kernel density estimator, a KSG estimator,
- 15:34and this is designed for large continuous
- 15:36or high dimensional state spaces,
- 15:37e.g. sort of these firing rate models.
- 15:40Typically the approach is to employ a Takens delay map,
- 15:43which embeds your high dimensional data
- 15:45in some sort of lower dimensional space
- 15:48that tries to capture the intrinsic dimension
- 15:50of the attractor that your dynamic process settles onto.
- 15:55And then you try to estimate an unknown density
- 15:57based on this delay map using a k-nearest
- 15:59neighbor kernel density estimate.
- 16:01The advantage of this sort of
- 16:03k-nearest neighbor kernel density is
- 16:05that it dynamically adapts the width of the kernel,
- 16:07given your sample density.
- 16:09And this has been implemented in some open source toolkits,
- 16:11these are the toolkits that we've been working with.
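A small numpy sketch of just the delay-embedding step; the k-nearest-neighbor density estimation itself is what those toolkits provide:

```python
import numpy as np

def delay_embed(x, dim, tau):
    """Takens-style delay map: row t is (x[t], x[t - tau], ..., x[t - (dim - 1) * tau])."""
    x = np.asarray(x)
    n = len(x) - (dim - 1) * tau          # number of complete delay vectors
    return np.column_stack([x[(dim - 1 - j) * tau:(dim - 1 - j) * tau + n]
                            for j in range(dim)])
```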
- 16:15So we've tested this in a couple of different models,
- 16:18and really I'd say this work,
- 16:19this is still very much work in progress,
- 16:20this is work that Bowen was developing over the summer.
- 16:23And so we developed a couple different models to test.
- 16:26The first were these Linear Auto-Regressive Networks,
- 16:29and we just used these to test
- 16:31the accuracy of the estimators
- 16:32because everything here is Gaussian,
- 16:33so you can compute the necessary
- 16:35transfer entropies analytically.
- 16:37The next more interesting class of networks
- 16:39are Threshold Linear Networks or TLNs.
- 16:42These are a firing rate model where your rate R
- 16:44obeys this sarcastic differential equation.
- 16:47So the rate of change and the rate, DR of T is,
- 16:51so you have sort of a leak term, negative RFT,
- 16:53and then plus here, this is essentially a coupling.
- 16:57All of this is inside here, the brackets with a plus,
- 17:00this is like a ReLU function,
- 17:02so this is just taking the positive part
- 17:04of what's on the inside.
- 17:05Here B is an activation threshold,
- 17:08W is a wiring matrix,
- 17:09and then R are those rates, again.
- 17:11And then C here, that's essentially covariance
- 17:13for some noise term perturbing this process,
- 17:17we use these TLNs to test the sensitivity
- 17:19of our transfer entropy estimators
- 17:21to common and private noise sources as you change C,
- 17:24as well as sort of how well the entropy network
- 17:26agrees with the wiring matrix.
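A rough Euler-Maruyama sketch of such a network, assuming the standard threshold-linear form dr = (-r + [W r + b]_+) dt plus Gaussian noise with covariance C; the exact placement of the leak and threshold in the model used here may differ slightly:

```python
import numpy as np

def simulate_tln(W, b, C, T=10.0, dt=1e-3, rng=None):
    """Simulate dr = (-r + relu(W r + b)) dt + noise, with noise covariance C dt (C positive definite)."""
    rng = np.random.default_rng() if rng is None else rng
    n = W.shape[0]
    L = np.linalg.cholesky(C)                 # mixes common and private noise sources
    steps = int(T / dt)
    r = np.zeros(n)
    rates = np.empty((steps, n))
    for t in range(steps):
        drift = -r + np.maximum(W @ r + b, 0.0)          # leak + rectified coupling
        r = r + drift * dt + L @ rng.standard_normal(n) * np.sqrt(dt)
        rates[t] = r
    return rates
```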
- 17:31A particular class of TLNs
- 17:32that were really nice for these experiments
- 17:35are what are called Combinatorial Threshold Linear Networks.
- 17:37These are really pretty new,
- 17:38these were introduced by Carina Curto's lab this year,
- 17:42and really this was inspired by a talk
- 17:45I'd seen her give at FACM in May.
- 17:49These are threshold linear networks
- 17:51where the weight matrix here, W,
- 17:52representing the wiring of the neurons
- 17:55is determined by a directed graph G.
- 17:58So you start with some directed graph G,
- 18:00that's what's shown here on the left.
- 18:01This figure is adapted from Carina's paper,
- 18:03this is a very nice paper
- 18:04if you'd like to take a look at it.
- 18:07And if I and J are not connected,
- 18:10then the weight matrix is assigned one value;
- 18:12and if they are connected, then it's assigned another value,
- 18:14and the wiring is zero if I equals J.
- 18:18These networks are nice
- 18:20if we wanna test structural hypotheses
- 18:22because it's very easy to predict from the input graph
- 18:25how the output dynamics of the network should behave,
- 18:28and there is really beautiful analysis
- 18:30that Carina does in this paper to show
- 18:32that you can produce all these different
- 18:33interlocking patterns of limit cycles
- 18:35and multistable states,
- 18:36and chaos, and all these nice patterns,
- 18:38and you can design them by picking these nice
- 18:41sort of directed graphs.
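A small sketch of that construction; the specific values -1 + eps and -1 - delta are an assumption taken from the CTLN literature rather than from the talk:

```python
import numpy as np

def ctln_weights(adjacency, eps=0.25, delta=0.5):
    """CTLN-style wiring from a directed graph: adjacency[i, j] = 1 means an edge j -> i.

    W[i, j] = -1 + eps if j -> i is an edge, -1 - delta if it is not, and 0 if i == j.
    """
    A = np.asarray(adjacency, dtype=bool)
    W = np.where(A, -1.0 + eps, -1.0 - delta)
    np.fill_diagonal(W, 0.0)
    return W
```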
- 18:44The last class of networks that we've built to test
- 18:46are Leaky-Integrate and Fire Networks.
- 18:48So here we're using a Leaky-Integrate and Fire model
- 18:51where our wiring matrix, W, is drawn randomly.
- 18:54It's block stochastic, which means
- 18:57that it's (indistinct) between blocks.
- 19:00And it's a balanced network,
- 19:02so we have excitatory and inhibitory neurons
- 19:04that talk to each other,
- 19:06and maintain a sort of a balance in the dynamics here.
- 19:09The hope is to pick a large enough scale network
- 19:11that we see properly chaotic dynamics
- 19:13using this Leaky-Integrate and Fire model.
- 19:17These tests have yielded fairly mixed results,
- 19:21so the simple tests behave sort of as expected.
- 19:24So the estimators that are used are biased,
- 19:27and the bias typically decays slower
- 19:29than the variance in estimation,
- 19:30which means that you do need fairly long trajectories
- 19:32to try to properly estimate the transfer entropy.
- 19:36That said, transfer entropy does correctly identify
- 19:38causal relationships in simple graphs,
- 19:40and transfer entropy matches the underlying structure
- 19:44used in a Combinatorial Threshold Linear Network, so CTLN.
- 19:49Unfortunately, these results did not carry over as cleanly
- 19:52to the Leaky-Integrate and Fire models
- 19:54or to sort of larger models.
- 19:56So what I'm showing you on the right here,
- 19:58this is a matrix where we've calculated
- 20:00the pairwise transfer entropy between all neurons
- 20:03in a 150 neuron balanced network.
- 20:06This has shown absolute, this has shown in the log scale.
- 20:09And the main thing I wanna highlight for it
- 20:11to taking a look at this matrix
- 20:12is very hard to see exactly what the structure is.
- 20:15You see this banding,
- 20:17that's because neurons tend to be highly predictive
- 20:20if they fire a lot.
- 20:21So there's a strong correlation
- 20:22between the transfer entropy, between X and Y,
- 20:25and just the activity level of X,
- 20:29but it's hard to distinguish block-wise differences,
- 20:31for example, between inhibitory neurons
- 20:33and excitatory neurons, and that really takes plotting out.
- 20:36So here this box and whisker plot on the bottom,
- 20:39this is showing you if we group entries of this matrix
- 20:43by the type of connection,
- 20:44so maybe excitatory to excitatory,
- 20:46or inhibitory to excitatory, or so on,
- 20:48that the distribution of realized transfer entropy
- 20:50really is different,
- 20:52but they're different in sort of subtle ways.
- 20:54So in this sort of larger scale balance network,
- 20:58it's much less clear whether transfer entropy
- 21:02effectively is like equated in some way
- 21:05with the true connectivity or wiring.
- 21:09In some ways, this is not a surprise
- 21:10because the behavior of the balance networks
- 21:12is inherently balanced,
- 21:13and (indistinct) inherently unstructured,
- 21:16but there are ways in which these experiments
- 21:18have sort of revealed confounding factors
- 21:20that are conceptual factors
- 21:22that make transfer entropies
- 21:24not as sort of an ideal measure,
- 21:25or maybe not as ideal as it seems
- 21:28given the start of this talk.
- 21:29So for example, suppose two trajectories:
- 21:33X and Y are both strongly driven by a third trajectory, Z,
- 21:36but X responds to Z first.
- 21:39Well, then the present information about X
- 21:40or the present state of X carries information
- 21:42about the future of Y, so X is predictive of Y,
- 21:45so X forecasts Y.
- 21:46So in the transfer entropy or Wiener causality setting,
- 21:48we would say X causes Y,
- 21:51even if X and Y are only both responding to Z.
- 21:54So here in this example,
- 21:56suppose you have a directed tree where information
- 21:59or sort of dynamics propagate down the tree.
- 22:02If you look at this node here, PJ and I,
- 22:07PJ will react
- 22:08to essentially information traveling down this tree
- 22:12before I does, so PJ would be predictive for I.
- 22:15So we would observe an effective connection
- 22:19where PJ forecasts I,
- 22:21which means that neurons that are not directly connected
- 22:23may influence each other,
- 22:24and that this transfer entropy
- 22:26really you should think of in terms of forecasting,
- 22:29not in terms of being a direct analog to the wiring matrix.
- 22:33One way around this is to condition on the state
- 22:35of the rest of the network
- 22:37before you start doing some averaging.
- 22:39This leads to some other notions of entropy.
- 22:41So for example, causation entropy,
- 22:42and this is sort of a promising direction,
- 22:44but it's not one we've had time to explore yet.
- 22:47So that's the estimation side,
- 22:49those are the tools for estimating transfer entropy.
- 22:52Now let's switch gears
- 22:53and talk about that second question I had introduced,
- 22:55which is essentially, how do we analyze structure?
- 22:57Suppose we could calculate a transfer entropy graph,
- 23:00how would we extract structural information from that graph?
- 23:04And here, I'm going to be introducing some tools
- 23:06that I've worked on for awhile
- 23:08for describing sort of random structures and graphs.
- 23:11These are tied back to some work I'd really done
- 23:15as a graduate student in conversations with Lek-Heng.
- 23:18So we start in a really simple context,
- 23:19which is the graph or network.
- 23:21This could be directed or undirected,
- 23:22however, we're gonna require that does not have self-loops,
- 23:24then it's finite.
- 23:26We'll let V here be the number of vertices
- 23:28and E be the number of edges.
- 23:30Then the object of study that we'll introduce
- 23:33is something called an edge flow.
- 23:34An edge flow is essentially a function
- 23:35on the edges of the graph.
- 23:37So this is a function that accepts pairs of endpoints
- 23:40and returns a real number,
- 23:42and this is an alternating function.
- 23:43So if I take F of IJ,
- 23:45that's negative F of JI
- 23:47because you can think of F of IJ as being some flow,
- 23:49like a flow of material between I and J,
- 23:52hence this name, edge flow.
- 23:54This is analogous to a vector field
- 23:56because this is like the analogous construction
- 23:58to a vector field in the graph,
- 23:59and represents some sort of flow between nodes.
- 24:02Edge flows are really sort of generic things,
- 24:04so you can take this idea of an edge flow
- 24:07and apply it in a lot of different areas
- 24:09because really all you need is,
- 24:10you just need to structure some alternating function
- 24:12on the edges of the graph.
- 24:13So I've sort of read papers
- 24:16and worked in a bunch of these different areas,
- 24:19particularly I've focused on applications of this
- 24:21in game theory, in pairwise and social choice settings,
- 24:25in biology and Markov chains.
- 24:26And a lot of this project has been attempting
- 24:28to take this experience working with edge flows in,
- 24:31for example, say non-equilibrium thermodynamics
- 24:34or looking at pairwise preference data,
- 24:36and looking at a different application area
- 24:38here to neuroscience.
- 24:40Really you could think about the edge flow
- 24:42or a relevant edge flow in neuroscience,
- 24:43you might be asking about asymmetries in wiring patterns,
- 24:46or differences in directed influence or causality,
- 24:49or really you could think about these
- 24:50transfer entropy quantities.
- 24:51This is why I was excited about transfer entropy.
- 24:53Transfer entropy is an inherently directed notion
- 24:56of information flow,
- 24:57so it's natural to think
- 24:59that if you can calculate things like a transfer entropy,
- 25:01then really what you're studying
- 25:03is some sort of edge flow on a graph.
- 25:06Edge flows often are subject to
- 25:08sort of the same set of common questions.
- 25:10So if I wanna analyze the structure of an edge flow,
- 25:12there's some really big global questions
- 25:14that I would often ask,
- 25:15that get asked in all these different application areas.
- 25:19One common question is,
- 25:20well, does the flow originate somewhere and end somewhere?
- 25:23Are there sources and sinks in the graph?
- 25:25Another is, does it circulate?
- 25:26And if it does circulate, on what scales and where?
- 25:31If you have a network that's connected
- 25:33to a whole exterior network,
- 25:34for example, if you're looking at some small subsystem
- 25:37that's embedded in a much larger system
- 25:38as is almost always the case in neuroscience,
- 25:41then you also need to think about,
- 25:42what passes through the network?
- 25:43So is there a flow or a current that moves
- 25:46through the boundary of the network?
- 25:47Is there information that flows through the network
- 25:51that you're studying?
- 25:52And in particular if we have these different types of flow,
- 25:55if flow can originate and source and end in sinks,
- 25:57if it can circulate, if it can pass through,
- 25:59can we decompose the flow into pieces that do each of these,
- 26:03and ask how much of the flow does one, two, or three?
- 26:07Those questions lead to a decomposition.
- 26:11So here we're going to start with this simple idea,
- 26:13we're going to decompose an edge flow
- 26:15by projecting it onto orthogonal subspaces
- 26:17associated with some graph operators.
- 26:20Generically if we consider two linear operators: A and B,
- 26:24where the product A times B equals zero,
- 26:27then the range of B must be contained
- 26:29in the null space of A,
- 26:31which means that I can express
- 26:33essentially any set of real numbers.
- 26:35So you can think of this as being the vector space
- 26:38of possible edge flows as a direct sum of the range of B,
- 26:43the range of A transpose
- 26:45and the intersection of the null space of B transpose
- 26:47in the null space of A.
- 26:48This blue subspace, this is called the harmonic space,
- 26:53and this is trivial
- 26:56in many applications
- 26:58if you choose A and B correctly.
- 27:00So there's often settings where you can pick A and B,
- 27:02so that these two null spaces have no intersection,
- 27:06and then this decomposition boils down
- 27:08to just separating a vector space
- 27:10into the range of B and the range of A transpose.
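In symbols, the generic decomposition just described is:

```latex
A B = 0 \;\Longrightarrow\; \mathbb{R}^{E} \;=\; \operatorname{range}(B) \;\oplus\; \operatorname{range}(A^{\mathsf T}) \;\oplus\; \bigl(\operatorname{null}(A) \cap \operatorname{null}(B^{\mathsf T})\bigr),
```

with the last summand being the harmonic space, which is trivial for suitable choices of A and B.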
- 27:16In the graph setting,
- 27:17our goal is essentially to pick these operators
- 27:19to be meaningful things.
- 27:20That is to pick graph operators,
- 27:22so that these subspaces carry a meaningful,
- 27:26or carry meaning in the structural context.
- 27:30So let's think a little bit about graph operators here,
- 27:33so let's look at two different classes of operators.
- 27:35So we can consider matrices that have E rows and V columns,
- 27:40or matrices that have L rows and E columns where,
- 27:44again, E is the number of edges in this graph.
- 27:48If I have a matrix with E rows,
- 27:50then each column of the matrix has as many entries
- 27:53as there are edges in the graph,
- 27:55so it can be thought of as itself an edge flow.
- 27:57So you could think that this matrix is composed
- 27:59of a set of columns where each column is some particular
- 28:02sort of motivic flow or flow motif.
- 28:05In contrast if I look at a matrix where I have E columns,
- 28:09then each row of the matrix is a flow motif,
- 28:11so products against M
- 28:14evaluate inner products against specific flow motifs.
- 28:18That means that in this context,
- 28:20if I look at the range of this matrix,
- 28:21this is really a linear combination
- 28:23of a specific subset of flow motifs.
- 28:25And in this context,
- 28:26if I look at the null space of the matrix,
- 28:28I'm looking at all edge flows orthogonal
- 28:30to that set of flow motifs.
- 28:32So here if I look at the range of a matrix with E rows,
- 28:36that subspace is essentially modeling behavior
- 28:39similar to the motifs.
- 28:40So if I pick a set of motifs that flow out of a node
- 28:44or flow into a node,
- 28:45then this range is going to be a subspace of edge flows
- 28:48that tend to originate in sources and end in sinks.
- 28:51In contrast here, the null space of M,
- 28:54that's all edge flows orthogonal to the flow motifs,
- 28:57so it models behavior distinct from the motifs.
- 28:59Essentially this space asks, what doesn't the flow do?
- 29:02Whereas this space asks, what does the flow do?
- 29:07Here is a simple, sort of very classical example.
- 29:09And really this goes all the way back to,
- 29:11you could think like Kirchhoff electric circuit theory.
- 29:14We can define two operators.
- 29:15Here G, this is essentially a gradient operator.
- 29:18And if you've taken some graph theory,
- 29:20you might know this as the edge incidence matrix.
- 29:22This is a matrix which essentially records
- 29:25the endpoints of an edge
- 29:26and evaluates differences across it.
- 29:29So, for example, if I look at this first row of G,
- 29:33this corresponds to edge one in the graph,
- 29:35and if I had a function defined on the nodes in the graph,
- 29:39products with G would evaluate differences across this edge.
- 29:43If you look at its columns,
- 29:44each column here is a flow motif.
- 29:46So, for example, this highlighted second column,
- 29:49this is entries: one, negative one, zero, negative one.
- 29:52If you carry those back to the edges,
- 29:53that corresponds to this specific flow motif.
- 29:56So here this gradient,
- 29:58it's adjoint to essentially a divergence operator,
- 30:00which means that the flow motifs are unit inflows
- 30:03or unit outflows from specific nodes,
- 30:05like what's shown here.
- 30:07You can also introduce something like a curl operator.
- 30:10The curl operator evaluates path sums around loops.
- 30:13So this row here, for example, this is a flow motif
- 30:16corresponding to the loop labeled A in this graph.
- 30:20You could certainly imagine
- 30:21other operators' built cutter, other motifs,
- 30:23these operators are particularly nice
- 30:25because they define principled subspaces.
- 30:28So if we apply that generic decomposition,
- 30:31then we could say that the space
- 30:32of possible edge flows are E,
- 30:34it can be decomposed into the range of the grading operator,
- 30:37the range of the curl transpose,
- 30:39and the intersection of their null spaces
- 30:42into this harmonic space.
- 30:44This is nice because the range of the gradient that flows,
- 30:46it start and end somewhere.
- 30:48Those are flows that are associated with
- 30:50like motion down a potential.
- 30:52So these if you're thinking physics,
- 30:53you might say that these are sort of conservative,
- 30:55these are like flows generated by a voltage
- 30:57if you're looking at electric circuit.
- 30:59These cyclic flows, well, these are the flows
- 31:01in the range of the curl transpose,
- 31:03and then this harmonic space,
- 31:04those are flows that enter and leave the network
- 31:06without either starting or ending
- 31:09a sink or a source, or circulating.
- 31:11So you can think that really this decomposes
- 31:13the space of edge flows into flows that start
- 31:16and end somewhere inside the network.
- 31:17Flows that circulate within the network,
- 31:19and flows that do neither,
- 31:20i.e. flows that enter and leave the network.
- 31:22So this accomplishes that initial decomposition
- 31:25I'd set out at the start.
- 31:28Once we have this decomposition, then we can evaluate
- 31:31the sizes of the components of decomposition to measure
- 31:34how much of the flow starts and ends somewhere,
- 31:38how much circulates and so on.
- 31:39So we can introduce these generic measures
- 31:41where, given some operator M,
- 31:44we decompose the space of edge flows
- 31:46into the range of M and the null space of M transpose,
- 31:49which means we can project F onto these subspaces,
- 31:52and then just evaluate the sizes of these components.
- 31:55And that's a way of measuring
- 31:57how much of the flow behaves like
- 31:59the flow motifs contained in this operator,
- 32:01and how much it doesn't.
- 32:04So, yeah, so that lets us answer this question,
- 32:07and this is the tool that we're going to be using
- 32:09sort of as our measurable.
- 32:12Now that's totally easy to do,
- 32:16if you're given a fixed edge flow and a fixed graph
- 32:17because if you have fixed graph,
- 32:18you can build your operators, you choose the motifs,
- 32:20you have fixed edge flow, you just project the edge flow
- 32:23onto the subspaces spanned by those operators,
- 32:25and you're done.
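A minimal numpy sketch of that projection step, given an edge flow vector f and a motif matrix M with one row per edge (for example, the gradient built from the edge incidences):

```python
import numpy as np

def flow_components(M, f):
    """Split an edge flow f into its component in range(M) and the orthogonal remainder.

    M has one row per edge; its columns are flow motifs (e.g. the gradient operator).
    Returns (motif-like part, remainder, squared size of each).
    """
    P = M @ np.linalg.pinv(M)      # orthogonal projector onto range(M)
    f_motif = P @ f                # e.g. the part that starts in sources and ends in sinks
    f_rest = f - f_motif           # the part orthogonal to every motif
    return f_motif, f_rest, f_motif @ f_motif, f_rest @ f_rest
```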
- 32:27However, there are many cases where it's worth thinking
- 32:31about a distribution of edge flows,
- 32:33and then expected structures given that distribution.
- 32:37So here we're going to be considering random edge flows,
- 32:39for example, in edge flow capital F,
- 32:41here I'm using capital letters to denote random quantities
- 32:43sampled from an edge flow distributions.
- 32:45This is a distribution of possible edge flows.
- 32:46And this is worth thinking about
- 32:48because many generative models are stochastic.
- 32:51They may involve some random seed,
- 32:53or they may, for example, like that neural model
- 32:55or a lot of these sort of neural models be chaotic.
- 32:58So even if they are deterministic generative models,
- 33:01the output data behaves
- 33:03as it was sampled from the distribution.
- 33:05On the empirical side, for example,
- 33:07when we're estimating transfer entropy
- 33:09or estimating some information flow,
- 33:11then there's always some degree of measurement error
- 33:13or uncertainty in that estimate,
- 33:15which really means sort of from a Bayesian perspective,
- 33:18we should be thinking that our estimator
- 33:21is a point estimate drawn from some
- 33:23posterior distribution of edge flows,
- 33:24and then we're back in the setting where,
- 33:25again, we need to talk about a distribution.
- 33:28Lastly, this random edge flow setting is also
- 33:31really important if we wanna compare to null hypotheses
- 33:35because often if you want to compare
- 33:37to some sort of null hypothesis,
- 33:38it's helpful to have an ensemble of edge flows
- 33:41to compare against, which means that we would like
- 33:44to be able to talk about expected structure
- 33:46under varying distributional assumptions.
- 33:50If we can talk meaningfully about random edge flows,
- 33:54then really what we can start doing is
- 33:56we can start bridging the expected structure
- 33:59back to the distribution.
- 34:00So what we're looking for is a way of explaining
- 34:03sort of generic expectations
- 34:05of what the structure will look like
- 34:07as we vary this distribution of edge flows.
- 34:10You could think that a particular dynamical system
- 34:13generates a wiring pattern,
- 34:17that generates firing dynamics,
- 34:19those firing dynamics determine
- 34:21some sort of information flow graph.
- 34:23And then that information flow graph
- 34:25is really a sample from that generative model.
- 34:28And we would like to be able to talk about,
- 34:30what would we expect if we knew the distribution
- 34:33of edge flows about the global structure?
- 34:35That is, we'd like to bridge global structure
- 34:37back to this distribution,
- 34:39and then ideally you would bridge that distribution back
- 34:41to the generative mechanism.
- 34:42This is a project for a future work,
- 34:45obviously this is fairly ambitious.
- 34:47However, this first point is something that you can do
- 34:51really in fairly explicit detail.
- 34:53And that's what I'd like to spell out
- 34:54with the end of this talk is
- 34:55how do you bridge global structure
- 34:58back to a distribution of edge flows?
- 35:02So, yeah, so that's the main question,
- 35:05how does the choice of distribution
- 35:06influence the expected global flow structure?
- 35:12So first, we start with the Lemma.
- 35:15Suppose that we have a distribution of edge flows
- 35:17with some expectation F bar and some covariance,
- 35:20here I'm using double bar V to denote covariance.
- 35:24We'll let S contained in the set of,
- 35:26or S be a subspace
- 35:29contained within the vector space of edge flows,
- 35:31and we'll let Ps of S be the orthogonal projector onto S.
- 35:35Then Fs of S, that's the projection F onto this subspace S,
- 35:40the expectation of its norm squared
- 35:43is the norm of the expected flow projected onto S squared.
- 35:48So this is essentially the expectation of the sample
- 35:53is the measure evaluated of the expected sample.
- 35:56And then plus a term that involves an inner product
- 35:58between the projector onto the subspace,
- 36:00and the covariance matrix for the edge flows.
- 36:02Here this denotes the matrix inner product,
- 36:04so this is just the sum overall IJ entries.
- 36:09What's nice about this formula
- 36:10is at least in terms of expectation, it reduces the study
- 36:16of the bridge between distribution
- 36:18and network structure to a study of moments, right?
- 36:22Because we've replaced the distributional problem here
- 36:24with a linear algebra problem
- 36:27that's posed in terms of this projector,
- 36:29the projector under the subspace S,
- 36:31which is determined by the topology of the network,
- 36:33and the variance in that edge flow
- 36:36which is determined by your generative model.
- 36:40Well, you might say, okay, well, (laughs) fine,
- 36:42this is a matrix inner product, we can just stop here,
- 36:44we could compute this projector,
- 36:45we could sample a whole bunch of edge flows,
- 36:47compute this covariance.
- 36:48So you can do this matrix inner product,
- 36:50but I sort of agree
- 36:51because I suspect that you can really do more
- 36:55with this sort of inner product.
- 36:57So I'd like to highlight some challenges
- 37:00associated with this inner product.
- 37:03So first, let's say, I asked you to design a distribution
- 37:06with tunable global structure.
- 37:07So for example, I said, I want you to pick
- 37:09a generative model or design a distribution of edge flows
- 37:12that when I sample edge flows from it,
- 37:14their expected structures matched some expectation.
- 37:18It's not obvious how to do that given this formula,
- 37:22it's not obvious in particular
- 37:23because these projectors,
- 37:24like the projector on the subspace S typically depend
- 37:27in fairly non-trivial ways on the graph topology.
- 37:30So small changes in the graph topology
- 37:32can completely change as projector.
- 37:34In essence, it's hard to isolate topology from distribution.
- 37:37You can think that this inner product,
- 37:39if I think about it in terms of the IJ entries,
- 37:43while easy to compute, it's not easy to interpret
- 37:47because I and J are somewhat arbitrary indexing.
- 37:49And obviously really the topology of the graph,
- 37:51it's not encoded in the indexing,
- 37:53that's encoded in the structure of these matrices.
- 37:56So in some ways what we really need is a better basis
- 37:59for computing this inner product.
- 38:01In addition, computing this inner product
- 38:03just may not be empirically feasible
- 38:05because it might not be feasible
- 38:07to estimate all these covariances.
- 38:08There's lots of settings
- 38:09where if you have a random edge flow,
- 38:11it becomes very expensive to try to estimate
- 38:13all the covariances in this graph,
- 38:14err, sorry, in this matrix
- 38:16because this matrix has as many entries
- 38:19as there are pairs of edges in the graph.
- 38:22And typically that number of edges grows fairly quickly
- 38:26in the number of nodes of the graph.
- 38:27So in the worst case,
- 38:29the size of these matrices
- 38:31goes not to the square of the number of nodes of the graph,
- 38:33but the number of nodes of the graph to the fourth,
- 38:35so this becomes very expensive very fast.
- 38:37Again, we could try to address this problem
- 38:41if we had a better basis for performing this inner product
- 38:43because we might hope to be able to truncate
- 38:46somewhere in that basis,
- 38:47and use a lower dimensional representation.
- 38:50So to build there, I'm gonna show you
- 38:52a particular family of covariances.
- 38:55We're going to start with a very simple generative model,
- 38:58so let's suppose that each node of the graph
- 39:00is assigned some set of attributes,
- 39:02here a random vector X sampled from a...
- 39:04So you can think of trait space,
- 39:05a space of possible attributes,
- 39:07and these are sampled i.i.d.
- 39:09In addition, we'll assume
- 39:10that there exists an alternating function F,
- 39:13which accepts pairs of attributes and returns a real number.
- 39:17So this is something that I can evaluate
- 39:19on the endpoints of an edge,
- 39:21and return an edge flow value.
- 39:24In this setting,
- 39:26everything that I'd shown you before simplifies.
- 39:29So if my edge flow F is drawn by first sampling
- 39:33a set of attributes,
- 39:34and then plugging those attributes
- 39:35into functions on the edges, then the
- 39:40mean edge flow is zero, so that F bar goes away,
- 39:44and the covariance reduces to this form.
- 39:46So you have a standard form where the covariance
- 39:48in the edge flow is a function of two scalar quantities,
- 39:52that's sigma squared in row.
- 39:53These are both statistics associated with this function
- 39:56and the distribution of traits.
- 39:59And then some matrices,
- 40:00so we have an identity matrix,
- 40:02and we have this gradient matrix showing up again.
- 40:05This is really nice because when you plug it back in
- 40:07to try to compute say the expected sizes of the components,
- 40:13this matrix inner product
- 40:15that I was complaining about before,
- 40:17this whole matrix inner product simplifies.
- 40:19So when you have a variance
- 40:21that's in this nice, simple canonical form,
- 40:23then the expected overall size of the edge flow,
- 40:26that's just sigma squared,
- 40:27the expected size projected onto that
- 40:30sort of conservative subspace
- 40:32that breaks into this combination
- 40:35of the sigma squared and the rho.
- 40:37Again, those are some simple statistics.
- 40:39And then V, E, and L,
- 40:41those are just sort of
- 40:42essentially dimension counting on the network.
- 40:43So this is the number of vertices, the number of edges,
- 40:47and the number of loops,
- 40:48the number of loops that's the number of edges
- 40:49minus the number of vertices plus one.
- 40:52And similarly, the expected cyclic size
- 40:55or size of the cyclic component reduces to,
- 40:57again, this sort of scalar factor
- 40:59in terms of some simple statistics
- 41:01and some dimension counting sort of
- 41:03topology related quantities.
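A small Monte Carlo sketch of this setup on a complete graph; the alternating function f below is a hypothetical example, and the expected component sizes are checked numerically rather than through the analytic formulas:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
edges = [(i, j) for i in range(n) for j in range(i + 1, n)]

# Gradient (edge-incidence) matrix: one row per edge, +1/-1 at its endpoints.
G = np.zeros((len(edges), n))
for e, (i, j) in enumerate(edges):
    G[e, i], G[e, j] = 1.0, -1.0
P = G @ np.linalg.pinv(G)                      # projector onto the "conservative" subspace

def f(a, b):                                   # hypothetical alternating function, f(a, b) = -f(b, a)
    return a - b + 0.3 * (a * b**2 - b * a**2)

sizes = []
for _ in range(5000):
    x = rng.standard_normal(n)                 # i.i.d. node attributes
    F = np.array([f(x[i], x[j]) for i, j in edges])
    Fg = P @ F
    sizes.append((Fg @ Fg, (F - Fg) @ (F - Fg)))
print(np.mean(sizes, axis=0))                  # expected gradient vs. remaining (cyclic) size
```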
- 41:07So this is very nice because this allows us
- 41:11to really separate the role of topology
- 41:13from the role of the generative model.
- 41:14The generative model determines sigma and rho,
- 41:17and topology determines these dimensions.
- 41:22It turns out that the same thing is true,
- 41:26even if you don't sample the edge flow
- 41:29using this sort of trait approach,
- 41:31but the graph is complete.
- 41:33So if your graph is complete,
- 41:34then no matter how you sample your edge flow,
- 41:37for any edge flow distribution,
- 41:38exactly the same formulas hold,
- 41:40you just replace those simple statistics
- 41:43with estimators for those statistics given your sample flow.
- 41:47And this is sort of a striking result
- 41:49because this says that this conclusion
- 41:51that was linked to some specific generative model
- 41:54with some very sort of specific assumptions, right?
- 41:56We assumed it was i.i.d. extends to all complete graphs,
- 41:59regardless of the actual distribution that we sampled from.
- 42:05Up until this point,
- 42:06this is kind of just an algebra miracle.
- 42:09And one of the things I'd like to do
- 42:10at the end of this talk is explain why this is true,
- 42:13and show how to generalize these results.
- 42:16So to build there,
- 42:17let's emphasize some of the advantages of this.
- 42:19So first, the advantages of this model,
- 42:22it's mechanistically plausible in certain settings,
- 42:24it cleanly separates the role of topology and distribution.
- 42:28And these coefficients that had to do with the topology,
- 42:30these are just dimensions,
- 42:31these are non-negative quantities.
- 42:34So it's easy to work out monotonic relationships
- 42:36between expected structure
- 42:38and simple statistics of the edge flow distribution.
- 42:44The fact that you can do that enables more general analysis.
- 42:47So what I'm showing you on the right here,
- 42:48this is from a different application area.
- 42:51This was an experiment where we trained
- 42:53a set of agents to play a game using a genetic algorithm,
- 42:58and then we looked at the expected sizes of sort of cyclic
- 43:01and acyclic components in a tournament among those agents.
- 43:05And you could actually predict these curves
- 43:08using this sort of type of structural analysis
- 43:10because it was possible to predict the dynamics
- 43:13of the simple statistics, this sigma and this rho.
- 43:17So this is a really powerful analytical tool,
- 43:20but it is limited to this particular model.
- 43:23In particular, it only models unstructured cycles.
- 43:26So if you look at the cyclic component
- 43:27generated by this model, it just looks like random noise
- 43:30that's been projected onto the range of the curl transpose.
- 43:34It's limited to correlations on adjacent edges,
- 43:36so we only generate correlations on edges
- 43:38that share an endpoint
- 43:39because you could think that all of the original
- 43:41random information comes from the endpoints.
- 43:45And then it's in some ways not general enough,
- 43:47so it lacks some expressivity.
- 43:48We can't parameterize all possible expected structures
- 43:51by picking a sigma and a rho.
- 43:54And we lack some notion of sufficiency,
- 43:56i.e. if the graph is not complete,
- 43:58then this nice algebraic property
- 44:01that it actually didn't matter what the distribution was,
- 44:03this fails to hold.
- 44:04So if the graph is not complete,
- 44:06then projection onto the family of covariances
- 44:09parameterized in this fashion
- 44:11changes the expected global structure.
- 44:15So we would like to address these limitations.
- 44:17And so our goal for the next part of this talk
- 44:19is to really generalize these results.
- 44:21To generalize, we're going to
- 44:23switch our perspective a little bit.
- 44:25So I'll recall this formula
- 44:27that if we generate our edge flow
- 44:30by sampling quantities on the endpoints,
- 44:32and then plugging them into functions on the edges,
- 44:34then you necessarily get a covariance
- 44:35that's in this two parameter family
- 44:37where I have two scalar quantities
- 44:39associated with the statistics of the edge flow.
- 44:41That's the sigma and the rho.
- 44:42And then I have some matrices that are associated
- 44:44with the topology of the network
- 44:45in the subspaces I'm projecting onto.
- 44:48These are related to a different way
- 44:51of looking at the graph.
- 44:52So I can start with my original graph
- 44:54and then I can convert it to an edge graph
- 44:57where I have one node per edge in the graph,
- 45:00and nodes are connected if they share an endpoint.
- 45:04You can then assign essentially signs to these edges
- 45:07based on whether the edge direction chosen
- 45:11in the original graph is consistent
- 45:12or inconsistent at the node that links the two edges.
- 45:16So for example, edges one and two both point into this node,
- 45:20so there's an edge that's linking
- 45:21one and two in the edge graph with a positive sign.
- 45:25This essentially tells you that the influence of
- 45:29random information assigned on this node linking one and two
- 45:33would positively correlate the sample edge flow
- 45:36on edges one and two.
- 45:38Then this form, what this form
- 45:41sort of for covariance matrices says,
- 45:43is that we're looking at families of edge flows
- 45:46that have correlations on edges sharing an endpoint.
- 45:49So edges at distance one in this edge graph,
- 45:51and non-adjacent edges are
- 45:52entirely independent of each other.
- 45:56Okay?
- 45:58So that's essentially what
- 45:59the trait performance model is doing,
- 46:01is it's parameterizing a family of covariance matrices
- 46:04where we're modeling correlations at distance one,
- 46:06but not further in the edge graph.
- 46:08So then the natural thought
- 46:09for how to generalize these results is to ask,
- 46:11can we model longer distance correlations
- 46:13through this graph?
- 46:15To do so, let's think a little bit about
- 46:17what this matrix that's showing up inside the covariance is.
- 46:21So we have a gradient, tons of gradient transpose.
- 46:24This is an effect of Laplacian for that edge graph.
- 46:30And you can do this for other motifs.
- 46:32If you think about different sort of motif constructions,
- 46:35essentially if you take a product of M transpose times M,
- 46:38that will generate something that looks like a Laplacian
- 46:41or an adjacency matrix for a graph
- 46:44where I'm assigning nodes to be motifs
- 46:47and looking at the overlap of motifs.
- 46:50And if I look at M times M transpose,
- 46:52then I'm looking at the overlap of edges via shared motifs.
- 46:55So these operators you can think
- 46:56about as being Laplacians for some sort of graph
- 46:59that's generated from the original graph motifs.
- 47:04Like any adjacency matrix,
- 47:06powers of something like GG transpose minus 2I,
- 47:11that will model connections along longer paths
- 47:14along longer distances in these graphs
- 47:16associated with motifs,
- 47:17in this case with the edge graph.
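To make that concrete on the hypothetical toy graph from before: the signed adjacency itself only touches edge pairs sharing an endpoint, while its square already links edge pairs two steps apart in the edge graph.

```python
import numpy as np

# Same hypothetical toy graph: a 4-cycle (0-1-2-3) plus a pendant edge (1, 4).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 4)]
n_nodes, n_edges = 5, len(edges)
G = np.zeros((n_edges, n_nodes))
for e, (t, h) in enumerate(edges):
    G[e, t], G[e, h] = -1.0, 1.0
A = G @ G.T - 2.0 * np.eye(n_edges)

# A is nonzero only where two edges share an endpoint; A @ A also has entries
# for edge pairs at distance two in the edge graph, for example the two
# opposite edges of the 4-cycle, which share no endpoint.
print(np.round(A, 1))
print(np.round(A @ A, 1))
```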
- 47:20So our thought is maybe,
- 47:21well, we could extend this trait performance family
- 47:23of covariance matrices by instead of only looking at
- 47:27a linear combination of the identity matrix and this matrix,
- 47:31we could look at a power series.
- 47:32So we could consider combining powers of this matrix.
- 47:37And this will generate this family of matrices
- 47:39that are parameterized by some set of coefficients-
- 47:41<v Robert>Dr. Strang?</v>
- 47:43<v ->Ah, yes?</v> <v ->I apologize (mumbles)</v>
- 47:44I just wanna remind you
- 47:46that we have a rather tight time limit,
- 47:48approximately a couple of minutes.
- 47:50<v ->Yes, of course.</v>
- 47:52So here, the idea is to parameterize this family of matrices
- 47:57by introducing a set of polynomials with coefficients alpha,
- 48:00and then plugging into the polynomial,
- 48:03the Laplacian that's generated by sort of the,
- 48:06or the adjacency matrix
- 48:08generated by the graph motifs we're interested in.
- 48:11And that trait performance result,
- 48:12that was really just looking at the first order case here,
- 48:14that was looking at a linear polynomial
- 48:17with these chosen coefficients.
- 48:20This power series model is really nice analytically,
- 48:24so if we start with some graph operator M,
- 48:28and we consider the family of covariance matrices
- 48:31generated by plugging M, M transpose
- 48:34into some polynomial or power series,
- 48:36then this family of matrices is contained
- 48:39within the span of powers of M, M transpose.
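A sketch of the resulting family on the same toy graph: truncate the power series after a few terms and plug in the signed edge-graph adjacency. The coefficients here are arbitrary illustrative choices, picked so the result stays positive semidefinite.

```python
import numpy as np

# Same hypothetical toy graph as in the earlier sketches.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 4)]
n_nodes, n_edges = 5, len(edges)
G = np.zeros((n_edges, n_nodes))
for e, (t, h) in enumerate(edges):
    G[e, t], G[e, h] = -1.0, 1.0
A = G @ G.T - 2.0 * np.eye(n_edges)

# Truncated power series p(A) = alpha_0 I + alpha_1 A + alpha_2 A^2.
# The trait performance family keeps only the first two terms; the higher
# powers are what add correlations between edges farther apart in the
# edge graph.
alpha = [1.0, 0.3, 0.1]                    # arbitrary illustrative coefficients
C = sum(a * np.linalg.matrix_power(A, k) for k, a in enumerate(alpha))

print(np.round(C, 3))
print("positive semidefinite:", bool(np.linalg.eigvalsh(C).min() >= -1e-9))
```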
- 48:45You can talk about this family
- 48:47sort of in terms of combinatorics.
- 48:48So for example, if we use that gradient
- 48:50times gradient transpose minus twice the identity,
- 48:52then powers of this are essentially, again, path counting.
- 48:55So this is counting paths of length N.
- 48:58You can also look at things like the trace of these powers.
- 49:00So if you look at the trace series,
- 49:02that's the sequence where you look at the trace
- 49:04of powers of these,
- 49:06essentially these adjacency matrices.
- 49:09This is doing some sort of loop count
- 49:11where we're counting loops of different length.
- 49:14And you could think that this trace series
- 49:15in some sense is controlling amplification
- 49:17of self-correlations within the sampled edge flow.
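Concretely, on the toy graph, the trace series is just the sequence of traces of powers of the signed edge-graph adjacency, that is, signed closed-walk counts of each length:

```python
import numpy as np

# Same hypothetical toy graph as in the earlier sketches.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 4)]
n_nodes, n_edges = 5, len(edges)
G = np.zeros((n_edges, n_nodes))
for e, (t, h) in enumerate(edges):
    G[e, t], G[e, h] = -1.0, 1.0
A = G @ G.T - 2.0 * np.eye(n_edges)

# Trace series: tr(A^k) for k = 0, 1, 2, ...  Each term is a signed count of
# closed walks of length k in the edge graph, which, loosely, is what controls
# how much self-correlation the power series model amplifies on each edge.
trace_series = [np.trace(np.linalg.matrix_power(A, k)) for k in range(6)]
print(np.round(trace_series, 1))
```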
- 49:22Depending on the generative model,
- 49:23we might wanna use different operators
- 49:25for generating this family.
- 49:26So, for example, going back to that
- 49:28synaptic plasticity model with coupled oscillators,
- 49:31in that case, using the gradient to generate
- 49:34the family of covariance matrices
- 49:35is not really the right structure
- 49:37because the dynamics of the model
- 49:39sort of have these natural cyclic connections.
- 49:43So it's better to build the power series using the curl.
- 49:46So depending on your model,
- 49:47you can adapt this power series family
- 49:49by plugging in a different graph operator.
- 49:53Let's see now, what happens if we try to compute
- 49:55the expected sizes of some components
- 49:58using a power series of this form?
- 50:00So if the variance or covariance matrix
- 50:04for our edge flow is a power series in,
- 50:06for example, the gradient times gradient transpose,
- 50:08then the expected sizes of the measures
- 50:12can all be expressed as
- 50:13linear combinations of this trace series
- 50:16and the coefficients of the original polynomial.
- 50:19For example, the expected cyclic size of the flow
- 50:21is just the polynomial evaluated at negative two
- 50:24multiplied by the number of independent loops in the graph.
- 50:26And this really generalizes that trait performance result
- 50:29because the trait performance result is given
- 50:31by restricting these polynomials to be linear.
- 50:34Okay?
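That identity is easy to verify numerically on the toy graph, which has exactly one independent loop, so the expected cyclic size should equal the polynomial evaluated at minus two; the coefficients below are the same arbitrary illustrative choices as before.

```python
import numpy as np

# Same hypothetical toy graph: one independent loop, since m - n + 1 = 1.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 4)]
n_nodes, n_edges = 5, len(edges)
G = np.zeros((n_edges, n_nodes))
for e, (t, h) in enumerate(edges):
    G[e, t], G[e, h] = -1.0, 1.0
A = G @ G.T - 2.0 * np.eye(n_edges)

# Covariance from a truncated power series p(A), arbitrary coefficients.
alpha = [1.0, 0.3, 0.1]
C = sum(a * np.linalg.matrix_power(A, k) for k, a in enumerate(alpha))

# Projector onto the cyclic (divergence-free) part of an edge flow.
P_cyc = np.eye(n_edges) - G @ np.linalg.pinv(G)

# For a mean-zero flow with covariance C, the expected squared size of the
# cyclic component is tr(P_cyc C); it equals p(-2) times the number of
# independent loops.
expected_cyclic = np.trace(P_cyc @ C)
p_at_minus_two = sum(a * (-2.0) ** k for k, a in enumerate(alpha))
n_loops = n_edges - np.linalg.matrix_rank(G)
print(expected_cyclic, p_at_minus_two * n_loops)   # the two numbers agree
```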
- 50:36This you can extend sort of to other bases,
- 50:41but really what this accomplishes is
- 50:43by generalizing trait performance,
- 50:45we achieve the sort of generic properties
- 50:50that it failed to have.
- 50:52So in particular, if I have an edge flow subspace S
- 50:56spanned by a set of flow motifs stored in some operator M,
- 50:59then the power series family of covariances
- 51:01associated with the Laplacian,
- 51:03that is with M times M transpose, is both expressive,
- 51:07in the sense that for any non-negative A and B,
- 51:11I can pick some alpha and beta
- 51:13so that the expected size of the projection of F
- 51:16onto the subspace S is A,
- 51:18and the expected size of the projection of F
- 51:19onto the subspace orthogonal to S is B,
- 51:23using a covariance in this power series family.
- 51:27And it's sufficient in the sense
- 51:29that for any edge flow distribution
- 51:31with mean zero and covariance V,
- 51:35if C is the matrix nearest to V in Frobenius norm,
- 51:38restricted to the power series family,
- 51:40then these inner products computed in terms of C
- 51:44are exactly the same as the inner products
- 51:46computed in terms of V,
- 51:47so they directly predict the structure,
- 51:49which means that if I use this power series family,
- 51:51discrepancies off of this family
- 51:54don't change the expected structure.
- 51:57Okay?
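A small numerical illustration of that sufficiency claim on the toy graph; for simplicity it projects an arbitrary covariance V onto the linear span of the first few powers (enough powers that the gradient and cyclic projectors lie in the span), so it is a sketch of the property rather than the full construction:

```python
import numpy as np

# Same hypothetical toy graph as in the earlier sketches.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 4)]
n_nodes, n_edges = 5, len(edges)
G = np.zeros((n_edges, n_nodes))
for e, (t, h) in enumerate(edges):
    G[e, t], G[e, h] = -1.0, 1.0
A = G @ G.T - 2.0 * np.eye(n_edges)

# An arbitrary "true" covariance V that is not itself in the family.
rng = np.random.default_rng(1)
B = rng.standard_normal((n_edges, n_edges))
V = B @ B.T

# Frobenius-nearest element C of span{A^0, ..., A^4}: a linear least-squares
# problem on the vectorized matrices.
powers = [np.linalg.matrix_power(A, k) for k in range(n_edges)]
M = np.column_stack([P.ravel() for P in powers])
coef, *_ = np.linalg.lstsq(M, V.ravel(), rcond=None)
C = sum(c * P for c, P in zip(coef, powers))

# Projectors onto the gradient and cyclic components of an edge flow.
P_grad = G @ np.linalg.pinv(G)
P_cyc = np.eye(n_edges) - P_grad

# The expected component sizes computed from C match those computed from V,
# so projecting onto the family does not change the expected structure.
print(np.trace(P_grad @ C), np.trace(P_grad @ V))
print(np.trace(P_cyc @ C), np.trace(P_cyc @ V))
```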
- 51:57So I know I'm short on time here,
- 51:59so I'd like to skip then just to the end of this talk.
- 52:03There's further things you can do with this,
- 52:04this is sort of really nice.
- 52:06Mathematically you can build an approximation theory
- 52:08out of this and study for different random graph families,
- 52:12how many terms in these power series you need?
- 52:15And those terms define some nice,
- 52:17sort of simple minimal set of statistics
- 52:19to try to sort of estimate structure,
- 52:22but I'd like to really just get to the end here
- 52:25and emphasize the takeaways from this talk.
- 52:28So the first half of this talk
- 52:30was focused on information flow.
- 52:32What we saw is that information flow is a non-trivial,
- 52:35but well studied, estimation problem.
- 52:37And this is something that at least on my side
- 52:38sort of is a work in progress with students.
- 52:41Here in some ways, the conclusion of that first half
- 52:43would be that causation entropy
- 52:45may be a more appropriate measure than TE
- 52:47when trying to build these flow graphs
- 52:49to apply these structural measures to.
- 52:51Then on the structural side,
- 52:53we can say that this power series family
- 52:55is a nice family of covariance matrices.
- 52:57It has nice properties that are useful empirically
- 52:59because they let us build global correlation structures
- 53:02from a sequence of local correlations
- 53:03from that power series.
- 53:06If you plug this back into the expected measures,
- 53:08you can recover monotonic relations,
- 53:10like in that limited trait performance case.
- 53:12And truncation of these power series
- 53:14reduces the number of quantities
- 53:16that you would actually need to measure.
- 53:19Actually, to a number of quantities
- 53:20that can be quite small relative to the size of the graph,
- 53:22and that's where this approximation theory comes in.
- 53:25One way, maybe, to summarize this entire approach
- 53:28is that by looking at these power series
- 53:31built in terms of the graph operators,
- 53:33we get a way to study
- 53:35inherently heterogeneous connections,
- 53:38or covariances, or edge flow distributions
- 53:41using a homogeneous correlation model
- 53:43that's built at multiple scales
- 53:45by starting at the local scale, and then looking at powers.
- 53:49In some ways this is a comment
- 53:50that I ended a previous version of this talk with.
- 53:53I still think that this structural analysis is in some ways
- 53:56a hammer seeking a nail,
- 53:57and that this information flow construction
- 53:59is work in progress to try to build that nail.
- 54:02So thank you all for your attention,
- 54:04I'll turn it now over to questions.
- 54:09<v Robert>(mumbles) really appreciate it.</v>
- 54:14Unfortunately, for those of you on Zoom,
- 54:16you're welcome to keep up the conversation,
- 54:17so (mumbles) unfortunately have to clear the room.
- 54:20So I do apologize (mumbles)
- 54:25Dr. Steinman?
- 54:27It might be interesting, yeah. (laughs)
- 54:28(students laugh)
- 54:30Dr. Strang? <v ->Oh, yes, yeah.</v>
- 54:33<v Robert>Okay, do you mind if people...?</v>
- 54:35Yeah, we have to clear the room,
- 54:36do you mind if people email you if they have questions?
- 54:40<v ->I'm sorry, I couldn't hear the end of the question.</v>
- 54:42Do I mind if...?
- 54:45<v Robert>We have to clear the room,</v>
- 54:47do you mind if people email you if they have questions?
- 54:49<v ->No, not at all.</v>
- 54:50<v Robert>(mumbles) may continue the conversation,</v>
- 54:52so I do apologize, they are literally
- 54:54just stepping in the room right now.
- 54:57<v ->Okay, no, yeah, that's totally fine.</v>
- 54:59<v Robert>Thank you, thank you.</v>
- 55:01And thanks again for a wonderful talk.
- 55:03<v ->Thank you.</v>