# YSPH Biostatistics Seminar: “Robust Mendelian Randomization in the Presence of Many Weak Instruments and Widespread Horizontal Pleiotropy”

September 21, 2022

## Information

Speaker: Ting Ye, PhD, Assistant Professor, Department of Biostatistics, University of Washington

September 20, 2022

ID8096


- 00:00<v Instructor>Good afternoon.</v>
- 00:01In respect for everybody's time today,
- 00:04let's go ahead and get started.
- 00:07So today, it is my pleasure to introduce
- 00:09Dr. Alexander Strang.
- 00:12Dr. Strang earned his bachelor's in mathematics, in physics,
- 00:16as well as his PhD in applied mathematics
- 00:19from Case Western Reserve University in Cleveland, Ohio.
- 00:24Born in Ohio, so representing.
- 00:27He studies variational inference problems,
- 00:29noise propagation in biological networks,
- 00:32self organizing edge flows,
- 00:34and functional form game theory
- 00:36at the University of Chicago,
- 00:38where he is a William H. Kruskal Instructor
- 00:40of physics and applied mathematics.
- 00:43Today, he's going to talk to us about motivic expansion
- 00:47of global information flow in spike train data.
- 00:50Let's welcome our speaker.
- 00:54<v ->Okay, thank you very much.</v>
- 00:56Thank you, first, for the kind invite,
- 00:59and for the opportunity to speak here in your seminar.
- 01:03So, I'd like to start with some acknowledgements.
- 01:06This is very much work in progress.
- 01:09Part of what I'm going to be showing you today
- 01:11is really the work of a Master's student
- 01:12that I've been working with this summer, that's Bowen,
- 01:15and really, I'd like to thank Bowen
- 01:16for a lot of the simulation,
- 01:18and a lot of the TE calculation I'll show you later.
- 01:21This project, more generally, was born out of conversations
- 01:23with Brent Doiron and Lek-Heng Lim here at Chicago.
- 01:28Brent really was the inspiration
- 01:29for starting to venture into computational neuroscience.
- 01:33I really say that I am new to this world,
- 01:35this world is exciting to me, but really it's a world
- 01:38that I am actively exploring and learning about.
- 01:42So I look forward to conversations afterwards
- 01:44to learn more here.
- 01:46My background was much more inspired
- 01:48by Lek-Heng's work in computational topology,
- 01:52and some of what I'll be presenting today
- 01:54is really inspired by conversations with him.
- 01:58So, let's start with some introduction and motivation.
- 02:01The motivation, personally, for this talk.
- 02:05So it goes back, really, to work that I started
- 02:06when I was a graduate student.
- 02:08I've had this long standing interest in the interplay
- 02:11between structure and dynamics, in particular in networks.
- 02:14I've been interested in questions like
- 02:16how does the structure of a network determine
- 02:17dynamics of processes on that network.
- 02:21And, in turn, how do processes on that network
- 02:24give rise to structure?
- 02:26On the biological side...
- 02:30On the biological side, in today's talk,
- 02:32I'm going to be focusing on applications of this question
- 02:36within neural networks.
- 02:38And I think that this world of computational neuroscience
- 02:40is really exciting if you're interested in this interplay
- 02:42between structure and dynamics,
- 02:44because neural networks encode, transmit, and process
- 02:47information via dynamical processes.
- 02:49For example, the process, the dynamical process
- 02:53of a neural network is directed by the wiring patterns,
- 02:56by the structure of that network, and moreover,
- 02:58if you're talking about some sort of learning process,
- 03:01then those wiring patterns can change and adapt
- 03:04during the learning process, so they are themselves dynamic.
- 03:08In this area, I've been interested in questions like,
- 03:10how is the flow of information governed
- 03:12by the wiring pattern,
- 03:14how do patterns of information flow present themselves
- 03:17in data, and can they be inferred from that data,
- 03:19and what types of wiring patterns
- 03:21might develop during learning.
- 03:24Answering questions of this type
- 03:25requires a couple of things.
- 03:26So the very, very big picture requires a language
- 03:29for describing structures and patterns,
- 03:31it requires having a dynamical process,
- 03:33some sort of model for the neural net,
- 03:35and it requires a generating model
- 03:38that generates initial structure,
- 03:40and links structure to dynamics.
- 03:42Alternatively, if we don't generate things using a model,
- 03:45if we have some sort of observable or data,
- 03:47then we can try to work in the other direction
- 03:49and go from dynamics to structure.
- 03:52Today, during this talk, I'm going to be focusing really
- 03:54on this first piece,
- 03:55on a language for describing structures and patterns,
- 03:57and on the second piece,
- 03:59on an observable that I've been working on
- 04:01trying to compute to use to try to connect
- 04:05these three points together.
- 04:08So, to get started, a little bit of biology.
- 04:10Really, I was inspired in this project by a paper
- 04:13from Keiji Miura.
- 04:14He was looking at a coupled oscillator model,
- 04:17this was a Kuramoto model for neural activity
- 04:20where the firing dynamics interact with the wiring.
- 04:22So the wiring that couples the oscillators
- 04:26would adapt on a slower timescale
- 04:29than the oscillators themselves,
- 04:31and that adaptation could represent
- 04:34different types of learning processes.
- 04:36For example, the fire-together wire-together rules,
- 04:40so Hebbian learning,
- 04:41you could look at causal learning rules,
- 04:43or anti-Hebbian learning rules.
- 04:45This is just an example of one, of the system.
- 04:48This system of (indistinct) is sort of interesting
- 04:50because it can generate all sorts of different patterns.
- 04:52You can see synchronized firing,
- 04:54you can see traveling waves,
- 04:55you can see chaos.
- 04:57These occur at different critical boundaries.
- 04:59So you can see phase transitions
- 05:01when you have large collections of these oscillators.
- 05:04And depending on how they're coupled together,
- 05:05it behaves differently.
- 05:07In particular, what's interesting here is that
- 05:10starting from some random seed topology,
- 05:13the dynamics play forward from that initial condition,
- 05:16and that random seed topology
- 05:18produces an ensemble of wiring patterns
- 05:20that are themselves random.
- 05:22And we can think of that ensemble of wiring patterns
- 05:24as being chaotic realizations of some random initialization.
- 05:29That said, you can also observe structures
- 05:32within the systems of coupled oscillators.
- 05:33So you can see large scale cyclic structures
- 05:36representing organized causal firing patterns
- 05:38in certain regimes.
- 05:40So this is a nice example
- 05:41where graph structure and learning parameters
- 05:43can determine dynamics, and in turn,
- 05:44where those dynamics can determine structure.
- 05:48On the other side, you can also think about
- 05:49a data-driven side instead of a model-driven side.
- 05:53If we empirically observe sample trajectories
- 05:56of some observables, for example, neuron recordings,
- 05:58then we might hope to infer something
- 05:59about the connectivity that generates them.
- 06:01And so here, instead of starting by posing a model,
- 06:04and then simulating it and setting up how it behaves,
- 06:06we can instead study data,
- 06:07or try to study structure in data.
- 06:10Often, that data comes in the form of covariance matrices
- 06:12representing firing rates.
- 06:14And these covariance matrices
- 06:15may be auto covariance matrices with some sort of time-lag.
- 06:19Here, there are a couple of standard structural approaches.
- 06:22So there's a motivic expansion approach.
- 06:25This was at least introduced by Brent Doiron's lab,
- 06:28with his student, Gabe Ocker.
- 06:30Here, the idea is that you define some graph motifs,
- 06:34and then you can study the dynamics
- 06:36in terms of those graph motifs.
- 06:38For example, if you build a power series in those motifs,
- 06:41then you can try to represent your covariance matrices
- 06:44in terms of that power series.
- 06:45And this is something we're gonna talk about
- 06:46quite a bit today.
- 06:47This really, part of why I was inspired by this work is
- 06:49I had been working separately
- 06:51on the idea of looking at covariance matrices
- 06:53in terms of these power series expansions.
- 06:56This is also connected to topological data analysis,
- 06:59and this is where the conversations with Lek-Heng
- 07:01played a role in this work.
- 07:03Topological data analysis aims to construct graphs
- 07:07representing dynamical systems.
- 07:08For example, you might look at the dynamical similarity
- 07:11of firing patterns of certain neurons,
- 07:13and then try to study the topology of those graphs.
- 07:18Again, this leads to similar questions,
- 07:20but we could be a little bit more precise here
- 07:21for thinking in neuroscience.
- 07:24We can say more precisely, for example,
- 07:25how is information processing and transfer represented,
- 07:29both in these covariance matrices
- 07:31and the structures that we hope to extract from them?
- 07:33In particular, can we try and infer causality
- 07:36from firing patterns?
- 07:39And this is fundamentally an information theoretic question.
- 07:42And really, we're asking,
- 07:43can we study the directed exchange of information
- 07:45from trajectories?
- 07:47Here, one approach, I mean, in some sense,
- 07:49you can never tell causality without some underlying model,
- 07:53without some underlying understanding and mechanism,
- 07:56so if all we can do is observe,
- 07:58then we need to define what we mean by causality.
- 08:01A reasonable standard definition here is Wiener causality,
- 08:04which says that two time series share a causal relation,
- 08:06so we say x causes y,
- 08:08if the history of x informs the future of y.
- 08:12And note that here, cause, I put in quotes,
- 08:14really means forecasts.
- 08:16It means that the past, or the present of x,
- 08:18carries relevant information about the future of y.
- 08:22A natural measure of that information is transfer entropy.
- 08:26Transfer entropy was introduced by Schreiber in 2000,
- 08:30and is the expected KL divergence
- 08:32between the distribution of the future of y
- 08:35given the history of x,
- 08:38and the marginal distribution of the future of y.
- 08:41So essentially, it's how much predictive information
- 08:43does x carry about y.
- 08:46This is a nice quantity for a couple of reasons.
- 08:48First, it's zero when two trajectories are independent.
- 08:51Second, since it's just defining
- 08:53some of these conditional distributions, it's model free,
- 08:56so I put here no with a star,
- 08:58because generative assumptions actually do matter
- 09:01when you go to try and compute it, but in principle,
- 09:02it's defined independent of the model.
- 09:05Again, unlike some other effective causality measures,
- 09:07it doesn't require introducing some time-lag to define.
- 09:11It's a naturally directed quantity.
- 09:13We can say that the future of y
- 09:15conditioned on the past of x...
- 09:17That transfer entropy is defined in terms of the future of y
- 09:20conditioned on the past of x and y.
- 09:23And that quantity is directed, because reversing x and y
- 09:27does not symmetrically change the statement.
- 09:30This is different than quantities like mutual information
- 09:32or correlation, that are also often used
- 09:34to try and measure effective connectivity in networks
- 09:37which are fundamentally symmetric quantities.
- 09:41Transfer entropy is also less corrupted
- 09:43by measurement noise, linear mixing of signals,
- 09:46or shared coupling to external sources.
- 09:50Lastly, and maybe most interestingly,
- 09:52if we think in terms of correlations,
- 09:54correlations are really moments,
- 09:56correlations are really about covariances, right,
- 09:57second order moments.
- 09:59Transfer entropies, these are about entropies,
- 10:01these are logs of distributions,
- 10:04and so they depend on the full shape of these distributions.
- 10:06Which means that transfer entropy can capture coupling
- 10:10that is maybe not apparent, or not obvious
- 10:13just looking at second order moment type analysis.
- 10:17So transfer entropy has been applied pretty broadly.
- 10:20It's been applied to spiking cortical networks
- 10:22and calcium imaging,
- 10:24to MEG data in motor tasks and auditory discrimination,
- 10:29it's been applied to emotion recognition,
- 10:31precious metal prices
- 10:32and multivariate time series forecasting,
- 10:34and more recently, to accelerate learning
- 10:36in different artificial neural nets.
- 10:38So you can look at feedforward architectures,
- 10:40convolutional architectures, even recurrent neural nets.
- 10:42And transfer entropy has been used to accelerate learning
- 10:45in those frameworks.
- 10:49For this part of the talk,
- 10:50I'd like to focus really on two questions.
- 10:52First, how do we compute transfer entropy,
- 10:55and then second, if we could compute transfer entropy,
- 10:58and build a graph out of that,
- 11:00how would we study the structure of that graph?
- 11:01Essentially, how is information flow structured?
- 11:05We'll start with computing transfer entropy.
- 11:09To compute transfer entropy,
- 11:10we actually need to write down an equation.
- 11:13So transfer entropy was originally introduced
- 11:14for discrete time arbitrary order Markov processes.
- 11:18So suppose we have two Markov processes X and Y.
- 11:20And we'll let Xn denote the state of process X at time n,
- 11:25and Xnk, where the k is in superscript,
- 11:27to denote the sequence starting from n minus k plus 1
- 11:31going up to n.
- 11:32So that's the last k states that the process X visited.
- 11:37Then, the transfer entropy from Y to X,
- 11:40here denoted T, Y to X,
- 11:44is the entropy of the future of X, conditioned on its past,
- 11:50minus the entropy of the future of X conditioned on its past
- 11:54and the past of the trajectory Y.
- 11:56So here, you can think the transfer entropy
- 11:57is essentially the reduction in entropy
- 11:59of the future states of X when incorporating the past of Y.
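Written out, the definition just described reads roughly as follows. This is a sketch of the spoken formula, not a copy of the slide: the history length $l$ for $Y$ and the superscript notation $Y_n^{(l)}$ are assumed by analogy with $X_n^{(k)}$.

$$
T_{Y \to X} \;=\; H\!\left(X_{n+1} \,\middle|\, X_n^{(k)}\right) \;-\; H\!\left(X_{n+1} \,\middle|\, X_n^{(k)},\, Y_n^{(l)}\right),
\qquad X_n^{(k)} = (X_{n-k+1}, \dots, X_n).
$$

Equivalently, this is the conditional mutual information $I\!\left(X_{n+1};\, Y_n^{(l)} \,\middle|\, X_n^{(k)}\right)$, which is why it is non-negative and vanishes when the past of $Y$ adds nothing to the prediction of $X$.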
- 12:03This means that computing transfer entropy
- 12:05reduces to estimating essentially these entropies.
- 12:07That means we need to estimate essentially
- 12:09the conditional distributions inside of these parentheses.
- 12:14That's easy for certain processes, so for example,
- 12:16if X and Y are Gaussian processes,
- 12:19then really what we're trying to compute
- 12:20is conditional mutual information,
- 12:22and there are nice equations
- 12:23for conditional mutual information
- 12:25when you have Gaussian random variables.
- 12:26So if I have three Gaussian random variables, X, Y, Z,
- 12:29possibly multivariate, with joint covariance sigma,
- 12:33then the conditional mutual information
- 12:35between these variables, so the mutual information
- 12:37between X and Y conditioned on Z,
- 12:39is just given by this ratio of log determinants
- 12:42of those covariances.
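For reference, the standard Gaussian identity being described can be written as below. The subscript notation is an assumption: $\Sigma_{XZ}$ denotes the block of the joint covariance $\Sigma$ restricted to the coordinates of $X$ and $Z$, and similarly for the other blocks.

$$
I(X; Y \mid Z) \;=\; \frac{1}{2} \log \frac{\det \Sigma_{XZ} \, \det \Sigma_{YZ}}{\det \Sigma_{Z} \, \det \Sigma_{XYZ}} .
$$

This follows from $I(X;Y \mid Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z)$ together with the Gaussian entropy $H = \tfrac{1}{2}\log\!\left((2\pi e)^d \det \Sigma\right)$; the $(2\pi e)^d$ factors cancel because the dimensions sum to zero.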
- 12:45In particular, a common test model used
- 12:48in the transfer entropy literature
- 12:51are linear auto-regressive processes,
- 12:53because a linear auto-regressive process,
- 12:55when perturbed by Gaussian noise,
- 12:57produces a Gaussian process.
- 12:58All of the different joint, marginal,
- 13:00and conditional distributions are Gaussian,
- 13:02which means that we can compute
- 13:03these covariances analytically, which then means
- 13:06that you can compute the transfer entropy analytically.
- 13:07So these linear auto-regressive processes
- 13:09are nice test cases,
- 13:10'cause you can do everything analytically.
- 13:12They're also somewhat disappointing, or somewhat limiting,
- 13:15because in this linear auto-regressive case,
- 13:17transfer entropy is the same as Granger causality.
- 13:22And in this Gaussian case, essentially what we've done
- 13:25is we've reduced transfer entropy
- 13:27to a study of time-lagged correlations.
- 13:29So this becomes the same as a correlation based analysis.
- 13:32We can't incorporate information beyond the second moments
- 13:34if we restrict ourselves to Gaussian processes,
- 13:36or Gaussian approximations.
- 13:39The other thing to note is this is strongly model dependent,
- 13:41because this particular formula
- 13:43for computing mutual information
- 13:44depends on having Gaussian distributions.
- 13:50In a more general setting, or a more empirical setting,
- 13:53you might observe some data.
- 13:55You don't know if that data
- 13:56comes from some particular process,
- 13:58and you can't necessarily assume
- 13:59the conditional distribution is Gaussian.
- 14:01But we would still like to estimate transfer entropy,
- 14:03which leads to the problem of estimating transfer entropy
- 14:06given an observed time series.
- 14:08We would like to do this, again, sans model assumptions,
- 14:11so we don't want to assume Gaussianity.
- 14:13This is sort of trivial, again, I star that,
- 14:16in discrete state spaces,
- 14:17because essentially it amounts to counting occurrences.
- 14:20But it becomes difficult whenever the state spaces are large
- 14:23and/or high dimensional, as they often are.
- 14:26This leads to a couple of different approaches.
- 14:28So, as a first example, let's consider spike train data.
- 14:32So spike train data consists, essentially,
- 14:34of binning the state of a neuron into either on or off.
- 14:39So neurons, you can think either in a state zero or one.
- 14:41And then a pairwise calculation for transfer entropy
- 14:44only requires estimating a joint probability distribution
- 14:48over 4 to the k plus l states, where k plus l,
- 14:51k is the history of x that we remember,
- 14:54and l is the history of y.
- 14:56So if the Markov process generating the spike train data
- 15:01is not of high order, does not have a long memory,
- 15:04then these k and l can be small,
- 15:06and this state space is fairly small,
- 15:08so this falls into that first category,
- 15:10when we're looking at a discrete state space
- 15:12and it's not too difficult.
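The counting approach for binned spike trains can be sketched as a plug-in estimator. This is a minimal illustration under stated assumptions, not the estimator used in the talk: the function name `transfer_entropy`, its arguments, and the toy data are all hypothetical, and no bias correction is applied.

```python
from collections import Counter
from math import log2
import random


def transfer_entropy(source, target, k=1, l=1):
    """Plug-in estimate (in bits) of the transfer entropy from source to target.

    k is the number of past target states and l the number of past source
    states that are conditioned on. Both sequences hold discrete symbols.
    """
    m = max(k, l)
    # Count occurrences of (next target state, target history, source history).
    joint = Counter()
    for n in range(m - 1, len(target) - 1):
        x_hist = tuple(target[n - k + 1 : n + 1])
        y_hist = tuple(source[n - l + 1 : n + 1])
        joint[(target[n + 1], x_hist, y_hist)] += 1
    total = sum(joint.values())

    # Marginal counts needed for the two conditional distributions.
    hist_xy, next_x, hist_x = Counter(), Counter(), Counter()
    for (x_next, x_hist, y_hist), c in joint.items():
        hist_xy[(x_hist, y_hist)] += c
        next_x[(x_next, x_hist)] += c
        hist_x[x_hist] += c

    te = 0.0
    for (x_next, x_hist, y_hist), c in joint.items():
        p_full = c / hist_xy[(x_hist, y_hist)]               # p(x_next | x_hist, y_hist)
        p_self = next_x[(x_next, x_hist)] / hist_x[x_hist]   # p(x_next | x_hist)
        te += (c / total) * log2(p_full / p_self)
    return te


# Toy data (an assumption for illustration): y is a fair coin, x copies y one step later.
rng = random.Random(0)
y = [rng.randint(0, 1) for _ in range(5000)]
x = [0] + y[:-1]

te_y_to_x = transfer_entropy(y, x)  # y drives x, so this should be close to 1 bit
te_x_to_y = transfer_entropy(x, y)  # x carries nothing new about y, so close to 0
```

The estimate is directed: swapping the arguments changes the answer, which is the property that distinguishes it from mutual information or correlation.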
- 15:15In a more general setting, if we don't try to bin the states
- 15:18of the neurons to on or off,
- 15:19for example, maybe we're looking at a firing rate model,
- 15:22where we want to look at the firing rates of the neurons,
- 15:24and that's a continuous random variable,
- 15:27then we need some other types of estimators.
- 15:29So the common estimator used here
- 15:31is a kernel density estimator, or KSG estimator.
- 15:34And this is designed for large, continuous,
- 15:36or high dimensional state spaces,
- 15:37e.g., these firing rate models.
- 15:40Typically, the approach is to employ a Takens delay map,
- 15:43which embeds your high dimensional data
- 15:45in some sort of lower dimensional space,
- 15:48that tries to capture the intrinsic dimension
- 15:50of the attractor that your dynamic process settles onto.
- 15:55And then you try to estimate an unknown density
- 15:57based on this delay map using a k-nearest neighbor
- 16:00kernel density estimate.
- 16:01The advantage of this sort of k-nearest neighbor
- 16:04kernel density is it dynamically adapts
- 16:06the width of the kernel given your sample density.
- 16:09And this has been implemented in some open source toolkits.
- 16:11These are the toolkits that we've been working with.
- 16:15So we've tested this on a couple of different models.
- 16:18And really, I'd say this work,
- 16:19this is still very much work in progress,
- 16:20this is work that Bowen was developing over the summer.
- 16:23And so we developed a couple of different models to test.
- 16:26The first were these linear auto-regressive networks,
- 16:29and we just used these
- 16:30to test the accuracy of the estimators,
- 16:32because everything here is Gaussian, so you can compute
- 16:34the necessary transfer entropies analytically.
- 16:37The next, more interesting class of networks
- 16:39are threshold linear networks, or TLNs.
- 16:42These are a firing rate model, where your rate, r,
- 16:44obeys this stochastic differential equation.
- 16:47So the rate of change in the rate, dr(t), is...
- 16:51So you have sort of a leak term, -r(t), and then plus,
- 16:55here, this is essentially a coupling,
- 16:57all of this is inside here, the brackets with a plus,
- 17:00this is like a (indistinct) function,
- 17:02so this is just taking the positive part
- 17:04of what's on the inside.
- 17:05Here, b is an activation threshold,
- 17:08W is a wiring matrix, and then r are those rates again.
- 17:11And then C here, that's essentially covariance
- 17:13for some noise term perturbing this process.
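A threshold linear network of the kind described can be simulated with a simple Euler-Maruyama loop. This is a sketch under assumptions, since the slide equation is not in the transcript: it takes the dynamics to be dr = (-r + [W r + b]_+) dt + noise, treats the sign convention for b as a guess, and replaces the noise covariance C with a single scalar amplitude. The name `simulate_tln` and its parameters are hypothetical.

```python
import random


def simulate_tln(W, b, r0, dt=0.01, steps=2000, noise=0.0, seed=0):
    """Integrate dr = (-r + [W r + b]_+) dt + noise * dB with Euler-Maruyama.

    W[i][j] is the weight from neuron j onto neuron i, b is the constant
    drive, and noise is a scalar amplitude for independent Gaussian noise
    (a simplification of a general covariance).
    """
    rng = random.Random(seed)
    r = list(r0)
    n = len(r)
    for _ in range(steps):
        # Total input to each neuron before thresholding.
        drive = [sum(W[i][j] * r[j] for j in range(n)) + b[i] for i in range(n)]
        r = [
            r[i]
            + dt * (-r[i] + max(0.0, drive[i]))        # leak plus rectified input
            + noise * (dt ** 0.5) * rng.gauss(0.0, 1.0)  # private noise term
            for i in range(n)
        ]
    return r


# Two uncoupled neurons with positive drive relax toward r = b.
uncoupled = simulate_tln([[0.0, 0.0], [0.0, 0.0]], [1.0, 1.0], [0.0, 0.0])

# A neuron with negative drive stays silent because the input is rectified.
silent = simulate_tln([[0.0]], [-1.0], [0.0])
```

Setting `noise` above zero gives a stochastic trajectory; a shared noise source would need a correlated draw across neurons, which this sketch does not model.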
- 17:17We use these TLNs to test the sensitivity
- 17:19of our transfer entropy estimators
- 17:21to common and private noise sources as you change C,
- 17:24as well as how well the transfer entropy network agrees
- 17:27with the wiring matrix.
- 17:31A particular class of TLNs that were really nice
- 17:33for these experiments are called
- 17:35combinatorial threshold linear networks.
- 17:37These are really pretty new,
- 17:38these were introduced by Carina Curto's lab this year.
- 17:42And really, this was inspired by a talk I'd seen her give
- 17:47at FACM in May.
- 17:49These are threshold linear networks
- 17:51where the weight matrix here, W,
- 17:52representing the wiring of the neurons,
- 17:55is determined by a directed graph G.
- 17:58So you start with some directed graph G,
- 18:00that's what's shown here on the left.
- 18:01This figure is adapted from Carina's paper,
- 18:03this is a very nice paper
- 18:04if you'd like to take a look at it.
- 18:07And if i and j are not connected,
- 18:10then the weight matrix is assigned one value,
- 18:12and if they are connected, then it's assigned another value.
- 18:14And the wiring is zero if i equals j.
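The rule just described, one value for connected pairs, another for unconnected pairs, and zero on the diagonal, can be sketched as below. The talk does not state the two values; the defaults here (`-1 + eps` for an edge, `-1 - delta` for no edge, with `eps=0.25` and `delta=0.5`) follow the convention I understand the CTLN literature to use and should be read as assumptions, as should the function name `ctln_weights`.

```python
def ctln_weights(n, edges, eps=0.25, delta=0.5):
    """Build a CTLN-style weight matrix from a directed graph.

    edges is an iterable of pairs (j, i) meaning a directed edge j -> i.
    W[i][j] is the weight from neuron j onto neuron i.
    """
    edge_set = set(edges)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-coupling: the diagonal stays zero
            if (j, i) in edge_set:
                W[i][j] = -1.0 + eps    # weaker inhibition where j -> i exists
            else:
                W[i][j] = -1.0 - delta  # stronger inhibition where it does not
    return W


# A directed 3-cycle 0 -> 1 -> 2 -> 0.
W_cycle = ctln_weights(3, [(0, 1), (1, 2), (2, 0)])
```

Because the matrix is a function of the graph alone, structural hypotheses about the graph translate directly into a network that can be simulated.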
- 18:18These networks are nice if we want to test
- 18:20structural hypotheses, because it's very easy to predict
- 18:24from the input graph how the output dynamics
- 18:27of the network should behave.
- 18:28There's a really beautiful analysis
- 18:30that Carina does in this paper to show
- 18:32that you can produce all these different
- 18:33interlocking patterns of limit cycles,
- 18:35and multistable states, and chaos,
- 18:37and all these nice patterns,
- 18:38and you can design them
- 18:39by picking these nice directed graphs.
- 18:44The last class of networks that we've built to test
- 18:46are leaky integrate and fire networks.
- 18:48So here, we're using a leaky integrate and fire model,
- 18:51where our wiring matrix W is drawn randomly,
- 18:54it's block-stochastic,
- 18:57which means that it's Erdos-Renyi between blocks.
- 19:00And it's a balanced network,
- 19:02so we have excitatory and inhibitory neurons
- 19:04that talk to each other and maintain a balance
- 19:08in the dynamics here.
- 19:09The hope is to pick a large enough scale network
- 19:11that we see properly chaotic dynamics
- 19:13using this leaky integrate and fire model.
- 19:17These tests have yielded fairly mixed results.
- 19:21So the simple tests behave as expected.
- 19:24So the estimators that are used are biased,
- 19:27and the bias typically decays slower
- 19:29than the variance estimation,
- 19:30which means that you do need fairly long trajectories
- 19:32to try to properly estimate the transfer entropy.
- 19:36That said, transfer entropy does correctly identify
- 19:38causal relationships in simple graphs,
- 19:40and transfer entropy matches the underlying structure
- 19:44used in combinatorial threshold linear networks, so CTLN.
- 19:49Unfortunately, these results did not carry over as cleanly
- 19:52to the leaky integrate and fire models,
- 19:54or to larger models.
- 19:56So what I'm showing you on the right here,
- 19:58this is a matrix where we've calculated
- 20:00the pairwise transfer entropy
- 20:02between all neurons in a 150 neuron balanced network.
- 20:06This is shown absolute, this is shown in the log scale.
- 20:09And the main thing I want to highlight, first,
- 20:11taking a look at this matrix,
- 20:12it's very hard to see exactly what the structure is.
- 20:15You see this banding?
- 20:17That's because neurons tend to be highly predictive
- 20:20if they fire a lot.
- 20:21So there's a strong correlation
- 20:22between the transfer entropy between x and y,
- 20:25and just the activity level of x.
- 20:29But it's hard to distinguish blockwise differences,
- 20:31for example, between inhibitory neurons, excitatory neurons,
- 20:34and that really takes plotting out,
- 20:36so here, this box and whisker plot on the bottom,
- 20:39this is showing you if we group entries of this matrix
- 20:43by type of connection.
- 20:44So maybe excitatory to excitatory,
- 20:45or inhibitory to excitatory, or so on,
- 20:48that the distribution of realized transfer entropy
- 20:50is really different,
- 20:52but they're different in sort of subtle ways.
- 20:54So in this larger scale balanced network,
- 20:58it's much less clear whether transfer entropy
- 21:02effectively is equated in some way
- 21:05with the true connectivity or wiring.
- 21:09In some ways, this is not a surprise,
- 21:10because the behavior of the balanced networks
- 21:12is inherently balanced,
- 21:13and Erdos-Renyi is inherently unstructured.
- 21:16But there are ways in which these experiments have revealed
- 21:19confounding factors that are conceptual factors
- 21:22that make transfer entropies not an ideal measure,
- 21:25or maybe not as ideal as it seems,
- 21:28given the start of this talk.
- 21:29So for example, suppose two trajectories X and Y
- 21:33are both strongly driven by a third trajectory Z,
- 21:36but X responds to Z first.
- 21:39Well, then the present information about X,
- 21:40or the present state of X,
- 21:41carries information about the future of Y,
- 21:43so X is predictive of Y.
- 21:45So X forecasts Y, so in the transfer entropy
- 21:47or Wiener causality setting, we would say X causes Y,
- 21:51even if X and Y are only both responding to Z.
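A small worked example makes the confound concrete. The setup is an assumption chosen for illustration: let $Z_n$ be independent fair bits, let $X_n = Z_{n-1}$ and $Y_n = Z_{n-2}$, and use one step of history for each process. Then $Y_{n+1} = Z_{n-1} = X_n$, so

$$
T_{X \to Y} \;=\; H(Y_{n+1} \mid Y_n) - H(Y_{n+1} \mid Y_n, X_n) \;=\; 1 - 0 \;=\; 1 \text{ bit},
$$

$$
T_{Y \to X} \;=\; H(X_{n+1} \mid X_n) - H(X_{n+1} \mid X_n, Y_n) \;=\; 1 - 1 \;=\; 0 \text{ bits}.
$$

Transfer entropy reports a full bit flowing from $X$ to $Y$ even though neither process influences the other; both simply respond to $Z$, with $X$ responding one step earlier.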
- 21:54So here, in this example, suppose you have a directed tree
- 21:58where information or dynamics propagate down the tree.
- 22:02If you look at this node here, Pj and i,
- 22:07Pj will react to essentially information
- 22:11traveling down this tree before i does,
- 22:13so Pj would be predictive for i,
- 22:15so we would observe an effective connection,
- 22:19where Pj forecasts i.
- 22:21Which means that neurons that are not directly connected
- 22:23may influence each other, and that this transfer entropy,
- 22:26really, you should think of in terms of forecasting,
- 22:29not in terms of being a direct analog to the wiring matrix.
- 22:33One way around this is to condition
- 22:35on the state of the rest of the network
- 22:37before you start doing some averaging.
- 22:39This leads to some other notions of entropy,
- 22:41so, for example, causation entropy,
- 22:42and this is sort of a promising direction,
- 22:44but it's not a direction we've had time to explore yet.
- 22:47So that's the estimation side.
- 22:49Those are the tools for estimating transfer entropy.
- 22:52Now, let's switch gears
- 22:53and talk about that second question I introduced,
- 22:55which is essentially, how do we analyze structure.
- 22:57Suppose we could calculate a transfer entropy graph.
- 23:00How would we extract structural information from that graph?
- 23:04And here, I'm going to be introducing some tools
- 23:06that I've worked on for a while
- 23:08for describing random structures and graphs.
- 23:11These are tied back to some work I've really done
- 23:15as a graduate student, and conversations with Lek-Heng.
- 23:18So we start in a really simple context,
- 23:19we just have a graph or network.
- 23:21This could be directed or undirected,
- 23:23and we're gonna require that it does not have self-loops,
- 23:24and that it's finite.
- 23:26We'll let V here be the number of vertices,
- 23:28and E be the number of edges.
- 23:30Then the object of study that we'll introduce
- 23:33is something called an edge flow.
- 23:34An edge flow is essentially a function
- 23:35on the edges of the graph.
- 23:37So this is a function that accepts pairs of endpoints
- 23:40and returns a real number.
- 23:42And this is an alternating function,
- 23:43so if I take f(i, j), that's negative f(j, i),
- 23:47because you can think of f(i, j) as being some flow,
- 23:49like a flow of material between i and j,
- 23:52hence this name, edge flow.
- 23:54This is analogous to a vector field,
- 23:56because this is analogous to the structure of a vector field
- 23:58on the graph,
- 23:59and represents some sort of flow between nodes.
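An edge flow can be represented very simply in code. The sketch below stores one value per edge and enforces the alternating property f(i, j) = -f(j, i) on lookup; the class name `EdgeFlow` and its interface are hypothetical, chosen only for illustration.

```python
class EdgeFlow:
    """An alternating real-valued function on the edges of a graph."""

    def __init__(self, flows):
        # flows maps an ordered pair (i, j) to the flow from i to j.
        # Each undirected edge should appear once, in either orientation.
        self._flows = {}
        for (i, j), value in flows.items():
            if i == j:
                raise ValueError("self-loops are not allowed")
            if (j, i) in self._flows:
                raise ValueError("edge given in both orientations")
            self._flows[(i, j)] = float(value)

    def __call__(self, i, j):
        # Look up the stored orientation, flipping the sign if needed.
        if (i, j) in self._flows:
            return self._flows[(i, j)]
        if (j, i) in self._flows:
            return -self._flows[(j, i)]
        raise KeyError(f"no edge between {i} and {j}")


# Two units flow from node 0 to node 1, half a unit flows from node 2 to node 1.
flow = EdgeFlow({(0, 1): 2.0, (1, 2): -0.5})
```

Storing a single orientation per edge means an edge flow on a graph with E edges is just a vector with E entries, which is the viewpoint used in the decomposition later in the talk.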
- 24:02Edge flows are really sort of generic things.
- 24:04So you can take this idea of an edge flow
- 24:07and apply it in a lot of different areas,
- 24:09because really all you need
- 24:10is you just need the structure of some alternating function
- 24:12on the edges of a graph.
- 24:13So I've read papers,
- 24:16and worked in a bunch of these different areas.
- 24:19Particularly, I've focused on applications of this
- 24:21in game theory, in pairwise and social choice settings,
- 24:25in biology and Markov chains.
- 24:26And a lot of this project has been attempting
- 24:28to take this experience working with edge flows in,
- 24:31for example, say, non-equilibrium thermodynamics,
- 24:34or looking at pairwise preference data,
- 24:36and looking at a different application area
- 24:38here to neuroscience.
- 24:40Really, you can think about the edge flow,
- 24:42or relevant edge flow in neuroscience,
- 24:43you might be asking about asymmetries in wiring patterns,
- 24:46or differences in directed influence or causality,
- 24:49or, really, you can think about
- 24:50these transfer entropy quantities.
- 24:51This is why I was excited about transfer entropy.
- 24:53Transfer entropy is an inherently directed notion
- 24:56of information flow, so it's natural to think that
- 24:59if you can calculate things like the transfer entropy,
- 25:01then really, what you're studying is some sort of edge flow
- 25:04on a graph.
- 25:06Edge flows often are subject to the same common questions.
- 25:10So if I want to analyze the structure of an edge flow,
- 25:12there's some really big global questions
- 25:14that I would often ask,
- 25:15that get asked in all these different application areas.
- 25:19One common question is,
- 25:20well, does the flow originate somewhere and end somewhere?
- 25:23Are there sources and sinks in the graph?
- 25:25Another is, does it circulate?
- 25:26And if it does circulate, on what scales, and where?
- 25:31If you have a network that's connected
- 25:33to a whole exterior network, for example,
- 25:35if you're looking at some small subsystem
- 25:37that's embedded in a much larger system,
- 25:38as is almost always the case in neuroscience,
- 25:41then you also need to think about
- 25:42what passes through the network.
- 25:43So, is there a flow or current
- 25:45that moves through the boundary of the network,
- 25:47and is there information that flows through
- 25:50the network that you're studying?
- 25:52And in particular, if we have these different types of flow,
- 25:55if flow can originate in sources and end in sinks,
- 25:57if it can circulate, if it can pass through,
- 25:59can we decompose the flow into pieces that do each of these
- 26:03and ask how much of the flow does 1, 2, or 3?
- 26:07Those questions lead to a decomposition.
- 26:11So here, we're going to start with a simple idea.
- 26:13We're going to decompose an edge flow
- 26:15by projecting it onto orthogonal subspaces
- 26:17associated with some graph operators.
- 26:20Generically, if we consider two linear operators,
- 26:23A and B, where the product A times B equals zero,
- 26:27then the range of B must be contained
- 26:29in the null space of A,
- 26:31which means that I can express
- 26:33essentially any set of real numbers,
- 26:35so you can think of this as being
- 26:36the vector space of possible edge flows,
- 26:39as a direct sum of the range of B,
- 26:43the range of A transpose,
- 26:45and the intersection of the null space of B transpose
- 26:47and the null space of A.
- 26:48This blue subspace, this is called the harmonic space,
- 26:53and this is trivial in many applications
- 26:58if you choose A and B correctly.
- 27:00So there's often settings where you can pick A and B
- 27:02so that these two null spaces have no intersection,
- 27:06and then this decomposition boils down
- 27:08to just separating a vector space into the range of B
- 27:13and the range of A transpose.
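
Written out, the generic decomposition described here takes the following form. This is a sketch of what the slide presumably shows: the symbol $E$ (number of edges) is borrowed from later in the talk, and the orthogonality step in the middle is the standard linear-algebra argument, filled in here as an assumption.

```latex
% Assumes A and B are real matrices with AB = 0, both acting on \mathbb{R}^E.
AB = 0
\;\Longrightarrow\;
\operatorname{range}(B) \subseteq \operatorname{null}(A)
\;\Longrightarrow\;
\operatorname{range}(B) \perp \operatorname{range}(A^{\top}),
\qquad
\mathbb{R}^{E}
= \operatorname{range}(B)
\,\oplus\, \operatorname{range}(A^{\top})
\,\oplus\, \bigl(\operatorname{null}(B^{\top}) \cap \operatorname{null}(A)\bigr).
```

When the last summand (the harmonic space) is trivial, only the first two terms remain.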
- 27:16In the graph setting, our goal is essentially
- 27:18to pick these operators to be meaningful things,
- 27:20that is, to pick graph operators
- 27:22so that these subspaces carry meaning
- 27:26in the structural context.
- 27:30So let's think a little bit about graph operators here.
- 27:33So, let's look at two different classes of operators.
- 27:35So we can consider matrices that have E rows and n columns,
- 27:40or matrices that have l rows and E columns,
- 27:43where again, E is the number of edges in this graph.
- 27:48If I have a matrix with E rows,
- 27:50then each column of the matrix has as many entries
- 27:53as there are edges in the graph,
- 27:55so it can be thought of as itself an edge flow.
- 27:57So you can think that this matrix
- 27:59is composed of a set of columns,
- 28:00where each column is some particular motivic flow,
- 28:03or flow motif.
- 28:05In contrast, if I look at a matrix where I have E columns,
- 28:09then each row of the matrix is a flow motif,
- 28:11so products against M evaluate inner products
- 28:16against specific flow motifs.
- 28:18That means in this context,
- 28:20if I look at the range of this matrix,
- 28:21this is really a linear combination
- 28:23of a specific subset of flow motifs,
- 28:25and in this context,
- 28:26if I look at the null space of the matrix,
- 28:28I'm looking at all edge flows orthogonal
- 28:30to that set of flow motifs.
- 28:32So here, if I look at the range of a matrix with E rows,
- 28:36that subspace is essentially modeling behavior
- 28:39similar to the motifs, so if I pick a set of motifs
- 28:42that flow out of a node, or flow into a node,
- 28:45then this range is going to be a subspace of edge flows
- 28:48that tend to originate in sources and end in sinks.
- 28:51In contrast, here, the null space of M,
- 28:54that's all edge flows orthogonal to the flow motifs,
- 28:57so it models behavior distinct from the motifs.
- 28:59Essentially, this space asks what doesn't the flow do,
- 29:02whereas this space asks what does the flow do.
- 29:07Here is a simple, very classical example.
- 29:09And this goes all the way back to, you can think,
- 29:11like Kirchhoff's electric circuit theory.
- 29:14We can define two operators.
- 29:15Here, G, this is essentially a gradient operator.
- 29:18And if you've taken some graph theory, you might know this
- 29:20as the edge incidence matrix.
- 29:22This is the matrix which essentially records
- 29:25the endpoints of an edge,
- 29:26and evaluates differences across it.
- 29:29So for example, if I look at this first row of G,
- 29:33this corresponds to edge I in the graph,
- 29:35and if I had a function defined on the nodes in the graph,
- 29:39products with G would evaluate differences across this edge.
- 29:43If you look at its columns,
- 29:44each column here is a flow motif.
- 29:46So for example, this highlighted second column,
- 29:49this is entries 1, -1, 0, -1,
- 29:52if you carry those back to the edges,
- 29:53that corresponds to this specific flow motif.
- 29:56So here, this gradient, its adjoint
- 29:58is essentially a divergence operator,
- 30:00which means that the flow motifs are unit inflows
- 30:03or unit outflows for specific nodes,
- 30:05like what's shown here.
- 30:07You can also introduce something like a curl operator.
- 30:10The curl operator evaluates path sums around loops.
- 30:13So this row here, for example, this is a flow motif
- 30:16corresponding to the loop labeled A in this graph.
- 30:20You could certainly imagine other operators
- 30:22built from other motifs.
- 30:23These operators are particularly nice,
- 30:25because they define principled subspaces.
- 30:28So if we apply that generic decomposition,
- 30:31then we could say that the space of possible edge flows, R^E,
- 30:34can be decomposed into the range of the gradient operator,
- 30:37the range of the curl transpose,
- 30:39and the intersection of their null spaces,
- 30:42into this harmonic space.
- 30:44This is nice, because the range of the gradient,
- 30:46that's flows that start and end somewhere,
- 30:48those are flows that are associated
- 30:49with motion (indistinct) potential.
- 30:52So these, if you're thinking physics,
- 30:53you might say that these are conservative,
- 30:55these are flows generated by voltage
- 30:57if you're looking at an electric circuit.
- 30:59These cyclic flows, well, these are the flows in the range
- 31:01of the curl transpose, and then this harmonic space,
- 31:04those are flows that enter and leave the network
- 31:06without either starting or ending at a sink or a source,
- 31:10or circulating.
- 31:11So you can think that really,
- 31:12this decomposes the space of edge flows
- 31:14into flows that start and end somewhere inside the network,
- 31:17flows that circulate within the network,
- 31:19and flows that do neither,
- 31:20i.e. flows that enter and leave the network.
- 31:22So this accomplishes that initial decomposition
- 31:25I'd set out at the start.
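
A minimal sketch of this decomposition on a small graph. The graph, the edge orientations, the choice of loops, and the example flow `f` are all hypothetical, and the sign convention for the gradient (-1 at the tail, +1 at the head) is an assumption; the talk does not specify one.

```python
import numpy as np

# Hypothetical graph: 4 nodes, 5 oriented edges given as (tail, head).
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 0)]
n_nodes, n_edges = 4, len(edges)

# Gradient (edge incidence) matrix: one row per edge, -1 at tail, +1 at head.
G = np.zeros((n_edges, n_nodes))
for e, (tail, head) in enumerate(edges):
    G[e, tail], G[e, head] = -1.0, 1.0

# Curl matrix: one row per loop; +1 if the loop follows the edge orientation,
# -1 if it runs against it. Loops: 0->1->2->0 and 2->3->0->2.
C = np.array([
    [1.0, 1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, -1.0, 1.0, 1.0],
])

def projector(M):
    """Orthogonal projector onto the column space of M."""
    return M @ np.linalg.pinv(M)

P_grad = projector(G)     # flows that start and end somewhere
P_curl = projector(C.T)   # flows that circulate

f = np.array([1.0, 2.0, -1.0, 0.5, 3.0])   # hypothetical edge flow
f_grad = P_grad @ f
f_curl = P_curl @ f
f_harm = f - f_grad - f_curl               # what is left: harmonic part

print("curl * gradient = 0:", np.allclose(C @ G, 0.0))
print("squared sizes:", f_grad @ f_grad, f_curl @ f_curl, f_harm @ f_harm)
```

On this particular graph the two loops span the whole cycle space, so the harmonic component vanishes and the squared sizes of the two remaining pieces add up to the squared norm of `f`.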
- 31:28Once we have this decomposition,
- 31:29then we can evaluate the sizes of the components
- 31:33of the decomposition to measure how much of the flow
- 31:36starts and ends somewhere, how much circulates, and so on.
- 31:39So, we can introduce these generic measures,
- 31:41where given some operator M,
- 31:44we decompose the space of edge flows
- 31:46into the range of M and the null space of M transpose,
- 31:49which means we can project f onto these subspaces,
- 31:52and then evaluate the sizes of these components,
- 31:55and that's a way of measuring how much of the flow
- 31:58behaves like the flow motifs contained in this operator,
- 32:01and how much doesn't.
- 32:04So, yeah.
- 32:05So that lets us answer this question,
- 32:07and this is the tool that we're going to be using
- 32:09as our measurable.
- 32:12Now, that's totally easy to do,
- 32:16if you're given a fixed edge flow and a fixed graph.
- 32:17If you have a fixed graph, you can build your operators,
- 32:19you choose the motifs, you have fixed edge flow,
- 32:22you just project the edge flow onto the subspaces,
- 32:24spanned by those operators, and you're done.
- 32:27However, there are many cases where
- 32:30it's worth thinking about a distribution of edge flows,
- 32:33and then expected structures given that distribution.
- 32:37So here, we're going to be considering random edge flows,
- 32:39for example, an edge flow of capital F.
- 32:41Here, I'm using capital letters to denote random quantities
- 32:43sampled from an edge flow distribution.
- 32:45So this is the distribution of possible edge flows.
- 32:47And this is worth thinking about
- 32:48because many generative models are stochastic.
- 32:51They may involve some random seed,
- 32:53or they may, for example, like that neural model,
- 32:55or a lot of these sort of neural models, be chaotic,
- 32:58so even if they are deterministic generative models,
- 33:01the output data behaves as though it's been sampled
- 33:03from a distribution.
- 33:05On the empirical side, for example,
- 33:07when we're estimating transfer entropy,
- 33:09or estimating some information flow,
- 33:11then there's always some degree of measurement error,
- 33:13or uncertainty in the estimate,
- 33:15which really means that from a Bayesian perspective,
- 33:18we should be thinking that our estimator
- 33:21is a point estimate drawn from some posterior distribution
- 33:24of edge flows, and that we're back in the setting where,
- 33:25again, we need to talk about a distribution.
- 33:28Lastly, this random edge flow setting is also
- 33:31really important if we want to compare to null hypotheses.
- 33:35Because often, if you want to compare
- 33:37to some sort of null hypothesis,
- 33:38it's helpful to have an ensemble of edge flows
- 33:41to compare against,
- 33:43which means that we would like to be able to talk about
- 33:44expected structure under varying distributional assumptions.
- 33:50If we can talk meaningfully about random edge flows,
- 33:54then really what we can start doing
- 33:56is we can start bridging the expected structure
- 33:59back to the distribution.
- 34:00So what we're looking for
- 34:01is a way of explaining generic expectations
- 34:05of what the structure will look like
- 34:07as we vary this distribution of edge flows.
- 34:10You could think that a particular dynamical system
- 34:13generates a wiring pattern, that generates firing dynamics,
- 34:19those firing dynamics determine
- 34:21some sort of information flow graph,
- 34:23and then that information flow graph
- 34:25is really a sample from that generative model,
- 34:28and we would like to be able to talk about
- 34:30what would we expect
- 34:32if we knew the distribution of edge flows
- 34:34about the global structure.
- 34:35That is, we'd like to bridge global structure
- 34:37back to this distribution.
- 34:39And then, ideally, you'd bridge that distribution
- 34:41back to the generative mechanism.
- 34:42And this is a project for future work.
- 34:45Obviously, this is fairly ambitious.
- 34:47However, this first point is something you can do
- 34:51really in fairly explicit detail,
- 34:53and that's what I would like to spell out
- 34:54with the end of this talk,
- 34:55is how do you bridge global structure
- 34:58back to a distribution of edge flows.
- 35:02So yeah.
- 35:03So that's our main question.
- 35:05How does the choice of distribution
- 35:06influence the expected global flow structure?
- 35:12So first, let's start with a lemma.
- 35:15Suppose that we have a distribution of edge flows
- 35:17with some expectation f bar, and some covariance,
- 35:20here I'm using double bar V to denote covariance.
- 35:24We'll let S contained in the set of...
- 35:27S will be a subspace contained within
- 35:29the vector space of edge flows,
- 35:31and we'll let PS be the orthogonal projector onto S.
- 35:35Then FS, that's the projection of F onto this subspace S,
- 35:40the expectation of its norm squared
- 35:43is the norm of the expected flow projected onto S squared,
- 35:48so essentially, the expected measure of a sample
- 35:53is the measure evaluated at the expected sample.
- 35:56And then plus a term that involves an inner product
- 35:58between the projector on the subspace
- 36:00and the covariance matrix for the edge flows.
- 36:02Here, this denotes the matrix inner product,
- 36:04so it is just the sum over all ij entries of the entrywise product.
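
The lemma, E‖P_S F‖² = ‖P_S f̄‖² + ⟨P_S, V⟩, can be checked numerically. In the sketch below the subspace and the flow samples are random and purely hypothetical. The identity holds exactly (up to rounding) for the empirical mean and the empirical covariance with 1/N normalization, which is why `bias=True` is used.

```python
import numpy as np

rng = np.random.default_rng(0)
n_edges, n_samples = 5, 1000

# Hypothetical subspace S: the column span of a random 5x2 matrix.
M = rng.standard_normal((n_edges, 2))
P_S = M @ np.linalg.pinv(M)          # orthogonal projector onto S

# Hypothetical edge-flow samples: correlated entries plus a nonzero mean.
mix = rng.standard_normal((n_edges, n_edges))
shift = rng.standard_normal(n_edges)
F = rng.standard_normal((n_samples, n_edges)) @ mix.T + shift

f_bar = F.mean(axis=0)                       # mean flow
V = np.cov(F, rowvar=False, bias=True)       # covariance, 1/N normalization

# Left side: average squared norm of the projected samples.
lhs = np.mean(np.sum((F @ P_S) ** 2, axis=1))

# Right side: projected mean, plus the matrix inner product <P_S, V>,
# i.e. the sum over all ij entries of P_S[i, j] * V[i, j].
rhs = np.sum((P_S @ f_bar) ** 2) + np.sum(P_S * V)

print(lhs, rhs)
```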
- 36:09What's nice about this formula is,
- 36:10at least in terms of expectation,
- 36:13it reduces this study of the bridge
- 36:17between distribution and network structure
- 36:20to a study of moments, right?
- 36:22Because we've replaced a distributional problem here
- 36:24with a linear algebra problem
- 36:27that's posed in terms of this projector,
- 36:29the projector onto the subspace S,
- 36:31which is determined by the topology of the network.
- 36:33And the variance in that edge flow,
- 36:36which is determined by your generative model.
- 36:40Well, you might say, "Okay, well, fine,
- 36:42this is a matrix inner product, we can just stop here.
- 36:44We could compute this projector.
- 36:45We could sample a whole bunch of edge flows
- 36:47to compute this covariance.
- 36:48So you can do this matrix inner product."
- 36:50But I'm sort of greedy, because I suspect
- 36:54that you can really do more with this inner product.
- 36:57So I'd like to highlight some challenges
- 37:00associated with this inner product.
- 37:03So first, let's say I asked you to design a distribution
- 37:06with tuneable global structure.
- 37:07So for example, I said I want you to
- 37:09pick a generative model,
- 37:10or design a distribution of edge flows,
- 37:12that when I sample edge flows from it,
- 37:14their expected structures match some expectation.
- 37:18It's not obvious how to do that given this formula.
- 37:22It's not obvious in particular, because these projectors,
- 37:24like the projector onto subspace S,
- 37:26typically depend in fairly non-trivial ways
- 37:29on the graph topology.
- 37:30So small changes in the graph topology
- 37:32can completely change this projector.
- 37:34In essence, it's hard to isolate topology from distribution.
- 37:37You could think that this inner product,
- 37:39if I think about it in terms of the ij entries,
- 37:43while easy to compute, is not easy to interpret,
- 37:47because i and j are somewhat arbitrary indexing.
- 37:49And obviously, really, the topology of the graph,
- 37:51it's not encoded in the indexing,
- 37:53it's encoded in the structure of these matrices.
- 37:56So in some ways, what we really need
- 37:57is a better basis for computing this inner product.
- 38:01In addition, computing this inner product
- 38:03just may not be empirically feasible,
- 38:05because it might not be feasible
- 38:07to estimate all these covariances.
- 38:08There's lots of settings where,
- 38:09if you have a random edge flow,
- 38:11it becomes very expensive to try to estimate
- 38:13all the covariances in this graph, or sorry,
- 38:15in this matrix, because this matrix has as many entries
- 38:19as there are pairs of edges in the graph.
- 38:22And typically, that number of edges grows fairly quickly
- 38:26in the number of nodes in the graph.
- 38:27So in the worst case, the size of these matrices
- 38:31grows not as the square of the number of nodes in the graph,
- 38:33but as the number of nodes in the graph to the fourth,
- 38:35so this becomes very expensive very fast.
- 38:37Again, we could try to address this problem
- 38:41if we had a better basis for performing this inner product,
- 38:43because we might hope to be able to truncate
- 38:46somewhere in that basis,
- 38:47and use a lower dimensional representation.
- 38:50So, to build there,
- 38:52I'm going to show you a particular family of covariances.
- 38:55We're going to start with a very simple generative model.
- 38:58So let's suppose that each node of the graph
- 39:00is assigned some set of attributes,
- 39:02here, a random vector X, sampled from a...
- 39:04So you can think of trait space,
- 39:05a space of possible attributes.
- 39:07And these are sampled i.i.d.
- 39:09In addition, we'll assume there exists
- 39:11an alternating function f,
- 39:13which accepts pairs of attributes,
- 39:15and returns a real number.
- 39:17So this is something that I can evaluate on the endpoints
- 39:20of an edge, and return an edge flow value.
- 39:24In this setting,
- 39:26everything that I've shown you before simplifies.
- 39:29So if my edge flow F is drawn
- 39:32by first sampling a set of attributes,
- 39:34and then plugging those attributes into functions
- 39:36on the edges, then the mean edge flow is zero,
- 39:42so that f bar goes away,
- 39:44and the covariance reduces to this form.
- 39:46So you get a standard form,
- 39:47where the covariance of the edge flow
- 39:49is a function of two scalar quantities,
- 39:52that's sigma squared and rho,
- 39:53these are both statistics associated with this function
- 39:56and the distribution of traits.
- 39:59And then some matrices, so we have an identity matrix,
- 40:02and we have this gradient matrix showing up again.
- 40:05This is really nice, because when you plug it back in,
- 40:07we try to compute, say,
- 40:08the expected sizes of the components,
- 40:13this matrix inner product
- 40:15that I was complaining about before,
- 40:17this whole matrix inner product simplifies.
- 40:19So when you have a variance
- 40:21that's in this nice, simple, canonical form,
- 40:23then the expected overall size of the edge flow,
- 40:26that's just sigma squared, the expected size
- 40:29projected onto that conservative subspace,
- 40:32that breaks into this combination
- 40:35of the sigma squared and the rho,
- 40:37again, those are some simple statistics.
- 40:39And then V, E, and L, those are just
- 40:42essentially dimension counting on the network.
- 40:44So this is the number of vertices, the number of edges,
- 40:47and the number of loops, the number of loops,
- 40:48that's the number of edges
- 40:49minus the number of vertices plus one.
- 40:52And similarly, the expected cyclic size,
- 40:55or size of the cyclic component, reduces to,
- 40:57again, this scalar factor in terms of the statistics,
- 41:01and some dimension counting topology related quantities.
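
The slide formulas are not reproduced in the transcript, so the following is a sketch under stated assumptions. It assumes the covariance has the form σ²[I + ρ(GGᵀ − 2I)], which matches the identity matrix, the gradient matrix, and the "G, G transpose minus 2I" mentioned in the talk. It also assumes a connected graph, with the cyclic space taken as everything orthogonal to the range of G. Under those assumptions the traces work out to σ²E in total, σ²[(V − 1) + 2ρL] for the conservative part, and σ²(1 − 2ρ)L for the cyclic part. The graph and the values of σ² and ρ are hypothetical.

```python
import numpy as np

# Hypothetical connected graph: 4 nodes, 5 oriented edges (tail, head).
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 0)]
n_nodes, n_edges = 4, len(edges)
n_loops = n_edges - n_nodes + 1          # edges minus vertices plus one

G = np.zeros((n_edges, n_nodes))         # gradient: -1 at tail, +1 at head
for e, (tail, head) in enumerate(edges):
    G[e, tail], G[e, head] = -1.0, 1.0

sigma2, rho = 2.0, 0.3                   # hypothetical statistics
I = np.eye(n_edges)
cov = sigma2 * (I + rho * (G @ G.T - 2.0 * I))   # assumed covariance form

P_grad = G @ np.linalg.pinv(G)           # projector onto conservative flows
P_cyc = I - P_grad                       # projector onto cyclic flows

# Expected squared sizes are matrix inner products <P, cov> = trace(P @ cov).
total = np.trace(cov)
conservative = np.trace(P_grad @ cov)
cyclic = np.trace(P_cyc @ cov)

# Closed forms: statistics (sigma2, rho) times dimension counts only.
print(total, sigma2 * n_edges)
print(conservative, sigma2 * ((n_nodes - 1) + 2.0 * rho * n_loops))
print(cyclic, sigma2 * (1.0 - 2.0 * rho) * n_loops)
```

Dividing each quantity by the number of edges gives per-edge sizes, in which case the overall size is just σ².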
- 41:08So this is very nice,
- 41:09because this allows us to really separate
- 41:12the role of topology from the role of the generative model.
- 41:14The generative model determines sigma and rho,
- 41:17and topology determines these dimensions.
- 41:22It turns out that the same thing is true
- 41:26even if you don't sample the edge flow
- 41:29using this trait approach, but the graph is complete.
- 41:33So if your graph is complete,
- 41:34then no matter how you sample your edge flow,
- 41:37for any edge flow distribution,
- 41:38exactly the same formulas hold,
- 41:40you just replace those simple statistics
- 41:43with estimators for those statistics
- 41:45given your sampled flow.
- 41:47And this is sort of a striking result,
- 41:49because this says that this conclusion
- 41:51that was linked to some specific generative model
- 41:54with some very specific assumptions, right,
- 41:56we assumed it was i.i.d.,
- 41:57extends to all complete graphs,
- 41:59regardless of the actual distribution that we sampled from.
- 42:05Up until this point,
- 42:06this is kind of just an algebra miracle.
- 42:09And one of the things I'd like to do at the end of this talk
- 42:11is explain why this is true,
- 42:13and show how to generalize these results.
- 42:16So to build there,
- 42:17let's emphasize some of the advantages of this.
- 42:19So first, the advantages of the model,
- 42:22it's mechanistically plausible in certain settings,
- 42:24it cleanly separated the role of topology and distribution,
- 42:28and these coefficients that had to do with topology,
- 42:30these are just dimensions,
- 42:31these are non-negative quantities,
- 42:34so it's easy to work out monotonic relationships
- 42:36between expected structure and simple statistics
- 42:40of the edge flow distribution.
- 42:44The fact that you can do that enables more general analysis.
- 42:47So I'm showing you on the right here,
- 42:48this is from a different application area.
- 42:51This was an experiment where we trained a set of agents
- 42:55to play a game using a genetic algorithm,
- 42:58and then we looked at the expected sizes
- 43:00of cyclic and acyclic components
- 43:02in a tournament among those agents.
- 43:05And you can actually predict these curves
- 43:08using this type of structure analysis,
- 43:10because it was possible to predict the dynamics
- 43:13of these simple statistics, this sigma and this rho.
- 43:18So this is a really powerful analytical tool,
- 43:20but it is limited to this particular model.
- 43:23In particular, it only models unstructured cycles,
- 43:26so if you look at the cyclic component
- 43:27generated by this model,
- 43:28it just looks like random noise that's been projected
- 43:31onto the range of the curl transpose.
- 43:34It's limited to correlations on adjacent edges,
- 43:36so we only generate correlations
- 43:38on edges that share an endpoint, because you could think
- 43:40that all of the original random information
- 43:42comes from the endpoints.
- 43:44And then, in some ways, it's not general enough.
- 43:47So it lacks expressivity.
- 43:48We can't parameterize all possible expected structures
- 43:51by picking sigma and rho.
- 43:54And we lack some notion of sufficiency,
- 43:56i.e. if the graph is not complete,
- 43:58then this nice algebraic property,
- 44:01that it actually didn't matter what the distribution was,
- 44:03this fails to hold.
- 44:04So if the graph is not complete,
- 44:06then projection onto the family of covariances
- 44:09parameterized in this fashion
- 44:11changes the expected global structure.
- 44:15So we would like to address these limitations.
- 44:17And so our goal for the next part of this talk
- 44:19is to really generalize these results.
- 44:21To generalize,
- 44:22we're going to switch our perspective a little bit.
- 44:25So I'll recall this formula,
- 44:27that if we generate our edge flow
- 44:30by sampling quantities on the endpoints,
- 44:32and then plugging them into functions on the edges,
- 44:34then you necessarily get a covariance
- 44:35that's in this two parameter family,
- 44:37where I have two scalar quantities
- 44:39associated with the statistics of the edge flow,
- 44:41that's this sigma and this rho,
- 44:42and then I have some matrices
- 44:43that are associated with the topology of the network
- 44:45and the subspaces I'm projecting onto.
- 44:48These are related to a different way
- 44:51of looking at the graph.
- 44:52So I can start with my original graph,
- 44:54and then I can convert it to an edge graph,
- 44:57where I have one node per edge in the graph,
- 45:00and nodes are connected if they share an endpoint.
- 45:04You can then assign essentially signs to these edges
- 45:07based on whether the edge direction
- 45:10chosen in the original graph is consistent or inconsistent
- 45:14at the node that links two edges.
- 45:16So for example, edges 1 and 2 both point in to this node,
- 45:20so there's an edge linking 1 and 2
- 45:22in the edge graph with a positive sign.
- 45:25This essentially tells you
- 45:25that the influence of random information
- 45:30assigned on this node linking 1 and 2
- 45:33would positively correlate the sample edge flow
- 45:36on edges 1 and 2.
- 45:38Then, this form, what this form for covariance matrices says
- 45:43is that we're looking at families of edge flows
- 45:46that have correlations on edges sharing an endpoint,
- 45:49so edges at distance one in this edge graph,
- 45:51and non-adjacent edges
- 45:52are entirely independent of each other.
- 45:56Okay.
- 45:58So that's essentially what the trait-performance model
- 46:00is doing, is it's parameterizing
- 46:02a family of covariance matrices,
- 46:04where we're modeling correlations at distance one,
- 46:06but not further in the edge graph.
- 46:08So then the natural thought
- 46:09for how to generalize these results is to ask,
- 46:11"Can we model longer distance correlations in this graph?"
- 46:15To do so, let's think a little bit
- 46:17about what this matrix
- 46:19that's showing up inside the covariances,
- 46:21so we have a gradient times a gradient transpose.
- 46:24This is in effect a Laplacian for that edge graph.
- 46:30And you can do this for other motifs.
- 46:32If you think about different motif constructions,
- 46:35essentially if you take a product of M transpose times M,
- 46:38that will generate something that looks like a Laplacian
- 46:41or an adjacency matrix for a graph
- 46:44where I'm assigning nodes to be motifs,
- 46:47and looking at the overlap of motifs.
- 46:50And if I look at M times M transpose,
- 46:52then I'm looking at the overlap of edges via shared motifs.
- 46:55So these operators you can think about as being Laplacians
- 46:57for some sort of graph
- 46:59that's generated from the original graph motifs.
- 47:04Like any adjacency matrix,
- 47:06powers of something like G, G transpose minus 2I,
- 47:11that would model connections along longer paths,
- 47:14along longer distances in these graphs
- 47:16associated with motifs, in this case, with the edge graph.
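
A small sketch of the signed edge graph and its powers. The path graph and its orientations are hypothetical, and the gradient sign convention (-1 at the tail, +1 at the head) is an assumption. With that convention, two edges pointing into the same node get a positive sign, matching the example in the talk.

```python
import numpy as np

# Hypothetical oriented edges (tail, head) on 5 nodes:
# edges 0 and 1 both point into node 2; edge 2 leaves node 2; edge 3 leaves node 3.
edges = [(0, 2), (1, 2), (2, 3), (3, 4)]
n_nodes, n_edges = 5, len(edges)

G = np.zeros((n_edges, n_nodes))
for e, (tail, head) in enumerate(edges):
    G[e, tail], G[e, head] = -1.0, 1.0

# G G^T has 2 on the diagonal, so subtracting 2I leaves a signed adjacency
# matrix for the edge graph: nonzero exactly when two edges share an endpoint.
A = G @ G.T - 2.0 * np.eye(n_edges)
print(A)

# Entry (i, j) of A^2 sums signed length-2 paths in the edge graph, so it
# links edges at distance two that are not directly adjacent.
A2 = A @ A
print(A2[0, 3])   # edges 0 and 3 share no endpoint, but connect via edge 2
```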
- 47:20So our thought is, maybe, well,
- 47:21we could extend this trait performance
- 47:23family of covariance matrices
- 47:25by instead of only looking at
- 47:27a linear combination of an identity matrix and this matrix,
- 47:31we could look at a power series.
- 47:32So we could consider combining powers of this matrix.
- 47:37And this would generate this family of matrices
- 47:39that are parameterized by some set of
- 47:41coefficients (indistinct)... <v ->Dr. Strang.</v>
- 47:43I apologize, I just wanted to remind you
- 47:46that we have a rather tight time limit,
- 47:48approximately a couple of minutes.
- 47:50<v ->Yes, of course.</v>
- 47:52So here, the idea is to parameterize this family of matrices
- 47:57by introducing a set of polynomials with coefficients alpha,
- 48:00and then plugging into the polynomial
- 48:03the Laplacian that's generated by...
- 48:06The adjacency matrix generated by the graph motifs
- 48:09we're interested in.
- 48:11And that trait performance result,
- 48:12that was really just looking at the first order case here,
- 48:14that was looking at a linear polynomial
- 48:17with these chosen coefficients.
- 48:20This power series model is really nice analytically.
- 48:24So if we start with some graph operator M,
- 48:28and we consider the family of covariance matrices
- 48:31generated by plugging M, M transpose into some
- 48:34polynomial or power series,
- 48:36then this family of matrices
- 48:39is contained within the span of powers of M, M transpose.
- 48:45You can talk about this family in terms of combinatorics.
- 48:48So, for example, if we use that gradient
- 48:50times gradient transpose minus twice the identity,
- 48:52then powers of this is essentially, again, path counting,
- 48:55so this is counting paths of length n.
- 48:58You can also look at things like the trace of these powers.
- 49:00So if you look at the trace series,
- 49:02that's the sequence where you look at the trace of powers
- 49:05of these, essentially, these adjacency matrices.
- 49:09This is doing some sort of loop count,
- 49:11where we're counting loops of different length.
- 49:14And you can think of this trace series, in some sense,
- 49:15as controlling amplification of self-correlations
- 49:19within the sampled edge flow.
- 49:22Depending on the generative model,
- 49:23we might want to use different operators
- 49:25for generating these families.
- 49:26So for example, going back to that synaptic plasticity model
- 49:29with coupled oscillators, in this case, using the gradient
- 49:33to generate the family of covariance matrices
- 49:35is not really the right structure,
- 49:37because the dynamics of the model
- 49:40have these natural cyclic connections.
- 49:43So it's better to build the power series using the curl.
- 49:46So depending on your model,
- 49:47you can adapt this power series family
- 49:49by plugging in a different graph operator.
- 49:53Let's see now what happens if we try to compute
- 49:55the expected sizes of some components
- 49:58using a power series of this form.
- 50:00So, if the variance, or covariance matrix for edge flow
- 50:04is a power series in, for example,
- 50:06the gradient, gradient transpose,
- 50:08then the expected sizes of the measures
- 50:12can all be expressed as linear combinations
- 50:14of this trace series
- 50:16and the coefficients of the original polynomial.
- 50:19For example, the expected cyclic size of the flow
- 50:21is just the polynomial evaluated at negative two,
- 50:24multiplied by the number of loops in the graph.
- 50:26And this, this really generalizes
- 50:28that trait performance result,
- 50:29because the trait performance result
- 50:30is given by restricting these polynomials to be linear.
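
A sketch of the cyclic-size statement, assuming the power series is taken in GGᵀ − 2I (the matrix named earlier in the talk) and that the cyclic space is the orthogonal complement of the range of G. On that space GGᵀ vanishes, so GGᵀ − 2I acts as multiplication by −2 and every power collapses to a scalar. The graph and the coefficients `alpha` are hypothetical.

```python
import numpy as np

# Hypothetical connected graph: 4 nodes, 5 oriented edges (tail, head).
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 0)]
n_nodes, n_edges = 4, len(edges)
n_loops = n_edges - n_nodes + 1

G = np.zeros((n_edges, n_nodes))
for e, (tail, head) in enumerate(edges):
    G[e, tail], G[e, head] = -1.0, 1.0

I = np.eye(n_edges)
A = G @ G.T - 2.0 * I                    # signed adjacency of the edge graph

alpha = [1.5, 0.4, 0.1]                  # hypothetical polynomial coefficients
cov = sum(a * np.linalg.matrix_power(A, k) for k, a in enumerate(alpha))

P_cyc = I - G @ np.linalg.pinv(G)        # projector onto cyclic flows
expected_cyclic = np.trace(P_cyc @ cov)

# The polynomial evaluated at -2, times the number of loops.
p_at_minus_two = sum(a * (-2.0) ** k for k, a in enumerate(alpha))
print(expected_cyclic, p_at_minus_two * n_loops)
```

Restricting `alpha` to two coefficients of the form σ² and σ²ρ gives the linear case, where the polynomial at −2 is σ²(1 − 2ρ).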
- 50:36This, you can extend to other bases.
- 50:41But really, what this accomplishes
- 50:43is by generalizing trait performance,
- 50:45we achieve these generic properties that it failed to have.
- 50:52So in particular, if I have an edge flow subspace S
- 50:56spanned by the flow motifs stored in some operator M,
- 50:59then this power series family of covariances
- 51:01associated with the Laplacian, that is, M times M transpose,
- 51:05is both expressive, in the sense that
- 51:08for any non-negative a and b,
- 51:11I can pick some alpha and beta
- 51:13so that the expected size
- 51:15of the projection of F onto the subspace S is a,
- 51:18and the projected size of F on the subspace orthogonal to S
- 51:22is b for any covariance in this power series family.
- 51:27And it's sufficient in the sense that
- 51:30for any edge flow distribution with mean zero,
- 51:32and covariance V,
- 51:35if C is the matrix nearest to V in Frobenius norm,
- 51:38restricted to the power series family,
- 51:40then these inner products computed in terms of C
- 51:44are exactly the same as inner products
- 51:46computed in terms of V,
- 51:47so they directly predict the structure,
- 51:49which means that if I use this power series family,
- 51:51discrepancies off of this family
- 51:54don't change the expected structure.
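
A sketch of the sufficiency property, taking M to be the gradient G and using a random, hypothetical covariance V. The family is represented by the powers (GGᵀ)⁰ through (GGᵀ)ᵈ, with d one less than the number of distinct eigenvalues of GGᵀ. That truncation is an implementation choice made here, since higher powers add nothing to the span. The nearest matrix in Frobenius norm is then an ordinary least-squares fit on the flattened matrices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical non-complete graph: 4 nodes, 5 oriented edges (tail, head).
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 0)]
n_nodes, n_edges = 4, len(edges)

G = np.zeros((n_edges, n_nodes))
for e, (tail, head) in enumerate(edges):
    G[e, tail], G[e, head] = -1.0, 1.0

A = G @ G.T                               # M M^T with M = G
P_S = G @ np.linalg.pinv(G)               # projector onto S = range(G)
P_perp = np.eye(n_edges) - P_S            # projector onto the complement

# Arbitrary covariance, not assumed to lie in the power series family.
B = rng.standard_normal((n_edges, n_edges))
V = B @ B.T

# Basis of the family: one flattened power of A per column.
n_distinct = len(np.unique(np.round(np.linalg.eigvalsh(A), 8)))
basis = np.stack(
    [np.linalg.matrix_power(A, k).ravel() for k in range(n_distinct)], axis=1
)

# Nearest member of the family in Frobenius norm: least squares on entries.
coef, *_ = np.linalg.lstsq(basis, V.ravel(), rcond=None)
C = (basis @ coef).reshape(n_edges, n_edges)

print(np.sum(P_S * V), np.sum(P_S * C))        # expected size inside S
print(np.sum(P_perp * V), np.sum(P_perp * C))  # expected size outside S
```

The inner products agree because both projectors are themselves polynomials in GGᵀ, so the part of V discarded by the fit is Frobenius-orthogonal to them.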
- 51:57Okay.
- 51:57So, I know I'm short on time here,
- 51:59so I'd like to skip, then, just to the end of this talk.
- 52:03There's further things you can do with this,
- 52:04this is sort of really nice mathematically.
- 52:07You can build an approximation theory out of this,
- 52:10and study it for different random graph families,
- 52:12how many terms in these power series you need.
- 52:15And those terms define
- 52:16some nice, simple, minimal set of statistics,
- 52:19to try to estimate structure.
- 52:22But I'd like to really just get to the end here,
- 52:25and emphasize the takeaways from this talk.
- 52:28So the first half of this talk
- 52:30was focused on information flow.
- 52:32What we saw is that information flow is a non-trivial,
- 52:35but well studied estimation problem.
- 52:37And this is something that, at least on my side,
- 52:38is a work in progress with students.
- 52:41Here, the, in some ways,
- 52:42the conclusion of that first half
- 52:43would be that causation entropy
- 52:45may be a more appropriate measure than TE
- 52:47when trying to build these flow graphs
- 52:49to apply these structural measures to.
- 52:51Then, on the structural side, we can say that
- 52:54power series families,
- 52:55this is a nice family of covariance matrices.
- 52:57It has nice properties that are useful empirically,
- 52:59because they let us build global correlation structures
- 53:02from a sequence of local correlations
- 53:03from that power series.
- 53:06If you plug this back into the expected measures,
- 53:08you can recover monotonic relations,
- 53:10like in that limited trait performance case.
- 53:12And truncation of these power series
- 53:14reduces the number of quantities
- 53:16that you would actually need to measure.
- 53:19Actually, to a number of quantities that can be quite small
- 53:21relative to the graph,
- 53:22and that's where this approximation theory comes in.
- 53:25One way, maybe to sort of summarize this entire approach,
- 53:28is what we've done is by looking at these power series
- 53:31built in terms of the graph operators,
- 53:33is it provides a way to study
- 53:35inherently heterogeneous connections, or covariances,
- 53:39or edge flow distributions,
- 53:41using a homogeneous correlation model
- 53:43that's built at multiple scales by starting with local scale
- 53:46and then looking at powers.
- 53:49In some ways, this is a common...
- 53:50I ended a previous version of this talk with,
- 53:53I still think that this structural analysis is,
- 53:55in some ways, a hammer seeking a nail,
- 53:57and that this information flow construction,
- 53:59this is work in progress to try and build that nail.
- 54:02So thank you all for your attention.
- 54:04I'll turn it now over to questions.
- 54:07<v Instructor>(indistinct)</v>
- 54:09Thank you so much for your talk.
- 54:11Really appreciate it.
- 54:15For those of you on Zoom,
- 54:16you're welcome to keep up the conversations,
- 54:17but unfortunately we have to clear the room,
- 54:20so I do apologize.
- 54:21But, (indistinct).
- 54:25Dr. Strang?
- 54:26Am I muted?
- 54:30Dr. Strang?
- 54:32<v ->Oh, yes, yeah.</v>
- 54:33<v Instructor>Okay, do you mind if people...</v>
- 54:35We have to clear the room, do you mind if people
- 54:37email you if they have questions?
- 54:40<v ->I'm sorry, I couldn't hear the end of the question.</v>
- 54:42Do I mind if...
- 54:45<v Instructor>We have to clear the room,</v>
- 54:47do you mind if people email you if they have questions,
- 54:49and (indistinct)... <v ->No, no, not at all.</v>
- 54:52<v Instructor>So I do apologize, they are literally</v>
- 54:54(indistinct) the room right now.
- 54:57<v ->Okay, no, yeah, that's totally fine.</v>
- 54:59<v Instructor>Thank you.</v>
- 55:01And thanks again for a wonderful talk.
- 55:03<v ->Thank you.</v>