YSPH Biostatistics Seminar: “Robust Mendelian Randomization in the Presence of Many Weak Instruments and Widespread Horizontal Pleiotropy”
September 21, 2022
Speaker: Ting Ye, PhD, Assistant Professor, Department of Biostatistics, University of Washington
September 20, 2022
- 00:00<v Instructor>Good afternoon.</v>
- 00:01In respect for everybody's time today,
- 00:04let's go ahead and get started.
- 00:07So today, it is my pleasure to introduce
- 00:09Dr. Alexander Strang.
- 00:12Dr. Strang earned his bachelor's in mathematics, in physics,
- 00:16as well as his PhD in applied mathematics
- 00:19from Case Western Reserve University in Cleveland, Ohio.
- 00:24Born in Ohio, so representing.
- 00:27He studies variational inference problems,
- 00:29noise propagation in biological networks,
- 00:32self-organizing edge flows,
- 00:34and functional form game theory
- 00:36at the University of Chicago,
- 00:38where he is a William H. Kruskal Instructor
- 00:40of physics and applied mathematics.
- 00:43Today, he's going to talk to us about motivic expansion
- 00:47of global information flow in spike train data.
- 00:50Let's welcome our speaker.
- 00:54<v ->Okay, thank you very much.</v>
- 00:56Thank you, first, for the kind invite,
- 00:59and for the opportunity to speak here in your seminar.
- 01:03So, I'd like to start with some acknowledgements.
- 01:06This is very much work in progress.
- 01:09Part of what I'm going to be showing you today
- 01:11is really the work of a Master's student
- 01:12that I've been working with this summer, that's Bowen,
- 01:15and really, I'd like to thank Bowen
- 01:16for a lot of the simulation,
- 01:18and a lot of the TE calculation I'll show you later.
- 01:21This project, more generally, was born out of conversations
- 01:23with Brent Doiron and Lek-Heng Lim here at Chicago.
- 01:28Brent really was the inspiration
- 01:29for starting to venture into computational neuroscience.
- 01:33I should really say that I am new to this world,
- 01:35this world is exciting to me, but really it's a world
- 01:38that I am actively exploring and learning about.
- 01:42So I look forward to conversations afterwards
- 01:44to learn more here.
- 01:46My background was much more inspired
- 01:48by Lek-Heng's work in computational topology,
- 01:52and some of what I'll be presenting today
- 01:54is really inspired by conversations with him.
- 01:58So, let's start with some introduction and motivation.
- 02:01The motivation, personally, for this talk.
- 02:05So it goes back, really, to work that I started
- 02:06when I was a graduate student.
- 02:08I've had this long standing interest in the interplay
- 02:11between structure and dynamics, in particular in networks.
- 02:14I've been interested in questions like
- 02:16how does the structure of a network determine
- 02:17dynamics of processes on that network.
- 02:21And, in turn, how do processes on that network
- 02:24give rise to structure?
- 02:26On the biological side...
- 02:30On the biological side, in today's talk,
- 02:32I'm going to be focusing on applications of this question
- 02:36within neural networks.
- 02:38And I think that this world of computational neuroscience
- 02:40is really exciting if you're interested in this interplay
- 02:42between structure and dynamics,
- 02:44because neural networks encode, transmit, and process
- 02:47information via dynamical processes.
- 02:49For example, the process, the dynamical process
- 02:53of a neural network is directed by the wiring patterns,
- 02:56by the structure of that network, and moreover,
- 02:58if you're talking about some sort of learning process,
- 03:01then those wiring patterns can change and adapt
- 03:04during the learning process, so they are themselves dynamic.
- 03:08In this area, I've been interested in questions like,
- 03:10how is the flow of information governed
- 03:12by the wiring pattern,
- 03:14how do patterns of information flow present themselves
- 03:17in data, and can they be inferred from that data,
- 03:19and what types of wiring patterns
- 03:21might develop during learning.
- 03:24Answering questions of this type
- 03:25requires a couple of things.
- 03:26So the very, very big picture requires a language
- 03:29for describing structures and patterns,
- 03:31it requires having a dynamical process,
- 03:33some sort of model for the neural net,
- 03:35and it requires a generating model
- 03:38that generates initial structure,
- 03:40and links structure to dynamics.
- 03:42Alternatively, if we don't generate things using a model,
- 03:45if we have some sort of observable or data,
- 03:47then we can try to work in the other direction
- 03:49and go from dynamics to structure.
- 03:52Today, during this talk, I'm going to be focusing really
- 03:54on this first piece,
- 03:55on a language for describing structures and patterns,
- 03:57and on the second piece,
- 03:59on an observable that I've been working on
- 04:01trying to compute, and to use to connect
- 04:05these three points together.
- 04:08So, to get started, a little bit of biology.
- 04:10Really, I was inspired in this project by a paper
- 04:13from Keiji Miura.
- 04:14He was looking at a coupled oscillator model,
- 04:17this was a Kuramoto model for neural activity
- 04:20where the firing dynamics interact with the wiring.
- 04:22So the wiring that couples the oscillators
- 04:26would adapt on a slower timescale
- 04:29than the oscillators themselves,
- 04:31and that adaptation could represent
- 04:34different types of learning processes.
- 04:36For example, the fire-together wire-together rules,
- 04:40so Hebbian learning,
- 04:41you could look at causal learning rules,
- 04:43or anti-Hebbian learning rules.
- 04:45This is just an example of one, of the system.
- 04:48This system of (indistinct) is sort of interesting
- 04:50because it can generate all sorts of different patterns.
- 04:52You can see synchronized firing,
- 04:54you can see traveling waves,
- 04:55you can see chaos.
- 04:57These occur at different critical boundaries.
- 04:59So you can see phase transitions
- 05:01when you have large collections of these oscillators.
- 05:04And depending on how they're coupled together,
- 05:05it behaves differently.
- 05:07In particular, what's interesting here is that
- 05:10starting from some random seed topology,
- 05:13the dynamics play forward from that initial condition,
- 05:16and that random seed topology
- 05:18produces an ensemble of wiring patterns
- 05:20that are themselves random.
- 05:22And we can think of that ensemble of wiring patterns
- 05:24as being chaotic realizations of some random initialization.
- 05:29That said, you can also observe structures
- 05:32within the systems of coupled oscillators.
- 05:33So you can see large scale cyclic structures
- 05:36representing organized causal firing patterns
- 05:38in certain regimes.
- 05:40So this is a nice example
- 05:41where graph structure and learning parameters
- 05:43can determine dynamics, and in turn,
- 05:44where those dynamics can determine structure.
- 05:48On the other side, you can also think about
- 05:49a data-driven side instead of a model-driven side.
- 05:53If we empirically observe sample trajectories
- 05:56of some observables, for example, neuron recordings,
- 05:58then we might hope to infer something
- 05:59about the connectivity that generates them.
- 06:01And so here, instead of starting by posing a model,
- 06:04and then simulating it and setting up how it behaves,
- 06:06we can instead study data,
- 06:07or try to study structure in data.
- 06:10Often, that data comes in the form of covariance matrices
- 06:12representing firing rates.
- 06:14And these covariance matrices
- 06:15may be auto covariance matrices with some sort of time-lag.
- 06:19Here, there are a couple of standard structural approaches.
- 06:22So there's a motivic expansion approach.
- 06:25This was at least introduced by Brent Doiron's lab,
- 06:28with his student, Gabe Ocker.
- 06:30Here, the idea is that you define some graph motifs,
- 06:34and then you can study the dynamics
- 06:36in terms of those graph motifs.
- 06:38For example, if you build a power series in those motifs,
- 06:41then you can try to represent your covariance matrices
- 06:44in terms of that power series.
- 06:45And this is something we're gonna talk about
- 06:46quite a bit today.
- 06:47This really, part of why I was inspired by this work is
- 06:49I had been working separately
- 06:51on the idea of looking at covariance matrices
- 06:53in terms of these power series expansions.
- 06:56This is also connected to topological data analysis,
- 06:59and this is where the conversations with Lek-Heng
- 07:01played a role in this work.
- 07:03Topological data analysis aims to construct graphs
- 07:07representing dynamical systems.
- 07:08For example, you might look at the dynamical similarity
- 07:11of firing patterns of certain neurons,
- 07:13and then try to study the topology of those graphs.
- 07:18Again, this leads to similar questions,
- 07:20but we could be a little bit more precise here
- 07:21for thinking in neuroscience.
- 07:24We can say more precisely, for example,
- 07:25how is information processing and transfer represented,
- 07:29both in these covariance matrices
- 07:31and the structures that we hope to extract from them?
- 07:33In particular, can we try and infer causality
- 07:36from firing patterns?
- 07:39And this is fundamentally an information theoretic question.
- 07:42And really, we're asking,
- 07:43can we study the directed exchange of information
- 07:45from trajectories?
- 07:47Here, one approach, I mean, in some sense,
- 07:49you can never tell causality without some underlying model,
- 07:53without some underlying understanding and mechanism,
- 07:56so if all we can do is observe,
- 07:58then we need to define what we mean by causality.
- 08:01A reasonable standard definition here is Wiener causality,
- 08:04which says that two time series share a causal relation,
- 08:06so we say x causes y,
- 08:08if the history of x informs the future of y.
- 08:12And note that here, cause, I put in quotes,
- 08:14really means forecasts.
- 08:16It means that the past, or the present of x,
- 08:18carries relevant information about the future of y.
- 08:22A natural measure of that information is transfer entropy.
- 09:26Transfer entropy was introduced by Schreiber in 2000,
- 08:30and is the expected KL divergence
- 08:32between the distribution of the future of y
- 08:35given the history of x,
- 08:38and the marginal distribution of the future of y.
- 08:41So essentially, it's how much predictive information
- 08:43does x carry about y.
- 08:46This is a nice quantity for a couple of reasons.
- 08:48First, it's zero when two trajectories are independent.
- 08:51Second, since it's just defining
- 08:53some of these conditional distributions, it's model free,
- 08:56so I put here no with a star,
- 08:58because generative assumptions actually do matter
- 09:01when you go to try and compute it, but in principle,
- 09:02it's defined independent of the model.
- 09:05Again, unlike some other effective causality measures,
- 09:07it doesn't require introducing some time-lag to define.
- 09:11It's a naturally directed quantity.
- 09:13We can say that the future of y
- 09:15conditioned on the past of x...
- 09:17That transfer entropy is defined in terms of the future of y
- 09:20conditioned on the past of x and y.
- 09:23And that quantity is directed, because reversing x and y
- 09:27does not symmetrically change the statement.
- 09:30This is different than quantities like mutual information
- 09:32or correlation, that are also often used
- 09:34to try and measure effective connectivity in networks
- 10:37but which are fundamentally symmetric quantities.
- 09:41Transfer entropy is also less corrupted
- 09:43by measurement noise, linear mixing of signals,
- 09:46or shared coupling to external sources.
- 09:50Lastly, and maybe most interestingly,
- 09:52if we think in terms of correlations,
- 09:54correlations are really moments,
- 09:56correlations are really about covariances, right,
- 09:57second order moments.
- 09:59Transfer entropies, these are about entropies,
- 10:01these are logs of distributions,
- 10:04and so they depend on the full shape of these distributions.
- 10:06Which means that transfer entropy can capture coupling
- 10:10that is maybe not apparent, or not obvious
- 10:13just looking at second order moment type analysis.
- 10:17So transfer entropy has been applied pretty broadly.
- 10:20It's been applied to spiking cortical networks
- 10:22and calcium imaging,
- 10:24to MEG data in motor tasks and auditory discrimination,
- 10:29it's been applied to emotion recognition,
- 10:31precious metal prices
- 10:32and multivariate time series forecasting,
- 10:34and more recently, to accelerate learning
- 10:36in different artificial neural nets.
- 10:38So you can look at feedforward architectures,
- 10:40convolutional architectures, even recurrent neural nets.
- 10:42And transfer entropy has been used to accelerate learning
- 10:45in those frameworks.
- 10:49For this part of the talk,
- 10:50I'd like to focus really on two questions.
- 10:52First, how do we compute transfer entropy,
- 10:55and then second, if we could compute transfer entropy,
- 10:58and build a graph out of that,
- 11:00how would we study the structure of that graph?
- 11:01Essentially, how is information flow structured?
- 11:05We'll start with computing transfer entropy.
- 11:09To compute transfer entropy,
- 11:10we actually need to write down an equation.
- 11:13So transfer entropy was originally introduced
- 11:14for discrete time arbitrary order Markov processes.
- 11:18So suppose we have two Markov processes X and Y.
- 11:20And we'll let Xn denote the state of process X at time n,
- 11:25and Xnk, where the k is in superscript,
- 11:27to denote the sequence starting from n minus k plus 1
- 11:31going up to n.
- 11:32So that's the last k states that the process X visited.
- 11:37Then, the transfer entropy from Y to X,
- 11:40they're denoted T, Y over to X,
- 11:44is the entropy of the future of X, conditioned on its past,
- 11:50minus the entropy of the future of X conditioned on its past
- 11:54and the past of the trajectory Y.
- 11:56So here, you can think the transfer entropy
- 11:57is essentially the reduction in entropy
- 11:59of the future states of X when incorporating the past of Y.
- 12:03This means that computing transfer entropy
- 12:05reduces to estimating essentially these entropies.
- 12:07That means we need to estimate essentially
- 12:09the conditional distributions inside of these parentheses.
- 12:14That's easy for certain processes, so for example,
- 12:16if X and Y are Gaussian processes,
- 12:19then really what we're trying to compute
- 12:20is conditional mutual information,
- 12:22and there are nice equations
- 12:23for conditional mutual information
- 12:25when you have Gaussian random variables.
- 12:26So if I have three Gaussian random variables, X, Y, Z,
- 12:29possibly multivariate, with joint covariance sigma,
- 12:33then the conditional mutual information
- 12:35between these variables, so the mutual information
- 12:37between X and Y conditioned on Z,
- 12:39is just given by this ratio of log determinants
- 12:42of those covariances.
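As a minimal sketch of that log-determinant computation, assuming jointly Gaussian variables with a known (or estimated) joint covariance; the function name and index convention here are illustrative, not the toolkit interface used in the talk:

```python
import numpy as np

def gaussian_cmi(sigma, ix, iy, iz):
    """Conditional mutual information I(X; Y | Z), in nats, for jointly
    Gaussian blocks of a joint covariance matrix `sigma`.
    ix, iy, iz are lists of the row/column indices of the X, Y, Z blocks."""
    def logdet(idx):
        return np.linalg.slogdet(sigma[np.ix_(idx, idx)])[1]
    return 0.5 * (logdet(ix + iz) + logdet(iy + iz)
                  - logdet(iz) - logdet(ix + iy + iz))
```

For a Gaussian process, the transfer entropy from Y to X is exactly this conditional mutual information between the future of X and the past of Y, given the past of X, which is why the linear auto-regressive test models discussed next can be handled analytically.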
- 12:45In particular, a common test model used
- 12:48in the transfer entropy literature
- 12:51are linear auto-regressive processes,
- 12:53because a linear auto-regressive process,
- 12:55when perturbed by Gaussian noise,
- 12:57produces a Gaussian process.
- 12:58All of the different joint, marginal, and
- 13:00conditional distributions are Gaussian,
- 13:02which means that we can compute
- 13:03these covariances analytically, which then means
- 13:06that you can compute the transfer entropy analytically.
- 13:07So these linear auto-regressive processes
- 13:09are nice test cases,
- 13:10'cause you can do everything analytically.
- 13:12They're also somewhat disappointing, or somewhat limiting,
- 13:15because in this linear auto-regressive case,
- 13:17transfer entropy is the same as Granger causality.
- 13:22And in this Gaussian case, essentially what we've done
- 13:25is we've reduced transfer entropy
- 13:27to a study of time-lagged correlations.
- 13:29So this becomes the same as a correlation based analysis.
- 13:32We can't incorporate information beyond the second moments
- 13:34if we restrict ourselves to Gaussian processes,
- 13:36or Gaussian approximations.
- 13:39The other thing to note is this is strongly model dependent,
- 13:41because this particular formula
- 13:43for computing mutual information
- 13:44depends on having Gaussian distributions.
- 13:50In a more general setting, or a more empirical setting,
- 13:53you might observe some data.
- 13:55You don't know if that data
- 13:56comes from some particular process,
- 13:58and you can't necessarily assume
- 13:59the conditional distribution is Gaussian.
- 14:01But we would still like to estimate transfer entropy,
- 14:03which leads to the problem of estimating transfer entropy
- 14:06given an observed time series.
- 14:08We would like to do this, again, sans model assumptions,
- 14:11so we don't want to assume Gaussianity.
- 14:13This is sort of trivial, again, I star that,
- 14:16in discrete state spaces,
- 14:17because essentially it amounts to counting occurrences.
- 14:20But it becomes difficult whenever the state spaces are large
- 14:23and/or high dimensional, as they often are.
- 14:26This leads to a couple of different approaches.
- 14:28So, as a first example, let's consider spike train data.
- 14:32So spike train data consists, essentially,
- 14:34of binning the state of a neuron into either on or off.
- 14:39So neurons, you can think either in a state zero or one.
- 14:41And then a pairwise calculation for transfer entropy
- 14:44only requires estimating a joint probability distribution
- 14:48over 4 to the k plus l states, where k plus l,
- 14:51k is the history of x that we remember,
- 14:54and l is the history of y.
- 14:56So if the Markov process generating the spike train data
- 15:01is not of high order, does not have a long memory,
- 15:04then these k and l can be small,
- 15:06and this state space is fairly small,
- 15:08so this falls into that first category,
- 15:10when we're looking at a discrete state space
- 15:12and it's not too difficult.
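Here is a minimal sketch of that counting estimator for binned 0/1 spike trains; the function name, the bits convention, and the absence of any bias correction are my own choices, not the exact code used in the project:

```python
import numpy as np
from collections import Counter

def transfer_entropy_binary(x, y, k=1, l=1):
    """Plug-in estimate of T_{Y->X} (in bits) for two binary spike trains,
    using the last k bins of x and the last l bins of y as histories."""
    x, y = np.asarray(x, dtype=int), np.asarray(y, dtype=int)
    m = max(k, l)
    joint = Counter()                      # counts of (x_future, x_past, y_past)
    for t in range(m, len(x)):
        joint[(x[t], tuple(x[t - k:t]), tuple(y[t - l:t]))] += 1
    n = sum(joint.values())
    c_xp_yp, c_xf_xp, c_xp = Counter(), Counter(), Counter()
    for (xf, xp, yp), c in joint.items():  # marginal counts for the conditionals
        c_xp_yp[(xp, yp)] += c
        c_xf_xp[(xf, xp)] += c
        c_xp[xp] += c
    te = 0.0
    for (xf, xp, yp), c in joint.items():
        p_f_xy = c / c_xp_yp[(xp, yp)]         # p(x_future | x_past, y_past)
        p_f_x = c_xf_xp[(xf, xp)] / c_xp[xp]   # p(x_future | x_past)
        te += (c / n) * np.log2(p_f_xy / p_f_x)
    return te
```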
- 15:15In a more general setting, if we don't try to bin the states
- 15:18of the neurons to on or off,
- 15:19for example, maybe we're looking at a firing rate model,
- 15:22where we want to look at the firing rates of the neurons,
- 15:24and that's a continuous random variable,
- 15:27then we need some other types of estimators.
- 15:29So the common estimator used here
- 15:31is a kernel density estimator, or KSG estimator.
- 15:34And this is designed for large, continuous,
- 15:36or high dimensional state spaces,
- 15:37e.g., these firing rate models.
- 15:40Typically, the approach is to employ a Takens delay map,
- 15:43which embeds your high dimensional data
- 15:45in some sort of lower dimensional space,
- 15:48that tries to capture the intrinsic dimension
- 15:50of the attractor that your dynamic process settles onto.
- 15:55And then you try to estimate an unknown density
- 15:57based on this delay map using a k-nearest neighbor
- 16:00kernel density estimate.
- 16:01The advantage of this sort of k-nearest neighbor
- 16:04kernel density is it dynamically adapts
- 16:06the width of the kernel given your sample density.
- 16:09And this has been implemented in some open source toolkits.
- 16:11These are the toolkits that we've been working with.
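The delay-map step on its own is simple to sketch (a minimal version, assuming a scalar time series; the embedding dimension and lag would normally be chosen by, or passed to, the toolkit):

```python
import numpy as np

def delay_embed(x, dim, tau=1):
    """Takens delay embedding: map a scalar series x into points
    [x_{t-(dim-1)*tau}, ..., x_{t-tau}, x_t] in R^dim."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])
```

The embedded points are what a k-nearest-neighbor (KSG-style) estimator then operates on; for the estimation itself we rely on those open source toolkits.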
- 16:15So we've tested this on a couple of different models.
- 16:18And really, I'd say this work,
- 16:19this is still very much work in progress,
- 16:20this is work that Bowen was developing over the summer.
- 16:23And so we developed a couple of different models to test.
- 16:26The first were these linear auto-regressive networks,
- 16:29and we just used these
- 16:30to test the accuracy of the estimators,
- 16:32because everything here is Gaussian, so you can compute
- 16:34the necessary transfer entropies analytically.
- 16:37The next, more interesting class of networks
- 16:39are threshold linear networks, or TLNs.
- 16:42These are a firing rate model, where your rate, r,
- 16:44obeys this stochastic differential equation.
- 16:47So the rate of change in the rate, dr(t), is...
- 16:51So you have sort of a leak term, -r(t), and then plus,
- 16:55here, this is essentially a coupling,
- 16:57all of this is inside here, the brackets with a plus,
- 17:00this is like a (indistinct) function,
- 17:02so this is just taking the positive part
- 17:04of what's on the inside.
- 17:05Here, b is an activation threshold,
- 17:08W is a wiring matrix, and then r are those rates again.
- 17:11And then C here, that's essentially the covariance
- 17:13for some noise term perturbing this process.
- 17:17We use these TLNs to test the sensitivity
- 17:19of our transfer entropy estimators
- 17:21to common and private noise sources as you change C,
- 17:24as well as how well the transfer entropy network agrees
- 17:27with the wiring matrix.
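A minimal Euler-Maruyama sketch of that rate equation, assuming the noise increments have covariance C dt; the parameter values and time step here are placeholders, not the ones used in the experiments:

```python
import numpy as np

def simulate_tln(W, b, C, T=10.0, dt=1e-3, r0=None, seed=0):
    """Simulate dr = (-r + [W r + b]_+) dt + dB, with Cov(dB) = C dt."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    r = np.zeros(n) if r0 is None else np.array(r0, dtype=float)
    L = np.linalg.cholesky(C)            # used to draw correlated Gaussian noise
    steps = int(T / dt)
    rates = np.empty((steps, n))
    for t in range(steps):
        drift = -r + np.maximum(W @ r + b, 0.0)   # [.]_+ is the rectification
        r = r + drift * dt + L @ rng.standard_normal(n) * np.sqrt(dt)
        rates[t] = r
    return rates
```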
- 17:31A particular class of TLNs that were really nice
- 17:33for these experiments are called
- 17:35combinatorial threshold linear networks.
- 17:37These are really pretty new,
- 17:38these were introduced by Carina Curto's lab this year.
- 17:42And really, this was inspired by a talk I'd seen her give
- 17:47at FACM in May.
- 17:49These are threshold linear networks
- 17:51where the weight matrix here, W,
- 17:52representing the wiring of the neurons,
- 17:55is determined by a directed graph G.
- 17:58So you start with some directed graph G,
- 18:00that's what's shown here on the left.
- 18:01This figure is adapted from Carina's paper,
- 18:03this is a very nice paper
- 18:04if you'd like to take a look at it.
- 18:07And if i and j are not connected,
- 18:10then the weight matrix is assigned one value,
- 18:12and if they are connected, then it's assigned another value.
- 18:14And the wiring is zero if i equals j.
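For concreteness, here is one common parameterization from the CTLN literature; the talk does not give the specific values used, so eps and delta below are illustrative defaults:

```python
import numpy as np

def ctln_weights(A, eps=0.25, delta=0.5):
    """Build a CTLN weight matrix W from a directed graph given as an
    n x n binary adjacency matrix A, with A[i, j] = 1 meaning j -> i.
      W[i, j] = -1 + eps    if j -> i in the graph,
      W[i, j] = -1 - delta  if there is no edge j -> i (and i != j),
      W[i, i] = 0.
    """
    A = np.asarray(A)
    W = np.where(A == 1, -1.0 + eps, -1.0 - delta)
    np.fill_diagonal(W, 0.0)
    return W
```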
- 18:18These networks are nice if we want to test
- 18:20structural hypotheses, because it's very easy to predict
- 18:24from the input graph how the output dynamics
- 18:27of the network should behave.
- 18:28There's a really beautiful analysis
- 18:30that Carina does in this paper to show
- 18:32that you can produce all these different
- 18:33interlocking patterns of limit cycles,
- 18:35and multistable states, and chaos,
- 18:37and all these nice patterns,
- 18:38and you can design them
- 18:39by picking these nice directed graphs.
- 18:44The last class of networks that we've built to test
- 18:46are leaky-integrate and fire networks.
- 18:48So here, we're using a leaky integrate and fire model,
- 18:51where our wiring matrix W is drawn randomly,
- 18:54it's block-stochastic,
- 18:57which means that it's Erdos-Renyi between blocks.
- 19:00And it's a balanced network,
- 19:02so we have excitatory and inhibitory neurons
- 19:04that talk to each other and maintain a balance
- 19:08in the dynamics here.
- 19:09The hope is to pick a large enough scale network
- 19:11that we see properly chaotic dynamics
- 19:13using this leaky integrate and fire model.
- 19:17These tests have yielded fairly mixed results.
- 19:21So the simple tests behave as expected.
- 19:24So the estimators that are used are biased,
- 19:27and the bias typically decays slower
- 19:29than the variance estimation,
- 19:30which means that you do need fairly long trajectories
- 19:32to try to properly estimate the transfer entropy.
- 19:36That said, transfer entropy does correctly identify
- 19:38causal relationships in simple graphs,
- 19:40and transfer entropy matches the underlying structure
- 19:44used in combinatorial threshold linear networks, so CTLN.
- 19:49Unfortunately, these results did not carry over as cleanly
- 19:52to the leaky integrate and fire models,
- 19:54or to larger models.
- 19:56So what I'm showing you on the right here,
- 19:58this is a matrix where we've calculated
- 20:00the pairwise transfer entropy
- 20:02between all neurons in a 150 neuron balanced network.
- 20:06This is shown in absolute value, on a log scale.
- 20:09And the main thing I want to highlight, first,
- 20:11taking a look at this matrix,
- 20:12it's very hard to see exactly what the structure is.
- 20:15You see this banding?
- 20:17That's because neurons tend to be highly predictive
- 20:20if they fire a lot.
- 20:21So there's a strong correlation
- 20:22between the transfer entropy between x and y,
- 20:25and just the activity level of x.
- 20:29But it's hard to distinguish blockwise differences,
- 20:31for example, between inhibitory neurons, excitatory neurons,
- 20:34and that really takes plotting out,
- 20:36so here, this box and whisker plot on the bottom,
- 20:39this is showing you if we group entries of this matrix
- 20:43by type of connection.
- 20:44So maybe excitatory to excitatory,
- 20:45or inhibitor to excitatory, or so on,
- 20:48that the distribution of realized transfer entropy
- 20:50is really different,
- 20:52but they're different in sort of subtle ways.
- 20:54So in this larger scale balanced network,
- 20:58it's much less clear whether transfer entropy
- 21:02can effectively be equated in some way
- 21:05with the true connectivity or wiring.
- 21:09In some ways, this is not a surprise,
- 21:10because the behavior of the balanced networks
- 21:12is inherently balanced,
- 21:13and Erdos-Renyi randomness is inherent in the structure.
- 21:16But there are ways in which these experiments have revealed
- 21:19confounding factors, conceptual factors,
- 21:22that make transfer entropy not an ideal measure,
- 21:25or maybe not as ideal as it seems,
- 21:28given the start of this talk.
- 21:29So for example, suppose two trajectories X and Y
- 21:33are both strongly driven by a third trajectory Z,
- 21:36but X responds to Z first.
- 21:39Well, then the present information about X,
- 21:40or the present state of X,
- 21:41carries information about the future of Y,
- 21:43so X is predictive of Y.
- 21:45So X forecasts Y, so in the transfer entropy
- 21:47or Wiener causality setting, we would say X causes Y,
- 21:51even if X and Y are only both responding to Z.
- 21:54So here, in this example, suppose you have a directed tree
- 21:58where information or dynamics propagate down the tree.
- 22:02If you look at this node here, Pj and i,
- 22:07Pj will react to essentially information
- 22:11traveling down this tree before i does,
- 22:13so Pj would be predictive for i,
- 22:15so we would observe an effective connection,
- 22:19where Pj forecasts i.
- 22:21Which means that neurons that are not directly connected
- 22:23may influence each other, and that this transfer entropy,
- 22:26really, you should think of in terms of forecasting,
- 22:29not in terms of being a direct analog to the wiring matrix.
- 22:33One way around this is to condition
- 22:35on the state of the rest of the network
- 22:37before you start doing some averaging.
- 22:39This leads to some other notions of entropy,
- 22:41so, for example, causation entropy,
- 22:42and this is sort of a promising direction,
- 22:44but it's not a direction we've had time to explore yet.
- 22:47So that's the estimation side.
- 22:49Those are the tools for estimating transfer entropy.
- 22:52Now, let's switch gears
- 22:53and talk about that second question I introduced,
- 22:55which is essentially, how do we analyze structure.
- 22:57Suppose we could calculate a transfer entropy graph.
- 23:00How would we extract structural information from that graph?
- 23:04And here, I'm going to be introducing some tools
- 23:06that I've worked on for a while
- 23:08for describing random structures and graphs.
- 23:11These are tied back to some work I've really done
- 23:15as a graduate student, and conversations with Lek-Heng.
- 23:18So we start in a really simple context,
- 23:19we just have a graph or network.
- 23:21This could be directed or undirected,
- 23:23and we're gonna require that it does not have self-loops,
- 23:24and that it's finite.
- 23:26We'll let V here be the number of vertices,
- 23:28and E be the number of edges.
- 23:30Then the object of study that we'll introduce
- 23:33is something called an edge flow.
- 23:34An edge flow is essentially a function
- 23:35on the edges of the graph.
- 23:37So this is a function that accepts pairs of endpoints
- 23:40and returns a real number.
- 23:42And this is an alternating function,
- 23:43so if I take f(i, j), that's negative f(j, i),
- 23:47because you can think of f(i, j) as being some flow,
- 23:49like a flow of material between i and j,
- 23:52hence this name, edge flow.
- 23:54This is analogous to a vector field,
- 23:56because this is analogous to the structure of a vector field
- 23:58on the graph,
- 23:59and represents some sort of flow between nodes.
- 24:02Edge flows are really sort of generic things.
- 24:04So you can take this idea of an edge flow
- 24:07and apply it in a lot of different areas,
- 24:09because really all you need
- 24:10is you just need the structure of some alternating function
- 24:12on the edges of a graph.
- 24:13So I've read papers,
- 24:16and worked in a bunch of these different areas.
- 24:19Particularly, I've focused on applications of this
- 24:21in game theory, in pairwise and social choice settings,
- 24:25in biology and Markov chains.
- 24:26And a lot of this project has been attempting
- 24:28to take this experience working with edge flows in,
- 24:31for example, say, non-equilibrium thermodynamics,
- 24:34or looking at pairwise preference data,
- 24:36and looking at a different application area
- 24:38here to neuroscience.
- 24:40Really, you can think about the edge flow,
- 24:42or relevant edge flow in neuroscience,
- 24:43you might be asking about asymmetries in wiring patterns,
- 24:46or differences in directed influence or causality,
- 24:49or, really, you can think about
- 24:50these transfer entropy quantities.
- 24:51This is why I was excited about transfer entropy.
- 24:53Transfer entropy is an inherently directed notion
- 24:56of information flow, so it's natural to think that
- 24:59if you can calculate things like the transfer entropy,
- 25:01then really, what you're studying is some sort of edge flow
- 25:04on a graph.
- 25:06Edge flows often are subject to the same common questions.
- 25:10So if I want to analyze the structure of an edge flow,
- 25:12there's some really big global questions
- 25:14that I would often ask,
- 25:15that get asked in all these different application areas.
- 25:19One common question is,
- 25:20well, does the flow originate somewhere and end somewhere?
- 25:23Are there sources and sinks in the graph?
- 25:25Another is, does it circulate?
- 25:26And if it does circulate, on what scales, and where?
- 25:31If you have a network that's connected
- 25:33to a whole exterior network, for example,
- 25:35if you're looking at some small subsystem
- 25:37that's embedded in a much larger system,
- 25:38as is almost always the case in neuroscience,
- 25:41then you also need to think about
- 25:42what passes through the network.
- 25:43So, is there a flow or current
- 25:45that moves through the boundary of the network,
- 25:47and is there information that flows through
- 25:50the network that you're studying?
- 25:52And in particular, if we have these different types of flow,
- 25:55if flow can originate in source and end in sinks,
- 25:57if it can circulate, if it can pass through,
- 25:59can we decompose the flow into pieces that do each of these
- 26:03and ask how much of the flow does 1, 2, or 3?
- 26:07Those questions lead to a decomposition.
- 26:11So here, we're going to start with a simple idea.
- 26:13We're going to decompose an edge flow
- 26:15by projecting it onto orthogonal subspaces
- 26:17associated with some graph operators.
- 26:20Generically, if we consider two linear operators,
- 26:23A and B, where the product A times B equals zero,
- 26:27then the range of B must be contained
- 26:29in the null space of A,
- 26:31which means that I can express
- 26:33essentially any set of real numbers,
- 26:35so you can think of this as being
- 26:36the vector space of possible edge flows,
- 26:39as a direct sum of the range of B,
- 26:43the range of A transpose,
- 26:45and the intersection of the null space of B transpose
- 26:47and the null space of A.
- 26:48This blue subspace, this is called the harmonic space,
- 26:53and this is trivial in many applications
- 26:58if you choose A and B correctly.
- 27:00So there's often settings where you can pick A and B
- 27:02so that these two null spaces have no intersection,
- 27:06and then this decomposition boils down
- 27:08to just separating a vector space into the range of B
- 27:13and the range of A transpose.
- 27:16In the graph setting, our goal is essentially
- 27:18to pick these operators to be meaningful things,
- 27:20that is, to pick graph operators
- 27:22so that these subspaces carry a meaningful,
- 27:26or carry meaning in the structural context.
- 27:30So let's think a little bit about graph operators here.
- 27:33So, let's look at two different classes of operators.
- 27:35So we can consider matrices that have E rows and n columns,
- 27:40or matrices that have l rows and E columns,
- 27:43where again, E is the number of edges in this graph.
- 27:48If I have a matrix with E rows,
- 27:50then each column with a matrix has as many entries
- 27:53as there are edges in the graph,
- 27:55so it can be thought of as itself an edge flow.
- 27:57So you can think that this matrix
- 27:59is composed of a set of columns,
- 28:00where each column is some particular motivic flow,
- 28:03or flow motif.
- 28:05In contrast, if I look at a matrix where I have E columns,
- 28:09then each row of the matrix is a flow motif,
- 28:11so products against M evaluate inner products
- 28:16against specific flow motifs.
- 28:18That means in this context,
- 28:20if I look at the range of this matrix,
- 28:21this is really a linear combination
- 28:23of a specific subset of flow motifs,
- 28:25and in this context,
- 28:26if I look at the null space of the matrix,
- 28:28I'm looking at all edge flows orthogonal
- 28:30to that set of flow motifs.
- 28:32So here, if I look at the range of a matrix with E rows,
- 28:36that subspace is essentially modeling behavior
- 28:39similar to the motifs, so if I pick a set of motifs
- 28:42that flow out of a node, or flow into a node,
- 28:45then this range is going to be a subspace of edge flows
- 28:48that tend to originate in sources and end in sinks.
- 28:51In contrast, here, the null space of M,
- 28:54that's all edge flows orthogonal to the flow motifs,
- 28:57so it models behavior distinct from the motifs.
- 28:59Essentially, this space asks what doesn't the flow do,
- 29:02whereas this space asks what does the flow do.
- 29:07Here is a simple, very classical example.
- 29:09And this goes all the way back to, you can think,
- 29:11like Kirchhoff electric circuit theory.
- 29:14We can define two operators.
- 29:15Here, G, this is essentially a gradient operator.
- 29:18And if you've taken some graph theory, you might know this
- 29:20as the edge incidence matrix.
- 29:22This is the matrix which essentially records
- 29:25the endpoints of an edge,
- 29:26and evaluates differences across it.
- 29:29So for example, if I look at this first row of G,
- 29:33this corresponds to edge I in the graph,
- 29:35and if I had a function defined on the nodes in the graph,
- 29:39products with G would evaluate differences across this edge.
- 29:43If you look at its columns,
- 29:44each column here is a flow motif.
- 29:46So for example, this highlighted second column,
- 29:49this is entries 1, -1, 0, -1,
- 29:52if you carry those back to the edges,
- 29:53that corresponds to this specific flow motif.
- 29:56So here, this gradient, its adjoint is
- 29:58essentially a divergence operator,
- 30:00which means that the flow motifs are unit in-flows
- 30:03or unit out-flows at specific nodes,
- 30:05like what's shown here.
- 30:07You can also introduce something like a curl operator.
- 30:10The curl operator evaluates path sums around loops.
- 30:13So this row here, for example, this is a flow motif
- 30:16corresponding to the loop labeled A in this graph.
- 30:20You could certainly imagine other operators
- 30:22built from other motifs.
- 30:23These operators are particularly nice,
- 30:25because they define principled subspaces.
- 30:28So if we apply that generic decomposition,
- 30:31then we could say that the space of possible edge flows, RE,
- 30:34can be decomposed into the range of the gradient operator,
- 30:37the range of the curl transpose,
- 30:39and the intersection of their null spaces,
- 30:42into this harmonic space.
- 30:44This is nice, because the range of the gradient,
- 30:46that's flows that start and end somewhere,
- 30:48those are flows that are associated
- 30:49with motion (indistinct) potential.
- 30:52So these, if you're thinking physics,
- 30:53you might say that these are conservative,
- 30:55these are flows generated by voltage
- 30:57if you're looking at an electric circuit.
- 30:59These cyclic flows, while these are the flows and range
- 31:01of the curl transpose, and then this harmonic space,
- 31:04those are flows that enter and leave the network
- 31:06without either starting or ending at a sink or a source,
- 31:10or circulating.
- 31:11So you can think that really,
- 31:12this decomposes the space of edge flows
- 31:14into flows that start and end somewhere inside the network,
- 31:17flows that circulate within the network,
- 31:19and flows that do neither,
- 31:20i.e. flows that enter and leave the network.
- 31:22So this accomplishes that initial decomposition
- 31:25I'd set out at the start.
- 31:28Once we have this decomposition,
- 31:29then we can evaluate the sizes of the components
- 31:33of the decomposition to measure how much of the flow
- 31:36starts and ends somewhere, how much circulates, and so on.
- 31:39So, we can introduce these generic measures,
- 31:41where given some operator M,
- 31:44we decompose the space of edge flows
- 31:46into the range of M and the null space of M transpose,
- 31:49which means we can project f onto these subspaces,
- 31:52and then evaluate the sizes of these components,
- 31:55and that's a way of measuring how much of the flow
- 31:58behaves like the flow motifs contained in this operator,
- 32:01and how much doesn't.
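A minimal sketch of those measures, assuming the graph is given as a directed edge list; the example graph and names here are mine, not the code used in the project:

```python
import numpy as np

def incidence_matrix(edges, n_nodes):
    """Gradient / edge-incidence operator G: one row per directed edge (i, j),
    so that G @ phi evaluates the difference phi[j] - phi[i] across each edge."""
    G = np.zeros((len(edges), n_nodes))
    for e, (i, j) in enumerate(edges):
        G[e, i], G[e, j] = -1.0, 1.0
    return G

def flow_components(f, M):
    """Project an edge flow f onto range(M) and its orthogonal complement,
    and return the squared sizes of the two pieces."""
    coef, *_ = np.linalg.lstsq(M, f, rcond=None)
    f_like = M @ coef            # part that behaves like the motifs in M
    f_unlike = f - f_like        # part orthogonal to those motifs
    return f_like @ f_like, f_unlike @ f_unlike

# Example: a triangle with one dangling edge, and a flow that just circulates
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
G = incidence_matrix(edges, 4)
f = np.array([1.0, 1.0, 1.0, 0.0])
grad_size, rest_size = flow_components(f, G)   # grad_size is ~0 for this flow
```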
- 32:04So, yeah.
- 32:05So that lets us answer this question,
- 32:07and this is the tool that we're going to be using
- 32:09as our measurable.
- 32:12Now, that's totally easy to do,
- 32:16if you're given a fixed edge flow and a fixed graph.
- 32:17If you have a fixed graph, you can build your operators,
- 32:19you choose the motifs, you have fixed edge flow,
- 32:22you just project the edge flow onto the subspaces,
- 32:24spanned by those operators, and you're done.
- 32:27However, there are many cases where
- 32:30it's worth thinking about a distribution of edge flows,
- 32:33and then expected structures given that distribution.
- 32:37So here, we're going to be considering random edge flows,
- 32:39for example, an edge flow of capital F.
- 32:41Here, I'm using capital letters to denote random quantities
- 32:43sampled from an edge flow distribution.
- 32:45So this is the distribution of possible edge flows.
- 32:47And this is worth thinking about
- 32:48because many generative models are stochastic.
- 32:51They may involve some random seed,
- 32:53or they may, for example, like that neural model,
- 32:55or a lot of these sort of neural models, be chaotic,
- 32:58so even if they are deterministic generative models,
- 33:01the output data behaves as though it's been sampled
- 33:03from a distribution.
- 33:05On the empirical side, for example,
- 33:07when we're estimating transfer entropy,
- 33:09or estimating some information flow,
- 33:11then there's always some degree of measurement error,
- 33:13or uncertainty in the estimate,
- 33:15which really means that from a Bayesian perspective,
- 33:18we should be thinking that our estimator
- 33:21is a point estimate drawn from some posterior distribution
- 33:24of edge flows, and that we're back in the setting where,
- 33:25again, we need to talk about a distribution.
- 33:28Lastly, this random edge flow setting is also
- 33:31really important if we want to compare the null hypotheses.
- 33:35Because often, if you want to compare
- 33:37to some sort of null hypothesis,
- 33:38it's helpful to have an ensemble of edge flows
- 33:41to compare against,
- 33:43which means that we would like to be able to talk about
- 33:44expected structure under varying distributional assumptions.
- 33:50If we can talk meaningfully about random edge flows,
- 33:54then really what we can start doing
- 33:56is we can start bridging the expected structure
- 33:59back to the distribution.
- 34:00So what we're looking for
- 34:01is a way of explaining generic expectations
- 34:05of what the structure will look like
- 34:07as we vary this distribution of edge flows.
- 34:10You could think that a particular dynamical system
- 34:13generates a wiring pattern, that generates firing dynamics,
- 34:19those firing dynamics determine
- 34:21some sort of information flow graph,
- 34:23and then that information flow graph
- 34:25is really a sample from that generative model,
- 34:28and we would like to be able to talk about
- 34:30what we would expect
- 34:32about the global structure
- 34:34if we knew the distribution of edge flows.
- 34:35That is, we'd like to bridge global structure
- 34:37back to this distribution.
- 34:39And then, ideally, you'd bridge that distribution
- 34:41back to the generative mechanism.
- 34:42And this is a project for future work.
- 34:45Obviously, this is fairly ambitious.
- 34:47However, this first point is something you can do
- 34:51really in fairly explicit detail,
- 34:53and that's what I would like to spell out
- 34:54with the end of this talk,
- 34:55is how do you bridge global structure
- 34:58back to a distribution of edge flows.
- 35:02So yeah.
- 35:03So that's our main question.
- 35:05How does the choice of distribution
- 35:06influence the expected global flow structure?
- 35:12So first, let's start with a lemma.
- 35:15Suppose that we have a distribution of edge flows
- 35:17with some expectation f bar, and some covariance,
- 35:20here I'm using double bar V to denote covariance.
- 35:24We'll let S contained in the set of...
- 35:27S will be a subspace contained within
- 35:29the vector space of edge flows,
- 35:31and we'll let PS be the orthogonal projector onto S.
- 35:35Then FS, that's the projection of F onto this subspace S,
- 35:40the expectation of its norm squared
- 35:43is the norm of the expected flow projected onto S squared,
- 35:48so this is essentially the expectation of the sample
- 35:53is the measure evaluated with the expected sample.
- 35:56And then plus a term that involves an inner product
- 35:58between the projector on the subspace
- 36:00and the covariance matrix for the edge flows.
- 36:02Here, this denotes the matrix inner product,
- 36:04so is just the sum over all ij entries.
- 36:09What's nice about this formula is,
- 36:10at least in terms of expectation,
- 36:13it reduces this study of the bridge
- 36:17between distribution and network structure
- 36:20to a study of moments, right?
- 36:22Because we've replaced a distributional problem here
- 36:24with a linear algebra problem
- 36:27that's posed in terms of this projector,
- 36:29the projector out of the subspace S,
- 36:31which is determined by the topology of the network.
- 36:33And the variance in that edge flow,
- 36:36which is determined by your generative model.
- 36:40Well, you might say, "Okay, well, fine,
- 36:42this is a matrix inner product, we can just stop here.
- 36:44We could compute this projector.
- 36:45We could sample a whole bunch of edge flows
- 36:47to compute this covariance.
- 36:48So you can do this matrix inner product."
- 36:50But I'm sort of greedy, because I suspect
- 36:54that you can really do more with this inner product.
- 36:57So I'd like to highlight some challenges
- 37:00associated with this inner product.
- 37:03So first, let's say I asked you to design a distribution
- 37:06with tuneable global structure.
- 37:07So for example, I said I want you to
- 37:09pick a generative model,
- 37:10or design a distribution of edge flows,
- 37:12that when I sample edge flows from it,
- 37:14their expected structures match some expectation.
- 37:18It's not obvious how to do that given this formula.
- 37:22It's not obvious in particular, because these projectors,
- 37:24like the projector onto subspace S,
- 37:26typically depend in fairly non-trivial ways
- 37:29on the graph topology.
- 37:30So small changes in the graph topology
- 37:32can completely change this projector.
- 37:34In essence, it's hard to isolate topology from distribution.
- 37:37You could think that this inner product,
- 37:39if I think about it in terms of the ij entries,
- 37:43while easy to compute, is not easy to interpret,
- 37:47because i and j are somewhat arbitrary indexing.
- 37:49And obviously, really, the topology of the graph,
- 37:51it's not encoded in the indexing,
- 37:53it's encoded in the structure of these matrices.
- 37:56So in some ways, what we really need
- 37:57is a better basis for computing this inner product.
- 38:01In addition, computing this inner product
- 38:03just may not be empirically feasible,
- 38:05because it might not be feasible
- 38:07to estimate all these covariances.
- 38:08There's lots of settings where,
- 38:09if you have a random edge flow,
- 38:11it becomes very expensive to try to estimate
- 38:13all the covariances in this graph, or sorry,
- 38:15in this matrix, because this matrix has as many entries
- 38:19as there are pairs of edges in the graph.
- 38:22And typically, that number of edges grows fairly quickly
- 38:26in the number of nodes in the graph.
- 38:27So in the worst case, the size of these matrices
- 37:31scales not as the square of the number of nodes in the graph,
- 37:33but as the number of nodes in the graph to the fourth power,
- 38:35so this becomes very expensive very fast.
- 38:37Again, we could try to address this problem
- 38:41if we had a better basis for performing this inner product,
- 38:43because we might hope to be able to truncate
- 38:46somewhere in that basis,
- 38:47and use a lower dimensional representation.
- 38:50So, to build there,
- 38:52I'm going to show you a particular family of covariances.
- 38:55We're going to start with a very simple generative model.
- 38:58So let's suppose that each node of the graph
- 39:00is assigned some set of attributes,
- 39:02here, a random vector X, sampled from a...
- 39:04So you can think of trait space,
- 39:05a space of possible attributes.
- 39:07And these are sampled i.i.d.
- 39:09In addition, we'll assume there exists
- 39:11an alternating function f,
- 39:13which accepts pairs of attributes,
- 39:15and returns a real number.
- 39:17So this is something that I can evaluate on the endpoints
- 39:20of an edge, and return an edge flow value.
- 39:24In this setting,
- 39:26everything that I've shown you before simplifies.
- 39:29So if my edge flow F is drawn
- 39:32by first sampling a set of attributes,
- 39:34and then plugging those attributes into functions
- 39:36on the edges, then the mean edge flow is zero,
- 39:42so that f bar goes away,
- 39:44and the covariance reduces to this form.
- 39:46So you get a standard form,
- 39:47where the covariance and the edge flow
- 39:49is a function of two scalar quantities,
- 39:52that's sigma squared and rho,
- 39:53these are both statistics associated with this function
- 39:56and the distribution of traits.
- 39:59And then some matrices, so we have an identity matrix,
- 40:02and we have this gradient matrix showing up again.
- 40:05This is really nice, because when you plug it back in,
- 40:07we try to compute, say,
- 40:08the expected sizes of the components,
- 40:13this matrix inner product
- 40:15that I was complaining about before,
- 40:17this whole matrix inner product simplifies.
- 40:19So when you have a variance
- 40:21that's in this nice, simple, canonical form,
- 40:23then the expected overall size of the edge flow,
- 40:26that's just sigma squared, the expected size
- 40:29projected onto that conservative subspace,
- 40:32that breaks into this combination
- 40:35of the sigma squared and the rho,
- 40:37again, those are some simple statistics.
- 40:39And then V, E, L, and E, those are just
- 40:42essentially dimension counting on the network.
- 40:44So this is the number of vertices, the number of edges,
- 40:47and the number of loops, the number of loops,
- 40:48that's the number of edges
- 40:49minus the number of vertices plus one.
- 40:52And similarly, the expected cyclic size,
- 40:55or size of the cyclic component, reduces to,
- 40:57again, this scalar factor in terms of the statistics,
- 41:01and some dimension counting topology related quantities.
- 41:08So this is very nice,
- 41:09because this allows us to really separate
- 41:12the role of topology from the role of the generative model.
- 41:14The generative model determines sigma and rho,
- 41:17and topology determines these dimensions.
- 41:22It turns out that the same thing is true
- 41:26even if you don't sample the edge flow
- 41:29using this trait approach, but the graph is complete.
- 41:33So if your graph is complete,
- 41:34then no matter how you sample your edge flow,
- 41:37for any edge flow distribution,
- 41:38exactly the same formulas hold,
- 41:40you just replace those simple statistics
- 41:43with estimators for those statistics
- 41:45given your sampled flow.
- 41:47And this is sort of a striking result,
- 41:49because this says that this conclusion
- 41:51that was linked to some specific generative model
- 41:54with some very specific assumptions, right,
- 41:56we assumed it was i.i.d.,
- 41:57extends to all complete graphs,
- 41:59regardless of the actual distribution that we sampled from.
- 42:05Up until this point,
- 42:06this is kind of just an algebra miracle.
- 42:09And one of the things I'd like to do at the end of this talk
- 42:11is explain why this is true,
- 42:13and show how to generalize these results.
- 42:16So to build there,
- 42:17let's emphasize some of the advantages of this.
- 42:19So first, the advantages of the model,
- 42:22it's mechanistically plausible in certain settings,
- 42:24it cleanly separated the role of topology and distribution,
- 42:28and these coefficients that had to do with topology,
- 42:30these are just dimensions,
- 42:31these are non negative quantities,
- 42:34so it's easy to work out monotonic relationships
- 42:36between expected structure and simple statistics
- 42:40of the edge flow distribution.
- 42:44The fact that you can do that enables more general analysis.
- 42:47So I'm showing you on the right here,
- 42:48this is from a different application area.
- 42:51This was an experiment where we trained a set of agents
- 42:55to play a game using a genetic algorithm,
- 42:58and then we looked at the expected sizes
- 43:00of cyclic and acyclic components
- 43:02in a tournament among those agents.
- 43:05And you can actually predict these curves
- 43:08using this type of structure analysis,
- 43:10because it was possible to predict the dynamics
- 43:13of these simple statistics, this sigma and this rho.
- 43:18So this is a really powerful analytical tool,
- 43:20but it is limited to this particular model.
- 43:23In particular, it only models unstructured cycles,
- 43:26so if you look at the cyclic component
- 43:27generated by this model,
- 43:28it just looks like random noise that's been projected
- 43:31onto the range of the curl transpose.
- 43:34It's limited to correlations on adjacent edges,
- 43:36so we only generate correlations
- 43:38on edges that share an endpoint, because you could think
- 43:40that all of the original random information
- 43:42comes from the endpoints.
- 43:44And then, in some ways, it's not general enough.
- 43:47So it lacks an expressivity.
- 43:48We can't parameterize all possible expected structures
- 43:51by picking sigma and rho.
- 43:54And we lack some notion of sufficiency,
- 43:56i.e. if the graph is not complete,
- 43:58then this nice algebraic property,
- 44:01that it actually didn't matter what the distribution was,
- 44:03this fails to hold.
- 44:04So if the graph is not complete,
- 44:06then projection onto the family of covariances
- 44:09parameterized in this fashion
- 44:11changes the expected global structure.
- 44:15So we would like to address these limitations.
- 44:17And so our goal for the next part of this talk
- 44:19is to really generalize these results.
- 44:21To generalize,
- 44:22we're going to switch our perspective a little bit.
- 44:25So I'll recall this formula,
- 44:27that if we generate our edge flow
- 44:30by sampling quantities on the endpoints,
- 44:32and then plugging them into functions on the edges,
- 44:34then you necessarily get a covariance
- 44:35that's in this two parameter family,
- 44:37where I have two scalar quantities
- 44:39associated with the statistics of the edge flow,
- 44:41that's this sigma and this rho,
- 44:42and then I have some matrices
- 44:43that are associated with the topology of the network
- 44:45in the subspaces I'm projecting onto.
- 44:48These are related to a different way
- 44:51of looking at the graph.
- 44:52So I can start with my original graph,
- 44:54and then I can convert it to an edge graph,
- 44:57where I have one node per edge in the graph,
- 45:00and nodes are connected if they share an endpoint.
- 45:04You can then assign essentially signs to these edges
- 45:07based on whether the edge direction
- 45:10chosen in the original graph is consistent or inconsistent
- 45:14at the node that links two edges.
- 45:16So for example, edges 1 and 2 both point into this node,
- 45:20so there's an edge linking 1 and 2
- 45:22in the edge graph with a positive sign.
- 45:25This essentially tells you
- 45:25that the influence of random information
- 45:30assigned on this node linking 1 and 2
- 45:33would positively correlate the sample edge flow
- 45:36on edges 1 and 2.
- 45:38Then, this form, what this form for covariance matrices says
- 45:43is that we're looking at families of edge flows
- 45:46that have correlations on edges sharing an endpoint,
- 45:49so edges at distance one in this edge graph,
- 45:51and non-adjacent edges
- 45:52are entirely independent of each other.
- 45:56Okay.
- 45:58So that's essentially what the trait-performance model
- 46:00is doing, is it's parameterizing
- 46:02a family of covariance matrices,
- 46:04where we're modeling correlations at distance one,
- 46:06but not further in the edge graph.
- 46:08So then the natural thought
- 46:09for how to generalize these results is to ask,
- 46:11"Can we model longer distance correlations to this graph?"
- 46:15To do so, let's think a little bit
- 46:17about what this matrix
- 46:19that's showing up inside the covariances,
- 46:21so we have a gradient times a gradient transpose.
- 46:24This is in effect a Laplacian for that edge graph.
- 46:30And you can do this for other motifs.
- 46:32If you think about different motif constructions,
- 46:35essentially if you take a product of M transpose times M,
- 46:38that will generate something that looks like a Laplacian
- 46:41or an adjacency matrix for a graph
- 46:44where I'm assigning nodes to be motifs,
- 46:47and looking at the overlap of motifs.
- 46:50And if I look at M times M transpose,
- 46:52then I'm looking at the overlap of edges via shared motifs.
- 46:55So these operators you can think about as being Laplacians
- 46:57for some sort of graph
- 46:59that's generated from the original graph motifs.
- 47:04Like any adjacency matrix,
- 47:06powers of something like G, G transpose minus 2I,
- 47:11that would model connections along longer paths,
- 47:14along longer distances in these graphs
- 47:16associated with motifs, in this case, with the edge graph.
- 47:20So our thought is, maybe, well,
- 47:21we could extend this trait performance
- 47:23family of covariance matrices
- 47:25by instead of only looking at
- 47:27a linear combination of an identity matrix and this matrix,
- 47:31we could look at a power series.
- 47:32So we could consider combining powers of this matrix.
- 47:37And this would generate this family of matrices
- 47:39that are parameterized by some set of
- 47:41coefficients (indistinct)... <v ->Dr. Strang.</v>
- 47:43I apologize, I just wanted to remind you
- 47:46that we have a rather tight time limit,
- 47:48approximately a couple of minutes.
- 47:50<v ->Yes, of course.</v>
- 47:52So here, the idea is to parameterize this family of matrices
- 47:57by introducing a set of polynomials with coefficients alpha,
- 48:00and then plugging into the polynomial
- 48:03the Laplacian that's generated by...
- 48:06The adjacency matrix generated by the graph motifs
- 48:09we're interested in.
- 48:11And that trait-performance result
- 48:12was really just the first-order case here:
- 48:14a linear polynomial
- 48:17with these chosen coefficients.
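As a rough sketch of this parameterization (the coefficients, the helper name, and the reuse of A_edge from the earlier snippet are illustrative assumptions), one can build the covariance as a truncated polynomial in the chosen graph operator; the first-order truncation is the linear, trait-performance-style case.

```python
import numpy as np

def power_series_covariance(A, alphas):
    """Truncated power series in a graph operator A:
    C = alphas[0] * I + alphas[1] * A + alphas[2] * A @ A + ...
    The coefficients must be chosen so that C is positive semidefinite
    if it is to serve as an actual covariance matrix."""
    C = np.zeros_like(A, dtype=float)
    Ak = np.eye(A.shape[0])
    for a in alphas:
        C = C + a * Ak
        Ak = Ak @ A
    return C

# With A_edge from the earlier sketch:
# a linear polynomial gives a two-parameter, trait-performance-style matrix,
#   C1 = power_series_covariance(A_edge, [sigma2, rho2])
# and higher-order truncations add correlations between edges at larger
# distances in the edge graph, e.g.
#   C3 = power_series_covariance(A_edge, [1.0, 0.3, 0.1, 0.05])
```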
- 48:20This power series model is really nice analytically.
- 48:24So if we start with some graph operator M,
- 48:28and we consider the family of covariance matrices
- 48:31generated by plugging M, M transpose into some
- 48:34polynomial or power series,
- 48:36then this family of matrices
- 48:39is contained within the span of powers of M, M transpose.
- 48:45You can talk about this family in terms of combinatorics.
- 48:48So, for example, if we use that gradient
- 48:50times gradient transpose minus twice the identity,
- 48:52then powers of this are essentially, again, path counting,
- 48:55so the nth power counts paths of length n.
- 48:58You can also look at things like the trace of these powers.
- 49:00So if you look at the trace series,
- 49:02that's the sequence where you look at the trace of powers
- 49:05of these adjacency matrices.
- 49:09This is doing some sort of loop count,
- 49:11where we're counting loops of different lengths.
- 49:14And you can think of this trace series, in some sense,
- 49:15as controlling amplification of self-correlations
- 49:19within the sampled edge flow.
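A small sketch of that combinatorial reading, again assuming the toy A_edge from the first snippet: entries of the nth power count (signed) walks of length n between edges in the edge graph, and the trace series collects the corresponding closed-walk counts.

```python
import numpy as np

def trace_series(A, n_terms):
    """Traces of successive powers of A: tr(I), tr(A), tr(A^2), ...
    For an adjacency-like operator these count (signed) closed walks,
    i.e. loops of each length, in the graph that A describes."""
    traces = []
    Ak = np.eye(A.shape[0])
    for _ in range(n_terms):
        traces.append(float(np.trace(Ak)))
        Ak = Ak @ A
    return traces

# With A_edge from the first sketch:
#   np.linalg.matrix_power(A_edge, n)  # (signed) walk counts between edges
#   trace_series(A_edge, 6)            # closed-walk counts that enter the
#                                      # expected sizes discussed below
```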
- 49:22Depending on the generative model,
- 49:23we might want to use different operators
- 49:25for generating these families.
- 49:26So for example, going back to that synaptic plasticity model
- 49:29with coupled oscillators, in this case, using the gradient
- 49:33to generate the family of covariance matrices
- 49:35is not really the right structure,
- 49:37because the dynamics of the model
- 49:40have these natural cyclic connections.
- 49:43So it's better to build the power series using the curl.
- 49:46So depending on your model,
- 49:47you can adapt this power series family
- 49:49by plugging in a different graph operator.
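A sketch of that swap for the toy graph above (the particular loop and matrix are illustrative assumptions): write the single independent loop as a signed edge-space row vector, check that the divergence annihilates it, and feed the resulting cyclic operator into the same power-series construction instead of the gradient.

```python
import numpy as np

# Illustrative curl-like operator for the toy graph above: the single
# independent loop 0 -> 2 -> 3 -> 0, as a signed edge-space row vector
# (edge 1 does not lie on the loop). Sanity check: Curl @ G is the zero row.
Curl = np.array([[1.0, 0.0, 1.0, 1.0]])

# Build the covariance family from cyclic structure rather than the gradient,
# reusing power_series_covariance from the earlier sketch:
A_cyc = Curl.T @ Curl
# C_cyc = power_series_covariance(A_cyc, [1.0, 0.4, 0.1])
```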
- 49:53Let's see now what happens if we try to compute
- 49:55the expected sizes of some components
- 49:58using a power series of this form.
- 50:00So, if the covariance matrix for the edge flow
- 50:04is a power series in, for example,
- 50:06the gradient times gradient transpose,
- 50:08then the expected sizes of the flow components
- 50:12can all be expressed as linear combinations
- 50:14of this trace series
- 50:16and the coefficients of the original polynomial.
- 50:19For example, the expected cyclic size of the flow
- 50:21is just the polynomial evaluated at negative two,
- 50:24multiplied by the number of loops in the graph.
- 50:26And this really generalizes
- 50:28that trait-performance result,
- 50:29because the trait-performance result
- 50:30is recovered by restricting these polynomials to be linear.
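As a worked version of that claim, under the assumption that the edge flow F has mean zero and covariance C = p(A) with A = G G transpose minus 2I: cyclic flows lie in the kernel of G transpose (the divergence), so A acts on them as minus twice the identity, and

\[
\mathbb{E}\!\left[\|P_{\mathrm{cyc}} F\|^{2}\right]
= \operatorname{tr}\!\left(P_{\mathrm{cyc}}\, p(A)\right)
= p(-2)\,\operatorname{tr}\!\left(P_{\mathrm{cyc}}\right)
= p(-2)\,\dim\!\left(\ker G^{\top}\right),
\]

since \(G^{\top} f = 0\) for every cyclic flow \(f\) implies \(A f = (G G^{\top} - 2I) f = -2 f\), and \(\dim(\ker G^{\top})\) counts the independent loops in the graph.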
- 50:36This, you can extend to other bases.
- 50:41But really, what this accomplishes,
- 50:43by generalizing trait-performance,
- 50:45is that we achieve the generic properties it failed to have.
- 50:52So in particular, if I have an edge flow subspace S
- 50:56spanned by the flow motifs stored in some operator M,
- 50:59then this power series family of covariances
- 51:01associated with the Laplacian, that is, M times M transpose,
- 51:05is both expressive, in the sense that
- 51:08for any non-negative a and b,
- 51:11I can pick some alpha and beta
- 51:13so that the expected size
- 51:15of the projection of F onto the subspace S is a,
- 51:18and the expected size of the projection of F onto the subspace orthogonal to S
- 51:22is b, using a covariance in this power series family.
- 51:27And it's sufficient in the sense that
- 51:30for any edge flow distribution with mean zero,
- 51:32and covariance V,
- 51:35if C is the matrix nearest to V in Frobenius norm,
- 51:38restricted to the power series family,
- 51:40then these inner products computed in terms of C
- 51:44are exactly the same as inner products
- 51:46computed in terms of V,
- 51:47so they directly predict the structure,
- 51:49which means that if I use this power series family,
- 51:51discrepancies off of this family
- 51:54don't change the expected structure.
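As a computational sketch of that sufficiency statement (the least-squares projection, the choice of basis, and the names here are illustrative assumptions), the Frobenius-nearest member of the power-series family can be found by ordinary least squares on the vectorized powers of A; the claim in the talk is that expected sizes computed from this nearest matrix match those computed from the original covariance.

```python
import numpy as np

def nearest_in_power_family(V, A, n_terms):
    """Frobenius-nearest matrix to V within span{I, A, A^2, ...}:
    a least-squares fit of the vectorized powers of A to vec(V)."""
    n = A.shape[0]
    powers = []
    Ak = np.eye(n)
    for _ in range(n_terms):
        powers.append(Ak.ravel())
        Ak = Ak @ A
    B = np.column_stack(powers)          # columns are vectorized powers of A
    coeffs, *_ = np.linalg.lstsq(B, V.ravel(), rcond=None)
    C = (B @ coeffs).reshape(n, n)
    return C, coeffs

# Usage sketch: given an empirical edge-flow covariance V_hat and A_edge from
# the first sketch,
#   C, alphas = nearest_in_power_family(V_hat, A_edge, n_terms=4)
# the expected component sizes tr(P @ C), for projections P onto the motif
# subspaces, should then match tr(P @ V_hat), which is the sufficiency claim.
```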
- 51:57Okay.
- 51:57So, I know I'm short on time here,
- 51:59so I'd like to skip, then, just to the end of this talk.
- 52:03There are further things you can do with this;
- 52:04it's really nice mathematically.
- 52:07You can build an approximation theory out of this,
- 52:10and study, for different random graph families,
- 52:12how many terms in these power series you need.
- 52:15And those terms define
- 52:16a simple, minimal set of statistics
- 52:19to try to estimate structure.
- 52:22But I'd like to really just get to the end here,
- 52:25and emphasize the takeaways from this talk.
- 52:28So the first half of this talk
- 52:30was focused on information flow.
- 52:32What we saw is that information flow is a non-trivial,
- 52:35but well-studied estimation problem.
- 52:37And this is something that, at least on my side,
- 52:38is a work in progress with students.
- 52:41Here, in some ways,
- 52:42the conclusion of that first half
- 52:43would be that causation entropy
- 52:45may be a more appropriate measure than TE
- 52:47when trying to build these flow graphs
- 52:49to apply these structural measures to.
- 52:51Then, on the structural side, we can say that
- 52:54power series families
- 52:55are a nice class of covariance matrices.
- 52:57They have properties that are useful empirically,
- 52:59because they let us build global correlation structures
- 53:02from a sequence of local correlations
- 53:03via that power series.
- 53:06If you plug this back into the expected measures,
- 53:08you can recover monotonic relations,
- 53:10like in that limited trait-performance case.
- 53:12And truncation of these power series
- 53:14reduces the number of quantities
- 53:16that you would actually need to measure.
- 53:19Actually, to a number of quantities that can be quite small
- 53:21relative to the size of the graph,
- 53:22and that's where this approximation theory comes in.
- 53:25One way, maybe, to summarize this entire approach:
- 53:28what we've done, by looking at these power series
- 53:31built in terms of the graph operators,
- 53:33is provide a way to study
- 53:35inherently heterogeneous connections, covariances,
- 53:39or edge flow distributions
- 53:41using a homogeneous correlation model
- 53:43that's built at multiple scales, starting at the local scale
- 53:46and then looking at powers.
- 53:49In some ways, this is a common...
- 53:50I ended a previous version of this talk by saying
- 53:53that I still think this structural analysis is,
- 53:55in some ways, a hammer seeking a nail,
- 53:57and that this information flow construction
- 53:59is work in progress to try and build that nail.
- 54:02So thank you all for your attention.
- 54:04I'll turn it now over to questions.
- 54:07<v Instructor>(indistinct)</v>
- 54:09Thank you so much for your talk.
- 54:11Really appreciate it.
- 54:15For those of you on Zoom,
- 54:16you're welcome to keep up the conversations,
- 54:17but unfortunately we have to clear the room,
- 54:20so I do apologize.
- 54:21But, (indistinct).
- 54:25Dr. Strang?
- 54:26Am I muted?
- 54:30Dr. Strang?
- 54:32<v ->Oh, yes, yeah.</v>
- 54:33<v Instructor>Okay, do you mind if people...</v>
- 54:35We have to clear the room, do you mind if people
- 54:37email you if they have questions?
- 54:40<v ->I'm sorry, I couldn't hear the end of the question.</v>
- 54:42Do I mind if...
- 54:45<v Instructor>We have to clear the room,</v>
- 54:47do you mind if people email you if they have questions,
- 54:49and (indistinct)... <v ->No, no, not at all.</v>
- 54:52<v Instructor>So I do apologize, they are literally</v>
- 54:54(indistinct) the room right now.
- 54:57<v ->Okay, no, yeah, that's totally fine.</v>
- 54:59<v Instructor>Thank you.</v>
- 55:01And thanks again for a wonderful talk.
- 55:03<v ->Thank you.</v>