# YSPH Biostatistics Seminar: "Exploring Space and Time for Identifying Gene Interactions Using Single-cell Transcriptomics"

October 05, 2021

## Information

Atul Deshpande, PhD, Postdoctoral Researcher, Division of Biostatistics and Bioinformatics, Johns Hopkins University

October 5, 2021

- 00:00Today it is my honor to introduce,
- 00:02Dr. Atul Deshpande.
- 00:04Dr. Deshpande is a postdoctoral researcher
- 00:07in the lab of Dr. Elana Fertig
- 00:09in the department of oncology,
- 00:11at Johns Hopkins University.
- 00:13He has a PhD in electrical engineering
- 00:15from the University of Wisconsin-Madison,
- 00:17and his interests include
- 00:18the use of time series analysis
- 00:20and spatial statistics
- 00:21for modeling biological processes.
- 00:24He's currently developing analysis techniques
- 00:26to use single-cell and spatial multi-omics
- 00:28for the characterization of
- 00:30the tumor microenvironment
- 00:32and intracellular signaling networks.
- 00:34Welcome. (students applaud)
- 00:40Well, thank you so much.
- 00:41And once I figure out
- 00:48where my PowerPoint window is,
- 00:49we can start in earnest.
- 00:52Okay, yeah, thank you for the kind introduction.
- 00:55So, I'm Atul Deshpande,
- 00:57and today the title of my talk is exploring time
- 01:01and space for identifying gene interactions
- 01:04using single cell transcriptomics.
- 01:07So, what do time and space mean
- 01:10in the context of this talk?
- 01:13So, they refer to recent technological advances
- 01:15and the algorithms, which are the foundation
- 01:17for the projects I will be talking about.
- 01:20And the first advance is the ability
- 01:24to measure gene expression in individual cells.
- 01:27This in turn inspired the development
- 01:29of algorithms that order these cells along
- 01:32a biological trajectory.
- 01:34Using these algorithms, we can observe changes
- 01:37in gene expression in
- 01:39a pseudo temporal reference frame, or pseudo time,
- 01:43which is a measure of the progress
- 01:45of the biological process.
- 01:48The second is a more recent ability
- 01:50to measure gene expression
- 01:52within the spatial context of the tissue.
- 01:54With this, we can analyze changes
- 01:56in gene expression
- 01:58as cellular neighborhoods change,
- 02:00or as the tissue type changes.
- 02:07So, before single cell transcriptomics,
- 02:10we would usually get one measurement
- 02:12of gene expression from a collected sample.
- 02:15And this is now called
- 02:18bulk RNA-seq, retroactively.
- 02:22However, this measurement would just be
- 02:26an average over the population of cells
- 02:27in the sample, and it would obscure information
- 02:31about the different cell types, or different
- 02:34cell states, in the population.
- 02:36With single-cell RNA-seq,
- 02:37we can now measure gene expression
- 02:40in individual cells.
- 02:41Depending on technology, this can range
- 02:43from a few hundred cells up to hundreds
- 02:46of thousands of cells.
- 02:49And this allows us to observe
- 02:51the full heterogeneity of the cell population
- 02:56represented by gene expression.
- 02:59And using this high dimensional data
- 03:02that we now have,
- 03:03we can characterize different cell types
- 03:05and cell states as gene expression vectors.
- 03:12So, one drawback of this technique
- 03:14is the issue of technical dropouts.
- 03:17Now, this is characterized by
- 03:21us observing a lot
- 03:23of false zeroes, or zero-inflated measurements,
- 03:26because we are unable to reliably measure
- 03:29the low RNA counts in individual cells.
- 03:35Now, the first project
- 03:39that I will discuss builds on
- 03:43single-cell RNA-seq technology,
- 03:46or rather, it's downstream of that.
- 03:49And it is also downstream of algorithms
- 03:53which order single-cell data into trajectories
- 03:58that represent the biology
- 04:01that you might be studying.
- 04:02For example, let's say
- 04:04you have a dataset which corresponds
- 04:09to stem cell differentiation.
- 04:11There are probably now 70 different
- 04:15trajectory inference methods, depending on what
- 04:17kind of dataset you are studying,
- 04:21what biology you want to study,
- 04:23how big the dataset is,
- 04:25or what the expected trajectory
- 04:28of the biology that you're studying may be.
- 04:30And they attempt to order these cells based
- 04:34on the expression of potentially
- 04:37a few key marker genes, or on which genes
- 04:40are differentially expressed along
- 04:43the biological process.
- 04:46So, anytime you collect,
- 04:48let's say, single-cell RNA-seq data,
- 04:51you would find a mix of cells,
- 04:54and that was the entire motivation
- 04:56for doing this.
- 04:57But that mix of cells would have
- 05:01a range of cell states,
- 05:03which could correspond to anything
- 05:07from the beginning of the biological process
- 05:09to the very end of the biological process.
- 05:12And what these algorithms are trying to do
- 05:14is they're trying to fit these cells
- 05:18in their right place, in the biological process.
- 05:23And once we do that, we can actually observe
- 05:25the gene expression along this ordering.
- 05:30And a lot of these methods also assign
- 05:34a pseudo time to each cell,
- 05:35which tells you how far along in the biology
- 05:39they think, or they hypothesize that the cell is.
- 05:43And so, the question that we wanted
- 05:45to ask is: given this pseudo temporal ordering
- 05:50of the cells, which gives us
- 05:54gene expression dynamics
- 05:55in the pseudo temporal reference,
- 05:58can we use these dynamics
- 06:03to infer gene regulatory networks,
- 06:07or any directed networks from, say,
- 06:10sets of genes to their targets?
- 06:14And the second question
- 06:14was whether the assigned pseudo time values help
- 06:19us in the network inference task.
- 06:25So, to make the, I guess,
- 06:31explanation more approachable,
- 06:34I will just use an example dataset.
- 06:37And as I explain the concepts,
- 06:41we will just see what they mean
- 06:44in terms of this dataset.
- 06:45So, this is a dataset from Semrau et al.,
- 06:50and this is single-cell data
- 06:53from retinoic acid-driven differentiation,
- 06:57in which mouse embryonic stem cells
- 07:01differentiate into neuroectoderm
- 07:02and extraembryonic endoderm cells.
- 07:06Now the data as collected had nine samples,
- 07:10one before the differentiation starts
- 07:12and one after every six hours.
- 07:15So, you have data collected over 96 hours
- 07:19from nine samples, and each sample has 384 cells.
- 07:24So overall, you can do the math:
- 07:27nine samples times 384 cells
- 07:29comes to 3,456 cells.
- 07:33So, we chose to apply
- 07:37two trajectory inference methods to this.
- 07:39So, the first one is Monocle 2,
- 07:42which is also called Monocle DDRTree, I believe.
- 07:45And the second one is PAGA Tree.
- 07:47So, both of these methods identify
- 07:50a bifurcating trajectory from these cells.
- 07:53And so, the first one is to the left
- 07:56where the embryonic stem cells are actually
- 08:01on the right of...
- 08:03I'm not sure if people can see my mouse pointer,
- 08:07but yeah, they're on the right of the trajectory.
- 08:09And then, towards the bottom left,
- 08:14you go into a neuroectoderm state
- 08:16and towards the...
- 08:19Right, top left, you go into an endoderm state.
- 08:24And on the right side, in the trajectory
- 08:27that PAGA Tree infers, you have
- 08:30the embryonic stem cells on the top left.
- 08:33And then, it identifies
- 08:35a few more branches than Monocle does.
- 08:39But both of these
- 08:40identify branching trajectories.
- 08:43And in each case, we selected
- 08:48the two branches
- 08:49whose markers
- 08:54ended up being high for neuroectoderm.
- 08:56So, the sub-trajectories
- 09:00from each method that we wanted to study
- 09:03went from embryonic stem cells to neuroectoderm,
- 09:08using these two methods.
- 09:11So, we had...
- 09:14We have these two trajectory inference methods,
- 09:16which assigned their own pseudo times,
- 09:18and this is the pseudo temporal expression
- 09:23dynamics for the same gene.
- 09:25I did not mark which gene it was, but yeah,
- 09:29so this was for the same gene.
- 09:30And you can see that the dynamics
- 09:33that each of these trajectories gives
- 09:35us are different.
- 09:37First of all, the main branch,
- 09:40or sub-part of the trajectory, that
- 09:42we are considering has
- 09:44a different number of cells.
- 09:46And these cells may not necessarily be common
- 09:48to both.
- 09:49There will be some which are common
- 09:50to both of these trajectories,
- 09:52but some others which are completely different.
- 09:54But also, the cell ordering itself,
- 09:57which each method derives from whatever mathematics
- 10:01or whatever algorithms it uses,
- 10:05would differ between these two methods.
- 10:08So, as you see, Monocle has higher expression
- 10:12much earlier in the pseudo time,
- 10:14as opposed to PAGA Tree, which has it much later.
- 10:18And the pseudo times here
- 10:20are not on their original scale; they're just normalized
- 10:22to 100 to represent progress from 0%
- 10:25of the biology to 100% of the biology,
- 10:31as inferred by that method.
- 10:34So, now what are the challenges associated
- 10:37with ordered single-cell data?
- 10:39So, the first one is that, unlike, say,
- 10:43stock data, or weather data,
- 10:47or something like that, you don't necessarily
- 10:49have a uniform distribution of cells along pseudo time.
- 10:54And if you're going to do a time series analysis,
- 10:56that means that you do not
- 10:57have a regularly spaced time series,
- 11:00but you actually
- 11:01have an irregularly spaced time series.
- 11:03On top of that, the pseudo time values
- 11:05that are assigned to the cells,
- 11:07and the ordering of the cells itself, are uncertain.
- 11:13Now, finally, recall that we had the issue
- 11:17of zero-inflated measurements,
- 11:19or false zeroes in the data,
- 11:21because of technical dropouts.
- 11:26So, the question is how to overcome all
- 11:29of these drawbacks
- 11:32to try and find
- 11:36networks from this time series data.
- 11:40So, the project that we had
- 11:43resulted in
- 11:45an algorithm called SINGE,
- 11:46which stands for single-cell inference
- 11:48of networks from Granger ensembles.
- 11:50So, this was done at the Morgridge Institute
- 11:53for Research in Madison, Wisconsin.
- 11:55And these are my collaborators on this project.
- 12:01And let's see, okay.
- 12:03So, the main concept that we build on
- 12:06is the Granger causality test.
- 12:08It was introduced by Clive Granger in the 1960s.
- 12:14And to give a very simple example
- 12:16of what it's trying to say, let's say
- 12:17you have two time series, X and Y.
- 12:22The Granger causality test checks whether
- 12:26the prediction of current values of Y
- 12:28improves by using past values of X,
- 12:31in addition to past values of Y.
- 12:34And if that happens, then we say
- 12:36that X Granger-causes Y.
- 12:38So, this is basically a lagged regression
- 12:41between X and Y.
- 12:42So, this has had applications
- 12:44in econometrics and finance,
- 12:46and is also being used
- 12:47in computational neuroscience and biology,
- 12:51as noted in these examples here.
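The bivariate test just described can be sketched as two least-squares fits on a regularly spaced series: a restricted model that predicts Y from its own past, and a full model that also uses the past of X, compared with an F-statistic. This is a minimal illustration with NumPy, not SINGE's implementation; the function names, the choice of two lags, and the simulated data are all hypothetical.

```python
import numpy as np


def lagged_design(series_list, lags):
    """Intercept plus lags 1..lags of each series; row k corresponds to time t = lags + k."""
    n_time = len(series_list[0])
    columns = [np.ones(n_time - lags)]
    for series in series_list:
        for lag in range(1, lags + 1):
            columns.append(series[lags - lag:n_time - lag])
    return np.column_stack(columns)


def granger_f_stat(x, y, lags=2):
    """F-statistic for 'x Granger-causes y' on regularly spaced series.

    Compares a restricted model (past of y only) with a full model
    (past of y and past of x); a large value favors x -> y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = y[lags:]
    rss = []
    for design in (lagged_design([y], lags), lagged_design([y, x], lags)):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residual = target - design @ coef
        rss.append(float(residual @ residual))
    n_obs = len(target)
    n_params_full = 1 + 2 * lags  # intercept + lags of y + lags of x
    return ((rss[0] - rss[1]) / lags) / (rss[1] / (n_obs - n_params_full))


# Simulated example: y depends on x one step earlier, but not vice versa.
rng = np.random.default_rng(0)
x = rng.normal(size=300)
y = np.zeros(300)
y[1:] = 0.8 * x[:-1] + 0.1 * rng.normal(size=299)
f_xy = granger_f_stat(x, y)
f_yx = granger_f_stat(y, x)
print(f"x -> y: F = {f_xy:.1f};  y -> x: F = {f_yx:.2f}")
```

Here the first statistic should be very large and the second close to 1, reflecting that x drives y one step later but not the reverse.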
- 12:55Now, the multivariate Granger causality test
- 12:58can be thought of as setting up and solving
- 13:00a vector autoregression model,
- 13:02where you have, say, P genes, T time points,
- 13:05and L lags.
- 13:06Here, L tells you how far back,
- 13:11in terms of past expression,
- 13:14you're trying to model relationships.
- 13:16And once you have that,
- 13:17you could think of solving this VAR
- 13:21model by just minimizing
- 13:24this objective function here.
- 13:27And that would give you, I guess,
- 13:28a few edges between the past values
- 13:31of all of the genes and your target gene.
- 13:34Okay, maybe I should have explained
- 13:36this figure first.
- 13:37So, you have all the regulators,
- 13:39all the possible regulators of a gene,
- 13:42and then you have a target gene,
- 13:43and you're trying to identify
- 13:46which past values
- 13:49of any of these genes explain
- 13:51the current values of the target gene.
- 13:55And if you wanted to have
- 13:59a sparse representation of this network,
- 14:02or to
- 14:03keep only a few of the edges,
- 14:05you would introduce this sparsity parameter,
- 14:08which would ensure that the edges from, say,
- 14:12all of these genes to your target
- 14:15are not numerous.
- 14:16And you can explain the biology in a few edges.
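The sparse multivariate version described above can be sketched as a lasso regression of the target gene on lagged copies of every gene, keeping a gene as a candidate regulator when any of its lagged coefficients is nonzero. This sketch assumes regularly spaced time points and solves the lasso with plain coordinate descent in NumPy; the objective scaling (1/2n), the helper names, and the simulated data are assumptions, so `lam` here is not on the same scale as the Lambda values mentioned in the talk.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of the L1 norm."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1 / 2n) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    residual = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            residual += X[:, j] * b[j]   # remove coordinate j's current contribution
            rho = X[:, j] @ residual / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            residual -= X[:, j] * b[j]   # add back the updated contribution
    return b


def lasso_granger_regulators(expr, target, lags, lam):
    """expr: array of shape (P genes, T regularly spaced time points).

    Returns the genes with at least one nonzero lagged coefficient, i.e. the
    candidate edges gene -> target that survive the sparsity penalty.
    """
    n_genes, n_time = expr.shape
    X = np.column_stack([expr[g, lags - lag:n_time - lag]
                         for g in range(n_genes) for lag in range(1, lags + 1)])
    y = expr[target, lags:]
    X = X - X.mean(axis=0)   # centering replaces an intercept term
    y = y - y.mean()
    coef = lasso_coordinate_descent(X, y, lam).reshape(n_genes, lags)
    return [g for g in range(n_genes) if np.any(np.abs(coef[g]) > 1e-8)]


# Hypothetical data: gene 2 drives gene 0 with a one-step lag.
rng = np.random.default_rng(1)
expr = rng.normal(size=(5, 300))
expr[0, 1:] = 0.9 * expr[2, :-1] + 0.1 * rng.normal(size=299)
print(lasso_granger_regulators(expr, target=0, lags=2, lam=0.1))
```

Raising `lam` prunes more edges, which is the sparsity trade-off described in the talk.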
- 14:22Now, to counter the irregularity
- 14:27of the time series, we use
- 14:30an idea called Generalized Lasso Granger.
- 14:33So, what this does is,
- 14:36I'm not sure, maybe I have...
- 14:39Yeah, okay, so just to recall, right?
- 14:44So, you have pseudo temporal data,
- 14:46which forms an irregular time series,
- 14:48and you have missing values,
- 14:51which show up as zeros here, right?
- 14:54So, we want to adapt the Lasso Granger test
- 15:00for irregular time series.
- 15:02So, what were previously
- 15:05coefficients for older samples
- 15:07in a regular time series
- 15:09now become coefficients for timestamps
- 15:15in the past,
- 15:16because you might not necessarily have
- 15:18a sample at that point.
- 15:21Furthermore, we can rethink
- 15:28the objective function: originally,
- 15:33it was a dot product between
- 15:35the coefficients and the values
- 15:38of the gene expression,
- 15:41and we rethink that as a weighted dot product.
- 15:45So basically, we...
- 15:48And this is the description
- 15:49of the weighted dot product, where you use
- 15:51a Gaussian kernel to weight the inputs
- 15:56to the dot product based on their proximity
- 15:59to the timestamps
- 16:02that correspond to these coefficients.
- 16:04So, these ellipses here show kernels,
- 16:08or, I guess, they represent kernels.
- 16:10They don't necessarily stop at these bandwidths;
- 16:12they just keep going
- 16:13because they're Gaussian kernels.
- 16:16But these just represent the kernels,
- 16:18where basically, if you have
- 16:20a timestamp corresponding to a coefficient
- 16:22and you have no sample at that timestamp,
- 16:25that doesn't necessarily mean
- 16:26that the input to the gene prediction is zero.
- 16:31So, basically what you would do is
- 16:33you would just look at a bin around
- 16:36that timestamp, and weight input from regulators,
- 16:42depending on their proximity to this timestamp.
- 16:46So, if the sample is exactly at
- 16:51the timestamp that you expect,
- 16:52you would weight it highly based
- 16:54on the Gaussian kernel, and the farther
- 16:56you move away from the timestamp,
- 16:58the weaker the weight of
- 17:02that particular sample would be.
- 17:05So, what this helps us do
- 17:07is, if there are, say, multiple cells
- 17:10in close proximity, it would take input
- 17:14from all of them.
- 17:15If there are no cells in close proximity,
- 17:18it would at least take input from some cells
- 17:20which are farther away, and so on.
- 17:25So, yeah, this works
- 17:27with irregular time series,
- 17:28because you don't necessarily need
- 17:30samples in the past at exactly the timestamps
- 17:34that you want them at.
- 17:36And yeah, I think we already discussed this.
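The kernel-weighted input described above can be sketched for a single regulator and a single lag: instead of reading the sample at timestamp t - lag * Δt, which may not exist, every earlier cell contributes with a Gaussian weight based on its distance from that timestamp. This is a sketch under assumptions: the normalized (weighted-average) form, the strict "past only" rule, and all names are illustrative choices, and the Generalized Lasso Granger formulation used in SINGE may scale the weights differently.

```python
import math


def gaussian_kernel(distance, width):
    """Unnormalized Gaussian weight; it never drops exactly to zero."""
    return math.exp(-0.5 * (distance / width) ** 2)


def kernel_lagged_input(times, values, t_target, lag, delta_t, width):
    """Kernel-weighted regulator input for the coefficient at t_target - lag * delta_t.

    Only samples observed strictly before t_target are used. Cells close to the
    lag timestamp dominate; if none are close, farther cells still contribute.
    """
    center = t_target - lag * delta_t
    weighted_sum = 0.0
    weight_total = 0.0
    for t, value in zip(times, values):
        if t >= t_target:
            continue  # only the past may explain the present
        w = gaussian_kernel(t - center, width)
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total if weight_total > 0.0 else 0.0


# Irregular pseudo times: no cell sits exactly at the lag timestamp 0.5 - 1 * 0.2 = 0.3.
times = [0.0, 0.1, 0.35, 0.9]
values = [1.0, 2.0, 4.0, 8.0]
narrow = kernel_lagged_input(times, values, t_target=0.5, lag=1, delta_t=0.2, width=0.05)
wide = kernel_lagged_input(times, values, t_target=0.5, lag=1, delta_t=0.2, width=100.0)
print(round(narrow, 3), round(wide, 3))
```

With a narrow kernel, the estimate is dominated by the cell at 0.35, the closest to the lag timestamp 0.3; with a very wide kernel it approaches the plain mean of the three earlier cells, and the later cell at 0.9 is never used.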
- 17:40So, now, going back to the case of...
- 17:45So, we had these false zeroes, right?
- 17:48So now, because of this kernel method,
- 17:50we have an inherent imputation of missing data.
- 17:54So, now we could think of,
- 17:58instead of taking all of the zeros
- 18:00as they are at face value,
- 18:03treating them, or some of them,
- 18:04as dropouts, as just missing data.
- 18:09And we just remove those samples now,
- 18:11because we can now work
- 18:13with irregular time series.
- 18:15And because of this kernel method,
- 18:17we can actually work with time series
- 18:19that are all uniquely irregular.
- 18:22We can work with...
- 18:24We can remove the zero-valued samples
- 18:26and get a different, differently irregular
- 18:30time series for each of these genes.
- 18:33And such an action can probably
- 18:37be informed by imputation techniques like MAGIC,
- 18:40which help you complete,
- 18:42or impute zeros in, the dataset.
- 18:44So, instead of imputing the dataset,
- 18:46you could just use its output
- 18:48to decide whether or not to remove
- 18:51that zero from this input dataset.
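Removing suspected dropouts gives each gene its own irregular series, which the kernel weighting can then consume. The sketch below is hypothetical: the function name, the `imputed` input (standing in for the output of an imputation method such as MAGIC), and the `threshold` rule are illustrative assumptions, not SINGE's actual procedure.

```python
def remove_suspected_dropouts(times, values, imputed=None, threshold=0.5):
    """Return one gene's (times, values) with suspected dropout zeros removed.

    Without imputation, every zero is treated as missing data. With imputed
    values (hypothetically, the output of an imputation method), a zero is
    dropped only when its imputed value exceeds `threshold`; zeros whose
    imputed value stays low are kept as plausible true zeros.
    """
    kept_times, kept_values = [], []
    for i, (t, v) in enumerate(zip(times, values)):
        if v == 0 and (imputed is None or imputed[i] > threshold):
            continue  # treat as a technical dropout: drop the sample
        kept_times.append(t)
        kept_values.append(v)
    return kept_times, kept_values


times = [0.0, 0.2, 0.4, 0.6]
values = [0.0, 5.0, 0.0, 3.0]
imputed = [0.1, 4.8, 2.9, 3.1]   # hypothetical imputation output
print(remove_suspected_dropouts(times, values))
print(remove_suspected_dropouts(times, values, imputed))
```

Without imputed values both zeros are dropped; with them, only the zero whose imputed value is high is treated as a dropout.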
- 18:58So, this is just an illustration
- 19:00of a single generalized Lasso Granger test.
- 19:04So, you have the POU5F1 gene, and basically,
- 19:08you see the cells corresponding
- 19:11to that, with their expression
- 19:16along pseudo time.
- 19:18And what you also see is two trendlines,
- 19:23one predicted using a Lambda of 0.1,
- 19:27which is basically a sparsity constraint of 0.1,
- 19:29so it would have fewer edges
- 19:32between the regulators and POU5F1,
- 19:36and then one with a Lambda of 0.02,
- 19:41which has far more regulators.
- 19:43And you can see that both of these predict
- 19:46the trends of POU5F1, using
- 19:49the past values, quite well.
- 19:54So, now that was just one GLG test.
- 19:59Now, what SINGE does is perform multiple
- 20:01such GLG tests, where you sub-sample
- 20:04the time series in different ways
- 20:07to get different irregular time series again.
- 20:12And you also use diverse hyper-parameters
- 20:14to effectively, using these two in combination,
- 20:17slice the cake multiple ways when trying
- 20:20to look at the data.
- 20:22So, the types of hyper-parameters
- 20:23that we use are Lambda, which determines
- 20:25the sparsity of the network that we get,
- 20:29or of the adjacency matrices that we get.
- 20:31And we have Delta T, which gives us
- 20:36the time resolution of the lags between, say,
- 20:40the past regulators
- 20:41and the current target timestamps,
- 20:45and the number of lags that you have.
- 20:47So together, they tell you how far behind
- 20:51in pseudo time you should be looking to try
- 20:54to predict the expression of the target.
- 20:57And finally, the kernel width,
- 20:59which tells you how wide the kernel should be
- 21:03around the timestamp that you are considering.
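The ensemble design above (several GLG tests on subsampled series under a grid of Lambda, Delta T, number of lags, and kernel width) can be sketched as an enumeration of run configurations. Apart from the two Lambda values mentioned in the talk, every grid value, the subsampling fraction, and the helper names are hypothetical; cell indices are assumed to be sorted by pseudo time.

```python
import itertools
import random

# Hypothetical grids: the talk mentions Lambda values of 0.1 and 0.02;
# every other value here is made up for illustration.
LAMBDAS = [0.1, 0.02]
DELTA_TS = [3.0, 5.0]
NUM_LAGS = [5, 9]
KERNEL_WIDTHS = [0.5, 2.0]
N_SUBSAMPLES = 3
KEEP_FRACTION = 0.8


def ensemble_configs():
    """One configuration per (hyper-parameter combination, subsample replicate)."""
    grid = itertools.product(LAMBDAS, DELTA_TS, NUM_LAGS, KERNEL_WIDTHS)
    for lam, delta_t, num_lags, width in grid:
        for replicate in range(N_SUBSAMPLES):
            yield {
                "lambda": lam,
                "delta_t": delta_t,
                "num_lags": num_lags,
                # Delta T and the number of lags together set how far back to look.
                "lookback": delta_t * num_lags,
                "kernel_width": width,
                "replicate": replicate,
            }


def subsample_cells(n_cells, replicate, keep_fraction=KEEP_FRACTION):
    """Reproducible random subset of cell indices, returned in index (pseudo time) order."""
    rng = random.Random(replicate)
    k = round(keep_fraction * n_cells)
    return sorted(rng.sample(range(n_cells), k))


configs = list(ensemble_configs())
print(len(configs), configs[0]["lookback"], len(subsample_cells(100, replicate=0)))
```

Each configuration would drive one GLG run and yield one partial network; `lookback` records how far back in pseudo time that run looks.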
- 21:08Now, once we get
- 21:11adjacency matrices from all of these,
- 21:13we consider them as partial networks,
- 21:17and we get ranked lists from each of them.
- 21:20And we aggregate these ranked lists
- 21:22using a modified Borda count.
- 21:24So, the Borda count is something
- 21:25which has been used in elections.
- 21:29It's basically an election, I guess,
- 21:31result-aggregating strategy,
- 21:34where if you have five candidates,
- 21:36each voter ranks them from one to five,
- 21:39and then the candidate who has, I guess,
- 21:42the lowest total rank
- 21:44over all of the people that voted
- 21:47would win the vote.
- 21:49So, the modified Borda count
- 21:52is basically the same concept,
- 21:53but the only change that we made
- 21:55was that we wanted to place more weight
- 22:03on a ranking which distinguishes
- 22:05between, say, the first interaction
- 22:10we find and the 10th interaction we find,
- 22:12as opposed to, say, the 10,000th interaction
- 22:15we find and the 10,010th interaction
- 22:18that we find.
- 22:19So, that's why the weighting before adding
- 22:23these Borda weights is one over N squared,
- 22:26as opposed to, say, N here.
- 22:33So, yeah, once we aggregate this,
- 22:36we get a final ranked list.
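The aggregation step can be sketched as follows, reading the talk's "one over N squared" as a score of 1/r² for an edge at rank r in each partial ranking, summed across rankings. The classic linear Borda count is included only for contrast; both functions, the tie-breaking rule, and the toy rankings are illustrative assumptions.

```python
from collections import defaultdict


def modified_borda(ranked_lists):
    """Aggregate ranked edge lists; an edge at 1-based rank r in a list scores 1 / r**2."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, edge in enumerate(ranking, start=1):
            scores[edge] += 1.0 / rank ** 2
    # Highest total score first; ties broken by edge name for determinism.
    return sorted(scores, key=lambda edge: (-scores[edge], edge))


def linear_borda(ranked_lists):
    """Classic Borda count for comparison: rank r in a list of n items scores n - r + 1."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        n = len(ranking)
        for rank, edge in enumerate(ranking, start=1):
            scores[edge] += n - rank + 1
    return sorted(scores, key=lambda edge: (-scores[edge], edge))


# Two hypothetical partial-network rankings over the same four edges.
lists = [["X", "Y", "Z", "W"], ["Z", "Y", "W", "X"]]
print(modified_borda(lists))
print(linear_borda(lists))
```

With 1/r² weights, edge X, ranked first in one list, beats edge Y, ranked second in both; the linear count prefers Y. That is the emphasis on the top of each list that the modified count is meant to provide.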
- 22:39And so, we had two inferred trajectories;
- 22:43we got gene dynamics from them,
- 22:46and that results in two different networks.
- 22:49And this is just showing the top 100 edges
- 22:53from Monocle 2 and PAGA Tree.
- 22:55Now, you can obviously see
- 22:56that they look very different.
- 22:59Some of the edges I think, are common,
- 23:02but they can be very, very different.
- 23:05So, now the question is,
- 23:08which of these is right, or better?
- 23:12So, for that we would have
- 23:14to first think of, okay,
- 23:15how do we evaluate this?
- 23:17So, one way to evaluate that would be
- 23:20to do a precision recall evaluation.
- 23:24So, let's say we have this ranked list
- 23:25of candidate gene interactions that we just got
- 23:28from SINGE, and a gold standard,
- 23:31which knows the truth.
- 23:33As we go down this ranked list,
- 23:34the precision metric tells us
- 23:37what fraction of the predictions
- 23:38so far have been correct.
- 23:40And the recall metric tells us
- 23:42what fraction of the total interactions
- 23:44in the gold standard
- 23:46have so far been covered.
- 23:48So, the figure on the right shows
- 23:51a precision-recall curve for two ranked lists.
- 23:54The ideal precision-recall curve
- 23:56would place all the edges in the gold standard
- 23:58at the top of the list.
- 23:59So, that's the dotted line that you see here,
- 24:04and the area under that precision-recall
- 24:06curve would be one.
- 24:09A random list in expectation would be flat,
- 24:13with precision equal to the fraction of true edges,
- 24:15and in this example, the area under
- 24:18that curve would be 0.5.
- 24:20And here, I guess, are two make-believe orderings.
- 24:27And in this example, we can see
- 24:29that the precision-recall curve of A,
- 24:35the predictor A, is better
- 24:40because it starts off with more hits,
- 24:45or a high precision, and then falls,
- 24:48as opposed to B, which rises
- 24:50from a low precision.
- 24:51What it means is that A gets more hits
- 24:55at the top of its list as opposed to B,
- 24:57and so on.
- 24:58And so, one way to also evaluate
- 25:01these precision-recall curves is to just look
- 25:03at the area under the curve, which for A here
- 25:06is 0.7 and for B is 0.52.
- 25:07And that tells us that, on average,
- 25:10A ranks edges better than B.
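The precision and recall definitions above can be computed directly from a ranked list and a set of gold-standard edges. Average precision, the mean of the precision values at each correctly recovered edge, is a common step-wise estimate of the area under the precision-recall curve; the function names and toy edges here are hypothetical.

```python
def precision_recall_points(ranking, gold):
    """(precision, recall) after each position of a ranked candidate-edge list."""
    hits = 0
    points = []
    for k, edge in enumerate(ranking, start=1):
        if edge in gold:
            hits += 1
        points.append((hits / k, hits / len(gold)))
    return points


def average_precision(ranking, gold):
    """Sum of precision at each hit, divided by the number of gold-standard edges."""
    hits = 0
    total = 0.0
    for k, edge in enumerate(ranking, start=1):
        if edge in gold:
            hits += 1
            total += hits / k
    return total / len(gold)


gold = {("a", "b"), ("c", "d")}
ranking = [("a", "b"), ("x", "y"), ("c", "d"), ("u", "v")]
print(precision_recall_points(ranking, gold))
print(average_precision(ranking, gold))
```

An ideal ranking that puts both gold edges first would score 1.0.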
- 25:16Now, we would like to use this,
- 25:19and the question is what could we use as
- 25:22a gold standard?
- 25:24Now, this is real biological data
- 25:26that we are using, and for that,
- 25:29we would also need to look into
- 25:32the literature to find validation.
- 25:35So, one good source of information
- 25:37is the ESCAPE database curated
- 25:39by the Ma'ayan lab.
- 25:41And this database includes the results
- 25:44of loss-of-function and gain-of-function experiments
- 25:47done on genes, and also
- 25:49ChIP-seq experiments,
- 25:50which identify binding sites
- 25:52of transcription factors.
- 25:54Now, the problem is that even this database
- 25:58is incomplete, because gaps
- 26:01in biological knowledge remain,
- 26:04and I guess, over time,
- 26:06they will be filled more and more.
- 26:09But when we were doing this evaluation,
- 26:12we had to deal with what was effectively
- 26:14a partial gold standard,
- 26:16or an incomplete gold standard.
- 26:18So, the evaluation that we did was not
- 26:20for all of the genes in the dataset,
- 26:23but only a fraction of the genes.
- 26:28So, we had these two methods
- 26:33and the two pseudo times which we got from them.
- 26:36So, what we did
- 26:38is we compared the performance of SINGE
- 26:43using, say, Monocle 2 with the pseudo time,
- 26:46as well as Monocle 2 with only the ordering,
- 26:49and similarly, PAGA Tree
- 26:50fed the pseudo time and PAGA Tree
- 26:52with only the ordering.
- 26:54And so, this is how the precision recall curves
- 26:58of these four methods look.
- 27:01So, we look at the average precision,
- 27:04which is the same thing as the area under
- 27:06the precision recall curve.
- 27:08And we also look at the average precision
- 27:10in the early part of the precision recall curve.
- 27:14And the point of that being that,
- 27:18in, say, a usual workflow,
- 27:21you would have a computational method
- 27:24which would point to some important edges,
- 27:29and then you would potentially ask
- 27:31a collaborator to try
- 27:34and experimentally validate them.
- 27:36And in that sense, you would be giving
- 27:38them results from the top of your list,
- 27:40as opposed to trying to tell how well
- 27:43the 10,000th edge in the list
- 27:45is placed in the rankings.
- 27:48So, with that in mind, we also look
- 27:50at the average early precision
- 27:53of these curves.
- 27:55And for that, we basically ask
- 27:59to what extent the precision is maintained
- 28:04until 10% of the
- 28:06interactions
- 28:08in the gold standard
- 28:10are recovered in the list that we have.
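One possible reading of the average early precision described above is to average precision only over the hits recovered before recall passes 10%. The sketch also restricts candidate edges to regulators covered by a partial gold standard; both that restriction rule and the exact cutoff handling are assumptions, not SINGE's documented evaluation code, and the names and toy edges are hypothetical.

```python
def restrict_to_evaluable(ranking, evaluable_regulators):
    """Keep only candidate edges whose regulator is covered by the (partial) gold standard."""
    return [edge for edge in ranking if edge[0] in evaluable_regulators]


def average_early_precision(ranking, gold, recall_cutoff=0.1):
    """Average precision over the hits recovered before recall exceeds recall_cutoff."""
    hits = 0
    precisions = []
    for k, edge in enumerate(ranking, start=1):
        if edge in gold:
            hits += 1
            if hits / len(gold) > recall_cutoff:
                break  # past the early part of the precision-recall curve
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0


gold = {("g", f"t{i}") for i in range(20)}   # 20 hypothetical true edges
ranking = [("g", "t0"), ("h", "y1"), ("h", "y2"), ("g", "t1"), ("g", "t2")]
print(average_early_precision(ranking, gold))
print(average_early_precision(restrict_to_evaluable(ranking, {"g"}), gold))
```

On the full list, the third hit would push recall past 10%, so only the first two hits count; after restriction, edges from uncovered regulators no longer dilute precision.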
- 28:15So, the figure to the right shows
- 28:18a scatterplot of the average precision
- 28:20and the average early precision
- 28:21for these four options.
- 28:25And what we see is that
- 28:27the best performing combination
- 28:29is using Monocle's ordering
- 28:31but not its pseudo time, and applying
- 28:36the pseudo time that Monocle
- 28:38assigns to the cells
- 28:41actually degrades the performance quite a bit.
- 28:46And both of the PAGA Tree options,
- 28:49with or without pseudo time,
- 28:51are in between these.
- 28:52So, now why would this happen?
- 28:56For example, let's take
- 28:57an extreme case, right?
- 28:58And okay, before that, there's not necessarily
- 29:04something that's wrong with Monocle;
- 29:06it's basically that for this dataset,
- 29:09in this instance, the pseudo time values
- 29:12did not necessarily make a lot of sense.
- 29:15So, let's say you have perfectly ordered cells,
- 29:17and for the first half of the cells,
- 29:19you just assign a value very close
- 29:22to zero, and for the second half,
- 29:23you assign a value very close to one.
- 29:26So, even though the ordering of the cells
- 29:27was quite nice and reliable, just because
- 29:31we ended up assigning values
- 29:34to the pseudo times
- 29:36that are completely unrealistic,
- 29:38we might end up losing
- 29:41a lot of information
- 29:42that we otherwise had in the dataset,
- 29:44or in the ordering.
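The extreme example above can be made concrete. One plausible way to "use only the ordering" is to discard the assigned pseudo times and space cells evenly by rank; the talk does not spell out how this is done in SINGE, so `ordering_only` and `largest_gap` are hypothetical helpers.

```python
def ordering_only(pseudo_times):
    """Replace assigned pseudo times with evenly spaced values that keep only the cell order."""
    n = len(pseudo_times)
    order = sorted(range(n), key=lambda i: pseudo_times[i])
    spaced = [0.0] * n
    for rank, cell in enumerate(order):
        spaced[cell] = rank / (n - 1) if n > 1 else 0.0
    return spaced


def largest_gap(values):
    """Largest distance between consecutive pseudo times."""
    ordered = sorted(values)
    return max(b - a for a, b in zip(ordered, ordered[1:]))


# Extreme case from the talk: perfectly ordered cells, but half get pseudo times
# near 0 and half near 1.
degenerate = [0.001 * i for i in range(5)] + [1.0 - 0.001 * (4 - i) for i in range(5)]
even = ordering_only(degenerate)
print(round(largest_gap(degenerate), 3), round(largest_gap(even), 3))
```

With lag steps or kernel widths around 0.1, the degenerate assignment leaves a gap of almost 1 with no cells in it, while the rank-based times keep every step of the ordering usable.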
- 29:49So, yeah, to extend
- 29:53the ideas from this particular figure, right?
- 29:56So, you have two methods,
- 29:57they're giving you two different...
- 30:00Okay, two methods with their orderings
- 30:01and pseudo times, so basically four cases,
- 30:05and they all give you different rankings,
- 30:08which have different performances
- 30:12in terms of network evaluation.
- 30:14And in a sense, you could say
- 30:19that each of these trajectory inference methods,
- 30:22with all their inefficiencies
- 30:25and efficiencies, is only partially looking
- 30:28at the biological data.
- 30:30So, from that perspective, each
- 30:34of these orderings and pseudo time values
- 30:37can be considered as a source
- 30:39of noisy information,
- 30:40or a noisy source of information.
- 30:42So, instead of trying to just infer
- 30:49one pseudo time trajectory from
- 30:52the dataset and finding the network,
- 30:55or, say, another one and finding
- 30:56the network from that, we could think
- 30:58of the trajectory inference method itself
- 31:01as an additional hyper-parameter,
- 31:03on top of the sparsity, kernel widths,
- 31:06and so on.
- 31:08So, instead of aggregating at this point
- 31:10after just one trajectory inference method,
- 31:12we could just say that maybe
- 31:14we have four trajectory inference methods
- 31:19in the beginning.
- 31:20And after that, we do all
- 31:23of this sub-sampling and application
- 31:25of hyper-parameters, and multiple tests.
- 31:28And then, we aggregate over all
- 31:29of these results across
- 31:31trajectory inference methods.
- 31:33So, hopefully, what that would do
- 31:34is account for all the inefficiencies,
- 31:39or counter the inefficiencies
- 31:40of individual trajectory inference methods,
- 31:43and give us a more robust network at the end.
- 31:49And I have not, I guess, shown
- 31:52our comparisons with the other methods,
- 31:56which are in our paper;
- 31:58we are doing better than them.
- 32:00So, you can have a look at
- 32:03that in the paper if you're interested;
- 32:05I just wanted to conceptually focus
- 32:08on these ideas a little bit more.
- 32:11So, I guess, one problem with trying
- 32:14to run four different, or five different
- 32:17trajectory inference methods is that, depending on
- 32:19what kind of dataset you have
- 32:20and what kind of biology you are studying,
- 32:23you might not necessarily
- 32:27get away with trying only four methods.
- 32:29You will probably have
- 32:30to try multiple methods before
- 32:32finding one which, let's say, if you know
- 32:34it's a branching trajectory,
- 32:35actually shows a branching trajectory.
- 32:38And each of these methods would have
- 32:41its own input data format,
- 32:44output data formats, visualizations,
- 32:49and all of these other intricacies.
- 32:52And that's where the dynverse project comes
- 32:55to our rescue.
- 32:56So, if anyone is looking to run
- 33:00a lot of trajectory inference methods,
- 33:01I would strongly encourage you to look at that.
- 33:04So, in this project,
- 33:06they have streamlined the use of, I think,
- 33:1055 trajectory inference methods.
- 33:12So, you don't necessarily need to install
- 33:14each one of them.
- 33:15You just install this project,
- 33:16and it runs each
- 33:18of these methods using Docker.
- 33:21And so, what it also does
- 33:23is help you visualize
- 33:26all of these trajectories and evaluate them using
- 33:31the same, I guess, support scripts
- 33:35and support functions, which they also provide.
- 33:38And all this
- 33:42would make your lives quite easy.
- 33:44And they also have
- 33:47a graphical user interface,
- 33:48which helps you prioritize
- 33:52what trajectory inference method to use,
- 33:55depending on what biology you want to study,
- 34:00how many cells you have, what compute power
- 34:02you might have access to, and so on.
- 34:12So, okay, just some final comments on
- 34:17the utility of trajectory inference
- 34:19and pseudo times for further analysis.
- 34:22And so, first of all, trajectories
- 34:25look really nice visually,
- 34:28and they give us a lot of information.
- 34:31And so, based on what we saw,
- 34:33we did see that
- 34:39the ordering information
- 34:41and the pseudo time values can help
- 34:43in network inference.
- 34:45Good pseudo times can help a little bit,
- 34:49but if you have exceptionally bad pseudo times,
- 34:51they can hurt a lot compared to using the ordering alone.
- 34:55And not every dataset is really suitable
- 34:59for trajectory inference.
- 35:00What do I mean by that?
- 35:01So, the dataset that I chose,
- 35:04and I guess a lot of what
- 35:08trajectory inference methods
- 35:10are built around, is, say,
- 35:11stem cell differentiation in general,
- 35:14where the biology is quite neat
- 35:19to begin with.
- 35:20You start off from a single cell type,
- 35:23and a lot of the biology is already known.
- 35:27So, you don't have to worry; you know
- 35:30that it's going to be a branching,
- 35:32or bifurcating, or multifurcating trajectory.
- 35:36So, you know the quality of the biology,
- 35:38you know what cell states to expect,
- 35:43and so on, and so forth.
- 35:44You know the markers of each of those.
- 35:46And so, studying something like that
- 35:49is much easier using trajectory inference,
- 35:53or pseudo time.
- 35:54On the other hand, let's say
- 35:56you had a sample from a tumor;
- 35:59in it you would find cancer cells,
- 36:02normal cells, a bunch of immune cells,
- 36:06probably 10 to 20 kinds of immune cells,
- 36:10and so on.
- 36:12So, trajectory inference methods
- 36:15usually track, or predict,
- 36:18cell states and context,
- 36:20not cell types themselves.
- 36:23So, you wouldn't necessarily be able
- 36:25to reliably run a trajectory inference method
- 36:29across a mix of different cell types,
- 36:33as opposed to cell states.
- 36:35Now, with stem cell differentiation,
- 36:38the good thing is that the cell states
- 36:41themselves, after a point, transition
- 36:43into different cell types,
- 36:45because it's the same cell,
- 36:47or the same cell type, which transitions
- 36:50into multiple cell types
- 36:53through these cell states.
- 36:56But that's not the case with cancer biology,
- 36:58where you already start off
- 37:01with a mix of cell types, and trajectory inference
- 37:06would not make sense for that mix.
- 37:08What people have tried is to isolate,
- 37:11say, just one T-cell type, and then try
- 37:16to order, or find the trajectory, only
- 37:19for those T-cells.
- 37:21And there has been some success in that.
- 37:23So, you could run trajectory inference
- 37:27for a subset of the dataset, but not necessarily
- 37:30the entire dataset.
- 37:32And so, depending on what biological processes
- 37:38you want to study,
- 37:41a given trajectory inference method
- 37:43may or may not be suitable.
- 37:45For example, a number of methods,
- 37:47like Monocle and PAGA Tree,
- 37:51try to find tree-like structures
- 37:56in the trajectories,
- 37:58so they would not be suitable
- 37:59for a cyclic biological process,
- 38:03like maintenance processes in cells.
- 38:07And then, there are other methods
- 38:08which actually try to find cell cycles,
- 38:11and they would not be appropriate
- 38:12for branching processes.
- 38:16And I guess no single
- 38:19trajectory inference method
- 38:23accurately represents the biology.
- 38:25It's all basically
- 38:27some mathematical abstraction
- 38:29of what might be happening in the cells.
- 38:35And yeah,
- 38:36if at the outset you know
- 38:37what kind of trajectory to expect, then it helps,
- 38:41at least at first,
- 38:45in trying to
- 38:46judge whether the trajectory that you're getting
- 38:50and the pseudo times that you get
- 38:52are of any worth.
- 38:55So, just to give you an example.
- 38:58So, we started off with Monocle 2
- 39:00as one of our examples in our paper,
- 39:03and then we wanted to have another method
- 39:05to compare the effects of different
- 39:07trajectory inference methods.
- 39:10And PAGA Tree was not necessarily the first one.
- 39:13We tried a number of other ones,
- 39:14which did not work.
- 39:16And we knew what to expect here.
- 39:18We knew that there was a stem cell
- 39:21to ectoderm trajectory and an endoderm trajectory,
- 39:26or a branch of that.
- 39:28And basically,
- 39:35I think we tried four methods,
- 39:38and PAGA Tree was the fourth method,
- 39:39which gave us that kind of branching trajectory,
- 39:42or branching topology, for the biology.
- 39:45And so, the output of any method you try
- 39:49might not necessarily mean anything
- 39:53unless you have some way of validating it.
- 39:57So, at this point, I'm gonna switch
- 39:59to spatial expression,
- 40:04or spatial data and spatial analysis.
- 40:06So, if you have any questions
- 40:08about the pseudo time analysis,
- 40:12should we take them now, or?
- 40:19<v Lecturer>Does anybody have any questions</v>
- 40:20on the first half of the presentation here?
- 40:26<v Dr. Deshpande>Oh, we can continue on,</v>
- 40:27then we can come back later.
- 40:34Shall we go on?
- 40:41<v Lecturer>Sounds good.</v>
- 40:42<v Dr. Deshpande>Okay.</v>
- 40:48Okay, so that was all about,
- 40:52say, how pseudo time is used in our analysis.
- 40:57And so, the other end of,
- 41:03I guess, not necessarily the other end,
- 41:04the other perspective
- 41:05is why space is important and
- 41:10what kind of data we have
- 41:13which gives us information about space.
- 41:16So, the spatial context of cells
- 41:18is very important in many biological processes.
- 41:22For example, when immune cells respond
- 41:24to an infection, or a wound, they need
- 41:27to be in physical proximity of their targets.
- 41:31Similarly, I guess, cancer tumor growth
- 41:34and the immune response to cancer
- 41:38happen through intercellular signaling,
- 41:40either through cytokine secretion,
- 41:42or through surface receptors on adjacent cells.
- 41:48Just knowing the relative location
- 41:50of different cell types can also
- 41:52be very informative.
- 41:53For example, in the figure here,
- 41:57the information about the presence
- 41:58of various immune cell types near the tumor,
- 42:02and the extent of immune infiltration
- 42:04in the tumor, are essential prognostic markers.
- 42:08And so, single cell RNA-seq,
- 42:13as good as it is, dissociates a cell
- 42:15from its tissue, due to which
- 42:18we lose the spatial context of the cell states.
- 42:21But in recent years, we have been able
- 42:24to develop both
- 42:28spatial proteomics,
- 42:30which helps you image protein
- 42:35densities of, say, up to 30 markers
- 42:40at single cell resolution in the tissue,
- 42:43as well as spatial transcriptomics,
- 42:46which can measure 20,000 genes at spots
- 42:51in the tissue.
- 42:53And this was named Method of the Year by Nature Methods
- 42:57for 2020, yeah, that was last year.
- 43:01So, here's just a workflow
- 43:05of the 10x Visium technology,
- 43:06which is one of these
- 43:07spatial transcriptomics technologies.
- 43:10So, this includes about 5,000 barcoded spots on a slide.
- 43:16And these barcodes are attached to the RNA captured from the cells
- 43:21which are located in those spots.
- 43:24And this helps preserve the spatial context
- 43:26of the cells through to the actual sequencing.
- 43:30Now, this technology is not exactly single cell,
- 43:33but it still provides a lot of useful spatial detail.
- 43:41So yeah, for explaining this project,
- 43:47I will use the 10x Visium sample
- 43:51provided by 10x Genomics
- 43:53of breast cancer tissue.
- 43:55So, the figure on the left
- 43:57is an H and E slide, a hematoxylin
- 44:01and eosin stained slide,
- 44:03which helps pathologists annotate
- 44:08the sample for tumor, lesions, and so on.
- 44:14And the second image is that slide annotated
- 44:19by a pathologist, and you can see
- 44:22that there are different biologies
- 44:25in this one slide.
- 44:27For example, the lesion on top
- 44:29is an invasive cancer lesion, which means
- 44:31that it can spread beyond the breast tissue,
- 44:33but the other lesions correspond
- 44:35to DCIS, or ductal carcinoma in situ, lesions,
- 44:36which are not yet classified as invasive,
- 44:39though they could become invasive in the future.
- 44:42Other important annotations are those
- 44:43of immune cells and the stromal cells
- 44:47in between these lesions.
- 44:50For a good clinical outcome, you would hope
- 44:52that immune cells can infiltrate these lesions.
- 44:56And so the figure on the right shows
- 44:59the same H and E slide
- 45:02with overlaid Visium spots.
- 45:05So, each of these spots corresponds
- 45:07to one measurement.
- 45:10So, this slide shows a couple of examples
- 45:14of spatial gene expression.
- 45:17So, the figure to the left
- 45:19is the same annotated H and E slide
- 45:21that will help us keep track
- 45:23of the biology in the slide.
- 45:27And so, the middle figure
- 45:30shows the expression of CD8A,
- 45:33which is a marker of cytotoxic T-cells.
- 45:36Now, we see this gene expressed
- 45:37near the invasive and DCIS lesions,
- 45:42which means that the immune cells
- 45:44are responding to the tumor.
- 45:45However, we see that
- 45:47there's not much infiltration of these cells
- 45:49within the lesions.
- 45:51The second marker is CD14, which is found
- 45:54in macrophages and dendritic cells,
- 45:57and its expression is much higher
- 45:58inside the lesions, which could point
- 46:00to successful infiltration of these cell types.
- 46:04Now, just a reminder, the measurements
- 46:08that we get from 10x Visium
- 46:10are not exactly single cell,
- 46:13but near single cell,
- 46:15in the sense that each of these spots
- 46:16is 55 micrometers wide.
- 46:19And depending on what cell type
- 46:22you might have in that spot,
- 46:23it could have anywhere from one to 10 cells.
- 46:27And immune cells are much smaller,
- 46:28so there could be up to 10 immune cells in it,
- 46:30but maybe only one cancer,
- 46:32or epithelial cell in that spot.
- 46:35So, as a result, the gene expression
- 46:36of that spot is the average
- 46:38of the cells inside it.
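A minimal Python sketch of that averaging, assuming, as stated above, that a spot's expression is the plain average of the cells it contains (it ignores capture efficiency and counting noise). The helper `spot_expression` and the two-gene profiles are hypothetical illustrations, not part of any 10x or CoGAPS tool; they only show how ten small immune cells can dominate a spot that also holds one cancer cell.

```python
import numpy as np


def spot_expression(cell_profiles, cell_counts):
    """Average expression of a spot, weighting each cell type's mean profile
    by how many of its cells sit inside the spot (hypothetical helper)."""
    profiles = np.asarray(cell_profiles, dtype=float)  # cell types x genes
    counts = np.asarray(cell_counts, dtype=float)      # cells of each type in the spot
    return counts @ profiles / counts.sum()


# Toy two-gene profiles (immune-like gene, cancer-like gene); values invented.
immune = [10.0, 0.0]
cancer = [0.0, 20.0]
print(spot_expression([immune, cancer], [10, 1]))  # ten immune cells, one cancer cell
```

Even though the toy cancer profile is twice as strong per cell, the immune-like gene ends up about five times higher in the spot because of the cell counts.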
- 46:42Now, our lab has a method
- 46:47called CoGAPS, which is a Bayesian
- 46:50Markov chain Monte Carlo method
- 46:51for nonnegative matrix factorization.
- 46:54And so, as a result of, say,
- 46:58the 10x Visium measurement,
- 47:01we now have a high dimensional matrix
- 47:04with 20,000 genes and around 5,000 spots.
- 47:08And what CoGAPS does is
- 47:12factorize this matrix
- 47:15into two low-rank matrices,
- 47:18both of which are non-negative,
- 47:21and which correspond to latent patterns in the data.
- 47:25And in the past, we have seen
- 47:27that these patterns correspond to biologies,
- 47:31based on the pattern markers.
- 47:34So, the two matrices that CoGAPS factorizes
- 47:38the dataset into are the amplitude matrix,
- 47:41which has, say, 20,000 rows for 20,000 genes
- 47:44and N columns for the N patterns.
- 47:48And this helps us identify groups
- 47:50of co-expressed genes,
- 47:52which correspond to the patterns.
- 47:54And the pattern matrix has N rows
- 47:57and 5,000 columns, and it associates the spots
- 48:01on the sample with patterns.
- 48:04So, because of the nature of the CoGAPS
- 48:08factorization, the columns
- 48:11of the amplitude matrix, or the rows of the pattern matrix,
- 48:14are not really orthogonal.
- 48:15They are independent, but not orthogonal.
- 48:17So, patterns could co-exist in spots,
- 48:20or a gene could be present in multiple processes,
- 48:25and hence in multiple patterns,
- 48:25which correspond to processes.
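To illustrate the shapes involved, here is a hedged Python sketch of non-negative matrix factorization using the classic Lee-Seung multiplicative updates. This is a swapped-in technique: CoGAPS itself samples the factors with Bayesian MCMC, which this code does not do. The function name `nmf_multiplicative` and the synthetic block-structured data are assumptions for illustration; `A` plays the role of the amplitude matrix (genes × patterns) and `P` the pattern matrix (patterns × spots).

```python
import numpy as np


def nmf_multiplicative(D, n_patterns, n_iter=500, seed=0, eps=1e-9):
    """Approximate a non-negative genes x spots matrix D as A @ P.

    Illustrative stand-in only: Lee-Seung multiplicative updates for squared
    error, not the Bayesian MCMC sampler that CoGAPS runs.
    """
    D = np.asarray(D, dtype=float)
    rng = np.random.default_rng(seed)
    n_genes, n_spots = D.shape
    A = rng.random((n_genes, n_patterns)) + eps  # amplitude matrix: genes x patterns
    P = rng.random((n_patterns, n_spots)) + eps  # pattern matrix: patterns x spots
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        P *= (A.T @ D) / (A.T @ A @ P + eps)
        A *= (D @ P.T) / (A @ P @ P.T + eps)
    return A, P


# Synthetic data with two known, mostly separate non-negative patterns.
rng = np.random.default_rng(1)
true_A = 0.1 * rng.random((50, 2))
true_A[:25, 0] += 1.0   # genes 0-24 load mainly on pattern 0
true_A[25:, 1] += 1.0   # genes 25-49 load mainly on pattern 1
true_P = 0.1 * rng.random((2, 40))
true_P[0, :20] += 1.0   # pattern 0 is active in spots 0-19
true_P[1, 20:] += 1.0   # pattern 1 is active in spots 20-39
D = true_A @ true_P     # 50 "genes" x 40 "spots"

A, P = nmf_multiplicative(D, n_patterns=2)
rel_err = np.linalg.norm(D - A @ P) / np.linalg.norm(D)
print(A.shape, P.shape, round(float(rel_err), 4))
```

Nothing in these updates forces the columns of `A` or the rows of `P` to be orthogonal, which is consistent with the point above that patterns can overlap in spots and a gene can load on several patterns.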
- 48:29So, when we applied CoGAPS to the Visium data,
- 48:37the first try was basically
- 48:39just five patterns, and when we
- 48:43factorize the data
- 48:47into five patterns, we see that
- 48:50a number of them correspond
- 48:51to the pathology annotations
- 48:54that we see on the figure on the left.
- 48:57So, we find a pattern which corresponds
- 48:59to the immune cells.
- 49:01We find a pattern which corresponds
- 49:04to invasive carcinoma on the top left here.
- 49:07And we also find a pattern which corresponds
- 49:09to the DCIS lesions.
- 49:12And as we increase the dimensionality
- 49:15of the CoGAPS factorization, we start seeing more
- 49:18and more tissue heterogeneity.
- 49:20For example, we now see three patterns
- 49:23which are associated with the invasive carcinoma,
- 49:26and we can see that they correspond
- 49:27to different regions in that lesion.
- 49:31And this one, for example, is completely internal
- 49:33and has no interaction with immune cells.
- 49:37We have a pattern which corresponds
- 49:39to immune cells, we have a pattern
- 49:41which corresponds to the stromal cells.
- 49:43And we also have different patterns
- 49:47which highlight individual DCIS lesions.
- 49:51So, one could say that potentially
- 49:53it is finding biologies
- 49:57which are unique to these DCIS lesions.
- 50:04So, we can analyze the A matrix
- 50:07to identify groups of genes associated
- 50:09with each pattern, and we call these
- 50:11the pattern markers.
- 50:12And these help us identify pathways
- 50:15that are likely active in these patterns,
- 50:18and, because especially in this sample
- 50:20we see a one-to-one association
- 50:23between the pattern and the biology,
- 50:25also in the biology, basically.
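As a rough illustration of how pattern markers might be read off the amplitude matrix, the sketch below assigns each gene to the pattern that holds the largest share of its amplitude and ranks genes within a pattern by that share. This is a simplified heuristic assumed here, not the marker statistic CoGAPS reports; `pattern_markers`, the `top_k` cutoff, and the toy amplitudes are hypothetical.

```python
import numpy as np


def pattern_markers(A, gene_names, top_k=3):
    """Assign each gene to the pattern holding the largest share of its
    amplitude and return up to top_k genes per pattern, most specific first."""
    A = np.asarray(A, dtype=float)
    totals = A.sum(axis=1, keepdims=True)
    share = np.divide(A, totals, out=np.zeros_like(A), where=totals > 0)
    best = share.argmax(axis=1)
    expressed = totals[:, 0] > 0  # skip genes with no amplitude at all
    markers = {}
    for k in range(A.shape[1]):
        idx = np.flatnonzero((best == k) & expressed)
        idx = idx[np.argsort(-share[idx, k], kind="stable")]
        markers[k] = [gene_names[i] for i in idx[:top_k]]
    return markers


# Toy amplitude matrix: 4 genes x 2 patterns (values invented for illustration).
A = [[5, 0], [4, 1], [0, 3], [1, 1]]
print(pattern_markers(A, ["CD8A", "CD3E", "EPCAM", "ACTB"], top_k=2))
```

A gene split evenly across patterns (like the toy `ACTB` row) gets a low share and drops down the list, which is the intuition behind preferring pattern-specific genes as markers.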
- 50:29So, let's see, how long do we have?
- 50:35I think we're close to...
- 50:38I'll quickly rush through these.
- 50:40So, the other analysis that we can do is,
- 50:45given, let's say, two of these patterns,
- 50:49to try to see how these patterns interact.
- 50:52So, you can see that these patterns
- 50:54have a lot of spatial structure,
- 50:58which CoGAPS was not told about.
- 50:59The inputs and parameters that CoGAPS uses
- 51:01have no spatial information,
- 51:03and it still found these spatial structures.
- 51:06So, we also see that these patterns
- 51:08are adjacent to each other, and we want
- 51:10to see how they interact.
- 51:11So, what we do is
- 51:15estimate the kernel density
- 51:18of each of these patterns, which is a function
- 51:22of both the pattern intensity at a spot
- 51:25and the spatial clustering
- 51:28of high intensities.
- 51:30And we compare that against
- 51:32a null distribution obtained by
- 51:35repeating the density estimation after randomizing
- 51:37the locations of these pattern intensities.
- 51:40So, the spots whose densities lie beyond
- 51:43this null distribution
- 51:48are outliers,
- 51:49and these outliers are the ones
- 51:51that we count as hotspots of pattern activity.
- 51:55Similarly, we can find the hotspots
- 51:57of immune response.
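The hotspot procedure described above (a kernel density for each pattern, compared against densities recomputed after shuffling intensities across spot locations) can be sketched in Python as below. The Gaussian kernel, bandwidth `sigma`, number of permutations `n_perm`, and 95th-percentile cutoff `q` are all assumptions chosen for illustration; the talk does not specify them, and the function names are hypothetical.

```python
import numpy as np


def gaussian_weights(coords, sigma):
    """Gaussian kernel weights between every pair of spots."""
    coords = np.asarray(coords, dtype=float)
    sq_dist = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def find_hotspots(coords, intensity, sigma=1.0, n_perm=200, q=0.95, seed=0):
    """Flag spots whose smoothed pattern intensity exceeds a permutation null.

    The smoothed value at a spot is a kernel-weighted sum of the pattern
    intensity at nearby spots, so it is high only where strong intensities
    cluster. The null shuffles intensities across spot locations.
    """
    intensity = np.asarray(intensity, dtype=float)
    rng = np.random.default_rng(seed)
    W = gaussian_weights(coords, sigma)
    observed = W @ intensity
    null = np.concatenate([W @ rng.permutation(intensity) for _ in range(n_perm)])
    threshold = np.quantile(null, q)
    return observed > threshold


# Toy 10 x 10 grid of spots with a pattern concentrated in one 3 x 3 corner.
grid = np.array([(x, y) for x in range(10) for y in range(10)])
intensity = ((grid[:, 0] < 3) & (grid[:, 1] < 3)).astype(float)
hot = find_hotspots(grid, intensity)
print(int(hot.sum()), "hotspot spots")
```

Running the same function on the immune pattern's intensities gives its hotspots; the bandwidth controls how large a cluster has to be before it stands out from the shuffled null.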
- 51:59And when we combine both of them,
- 52:02we find regions where cancer is active,
- 52:09regions where immune cells are active,
- 52:11and regions where both of them are active.
- 52:14And this is the interaction region.
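Given hotspot masks for two patterns, the three kinds of regions just described follow from simple boolean logic. A minimal sketch, with the hypothetical helper `label_regions` and label strings chosen for illustration:

```python
import numpy as np


def label_regions(cancer_hot, immune_hot):
    """Label each spot from two boolean hotspot masks."""
    cancer_hot = np.asarray(cancer_hot, dtype=bool)
    immune_hot = np.asarray(immune_hot, dtype=bool)
    labels = np.full(cancer_hot.shape, "none", dtype=object)
    labels[cancer_hot & ~immune_hot] = "cancer"
    labels[immune_hot & ~cancer_hot] = "immune"
    labels[cancer_hot & immune_hot] = "interaction"  # both patterns active
    return labels


print(label_regions([True, True, False, False], [True, False, True, False]).tolist())
```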
- 52:16And in this region, we are trying
- 52:17to find genes which correspond
- 52:19to this interaction between cancer and immune cells,
- 52:25and which are not necessarily
- 52:27regular markers of cancer or immune cells.
- 52:29So, genes which are specifically related
- 52:32to the non-linear interactions
- 52:34between these patterns.
- 52:38And to that end, basically we hypothesize
- 52:40that since CoGAPS is already
- 52:45an approximation of the dataset
- 52:47with a linear combination of the patterns,
- 52:50the residuals of CoGAPS,
- 52:51that is, of the CoGAPS estimate relative to the dataset,
- 52:55could point us to the non-linear interactions
- 52:58between the patterns.
- 53:01And we are only looking at the region
- 53:06where both of the patterns are active
- 53:09and comparing the residuals of CoGAPS
- 53:13in that region to the residuals
- 53:15in the cancer-only region
- 53:16and the immune-only region.
- 53:18And now, this can be done for each of these,
- 53:24I guess, pattern combinations,
- 53:25so we can find the genes that correspond
- 53:29to pattern interactions
- 53:30between these pairs of patterns.
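The residual comparison can be sketched as follows, assuming the data matrix `D` and the factors `A` and `P` are on the same scale. The talk does not say which statistic is used to compare regions; purely as an illustrative choice, each gene here is scored by its mean residual in the interaction region minus the larger of its mean residuals in the cancer-only and immune-only regions. `interaction_scores` is a hypothetical name.

```python
import numpy as np


def interaction_scores(D, A, P, interaction, cancer_only, immune_only):
    """Score each gene by how much its residual from the linear fit A @ P
    rises in the interaction region beyond either single-pattern region."""
    R = np.asarray(D, float) - np.asarray(A, float) @ np.asarray(P, float)  # genes x spots
    masks = [np.asarray(m, dtype=bool) for m in (interaction, cancer_only, immune_only)]
    inter, cancer, immune = (R[:, m].mean(axis=1) for m in masks)
    return inter - np.maximum(cancer, immune)


# Toy example: a perfect one-pattern fit except gene 0, which gets an extra
# boost only in the two interaction spots (values invented for illustration).
A = np.ones((3, 1))
P = np.ones((1, 6))
D = A @ P
D[0, 4:] += 2.0
interaction = [False, False, False, False, True, True]
cancer_only = [True, True, False, False, False, False]
immune_only = [False, False, True, True, False, False]
scores = interaction_scores(D, A, P, interaction, cancer_only, immune_only)
print(scores)
```

Genes with large positive scores are candidates whose expression in the interaction region is not explained by adding the patterns together; a real analysis would also need a significance test and multiple-testing correction.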
- 53:32So, for future work, as part
- 53:35of the data collection in clinical trials,
- 53:37we're already collecting both spatial
- 53:42and single cell transcriptomics
- 53:44and proteomics from patients.
- 53:47So, we are trying to integrate all
- 53:50of this into one big dataset,
- 53:54which would represent the tumor microenvironment,
- 53:59which would help us characterize
- 54:03the patient sample as a whole.
- 54:05And we would also like
- 54:07to infer intracellular signaling networks
- 54:10the same way as we were trying to do
- 54:12it using time, but now using space
- 54:14where intercellular signaling is a function
- 54:18of the distance between the cells
- 54:20and the types of neighboring cells
- 54:21for a target cell.
- 54:26And the learnings from these projects
- 54:27would go into a spatiotemporal model
- 54:31of tumor growth and response to therapy,
- 54:33which can be used in building
- 54:36a digital patient or digital clone,
- 54:39where we can try to test what therapies
- 54:43might work on what patients.
- 54:48So, these are the people who have been involved,
- 54:50and of course, 10x Genomics,
- 54:52who were kind enough to give us the sample
- 54:54for studying, as well as my collaborators
- 54:59on this project.
- 55:01Thank you so much.
- 55:02And I can take questions now,
- 55:04sorry for overshooting the time.
- 55:10<v Lecturer>Thank you so much.</v>
- 55:11Do we have any questions to look at?
- 55:22People on Zoom? Yeah, question (mumbles).
- 55:25<v Female Student>Going back</v>
- 55:25to the time series slides.
- 55:28<v ->Mm-hmm.</v>
- 55:29<v Female Student>Can you talk</v>
- 55:29about how you know if you have good
- 55:31or bad pseudo times?
- 55:32And is there a way to fix bad pseudo times?
- 55:35<v ->So, yeah, what I've not shared here</v>
- 55:39is that in our experiments,
- 55:43we knew, for example,
- 55:46what we were studying.
- 55:47We wanted to study a trajectory which goes
- 55:49from stem cells to neuroectoderm,
- 55:56and we had markers.
- 55:57And I think, some (mumbles) themselves.
- 56:01They have identified markers
- 56:03of stem cells, neuroectoderm, and endoderm cells.
- 56:09So, we were looking at the trajectories
- 56:10of the markers along the pseudo time
- 56:13to see if those made sense.
- 56:15For example, a marker which is supposed
- 56:18to be high in stem cells
- 56:21should be tapering down to zero
- 56:23along pseudo time, and a marker
- 56:26which is supposed to be high in neuroectoderm
- 56:30should be increasing with pseudo time.
- 56:34So, we had, I think, six or so markers
- 56:38for each of stem cells, neuroectoderm,
- 56:41and endoderm cells.
- 56:46And we were trying to confirm
- 56:49that neuroectoderm markers increase
- 56:52with pseudo time, while the other two decrease;
- 56:54well, the endoderm markers shouldn't necessarily decrease,
- 56:58but they shouldn't have
- 57:00a monotonic increase like the neuroectoderm ones,
- 57:08and they should not be present initially.
- 57:12Does that...
- 57:14So, that was one way to do it, basically.
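The marker check described in this answer can be made quantitative with a rank correlation between each marker and pseudo time. The Python sketch below is one hedged way to do it: Spearman correlation with average ranks for ties, where +1 means a marker should rise, -1 that it should fall, and 0 that it should not show a strong increase (the endoderm case above). The threshold `min_rho`, the function names, and the gene labels are hypothetical.

```python
import numpy as np


def average_ranks(x):
    """Ranks starting at 0, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="mergesort")] = np.arange(len(x), dtype=float)
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    return (np.bincount(inverse, weights=ranks) / counts)[inverse]


def spearman(x, y):
    rx, ry = average_ranks(x), average_ranks(y)
    if np.ptp(rx) == 0 or np.ptp(ry) == 0:
        return 0.0  # a constant vector has no trend
    return float(np.corrcoef(rx, ry)[0, 1])


def check_markers(pseudotime, expression, expected, min_rho=0.5):
    """expected maps gene -> +1 (should rise), -1 (should fall) or
    0 (should not rise) along pseudo time; returns gene -> (rho, passed)."""
    results = {}
    for gene, direction in expected.items():
        rho = spearman(pseudotime, expression[gene])
        if direction > 0:
            passed = rho >= min_rho
        elif direction < 0:
            passed = rho <= -min_rho
        else:
            passed = rho < min_rho
        results[gene] = (rho, passed)
    return results


# Toy markers along a pseudo time axis (shapes invented for illustration).
t = np.linspace(0, 1, 50)
expression = {"STEM_A": 1 - t, "NEURO_A": t ** 2, "ENDO_A": np.exp(-5 * t)}
expected = {"STEM_A": -1, "NEURO_A": +1, "ENDO_A": 0}
for gene, (rho, passed) in check_markers(t, expression, expected).items():
    print(gene, round(rho, 2), passed)
```

A rank correlation only checks the direction of the trend, which fits the qualitative expectations described here; it says nothing about whether a marker tapers all the way to zero.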
- 57:21<v Lecturer>Thank you.</v>
- 57:22Any other questions?
- 57:40So, with the combination of many cells,
- 57:41and the spatial stuff, is there any hope
- 57:44of getting a temporal signal out of any of that,
- 57:46or is that (indistinct)?
- 57:50<v ->In spatial, did you mean?</v>
- 57:52<v Lecturer>Yeah.</v>
- 57:53<v Dr. Deshpande>So, I think,</v>
- 57:59the issue would be, I guess,
- 58:00not in clinical, I suppose.
- 58:04In the sense that, okay, are you thinking
- 58:06about pseudo temporal, or just clinical?
- 58:08<v Lecturer>Yeah.</v>
- 58:10<v ->For pseudo temporal, I think there might</v>
- 58:11be some possibility,
- 58:12and I've been thinking about it,
- 58:18but we would still have to isolate,
- 58:20I guess, cell types, for example.
- 58:22So, one of the problems with that
- 58:24is that, as I mentioned,
- 58:27the spots are not exactly single cell, right?
- 58:31So, especially, let's say, if you're trying
- 58:33to do a pseudo temporal ordering
- 58:36of CD8 T-cells,
- 58:39they are
- 58:41more likely than not co-localized
- 58:44with other cell types, which would,
- 58:49I guess, corrupt the expression
- 58:51that you are seeing.
- 58:53So, that would make it slightly more difficult.
- 58:57We could think of ordering the spot
- 59:01as a whole, basically.
- 59:03And I...
- 59:05I belong to a school of thought that basically,
- 59:07if you have a...
- 59:08Well, what people try to do
- 59:10with, say, this kind of data,
- 59:13this spatial Visium data,
- 59:15where you have, say, up to 10 cells per spot,
- 59:17is resolve each spot into cell types.
- 59:23For example, there is, I think,
- 59:25one paper called RCTD,
- 59:30robust cell type decomposition.
- 59:33So, what they do is, basically,
- 59:36they take the spatial data,
- 59:38they have a reference single cell dataset,
- 59:41and they try to assign each spot,
- 59:47or resolve each spot, into a mixture
- 59:51of the cell types that exist
- 59:54in the single cell data.
- 59:57And that could help you
- 01:00:01identify what the mixture in general is.
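To show the idea of resolving a spot into cell-type proportions, here is a simplified Python sketch using non-negative least squares solved with multiplicative updates. RCTD itself fits a more elaborate probabilistic model of counts, so this is a stand-in rather than its algorithm; `deconvolve_spot` and the toy reference profiles are hypothetical.

```python
import numpy as np


def deconvolve_spot(reference, spot, n_iter=2000, eps=1e-12):
    """Estimate non-negative cell-type weights w so that reference.T @ w
    approximately equals spot, then report them as proportions.

    reference: (cell types, genes) mean profiles from a single-cell reference.
    spot: (genes,) expression measured at one spot.
    """
    S = np.asarray(reference, dtype=float).T   # genes x cell types
    b = np.asarray(spot, dtype=float)
    w = np.full(S.shape[1], 1.0 / S.shape[1])  # start from equal weights
    StS, Stb = S.T @ S, S.T @ b
    for _ in range(n_iter):
        # Multiplicative NNLS update; keeps w >= 0 when S and b are non-negative.
        w *= Stb / (StS @ w + eps)
    total = w.sum()
    return w / total if total > 0 else w


# Toy reference with two cell types over three genes (values invented).
reference = np.array([[10.0, 0.0, 1.0],    # immune-like profile
                      [0.0, 20.0, 1.0]])   # cancer-like profile
spot = (10 * reference[0] + 1 * reference[1]) / 11  # 10 immune cells, 1 cancer cell
print(deconvolve_spot(reference, spot))
```

Because the toy spot was built as an exact mixture, the recovered proportions come out close to 10/11 immune and 1/11 cancer; real spots add noise and platform effects that a model like RCTD is designed to handle.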
- 01:00:04But my thought is that we could
- 01:00:09just think of each spot as some representation
- 01:00:15of the biology in that neighborhood.
- 01:00:17So, each spot could just represent
- 01:00:20a neighborhood, as opposed to trying to find
- 01:00:22what the individual cells are.
- 01:00:25And that would basically abstract out
- 01:00:30the representation and the biology to that
- 01:00:33of the spots.
- 01:00:35And we'll have to think about how to do that,
- 01:00:37but I think there could be some ordering to that,
- 01:00:40but we'll need to see what makes sense.
- 01:00:45And then, for a lot of cells, the cell states
- 01:00:49are quite well characterized.
- 01:00:51For example, if you say that a T-cell
- 01:00:53is activated, or a T-cell is naive,
- 01:00:55or exhausted, you know what markers to expect.
- 01:00:59But what would you be able to say
- 01:01:02for spots instead?
- 01:01:05The other thing to think of,
- 01:01:10especially with, say, the proteomics as well,
- 01:01:12where you can get actual single cell
- 01:01:18distributions and neighborhood characterization,
- 01:01:22is this:
- 01:01:27the same thing...
- 01:01:28the same ideas that were used
- 01:01:31for pseudo temporal ordering of cells,
- 01:01:34can they be used for pseudo temporal
- 01:01:36ordering of neighborhoods?
- 01:01:39For example, say you have a cell neighborhood,
- 01:01:41which is represented as, whatever,
- 01:01:45the central cell and its five neighbors.
- 01:01:49Now, are they all tumor?
- 01:01:52Then maybe
- 01:01:53they're basically deep in the cancer,
- 01:01:54in a region that has never been visited by an immune cell.
- 01:01:58Is it a mix of tumor
- 01:01:59and activated immune cells?
- 01:02:02Then that is basically an active tumor-immune
- 01:02:04interaction that's happening.
- 01:02:06Is it exhausted T-cells and tumor,
- 01:02:10where basically the tumor
- 01:02:11has fought back and tried to suppress the...
- 01:02:16or it has basically sent signals
- 01:02:17to suppress the immune response, and so on?
- 01:02:21So, perhaps there could be
- 01:02:22a trajectory of neighborhoods,
- 01:02:25where you could say that depending on all
- 01:02:29the possible combinations that you expect
- 01:02:31in cellular neighborhoods,
- 01:02:35this current neighborhood is this far along
- 01:02:40that process, or that branch of a process.
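The neighborhood idea in this answer is speculative, and no method is specified. As one hedged way to turn it into data, the Python sketch below summarizes each cell's neighborhood (the cell plus its `k` nearest neighbors, with `k = 5` to match the example of a central cell and its five neighbors) as cell-type proportions; composition vectors like these are the kind of representation a trajectory method could, in principle, try to order. The function name and toy labels such as `T_exhausted` are hypothetical.

```python
import numpy as np


def neighborhood_composition(coords, cell_types, k=5):
    """For each cell, the fraction of each cell type among the cell itself
    and its k nearest neighbors."""
    coords = np.asarray(coords, dtype=float)
    types = np.asarray(cell_types)
    categories = np.unique(types)
    sq_dist = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    # Each cell is at distance 0 from itself, so it sorts first.
    neighborhood = np.argsort(sq_dist, axis=1, kind="stable")[:, : k + 1]
    composition = np.stack(
        [(types[neighborhood] == c).mean(axis=1) for c in categories], axis=1
    )
    return categories, composition


# Toy tissue: six tumor cells far from six exhausted T-cells (invented layout).
coords = [(i, 0) for i in range(6)] + [(100 + i, 0) for i in range(6)]
types = ["tumor"] * 6 + ["T_exhausted"] * 6
categories, composition = neighborhood_composition(coords, types)
print(categories.tolist())
print(composition[0], composition[6])
```

In this toy layout every neighborhood is pure; mixed tumor and immune neighborhoods would land in between, which is where an ordering of neighborhoods along a process would have something to work with.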
- 01:02:44That was a long and winding answer.
- 01:02:47(chuckles) I don't know if
- 01:02:49that necessarily answered it. <v Lecturer>Thank you.</v>
- 01:02:52Thank you, any last questions?
- 01:02:54I wanna be mindful of time.
- 01:02:56Any questions that come to you, or?
- 01:03:06All right, well if not, thank you again.
- 01:03:09(students applaud) We really appreciate that.
- 01:03:11<v Dr. Deshpande>Thank you a lot.</v>
- 01:03:15<v Lecturer>You have a wonderful (indistinct).</v>
- 01:03:16<v ->Mm-hmm.</v>
- 01:03:20(lecturer mumbles indistinctly)
- 01:03:27(students chatter indistinctly)