
YSPH Biostatistics Seminar: "Exploring Space and Time for Identifying Gene Interactions Using Single-cell Transcriptomics"

October 05, 2021

Atul Deshpande, PhD, Postdoctoral Researcher, Division of Biostatistics and Bioinformatics, Johns Hopkins University

ID
6959

Transcript

  • 00:00Today it is my honor to introduce,
  • 00:02Dr. Atul Deshpande.
  • 00:04Dr. Deshpande is a postdoctoral researcher
  • 00:07in the lab of Dr. Elana Fertig
  • 00:09in the department of oncology,
  • 00:11at Johns Hopkins University.
  • 00:13He has a PhD in electrical engineering
  • 00:15from the University of Wisconsin-Madison,
  • 00:17and his interests include
  • 00:18the use of time series analysis
  • 00:20and spatial statistics
  • 00:21for modeling biological processes.
  • 00:24He's currently developing analysis techniques
  • 00:26to use single cell and spatial multi-omics
  • 00:28for the characterization of
  • 00:30the tumor microenvironment
  • 00:32and intracellular signaling networks.
  • 00:34Welcome. (students applause)
  • 00:40Well, thank you so much.
  • 00:41And once I figure out my...
  • 00:48Where my PowerPoint window is,
  • 00:49we can start in earnest.
  • 00:52Okay, yeah, thank you for the kind introduction.
  • 00:55So, I'm Atul Deshpande,
  • 00:57and today the title of my talk is exploring time
  • 01:01and space for identifying gene interactions
  • 01:04using single cell transcriptomics.
  • 01:07So, what do time and space mean
  • 01:10in the context of this talk?
  • 01:13So, they refer to recent technological advances
  • 01:15and the algorithms, which are the foundation
  • 01:17for the projects I will be talking about.
  • 01:20And the first advance is the ability
  • 01:24to measure gene expression in individual cells.
  • 01:27This in turn inspired development
  • 01:29of algorithms that ordered these cells along
  • 01:32the biological trajectory.
  • 01:34Using these algorithms, we can observe changes
  • 01:37in gene expression in
  • 01:39a pseudo temporal reference, or pseudo time,
  • 01:43which is a measure of the progress
  • 01:45of the biological process.
  • 01:48The second is a more recent ability
  • 01:50to measure gene expression
  • 01:52within the spatial context of the tissue.
  • 01:54With this, we can analyze changes
  • 01:56in gene expression
  • 01:58as cellular neighborhoods change,
  • 02:00or as the tissue type changes.
  • 02:07So, before single cell transcriptomics,
  • 02:10we would usually get one measurement
  • 02:12of gene expression from a collected sample.
  • 02:15And this is now called
  • 02:18bulk RNA-seq, retroactively.
  • 02:22However, this measurement would just be
  • 02:26an average of the population of cells
  • 02:27in the sample, and it would obscure information
  • 02:31about the different cell types, or different
  • 02:34cell states in the population.
  • 02:36With single-cell RNA-seq,
  • 02:37we can now measure gene expression
  • 02:40in individual cells.
  • 02:41Depending on technology, this can range
  • 02:43from a few hundred cells up to hundreds
  • 02:46of thousands of cells.
  • 02:49And this allows us to observe
  • 02:51the full heterogeneity of the cell population
  • 02:56represented by gene expression.
  • 02:59And using this high dimensional data
  • 03:02that we now have,
  • 03:03we can characterize different cell types
  • 03:05and cell states as gene expression vectors.
  • 03:12So, one drawback of this technique
  • 03:14is the issue of technical dropouts.
  • 03:17Now, this is characterized by observing,
  • 03:21as in us observing a lot
  • 03:23of false zeroes, or zero inflated measurements,
  • 03:26because we are unable to reliably measure
  • 03:29the low RNA counts in individual cells.
  • 03:35Now, the first project
  • 03:39that I will discuss uses
  • 03:43a single cell RNA-seq technology,
  • 03:46or rather, it's downstream of that.
  • 03:49And it's also downstream of algorithms,
  • 03:53which order single cell data into trajectories,
  • 03:58which represent the biology
  • 04:01that they might be studying.
  • 04:02For example, let's say if you are...
  • 04:04You have a dataset, which corresponds
  • 04:09to stem cell differentiation,
  • 04:11there are probably now 70 different
  • 04:15trajectory inference methods depending on what
  • 04:17kind of datasets you are studying,
  • 04:21what biology you want to study,
  • 04:23how big the dataset is,
  • 04:25or what the expected trajectory is
  • 04:28of the biology that you're studying maybe.
  • 04:30And they attempt to order these cells based
  • 04:34on the expression of potentially
  • 04:37a few key marker genes, or how, which genes
  • 04:40are differentially expressed along
  • 04:43the biological process.
  • 04:46So, anytime you collect,
  • 04:48let's say a single cell RNA-seq data,
  • 04:51you would find a mix of cells,
  • 04:54and that was the entire motivation
  • 04:56for doing this.
  • 04:57But that mix of cells would have
  • 05:01a range of cell states,
  • 05:03which could correspond to
  • 05:07from the beginning of the biological process,
  • 05:09to the very end of the biological process.
  • 05:12And what these algorithms are trying to do
  • 05:14is they're trying to fit these cells
  • 05:18in their right place, in the biological process.
  • 05:23And once we do that, we can actually observe
  • 05:25the gene expression along this ordering.
  • 05:30And a lot of these methods also assign
  • 05:34a pseudo time to each cell,
  • 05:35which tells you how far along in the biology
  • 05:39they think, or they hypothesize that the cell is.
  • 05:43And so, the question that we wanted
  • 05:45to ask is given this pseudo temporal ordering
  • 05:50of the cells, which gives us
  • 05:54a gene expression dynamics
  • 05:55in the pseudo temporal reference.
  • 05:58Can we use these dynamics
  • 06:03to infer gene regulatory networks?
  • 06:07Or any directed networks from say,
  • 06:10sets of genes to their targets.
  • 06:14And the second question
  • 06:14was whether the assigned pseudo time values help
  • 06:19us in the network inference task.
  • 06:25So, to make the, I guess,
  • 06:31explanation more approachable,
  • 06:34I will just use an example dataset.
  • 06:37And as I explained, the concepts I've...
  • 06:41We will just see what that means
  • 06:44in terms of this dataset.
  • 06:45So, this is a dataset from Semrau et al,
  • 06:50and this is a single cell data
  • 06:53from retinoic acid-driven differentiation.
  • 06:57And in this, mouse embryonic stem cells
  • 07:01differentiate into neuroectoderm
  • 07:02and extraembryonic endoderm cells.
  • 07:06Now the data as collected had nine samples,
  • 07:10one before the differentiation starts
  • 07:12and one after every six hours.
  • 07:15So, you have data collected over 96 hours
  • 07:19from nine samples, and each sample has 384 cells.
  • 07:24So overall, I believe we have something
  • 07:27like you can do the math.
  • 07:29I guess, 2,600 cells or something like that.
  • 07:33So, we chose to apply
  • 07:37two trajectory inference methods to this.
  • 07:39So, the first one is Monocle 2,
  • 07:42which is also called Monocle DDRTree, I believe.
  • 07:45And the second one is PAGA Tree.
  • 07:47So, both of these methods identify
  • 07:50a bifurcating trajectory from these cells.
  • 07:53And so, the first one is to the left
  • 07:56where the embryonic stem cells are actually
  • 08:01on the right of...
  • 08:03I'm not sure if people can see my mouse pointer,
  • 08:07but yeah, they're on the right of the trajectory.
  • 08:09And then, towards the bottom left,
  • 08:14you go into a neuroectoderm state
  • 08:16and towards the...
  • 08:19Right, top left, you go into an endoderm state.
  • 08:24And on the right side, the way PAGA Tree
  • 08:27infers trajectory is you have
  • 08:30the embryonic stem cells on the top left.
  • 08:33And then, it identifies
  • 08:35a few more branches than Monocle does.
  • 08:39But both of these
  • 08:40identify branching trajectories.
  • 08:43And in each case we selected
  • 08:48the two branches,
  • 08:49which corresponded to markers, which were,
  • 08:54which ended up being high for neuroectoderm.
  • 08:56So, the trajectories, the sub trajectories
  • 09:00from each method that we've wanted to study
  • 09:03was the embryonic stem cells to neuroectoderm,
  • 09:08using these two methods.
  • 09:11So, this as in, so we had...
  • 09:14We have these two trajectory inference methods,
  • 09:16which assigned their own pseudo times,
  • 09:18and this is the pseudo temporal expression
  • 09:23dynamics for the same gene.
  • 09:25I did not mark which gene it was, but yeah,
  • 09:29so this was for the same gene.
  • 09:30And you can see that the dynamics
  • 09:33that each of these trajectories gives
  • 09:35us is different.
  • 09:37First of all, the main branch,
  • 09:40or sub part of the trajectory that
  • 09:42we are considering has
  • 09:44a different number of cells.
  • 09:46And these cells may not necessarily be common
  • 09:48to both.
  • 09:49There will be some which are common
  • 09:50to both of these trajectories,
  • 09:52but some others which are completely different.
  • 09:54But also, that the cell ordering itself
  • 09:57that each method based on whatever mathematics
  • 10:01they use, or whatever algorithms they use,
  • 10:05would differ between these two methods.
  • 10:08So, as you see, Monocle has a higher expression
  • 10:12much earlier in the pseudo time,
  • 10:14as opposed to PAGA Tree, which has much later.
  • 10:18And the pseudo times here,
  • 10:20were not exactly 100, they're just normalized
  • 10:22to 100, just to represent progress from 0%
  • 10:25of the biology to 100% of the biology,
  • 10:31or as inferred by that method.
  • 10:34So, now what are the challenges associated
  • 10:37with ordered single-cell data?
  • 10:39So, the first one is that unlike say,
  • 10:43stock data, or say weather data,
  • 10:47or something like that, you don't necessarily
  • 10:49have a uniform distribution of cells.
  • 10:54And if you're going to do a time series analysis,
  • 10:56that would mean that you do not
  • 10:57have regularly spaced time series,
  • 11:00but you actually
  • 11:01have irregularly spaced time series.
  • 11:03On top of that, the pseudo time values
  • 11:05that are assigned to the cells
  • 11:07and the ordering of the cells are uncertain.
  • 11:13Now, finally, we recall that we had the issue
  • 11:17of zero inflated measurements,
  • 11:19or false zeroes in the data
  • 11:21because of technical dropouts.
  • 11:26So, the question is how to overcome all
  • 11:29of these drawbacks
  • 11:32to try and find
  • 11:36networks from this time series data.
  • 11:40So, the project that we had,
  • 11:43it resulted in basically
  • 11:45an algorithm called SINGE,
  • 11:46which is single-cell inference
  • 11:48of networks using Granger ensembles.
  • 11:50So, this was done at the Morgridge Institute
  • 11:53for Research in Madison, Wisconsin.
  • 11:55And these are my collaborators on this project.
  • 12:01And let's see, okay.
  • 12:03So, the main concept that we build on
  • 12:06is basically the Granger causality test.
  • 12:08It was introduced by Clive Granger in the 1960s.
  • 12:14And to give a very simple example
  • 12:16of what it's trying to say is, let's say
  • 12:17if you have two time series X and Y,
  • 12:22now, Granger causality tests whether
  • 12:26the prediction of current values of Y
  • 12:28improves by using past values of X,
  • 12:31in addition to past values of Y.
  • 12:34And if that happens, then we say
  • 12:36that X Granger causes Y.
  • 12:38So, this is basically a lagged regression
  • 12:41between X and Y.
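
[Editor's illustration] The two-series test described above can be sketched as a comparison of two least-squares fits: one predicting Y from its own past, and one that also uses the past of X. This is a minimal sketch on simulated data, not the speaker's code; the function names, the lag count, and the simulated series are assumptions made for illustration. A formal test would compare the two residual sums of squares with an F statistic.

```python
import numpy as np

def lagged_matrix(series, lags):
    """Columns are series[t-1], ..., series[t-lags] for t = lags .. T-1."""
    T = len(series)
    return np.column_stack([series[lags - k : T - k] for k in range(1, lags + 1)])

def rss(design, target):
    """Residual sum of squares of an ordinary least-squares fit with intercept."""
    design = np.column_stack([np.ones(len(target)), design])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)

def granger_rss(x, y, lags=2):
    """Return (restricted, full) RSS for predicting y without and with past x."""
    target = y[lags:]
    past_y = lagged_matrix(y, lags)
    past_x = lagged_matrix(x, lags)
    restricted = rss(past_y, target)                      # past values of y only
    full = rss(np.column_stack([past_y, past_x]), target) # past values of y and x
    return restricted, full

# Simulated example (hypothetical): y depends on the previous value of x.
rng = np.random.default_rng(0)
T = 500
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.8 * x[t - 1] + 0.1 * rng.normal()

r_xy, f_xy = granger_rss(x, y)  # does past x improve the prediction of y?
r_yx, f_yx = granger_rss(y, x)  # does past y improve the prediction of x?
print("x -> y:", r_xy, f_xy)
print("y -> x:", r_yx, f_yx)
```

In this simulation, adding past X sharply reduces the error in predicting Y, while adding past Y barely changes the error in predicting X, which is the asymmetry that "X Granger causes Y" refers to.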
  • 12:42So, this has had applications
  • 12:44in econometrics and finance,
  • 12:46and is also being used
  • 12:47in computational neuroscience and biology,
  • 12:51as noted in these examples here.
  • 12:55Now, the multivariate Granger causality test
  • 12:58can be thought of as setting up and solving
  • 13:00a vector autoregression model,
  • 13:02where you have say, P genes, T time points
  • 13:05and L lags.
  • 13:06Where L lags is telling you how many,
  • 13:11say your relationships with the past expressions
  • 13:14you're trying to model.
  • 13:16And once you have that,
  • 13:17you could think of solving this VAR
  • 13:21model by just minimizing
  • 13:24this objective function here.
  • 13:27And that would give you, I guess,
  • 13:28a few edges between the past values
  • 13:31of all of the genes and your target gene.
  • 13:34Okay, maybe I should have explained
  • 13:36this figure first.
  • 13:37So, you have all the regular,
  • 13:39all the possible regulators of a gene,
  • 13:42and then you have a target gene,
  • 13:43and you're trying to identify
  • 13:46what explains what past values
  • 13:49of any of these genes explains
  • 13:51the current values of the target gene.
  • 13:55And if you wanted to have
  • 13:59a sparse representation of this network,
  • 14:02or have an...
  • 14:03Count only a few of the edges,
  • 14:05you would introduce this sparsity parameter,
  • 14:08which would ensure that the edges from say,
  • 14:12all of these genes to your target
  • 14:15are not numerous.
  • 14:16And you can explain the biology in a few edges.
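
[Editor's illustration] The objective function on the slide is not reproduced in the transcript. A standard form of a sparsity-penalized (Lasso) Granger objective for one target gene $j$, using the P genes, T time points, and L lags mentioned above, might look as follows. The symbols $x_i(t)$ (expression of gene $i$ at time $t$), $a_{i,l}$ (coefficient for gene $i$ at lag $l$), and $\lambda$ (sparsity parameter) are notation assumed here and may differ from the speaker's slide.

```latex
\hat{a} \;=\; \arg\min_{a}\;
\sum_{t=L+1}^{T}
\Bigl( x_j(t) \;-\; \sum_{i=1}^{P} \sum_{l=1}^{L} a_{i,l}\, x_i(t-l) \Bigr)^{2}
\;+\; \lambda \sum_{i=1}^{P} \sum_{l=1}^{L} \bigl| a_{i,l} \bigr|
```

Setting $\lambda = 0$ gives the unpenalized least-squares fit; a larger $\lambda$ drives more coefficients $a_{i,l}$ to zero, so fewer regulator-to-target edges survive.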
  • 14:22Now, to counter the irregularity
  • 14:27of the time series, we use
  • 14:30an idea called Generalized Lasso Granger.
  • 14:33So, what this does is,
  • 14:36I'm not sure, maybe I have...
  • 14:39Yeah, okay, so just to recall, right?
  • 14:44So, you have a pseudo temporal data,
  • 14:46which has irregular time series,
  • 14:48and you have missing values,
  • 14:51which show up as zeros here, right?
  • 14:54So, we want to adapt the Lasso Granger test
  • 15:00for irregular time series.
  • 15:02So, what was previously,
  • 15:05basically coefficients from older samples
  • 15:07in regular time series,
  • 15:09now becomes coefficients from just timestamps
  • 15:15in the past.
  • 15:16Because you might not necessarily have
  • 15:18a sample at that point.
  • 15:21Furthermore, we can rethink basically,
  • 15:28the objective function. Where originally
  • 15:33it was a dot product between
  • 15:35the coefficients and the values
  • 15:38of the gene expression,
  • 15:41we rethink that as a weighted dot product,
  • 15:45where basically we...
  • 15:48And this is the description
  • 15:49of the weighted dot product, where you use
  • 15:51a Gaussian kernel to weight the inputs
  • 15:56to the dot product based on their proximity
  • 15:59to the timestamps that you...
  • 16:02That correspond to these coefficients.
  • 16:04So, these ellipses here show kernels,
  • 16:08I guess, they represent kernels.
  • 16:10They don't necessarily stop at these bandwidths,
  • 16:12but they just keep going
  • 16:13because they're Gaussian kernels.
  • 16:16But these just represent the kernels,
  • 16:18where basically, if you have
  • 16:20a timestamp corresponding to coefficient
  • 16:22and you have no sample at that timestamp,
  • 16:25that doesn't necessarily mean
  • 16:26that the input to the dot product is zero.
  • 16:31So, basically what you would do is
  • 16:33you would just look at a bin around
  • 16:36that timestamp, and weight input from regulators,
  • 16:42depending on their proximity to this timestamp.
  • 16:46So, if the sample is exactly at
  • 16:51the timestamp that you expect,
  • 16:52you would weight it highly based
  • 16:54on this Gaussian kernel, and the farther
  • 16:56you move away from the timestamp,
  • 16:58the weaker the weight of
  • 17:02that particular sample would be.
  • 17:05So, what this helps us do
  • 17:07is if there is, say, more than one cell
  • 17:10in close proximity, it would take input
  • 17:14from all of them.
  • 17:15If there are no cells in close proximity,
  • 17:18it would at least take input from some cells,
  • 17:20which are farther away, and so on.
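
[Editor's illustration] The kernel weighting described above can be sketched as follows: for each lag timestamp in the past, regulator samples are averaged with Gaussian weights that decay with distance from that timestamp. This is a simplified sketch, not the SINGE implementation; the function name, the normalization by the sum of weights, and the toy numbers are assumptions made for illustration.

```python
import numpy as np

def kernel_lag_features(times, values, t_now, dt, num_lags, width):
    """Kernel-weighted regulator inputs at timestamps t_now - dt, ..., t_now - num_lags*dt.

    times, values: irregularly spaced pseudotimes and expression of one regulator.
    width: Gaussian kernel width (standard deviation) in pseudotime units.
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    past = times < t_now  # only samples earlier in pseudotime act as inputs
    feats = np.zeros(num_lags)
    for lag in range(1, num_lags + 1):
        stamp = t_now - lag * dt
        # Weight each past sample by its proximity to the lag timestamp.
        w = np.exp(-((times[past] - stamp) ** 2) / (2.0 * width ** 2))
        if w.sum() > 0:
            feats[lag - 1] = np.dot(w, values[past]) / w.sum()
    return feats

# Toy irregular series (hypothetical values); no sample sits exactly at t = 3.
times = [0.0, 1.0, 1.2, 4.0, 7.5]
values = [2.0, 4.0, 6.0, 8.0, 10.0]
feats = kernel_lag_features(times, values, t_now=5.0, dt=2.0, num_lags=2, width=0.5)
print(feats)
```

At the first lag (timestamp 3) there is no sample, so the input is dominated by the nearest past cell at pseudotime 4. At the second lag (timestamp 1) the two nearby cells at 1.0 and 1.2 both contribute.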
  • 17:25So, yeah, as in this works
  • 17:27with irregular time series,
  • 17:28because you don't necessarily have
  • 17:30to expect samples in the past at the timestamps
  • 17:34that you wanted them to.
  • 17:36And yeah, I think we already discussed this.
  • 17:40So, now, as in going back to the case for...
  • 17:45So, we had these false zeroes, right?
  • 17:48So now, because of this kernel method,
  • 17:50we have an inherent imputation over missing data.
  • 17:54So, now we get what we could think of as,
  • 17:58instead of taking all of the zeros
  • 18:00as they are at face value,
  • 18:03we can treat them, or some of them
  • 18:04as dropouts, as just missing data.
  • 18:09And we just remove those samples now,
  • 18:11because we can now work
  • 18:13with irregular time series.
  • 18:15And because of this kernel method,
  • 18:17we can actually work with time series that are
  • 18:19all uniquely irregular.
  • 18:22We can work with...
  • 18:24We can remove the zero valued samples
  • 18:26and get a different, differently irregular
  • 18:30time series for each of these genes.
  • 18:33And so, such an action can probably
  • 18:37be informed by imputation techniques like MAGIC,
  • 18:40which help you complete,
  • 18:42or impute zeros in the dataset.
  • 18:44So, instead of imputing the dataset,
  • 18:46you could just use its output
  • 18:48to decide whether or not to remove the data from,
  • 18:51or remove that zero from this input dataset.
  • 18:58So, this is just an illustration
  • 19:00of a single generalized Lasso Granger test.
  • 19:04So, you have the POU5F1 gene, and it's basically,
  • 19:08you see the cells corresponding
  • 19:11to that, or rather its expression
  • 19:16along pseudo time.
  • 19:18And what you also see is two trendlines
  • 19:23predicted using a Lambda of 0.1,
  • 19:27which is basically a sparsity constraint of 0.1.
  • 19:29So, it would have fewer edges
  • 19:32between the regulators and POU5F1.
  • 19:36And then a Lambda of 0.02,
  • 19:41which has far more regulators.
  • 19:43And you can see that both of these predict
  • 19:46the trends of POU5F1 when using
  • 19:49the past values quite well.
  • 19:54So, now that was just one GLG test.
  • 19:59Now, what SINGE does, is it performs multiple
  • 20:01such GLG tests where you sub-sample
  • 20:04the time series different ways
  • 20:07to get different irregular time series again.
  • 20:12And you also use diverse hyper-parameters
  • 20:14to effectively, using these two in combination,
  • 20:17slice the cake multiple ways in trying
  • 20:20to look at the data.
  • 20:22So, the hyperparameters
  • 20:23that we use are Lambda, which determines
  • 20:25the sparsity of the network that we get,
  • 20:29or the adjacency matrix that we get.
  • 20:31And we have Delta T, which gives us
  • 20:36a time resolution of the lags between say,
  • 20:40the past regulators
  • 20:41and the current target timestamps,
  • 20:45and the number of lags that you have.
  • 20:47So together, they will tell you how far behind
  • 20:51in pseudo time should you be looking to try
  • 20:54to predict the expression of the target.
  • 20:57And finally, the kernel width,
  • 20:59which tells how far, how wide the width should be
  • 21:03around the timestamp that you are considering.
  • 21:08Now, once we get
  • 21:11adjacency matrices from all of these,
  • 21:13we get, we considered them as partial networks,
  • 21:17and we get ranked lists from each of them.
  • 21:20And we aggregate these rank lists
  • 21:22using a modified Borda count.
  • 21:24So, Borda count is something
  • 21:25which has been used in elections.
  • 21:29It's basically an election, I guess,
  • 21:31result aggregating strategy,
  • 21:34where if you have five candidates,
  • 21:36you rank them from one to five,
  • 21:39and then the person who has, I guess,
  • 21:42the lowest number here over all
  • 21:44of the people that voted,
  • 21:47they would win the vote.
  • 21:49So, the modified Borda count
  • 21:52is basically the same concept,
  • 21:53but the only change that we did
  • 21:55was we wanted to place more weight
  • 22:03to a ranking, which distinguishes
  • 22:05between say a one, the first interaction
  • 22:10we find with the 10th interaction we find.
  • 22:12As opposed to say, the 10,000th interaction
  • 22:15we find with the 10,010th interaction
  • 22:18that we find.
  • 22:19So, that's why the weighting before adding
  • 22:23these Borda weights is one over N squared,
  • 22:26as opposed to say, N here.
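
[Editor's illustration] The aggregation step described above can be sketched as a weighted vote: an edge at rank N in a partial network contributes 1/N² to its total score, so differences near the top of a list count far more than differences deep in the list. This is a minimal sketch, not the SINGE implementation; the function name, the treatment of edges missing from a list (they contribute zero), and the toy gene names are assumptions made for illustration.

```python
from collections import defaultdict

def modified_borda(ranked_lists):
    """Aggregate ranked edge lists; an edge at rank n (1-based) scores 1 / n**2."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, edge in enumerate(ranking, start=1):
            scores[edge] += 1.0 / rank ** 2
    # Highest total score first; ties broken by edge name for reproducibility.
    return sorted(scores, key=lambda edge: (-scores[edge], edge))

# Three hypothetical partial networks, each a ranked list of (regulator, target) edges.
partial_rankings = [
    [("A", "B"), ("C", "B"), ("A", "C")],
    [("C", "B"), ("A", "B"), ("B", "C")],
    [("A", "B"), ("B", "C"), ("C", "B")],
]
final_ranking = modified_borda(partial_rankings)
print(final_ranking)
```

Here the edge ("A", "B") is ranked first twice and second once, so it ends up at the top of the aggregated list.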
  • 22:33So, yeah, once we aggregate this,
  • 22:36we get a final rank list.
  • 22:39And so, we had two inferred trajectories,
  • 22:43we got gene dynamics from them,
  • 22:46and now that results in two different networks.
  • 22:49And this is just showing the top 100 edges
  • 22:53from Monocle 2 and PAGA Tree.
  • 22:55Now, you can obviously see
  • 22:56that they look very different.
  • 22:59Some of the edges I think, are common,
  • 23:02but they can be very, very different.
  • 23:05So, now the question is,
  • 23:08which of these is right, or better?
  • 23:12So, for that we would have
  • 23:14to first think of, okay,
  • 23:15how do we evaluate this?
  • 23:17So, one way to evaluate that would be
  • 23:20to do a precision recall evaluation.
  • 23:24So, let's say we have this rank list
  • 23:25of candidate gene interactions that we just got
  • 23:28from SINGE and a gold standard,
  • 23:31which knows the truth.
  • 23:33As we go down this rank list,
  • 23:34the precision metric tells us
  • 23:37what fraction of the predictions
  • 23:38so far have been correct.
  • 23:40And the recall metric tells us
  • 23:42how many of the total interactions
  • 23:44in the gold standard, which were correct
  • 23:46have so far been covered.
  • 23:48So, the figure on the right shows
  • 23:51a precision recall curve for two rank lists.
  • 23:54The ideal precision recall curve
  • 23:56would place all the edges in the gold standard
  • 23:58at the top of the list.
  • 23:59So, that's the dotted line that you see here,
  • 24:04and the area under that precision
  • 24:06recall curve would be one.
  • 24:09A random list in expectation would be flat.
  • 24:13So, and it would have a precision
  • 24:15recall curve, and the area under
  • 24:18that curve would be 0.5.
  • 24:20And here are, I guess, two make-believe orderings.
  • 24:27And in this example, we can see
  • 24:29that the precision recall curve of A,
  • 24:35which I guess, the predictor A is better
  • 24:40because it starts off with having more ones,
  • 24:45or as in a high precision, and then falls
  • 24:48as opposed to B, which rises
  • 24:50from a low precision.
  • 24:51What it means that A gets more hits
  • 24:55in the top of its list as opposed to B,
  • 24:57and so on.
  • 24:58And so, one way to also evaluate
  • 25:01these precision recall curves is to just look
  • 25:03at the area under the curve, which for A here
  • 25:06is 0.7 and for B is 0.52.
  • 25:07And that tells us that on an average
  • 25:10A ranks edges better as opposed to B.
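
[Editor's illustration] The precision recall evaluation described above can be sketched in a few lines: walk down the ranked list, track precision and recall at each position, and summarize with average precision (the mean of the precision values at each correct prediction). This is a minimal sketch; the function names and the toy edge labels are assumptions made for illustration and do not reproduce the A and B lists on the slide.

```python
def precision_recall(ranked_edges, gold):
    """Precision and recall after each position in a ranked edge list."""
    gold = set(gold)
    hits = 0
    precision, recall = [], []
    for k, edge in enumerate(ranked_edges, start=1):
        if edge in gold:
            hits += 1
        precision.append(hits / k)          # fraction of predictions so far that are correct
        recall.append(hits / len(gold))     # fraction of gold-standard edges covered so far
    return precision, recall

def average_precision(ranked_edges, gold):
    """Mean of the precision values at the positions of gold-standard edges."""
    gold = set(gold)
    precision, _ = precision_recall(ranked_edges, gold)
    total = sum(p for p, edge in zip(precision, ranked_edges) if edge in gold)
    return total / len(gold)

# Hypothetical gold standard and two hypothetical rankings of the same five edges.
gold = {"e1", "e2", "e3"}
ranked_a = ["e1", "e2", "x1", "e3", "x2"]  # hits concentrated at the top
ranked_b = ["x1", "x2", "e1", "e2", "e3"]  # hits arrive late
ap_a = average_precision(ranked_a, gold)
ap_b = average_precision(ranked_b, gold)
print(ap_a, ap_b)
```

The first ranking places its correct edges early and scores higher, mirroring the comparison between the two curves discussed above.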
  • 25:16Now, we would like to use this,
  • 25:19and the question is what could we use as
  • 25:22a gold standard?
  • 25:24Now, this is real biological data
  • 25:26that we are using, and for that,
  • 25:29we would also need to look into
  • 25:32the literature to find validation.
  • 25:35So, one good source of information
  • 25:37is the escape database curated
  • 25:39by the Ma'ayan lab.
  • 25:41And this database includes the results
  • 25:44of loss-of-function and gain-of-function experiments
  • 25:47done on genes, and also
  • 25:49ChIP-seq experiments,
  • 25:50which identify binding sites
  • 25:52of transcription factors.
  • 25:54Now, the problem being that even this database
  • 25:58is incomplete because the gaps
  • 26:01in biological knowledge remain, and,
  • 26:04I guess, over time,
  • 26:06it would be filled in more and more.
  • 26:09But when we were doing this evaluation,
  • 26:12we had to deal with what was effectively
  • 26:14a partial gold standard,
  • 26:16or an incomplete gold standard.
  • 26:18So, the evaluation that we did was not
  • 26:20for all of the genes in the dataset,
  • 26:23but only a fraction of the genes.
  • 26:28So, we had these two methods
  • 26:33and two pseudo times, which we got from that.
  • 26:36So, what we wanted, what we did
  • 26:38is we compared the performance of SINGE
  • 26:43using say, Monocle 2 and the pseudo time,
  • 26:46as well as Monocle 2 with only the ordering.
  • 26:49And similarly, PAGA Tree
  • 26:50with the pseudo time and PAGA Tree
  • 26:52with only the ordering.
  • 26:54And so, this is how the precision recall curves
  • 26:58of these four methods look.
  • 27:01So, we look at the average precision,
  • 27:04which is the same thing as the area under
  • 27:06the precision recall curve.
  • 27:08And we also look at the average precision
  • 27:10in the early part of the precision recall curve.
  • 27:14And the point for that being that,
  • 27:18in say, a usual workflow,
  • 28:21you would have a computational method,
  • 27:24which would point to some important edges,
  • 27:29and then, you would potentially tell
  • 27:31a collaborator to try
  • 27:34and experimentally validate that.
  • 27:36And in that sense, you would be giving
  • 27:38them results from the top of your list,
  • 27:40as opposed to trying to tell how well
  • 27:43the 10,000th edge in the list
  • 27:45is placed in the rankings.
  • 27:48So, with that in mind, we also look
  • 27:50at what's the average early precision
  • 27:53of these curves.
  • 27:55And for that, we basically say what happened,
  • 27:59as in, to what extent is the precision maintained
  • 28:04until 10% of the genes
  • 28:06in the gold standard are...
  • 28:08Or interactions in the gold standard
  • 28:10are recalled in the list that we have.
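
[Editor's illustration] One way to read the early-precision idea above is to average the precision only until 10% of the gold-standard interactions have been found. The sketch below follows that reading; it is an assumption about the definition, not the formula from the paper, and the function name, the handling of lists that never reach the recall cutoff, and the toy data are hypothetical.

```python
import math

def average_early_precision(ranked_edges, gold, max_recall=0.1):
    """Mean precision at each hit, stopping once max_recall of the gold standard is found."""
    gold = set(gold)
    # Number of gold-standard edges that must be found to reach the recall cutoff.
    # The rounding guards against floating-point noise before taking the ceiling.
    needed = max(1, math.ceil(round(max_recall * len(gold), 9)))
    hits, precisions = 0, []
    for k, edge in enumerate(ranked_edges, start=1):
        if edge in gold:
            hits += 1
            precisions.append(hits / k)
            if hits == needed:
                break
    # Hits never reached count as zero precision.
    return sum(precisions) / needed

# Hypothetical gold standard of 20 edges: 10% recall means the first 2 hits.
gold = {f"g{i}" for i in range(20)}
ranking = ["g0", "x0", "g1", "x1", "g2"] + [f"g{i}" for i in range(3, 20)]
aep = average_early_precision(ranking, gold)
print(aep)
```

Only the top of the list matters for this score, which matches the workflow argument above: a collaborator validating predictions would start from the highest-ranked edges.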
  • 28:15So, the figure to the right shows
  • 28:18a scatterplot of these, the average precision
  • 28:20and the average early precision
  • 28:21for these four methods, for these four options.
  • 28:25And what we see is that the...
  • 28:27The best performing combination
  • 28:29is using Monocle's ordering,
  • 28:31but not its pseudo time. And for Monocle, applying
  • 28:36the pseudo time that it order,
  • 28:38that it assigns to the cells,
  • 28:41actually degrades the performance quite a bit.
  • 28:46And both of the PAGA Tree options
  • 28:49with, or without pseudo time,
  • 28:51are in between these.
  • 28:52So, now why would this happen?
  • 28:56For example, and let's take
  • 28:57an extreme case, right?
  • 28:58And okay, before that, there's not necessarily
  • 29:04something that's wrong with Monocle,
  • 29:06but it's basically that for this dataset,
  • 29:09in this instance, the pseudo time values
  • 29:12did not necessarily make a lot of sense.
  • 29:15So, let's say you have perfectly ordered cells.
  • 29:17And for the first half of the cells,
  • 29:19you just assign a value very close
  • 29:22to zero and the second half,
  • 29:23you assign a value very close to one.
  • 29:26So, even though the ordering of the cells
  • 29:27was quite nice and reliable, just because
  • 29:31we ended up assigning a value
  • 29:34to the pseudo times, often times,
  • 29:36which is completely unrealistic.
  • 29:38We might end up losing
  • 29:41a lot of information
  • 29:42that we otherwise had in the dataset,
  • 29:44or in the ordering.
  • 29:49So, yeah, as in, extending
  • 29:53the ideas from this particular figure, right?
  • 29:56So, you have two methods,
  • 29:57they're giving you two different...
  • 30:00Okay, two methods with their orderings
  • 30:01and pseudo times, so basically four cases,
  • 30:05and they all give you different rankings,
  • 30:08which have different performances
  • 30:12in terms of network evaluation.
  • 30:14And in a sense, you could say
  • 30:19that each of these trajectory inference methods
  • 30:22itself with all their inefficiencies
  • 30:25and efficiencies are only partially looking
  • 30:28at the biological data.
  • 30:30So, from that perspective, each
  • 30:34of these orderings and pseudo time values
  • 30:37can be considered as sources
  • 30:39of noisy information,
  • 30:40or noisy sources of information.
  • 30:42So, instead of trying to just infer
  • 30:49one pseudo time trajectory from
  • 30:52the dataset and finding the network,
  • 30:55or say another, and finding
  • 30:56the network from that, we could think
  • 30:58of the trajectory inference method itself
  • 31:01as an additional hyper parameter
  • 31:03on top of the sparsity, and kernel widths,
  • 31:06and so on.
  • 31:08So, instead of aggregating at this point
  • 31:10after just one trajectory inference method,
  • 31:12we could just say that maybe
  • 31:14we have four trajectory inference methods
  • 31:19in the beginning.
  • 31:20And after that, we do all
  • 31:23of these sub sampling and application
  • 31:25of hyper-parameters, and multiple tests.
  • 31:28And then, we aggregate over all
  • 31:29of these results across
  • 31:31trajectory inference methods.
  • 31:33So, hopefully what that would do
  • 31:34is that would account for all the inefficiencies,
  • 31:39or counter the inefficiencies
  • 31:40of individual trajectory inference methods,
  • 31:43and give us a more robust network at the end.
  • 31:49And I have not, I guess, shown
  • 31:52our comparisons for the other methods,
  • 31:56which obviously is in our paper.
  • 31:58We are doing better than them.
  • 32:00So, but you can have a look at
  • 32:03that in the paper if you're interested,
  • 32:05because I just wanted to conceptually focus
  • 32:08on these ideas a little bit more.
  • 32:11So, I guess, one problem with trying
  • 32:14to run four different, or five different
  • 32:17trajectory inference methods is depending on
  • 32:19what kind of data set you have
  • 32:20and what kind of biology you are studying,
  • 32:23you might not necessarily have
  • 32:27to try only four methods.
  • 32:29You will probably have
  • 32:30to try multiple methods before,
  • 32:32which let's say, if you know
  • 32:34it's a branching trajectory,
  • 32:35you end up seeing a branching trajectory.
  • 32:38And each of these methods would have
  • 32:41their own input data format,
  • 32:44output data formats, visualizations,
  • 32:49and all of these other intricacies.
  • 32:52And that's where the dynverse project comes
  • 32:55to our rescue.
  • 32:56So, if anyone is looking to do
  • 33:00a lot of trajectory inference methods,
  • 33:01I would strongly encourage you to look at that.
  • 33:04So, in this project,
  • 33:06they have streamlined the use of, I think,
  • 33:1055 trajectory inference methods.
  • 33:12So, you don't necessarily need to install
  • 33:14each one of them.
  • 33:15You just install this project
  • 33:16and they run each
  • 33:18of these methods using Docker.
  • 33:21And so, what it also helps you do
  • 33:23is it helps you visualize
  • 33:26all of these trajectories and evaluate them using
  • 33:31the same, I guess, support scripts
  • 33:35and support functions, which they also provide.
  • 33:38And in all this, this would make
  • 33:42your lives quite easy.
  • 33:44And they also have basically a user,
  • 33:47a graphical user interface,
  • 33:48which helps you prioritize
  • 33:52what trajectory inference method to use,
  • 33:55depending on what biology you want to study.
  • 34:00How many cells you have, what compute power
  • 34:02you might have access to, and so on.
  • 34:12So, okay just some final comments on the use
  • 34:17of, I guess, the utility of trajectory inference
  • 34:19and pseudo times for further analysis.
  • 34:22And so, first of all, as in trajectories
  • 34:25look really nice, they visually,
  • 34:28they give us a lot of information.
  • 34:31And so, based on what we saw,
  • 34:33we did see that there's some,
  • 34:39the ordering information
  • 34:41and the pseudo time values can help
  • 34:43in network inference.
  • 34:45The good pseudo times can help a little bit,
  • 34:49but if you have exceptionally bad pseudo times,
  • 34:51it can hurt a lot as opposed to ordering.
  • 34:55And not every dataset is really suitable
  • 34:59for trajectory inference.
  • 35:00What do I mean by that?
  • 35:01So, the dataset that I chose,
  • 35:04and I guess a lot of what is...
  • 35:08What trajectory inference methods
  • 35:10are built around, is, say,
  • 35:11stem cell differentiation in general,
  • 35:14where it's as in the biology is quite neat
  • 35:19to begin with.
  • 35:20As in you start off from a single cell type,
  • 35:23and a lot of the biology is already known.
  • 35:27So, you don't have to worry, you know
  • 35:30that it's going to be a branching,
  • 35:32or bifurcating, or multifurcating trajectory.
  • 35:36So, you know that the quality of the biology,
  • 35:38you know what cell states to exist, to expect,
  • 35:43and so on, and so forth.
  • 35:44You know the markers of each of those.
  • 35:46And so, studying something like that
  • 35:49is much easier using trajectory inference,
  • 35:53or pseudo time.
  • 35:54On the other hand, let's say,
  • 35:56if you had a sample from a cancer tumor
  • 35:59in that you would find cancer cells,
  • 36:02normal cells, a bunch of immune cells,
  • 36:06probably 10 to 20 kinds of immune cells,
  • 36:10and so on.
  • 36:12So, the trajectory inference method
  • 36:15usually tracks, or predicts places,
  • 36:18cell states and context.
  • 36:20Not cell types themselves.
  • 36:23So, you wouldn't necessarily be able
  • 36:25to reliably run a trajectory inference method
  • 36:29across as in using a mix of different cell types,
  • 36:33as opposed to cell states.
  • 36:35Now, with the stem cell differentiation,
  • 36:38the good thing is that the cell states
  • 36:41themselves after a point, transition
  • 36:43into different cell types,
  • 36:45because it's the same cell,
  • 36:47or same cell type which transitions
  • 36:50through multiple cell types,
  • 36:53through these cell states.
  • 36:56But that's not the case with cancer biology,
  • 36:58where you already start off
  • 37:01with a mix of cell types and trajectory inference
  • 37:06would not make sense for that mix.
  • 37:08What people have tried is isolate,
  • 37:11just say a T-cell type, and then try
  • 37:16to order, or find the trajectory only
  • 37:19for those T-cells.
  • 37:21And there has been some success in that.
  • 37:23So, you could run trajectory inference
  • 37:27for a subset of the dataset, but not necessarily
  • 37:30the entire dataset.
  • 37:32And so, depending on what biological processes
  • 37:38you want to study,
  • 37:41there are trajectory inference methods,
  • 37:43which may or may not be suitable for it.
  • 37:45For example, a number of methods
  • 37:47like Monocle and PAGA Tree,
  • 37:51they try to find tree-like structures
  • 37:56in the trajectories,
  • 37:58so they would not be suitable
  • 37:59for a cyclic biological process
  • 38:03like just maintenance processes in cells.
  • 38:07And then, there are other methods
  • 38:08which actually try to find cell cycles,
  • 38:11and they would not be appropriate
  • 38:12for branching processes.
  • 38:16And I guess, as in, no single
  • 38:19trajectory inference method,
  • 38:23accurately represents the biology.
  • 38:25So, it's all basically
  • 38:27some mathematical abstraction
  • 38:29of what might be happening in the cells.
  • 38:35And yeah, as in, if...
  • 38:36If at the outset, you know
  • 38:37what kind of trajectory to expect, then it helps
  • 38:41in trying to
  • 38:45at least first really,
  • 38:46say whether the trajectory that you're getting
  • 38:50and the pseudo times that you get
  • 38:52is of any worth.
  • 38:55So, just to give you an example.
  • 38:58So, we started off with Monocle 2
  • 39:00as one of our examples in our paper,
  • 39:03and then we wanted to have another method
  • 39:05to compare the effects of different
  • 39:07trajectory inference methods.
  • 39:10And PAGA Tree was not necessarily the first one.
  • 39:13We tried a number of other ones,
  • 39:14which did not.
  • 39:16And we knew what to expect here.
  • 39:18We knew that there was stem cell
  • 39:21to ectoderm trajectory and endoderm trajectory,
  • 39:26or a branch of that.
  • 39:28And using basically, just the first,
  • 39:35I think we tried four methods
  • 39:38and PAGA Tree was basically the fourth method,
  • 39:39which gave us that kind of branching trajectory,
  • 39:42or branching topology for the biology.
  • 39:45And so, none of the methods you try
  • 39:49might necessarily mean anything,
  • 39:53unless you have some way of validating that.
  • 39:57So, at this point, I'm gonna switch
  • 39:59to spatial expression,
  • 40:04or spatial data and spatial analysis.
  • 40:06So, if you have any questions
  • 40:08about the pseudo time analysis,
  • 40:12should we take it now, or?
  • 40:19<v Lecturer>Does anybody have any questions</v>
  • 40:20on the first half of the presentation here?
  • 40:26<v Dr. Deshpande>Oh, we can continue on,</v>
  • 40:27then we can come back later.
  • 40:34Shall we go on?
  • 40:41<v Lecturer>Sounds good.</v>
  • 40:42<v Dr. Deshpande>Okay.</v>
  • 40:48Okay, so that was all about,
  • 40:52say how pseudo time is used in our analysis.
  • 40:57And so, the other end of,
  • 41:03I guess, not necessarily end,
  • 41:04the other perspective
  • 41:05is how is space important and how,
  • 41:10what kind of data do we have,
  • 41:13which give us information about space?
  • 41:16So, the spatial context of cells
  • 41:18is very important in many biological processes.
  • 41:22For example, when immune cells respond
  • 41:24to an infection, or a wound, they need
  • 41:27to be in physical proximity of their targets.
  • 41:31Similarly with, I guess, cancer tumor growth,
  • 41:34and the immune response to cancer
  • 41:38happen through intercellular signaling.
  • 41:40Either through cytokine secretion,
  • 41:42or through surface receptors on adjacent cells.
  • 41:48Just knowing the relative location
  • 41:50of different cell types can also
  • 41:52be very informative.
  • 41:53For example, in the figure here,
  • 41:57the information about the presence
  • 41:58of various immune cell types near a tumor,
  • 42:02and the extent of immune infiltration
  • 42:04in the tumor are essential prognostic markers.
  • 42:08And so, single cell RNA-seq,
  • 42:13as good as it is, it dissociates a cell
  • 42:15from its tissue, due to which
  • 42:18we lose the spatial context of the cell states.
  • 42:21But in recent years, we have been able
  • 42:24to develop both
  • 42:28as in spatial proteomics,
  • 42:30which help you to image protein
  • 42:35intensities of, say, up to 30 markers
  • 42:40at single cell resolution in the tissue.
  • 42:43As well as spatial transcriptomics,
  • 42:46which can measure 20,000 genes at spots
  • 42:51in the tissue.
  • 42:53And this was named method of the year last year
  • 42:57in 2020, yeah, that was last year.
  • 43:01So, here's just a workflow
  • 43:05of the 10x Visium technology,
  • 43:06which is one of these
  • 43:07spatial transcriptomics technologies.
  • 43:10So, this includes 5,000 barcoded spots on a slide.
  • 43:16And these are added to the cells in the...
  • 43:21Which are located in those spots.
  • 43:24And this helps preserve the spatial context
  • 43:26of the cells to the actual sequencing.
  • 43:30Now, this technology is not exactly single cell.
  • 43:33It still provides a lot of useful spatial detail.
  • 43:41So yeah, for explaining this project,
  • 43:47I will use the 10x Visium sample,
  • 43:51provided by 10x Genomics
  • 43:53of a breast cancer tissue.
  • 43:55So, the figure on the left
  • 43:57is an H and E slide, it's hematoxylin
  • 44:01and eosin stained slide,
  • 44:03which helps pathologists annotate
  • 44:08the sample for tumor, and lesions, and so on.
  • 44:14And the second image is that slide annotated
  • 44:19by a pathologist, and you can see
  • 44:22that there are different biologies
  • 44:25in this one slide.
  • 44:27And for example, the lesion on top
  • 44:29is an invasive cancer lesion, which means
  • 44:31that it can spread beyond the breast tissue,
  • 44:33but the other lesions correspond
  • 44:35to DCIS lesions,
  • 44:36which are not yet classified as invasive,
  • 44:39they could in the future be invasive.
  • 44:42Other important annotations are those
  • 44:43of immune cells and the stromal cells
  • 44:47in between these lesions.
  • 44:50For a good clinical outcome, you would hope
  • 44:52that immune cells can infiltrate these lesions.
  • 44:56And so the figure on the right shows
  • 44:59the same H and E slide
  • 45:02with overlaid Visium spots.
  • 45:05So, each of these spots correspond
  • 45:07to one measurement.
  • 45:10So, this slide shows a couple of examples
  • 45:14of spatial gene expression.
  • 45:17So, the figure to the left
  • 45:19is the same annotated H and E slide
  • 45:21that will help us keep track
  • 45:23of the biology in the slide.
  • 45:27And so, the first figure, the middle figure,
  • 45:30basically it shows the expression of CD8A,
  • 45:33which is a marker of cytotoxic T-cells.
  • 45:36Now, we see this gene expressed
  • 45:37in the blood near the invasive and DCIS lesions,
  • 45:42which means that the immune cells
  • 45:44are responding to a tumor.
  • 45:45However, we see that
  • 45:47there's not much infiltration of these cells
  • 45:49within the lesions.
  • 45:51The second marker is CD14, which is found
  • 45:54in macrophages and dendritic cells,
  • 45:57and its expression is much higher
  • 45:58inside the lesions, which could point
  • 46:00to successful infiltration of these cell types.
  • 46:04Now, just a reminder, the measurements
  • 46:08that we get from 10x Visium
  • 46:10are not exactly single cell, but they're near,
  • 46:13near single cell.
  • 46:15In a sense that each of these spots
  • 46:16is 55 micrometers wide.
  • 46:19And depending on what cell type
  • 46:22you might have in that spot,
  • 46:23it could have anywhere from one to 10 cells.
  • 46:27And immune cells are much smaller,
  • 46:28so there could be up to 10 immune cells in it,
  • 46:30but maybe only one cancer,
  • 46:32or epithelial cell in that spot.
  • 46:35So, as a result, the gene expression
  • 46:36of that spot is the average
  • 46:38of the cells inside it.
  • 46:42Now, our lab has a method called CoGAPS,
  • 46:47oesophageal CoGAPS, which is a Bayesian
  • 46:50Markov chain Monte Carlo method
  • 46:51for nonnegative matrix factorization.
  • 46:54And so, as a result of say,
  • 46:58the 10x Visium measurement,
  • 47:01we now have a high dimensional matrix
  • 47:04with 20,000 genes and around 5,000 spots.
  • 47:08And what CoGAPS does is it helps
  • 47:12to factorize this matrix
  • 47:15into two low rank matrices,
  • 47:18both of which are non-negative,
  • 47:21which correspond to latent patterns in the data.
  • 47:25And in the past, we have seen
  • 47:27that these do correspond to biologies
  • 47:31based on the pattern markers.
  • 47:34So, the two matrices that CoGAPS factorizes
  • 47:38the dataset into are the amplitude matrix,
  • 47:41which has say, 20,000 rows for 20,000 genes
  • 47:44and N columns for the N patterns.
  • 47:48And this helps us identify groups
  • 47:50of co-expressed genes,
  • 47:52which correspond to the patterns.
  • 47:54And the pattern matrix has N rows
  • 47:57and 5,000 columns, and they associate the spots
  • 48:01on the sample with patterns.
  • 48:04So, because of the nature of the CoGAPS
  • 48:08factorization, the columns
  • 48:11of the matrices here, or the rows of the matrices
  • 48:14here are not really orthogonal.
  • 48:15They are independent, but not orthogonal.
  • 48:17So, they could co-exist in spots,
  • 48:20or a gene could be present in multiple processes,
  • 48:25and multiple patterns,
  • 48:25which correspond to processes.
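
The shapes of the two factor matrices described above can be illustrated with a small sketch. CoGAPS itself is a Bayesian Markov chain Monte Carlo method; the code below is not CoGAPS but a much simpler stand-in (classic multiplicative-update non-negative matrix factorization), used only to show how a genes-by-spots matrix D is approximated by an amplitude matrix A (genes by N patterns) times a pattern matrix P (N patterns by spots). All function names and the toy data are hypothetical.

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    Y_cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in Y_cols] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def scale(M, num, den, eps):
    """Element-wise M * num / (den + eps); every entry stays non-negative."""
    return [[m * n / (d + eps) for m, n, d in zip(mr, nr, dr)]
            for mr, nr, dr in zip(M, num, den)]


def nmf(D, n_patterns, n_iter=300, seed=0, eps=1e-9):
    """Approximate non-negative D (genes x spots) as A (genes x N) times P (N x spots)."""
    rng = random.Random(seed)
    n_genes, n_spots = len(D), len(D[0])
    A = [[rng.random() + 0.1 for _ in range(n_patterns)] for _ in range(n_genes)]
    P = [[rng.random() + 0.1 for _ in range(n_spots)] for _ in range(n_patterns)]
    for _ in range(n_iter):
        # P <- P * (A^T D) / (A^T A P)
        At = transpose(A)
        P = scale(P, matmul(At, D), matmul(matmul(At, A), P), eps)
        # A <- A * (D P^T) / (A P P^T)
        Pt = transpose(P)
        A = scale(A, matmul(D, Pt), matmul(A, matmul(P, Pt)), eps)
    return A, P


def frobenius_error(D, A, P):
    """Size of the residual D - A P."""
    R = matmul(A, P)
    return sum((d - r) ** 2 for dr, rr in zip(D, R) for d, r in zip(dr, rr)) ** 0.5


# Toy data (made up): 6 genes x 8 spots built from two overlapping patterns.
A_true = [[5, 0], [4, 0], [3, 1], [0, 4], [0, 5], [1, 3]]
P_true = [[1, 1, 1, 0.5, 0, 0, 0, 0.2], [0, 0, 0.2, 0.5, 1, 1, 1, 1]]
D = matmul(A_true, P_true)

A, P = nmf(D, n_patterns=2)
print("A is", len(A), "x", len(A[0]), "and P is", len(P), "x", len(P[0]))
print("reconstruction error:", frobenius_error(D, A, P))
```

Because the only constraint is non-negativity, the recovered patterns are free to overlap in the same spot or share a gene, which matches the point that the patterns are not orthogonal.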
  • 48:29So, when we apply CoGAPS to the Visium data,
  • 48:37so the first try was basically
  • 48:39just five patterns, and when we apply it
  • 48:43to try and find five patterns
  • 48:47after a factorization, we see that
  • 48:50a number of them correspond
  • 48:51to the pathology annotations
  • 48:54that we see on the figure on the left.
  • 48:57So, we find a pattern which corresponds
  • 48:59to the immune cells.
  • 49:01We find a pattern which corresponds
  • 49:04to invasive carcinoma on the top left here.
  • 49:07And we also find a pattern which corresponds
  • 49:09to the DCIS lesions.
  • 49:12And as we increase the dimensionality
  • 49:15of CoGAPS factorization, we start seeing more
  • 49:18and more tissue heterogeneity.
  • 49:20For example, we now see three patterns
  • 49:23which are associated with the invasive carcinoma,
  • 49:26and we can see that they correspond
  • 49:27to different regions in that lesion.
  • 49:31And this for example is completely internal,
  • 49:33which has no interaction with immune cells.
  • 49:37We have a pattern which corresponds
  • 49:39to immune cells, we have a pattern
  • 49:41which corresponds to the stromal cells.
  • 49:43And we also have different patterns
  • 49:47which highlight individual DCIS lesions.
  • 49:51So, one could say that potentially it's trying,
  • 49:53it is finding biologies,
  • 49:57which are unique to these DCIS lesions.
  • 50:04So, we can analyze the A matrix
  • 50:07to identify groups of genes associated
  • 50:09with each pattern, and we call these
  • 50:11the pattern markers.
  • 50:12And these help us identify pathways
  • 50:15that are likely expressed in these patterns,
  • 50:18or because now, especially in this sample,
  • 50:20we see a one to one association
  • 50:23between the pattern and the biology,
  • 50:25also in the biology, basically.
  • 50:29So, let's see, how long do we have.
  • 50:35I think we're close to...
  • 50:38I'll quickly rush through these.
  • 50:40So, the other analysis that we can do is given,
  • 50:45let's say two of these patterns,
  • 50:49we can try to see how these patterns interact.
  • 50:52So, you can see that these patterns
  • 50:54have a lot of spatial structure to it,
  • 50:58which CoGAPS was not told about.
  • 50:59CoGAPS, the parameters that CoGAPS uses
  • 51:01have no spatial information,
  • 51:03and it still found these spatial structures.
  • 51:06So, and we also see that these patterns
  • 51:08are adjacent to each other and we want
  • 51:10to see how they interact.
  • 51:11So, what we do is we find,
  • 51:15basically we estimate the kernel density
  • 51:18of each of these patterns, which is a function
  • 51:22of both the pattern intensity at a spot,
  • 51:25as well as the spatial clustering
  • 51:28of high intensities.
  • 51:30And we compare that against
  • 51:32another distribution obtained by
  • 51:35the density estimation after randomizing
  • 51:37the locations of these pattern densities.
  • 51:40So, the intensities which are beyond
  • 51:43this null distribution are the ones that we...
  • 51:48Are the spots which correspond
  • 51:49to these outliers are the ones
  • 51:51that we count as hotspots of pattern activity.
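
The hotspot idea described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the speaker's implementation: the Gaussian kernel, the bandwidth, the 95th-percentile cutoff, and every function name are hypothetical choices. The smoothed value at a spot depends on both the pattern value there and how many high values cluster nearby; shuffling the pattern values across locations gives a reference distribution without spatial structure.

```python
import math
import random


def smoothed_intensity(coords, values, bandwidth):
    """Gaussian-kernel-weighted sum of pattern values around each spot."""
    out = []
    for xi, yi in coords:
        total = 0.0
        for (xj, yj), v in zip(coords, values):
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2
            total += v * math.exp(-d2 / (2 * bandwidth ** 2))
        out.append(total)
    return out


def find_hotspots(coords, values, bandwidth=1.0, n_perm=200, quantile=0.95, seed=0):
    """Flag spots whose smoothed value exceeds a quantile of the shuffled reference."""
    rng = random.Random(seed)
    observed = smoothed_intensity(coords, values, bandwidth)
    null = []
    shuffled = list(values)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between value and location
        null.extend(smoothed_intensity(coords, shuffled, bandwidth))
    null.sort()
    threshold = null[int(quantile * (len(null) - 1))]
    return [obs > threshold for obs in observed]


# Toy slide (made up): a 6 x 6 grid of spots with one pattern concentrated in a corner.
coords = [(x, y) for x in range(6) for y in range(6)]
values = [1.0 if x < 3 and y < 3 else 0.05 for x, y in coords]
hot = find_hotspots(coords, values)
print("hotspots:", [c for c, h in zip(coords, hot) if h])
```

Running the same function on a second pattern and intersecting the two boolean lists gives a candidate interaction region.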
  • 51:55Similarly, we can find the hotspots
  • 51:57of immune response.
  • 51:59And when we combine both of them,
  • 52:02we find regions where cancer is active,
  • 52:09regions where immune cells are active,
  • 52:11and regions where both of them are active.
  • 52:14And this is the interaction region.
  • 52:16And in this region, we are trying
  • 52:17to find genes which correspond
  • 52:19to this interaction between cancer and immune,
  • 52:25and which are not necessarily markers of...
  • 52:27And regular markers of cancer and immune.
  • 52:29So, genes which are specifically related
  • 52:32to the non-linear interactions
  • 52:34between these patterns.
  • 52:38And to that end, basically we hypothesize
  • 52:40that since CoGAPS is already
  • 52:45an approximation of the dataset
  • 52:47with a linear combination of the patterns,
  • 52:50the residuals of CoGAPS,
  • 52:51of the CoGAPS estimate from the dataset
  • 52:55could point us to the non-linear interactions
  • 52:58between the patterns.
  • 53:01And we are only looking at the region
  • 53:06where both of the patterns are active
  • 53:09and comparing the residuals of CoGAPS
  • 53:13in that region to the residuals
  • 53:15in only the cancer region,
  • 53:16and only the immune region.
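
A rough sketch of the residual comparison described above, under assumptions: the talk does not say which statistic is used, so the score below (mean residual in the interaction region minus the larger of the two single-pattern means) is a hypothetical choice, as are the function names and toy numbers.

```python
def residuals(D, A, P):
    """Residual matrix D - A P (genes x spots) of the linear pattern model."""
    n_patterns = len(P)
    return [[D[g][s] - sum(A[g][k] * P[k][s] for k in range(n_patterns))
             for s in range(len(D[0]))]
            for g in range(len(D))]


def label_regions(hot_cancer, hot_immune):
    """Label each spot by which pattern hotspots it falls in."""
    labels = []
    for c, i in zip(hot_cancer, hot_immune):
        if c and i:
            labels.append("interaction")
        elif c:
            labels.append("cancer")
        elif i:
            labels.append("immune")
        else:
            labels.append("neither")
    return labels


def mean(xs):
    return sum(xs) / len(xs)


def interaction_score(residual_row, labels):
    """Mean residual in the interaction region minus the larger single-region mean."""
    by_region = {"interaction": [], "cancer": [], "immune": []}
    for r, lab in zip(residual_row, labels):
        if lab in by_region:
            by_region[lab].append(r)
    if not all(by_region.values()):
        return None  # at least one region has no spots
    return mean(by_region["interaction"]) - max(mean(by_region["cancer"]),
                                                mean(by_region["immune"]))


# Toy example (made up): 3 genes, 6 spots, 2 patterns that overlap in spots 2 and 3.
A = [[2, 0], [0, 3], [1, 1]]
P = [[1, 1, 1, 1, 0, 0], [0, 0, 1, 1, 1, 1]]
D = [[2, 2, 2, 2, 0, 0],   # gene 0: explained by pattern 0 alone
     [0, 0, 3, 3, 3, 3],   # gene 1: explained by pattern 1 alone
     [1, 1, 6, 6, 1, 1]]   # gene 2: extra signal of 4 where the patterns overlap
labels = label_regions([p > 0 for p in P[0]], [p > 0 for p in P[1]])
scores = [interaction_score(row, labels) for row in residuals(D, A, P)]
print(labels)
print(scores)
```

Only the third gene carries signal that the linear combination of patterns cannot explain, and that signal is confined to the overlap region, so it is the only gene with a non-zero score.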
  • 53:18And now, this can be done for each of these,
  • 53:24I guess, pattern combinations,
  • 53:25and we can find what corresponds
  • 53:29to pattern interaction
  • 53:30between these pairs of patterns.
  • 53:32So, for future work, as part
  • 53:35of the data collection in clinical trials,
  • 53:37we're already collecting both spatial
  • 53:42and single cell transcriptomics
  • 53:44and proteomics from patients.
  • 53:47So, we are trying to integrate all
  • 53:50of this into one big dataset,
  • 53:54which would represent the tumor microenvironment,
  • 53:59which would help us characterize
  • 54:03the patient sample as a whole.
  • 54:05And we would also like
  • 54:07to infer intercellular signaling networks
  • 54:10the same way as we were trying to do
  • 54:12it using time, but now using space
  • 54:14where intercellular signaling is a function
  • 54:18of the distance between the cells
  • 54:20and the types of neighboring cells
  • 54:21for a target cell.
  • 54:26And the learnings from these projects
  • 54:27would go into a spatiotemporal model
  • 54:31of tumor growth and response to therapy,
  • 54:33which can be used in building
  • 54:36a digital patient or digital clone,
  • 54:39where we can try to test what therapies
  • 54:43might work on what patients.
  • 54:48So, these are the people who have been,
  • 54:50and of course, 10x Genomics,
  • 54:52who were kind enough to give us the sample
  • 54:54for studying, as well as my collaborators
  • 54:59on this project.
  • 55:01Thank you so much.
  • 55:02And I can take questions now,
  • 55:04sorry for the overshooting time.
  • 55:10<v Lecturer>Thank you so much.</v>
  • 55:11Do we have any questions to look at?
  • 55:22People on Zoom? Yeah, question (mumbles).
  • 55:25<v Female Student>Going back</v>
  • 55:25to the time series slides.
  • 55:28<v ->Mm-hmm.</v>
  • 55:29<v Female Student>Can you talk</v>
  • 55:29about how you know if you have good,
  • 55:31or bad pseudo times?
  • 55:32And is there a way to fix bad pseudo times?
  • 55:35<v ->So, yeah, as in what I've not shared on here</v>
  • 55:39is so, in our experiments,
  • 55:43we also, we knew for example,
  • 55:46that we were studying...
  • 55:47We wanted to study a trajectory which goes
  • 55:49from stem cells to neuroectoderm,
  • 55:56and we had markers.
  • 55:57And I think, some (mumbles) themselves.
  • 56:01They have identified markers
  • 56:03of stem cells, neuroectoderm, and endoderm cells.
  • 56:09So, we were looking at the trajectories
  • 56:10of the markers along the pseudo time
  • 56:13to see if those make sense.
  • 56:15For example, a marker which is supposed
  • 56:18to be high in stem cells would,
  • 56:21should be tapering down to zero
  • 56:23along pseudo time, and a marker,
  • 56:26which is supposed to be high in neuroectoderm
  • 56:30should be increasing with pseudo time.
  • 56:34So, we had, I think, six or so markers
  • 56:38to each of stem cells, neuroectoderm
  • 56:41and endoderm cells.
  • 56:46And we were trying to confirm the combination
  • 56:49that neuroectoderm markers increase
  • 56:52with pseudo time, but the other two decrease,
  • 56:54or the endoderm shouldn't decrease necessarily,
  • 56:58but it shouldn't have
  • 57:00a monotonic increase like the neuroectoderm one.
  • 57:08And it should not be present in the initial.
  • 57:12Does that...
  • 57:14So, that was one way to do it, basically.
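
The marker check described in this answer can be sketched as a rank correlation between pseudotime and each marker's expression. This is a minimal illustration, assuming Spearman correlation as the trend measure (the talk does not name one); the marker names and numbers are made up. A stem cell marker is expected to trend down along pseudotime and a neuroectoderm marker to trend up.

```python
def ranks(values):
    """Average ranks (1-based), with ties sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank correlation; returns 0.0 if either input is constant."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def check_markers(pseudotime, expression, expected_sign):
    """Return {marker: (rho, agrees)}; agrees is True if the trend has the expected sign."""
    report = {}
    for marker, sign in expected_sign.items():
        rho = spearman(pseudotime, expression[marker])
        report[marker] = (rho, rho * sign > 0)
    return report


# Toy cells ordered by a hypothetical pseudotime; marker names are placeholders.
pseudotime = [0.0, 0.1, 0.25, 0.4, 0.6, 0.8, 1.0]
expression = {
    "STEM_MARKER": [9, 8, 6, 4, 2, 1, 0],
    "NEURO_MARKER": [0, 0, 1, 3, 5, 7, 9],
}
expected_sign = {"STEM_MARKER": -1, "NEURO_MARKER": +1}
for marker, (rho, agrees) in check_markers(pseudotime, expression, expected_sign).items():
    print(marker, round(rho, 3), "consistent" if agrees else "inconsistent")
```

A check like this can flag a bad ordering, but it does not fix one; markers that disagree with the expected trends suggest trying a different trajectory inference method or a different subset of cells.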
  • 57:21<v Lecturer>Thank you.</v>
  • 57:22Any other questions?
  • 57:40So, with the combination of many cells,
  • 57:41and the spatial stuff, is there any hope
  • 57:44of getting a temporal signal out of any of that,
  • 57:46or is that (indistinct)?
  • 57:50<v ->In spatial did you mean?</v>
  • 57:52<v Lecturer>Yeah.</v>
  • 57:53<v Dr. Deshpande>So, I think,</v>
  • 57:59the issue would be, I guess,
  • 58:00not in clinical, I suppose.
  • 58:04In a sense that, okay, are you thinking
  • 58:06about pseudo temporal, or just clinical?
  • 58:08<v Lecturer>Yeah.</v>
  • 58:10<v ->Pseudo temporal, I think there might</v>
  • 58:11be some possibility,
  • 58:12and I've been thinking of
  • 58:18as in, we would still have to isolate,
  • 58:20I guess, cell types, for example.
  • 58:22So, one of the problems with that
  • 58:24is that as I mentioned,
  • 58:27the spots are not exactly single cell, right?
  • 58:31So, especially, let's say if you're trying
  • 58:33to do a pseudo temporal ordering
  • 58:36of CD8 T-cells,
  • 58:39they are more,
  • 58:41more likely than not, co-localized
  • 58:44with other cell types, which would also,
  • 58:49I guess, corrupt the expression
  • 58:51that you are seeing.
  • 58:53So, that would make it slightly different.
  • 58:57We could think of ordering the spot
  • 59:01as a whole, basically.
  • 59:03And my...
  • 59:05I belong to a school of thought that basically,
  • 59:07if you have a...
  • 59:08And then, so what people try to do
  • 59:10with say, this kind of data,
  • 59:13this spatial Visium data,
  • 59:15where you have say, up to 10 cells,
  • 59:17they try to resolve this into cell types.
  • 59:23So, they would compare that to, there is I think,
  • 59:25one paper called RTCD, or RCTD.
  • 59:30RCTD robust cell type decomposition.
  • 59:33So, what they do is basically,
  • 59:36they take the spatial data,
  • 59:38they have a reference single cell data,
  • 59:41and they try to assign each spot,
  • 59:47or resolve each spot into a mixture
  • 59:51of the cell types that might exist
  • 59:54in the single cell data.
  • 59:57And that could help you to say,
  • 01:00:01identify what the mixture in general is.
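
The idea of resolving a spot into a mixture of reference cell types can be sketched as a small non-negative least-squares problem. This is not the RCTD algorithm, which fits its own statistical model; it is a plain least-squares stand-in solved by projected gradient descent, with hypothetical function names, cell type labels, and made-up reference profiles.

```python
def deconvolve_spot(spot, reference, n_iter=2000):
    """Estimate non-negative cell-type proportions for one spot.

    spot: list of expression values, one per gene.
    reference: {cell_type: list of mean expression per gene} from single-cell data.
    """
    names = list(reference)
    profiles = [reference[name] for name in names]
    # Step size 1 / (sum of squared reference entries) never exceeds the stable limit.
    step = 1.0 / sum(v * v for p in profiles for v in p)
    w = [0.0] * len(profiles)
    for _ in range(n_iter):
        fitted = [sum(wk * p[g] for wk, p in zip(w, profiles)) for g in range(len(spot))]
        err = [f - s for f, s in zip(fitted, spot)]
        grad = [sum(e * p[g] for g, e in enumerate(err)) for p in profiles]
        # Gradient step followed by projection onto non-negative weights.
        w = [max(0.0, wk - step * gk) for wk, gk in zip(w, grad)]
    total = sum(w)
    return {name: (wk / total if total > 0 else 0.0) for name, wk in zip(names, w)}


# Toy reference (made up): mean expression of 4 genes in two cell types.
reference = {"T_cell": [10, 0, 1, 2], "tumor": [0, 8, 1, 3]}
# A spot simulated as 70% T-cell signal and 30% tumor signal.
spot = [0.7 * t + 0.3 * c for t, c in zip(reference["T_cell"], reference["tumor"])]
print(deconvolve_spot(spot, reference))
```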
  • 01:00:04But my, as in, my thought is that we could
  • 01:00:09just think of each spot as some representation
  • 01:00:15of the biology in that neighborhood.
  • 01:00:17So, each spot could just represent
  • 01:00:20a neighborhood, as opposed to trying to find
  • 01:00:22what the individual cells are.
  • 01:00:25And that would basically abstract out
  • 01:00:30the representation and the biology to that
  • 01:00:33of the spots.
  • 01:00:35And we'll have to think about how to do that,
  • 01:00:37but I think there could be some ordering to that,
  • 01:00:40but we'll need to see what makes sense.
  • 01:00:45And then, for a lot of cells, cell states,
  • 01:00:49they are quite well-characterized.
  • 01:00:51For example, if you say that a T-cell
  • 01:00:53is activated, or a T-cell is naive,
  • 01:00:55or exhausted, you know what markers to expect.
  • 01:00:59But what would you be able to say
  • 01:01:02for spots instead?
  • 01:01:05The other thing to think of is,
  • 01:01:10especially with say, the proteomics as well,
  • 01:01:12where you can get actual single cell
  • 01:01:18and distributions, and neighborhood characterization.
  • 01:01:22You could think of it as can you,
  • 01:01:27so the same thing that...
  • 01:01:28The same ideas that were used
  • 01:01:31for pseudo temporal ordering of cells,
  • 01:01:34can they be used for pseudo temporal
  • 01:01:36ordering of neighborhoods?
  • 01:01:39For example, if you have a cell neighborhood,
  • 01:01:41which is represented as, whatever,
  • 01:01:45the central cell and its five neighbors.
  • 01:01:49Now, depending on, are they all tumor?
  • 01:01:52Then maybe they have...
  • 01:01:53They're basically deep in the cancer,
  • 01:01:54which has never been visited by an immune cell,
  • 01:01:58is that a mix of tumor
  • 01:01:59and activated immune cells?
  • 01:02:02So, that is basically an active tumor
  • 01:02:04immune interaction that's happening.
  • 01:02:06Is that exhausted T-cells and tumor,
  • 01:02:10where basically the tumor
  • 01:02:11has fought back and tried to suppress the...
  • 01:02:16Or it's basically sent signals
  • 01:02:17to suppress the immune response, and so on.
  • 01:02:21So, perhaps there could be
  • 01:02:22a trajectory of neighborhoods,
  • 01:02:25where you could say that depending on all
  • 01:02:29the possible combinations that you expect
  • 01:02:31in cellular neighborhoods,
  • 01:02:35this current neighborhood is this far along
  • 01:02:40that process, or that branch of a process.
  • 01:02:44That was a long and winding answer.
  • 01:02:47(chuckles) I don't know if
  • 01:02:49that necessarily answered it. <v Lecturer>Thank you.</v>
  • 01:02:52Thank you, any last questions?
  • 01:02:54I wanna be mindful of time.
  • 01:02:56Any questions that come to you, or?
  • 01:03:06All right, well if not, thank you again.
  • 01:03:09(students applaud) We really appreciate that.
  • 01:03:11<v Dr. Deshpande>Thank you a lot.</v>
  • 01:03:15<v Lecturer>You have a wonderful (indistinct).</v>
  • 01:03:16<v ->Mm-hmm.</v>
  • 01:03:20(lecturer mumbles indistinctly)
  • 01:03:27(students chatter indistinctly)