YSPH Biostatistics Seminar: “Enhancing Biostatistics and Health Informatics Research Through Collaborative Cloud-Based Data Science Tools”
October 12, 2023
Stephen Larson, CEO and Co-founder, MetaCell; Adria Haimann, Business Development Executive, MetaCell
October 10, 2023
- 00:02<v ->All right.</v>
- 00:04In the interest of time, let's go ahead and get started.
- 00:08Hey everybody,
- 00:09thank you so much for coming to this week's seminar.
- 00:14It's my pleasure to introduce Stephen Larson
- 00:16and Adria Haimann from MetaCell.
- 00:20Just a few words of context here.
- 00:24We've had a range of speakers:
- 00:26we started this semester with somebody from the hospital.
- 00:28We've had people from academia,
- 00:30we've had people from pharmaceutical companies.
- 00:33And so I'm very excited to present something different.
- 00:37So MetaCell is a company that works
- 00:40in sort of the research space,
- 00:43near and dear to my heart.
- 00:44From their beginning, I think, they've been
- 00:46very active in the computational neuroscience community.
- 00:52We both contributed to a project called NetPyNE
- 00:56for building computational models of neurons and neural networks.
- 01:01But more broadly, they work in the greater
- 01:04health informatics space.
- 01:06And they're going to tell us a little bit
- 01:10about how we can enhance biostatistics
- 01:12and health informatics research
- 01:13through collaborative cloud-based data science tools.
- 01:16So let's welcome them.
- 01:20<v ->Thank you very much. Good afternoon everyone.</v>
- 01:22I can see the backs of some of your heads,
- 01:24so I can imagine that I'm also, you know,
- 01:26virtually looking at your faces.
- 01:28Thanks so much for having us.
- 01:30I'm Adria Haimann and I work alongside Stephen at MetaCell.
- 01:33And as already mentioned, today we're gonna share with you
- 01:35some insights into how academics are using cloud-based
- 01:39collaboration tools to enhance their research.
- 01:42But before I kind of begin with this,
- 01:43I wanna provide you with some context.
- 01:45So, 10 years ago I was in your position,
- 01:48I was studying health economics
- 01:50at the London School of Economics,
- 01:52and I had joined a research team
- 01:54at the European Observatory on Health Systems and Policies.
- 01:56And I was relatively new to this field
- 01:57and kind of found myself in a Catch-22
- 02:00that maybe you can relate to.
- 02:02I wanted to know how someone, whether a student, postdoc,
- 02:05or researcher, can discover the best way to collaborate
- 02:08on their research and use new tools
- 02:10when they have fairly minimal experience,
- 02:12either in academia or in industry.
- 02:14So that's essentially what we want to show you today
- 02:17and what we'd love to share with you,
- 02:19if you could go to the next slide,
- 02:21which is kind of a collection of key topics
- 02:24of how researchers are doing just that,
- 02:27while also getting the most out of their data.
- 02:29So during this seminar,
- 02:31we're gonna cover different methods for sharing
- 02:33data analysis and introduce you to a specific cloud-based
- 02:36collaboration platform
- 02:38that we've created called Cloud Workspaces.
- 02:41And then we'll run you through some examples
- 02:43of how researchers are using this platform,
- 02:45as well as how we've formed an industry partnership.
- 02:48And then lastly, we wanna show you kind of other ways
- 02:50that this tool can be used in academic settings.
- 02:53And then of course, we'll open it up to you guys
- 02:55and encourage you to ask us questions
- 02:57on any of these topics.
- 02:59So I'll hand over to Stephen now.
- 03:02<v ->Thanks Adria for that great introduction.</v>
- 03:04And hello to all of you.
- 03:07I currently see you as tiny, tiny pixels on my screen
- 03:12because of the way this is viewed.
- 03:13So as much as I'd love to be there in person
- 03:16and looking into the whites of your eyes,
- 03:17I'm not gonna get that chance.
- 03:18But I think we have a really good, robust discussion
- 03:23for you guys that I hope you'll find very interesting.
- 03:27And thank you very much again to Robert for the invitation.
- 03:30So similar backstory on myself,
- 03:35I went through undergraduate training at MIT
- 03:40in computer science, did a master's in AI
- 03:43before it was cool again,
- 03:45and then shipped off to UCSD for a PhD
- 03:51in neuroscience with a computational specialization.
- 03:54So very much familiar with the academic experience
- 03:59and I'm really excited to share with you
- 04:06some of the things that I've learned since leaving academia.
- 04:09And one of those things
- 04:10has been to start this company, MetaCell,
- 04:14which I basically started as I was wrapping up my PhD
- 04:16and I kind of realized that I wanted to serve science
- 04:22in a different way than was gonna be possible
- 04:27just within the confines of academia
- 04:29because I realized that I was a builder,
- 04:31and that to build software tools
- 04:36that could be useful, you know,
- 04:40tools that I would have wanted to have myself
- 04:43as a graduate student,
- 04:44I would need to put together a professional team of folks
- 04:48who, you know, really came out of industry
- 04:51and who are kind of hard to hire in academia.
- 04:54So the story of this slide is, since then,
- 04:58all the different great groups
- 04:59that we've had a chance to work with,
- 05:01and you'll see a really kind of motley crew of logos
- 05:05that are present here, from, you know,
- 05:08really, really big pharma companies,
- 05:12to Yale (you guys are on here)
- 05:14and other universities that we've had the chance to work with,
- 05:18and then biotech companies
- 05:21and med device companies that we work with,
- 05:25some in the US, lots internationally.
- 05:29And realizing that, you know,
- 05:31the core thing that unifies all the work
- 05:34that we've been doing over time is the way
- 05:36that sort of math and computation can help us
- 05:40understand the life sciences.
- 05:41So hence I come to you today in a biostatistics seminar
- 05:46to talk about, you know,
- 05:47some of the other pieces of the puzzle
- 05:50that go into advancing the life sciences in that way.
- 05:56So, let's start with a really simple, simple example, right?
- 06:04So let's say you're doing some kind of analysis
- 06:08on some kind of bio data, okay?
- 06:13Perhaps in the statistics context, you're using SAS.
- 06:17In a computational neuroscience context,
- 06:20you may be using Python and the Python suite of tools.
- 06:26Some in the statistics field are using R
- 06:29and its open source, you know, statistics packages.
- 06:30Whatever it is, you've got some data, you know,
- 06:33maybe you're analyzing it on behalf of yourself,
- 06:35maybe you're analyzing it on behalf of your lab,
- 06:37the group that you're working with.
- 06:38Maybe you're analyzing it on behalf of a company.
- 06:41Whatever it is,
- 06:42you wanna share that data analysis with somebody else.
- 06:44You're probably gonna have to gather
- 06:47some history of those commands together.
- 06:50Maybe it's packaged up as a script, maybe not.
- 06:53Very often, you're gonna send that file
- 06:54to somebody else.
- 06:57And then you're also gonna wanna somehow
- 06:59collect the outputs of that, right?
- 07:01The figures, the diagrams, the summary statistics,
- 07:05the result of t-tests, you know,
- 07:08things like this, right?
- 07:09And send that output somewhere, right?
- 07:12So, you know, that is a problem from time immemorial.
- 07:16And you know, for as long as I've been
- 07:20working in this space, you know,
- 07:23it's been very common to just do this
- 07:25and maybe send it over email, right?
- 07:29It's still a practice that, I'm sure, happens.
- 07:32And that's probably just fine, you know,
- 07:35in many small circumstances.
- 07:37But as that scales up, there's problems of reproducibility,
- 07:42there's problems of, you know,
- 07:44keeping track of who sent what.
- 07:46Email is not a great file management system.
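Even before moving anything to the cloud, the "send the script and the outputs by email" workflow described above can be made more reproducible by shipping a small manifest alongside the files. The sketch below is a hypothetical illustration, not a MetaCell tool: `build_manifest` and `changed_files` are names we made up, and it assumes the analysis is a single script with a known list of input and output files. It records which script ran on which inputs in which Python environment, and hashes every file so a collaborator can confirm they received exactly the same data and figures.

```python
import hashlib
import platform
import sys
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(script: Path, inputs: List[Path], outputs: List[Path]) -> Dict:
    """Record which script ran on which inputs, in which Python environment,
    and which outputs (figures, summary tables) it produced."""
    def entry(p: Path) -> Dict[str, str]:
        return {"path": str(p), "sha256": sha256_of(p)}

    return {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "script": entry(script),
        "inputs": [entry(p) for p in inputs],
        "outputs": [entry(p) for p in outputs],
    }


def changed_files(manifest: Dict) -> List[str]:
    """List recorded paths whose current contents no longer match their hash."""
    entries = [manifest["script"], *manifest["inputs"], *manifest["outputs"]]
    return [e["path"] for e in entries if sha256_of(Path(e["path"])) != e["sha256"]]
```

In practice one would write `json.dumps(build_manifest(...), indent=2)` next to the figures; the recipient runs `changed_files` and gets an empty list if nothing drifted. A shared workspace makes this less necessary, since everyone works on the same copy of the files.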
- 07:48So we've been thinking a lot over the course of our company,
- 07:55which has been around now,
- 07:56this is our 13th year, about how, you know,
- 08:00the cloud and the internet can basically handle that
- 08:02in a better way than sending email along.
- 08:05And so we've thought a lot about, you know,
- 08:08what starts to happen when there's a computer that lives
- 08:11in the cloud that multiple people can jump into and join.
- 08:15And how does that work in general?
- 08:18It's not just us doing this, right?
- 08:22This is an idea that's been there for a while.
- 08:24Anybody familiar with like, say Python Notebooks, right,
- 08:27are aware of this idea.
- 08:29There's tools like Google Colab,
- 08:31and then we've even been talking to major universities,
- 08:34like we've been having a conversation
- 08:35with Harvard Medical School,
- 08:37where they've been working in collaboration with Amazon
- 08:39to set up computers
- 08:43that are in the cloud.
- 08:44Similarly, of course, there's what happens
- 08:49at your local university
- 08:50with your local computing infrastructure.
- 08:52Typically that's based around supercomputers that are there
- 08:56for doing like really powerful computations or calculations.
- 08:59Things that are very data intensive.
- 09:01A workspace in the cloud is sort of in between.
- 09:02So it's kind of like, you know,
- 09:05just a laptop that isn't your physical laptop,
- 09:09but it's like a laptop that's somewhere else in the cloud
- 09:11that you can log into and do some analysis with.
- 09:14And it basically lives as long as you wanna do that analysis
- 09:16and then it goes away
- 09:18if you don't need that analysis anymore
- 09:20or it can stay there as long as your lab is around, right?
- 09:22And then go away if you don't need it anymore.
- 09:25So the idea is then in this story,
- 09:27instead of just gathering the history of commands,
- 09:29sending the file and sending the output of the file,
- 09:31what if you could do all that in the context
- 09:34of a computer that multiple people
- 09:37can join and look at, right?
- 09:39Work in that same environment.
- 09:40When you log out,
- 09:41it's exactly where you left it, right?
- 09:43Like if you know your computer gets misplaced
- 09:47or you drop it, you know, off a bridge into a river,
- 09:50like, doesn't matter 'cause
- 09:51all this stuff is preserved, right?
- 09:54So, how does that idea start to change the basic practice
- 09:57of interacting with data and doing analysis like this
- 10:02if you were to change that one variable, okay?
- 10:05So that's sort of the starting premise for our chat today.
- 10:09So, you know, what that might look like is, you know,
- 10:13a session one-on-one or two-on-one with multiple people
- 10:16where you get, you know, perhaps one of you in the future.
- 10:22In the case that we've been doing in our company,
- 10:24one of our staff members, who has experience
- 10:28in doing a different kind of data analysis.
- 10:32In our case, we work on a variety of problems,
- 10:36but one of the major ones we worked on
- 10:37is like the imaging of calcium signals
- 10:42in neural tissue, okay?
- 10:45But you know, you might be on a call like this one and just
- 10:49the same way that you might meet with your lab members on a
- 10:50Zoom call, you might meet with someone
- 10:54with experience in data analysis or biostatistics
- 10:56that is not in your lab or not even in your organization.
- 11:01It might be somewhere remote,
- 11:02maybe at another university or in a company like ours.
- 11:06But what they might get as the experience of that is
- 11:13jointly logging into this workspace that lives in the cloud.
- 11:17And if SAS is the thing you wanna use,
- 11:20you might find a whole SAS instance there
- 11:22in a desktop that you can log into.
- 11:25But the point being that multiple people now can type on it
- 11:27as opposed to like physically handing your laptop around
- 11:30in the lab or even just screen sharing it
- 11:33in some kind of a lab meeting, right?
- 11:35It's actually allowing for people to jump into the same
- 11:38application and literally like trade off
- 11:40on like typing commands into it.
- 11:43Kind of like what you get with a Google Document
- 11:46or a Google Spreadsheet, right?
- 11:48That real-time collaboration,
- 11:49but now for any kind of application.
- 11:52So that's one experience you might have.
- 11:54Not just SAS, right?
- 11:56So a Jupyter Notebook, as I mentioned before,
- 11:58is another thing that you can use.
- 11:59And those of you who might be using,
- 12:01again, the more open source technologies,
- 12:03if you might be using R Statistics or using Python
- 12:05or whatnot, you'd be familiar with, you know,
- 12:08a Jupyter Notebook.
- 12:11So it's based around, you know,
- 12:13this idea of putting a computer in the cloud,
- 12:16multiple folks logging into it,
- 12:18and then being able to sort of transport
- 12:21your expertise around the world.
- 12:25Because in addition to the knowledge of doing analysis
- 12:31being shipped around,
- 12:32data can also come into this workspace
- 12:34as an intermediate space that's private to a given lab,
- 12:39but allows for a different kind of model on sharing data
- 12:43where it sort of stays under the control of the lab,
- 12:47you know, whoever puts it there can take it back,
- 12:49that kind of thing.
- 12:51Okay, so we've been exploring this model
- 12:54and we've also been talking to other organizations
- 12:57and universities about this model and how to use it,
- 13:00how to implement it, right?
- 13:02As I mentioned, we've been talking to folks like
- 13:05at Harvard Medical School that partner with Amazon
- 13:08to bring these sorts of instances into their
- 13:11labs and what can be done with it.
- 13:13So I wanna talk a little bit
- 13:14about like some of those details,
- 13:16and I'm saying it here in the context of our product,
- 13:19but I'm not trying to sell you anything.
- 13:20I'm really trying to talk about it
- 13:21more in the context of what can be done.
- 13:24So thinking about it, like,
- 13:28so I mentioned SAS as an example.
- 13:29I mentioned Jupyter Notebooks as an example,
- 13:31but there might be other kinds of software
- 13:34that are more particular to a use case,
- 13:36like MATLAB's another one that could be installed.
- 13:38But there might be even more specific software
- 13:40that might need to be set up or run.
- 13:44Sometimes, for example, survey software
- 13:47where you might collect data from a very particular kind of
- 13:52survey system and you need something to work with it.
- 13:54So imagine that,
- 13:55like for the use case that you might have, right,
- 13:58you could have a workspace that is set up
- 14:02so that all that software comes pre-built
- 14:03once you set it up.
- 14:05Much like, you know, having laptops
- 14:07that have come pre-configured with a certain set of tools,
- 14:10but instead of handing out physical laptops,
- 14:12it's on the cloud.
- 14:14Virtual collaboration,
- 14:15I think I've gone through a lot; multiple workspaces,
- 14:18I think I mentioned also.
- 14:20Data security I kinda mentioned. You know,
- 14:23anybody who's done data analysis
- 14:26on data that they, you know,
- 14:29weren't the ones
- 14:30to collect, I'm sure, has run into challenges
- 14:32where folks are reluctant to, you know, share data.
- 14:37So that's why in this context,
- 14:38it's really important to note that like, you know,
- 14:41we can lock that environment down
- 14:42and make sure that only the people authorized to log into it
- 14:44have access to it; that's a really important point.
- 14:47So it's not really like the data
- 14:49are going out of somebody's control.
- 14:51Again, they're kept in a place
- 14:52where anybody who wants to can remove
- 14:53that data again and delete it.
- 14:57And then if there were to be very computationally intensive
- 15:01things to do, it's very easy to scale it up.
- 15:05And that's something that folks also like.
- 15:10So what are the ways that this kind of workspace
- 15:14can support biostatistics research
- 15:17and data analysis in general?
- 15:18So I mentioned data science as a service
- 15:20a little bit in this example.
- 15:22So this would be the case where any organization
- 15:26who say doesn't have biostatistics
- 15:29or data science expertise local to them
- 15:32might be interested in sort of renting time
- 15:36or having some part-time person come in to help with that.
- 15:40And that's a model that we've seen work well
- 15:42both for labs and for companies.
- 15:44One way in which labs really like it is new PIs
- 15:49with a startup package that just, you know,
- 15:51first few weeks into their appointment
- 15:54with an R01, right, no staff yet.
- 15:57Nobody, but they're coming in with data from their previous,
- 16:03you know, from their postdoc basically.
- 16:06And what do they do, right?
- 16:07They need to write grants, they need to like hire staff,
- 16:10they need to do all these things.
- 16:12So we've actually found labs are very happy
- 16:15in that circumstance just to get going, you know,
- 16:19to be like, "Hey, I have this data,
- 16:20I haven't analyzed it yet.
- 16:21I really wanna put in my grant proposals.
- 16:23I just need somebody to kind of sit with me virtually
- 16:27and run through this data,
- 16:30so that I can get these figures
- 16:33made and get my grant out, right?
- 16:34And I just don't have time
- 16:36to bring on a full person to do that."
- 16:37So data science as a service can be very useful for that.
- 16:40Data standardization and sharing as a service.
- 16:42So, you know, I'm not sure how much it's affecting folks
- 16:46in the room, but the NIH over time
- 16:48has gotten increasingly serious about making data sharing
- 16:55happen for real for real,
- 16:56and not for fake for real, right?
- 16:58And so this year in particular,
- 17:01a new policy from NIH has come out, the Data Management and Sharing (DMS) policy,
- 17:05where they're really, really asking for even, you know,
- 17:09grant proposals to have a whole data management
- 17:11strategy figured out upon submission.
- 17:15And even, you know, saying you need to set aside
- 17:19some budget for that,
- 17:20'cause it turns out data sharing doesn't happen for free,
- 17:22you know,
- 17:24not for PIs, not for their time, right?
- 17:26So that's also something where, okay,
- 17:29I don't have the expertise to figure out
- 17:30which of the billion databases I might share my data in.
- 17:34Could somebody come in and help do that?
- 17:36Well how do you do that?
- 17:37You know, when I did work in the neuroinformatics
- 17:41space as a graduate student
- 17:43and I was trying to help figure out for neuroscientists
- 17:47how to get data that they had, you know, collected
- 17:50in a very laborious process of experimental collection,
- 17:55I was trying to help them share their data
- 17:57'cause they wanted to comply with these policies
- 17:59even back then, you know, very frequently I would
- 18:04get the challenge of like,
- 18:05"Yeah, it's in a hard drive under my desk, right?
- 18:08Physical hard drive sitting under my desk, right?"
- 18:10Like, okay, so you can go pick it up and like take it away
- 18:14and do something with it.
- 18:15But you know, they don't have the expertise, you know,
- 18:19locally to even know, okay, now we're gonna plug it in
- 18:22and we gotta look through it
- 18:23and like, oh, the PhD student left three years ago.
- 18:27And like, how do I do that?
- 18:27So the idea of, okay, if all we can do is like take that
- 18:31hard drive from under the desk
- 18:33and like plug it in the cloud, share it on Dropbox,
- 18:37okay, something like this or you know,
- 18:39have a conduit to get it to the cloud,
- 18:41share that folder in a workspace online
- 18:43and then have somebody else that does this all the time
- 18:47like go through all that and do their best to start,
- 18:49you know, documenting what they find,
- 18:51maybe raising questions that they might find, you know,
- 18:54to present to the PI,
- 18:55"Hey, I know your PhD student left three years ago,
- 18:58but you know, can you tell me a little bit
- 18:59about this experimental methodology?"
- 19:01There's now at least a hope that you can start,
- 19:03you know, standardizing that data,
- 19:05sharing it in a better way,
- 19:06making the NIH not come kick down your door
- 19:09with the data sharing police force
- 19:11that I'm sure they're setting up now.
- 19:14Okay probably not.
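The first pass over a "hard drive under the desk" described above (go through everything, document what is there, and raise questions for the PI) can start with a simple file inventory. The sketch below is hypothetical: `inventory` and `open_questions` are our own names, and the set of "known" extensions is an illustrative placeholder that an analyst would adjust for the lab's instruments. It counts files by extension and drafts one question per unrecognized format.

```python
from collections import Counter
from pathlib import Path
from typing import Dict, List


def inventory(root: Path) -> Dict:
    """Summarize every file under `root`: count, total size, a breakdown by
    extension (files without one are grouped as "<none>"), and top-level folders."""
    files = [p for p in root.rglob("*") if p.is_file()]
    by_ext = Counter(p.suffix.lower() or "<none>" for p in files)
    return {
        "n_files": len(files),
        "total_bytes": sum(p.stat().st_size for p in files),
        "by_extension": dict(by_ext.most_common()),
        "top_level_dirs": sorted(p.name for p in root.iterdir() if p.is_dir()),
    }


# Hypothetical placeholder: formats the analyst already knows how to read.
KNOWN_EXTENSIONS = {".csv", ".tif", ".tiff", ".mat", ".h5", ".nwb", ".txt"}


def open_questions(summary: Dict) -> List[str]:
    """Draft one question for the PI per file format the analyst can't identify."""
    return [
        f"What instrument or protocol produced the {ext} files ({n} of them)?"
        for ext, n in summary["by_extension"].items()
        if ext not in KNOWN_EXTENSIONS
    ]
```

Run against the shared folder in the workspace, the summary and drafted questions become the first entries in the dataset's documentation, before any standardization into a repository format.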
- 19:16Okay, a third way is through workshops.
- 19:21And I'll have some specific examples
- 19:23a little bit later about this one.
- 19:25But if you think about, you know,
- 19:27the experience of either physically traveling
- 19:30or doing what we're doing here
- 19:31and then being exposed to software, right?
- 19:36It's one thing to have slides show
- 19:37you pretty pictures of what software looks like.
- 19:39And it's another thing to say basically like,
- 19:43"Hey, log into, like go right now on your laptops
- 19:47and go hit this address"
- 19:50and like, here's your login and like while I'm explaining it
- 19:53to you, check it out, play with it, right?
- 19:57So we've actually found that also to be a really valuable
- 20:00way to do an extra level of education and demonstration,
- 20:05especially for tools built in academia,
- 20:09which generally have a pretty small audience, right?
- 20:11Not a lot of people use them, necessarily,
- 20:14or it's like a very niche community.
- 20:16So the total number of humans is not large.
- 20:18So to have the ability, right now in a live session,
- 20:21to say, "Let me show you this software; log in right
- 20:24now, play with it," can move the needle a lot on getting folks
- 20:27to use tools
- 20:31that will actually help them a lot.
- 20:33And then lastly, you know,
- 20:35collaborations between labs, right?
- 20:38Hey, we just set up a consortium,
- 20:40it's a five-lab consortium
- 20:41and we're all studying this thing, right?
- 20:44It's a collaboration between the folks that are generating
- 20:46the data and the folks who are gonna analyze the data.
- 20:48Okay, great, we got this really smart set of mathematicians
- 20:50who are gonna do all these great statistics, awesome.
- 20:53How do you get the data from point A to point B?
- 20:55Well email, right?
- 20:58So what if you can improve that, right?
- 21:01Or you know, the context of, you know,
- 21:04we also find companies wanna collaborate with each other
- 21:06and then universities and companies wanna collaborate
- 21:08with each other also, right?
- 21:10So in ways that I haven't already listed,
- 21:13but just collaborations of whatever variety.
- 21:17So when it comes down to those things, right,
- 21:19it's one step better than just sharing on Dropbox
- 21:22and being like, here are the data, go check it out
- 21:24'cause you're keeping the analysis all together, right?
- 21:29It adds a layer of reproducibility
- 21:31to those kinds of collaborations
- 21:32that is hard to match, on top of all the other things,
- 21:36all the great best practices for reproducibility.
- 21:40Okay, so those are four ways to use cloud workspaces
- 21:43to support biostatistics research.
- 21:47So let's, you know, I think I've kind of walked through this
- 21:51example already verbally,
- 21:52but I did have a slide specifically for it.
- 21:54So like this happens in research all the time.
- 21:57There's a lab that needs a particular analysis completed
- 22:00and they don't have the expertise in lab.
- 22:01What can be done?
- 22:02So typically the alternatives are, you know,
- 22:04bring in some student or a postdoc or collaborate
- 22:07with a lab that has some mathematical expertise
- 22:09to perform analysis.
- 22:11But that can be quite time consuming, you know,
- 22:13that might not deliver the results you're looking for.
- 22:16Secondly, for folks who might, you know,
- 22:20be in a position, like I mentioned
- 22:21with early lab set up, right?
- 22:25Engaging some part-time data scientists from industry
- 22:27could help work on particular problems as needed.
- 22:31And that's interesting both perhaps
- 22:33from the perspective of me as a company,
- 22:35but also maybe interesting for yourselves
- 22:38thinking about a path through industry
- 22:41where you might be able to do biostatistics
- 22:45for multiple organizations at once, not just one at a time.
- 22:50And then it's also interesting,
- 22:51as I mentioned, from the perspective of folks
- 22:53who have the problem and need to get the analysis done.
- 22:57Okay so some case studies, does this happen?
- 23:03I sort of mentioned abstractly, it does,
- 23:05but these are five cases that we've worked on in our company,
- 23:10and many of them have a,
- 23:14well, they all have the theme
- 23:15of being calcium imaging data, okay?
- 23:18So here, you know, swap out biostatistics
- 23:20for looking at data that comes from a microscope.
- 23:23But at the end of the day,
- 23:25that data from a microscope is basically a video stream,
- 23:31generally black and white images
- 23:33that then have to be post-processed.
- 23:36And from that video stream there's a spatial component
- 23:39of looking at a field of neurons under a microscope
- 23:44and a time component.
- 23:46Like how did those, you know,
- 23:49neurons' activity change over time?
- 23:51But there are a lot of statistical challenges
- 23:54that have to go into that.
- 23:55You need to separate the neurons out from each other, okay?
- 23:58They kind of overlap each other.
- 24:00So looking at a video stream, you're not always sure, right?
- 24:04If I'm looking at one neuron or two neurons.
- 24:06So you have to do some spatial analysis
- 24:08to separate those out.
- 24:09And then you wanna do some sort of peak finding over time.
- 24:13What you kind of wanna extract out is a time series
- 24:15of however many neurons you've detected
- 24:17in your field of view
- 24:19and then start to do some additional analysis.
- 24:21And that additional analysis will be based on
- 24:24the specifics of the experimental setup
- 24:26and like, you know, what part of brain were you looking at?
- 24:30What was your protocol that you applied
- 24:33and what kind of expectations
- 24:37do you have about the time series that you extracted?
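The temporal half of the pipeline described above (turn each neuron's signal into a time series and do "some sort of peak finding over time") can be sketched in a few lines. This is an illustration under stated assumptions, not the pipeline MetaCell used: it assumes the spatial step of separating overlapping neurons has already produced one raw fluorescence trace per neuron, takes the baseline F0 as the mean of the lowest 20% of samples, and uses a median + 3 × (1.4826 × MAD) detection threshold. All three are common choices but are assumptions here, and the function names are our own.

```python
from statistics import median
from typing import List, Sequence


def delta_f_over_f(trace: Sequence[float], baseline_fraction: float = 0.2) -> List[float]:
    """Convert raw fluorescence F into dF/F = (F - F0) / F0.

    F0 is the mean of the lowest `baseline_fraction` of samples (assumed
    baseline estimate; requires a positive baseline)."""
    k = max(1, int(len(trace) * baseline_fraction))
    f0 = sum(sorted(trace)[:k]) / k
    return [(f - f0) / f0 for f in trace]


def robust_threshold(signal: Sequence[float], n_sigma: float = 3.0) -> float:
    """Median + n_sigma * (1.4826 * MAD), a noise estimate robust to the peaks."""
    values = list(signal)
    med = median(values)
    mad = median([abs(x - med) for x in values])
    return med + n_sigma * 1.4826 * mad


def find_peaks(signal: Sequence[float], threshold: float, min_distance: int = 1) -> List[int]:
    """Indices of local maxima above `threshold`, at least `min_distance`
    samples apart. When two peaks are too close, the taller one is kept."""
    candidates = [
        i for i in range(1, len(signal) - 1)
        if signal[i] > threshold
        and signal[i] >= signal[i - 1]  # >= so the right edge of a flat top still counts
        and signal[i] > signal[i + 1]
    ]
    accepted: List[int] = []
    for i in sorted(candidates, key=lambda j: signal[j], reverse=True):
        if all(abs(i - a) >= min_distance for a in accepted):
            accepted.append(i)
    return sorted(accepted)
```

For example, a flat trace of 100.0 with transients of 150.0 at sample 5 and 130.0 at sample 14 gives dF/F values of 0.5 and 0.3 there, and `find_peaks(dff, robust_threshold(dff))` returns `[5, 14]`. The "additional analysis" the speaker mentions (relating peaks to the experimental protocol and brain region) would start from these event times.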
- 24:41So these organizations that we work with, I guess, you know,
- 24:45four out of five are universities.
- 24:48So DGIST is the Daegu Gyeongbuk Institute of Science and Technology
- 24:51in South Korea, McGill University in Canada,
- 24:58the University of Pennsylvania (UPenn), and the University of Alabama.
- 25:04And then Maze, which is a small pharma company
- 25:09in San Francisco and they're all doing calcium imaging work.
- 25:14And I think we served all of these organizations
- 25:18within the same span of about six months.
- 25:22Each one of them had brought different data to the table.
- 25:27They're all generally in this form of video data
- 25:29with the calcium imaging to extract.
- 25:33All five of them were served
- 25:34by the same data scientist on our side,
- 25:38gentleman whose picture you saw earlier
- 25:41but they had very different scientific protocols, right?
- 25:44So it wasn't necessary that one person full-time
- 25:47over six months worked on each of these projects, right?
- 25:50Instead we have one individual,
- 25:52who's able to jump from project to project
- 25:54and check back in with multiple PIs/business leaders,
- 26:01managers to check in on the results of that, right?
- 26:05And that person never left their home, right?
- 26:08So our company is also fully remote, which is nice.
- 26:13And so I think that's a really powerful demonstration
- 26:17of what's possible for this kind of analysis,
- 26:19whereby, you know, essentially organizations
- 26:25in multiple different countries
- 26:27and a different continent in one case, right,
- 26:29can all be served by the same person
- 26:33having roughly the same data analysis skillset
- 26:36but working on data that addresses very different scientific
- 26:40questions all at the same time.
- 26:43Okay, so that's a thing.
- 26:47And each one of these, I should say,
- 26:49has been done in this collaboration model that I mentioned
- 26:51where there's one workspace per organization, right?
- 26:57So each organization has their own workspace,
- 26:59they log into it, they can see the results
- 27:01of the data science work that happens.
- 27:04They have all in one way or the other,
- 27:06put data into the workspace, right?
- 27:09And, they've all sort of been able to pull figures back out
- 27:13again and direct the flow of analysis in the direction
- 27:19that they wanted through Zoom calls,
- 27:22like the one that I mentioned
- 27:23generally on like a weekly basis
- 27:25or every couple weeks check in.
- 27:28So yeah, a little bit more about the team behind that,
- 27:34in terms of thinking about what it takes
- 27:35to make that happen.
- 27:37There is a little bit of work in finding those labs
- 27:39and figuring out that they have that problem,
- 27:42which isn't taken care of
- 27:45by the individuals on this screen.
- 27:46But I mentioned Phil, the PhD;
- 27:50another PhD, who's worked with us
- 27:52as data scientist is Marcus.
- 27:55And then kind of orchestrating behind the scenes,
- 27:57the standing up of these workspaces
- 27:59is a software architect, Zoran.
- 28:04Phil is in the New York City area.
- 28:07Marcus is in China and Zoran is in the Netherlands.
- 28:13So again, interesting to think about the different
- 28:16geographies where folks come from being able to serve people
- 28:19in different geographies,
- 28:21but all of them when it comes to a project,
- 28:23the central organizing node is a workspace.
- 28:27That is the thing that helps
- 28:28coordinate a lot of this together.
- 28:31There are a few other technologies that help.
- 28:34Those of you familiar with like a Kanban board
- 28:37or just really any kind of task driven software,
- 28:39you know, you can bring that to bear as well.
- 28:42So one of the ways you can organize work a little bit better
- 28:44than just sending emails back and forth
- 28:46is to encapsulate each task,
- 28:50break each task down into a card on a Kanban board.
- 28:53We like the tool called Trello,
- 28:56but there's lots of them out there
- 28:58that can be used for such things.
- 29:00And then, you know, one card per task
- 29:02is a nice way to organize things.
- 29:04And then using a practice from software engineering,
- 29:07you can actually sort of estimate
- 29:09in roughly how many hours, you know,
- 29:12the data scientists might think it would take
- 29:15to do a given task
- 29:16and then use that as a way to figure out
- 29:18like how long it's gonna take
- 29:20to do a certain kind of analysis.
- 29:21This is a practice that we actually use
- 29:23across my company for all sorts of tasks,
- 29:25not just data science,
- 29:26really organizing kind of everything that we do
- 29:28on the basis of making cards like this
- 29:31and moving things across.
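The estimation practice described above (one card per task, a rough hours estimate on each card, and a roll-up to see how long an analysis will take) reduces to simple arithmetic. The sketch below is hypothetical: the `Card` type, the column names, and the idea of dividing remaining hours by the analyst's weekly allocation are our own illustration, not a feature of Trello or of MetaCell's process. The weekly allocation matters in the part-time, multi-project model described earlier, where one data scientist splits time across several organizations.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Card:
    title: str
    column: str            # hypothetical board columns: "To Do", "Doing", "Done"
    estimate_hours: float  # the data scientist's rough estimate for this task


def hours_by_column(cards: Iterable[Card]) -> Dict[str, float]:
    """Total estimated hours sitting in each board column."""
    totals: Dict[str, float] = defaultdict(float)
    for card in cards:
        totals[card.column] += card.estimate_hours
    return dict(totals)


def weeks_remaining(cards: Iterable[Card], hours_per_week: float,
                    done_column: str = "Done") -> int:
    """Whole weeks needed to finish every card not yet in `done_column`,
    given how many hours per week the analyst can give this project."""
    remaining = sum(c.estimate_hours for c in cards if c.column != done_column)
    return math.ceil(remaining / hours_per_week)
```

With four cards estimated at 4 h (Done), 6 h (Doing), 2 h and 5 h (both To Do), 13 hours remain; at 5 hours per week that rounds up to 3 weeks. The estimates are only as good as the analyst's guesses, so the useful signal is usually how the total moves as cards are re-estimated.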
- 29:32And I'm still surprised
- 29:33how many organizations don't use this.
- 29:36I have lots of friends in academia
- 29:38that do this just for their labs.
- 29:39You guys might do this in your labs, I don't know.
- 29:40But for organizing oneself,
- 29:44even if you do meet in person,
- 29:46having this sort of set up in the cloud
- 29:48can be very helpful for organizing work.
- 29:52Not sure how new or not new this is
- 29:54to those of you in the room, but something we use.
- 29:57And then of course there's Slack,
- 29:58which I think has pretty good adoption amongst academia.
- 30:03We find that pretty much every lab we talk to
- 30:06is on Slack or some version of it.
- 30:10Companies are using Microsoft Teams,
- 30:12which I personally like less,
- 30:13but, you know, we use that too.
- 30:17But basically, you know,
- 30:20one thing that we do that maybe others don't
- 30:23is connect a Kanban board like
- 30:26the one that you saw so that it spits out notifications
- 30:28in a Slack channel at the same time,
- 30:31which can be really nice if you are a Slack-based person,
- 30:35to just be able to see how tasks are changing
- 30:37and evolving in the feed,
- 30:40which then doesn't require an extra conversation, right?
- 30:42Like, "Hey, so we agreed on Monday that you were gonna,
- 30:45you know, do that t-test on this survey data,
- 30:50how's that going?"
- 30:52Well, if they've moved that card,
- 30:55the one labeled "t-test on survey data," from the To Do column
- 30:58to the Doing column,
- 30:59a little notification's gonna pop up in Slack.
- 31:02And then when they write a comment like, "Yep, you know,
- 31:04I ran the test and it wasn't statistically significant,"
- 31:07then that's gonna pop up also.
- 31:09That comment will then be relayed into Slack.
- 31:11So then when you go back to check in,
- 31:13you don't have to ask that question.
- 31:13It's like, "Yep, I saw that it happened
- 31:15and by the way I saw that it happened on Tuesday,
- 31:18you know, now it's Wednesday, you know.
- 31:20I forgot to check back in with you about it."
- 31:23So that idea of asynchronous work can happen
- 31:25in this cloud-based context too, which, again,
- 31:29we also use in all other parts
- 31:31of our company and which can be really helpful
- 31:33for moving projects along in lots of ways.
- 31:37So yeah, I've told you a lot
- 31:42about one particular example of doing work.
- 31:44I wanna call Adria back in here
- 31:47to tell you a little bit more about a partnership example
- 31:52that we've had some experience with.
- 31:53So back to you Adria.
- 31:55<v ->Thanks. So one thing that Stephen mentioned was, you know,</v>
- 31:58another challenge we might face is,
- 32:00okay, where do we go find people who have data that
- 32:03they might need help with?
- 32:04And we were thinking about where does data come from, right?
- 32:08And so one source of data
- 32:12is devices, and manufacturers
- 32:15make devices that are sitting in labs.
- 32:17So we had the idea: let's have discussions
- 32:20with these manufacturers
- 32:21and see if we could form some sort of partnership.
- 32:24Now, when you're forming a partnership in industry,
- 32:27you need to think about why it would benefit both sides
- 32:29in order to engage your prospective partner
- 32:33and show them why they should talk to you, right?
- 32:34So one thing that we identified was that
- 32:37a key aim of manufacturers
- 32:39is to provide additional support
- 32:41to their customers. They think,
- 32:43"Hey, I have a customer or a lab that has data,
- 32:45and what if there's an aspect of their data
- 32:48they don't know how to handle,
- 32:51or they don't know what to do with?
- 32:52Maybe they'll stop using my device down the line
- 32:54because the data's just not useful to them at this point,
- 32:57'cause they're lacking a skillset."
- 32:59So we thought of an idea whereby
- 33:01we could approach device manufacturers
- 33:03and explain what Stephen described
- 33:05about our data-science-as-a-service offering and say,
- 33:09"Hey look, we could form a partnership with you,
- 33:11whereby as an offering, in addition to extending a warranty
- 33:15on your device, you could offer custom analysis support
- 33:19or data science support to any interested customers,
- 33:22whereby they could use cloud workspaces
- 33:24to put their data that they're collecting
- 33:26and then they could work with someone like Phil
- 33:28to solve a challenge that they might have."
- 33:31And so we actually successfully
- 33:33did form such a partnership quite recently.
- 33:36And if you go to the next slide,
- 33:38you'll see, so we are now working
- 33:40with a company called Neurophotometrics.
- 33:43They produce a device that does the imaging
- 33:46that Stephen previously described.
- 33:48And what our partnership involves is we essentially offer
- 33:53cloud workspaces as a solution to their customers,
- 33:56whereby when they collect their data,
- 33:59they can then work on our cloud workspaces alongside Phil
- 34:02or ourselves and we can work with them
- 34:03to solve any challenges they might have.
- 34:06Now who are these customers of Neurophotometrics?
- 34:08They are a bunch of different labs kind of
- 34:11all over the world as well.
- 34:12Mostly academics, some in industry as well.
- 34:14And so it's a way for us as an organization
- 34:17to find potential labs
- 34:20we didn't even know had the challenge.
- 34:22And then it's also solving the problem
- 34:25for Neurophotometrics of how you keep your
- 34:26customers happy if you don't really offer a service
- 34:29they're already kind of asking of you
- 34:31as a follow-on to providing this device.
- 34:33So the partnership is still fairly new,
- 34:37but it seems to be working quite well so far,
- 34:40and we're meeting new people
- 34:41and already getting kind of more projects
- 34:43like Stephen described for Phil to work on.
- 34:45So we'll see how it goes.
- 34:46But this is just one way to show you
- 34:47that it's not just about kind
- 34:49of solving a problem for a customer,
- 34:51it's about where do you find your customers
- 34:53and that could be through an industry partnership.
- 34:57<v ->Awesome, thanks for that.</v>
- 35:02So I mentioned one other model earlier, which is workshops.
- 35:08I think I talked about that example for a bit.
- 35:11And we've actually done a few of them as well
- 35:17in the computational neuroscience space,
- 35:18the space near and dear
- 35:21to our work with Robert.
- 35:25So one of those projects was a collaboration
- 35:28with Brown University on something
- 35:31called the Human Neocortical Neurosolver.
- 35:34We have kind of a neuroscience bias in the company.
- 35:38We like doing those sorts of things.
- 35:39So we did a workshop also.
- 35:44We helped facilitate a workshop
- 35:46that allowed a software tool
- 35:49that came out of this particular collaboration to be shown.
- 35:56And, let me show you a little bit more.
- 36:00So in this case, I'm actually gonna switch
- 36:04away from the Human Neocortical Neurosolver
- 36:05and also show you an example with NetPyNE,
- 36:07which is the thing that Robert mentioned earlier
- 36:09that we work with as well.
- 36:11It's similar to HNN.
- 36:13In both cases there's a computational model
- 36:15of a neuron, okay?
- 36:16Just think of like, you know,
- 36:18a spatial model of a neuron that has a cell body
- 36:22and has an axon and dendrite, that kind of thing.
- 36:25And you wanna simulate something about it.
- 36:28And so you have a specialized piece of software
- 36:34that knows how to look at the model of a neuron,
- 36:38the way that it's shaped
- 36:40and how to get signals out of it basically, right?
- 36:44So with NetPyNE, in collaboration also with a software platform
- 36:50called Open Source Brain at UCL
- 36:52that we've been partnering with for a while,
- 36:54you might have something that looks like this.
- 36:58So what you can do in a workshop context
- 37:03with something like a workspace that's really exciting,
- 37:05as I mentioned to you before, is have people
- 37:07get hands-on with the software itself.
- 37:09And this is one of those pictures
- 37:11from one of the workshops that we did.
- 37:14I think this one was specifically NetPyNE,
- 37:16where you can kind of see what everybody's looking at.
- 37:18So everybody brought laptops in, right?
- 37:20And they're able to launch it, in this case.
- 37:23They're literally, you can see several of 'em,
- 37:25like this one up in front and this one over here,
- 37:27they literally have exactly the same screen up
- 37:29that is being shown, you know, in the screen share,
- 37:33not because they're logged into a Zoom,
- 37:34but 'cause they're actually logged into essentially
- 37:37a workspace environment where they can also like, you know,
- 37:40change parameters around.
- 37:41So you can get this hands-on tutorial effect
- 37:43in a workshop, in this context,
- 37:46which is kind of hard to do any other way
- 37:50if you don't have that.
- 37:53If it's deployed as web-based software,
- 37:55that makes it a little bit easier.
- 37:56But if it's not, you know,
- 37:57if it's something that's traditionally supposed
- 37:59to be on a desktop,
- 37:59then this is kind of the only way to do something like that.
- 38:03And this was at an academic conference,
- 38:06I think CNS.
- 38:09So yeah, from all that today,
- 38:15to wrap up the part where
- 38:17we just talk at you, before your questions,
- 38:20what did we sort of talk about today?
- 38:23Like, how can cloud-based data science tools
- 38:26help enhance the ability to do biostatistics
- 38:29and health informatics research?
- 38:31I've been, you know, leaning on some examples
- 38:32that are heavily neuroscience-based,
- 38:34but we don't think that any of this is
- 38:36particular to neuroscience, right?
- 38:37It's still, you know, as I started at the beginning,
- 38:40you know, doing some analysis, you know,
- 38:42sharing the results of the commands
- 38:45that we're using in the analysis
- 38:47and then sharing the output of that analysis, right?
- 38:48Like that's where we began.
- 38:50I think that's common to every technique.
- 38:51We're bringing some kind of science and math
- 38:53to bear on some data, right?
- 38:55So what we're finding is that, you know,
- 38:57using cloud-based platforms
- 38:59really can help facilitate collaborative research,
- 39:02allowing colleagues to share data and work together.
- 39:05You can help labs efficiently gain access
- 39:08to additional data science support if that's desirable,
- 39:10support that they otherwise might struggle to get
- 39:14or that's just kind of unaffordable,
- 39:15where hiring doesn't make sense 'cause a whole person is too much.
- 39:19And then finally in the last example, right,
- 39:21you can facilitate, you know,
- 39:23distance workshops that allow much more immediate
- 39:26hands-on experience with certain software.
- 39:29So with all that, I will thank you all for listening
- 39:36to us for a full 40 minutes,
- 39:38and I'm happy to take any questions that you have on this
- 39:41or anything else I can help with directly.
- 39:44Thank you very much.
- 39:46<v ->Thank you so much.</v>
- 39:50Does anybody have any questions for our presenters?
- 39:57I'll start if there's no questions.
- 40:01So, is data science as a service a growth industry?
- 40:07People want jobs.
- 40:10What's your take on the industry?
- 40:13<v ->We are about 18 months into our exploration of the market.</v>
- 40:22We have seen growth so far.
- 40:25We think there's more to go.
- 40:28I showed you those five labs.
- 40:30I think in total we've served certainly more than a dozen,
- 40:35I wanna say maybe 15, labs plus companies, or so,
- 40:3815 or 16, in those 18 months.
- 40:43We had to figure out lots of other stuff along the way.
- 40:45But we think there's a need, you know, like I mentioned,
- 40:52and folks who have the skillset to, you know,
- 40:56provide that data science service
- 40:58are continually in demand.
- 41:00So I'm gonna say yes, it's growing.
- 41:04We're always wondering in industry how fast, you know,
- 41:08that's always the question,
- 41:10but it's definitely not shrinking.
- 41:13<v Robert>Alright, that's an exciting option.</v>
- 41:18<v Participant>Yeah just really quick,</v>
- 41:20what happens with authorship?
- 41:22If you work with the lab very closely on a project,
- 41:26they come out with a really good publication.
- 41:31How do you deal with that in this industry?
- 41:36<v ->Yeah, great question. Thank you.</v>
- 41:40So as a company,
- 41:44we don't require that our data scientists be listed
- 41:51as co-authors on papers.
- 41:55I think, from an ethical perspective,
- 42:02in cases where the contributions that the data scientist
- 42:05has made are very significant,
- 42:09you know, sometimes PIs have asked the question to us,
- 42:13you know, what sort of acknowledgement
- 42:15would you like of the data scientist?
- 42:18And if the PI feels that, say, you know,
- 42:21someone who has a PhD who works with us
- 42:23has done enough work that it merits authorship,
- 42:27they're free to add that person.
- 42:28We don't require that.
- 42:30Otherwise, you know, an acknowledgement is always nice, right?
- 42:33But also not required.
- 42:37I think, you know, sometimes the nature
- 42:40of the contribution really matters.
- 42:42So, you know, as a company it's a little bit
- 42:47like how much do you acknowledge
- 42:49the vendor of your microscope, right?
- 42:53You might say, okay, I did this on a Nikon microscope
- 42:56or you know, but you might write that more
- 42:58as a method section.
- 42:59And then if like a technician came out
- 43:00and like helped you calibrate it,
- 43:02you're probably not gonna give
- 43:03that person an authorship either.
- 43:05But you might acknowledge them if they did extensive help
- 43:07that like led to some novel process.
- 43:10So on the whole, it's a case by case conversation
- 43:15that scales based on the level of the contribution,
- 43:17but it's not the first thing that we think of.
- 43:19It's not like, "Hey, because we did anything for you,
- 43:21please put us on a paper."
- 43:23Definitely don't do it that way.
- 43:24It's more the opposite, which is like, you know,
- 43:27we're gonna do a thing for you.
- 43:28Probably, you don't need to cite us.
- 43:30But if it gets up to a certain point
- 43:33and we kind of mutually agree that that's appropriate,
- 43:35then we're happy to discuss that.
- 43:41<v ->Thank you for sharing, Stephen.</v>
- 43:42So I have a quick question too.
- 43:44So if you're running on datasets where
- 43:47one cell may take a really long time to run,
- 43:50then how do you solve the concurrency issue?
- 43:53Let's say there are multiple people collaborating online;
- 43:56when the cell is running,
- 44:00what if another party just clicks stop
- 44:04or does something random?
- 44:06How do you make sure people are on the same page
- 44:08when something takes a really long time to run?
- 44:13<v ->Yeah, great question.</v>
- 44:14So a few ways,
- 44:18one nice thing about a cloud workspace is that
- 44:22we can expand the number of processors
- 44:25and the amount of memory kind of
- 44:28behind the scenes transparently.
- 44:31So basically you can like log out of the workspace
- 44:35and in five minutes log back into the workspace
- 44:38and we've like doubled the processing speed
- 44:40and like doubled the memory.
- 44:42So we tend to keep our default instance
- 44:45at about the level of a reasonable laptop,
- 44:47probably not a high-end one.
- 44:49And then when we discover cases like what you're talking
- 44:52about where like, yeah, no, that cell requires a lot
- 44:56and we kind of know a little bit in advance,
- 44:57like we're gonna wanna run that a lot, right?
- 44:59We might just
- 45:01beef it up, right?
- 45:03And that's cool that we can do that.
- 45:07And then the question becomes like,
- 45:10does that need to run, you know, 24/7,
- 45:12does it need to run every day,
- 45:13every week, every month right?
- 45:15We think a little bit about that
- 45:16because then there's some additional costs on our side.
- 45:18If you're gonna do it for like an afternoon,
- 45:20it's like really not, it's not worth making any additional,
- 45:24you know, requests of somebody.
- 45:27But there's another part of your question I wanna get at
- 45:28too, which is like maybe overriding each other, right?
- 45:33So that can happen.
- 45:34And that's a little bit like software specific.
- 45:38So, like, in a Jupyter Notebook, you could,
- 45:43if you don't coordinate a little bit with your lab member,
- 45:45overwrite something in a cell at the same time, right,
- 45:49and the other person doesn't notice.
- 45:50So for that, we have some best practices, you know.
- 45:54By far the most common, you know, example that we see
- 45:59is two or fewer people collaborating,
- 46:01but if it were three or four,
- 46:03we'd probably recommend that they do a best practice
- 46:05of like, you know, while you're doing work that's separate
- 46:08and you're not like talking to each other,
- 46:10do work on separate copies of the thing, right?
- 46:13And then come together in a meeting
- 46:15and like put it back together, right?
- 46:17That's usually the better practice if you're, say,
- 46:20working on a Jupyter Notebook,
- 46:22and you know, communicate, you know,
- 46:25using some other method like a meeting like this.
- 46:28So yeah so those are the two aspects.
- 46:30On the one side, if it's computation intensive,
- 46:32we can make it bigger.
- 46:33If it's actually about people overwriting each other,
- 46:35we recommend some best practices
- 46:37for communicating outside of the workspace.
- 46:42<v ->Other questions?</v>
- 46:47All right, I have one more question.
- 46:50So like in the old days,
- 46:53people would buy a nice computer for their lab or maybe a
- 46:57couple of nice computers and like then everybody
- 47:00would log in at that and it was a one-time cost, right?
- 47:05And so how have you found, I don't know,
- 47:09I mean, so it's a very different model for
- 47:14both academia and industry, wherever, that's trying
- 47:18to transition from this one time cost
- 47:21where now, you know, you might still be using this computer
- 47:2410 years later for good and ill
- 47:29versus sort of this continuous cloud-based thing.
- 47:34I don't know,
- 47:35do you have any words of wisdom on this transition?
- 47:39Because it seems like, you know, you pay
- 47:42for a cloud computer and if it's on constantly,
- 47:46it eats up a lot of money.
- 47:48<v ->Yeah, yeah.</v>
- 47:49So really good question.
- 47:53So I think and-
- 47:54<v ->You lose control of your data also, which, to some extent,</v>
- 47:58means somebody else has your data.
- 48:00<v ->In theory, yes.</v>
- 48:02But you know, I think some of this is just like a journey
- 48:06and a transition that, you know, scientists are making.
- 48:09Those of us, like yourself,
- 48:11who are more software-engineering minded
- 48:13have been comfortable with the idea of, say, you know,
- 48:16all of our company's data, for example,
- 48:18being in Google's cloud,
- 48:21in Google Workspace technically.
- 48:22None of it is sitting under my desk, right?
- 48:25But we've gotten a level of comfort about data ownership
- 48:28based on essentially trust and agreements
- 48:32and our understanding of how certain sections
- 48:34of disk are like cordoned off, you know, for ourselves
- 48:38and relying on some of those best practices.
- 48:40But to get to the heart of your question,
- 48:44I think the best metaphor is like
- 48:45buying a house versus renting an apartment, right?
- 48:48So, you know, going down to Apple
- 48:51and picking up a laptop or Dell or whatever you wanna use,
- 48:55right, that's the buy model.
- 48:56And we're super comfortable with that.
- 48:58The cloud model is more the like renting the apartment.
- 49:01And certainly people make the choice,
- 49:03you know, not to rent sometimes
- 49:05because it's like, doesn't work out economically, right?
- 49:07It's like, "Hey, I'm throwing money away."
- 49:09Sometimes people throw, right?
- 49:11But what is the advantage of renting, right?
- 49:13The advantage of renting is, you know,
- 49:16if a thing breaks in your rented apartment,
- 49:17it's not on you to go pay extra money to go fix it.
- 49:20That's on the person who owns it.
- 49:21Similarly, if something breaks with your cloud workspace,
- 49:24you know, you call us and you're like,
- 49:26"Hey, this thing didn't work,
- 49:27please fix it, right?"
- 49:29And then there's this scaling thing, right?
- 49:31Which is like, if you go back to Apple and you're like,
- 49:32"Actually can you add like double the CPU
- 49:37and double the memory?"
- 49:39They'll be like, yes, you can pay us for that,
- 49:41but it's gonna take a while, right?
- 49:43And it's not gonna happen flexibly and scalably.
- 49:44So I think it fits into a different space, right?
- 49:48Obviously these two come together,
- 49:50I'm talking to you on a physical laptop that I own, right?
- 49:52But I'm also using cloud instances to do things.
- 49:56So I think it's like, it fits into this niche where like,
- 50:00actually the most useful computer for this purpose,
- 50:03this collaborative purpose
- 50:05is a rented one, right, rather than an owned one.
- 50:08And you know, maybe that means when I'm not using it,
- 50:11I'm not paying for it at all, basically, right?
- 50:13Like, if I'm like paused on this collaboration,
- 50:15then I'm like actually not paying for it at all,
- 50:17but then I can bring it back in six months and start
- 50:18paying for it again.
- 50:20So this is what I hope that folks take away is like,
- 50:22it opens up a lot of new possibilities.
- 50:24And the ones that we've gotten
- 50:26are certainly not the only ones.
- 50:27There's just like lots more
- 50:28that you can imagine or envision.
- 50:32But, but yeah, it's a mindset change
- 50:35and it's one that I think, you know,
- 50:37requires some adapting, yeah.
- 50:42<v ->All right. Thank you so much.</v>
- 50:44<v ->I have a question for you guys</v>
- 50:45if there's not another question for me.
- 50:48<v ->There's a question on the screen.</v>
- 50:51<v ->Sorry, I have a question.</v>
- 50:54I think piggy-backing off of that question-
- 50:58<v ->Hi hello. Hi Noelle.</v>
- 51:00<v ->Actually Hi.</v>
- 51:02I'm used to, like, physical pieces of data
- 51:08and, like, having physical hard drives.
- 51:10So what is the security for data that's on the cloud?
- 51:16<v ->Yeah, so folks like,</v>
- 51:24we ourselves build these cloud instances
- 51:30on the back of three major providers,
- 51:32whose names you'll recognize,
- 51:33Amazon, Google, and Microsoft, okay?
- 51:37Those are the big three cloud providers
- 51:40and they make a guarantee to us
- 51:43and then we make a guarantee to our customers
- 51:46about the data protection.
- 51:47So it's kind of like a layer cake.
- 51:49And the foundation of it begins with, do you trust Amazon?
- 51:52Do you trust Google? Do you trust Microsoft?
- 51:53Some people say yes, some people say no,
- 51:56but fundamentally they are the ones that, you know,
- 51:59build data centers, right, where the physical aspect
- 52:04of these computers actually live.
- 52:05So, you know, this virtual computer,
- 52:07maybe if you go and like,
- 52:09"Hey, show me the hard drive where this lives."
- 52:12You're gonna go out to like, I don't know,
- 52:14Washington State near some power plant basically,
- 52:18where it's very economical to set this up, right?
- 52:21So they then guarantee like,
- 52:25how do you know that that's safe, right?
- 52:27Well they guarantee that they're following industry
- 52:30standards to secure those facilities, to lock them down,
- 52:35to continually maintain and manage the networks
- 52:41that are there, to patch the servers
- 52:44that they're using, to keep ahead of any security faults.
- 52:47So there's one layer of this
- 52:49where we rely on these big providers to do their jobs.
- 52:52And despite the last 15, 20 years of hacks and whatnot
- 52:57that you've heard about happening in industry,
- 53:00these three providers so far have managed to avoid
- 53:03being hacked in any major way.
- 53:05Like you've not heard of like Amazon getting hacked,
- 53:08Google getting hacked, Microsoft getting hacked.
- 53:10If tomorrow Amazon gets hacked, then yeah,
- 53:13we're all worried, okay?
- 53:14And then we probably would need to shift around.
- 53:16But so there's a fundamental guarantee
- 53:19that like all cloud kind of relies on
- 53:21and it's like good to talk about it
- 53:23because like we all have to kind of trust these,
- 53:27you know, these large providers.
- 53:29But they also invest,
- 53:31I'd say millions or hundreds of millions of dollars
- 53:34in computer security.
- 53:35Like if you're in the field of computer security,
- 53:38like, you know these guys because they are sort
- 53:41of world leaders in this sort of thing.
- 53:44Microsoft, you know, notably was involved in doing some
- 53:48forensic analysis on like Russian hacking back in 2016.
- 53:52Like they were some of the first people to notice
- 53:55that a state actor like Russia was on the scene
- 53:58doing the various things, taking over computers.
- 54:00So generally the community of software engineers
- 54:05that do cloud work know these things
- 54:07and kind of rely on Google, Amazon, and Microsoft
- 54:11to like make these investments in computer security.
- 54:14And notably like, I don't go like set up my own data center
- 54:18because I know that I would have to invest millions
- 54:21of dollars in having an equivalently good computer security
- 54:25team to like watch out for Russia,
- 54:27who by the way also invests hundreds of millions of dollars
- 54:30to try to hack these things.
- 54:31So, the world of computer security is a problem.
- 54:35So there's that level of trust, okay?
- 54:37And then on top of that, you have to trust one more level,
- 54:39which is the group that like sets up the workspace.
- 54:41So you kinda have to trust, like if it's from us,
- 54:43you have to kind of trust us that we're not screwing
- 54:45something up on top of all of those protections,
- 54:48'cause it is possible to do that at the level of, like,
- 54:51you know, the Jupyter Notebook or the logins we use.
- 54:55So we also invest in using industry standard
- 54:59like login protocols, so that only the people that we say
- 55:02can log in can log in, right?
- 55:04There's a layer of software security there that, you know,
- 55:07we have to be on top of patching at one level also.
- 55:11So these are all the things that make that secure.
- 55:13And the last thing would be,
- 55:15do you or don't you trust us
- 55:18not to go in and do something nefarious with your data,
- 55:21even though we're the only ones that can control it?
- 55:23So you trust that nobody else can get into it,
- 55:25but do you trust us?
- 55:26And then that becomes,
- 55:27yeah a question of like, you know,
- 55:29going back and checking your references, you know,
- 55:32talking to other PIs, making sure that something nefarious
- 55:35hasn't happened, you know, there.
- 55:37And you probably wanna gain some confidence on that.
- 55:39But what we've found is that organizations
- 55:42are getting more and more comfortable with that.
- 55:43Dropbox is a publicly traded company,
- 55:46lots of people put stuff on Dropbox.
- 55:48When you put something on Dropbox,
- 55:49you're essentially trusting Dropbox.
- 55:51Dropbox is also built on one of these
- 55:53three providers same way, right?
- 55:55So it's that kind of idea
- 55:57that takes some getting used to but you know,
- 56:01becomes increasingly useful to do this kind of work on.
- 56:05And we've seen large banks, large pharma companies,
- 56:07and large financial institutions
- 56:10take their time to adopt cloud too.
- 56:13But over time there's been increasing comfort
- 56:15as some of these security questions
- 56:17have been, you know, asked and answered.
- 56:20So bit of a long answer,
- 56:22but thank you for the question 'cause it's important.
- 56:27<v ->Alright, thanks so much.</v>
- 56:28In the interest of time,
- 56:29I think we're gonna have to stop it here, thanks again.
- 56:32Really appreciate it. (audio garbles)
- 56:37<v ->Thank you guys. Thank you all for your time.</v>
- 56:40<v ->Have a great day.</v>