Data Science Pathway

The Biostatistics data science pathway combines rigorous statistical training with the development of advanced computational skills to solve the public health challenges of tomorrow. Required courses cover epidemiology, regression models, databases, machine learning and more. Students will become familiar with data science programming tools (e.g. R, Python, SQL and NoSQL databases). Data science pathway graduates can find careers analyzing large volumes of health data in government (e.g. public health agencies), hospitals, industry (e.g. pharmaceutical companies) or research.

Students pursuing this pathway will graduate with the key skills of any biostatistician. Compared with students in the traditional pathway, data science pathway students will have more experience using computational techniques to store, manipulate and analyze large volumes and varieties of data. This pathway trains biostatisticians; as such, it emphasizes the development and application of rigorous statistical theory to extensive health data sets, as opposed to the application of the latest computational techniques, which the health informatics master's program prioritizes. The focus on health applications differentiates this pathway from the MS in Data Science and Statistics.

Requirements - MS Data Science Pathway

The MS Biostatistics Data Science Pathway degree requires a total of 15 course units from the curriculum below (Public Health Primer, BIS 525/526 and PUBH 100/101 are not for credit). Course substitutions must be approved by the student's advisor and the DGS. Electives not listed below must be approved by the BIS Data Science Pathway Director.

Full-time students must carry a minimum of 4 course units each semester. Course schedules with more than 5 courses for credit will not be approved. If students have fewer than 4 required courses to take in their last term, it is acceptable to register for just the courses needed to fulfill the degree requirements.

2025-26 Matriculation

All courses count as 1 credit unless otherwise noted.

MS Required Courses (8 course units)

  • Public Health Primer – 0 units
  • BIS 525 Seminar in Biostatistics and Journal Club - 0 units
  • BIS 526 Seminar in Biostatistics and Journal Club - 0 units
  • BIS 623 Advanced Regression Models [or S&DS 6120 Linear Models] *BIS 623 is a 1st year class only; BIS 623 is a prerequisite for BIS 630
  • BIS 628 Longitudinal and Multilevel Data Analysis
  • BIS 630 Applied Survival Analysis [or BIS 643 Theory of Survival Analysis] *BIS 630 is a 1st year class only; BIS 623 is a prerequisite for BIS 630
  • BIS 678 Statistical Practice I - Capstone Experience *2nd year class only
  • BIS 681 Statistical Practice II - Capstone Experience *2nd year class only
  • PUBH 508 Foundations of Epidemiology and Public Health
  • S&DS 5410 Probability Theory [or S&DS 6000 Advanced Probability or S&DS 5510 Stochastic Processes]
  • S&DS 5420 Theory of Statistics [or S&DS 6100 Statistical Inference]
  • BIS 695 Summer Internship in Biostatistical Research - 0 units
  • PUBH 100 (Fall); PUBH 101 (Spring) Professional Skills Series - 0 units
*Students entering the program with an MPH or relevant graduate degree may be exempt from this requirement.

Biostatistics/Statistics/Computer Science Electives: Minimum of 3 of the following REQUIRED

  • BENG 544 Fundamentals of Medical Imaging
  • BIS 536 Measurement Error and Missing Data
  • BIS 537 Statistical Methods for Causal Inference
  • BIS 540 Fundamentals of Clinical Trials
  • BIS 550/CB&B 7500 Topics in Biomed Informatics and Data Science
  • BIS 555 Machine Learning and Biomedical Data **
  • BIS 567 Bayesian Statistics
  • BIS 629 Advanced Methods for Implementation and Prevention Science
  • BIS 634 Computational Methods for Informatics **
  • BIS 645 Statistical Methods in Human Genetics
  • BIS 646 Nonparametric Statistical Methods & their Applications
  • BIS 692/S&DS 6450 Statistical Methods in Computational Biology
  • CB&B 5620 Modeling Biological Systems II
  • CB&B 7520 Biomedical Data Science: Mining and Modeling
  • CPSC 5150 Law and Large Language Models
  • CPSC 5190 Full Stack Programming
  • CPSC 5260 Building Distributed Systems
  • CPSC 5390 Software Engineering
  • CPSC 5650 Theory of Distributed Systems
  • CPSC 5770 Natural Language Processing
  • CPSC 5880 AI Foundation Models
  • CPSC 6400 Topics in Numerical Computation
  • CPSC 6420 Modern Challenges in Statistics: A Computational Perspective
  • EMD 553 Transmission Dynamic Modeling of Infectious Diseases
  • S&DS 5410 Probability Theory with Applications (Cannot fulfill elective if taken as a requirement)
  • S&DS 5510 Stochastic Processes (Cannot fulfill elective if taken as a substitute for S&DS 5410)
  • S&DS 5660 Deep Learning for Science
  • S&DS 6110 Selected Topics in Statistical Decision Theory
  • S&DS 6610 Data Analysis
  • S&DS 6640 Information Theory

** These courses can only be counted to fulfill the requirement of one category (they cannot be counted twice in fulfillment of requirements.)

Electives in Machine Learning: Minimum of 1 of the following REQUIRED

  • BIS 555 Machine Learning and Biomedical Data **
  • BIS 568 Applied Machine Learning in Healthcare
  • BIS 634 Computational Methods for Informatics **
  • BIS 691 Theory of Generalized Linear Models
  • CB&B 555/AMTH 553/CPSC 553 Unsupervised Learning for Big Data
  • CB&B 6663/CPSC 5520/AMTH 5520 Deep Learning Theory and Applications
  • CPSC 5690 Randomized Algorithms
  • CPSC 5830 Deep Learning on Graph-Structured Data
  • CPSC 6440 Geometric and Topological Methods in Machine Learning
  • CPSC 6700 Topics in Natural Language Processing
  • S&DS 5170 Applied Machine Learning and Causal Inference
  • S&DS 5380 (Bayesian) Probability and Statistics
  • S&DS 5620 Computational Tools for Data Science
  • S&DS 5650 Introductory Machine Learning
  • S&DS 5690 Numerical Linear Algebra: Deterministic and randomized algorithms
  • S&DS 6310 Optimization and Computation
  • S&DS 6320 Advanced Optimization Techniques
  • S&DS 6650 Intermediate Machine Learning
  • S&DS 674/ENV 781 Applied Spatial Statistics
  • S&DS 6840 Statistical Inference on Graphs
  • S&DS 6850 Theory of Reinforcement Learning
  • S&DS 6860 High-dimensional phenomena in statistics and learning

** These courses can only be counted to fulfill the requirement of one category (they cannot be counted twice in fulfillment of requirements.)

Electives in Databases: Minimum of 1 of the following REQUIRED

  • BIS 550/CB&B 7500 Topics in Biomed Informatics and Data Science **
  • BIS 638 Clinical Database Management Systems and Ontologies
  • BIS 679 Advanced Statistical Programming in SAS & R
  • CPSC 5370 Introduction to Database Systems
  • MGT 656 Management of Software Development
  • MGT 660 Advanced Management of Software Development
  • MGT 858 Database Systems

Additional Electives: 2 REQUIRED

Take two additional course units from the electives listed above. Other courses from YSPH or another department must be approved by the Data Science Pathway Director.

Other Courses

BIS 649/BIS 650 Master’s Thesis:

If chosen, BIS 650 replaces BIS 678 in the spring of the 2nd year. Students doing a thesis must present their research in a public seminar to graduate.

** These courses can only be counted to fulfill the requirement of one category (they cannot be counted twice in fulfillment of requirements.)

Please note, the following electives are not approved and will not count towards the graduation requirements:

  • CPSC 5660 Blockchain and Cryptocurrency
  • S&DS 506 Introduction to Statistics: Data Analysis
  • S&DS 6170 Advances in Large Language Models: Theory and Applications Seminar
  • S&DS 5300 Data Exploration and Analysis
  • S&DS 5630 Multivariate Statistical Methods for the Social Sciences
  • MGT 505 Introduction to Marketing
  • MGT 544 Investment Management
  • MGT 556 Big Data and Customer Analytics
  • MGT 575 Social Media Analytics
  • MGT 595 Quantitative Investing
  • MGT 992 Health Care Strategy

MS Competencies in Biostatistics

  1. Select from a variety of analytical tools to test statistical hypotheses, interpret results of statistical analyses and use these results to make relevant inferences from data.
  2. Design efficient computer programs for study management, statistical analysis and presentation using R, SAS and other programming languages.
  3. Demonstrate oral and written communication and presentation skills to effectively communicate and disseminate results to professional audiences.