Data Science Pathway

The Biostatistics data science pathway combines rigorous statistical training with the development of advanced computational skills to solve the public health challenges of tomorrow. Required courses cover epidemiology, regression models, databases, machine learning and more. Students will become familiar with data science programming tools (e.g. R, Python, SQL and NoSQL databases). Data science pathway graduates can find careers analyzing large volumes of health data in government (e.g. public health agencies), hospitals, industry (e.g. pharmaceutical companies) or research.

Students pursuing this pathway will graduate with the key skills of any biostatistician. Unlike students in the traditional pathway, data science pathway students will have more experience using computational techniques to store, manipulate and analyze large volumes and varieties of data. This pathway trains biostatisticians; as such, it emphasizes the development and application of rigorous statistical theory to extensive health data sets, rather than the application of the latest computational techniques prioritized in the health informatics master's program. The focus on health applications differentiates this pathway from the MS in Data Science and Statistics.

Requirements - Data Science Pathway

The M.S. Biostatistics Data Science Pathway degree requires a total of 16 course units from the curriculum below (BIS 525/526 and EPH 100/101 are not for credit). Course substitutions must be approved by the student’s advisor and the DGS. Electives not listed below must be approved by the BIS Data Science Pathway Director.

Full-time students must carry a minimum of 4 course units each semester. Course schedules with more than 5 courses for credit will not be approved. If students have fewer than 4 required courses to take in their last term, it is acceptable to register for just the courses needed to fulfill the degree requirements.

2024-25 Matriculation

All courses count as 1 credit unless otherwise noted.

MS Required Courses (10 course units)

BIS 525 Seminar in Biostatistics and Journal Club - 0 units
BIS 526 Seminar in Biostatistics and Journal Club - 0 units
BIS 620 Data Science Software Systems
BIS 623 Advanced Regression Models [or S&DS 612 Linear Models] *BIS 623 is a 1st year class only; BIS 623 is a prerequisite for BIS 630
BIS 628 Longitudinal and Multilevel Data Analysis
BIS 630 Applied Survival Analysis [or BIS 643 Theory of Survival Analysis] *BIS 630 is a 1st year class only; BIS 623 is a prerequisite for BIS 630
BIS 678 Statistical Practice I *2nd year class only
BIS 687 Data Science Statistical Practice - Capstone Experience *2nd year class only
EPH 509 Fundamentals of Epidemiology
EPH 608 Frontiers of Public Health*
S&DS 541 Probability Theory [or S&DS 600 Advanced Probability or S&DS 551 Stochastic Processes]
S&DS 542 Theory of Statistics [or S&DS 610 Statistical Inference]
BIS 695 Summer Internship in Biostatistical Research - 0 units
EPH 100 (Fall); EPH 101 (Spring) Professional Skills Series - 0 units

*Students entering the program with an MPH or relevant graduate degree may be exempt from this requirement.

Biostatistics/Statistics/Computer Science Electives: Minimum of 2 of the following REQUIRED

BIS 536 Measurement Error and Missing Data
BIS 537 Statistical Methods for Causal Inference
BIS 540 Fundamentals of Clinical Trials
BIS 550/CB&B 750 Topics in Biomed Informatics and Data Science **
BIS 555 Machine Learning and Biomedical Data **
BIS 567 Bayesian Statistics
BIS 629 Advanced Methods for Implementation and Prevention Science
BIS 634 Computational Methods for Informatics **
BIS 645 Statistical Methods in Human Genetics
BIS 646 Nonparametric Statistical Methods & their Applications
BIS 662 Computational Statistics **
BIS 692/S&DS 645 Statistical Methods in Computational Biology
CB&B 562 Modeling Biological Systems II
CB&B 752 Biomedical Data Science: Mining and Modeling
CPSC 519 Full Stack Programming
CPSC 526 Building Distributed Systems
CPSC 539 Software Engineering
CPSC 565 Theory of Distributed Systems
CPSC 577 Natural Language Processing
CPSC 588 AI Foundation Models
CPSC 640 Topics in Numerical Computation
CPSC 642 Modern Challenges in Statistics: A Computational Perspective
EMD 553 Transmission Dynamic Modeling of Infectious Diseases
HPM 573 Advanced Topics in Modeling Health Care Decisions
S&DS 541 Probability Theory with Applications (Cannot fulfill elective if taken as a requirement)
S&DS 551 Stochastic Processes (Cannot fulfill elective if taken as a substitute for S&DS 541)
S&DS 611 Selected Topics in Statistical Decision Theory
S&DS 661 Data Analysis
S&DS 664 Information Theory

Electives in Machine Learning: Minimum of 1 of the following REQUIRED

BIS 555 Machine Learning and Biomedical Data **
BIS 568 Applied Machine Learning in Healthcare
BIS 634 Computational Methods for Informatics **
BIS 662 Computational Statistics **
BIS 691 Theory of Generalized Linear Models
CB&B 555/AMTH 553/CPSC 553 Unsupervised Learning for Big Data
CB&B 663/CPSC 552/AMTH 552 Deep Learning Theory and Applications
CPSC 569 Randomized Algorithms
CPSC 583 Deep Learning on Graph-Structured Data
CPSC 644 Geometric and Topological Methods in Machine Learning
CPSC 670 Topics in Natural Language Processing
S&DS 517 Applied Machine Learning and Causal Inference
S&DS 538 (Bayesian) Probability and Statistics
S&DS 562 Computational Tools for Data Science
S&DS 565 Introductory Machine Learning
S&DS 569 Numerical Linear Algebra: Deterministic and Randomized Algorithms
S&DS 631 Optimization and Computation
S&DS 632 Advanced Optimization Techniques
S&DS 665 Intermediate Machine Learning
S&DS 674/F&ES 781 Applied Spatial Statistics
S&DS 684 Statistical Inference on Graphs
S&DS 685 Theory of Reinforcement Learning
S&DS 686 High-Dimensional Phenomena in Statistics and Learning

Electives in Databases: Minimum of 1 of the following REQUIRED

BIS 550/CB&B 750 Topics in Biomed Informatics and Data Science **
BIS 638 Clinical Database Management Systems and Ontologies
BIS 679 Advanced Statistical Programming in SAS & R
CPSC 537 Introduction to Database Systems
MGT 656 Management of Software Development
MGT 660 Advanced Management of Software Development

Additional Electives: 2 REQUIRED

Take two additional course units from the electives listed above. Other courses from YSPH or another department must be approved by the Data Science Pathway Director.

Other Courses

BIS 649/BIS 650 Master’s Thesis:

If chosen, BIS 650 replaces BIS 678 in the spring of the 2nd year. Students doing a thesis must present their research in a public seminar to graduate.

** These courses can only be counted toward the requirement of one category (they cannot be counted twice in fulfillment of requirements).

Please note, the following electives are not approved and will not count towards the graduation requirements:

CPSC 566 Blockchain and Cryptocurrency
S&DS 506 Introduction to Statistics: Data Analysis
S&DS 617 Applied Machine Learning and Causal Inference Research Seminar
S&DS 530 Data Exploration and Analysis
S&DS 563 Multivariate Statistical Methods for the Social Sciences
MGT 556 Big Data and Customer Analytics
MGT 575 Social Media Analytics
MGT 595 Quantitative Investing