Data Science Pathway

The Biostatistics data science pathway combines rigorous statistical training with the development of advanced computational skills to solve the public health challenges of tomorrow. Required courses cover epidemiology, regression models, databases, machine learning and more. Students will become familiar with data science programming tools (e.g. R, Python, SQL and NoSQL databases). Data science pathway graduates can find careers analyzing large volumes of health data in government (e.g. public health agencies), hospitals, industry (e.g. pharmaceutical companies) or research.

Students pursuing this pathway will graduate with the key skills of any biostatistician. Compared with students in the traditional pathway, data science pathway students will have more experience using computational techniques to store, manipulate and analyze large volumes and varieties of data. This pathway trains biostatisticians; as such, it emphasizes the development and application of rigorous statistical theory to extensive health data sets, as opposed to the application of the latest computational techniques, which is prioritized in the health informatics master's program. The focus on health applications differentiates this pathway from the MS in Data Science and Statistics.

Students must choose this pathway at the start of their two-year program.

Requirements - Data Science Pathway

2021-22 Matriculation
The M.S. degree requires a total of 16 course units. The M.S. in Biostatistics requires the student to complete, or obtain an exemption from, the following courses. Full-time students must carry a minimum of 4 course units each semester. Course substitutions (other than those listed) must be approved by the academic advisor, the Data Science Pathway Director and the DGS.

MS Required Courses (10 course units)

  • BIS 525 Seminar in Biostatistics and Journal Club - 0 units
  • BIS 526 Seminar in Biostatistics and Journal Club - 0 units
  • BIS 620 Data Science Software Systems - 1 unit
  • BIS 623 Advanced Regression Models [or S&DS 612 Linear Models] - 1 unit
  • BIS 628 Longitudinal and Multilevel Data Analysis - 1 unit
  • BIS 630 Applied Survival Analysis [or BIS 643 Theory of Survival Analysis] - 1 unit
  • BIS 678 Statistical Practice – Capstone Experience - 1 unit
  • BIS 687 Data Science Statistical Practice – Capstone Experience - 1 unit
  • EPH 508 Foundations of Epidemiology and Public Health - 1 unit
  • EPH 608 Frontiers of Public Health * - 1 unit
  • EPH 600 Research Ethics and Responsibilities - 0 units
  • S&DS 541 Probability Theory [or S&DS 600 Advanced Probability or S&DS 551 Stochastic Processes] - 1 unit
  • S&DS 542 Theory of Statistics [or S&DS 610 Statistical Inference] - 1 unit
  • BIS 695 Summer Internship in Biostatistical Research - 0 units
  • EPH 100/101 Professional Skills Series - 0 units

MS Electives in Biostatistics (minimum 2 course units)

  • BIS 555 Machine Learning and Biomedical Data - 1 unit
  • BIS 557 Computational Statistics - 1 unit
  • BIS 634 Computational Methods for Informatics - 1 unit
  • BIS 646 Nonparametric Statistical Methods and their Applications - 1 unit

Electives in Machine Learning (1 course unit)

Take one or more of the following (if not taken from list above):
  • BIS 555 Machine Learning and Biomedical Data - 1 unit
  • BIS 557 Computational Statistics - 1 unit
  • BIS 634 Computational Methods for Informatics - 1 unit
  • BIS 646 Nonparametric Statistical Methods and their Applications - 1 unit
  • S&DS 565 Introductory Machine Learning - 1 unit
  • S&DS 563 Multivariate Statistical Methods for the Social Sciences - 1 unit
  • S&DS 631 Optimization and Computation - 1 unit
  • CB&B 555 Unsupervised Learning for Big Data - 1 unit
  • CB&B 567 Topics in Deep Learning: Methods & Biomedical Applications - 1 unit
  • CB&B 663 Deep Learning Theory and Applications - 1 unit
  • CB&B 745 Advanced Topics in Machine Learning - 1 unit

Electives in Databases (1 course unit)

Take one or both of the following:
  • BIS 638 Clinical Database Management Systems and Ontologies - 1 unit
  • CPSC 537 Introduction to Database Systems - 1 unit

Electives (2 course units)

Take two additional course units from the machine learning list, the databases list, or other courses in BIS, CB&B or S&DS. Courses from YSPH, CPSC, or another department may be acceptable with permission from the Data Science Pathway Director.

Other Courses

  • BIS 649/BIS 650 Master’s Thesis Research - 2 units
    Students choosing this option must present their research in a public seminar in order to graduate.

*Students entering the program with an MPH or relevant graduate degree may be exempt from this requirement.

Students should take no more than 5 courses for credit each semester (BIS 525/526, EPH 600 and EPH 100/101 are not for credit). Courses listed without a notation in the “term taken” column can be taken in either year of the program, provided prerequisites are met and the advisor approves.

rev. 7.28.2021