Data Science Pathway

The Biostatistics data science pathway combines rigorous statistical training with the development of advanced computational skills to solve the public health challenges of tomorrow. Required courses cover epidemiology, regression models, databases, machine learning and more. Students will become familiar with data science programming tools (e.g. R, Python, SQL and NoSQL databases). Data science pathway graduates can find careers analyzing large volumes of health data in government (e.g. public health agencies), hospitals, industry (e.g. pharmaceutical companies) or research.

Students pursuing this pathway will graduate with the key skills of any biostatistician. Unlike students in the traditional pathway, data science pathway students will have more experience using computational techniques to store, manipulate and analyze large volumes and varieties of data. This pathway trains biostatisticians; as such, it emphasizes the development and application of rigorous statistical theory to extensive health data sets, as opposed to the application of the latest computational techniques, which is prioritized in the health informatics master's program. The focus on health applications differentiates this pathway from the MS in Data Science and Statistics.

Students must choose this pathway at the start of their two-year program.

Requirements - Data Science Pathway

2022-23 Matriculation
The M.S. degree requires a total of 16 course units. The M.S. in Biostatistics requires the student to complete or acquire an exemption from the following courses. Full-time students must carry a minimum of 4 course units each semester. Course substitutions (other than those listed) must be approved by the academic advisor, the Data Science Pathway Director and the DGS.

MS Required Courses (10 course units)

  • BIS 525 Seminar in Biostatistics and Journal Club - 0 units
  • BIS 526 Seminar in Biostatistics and Journal Club - 0 units
  • BIS 620 Data Science Software Systems - 1 unit
  • BIS 623 Advanced Regression Models [or S&DS 612 Linear Models] - 1 unit
  • BIS 628 Longitudinal and Multilevel Data Analysis - 1 unit
  • BIS 630 Applied Survival Analysis [or BIS 643 Theory of Survival Analysis] - 1 unit
  • BIS 678 Statistical Practice – Capstone Experience - 1 unit
  • BIS 687 Data Science Statistical Practice – Capstone Experience - 1 unit
  • EPH 509 Fundamentals of Epidemiology - 1 unit
  • EPH 608 Frontiers of Public Health * - 1 unit
  • EPH 600 Research Ethics and Responsibilities - 0 units
  • S&DS 541 Probability Theory [or S&DS 600 Advanced Probability or S&DS 551 Stochastic Processes] - see note - 1 unit
  • S&DS 542 Theory of Statistics [or S&DS 610 Statistical Inference] - 1 unit
  • BIS 695 Summer Internship in Biostatistical Research - 0 units
  • EPH 100/101 Professional Skills Series - 0 units

MS Electives in Biostatistics (minimum 2 course units)

  • BIS 536 Measurement Error and Missing Data - 1 unit
  • BIS 550/CB&B 750 Topics in Biomed Informatics and Data Science** - 1 unit
  • BIS 555 Machine Learning and Biomedical Data** - 1 unit
  • BIS 557 Computational Statistics** - 1 unit
  • BIS 634 Computational Methods for Informatics** - 1 unit
  • BIS 645 Statistical Methods in Human Genetics - 1 unit
  • BIS 646 Nonparametric Statistical Methods and their Applications** - 1 unit
  • S&DS 611 Selected Topics in Statistical Decision Theory - 1 unit
  • S&DS 661 Data Analysis - 1 unit
  • CPSC 539 Software Engineering - 1 unit
  • CPSC 577 Natural Language Processing - 1 unit
  • CB&B 752 Biomedical Data Science: Mining and Modeling - 1 unit

Electives in Machine Learning (1 course unit)

Take one or more of the following (if not taken from the list above):
  • BIS 555 Machine Learning and Biomedical Data** - 1 unit
  • BIS 557 Computational Statistics** - 1 unit
  • BIS 568 Applied Machine Learning in Healthcare - 1 unit
  • BIS 634 Computational Methods for Informatics** - 1 unit
  • BIS 646 Nonparametric Statistical Methods and their Applications** - 1 unit
  • S&DS 517 Applied Machine Learning and Causal Inference - 1 unit
  • S&DS 565 Introductory Machine Learning - 1 unit
  • S&DS 631 Optimization and Computation - 1 unit
  • S&DS 632 Advanced Optimization Techniques - 1 unit
  • S&DS 665 Intermediate Machine Learning - 1 unit
  • CB&B 555 Unsupervised Learning for Big Data - 1 unit
  • CB&B 567 Topics in Deep Learning: Methods & Biomedical Applications - 1 unit
  • CB&B 663 Deep Learning Theory and Applications - 1 unit
  • CB&B 745 Advanced Topics in Machine Learning - 1 unit
  • CPSC 552 Deep Learning Theory and Applications - 1 unit

Electives in Databases (1 course unit)

Take one or more of the following:
  • BIS 550/CB&B 750 Topics in Biomed Informatics and Data Science** - 1 unit
  • BIS 638 Clinical Database Management Systems and Ontologies - 1 unit
  • BIS 679 Advanced Statistical Programming in SAS & R - 1 unit
  • CPSC 537 Introduction to Database Systems - 1 unit
  • MGT 660 Advanced Management of Software Development - 1 unit

Electives (2 course units)

Take two additional course units from the machine learning list, the databases list, or from courses in BIS, CB&B or S&DS. Other courses from YSPH, CPSC, or another department may be acceptable with permission from the Data Science Pathway Director.

Other Courses

  • BIS 649/BIS 650 Master’s Thesis Research - 2 units
    Students choosing this option must present their research in a public seminar to graduate.

Students should take 4 courses for credit each semester (BIS 525/526, EPH 600, EPH 100/101 are not for credit). Course schedules with more than 5 courses for credit will not be approved. Courses listed without a notation in the “term taken” column can be taken in either year of the program if prerequisites are met and with advisor approval.

* Students entering the program with an MPH or relevant graduate degree may be exempt from this requirement.

** These courses can only be counted to fulfill the requirement of one category (they cannot be counted twice in fulfillment of requirements).

Rev. 09.08.22