6.7930[6.871]/HST.956: Machine Learning for Healthcare

Instructors: Peter Szolovits, Manolis Kellis
Teaching Assistants: Hussein Mozannar, Eric Lehman
Graduate level; Units 4-0-8 (counts as an AUS2, AAGS, and II subject; also an EECS AI TQE)
Lectures: Tuesdays & Thursdays, 2:30-4:00pm Eastern Time, 35-225
Recitations (required): Friday, 3:00-4:00pm, 4-270
Prerequisite: 6.3900[6.036] or 6.7900[6.867] or 9.520/6.7910[6.860] or 6.8611/6.8610[6.806/6.864] or 6.4102/6.4100[6.438/6.034] or an equivalent machine learning class from another school
(Bracketed numbers are the class numbers before this year's mass renumbering of all EECS classes.)
Office Hours: Monday 11:00am-12:00pm (32-370), Friday 4:00-5:00pm (36-156)
Contact staff: mlhc2023@googlegroups.com



Course description

Introduces students to machine learning in healthcare, including the nature of clinical data and the use of machine learning for risk stratification, disease progression modeling, precision medicine, diagnosis, subtype discovery, and improving clinical workflows. Topics include causality, interpretability, algorithmic fairness, time-series analysis, graphical models, deep learning and transfer learning. Guest lectures by clinicians from the Boston area and course projects with real clinical data emphasize subtleties of working with clinical data and translating machine learning into clinical practice.


Schedule

The schedule is tentative and will be updated as the class proceeds.

Please visit the Canvas page to access all slide decks, recitation materials, and more.

Class | Date | Lecture & Materials | Assignments
Overview of Clinical Care & Data
1 Tue, Feb 7
Introduction: What makes healthcare unique? [slides]
Week 1 reading

Thu, Feb 9
Overview of Clinical Care [slides]
Tue, Feb 14
Overview of Clinical Data Science [slides]
Week 2 reading

Thu, Feb 16
Cautionary Tales; Discussion of Project Ideas; Guest speaker Dr. Leo Anthony Celi [slides]
ML with Clinical Text, Imaging, Physiological, and Administrative Data

Tue, Feb 21
No class -- Monday schedule of classes
5 Thu, Feb 23
Risk Stratification from Structured Health Data [slides]
Razavian N, Blecker S, Schmidt AM, Smith-McLallen A, Nigam S, Sontag D (2015) Population-level prediction of type 2 diabetes from claims data and analysis of risk factors. Big Data 3:4, 277-287, DOI: 10.1089/big.2015.0020.

6 Tue, Feb 28
Risk Stratification (continued); Physiological Time-Series [slides]

Thu, Mar 2
Intro to Clinical NLP [slides]
Beam AL, Kompa B, Schmaltz A, Fried I, Weber G, Palmer N, et al. Clinical Concept Embeddings Learned from Massive Sources of Multimodal Medical Data. In: Biocomputing 2020. Kohala Coast, Hawaii, USA: WORLD SCIENTIFIC; 2019. p. 295-306.

Tue, Mar 7
Contemporary Clinical NLP Methods [slides]

Thu, Mar 9
Survival Analysis, Censoring, Proportional Hazard Models [slides]

Tue, Mar 14
Cancelled because of storm shutdown

Causal Inference
Thu, Mar 16
Human-AI Collaboration in Clinical ML [slides]
Tue, Mar 21
Causal Inference, Conditional Treatment Effects [slides]
Thu, Mar 23
Causal Inference, continued, and Intro to Reinforcement Learning [slides]

Mar 27-31
Spring Break -- No classes

Tue, Apr 4
Dataset Shift [slides]

Real World Deployment Challenges
Thu, Apr 6
Regulation, Law and Deployment; Guest lecture from BU/MIT Technology Law Clinic (Chris Conley, Sophie Volpe, Lucas Batties) [slides for FDA and SaMD] [slides for FDA and CDSS]

Tue, Apr 11
Learning with Noisy Labels, Unsupervised Learning Applications, Weak Supervision [slides]

Thu, Apr 13
Privacy and Confidentiality [slides]

Tue, Apr 18
Interpretability [slides]

Thu, Apr 20
Genomics, Genetics, Cohort data, personalized predictions, Polygenic risk scores, rare variant prediction, pre-natal testing, ethics [slides]

Tue, Apr 25
Intro to ML for Medical Imaging: algorithms & applications; Guest lecture by Alex Goehler, Novartis [slides]

Thu, Apr 27
Fairness; Guest lecture by Marzyeh Ghassemi
Tue, May 2
Genetics + EHR integration through eQTLs, patient subtyping, functional genomics, deep learning for multi-modal integration [slides]
Thu, May 4
Visualization and User Interfaces; Guest lecture by Arvind Satyanarayan [slides]

Tue, May 9
Deep learning for drug design, integration with patient modeling, and target identification

Thu, May 11
Rare disease vs. common disease matching and drug prioritization

Tue, May 16
Student Project Presentations -- Grier Room (34-401), using posters

May 19, 9am-noon
Final Exam -- Johnson Athletic Center, Track level

Problem sets

The problem set PDFs are available here (note that some data for the problem sets is not publicly available):

Some of the recitations are available here:


Projects

We will release the project guidelines and detailed information on Canvas.

Late Policy

(starting from pset1 onwards)


Prior years of this course