6.7930[6.871]/HST.956: Machine Learning for Healthcare

Graduate level; Units 4-0-8 (counts as an AUS2, AI+D_AUS, AAGS, and II subject; also an EECS AI TQE)
Instructors: Peter Szolovits, David Sontag
Teaching Assistants: Ilker Demirel, Sophie Guo
Lectures: Tuesdays & Thursdays, 2:30-4:00pm Eastern Time, 4-270
Recitations (required): Friday, 3:00-4:00pm, 4-270
Prerequisites: 6.3900[6.036] or 6.7900[6.867] or 9.520/6.7910[6.860] or 6.8611/6.8610[6.806/6.864] or 6.4102/6.4100[6.438/6.034] or an equivalent machine learning class. (Bracketed numbers are the class numbers before the recent mass renumbering of all EECS classes.)

Office Hours:

When	Where	Who
Tues 4:30-5:30	26-314	Sophie
Wed 12:00-1:00	26-314	Ilker

Contact staff: mlhc25@mit.edu

Announcements

This week's reading submission is due Wednesday (4/16)! Remember to submit your questions for our fireside chat guest. (Refer to readings for Thursday 4/17 lecture)
The final exam is scheduled to be held in 4-270 on Fri 05/16/2025, 9:00 AM–12:00 PM.
Welcome to the class! We look forward to seeing you on the first day.
Please sign up for Piazza. All course communication, including questions to the instructors, should go through Piazza.
Take this prerequisite self-assessment to ensure you have the relevant background for the class.
Recitation will be required. This class will have a final exam, but no midterm.
We expect you to attend all lectures and recitations, and will do a virtual roll call for each using the following links:

When readings are posted for a lecture, we expect you to provide us a brief summary of that reading by the day after the corresponding lecture. Three relevant bullet points are sufficient. Please submit these at the following link:

Reading summary submission

Course description

Introduces students to machine learning in healthcare, including the nature of clinical data and the use of machine learning for risk stratification, disease progression modeling, precision medicine, diagnosis, subtype discovery, and improving clinical workflows. Topics include large language models, causality, interpretability, algorithmic fairness, time-series analysis, graphical models, deep learning, transfer learning, genomics, and computational biology. Guest lectures by clinicians from the Boston area and course projects with real clinical data emphasize subtleties of working with clinical data and translating machine learning into clinical practice.

Schedule

The schedule for classes is under revision and is still in draft form.

	Class	Date	Lecture & Materials	Assignments
Overview of Clinical Care & Data	1	Tue, Feb 4	Introduction: What makes healthcare unique? Reading (Due Fri, 2/7 1:00pm ET): AI in health and medicine Helpful optional readings: The AI Revolution in Medicine: GPT-4 and Beyond Peter Lee, Carey Goldberg, Isaac Kohane, 2023 (Available for free digitally via MIT Libraries)	PS 0 out
	2	Thu, Feb 6	Overview of Clinical Care Reading (Due Fri, 2/7 1:00pm ET): Machine Learning in Medicine
	3	Tue, Feb 11	Overview of Clinical Data Science Reading: No Reading (start Thursday's reading)	PS 1 out
ML with Clinical Text, Imaging, Physiological, and Administrative Data	4	Thu, Feb 13	ML for Risk Stratification: focus on structured EMR data Reading (Due Fri 1pm ET): Factors Driving Provider Adoption of the TREWS Machine Learning-Based Early Warning System and its Effects on Sepsis Treatment Timing
		Tue, Feb 18	No class -- Monday schedule of classes
	5	Thu, Feb 20	Risk Stratification and Physiological Time-Series Reading: Constrained transformer network for ECG signal processing and arrhythmia classification
	6	Tue, Feb 25	LLMs 1: differential diagnosis, question answering, treatment planning Reading: Towards Expert-Level Medical Question Answering with Large Language Models
	7	Thu, Feb 27	LLMs 2: information extraction and summarization Required Reading: Retrieval-Augmented Generation–Enabled GPT-4 for Clinical Trial Screening Optional Reading: Manual vs AI-Assisted Prescreening for Trial Eligibility Using Large Language Models—A Randomized Clinical Trial	PS 2 out
	8	Tue, Mar 4	Guest Lecture by Leo Celi: From AI Bias to AI by Us, For All of Us
	9	Thu, Mar 6	Survival Analysis, Censoring, Proportional Hazard Models Reading: Deep Cox Mixtures for Survival Regression Optional Reading: Survival Analysis Part I: Basic concepts and first analyses Survival Analysis Part II: Multivariate data analysis – an introduction to concepts and methods
Causal Inference	10	Tue, Mar 11	Causal inference 1: Causal graphs, potential outcomes, covariate adjustment Reading: Using Big Data to Emulate a Target Trial When a Randomized Trial Is Not Available Optional Reading: A Distillation Approach to Data Efficient Individual Treatment Effect Estimation
	11	Thu, Mar 13	Causal inference 2: Assumptions for causal inference, inverse propensity weighting Reading: Use of Machine Learning to Assess the Management of Uncomplicated Urinary Tract Infection Optional Reading: Causal Reasoning and Large Language Models: Opening a New Frontier for Causality	PS 3 out
	12	Tue, Mar 18	Causal inference 3: Policy learning and dynamic treatment regimes Reading: Guideline-Based Physical Activity and Survival Among US Men With Nonmetastatic Prostate Cancer Optional Reading: Guidelines for reinforcement learning in healthcare
	13	Thu, Mar 20	Dataset and temporal shift: detection and mitigation Reading: Large-Scale Study of Temporal Shift in Health Insurance Claims
Real World Deployment Challenges		Mar 24-28	Spring Break -- No classes
	14	Tue, Apr 1	Multi-modal modeling of text and imaging data Reading: Learning Transferable Visual Models From Natural Language Supervision
	15	Thu, Apr 3	Guest lecture: Faisal Mahmood (Computational Pathology) Reading: A multimodal generative AI copilot for human pathology. Optional readings: Towards a general-purpose foundation model for computational pathology. Multimodal prototyping for cancer survival prediction.
	16	Tue, Apr 8	Interpretability & explainability Reading: Interpretable Machine Learning: A Guide for Making Black Box Models Explainable Read Chapters 5 and 9	PS4 out
	17	Thu, Apr 10	Regulation of AI in healthcare Reading: Skating the Line Between General Wellness Products and Regulated Devices: Strategies and Implications Read Introduction and Section 1
	18	Tue, Apr 15	Human-AI collaboration in decision making
	19	Thu, Apr 17	Fireside chat with Brian Anderson (CEO of Coalition for Health AI) Readings: (Submit before lecture, by Wed, Apr 16) A Nationwide Network of Health AI Assurance Laboratories. The CHAI Applied Model Card.	PS 4 Due
	20	Tue, Apr 22	Privacy: differential privacy, federated learning, synthetic data Reading (Submit by 4/25) Federated learning for predicting clinical outcomes in patients with COVID-19.
	21	Thu, Apr 24	Guest lecture from Nathan Silberman (CTO, Artera AI; previously PathAI and Butterfly Network). See Canvas for slides. Optional Readings: Are Open Source Pathology Foundation Models ready for the clinic?. Discrepancy Ratio: Evaluating Model Performance When Even Experts Disagree on the Truth.
	22	Tue, Apr 29	Disease subtyping & progression modeling Reading: Cluster Analysis and Clinical Asthma Phenotypes.
	23	Thu, May 1	Guest Lecture: Marzyeh Ghassemi (The Pulse Of Ethical Machine Learning) Reading: Ethical machine learning in healthcare.
	24	Tue, May 6	Guest Lecture: Jim Collins (Machine Learning Approaches to Antibiotic Discovery) Reading: A Deep Learning Approach to Drug Discovery
	25	Thu, May 8	Guest Lecture: Manolis Kellis
	26	Tue, May 13	Student Project Presentations (2:30-5pm in 32-G882, Hewlett; open to MIT community)
		Fri, May 16	Final Exam (9am-12pm in 4-270)

Reading Assignments

Many of the lectures are associated with related papers that should help you think about the lecture topic. For each reading assignment, you are expected to submit a brief summary of the three most important ideas of the paper, as short bullet points.

Problem sets

The problem set PDFs will be available here (note that some data for the problem sets is not publicly available):

Projects

We are recruiting a group of doctors with interesting clinical problems to mentor teams of students who will work on them. We will form teams and match to problems/mentors a few weeks into the class. We will release the project guidelines and detailed information in Canvas.

Grading

40% course project
30% homework (~4 problem sets; both theory & practice)
20% final exam
5% participation – note: class attendance is required (with obvious exceptions for illness, etc.)
5% reading responses

Late Policy (from pset 1 onwards)

[4 "slack" days] We understand that sometimes things outside one's control prevent submitting by the deadline. As such, each student is given 4 "slack" days to use throughout the semester without a late penalty (e.g., you could submit four psets one day late each, or one pset three days late and another one day late). Slack days do not subdivide: a submission 2 hours late spends a full slack day, and the remaining 22 hours do not roll over. In your pdf writeup, specify how many slack days you are using (they cannot be applied retroactively).
[10% off per unexcused late day.] If you submit a pset 3 days late and use 1 slack day, then this is 2 unexcused late days, which translates to 20% off your homework.
[Max 4 days late.] Homework will not be accepted after 4 days late, regardless of how many slack days are used, absent communication from S3 or OGE.
[write on homework] In order to use a slack day, students must include it in writing on their submission pdf. Otherwise, TAs will assume no slack days used and deduct 10% for each late day.

Scenarios:

Sam uses 2 slack days on HW3. This is the first time Sam has used any slack days. Sam now has 2 remaining slack days and receives her homework score with no penalty.
Jamie uses 1 slack day on HW3 but submits 52 hours after the deadline. Therefore Jamie is 3 days late (rounded up), has 2 unexcused late days, and receives 20% off the graded homework. This is the first time Jamie has used any slack days, so Jamie now has 3 slack days remaining.
Cory has never used any slack days, but submits HW5 100 hours late with no note / communication from S3 or OGE. This is more than 4 days after the deadline, so the work is not accepted, and Cory receives a zero.
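
The arithmetic behind these scenarios can be sketched in a few lines of Python. This is an illustrative sketch only, not the staff's grading tool: the function name late_penalty and its parameters are hypothetical, Sam's lateness (48 hours) is assumed because the scenario does not state it, and two behaviors are assumptions the policy does not spell out (declared slack days are capped at the number of days late, and a rejected submission does not consume slack days). Exceptions communicated through S3 or OGE are not modeled.

```python
import math

SLACK_DAYS_PER_SEMESTER = 4    # each student starts with 4 slack days
PENALTY_PERCENT_PER_DAY = 10   # 10% off per unexcused late day
MAX_DAYS_LATE = 4              # not accepted more than 4 days late


def late_penalty(hours_late, slack_declared,
                 slack_remaining=SLACK_DAYS_PER_SEMESTER):
    """Return (accepted, penalty_percent, slack_days_left).

    Hypothetical helper: partial days round up to whole days, and only
    slack days declared on the submission are applied.
    """
    if hours_late <= 0:
        return True, 0, slack_remaining
    days_late = math.ceil(hours_late / 24)  # 2 hours late counts as 1 day
    if days_late > MAX_DAYS_LATE:
        # Assumption: rejected work does not spend any slack days.
        return False, 100, slack_remaining
    # Assumption: cannot spend more slack days than days late or than remain.
    used = min(slack_declared, slack_remaining, days_late)
    unexcused = days_late - used
    return True, unexcused * PENALTY_PERCENT_PER_DAY, slack_remaining - used


print(late_penalty(48, 2))    # Sam (assumed 48 h late): (True, 0, 2)
print(late_penalty(52, 1))    # Jamie: (True, 20, 3)
print(late_penalty(100, 0))   # Cory: (False, 100, 4)
```

Penalties are kept as integer percentages to avoid floating-point rounding when multiplying by the number of unexcused days.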

Use of Generative AI

You may choose to use generative AI tools to help think through problems, but not to produce solutions to the problem sets. In your problem set solutions, you must indicate any external resources consulted, including Generative AI. The course staff reserves the right to ask you to explain the rationale for any answer if it appears that it is not original to you. Pedagogically, you will learn much more from working through any problem on your own, and this will be reflected in your final exam score; the exam will be closed-book and without access to any Generative AI.
