
Welcome to HST.953/6.8850!

Clinical Data Learning, Visualization, and Deployments

HST.953/6.8850: Clinical Data Learning

Artificial intelligence (AI) has the potential to transform healthcare worldwide, bearing promises of increased accuracy, efficiency, and cost-effectiveness in areas as diverse as drug discovery, clinical diagnosis, and disease management. Furthermore, AI has been promoted as a tool that could expand the reach of quality healthcare to traditionally underserved patients and regions. But even with appropriate representation of marginalized communities with high-quality data, the social patterning of the data generation process can still produce AI that preserves and even scales existing disparities in care, with resulting inequities in patient outcomes. When AI developers who are not cognizant of the backstory of the data create algorithms from the digital exhaust of flawed human systems, they risk cementing inequities as permanent fixtures in healthcare delivery systems. This course will introduce students to a portfolio of methodologies that learn patterns from data. More importantly, it will explore data issues that, if not addressed, will have profound consequences for downstream prediction, classification, and optimization tasks.

Instructors
Marzyeh Ghassemi
Leo Anthony Celi
Adam Rodman
Ned McCague

Teaching Assistants
Alessandro Hammond
Omar Dahleh

Course Email: hst953faculty@mit.edu

Credits: 12 MIT credits; 5 Harvard credits

Fridays, 9:30 AM - 12:00 PM
E25-117

Useful Links

Syllabus

OH Zoom link

Overview

HST.953/6.8850 is a course about the practical considerations for operationalizing machine learning in healthcare settings.

The course will involve three homework assignments (one each on dataset creation, machine learning/visualization and implementation) followed by a course project proposal and presentation.

All students are required to complete human subjects training and submit proof of access for the MIMIC-III and eICU-CRD databases.
All students, regardless of their enrollment status, are expected to join a project group and contribute to a final project.

HST.953/6.8850 is not intended to teach graduate machine learning or visualization skills to students, and we expect that students will have some working knowledge of both in order to complete homework assignments and the project.

We recommend the following courses, or some equivalent experience with subject matter in ML, visualization and HCI:

Recommended Courses:
CS Grad ML 6.867
CS Grad ML in Health 6.S897 / HST.956
CS Grad Visualization 6.813

We will start with a primer on machine learning concepts including but not limited to cross-validation, data leakage, benchmarks, performance metrics, and fairness evaluation. Publicly available high-resolution datasets (not registries) will be leveraged.

Other semester activities:

1. The Bias-athon is designed to address and mitigate biases in artificial intelligence (AI) systems. This workshop will leverage interdisciplinarity to identify, understand, and develop strategies to mitigate biases in clinical AI datasets. Participants will engage in hands-on sessions where they explore various types of biases, such as measurement bias and variation in the degree of monitoring arising from social determinants of care, and their impact on AI performance.

2. A prompt-athon and red teaming will focus on enhancing the effectiveness and reducing the bias of large language models. This workshop is designed for clinicians who are already using, or are thinking of using, these tools for summarizing a patient's course, drafting content for progress notes and letters to other providers and to patients, and soliciting differential diagnoses, treatment recommendations, and prognostication. Participants will be introduced to various prompt engineering techniques that can leverage the power of this technology. Through collaborative exercises, attendees will experiment with different types of prompts, analyze the outputs, and refine their strategies to achieve better results. The event will also include discussions on the challenges of prompt design, such as avoiding ambiguity and ensuring context-appropriateness.

3. The Health AI Systems Thinking for Equity (HASTE) Policy Workshop is organized to explore the regulatory and ethical frameworks surrounding the use of AI technologies. Sessions will cover a range of topics, including transparency and accountability, power structures and the political economy that drives the impact of AI. Participants will engage in brainstorming and dialogue and propose solutions to complex policy issues. The goal is to engender a systems thinking mindset among developers and users of AI to improve population health.

Grading

Weekly Reflections: the weekly reflections, corresponding to Week 2 - Week 10, will be done as a Canvas discussion, are due before class, and are worth 1 point (1.67% of your grade) per week. This means that reflections are worth a total of 15% of your grade.

Three Problem Sets: Problem sets 1, 2, and 3 are each worth 10 points, or 16.67% of your grade. This means problem sets are worth a total of 50% of your grade.

Course Final Project: The submission of the project teams is worth 1 point (1.67% of your grade), the final project presentation is worth 10 points (16.67% of your grade) and the final project write up is worth 10 points (16.67% of your grade).
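The percentages above all follow from a 60-point total. As a rough check, the arithmetic can be sketched as below. The point values come from this section; the dictionary keys, the helper names, and the 35% project total are not stated in the syllabus and are derived or hypothetical.

```python
from fractions import Fraction

# Point values as stated in the Grading section.
# The key names are hypothetical labels chosen for this sketch.
POINTS = {
    "weekly_reflections": 9 * 1,   # Week 2 - Week 10, 1 point per week
    "problem_sets": 3 * 10,        # Problem sets 1-3, 10 points each
    "team_submission": 1,          # submission of the project teams
    "final_presentation": 10,
    "final_writeup": 10,
}

TOTAL_POINTS = sum(POINTS.values())


def weight_percent(points):
    """Share of the course grade, in percent, for a given number of points."""
    return float(Fraction(points, TOTAL_POINTS) * 100)


def course_grade(earned):
    """Hypothetical helper: overall percentage from points earned per component."""
    for name, pts in earned.items():
        if not 0 <= pts <= POINTS[name]:
            raise ValueError(f"{name}: {pts} is outside 0..{POINTS[name]}")
    return 100 * sum(earned.values()) / TOTAL_POINTS


if __name__ == "__main__":
    for name, pts in POINTS.items():
        print(f"{name}: {pts} points = {weight_percent(pts):.2f}%")
```

Under these assumptions, one point is 1/60 of the grade (about 1.67%), and the three project components together (21 points) account for the remaining 35%.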

Plagiarism: Student code submissions may be submitted by the instructors to a plagiarism detection tool for a review of similarity and detection of possible plagiarism. Submissions will be used solely for the purpose of detecting similarity, and are not retained indefinitely on the server; typically results are deleted after 14 days but may be removed sooner. For more information on the tool used, refer to https://theory.stanford.edu/~aiken/moss/.

Schedule

Week	Date	Lecture	Materials	Assignments
1/DATA	Sept 6, 2024	9:30 - 10:10 Dr. Marzyeh Ghassemi "Ethical Machine Learning in Health" 10:20 - 11:20 Course Staff "Course Overview" --- BREAK --- 11:30 - 12:30 Catherine Bielick "The problems in healthcare"	[Slides] Readings: A Review of Challenges and Opportunities in Machine Learning for Health; Ethical Machine Learning in Healthcare; Do no harm: a roadmap for responsible machine learning for health care	PSET 1 Out; Group Project Assigned
2/DATA	Sept 13, 2024	9:40 - 10:40 Mohammad Mamdani on Challenges of implementation --- BREAK --- 10:50 - 11:50 Adam Rodman on Project Implementation --- BREAK --- 12:00 - 12:30 Jack Gallifant on Ethical Considerations in data collection and use	[Slides] Readings: MIMIC-III, a freely accessible critical care database; Reproducibility in machine learning for health research: Still a ways to go. Extra Material: Deep Dive Into MIMIC IV [Video] [Code]; Going through a study done in MIMIC-IV [Video] [Code]	Reflection 1 Due
3/DATA	Sept 20, 2024	Lecture by Adam Rodman and Jack Gallifant	[Slides] Readings: Recurrent Neural Networks for Multivariate Time Series with Missing Values; Making the Most of Text Semantics to Improve Biomedical Vision--Language Processing; Multimodal fusion with deep neural networks for leveraging CT imaging and electronic health record: a case-study in pulmonary embolism detection	Reflection 2 Due
4/ML.VIS	Sept 27, 2024	9:30 - 10:40 Adam Rodman - "Representation learning in ML and Health" --- BREAK --- 10:50 - 11:50 Takashi and Rodrigo on TRIPOD-AI exercise	Readings: Data-Driven Healthcare: Challenges and Opportunities for Interactive Visualization; CarePre: An Intelligent Clinical Decision Assistance System; PhenoLines: Phenotype Comparison Visualizations for Disease Subtyping via Topic Models; An algorithmic approach to reducing unexplained pain disparities in underserved populations	PSET 1 Due; Reflection 3 Due
5/ML.VIS	Oct 4, 2024	9:30 - 10:40 Adam Rodman on Model evaluation: health system --- BREAK --- 10:50 - 12:00 Tom Pollard on Data Sharing	Lecture Slides. Readings: Generating high-fidelity synthetic patient data for assessing machine learning healthcare software; Synthetic Data – Anonymisation Groundhog Day	PSET 2 Out; Final Project Signup Due; Reflection 4 Due
6/ML.VIS	Oct 11, 2024	9:30 - 10:40 Matthew McDermott on Model evaluation: ML perspective --- BREAK --- 10:50 - 11:50 Leo Anthony Celi on HASTE policy camp	Lecture Slides. Readings: Leveraging Factored Action Spaces for Efficient Offline Reinforcement Learning in Healthcare; Clinician-in-the-Loop Decision Making: Reinforcement Learning with Near-Optimal Set-Valued Policies; Moving towards vertically integrated artificial intelligence development	Reflection 5 Due
7/IMP	Oct 18, 2024	9:40 - 10:40 Jim Smit on Causal machine learning --- BREAK --- 10:50 - 12:00 Shalmali Joshi on Causal machine learning	Lecture Slides. Readings: Interpretable Operations Research for High-Stakes Decisions: Designing the Greek COVID-19 Testing System; Efficient and targeted COVID-19 border testing via reinforcement learning; Evaluation of machine learning solutions in medicine; Problems in the deployment of machine-learned models in health care; Implementing machine learning in medicine	Reflection 6 Due
8/IMP	Oct 25, 2024	9:30 - 10:40 Leo Celi --- BREAK --- 10:50 - 11:10 Liam McCoy --- BREAK --- 11:10 - 11:50 Midpoint Presentation	Lecture Slides. The midpoint presentation is scheduled for Friday, October 25, 2024. Readings: Machine Learning in Medicine; The Clinician and Dataset Shift in Artificial Intelligence; Minimum information about clinical artificial intelligence modeling: the MI-CLAIM checklist	PSET 2 Due; PSET 3 Out; Reflection 7 Due
9/IMP	Nov 1, 2024	9:30 - 10:40 Amol Verma and Gabe Brat --- BREAK --- 10:50 - 11:50 Team Work on Projects	Lecture Slides. Readings: Why Doctors Hate Their Computers; PhenoLines presentation; Overview of UK Biobank; Ethical Machine Learning in Healthcare	Reflection 8 Due
10/IMP	Nov 8, 2024	9:30 - 10:40 Heather Mattie --- BREAK --- 10:50 - 11:50 Ned McCague	Lecture Slides #1; Lecture Slides #2. Readings: What Artificial Intelligence Means for Health Care; Revolutionizing healthcare: the role of artificial intelligence in clinical practice; AI is Already Reshaping Care: Here's What it Means for Doctors	Reflection 9 Due
11	Nov 15, 2024	9:30 - 10:40 Charlotta Lindvall --- BREAK --- 10:50 - 11:50 Thomas Souneck		PSET 3 Due
12	Nov 22, 2024	Leo Anthony Celi on HASTE Policy Camp	Readings: Explainable artificial intelligence in breast cancer detection and risk prediction: A systematic scoping review; The ethical issues of the application of artificial intelligence in healthcare: a systematic scoping review; Artificial Intelligence in Health Care: A Report From the National Academy of Medicine	Reflection 10 Due
13	Nov 29, 2024	THANKSGIVING WEEK, Project Work Week
14	Dec 6, 2024	Final Presentations	Final projects will be presented in class. Slides should be sent by 9 am on Dec 6th. Each team (19 total) will present for 8 minutes in class. All team members are expected to be present; a subset may present. If a specific block of class time is required, let the instructors know.	Final Project Presentations; Final Project Reports Due

Project Details

Projects and Authorship

A note on collaboration: Research is a collaborative activity and we encourage all students to collaborate and learn from each other. In general, when you put your name on something for research, you must: a) have materially contributed to the work, b) be able to defend the research, and c) acknowledge the contribution of others. Keep this in mind when working together and submitting material for evaluation.

A note on authorship: As noted, the expectation is that by the end of the course the final project will be sufficiently developed to submit to a peer-reviewed journal. Author order can be a somewhat controversial issue and is left to the project participants to decide. We strongly encourage you to discuss, while forming groups, what the order will be or what philosophy you will use to decide it. In the case of a dispute during or after the course, the instructors will likely not be able to mediate in any meaningful way. We also recommend equal authorship (now more common), but the decision is left to each team.

For the clinicians: If you expect a certain level of authorship (first, last, etc.) you should mention this in your project pitch. Keep in mind that this is a two-way street involving both clinicians and data scientists. If a project fails to garner enough interest, it may not be able to be completed as part of the course.

A note on acknowledgement: Papers that result from work done during this course should recognize the contributions of the course in an acknowledgement or in other sections. The suggested language is: "This manuscript was composed by participants in the HST.953 course at the Massachusetts Institute of Technology, Fall 2022."

Computing Credits

Google Cloud Credits
Supercloud.mit.edu
Local compute (your lab)