EECS 6.S982/HST.953 Clinical Data Learning, Visualization, and Deployments


Welcome to HST.953/6.S982!



6.S982/HST.953 Fall 2022

Lecture: Friday, 9:30 AM - 12:30 PM, E25-117 (E25-119/121 as a spillover room)

OH: Tuesday, 5:00 PM - 6:00 PM, E25-119/121

Instructors: Dr. Marzyeh Ghassemi, Dr. Leo Celi

Course Staff: Dr. Eric Gottlieb, Dr. Ned McCague, Dr. Kenneth Paik

Course Email: hst953faculty@mit.edu

TA: Abbas Zeitoun (zeitoun@mit.edu)


Useful Links

Syllabus (available for download)

Lecture Zoom link: https://mit.zoom.us/j/98390254516

OH Zoom link: https://mit.zoom.us/j/94596219565

Scribe signup form: https://forms.gle/jvGNcmv39dds2yu18

Homework collaborator matching form: https://forms.gle/1PLgTvikCrFTLb2W9


Overview

6.S982/HST.953 is a course about the practical considerations of operationalizing machine learning in healthcare settings. We begin with dataset creation (DATA), then turn to robust, private, and fair machine learning and visualization (ML/VIS) on real retrospective healthcare data, with an eye toward utility and clinical value. Finally, we explore the intermediate step of "implementation science" (IMP), which ties these together by examining how real models might be used by practicing clinical staff through a visual system.

The course will involve three homework assignments (one each on dataset creation, machine learning/visualization and implementation) followed by a course project proposal and presentation.

  • All students are required to complete human subjects training and submit proof of access to the MIMIC-III and eICU-CRD databases.
  • All students regardless of their enrollment status are expected to join a project group and contribute to a final project.

6.S982/HST.953 is not intended to teach graduate-level machine learning or visualization skills; we expect students to have a working knowledge of both in order to complete the homework assignments and the project.

We recommend the following courses, or some equivalent experience with subject matter in ML, visualization and HCI:

Recommended Courses:
  • CS Grad ML: 6.867
  • CS Grad ML in Health: 6.S897 / HST.956
  • CS Grad Visualization: 6.813


Grading

Weekly Reflections/edX: While everyone may benefit from both the edX content and the weekly readings, at the beginning of the semester you will be asked to choose between completing the weekly reflections and completing the Collaborative Data Science for Healthcare edX course.

  • The weekly reflections cover Weeks 2 through 10, are completed as Canvas discussions, are due before class, and are worth 1 point (1.67% of your grade) each week, for a total of 15% of your grade.
  • Alternatively, you may complete the Collaborative Data Science for Healthcare edX course over the semester to earn that 15% of your grade. The course is available at https://www.edx.org/course/collaborative-data-science-for-healthcare

Three Problem Sets: Problem sets 1, 2, and 3 are each worth 10 points, or 16.67% of your grade. This means problem sets are worth a total of 50% of your grade. 

Course Final Project: Submitting your project team is worth 1 point (1.67% of your grade), the final project presentation is worth 10 points (16.67% of your grade), and the final project write-up is worth 10 points (16.67% of your grade). Together, the project components are worth 35% of your grade.
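As a quick arithmetic check of the weights above: the graded components add up to 60 points (9 reflection or edX points, 30 problem set points, 21 project points), so each point is worth 100/60 ≈ 1.67% of the grade. The Python sketch below assumes exactly that 60-point total and excludes scribing extra credit; the dictionary labels are illustrative, not official Canvas category names.

```python
"""Sketch: convert the course's point values into grade percentages.

Assumption: every component is a share of one 60-point total, and
scribing extra credit is excluded from that total.
"""

# Point values from the Grading section; labels are illustrative only.
POINTS = {
    "weekly reflections or edX (Weeks 2-10, 1 point each)": 9,
    "problem sets (3 x 10 points)": 30,
    "project team submission": 1,
    "final project presentation": 10,
    "final project write-up": 10,
}


def grade_weights(points: dict[str, int]) -> dict[str, float]:
    """Return each component's share of the total, in percent."""
    total = sum(points.values())
    return {name: 100.0 * value / total for name, value in points.items()}


if __name__ == "__main__":
    weights = grade_weights(POINTS)
    for name, pct in weights.items():
        print(f"{name}: {pct:.2f}%")
    print(f"total: {sum(weights.values()):.2f}%")
```

Running it prints 15.00%, 50.00%, 1.67%, 16.67%, and 16.67%, summing to 100.00%. The sketch takes the nine reflection points at face value and does not model how the cancelled Reflection 2 is handled.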

Plagiarism: The instructors may submit student code to a plagiarism detection tool to review similarity and detect possible plagiarism. Submissions are used solely for detecting similarity and are not retained indefinitely on the server; results are typically deleted after 14 days but may be removed sooner. For more information on the tool (MOSS), see https://theory.stanford.edu/~aiken/moss/.


Extra Credit

Scribing: Each week, you can earn 1 extra credit point by scribing that week's lectures in LaTeX and submitting the notes to the class for review. Each student can earn at most 5 extra credit points.


Schedule

Week Date Lecture Materials Assignments
1/DATA Sept 9, 2022
  • 9:30 - 10:10 Dr. Marzyeh Ghassemi "Course Introduction and Overview"
  • 10:20 - 11:20 Dr. Ned McCague "Project Overview; Announcing Your Project Matches"

    --- BREAK ---


  • 11:30 - 11:50 PITCHES
  • 12:00 - 12:15 Dr. Leo Celi "Past Experiences with HST 953 and Course Advice"
  • 12:15 - 12:30 Abbas Zeitoun "Homework Datasets and Benchmarks"
  • [Slides]

Readings:

  • PSET 1 Out
  • Group Project Assigned
2/DATA Sept 16, 2022
  • 9:40 - 10:40 Dr. Alistair Johnson "Creating Clinical Databases For Machine Learning"

    --- BREAK ---
  • 10:50 - 11:50 Dr. Tom Pollard "Focus on De-identification and Standardization"

    --- BREAK ---

  • 12:00 - 12:30 TBD: Ethical Considerations in Data Collection and Use

Readings:

Extra Material:

 

3/DATA Sept 23, 2022

   No class; talks will be rescheduled for another day.

  • Reflection 2 Cancelled
4/ML.VIS Sept 30, 2022
  • 9:40 - 10:40 Dr. Stephanie Hyland - "Representation learning in ML and Health"

    --- BREAK ---

  • 10:50 - 11:50 Dr. Tianxi Cai "Improving Real World Evidence: Reproducibility and Interoperability" 

    --- BREAK ---

  • 12:00 - 12:30 Dr. Andrew Beam "Healthcare Data Horror Stories" 

Readings:

5/ML.VIS Oct 7, 2022
  • 9:40 - 10:40 Dr. Fanny Chevalier - "Visualization in Healthcare"

    --- BREAK ---

  • 10:50 - 11:50 Johnathon Sellors "UK Biobank: A study in longitudinal health data creation and curation."

    --- BREAK ---
  • 12:00 - 12:30 Senthil Nachimuthu "Nightingale: Data by the doctors, for the people."

Readings:

Supplemental Readings and Videos (for fun, not for the weekly reflection):

6/ML.VIS Oct 14, 2022
  • 9:40 - 10:40 Dr. Tristan Naumann - "NLP for Health Case Study"

    --- BREAK ---

  • 10:50 - 11:50 Theresa Stadler "Synthetic data generation and use: How not to make a mess"

    --- BREAK ---

  • 12:00 - 12:30 Project Check-in times

Readings:

7/IMP Oct 21, 2022
  • 9:40 - 10:40 Shengpu Tang "Augmenting Clinical Decision Making with Reinforcement Learning."

    --- BREAK ---

  • 10:50 - 11:50 Joe Zhang (Imperial College London) "Practical considerations and lessons learnt from clinical AI deployment in the UK"

    --- BREAK ---

  • 12:00 - 12:30 Project Check-in Time with Course Staff

Readings:

Supplemental Readings and Videos (for fun, not for the weekly reflection):

8/IMP Oct 28, 2022
  • 9:40 - 10:40 Dr. Muhammad Mamdani "Applied Artificial Intelligence in Health: From Research to Application"

    --- BREAK ---

  • 10:50 - 11:50 Dr. Katherine Heller "Using Real ML for Real: How to deploy without failing"

    --- BREAK ---

  • 12:00 - 12:30 Dr. Hamsa Bastani "Deploying Models to Help in Crisis: Screening in Migration"

Readings:

9/IMP Nov 4, 2022
  • 9:40 - 10:40 John Halamka "Designing Data Networks to Enable System-Wide Knowledge Discovery"

    --- BREAK ---

  • 10:50 - 11:50 Vivian Neilley "Why it's smart to be on FHIR."

    --- BREAK ---

  • 12:00 - 12:30 "Mid-Point Project Presentations, Part 1"

Supplemental Readings and Videos (for fun, not for the weekly reflection):

10/IMP Nov 11, 2022
  • VETERANS DAY, Project Work Week, In-class Help

 

11 Nov 18, 2022
  • 9:40 - 10:40 Dr. Zak Kohane of Harvard, "How Precise is Precision Medicine?"

    --- BREAK ---

  • 10:50 - 11:50 Manu Tandon, "Creating an Information System for Better Care at BI."

    --- BREAK ---

  • 12:00 - 12:30 "Mid-Point Project Presentations, Part 2"

Slides

12 Nov 25, 2022
  • THANKSGIVING WEEK, Project Work

 

13 Dec 2, 2022

 TBD (most likely Project Work Week, In-class Help)

 

14 Dec 9, 2022
  • Final Presentations

 


Project Details

Projects and Authorship

A note on collaboration: Research is a collaborative activity, and we encourage all students to collaborate and learn from each other. In general, when you put your name on something for research, you must: a) have materially contributed to the work, b) be able to defend the research, and c) acknowledge the contributions of others. Keep this in mind when working together and submitting material for evaluation.

A note on authorship: The expectation is that, by the end of the course, the final project will be developed enough to submit to a peer-reviewed journal. Author order can be a contentious issue and is left to the project participants to decide. We strongly encourage you to discuss the order, or the philosophy you will use to decide it, while forming groups. In the case of a dispute during or after the course, the instructors will likely not be able to mediate in any meaningful way. We also recommend equal authorship (now more common), but the decision is left to each team.

For the clinicians: If you expect a certain level of authorship (first, last, etc.), you should mention this in your project pitch. Keep in mind that this is a two-way street involving both clinicians and data scientists. If a project fails to garner enough interest, it may not be completed as part of the course.

A note on acknowledgement: Papers that result from work done during this course should recognize the contributions of the course in an acknowledgement or in other sections. The suggested language is: "This manuscript was composed by participants in the HST.953 course at the Massachusetts Institute of Technology, Fall 2022."

 

Project Descriptions

MIMIC-Based Projects:

  1. Identifying History & Physical and Consent forms from the scanned documents: The predictions of these models have been integrated with the PIMS system and are used by the pre-operative department to identify the forms before surgery. The models' performance has degraded recently due to changes in the structure of the scanned forms.

  2. Predicting Ambulatory No-Show: A model that predicts whether a patient will show up to an appointment has been developed and integrated with a dashboard. Before COVID, ambulatory operations used the dashboard to take the necessary interventions for patients at high risk of not showing up. COVID has drastically changed the appointment scheduling data and the type of appointment (more telehealth than in-person), so the model must be retrained and put back into production.

  3. Optimizing OR Blocks: The purpose of the model is to optimize operating room schedules in order to address the capacity challenges of the inpatient census. The model was originally developed for BIDMC and needs to be retrained with OR block data from Lahey Medical Center. The model's output will be provided as a report to the OR Committee to change the schedules of surgeons.

     

St. Michael-Based Projects:

  1. Predicting Relapse in Multiple Sclerosis (MS): MS is a chronic neurological condition in which the immune system attacks nerve fibres and myelin sheathing in the brain and spinal cord. Its severity ranges from patients who are mildly affected to others who may lose their ability to communicate or walk. Many patients suffer relapses during disease progression, often manifesting as fatigue, tingling, blurred vision, weakness, and unsteady gait. A major goal of modern treatment is to prevent relapse and disease progression. The MS Clinic at St. Michael's Hospital is among the largest in the world, with excellent patient follow-up. The goal of this project is to create an algorithm that predicts relapse episodes among MS patients. Should an adequate algorithm be developed, it will be deployed into clinical practice to help guide patient care by our team of MS neurologists. A preliminary algorithm has already been developed with promising performance.
  2. Emergency Department Wait Time Prediction: A major predictor of inpatient satisfaction is the perceived wait time in the emergency department, yet many hospitals struggle to optimize their wait times. Knowing the expected wait times for patients in the emergency department helps with staffing, planning, and patient management. The purpose of this project is to develop 'real time' prediction algorithms that estimate overall and patient-specific wait times in the emergency department using a dataset from St. Michael's Hospital. The predictions will vary with both patient characteristics and the ever-changing workload in the emergency department. Suitable algorithms will be deployed into clinical practice.
  3. Traumatic Brain Injury Outcomes: Patients experiencing traumatic brain injury often need urgent surgical intervention; the level of care needed is often determined by imaging findings on head computed tomography (CT) scans as well as certain clinical characteristics. Specialized neurosurgeons and neuroradiologists are needed to 'read' head CT scans and determine whether a patient requires urgent surgery. Many hospitals do not have such specialists and rely on larger hospitals that do have this expertise to advise them on whether a patient needs to be urgently transferred to an appropriate hospital, which places a significant burden on specialist hospitals and their staff. The purpose of this project is to develop an algorithm that identifies patterns on a head CT suggesting whether a patient requires urgent surgical intervention. The available dataset consists of several thousand annotated head CT scans. Suitable algorithms will be deployed into clinical practice.

 

Nightingale Dataset based Projects:

Use the Stanford expanded CheXpert dataset to predict fractures: https://docs.nightingalescience.org/datasets/fracture-aimi-xray

  1. Predict spine fractures (cervical, thoracic, lumbar) from the x-ray. Since the spine is visible on the x-ray, the algorithm could be detecting areas that are likely to break.
  2. Predict hip fractures. If there is signal, the algorithm could be picking up on some global pattern in bone that makes a person more likely to fracture.
  3. Control for age as a potential source of spurious correlation when predicting both types of fracture. If the predictive power disappears once age is added, the algorithm could simply be picking up on (i.e., rediscovering) age, which is correlated with fracture.

Identifying high-risk breast cancer: https://docs.nightingalescience.org/datasets/brca-psj-path

  1. Does the time from diagnosis to breast cancer therapy predict metastasis and/or death (independent of stage and pathological variables)? Aside from stage at diagnosis, what demographic, comorbidity and pathologic variables are the strongest predictors of a patient's response to neoadjuvant therapy?
  2. For early-stage cancers, what are the strongest predictors of metastasis? How do these predictors (or combinations of them) compare to multigene assays currently used in clinical practice (such as Oncotype Dx or other multigene signature methods)? [This one depends somewhat on having enough biopsies with multigene assay data; I seem to remember that very few in the dataset actually had values for those variables. Alternatively, students could compare their own predictive models to published data on Oncotype Dx prediction, though those test scores would have been obtained from different populations.]

Diagnosing ‘silent’ heart attack using ECG waveforms: https://docs.nightingalescience.org/datasets/silent-cchs-ecg

  1. Predict regional wall motion abnormalities (scars indicating prior heart attack) using the ECG waveforms. 

 

 

Computing Credits