6.S977: Ethical Machine Learning in Human Deployments (Spring 2024)
Welcome to 6.S977!
Satisfies: Concentration Subject in AI
Friday, 10:00 AM - 1:00 PM
Room 45-102, College of Computing building.
Instructor: Dr. Marzyeh Ghassemi
TA: Jessica Ding
Piazza: https://piazza.com/class/lqb8wf75djr263/
TA Office Hours: Tuesdays 12-1pm, 32-D451
Overview
This course focuses on the human-facing considerations that arise throughout the machine learning (ML) development pipeline when systems are deployed in human settings. Students will learn about the issues involved in ethical machine learning in three progressive areas: (1) introductory material on the inequities that arise in the ML pipeline when working with human processes, (2) deep dives into technical focus areas, and (3) topic-area presentations in healthcare, education, and employment.
In this class, students will hear lectures by leading researchers in the field and read current papers related to ongoing efforts and challenges in ethical ML. The graded components of this course are weekly reflections, attendance, and homeworks that culminate in a final project. A major component of this course is the individual final project, presented on the last day of class (May 10). The course will also have several in-class multi-party negotiation exercises that highlight the difficulty of ethical machine learning in practice.
Students should have taken two introductory machine learning courses; courses known to fulfill this prerequisite are listed below, but others may be used if similar in content:
- Graduate Students: 6.867, 6.806, 6.819
- Undergraduate Students: 6.036, 6.401, 6.419, or 6.402
Books
- Fairness and Machine Learning - A free PDF is available online, and we are working with the bookstore to add copies for the course.
- AI Ethics - We are working with the bookstore to add copies for the course.
Grading
1. Weekly Reflections: The weekly reflections, corresponding to Week 2 through Week 10, are a total of 10% of your grade.
- Done as a Canvas discussion
- Must reply to at least one other student
- Due before class
- Worth 1.25 points per week
2. Assignments: Problem sets are worth a total of 80% of your grade, as broken down below. For the first two homeworks, we have registered the course on psetpartners.mit.edu.
Assignment | Subject | Percentage of Grade |
1 | HW1: Algorithmic Fairness Assignment | 15% |
2 | HW2: Methodological Focus Assignment | 15% |
3 | Project Proposal with Lit Review and Outline | 10% |
4 | Project Presentation | 20% |
5 | Project Write-Up | 20% |
3. Attendance: 10% of your grade is assigned for attending the week's lectures. Attendance will be taken at the beginning of class, and the link to claim credit will close 10 minutes after class starts.
4. Scribing for Extra Credit: If a student needs extra credit, they can speak to the course staff about scribing a lecture and submit their notes to the course staff for review. Scribing can replace up to 2.5 points (i.e., two weekly reflections), with slots assigned greedily (first come, first served). Each scribing session is worth 1.25 points, the same as a weekly reflection. Students can sign up in pairs to record the class lectures, and the grade for scribing will be shared by all students in a scribing session. Scribe LaTeX Template
Schedule (Subject to Change)
Week | Date | Lecture | Materials | HW Due |
1/INTRO | Feb 9, 2024 | Course Introduction and Overview | | HW 1 Out |
2/INTRO | Feb 16, 2024 | Guest Taught by Walter Gerych | For reflections: | |
3/INTRO | Feb 23, 2024 | Data Collection | For reflections: | |
4/INTRO | Mar 1, 2024 | Algorithm Development | For reflections: | |
5/INTRO | Mar 8, 2024 | Post-Deployment Considerations | For reflections: | |
6/FOCUS | Mar 15, 2024 | Human-Process Data As A Biased Medium | For reflections: | |
7/FOCUS | Mar 22, 2024 | Trade-offs in Machine Learning Approaches | For reflections: | |
8 | Mar 29, 2024 | No Class - Spring Break | Project Work | HW4/5 Out |
9/FOCUS | Apr 5, 2024 | Model Robustness and Optimization | For reflections: | |
10/FOCUS | Apr 12, 2024 | Machine Values Implied By Choices | For reflections: | |
11/CASE | Apr 19, 2024 | Case Study: Education | For reflections: | Week 11 Reflection |
12/CASE | Apr 26, 2024 | Case Study: Healthcare | Week 12 Reflection | |
13/CASE | May 3, 2024 | Case Study: Employment | For reflections: | |
14 | May 10, 2024 | Final Project Presentations | | |
Miscellaneous
Note on Generative AI: You may use AI programs (e.g., ChatGPT) to help generate ideas and brainstorm. However, material generated by these programs may be inaccurate and biased. You may not submit any work generated by an AI program as your own. Note that OpenAI’s terms of use explicitly state that users may not “represent that output from the Services was human-generated when it is not.” If you include material generated by an AI program, it should be cited. For an example, see the guidelines on properly citing ChatGPT.
Any plagiarism or other form of cheating will be dealt with severely under relevant MIT policies.
Note on Plagiarism: Student code submissions may be submitted by the instructors to a plagiarism detection tool for a review of similarity and detection of possible plagiarism. Submissions will be used solely for the purpose of detecting similarity and are not retained indefinitely on the server; results are typically deleted after 14 days, and may be removed sooner. For more information on the tool used, refer to https://theory.stanford.edu/~aiken/moss/.
Note on Collaboration: Research is a collaborative activity and we encourage all students to collaborate and learn from each other. In general, when you put your name on something for research, you must: a) have materially contributed to the work, b) be able to defend the research, and c) acknowledge the contribution of others. Keep this in mind when working together and submitting material for evaluation.
Note on Authorship: By the end of the course your final project may be sufficiently developed to submit to a peer-reviewed journal, and you may choose to add authors after the course is completed. Author order can be a somewhat controversial issue and is left to the project participants to decide; in the case of a dispute during or after the course, the instructors will likely not be able to mediate in any meaningful way. We recommend equal authorship, which is now more common, but the decision is left to each team.
Note on Acknowledgement: Papers that result from work done during this course should recognize the contributions of the course in an acknowledgement or in other sections. The suggested language is: "This manuscript was composed by participants in the EECS 6.S977 course on Ethical Machine Learning in Human Deployments at the Massachusetts Institute of Technology, Spring 2024."
Course Projects
The goal is to have a submission-ready manuscript by the end of the semester, formatted for the venue you are targeting. The project should tackle an ethical issue that could occur, or has occurred, in the use of machine learning in a human setting. Projects can be more technical, involving the reproduction or development of models and their evaluation (using an ICML/NeurIPS template); more socio-technical or policy focused (FAccT/EAAMO template); or an examination of the complex choices, interactions, and implications of machine learning use (SERC template) in a specific application area (ML4H/CHIL template).
The project is done individually, and there will be one project report and presentation per person. Students should treat the project as a major undertaking: its components (Project Proposal, Final Project Presentation, Final Project Write-Up) together account for 50% of your grade.
Data Sources
- Kaggle is a platform for many kinds of data, and competitions from this platform can be modified for relevant investigations.
- NeurIPS Datasets and Benchmarks lists from 2021, 2022, or 2023.
- MIMIC is an open platform for health data. To gain access to MIMIC, students must complete CITI certification and request access on PhysioNet.
- Nightingale is an open platform for health data. Register for the platform as soon as possible, and look at the existing research questions by reading through the platform's documents. CITI training is required to use the platform. These projects focus mostly on fairness audits of machine learning systems trained on real health data.
Specific Active Projects
Nightingale Project 1: Identifying fairness violations in breast cancer risk using digital pathology images
Every year, 40 million women get a mammogram; some go on to have an invasive biopsy to better examine a concerning area. Since the 1990s, we have found far more ‘cancers’, which has in turn prompted vastly more surgical procedures and chemotherapy. But death rates from metastatic breast cancer have hardly changed. When a pathologist looks at a biopsy slide, they are looking for known signs of cancer: tubules, cells with atypical-looking nuclei, evidence of rapid cell division. Students will train predictive models on pathology images to predict critical patient outcomes such as mortality and metastasis. They will identify patients at high risk of poor outcomes and compare prediction rates across gender/ethnicity groups. https://docs.nightingalescience.org/brca-psj-path.html
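The group-wise comparison described above can be sketched in a few lines. This is a minimal illustration of a demographic-parity-style audit, not code from the course or the Nightingale platform; the risk scores, group labels, and 0.5 threshold are made-up assumptions.

```python
# Hypothetical sketch of a group-wise fairness audit: compare the rate at
# which a model flags patients as high-risk across demographic groups.
# The scores, group labels, and threshold below are illustrative, not the
# actual Nightingale data schema.
from collections import defaultdict

def high_risk_rates(scores, groups, threshold=0.5):
    """Fraction of each group whose predicted risk exceeds the threshold
    (a demographic-parity-style comparison)."""
    flagged = defaultdict(int)
    totals = defaultdict(int)
    for score, group in zip(scores, groups):
        totals[group] += 1
        if score > threshold:
            flagged[group] += 1
    return {g: flagged[g] / totals[g] for g in totals}

# Toy example with made-up risk scores:
rates = high_risk_rates(
    scores=[0.9, 0.2, 0.7, 0.1, 0.8, 0.3],
    groups=["A", "A", "A", "B", "B", "B"],
)
# rates["A"] is 2/3 while rates["B"] is 1/3: a gap worth investigating.
```

A real audit would additionally check whether such gaps persist after conditioning on the true outcome (e.g., comparing error rates rather than raw flag rates).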
Nightingale Project 2: Subtyping cardiac arrest with ECG
A patient is rushed into the ER, unconscious and in cardiac arrest. What happened to cause the arrest? What immediate actions need to be taken? One of the only pieces of data available to the emergency physician in this situation is the electrocardiogram (ECG), which measures the electrical activity of the heart. This rich signal might also contain other clues: about why the heart stopped, what physicians can do in the ER to give the patient the best possible chance of surviving, and the likelihood that a patient who survives will have a normal life, without profound physical or neurological impairments. Students will evaluate the fairness of algorithms that use ECGs for clinical prediction tasks such as determining the cause of cardiac arrest and predicting patient survival after hospital discharge. https://docs.nightingalescience.org/arrest-ntuh-ecg.html
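For outcome-prediction tasks like post-discharge survival, one common fairness check is to compare error rates rather than raw prediction rates. The sketch below computes per-group true positive rates (an equal-opportunity-style comparison); the labels, predictions, and group names are invented for illustration and do not come from the course or the dataset.

```python
# Hypothetical sketch of an equal-opportunity check: compare true positive
# rates (e.g., correctly predicting survival) across demographic groups.
# All inputs below are made up for illustration.
def tpr_by_group(y_true, y_pred, groups):
    """True positive rate per group: P(prediction = 1 | label = 1, group)."""
    hits, positives = {}, {}
    for label, pred, group in zip(y_true, y_pred, groups):
        if label == 1:
            positives[group] = positives.get(group, 0) + 1
            if pred == 1:
                hits[group] = hits.get(group, 0) + 1
    return {g: hits.get(g, 0) / positives[g] for g in positives}

# Toy example: the model catches 1 of 2 positives in group "M"
# but 2 of 2 positives in group "F".
tprs = tpr_by_group(
    y_true=[1, 1, 0, 1, 1, 0],
    y_pred=[1, 0, 0, 1, 1, 1],
    groups=["M", "M", "M", "F", "F", "F"],
)
```

A large gap in per-group TPR would mean the model misses true positives (e.g., survivable arrests) more often for one group than another.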