6.S977: Ethical Machine Learning in Human Deployments (Spring 2024)
Welcome to 6.S977!
Satisfies: Concentration Subject in AI
Friday, 10:00 AM - 1:00 PM
Room 45-102, College of Computing building.
Instructor: Dr. Marzyeh Ghassemi
TA: Jessica Ding
Piazza: https://piazza.com/class/lqb8wf75djr263/
TA Office Hours: Tuesdays 12-1pm, 32-D451
Overview
This course focuses on the human-facing considerations that arise throughout the machine learning (ML) development pipeline when systems are deployed in human settings. Students will learn about the issues involved in ethical machine learning in three progressive areas: (1) introductory material on the inequities that arise in the ML pipeline when working with human processes, (2) deep dives into technical focus areas, and (3) topic-area presentations in healthcare, education, and employment.
In this class, students will hear lectures by leading researchers in the field and read current papers related to ongoing efforts and challenges in ethical ML. The graded components of this course are weekly reflections, attendance, and homeworks that culminate in a final project. A major component of this course is the individual final project, presented on the last day of class (May 10). The course will also have several in-class multi-party negotiation exercises that highlight the difficulty of ethical machine learning in practice.
Students should have taken two introductory machine learning courses; courses known to fulfill this prerequisite are listed below, but others may be used if similar in content:
- Graduate Students: 6.867, 6.806, 6.819
- Undergraduate Students: 6.036, 6.401, 6.419, or 6.402
Books
- Fairness and Machine Learning - A free PDF is available online, and we are working with the bookstore to add copies for the course.
- AI Ethics - We are working with the bookstore to add copies for the course.
Grading
1. Weekly Reflections: The weekly reflections, corresponding to Week 2 through Week 10, are a total of 10% of your grade.
- Done as a Canvas discussion
- Must reply to at least one other student
- Due before class
- Worth 1.25 points per week
2. Assignments: Problem sets are worth a total of 80% of your grade, as broken down below. For the first two homeworks, we have registered the course on psetpartners.mit.edu.
Assignment | Subject | Percentage of Grade |
1 | HW1: Algorithmic Fairness Assignment | 15% |
2 | HW2: Methodological Focus Assignment | 15% |
3 | Project Proposal with Lit Review and Outline | 10% |
4 | Project Presentation | 20% |
5 | Project Write-Up | 20% |
3. Attendance: 10% of your grade is assigned for attending the week's lectures. Attendance will be taken at the beginning of class, and the link to claim credit will close 10 minutes after class starts.
4. Scribing for Extra Credit: If a student needs extra credit, they can speak to the course staff about scribing a lecture and submit their notes to the course staff for review. Scribing can replace up to 2.5 points (i.e., two weekly reflections), with slots assigned greedily (first come, first served). Each scribing session is worth 1.25 points, the same as a weekly reflection. Students can sign up in pairs to record the class lectures, and the grade for scribing will be shared by all students in a scribing session. Scribe LaTeX Template
Schedule (Subject to Change)
Week | Date | Lecture | Materials | HW Due |
1/INTRO | Feb 9, 2024 | Course Introduction and Overview | | HW 1 Out |
2/INTRO | Feb 16, 2024 | Guest Taught by Walter Gerych | For reflections: | |
3/INTRO | Feb 23, 2024 | Data Collection | For reflections: | |
4/INTRO | Mar 1, 2024 | Algorithm Development | For reflections: | |
5/INTRO | Mar 8, 2024 | Post-Deployment Considerations | For reflections: | |
6/FOCUS | Mar 15, 2024 | Human-Process Data As A Biased Medium | For reflections: | |
7/FOCUS | Mar 22, 2024 | Trade-offs in Machine Learning Approaches | For reflections: | |
8 | Mar 29, 2024 | No Class - Spring Break | Project Work | HW4/5 Out |
9/FOCUS | Apr 5, 2024 | Model Robustness and Optimization | For reflections: | |
10/FOCUS | Apr 12, 2024 | Machine Values Implied By Choices | For reflections: | |
11/CASE | Apr 19, 2024 | Case Study: Education | For reflections: | Week 11 Reflection |
12/CASE | Apr 26, 2024 | Case Study: Healthcare | Week 12 Reflection | |
13/CASE | May 3, 2024 | Case Study: Employment | For reflections: | |
14 | May 10, 2024 | Final Project Presentations | | |
Miscellaneous
Note on Generative AI: You may use AI programs (e.g., ChatGPT) to help generate ideas and brainstorm. However, material generated by these programs may be inaccurate and biased. You may not submit any work generated by an AI program as your own. Note that OpenAI’s terms of use explicitly state that users may not “represent that output from the Services was human-generated when it is not.” If you include material generated by an AI program, it should be cited. For an example, see the guidelines on properly citing ChatGPT.
Any plagiarism or other form of cheating will be dealt with severely under relevant MIT policies.
Note on Plagiarism: Student code submissions may be submitted by the instructors to a plagiarism detection tool for a review of similarity and detection of possible plagiarism. Submissions will be used solely for the purpose of detecting similarity and are not retained indefinitely on the server; results are typically deleted after 14 days, and may be removed sooner. For more information on the tool used, refer to https://theory.stanford.edu/~aiken/moss/.
Note on Collaboration: Research is a collaborative activity and we encourage all students to collaborate and learn from each other. In general, when you put your name on something for research, you must: a) have materially contributed to the work, b) be able to defend the research, and c) acknowledge the contribution of others. Keep this in mind when working together and submitting material for evaluation.
Note on Authorship: By the end of the course your final project may be sufficiently developed to submit to a peer-reviewed journal, and you may choose to add authors after the course is completed. Author order can be a somewhat controversial issue and is left to the project participants to decide; in the case of a dispute during or after the course, the instructors will likely not be able to mediate in any meaningful way. We recommend equal authorship, which is now more common, but the decision is left to each team.
Note on Acknowledgement: Papers that result from work done during this course should recognize the contributions of the course in an acknowledgement or in other sections. The suggested language is: "This manuscript was composed by participants in the EECS 6.S977 course on Ethical Machine Learning in Human Deployments at the Massachusetts Institute of Technology, Spring 2024."
Course Projects
The goal is to have a submission-ready manuscript by the end of the semester, formatted for the venue you are targeting. The project should tackle an ethical issue that could occur, or has occurred, in the use of machine learning in a human setting. Projects can be more technical, involving the reproduction or development of models and their evaluation (using an ICML/NeurIPS template); more socio-technical or policy focused (FAccT/EAAMO template); or an examination of the complex choices, interactions, and implications of machine learning use (SERC template) in a specific application area (ML4H/CHIL template).
The project is done individually, and there will be one project report and presentation per person. Students should treat the project as a major undertaking: its components (Project Proposal, Final Project Presentation, Final Project Write-Up) together account for 50% of your grade.
Data Sources
- Kaggle is a platform for many kinds of data, and competitions from this platform can be modified for relevant investigations.
- NeurIPS Datasets and Benchmarks lists from 2021, 2022, or 2023.
- MIMIC is an open platform for health data. To gain access to MIMIC, students must complete CITI certification and request access on PhysioNet.
- Nightingale is an open platform for health data. Register for the platform as soon as possible, and look at the existing research questions by reading through the platform's documents. CITI training is required to use the platform. These projects focus mostly on fairness audits of machine learning systems trained on real health data.
Specific Active Projects
Nightingale Project 1: Identifying fairness violations in breast cancer risk using digital pathology images
Every year, 40 million women get a mammogram; some go on to have an invasive biopsy to better examine a concerning area. Since the 1990s, we have found far more ‘cancers’, which has in turn prompted vastly more surgical procedures and chemotherapy. But death rates from metastatic breast cancer have hardly changed. When a pathologist looks at a biopsy slide, they are looking for known signs of cancer: tubules, cells with atypical-looking nuclei, evidence of rapid cell division. Students will train predictive models on pathology images to predict critical patient outcomes such as mortality and metastasis. They will identify patients at high risk of poor outcomes and compare prediction rates across gender/ethnicity groups. https://docs.nightingalescience.org/brca-psj-path.html
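The group-wise comparison described above can be sketched in a few lines. This is a minimal illustration of a demographic-parity-style audit, not code from the course or the Nightingale platform; the risk scores, group labels, and 0.5 threshold are made-up assumptions.

```python
# Hypothetical sketch of a group-wise fairness audit: compare the rate at
# which a model flags patients as high-risk across demographic groups.
# The scores, group labels, and threshold below are illustrative, not the
# actual Nightingale data schema.
from collections import defaultdict

def high_risk_rates(scores, groups, threshold=0.5):
    """Fraction of each group whose predicted risk exceeds the threshold
    (a demographic-parity-style comparison)."""
    flagged = defaultdict(int)
    totals = defaultdict(int)
    for score, group in zip(scores, groups):
        totals[group] += 1
        if score > threshold:
            flagged[group] += 1
    return {g: flagged[g] / totals[g] for g in totals}

# Toy example with made-up risk scores:
rates = high_risk_rates(
    scores=[0.9, 0.2, 0.7, 0.1, 0.8, 0.3],
    groups=["A", "A", "A", "B", "B", "B"],
)
# rates["A"] is 2/3 while rates["B"] is 1/3: a gap worth investigating.
```

A real audit would additionally check whether such gaps persist after conditioning on the true outcome (e.g., comparing error rates rather than raw flag rates).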
Nightingale Project 2: Subtyping cardiac arrest with ECG
A patient is rushed into the ER, unconscious and in cardiac arrest. What happened to cause the arrest? What immediate actions need to be taken? One of the only pieces of data available to the emergency physician in this situation is the electrocardiogram (ECG), which measures the electrical activity of the heart. This rich signal might also contain other clues: about why the heart stopped, what physicians can do in the ER to give the patient the best possible chance of surviving, and the likelihood that a patient who survives will have a normal life, without profound physical or neurological impairments. Students will evaluate the fairness of algorithms that use ECGs for clinical prediction tasks such as determining the cause of cardiac arrest and predicting patient survival after hospital discharge. https://docs.nightingalescience.org/arrest-ntuh-ecg.html
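For outcome-prediction tasks like post-discharge survival, one common fairness check is to compare error rates rather than raw prediction rates. The sketch below computes per-group true positive rates (an equal-opportunity-style comparison); the labels, predictions, and group names are invented for illustration and do not come from the course or the dataset.

```python
# Hypothetical sketch of an equal-opportunity check: compare true positive
# rates (e.g., correctly predicting survival) across demographic groups.
# All inputs below are made up for illustration.
def tpr_by_group(y_true, y_pred, groups):
    """True positive rate per group: P(prediction = 1 | label = 1, group)."""
    hits, positives = {}, {}
    for label, pred, group in zip(y_true, y_pred, groups):
        if label == 1:
            positives[group] = positives.get(group, 0) + 1
            if pred == 1:
                hits[group] = hits.get(group, 0) + 1
    return {g: hits.get(g, 0) / positives[g] for g in positives}

# Toy example: the model catches 1 of 2 positives in group "M"
# but 2 of 2 positives in group "F".
tprs = tpr_by_group(
    y_true=[1, 1, 0, 1, 1, 0],
    y_pred=[1, 0, 0, 1, 1, 1],
    groups=["M", "M", "M", "F", "F", "F"],
)
```

A large gap in per-group TPR would mean the model misses true positives (e.g., survivable arrests) more often for one group than another.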