New Course: Prediction for (Individualized) Decision-making

Statistical Modeling, Causal Inference, and Social Science 2024-12-06

This is Jessica. This winter I’m teaching a new graduate seminar on prediction for decision-making, intended primarily for Computer Science Ph.D. students. The goal of the course is to consider various perspectives on what it means to predict for the purpose of decision-making. We’ll look at this question in two contexts: predictive modeling, whether for automated decisions or to inform expert decisions, and causal estimation to inform policy. I’m trying to include a mix of theoretical and applied papers, with an emphasis on the philosophical and ethical challenges of evaluating decision-making and applying formal methods in practice, especially in contexts where human experts currently make decisions and/or the decisions involve people.

Technically the course title is Prediction for Decision-making. But one of the motivations is that we have yet to adequately address the gap between conventional machine learning, where we optimize loss over aggregates, and the needs of human decision-makers in practice, where we often care about doing right by individual cases. Hence the reference to “individualized.”

Suggestions welcome if this is your cup of tea and you think I missed something important. A few of the listed papers already come from pointers I’ve gotten from readers here. I’m especially interested in papers that help illustrate where current methods fall short of supporting good individual decisions.

Course Schedule

Week 1 – Introduction and background on statistical decision rules

Background: Statistical decision theory, randomized controlled trials

Berger, J. O. (2013). Statistical decision theory and Bayesian analysis. Springer Science & Business Media. Chapter 1.
Hernán, M. A., & Robins, J. M. (2023). Causal inference: What if. CRC Press. Chapters 1, 2.

Examples

Tarabichi, Y., Cheng, A., Bar-Shain, D., McCrate, B. M., Reese, L. H., Emerman, C., … & Hecker, M. T. (2022). Improving timeliness of antibiotic administration using a provider and pharmacist facing sepsis early warning system in the emergency department setting: a randomized controlled quality improvement initiative. Critical care medicine, 50(3), 418-427.
Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447-453.
Widner, K., Virmani, S., Krause, J., Nayar, J., Tiwari, R., Pedersen, E. R., … & Webster, D. R. (2023). Lessons learned from translating AI from development to deployment in healthcare. Nature Medicine, 29(6), 1304-1306.
Kawakami, A., Sivaraman, V., Cheng, H. F., Stapleton, L., Cheng, Y., Qing, D., … & Holstein, K. (2022). Improving human-AI partnerships in child welfare: understanding worker practices, challenges, and desires for algorithmic decision support. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (pp. 1-18).
Dressel, J., & Farid, H. (2018). The accuracy, fairness, and limits of predicting recidivism. Science Advances, 4(1). American Association for the Advancement of Science.

Week 2 – Prediction versus decision-making

Fernández-Loría, C., & Provost, F. (2022). Causal decision making and causal effect estimation are not the same… and why it matters. INFORMS Journal on Data Science, 1(1), 4-16.
Mitzenmacher, M., & Vassilvitskii, S. (2022). Algorithms with predictions. Communications of the ACM, 65(7), 33-35.
Liu, L., Barocas, S., Kleinberg, J., and Levy, K. (2024). On the actionability of outcome prediction. Proceedings of the AAAI Conference on Artificial Intelligence 38 (20).

Optional

Perdomo, J. C. (2024). The Relative Value of Prediction in Algorithmic Decision Making.
Elmachtoub, A. N., & Grigas, P. (2022). Smart “predict, then optimize”. Management Science, 68(1), 9-26.

Week 3 – Human versus statistical judgment

Meehl, P. (1954). Clinical versus statistical prediction: A theoretical analysis and a review of the evidence.
Felin, T., & Holweg, M. (2024). Theory Is All You Need: AI, Human Cognition, and Causal Reasoning. Strategy Science.
Kawakami, A., Sivaraman, V., Stapleton, L., Cheng, H. F., Perer, A., Wu, Z. S., … & Holstein, K. (2022, June). “Why Do I Care What’s Similar?” Probing Challenges in AI-Assisted Child Welfare Decision-Making through Worker-AI Interface Design Concepts. In Proceedings of the 2022 ACM Designing Interactive Systems Conference (pp. 454-470).

Optional

Spengler, P. M. (2013). Clinical versus mechanical prediction. Handbook of psychology: Assessment psychology, 26-49.
Grove, W. M., Zald, D. H., Lebow, B. S., Snitz, B. E., & Nelson, C. (2000). Clinical versus mechanical prediction: a meta-analysis. Psychological assessment, 12(1), 19.
Ægisdóttir, S., White, M. J., Spengler, P. M., Maugherman, A. S., Anderson, L. A., Cook, R. S., … & Rush, J. D. (2006). The meta-analysis of clinical judgment project: Fifty-six years of accumulated research on clinical versus statistical prediction. The counseling psychologist, 34(3), 341-382.
Colunga-Lozano, L. E., Foroutan, F., Rayner, D., De Luca, C., Hernández-Wolters, B., Couban, R., … & Guyatt, G. (2024). Clinical judgment shows similar and sometimes superior discrimination compared to prognostic clinical prediction models: a systematic review. Journal of Clinical Epidemiology, 165, 111200.
Razzaki, S., Baker, A., Perov, Y., Middleton, K., Baxter, J., Mullarkey, D., … & Johri, S. (2018). A comparative study of artificial intelligence and human doctors for the purpose of triage and diagnosis.
Boone, C. (2024). Discretion in Clinical Decision Making: Evidence from Bunching.

Week 4 – Evaluating (individual) predictions and decisions

Dawid, P. (2017). On Individual Risk.
Selbst, A. (2019). Negligence and AI’s Human Users.
Wang, A., Kapoor, S., Barocas, S., & Narayanan, A. (2024). Against predictive optimization: On the legitimacy of decision-making algorithms that optimize predictive accuracy. ACM Journal on Responsible Computing, 1(1), 1-45.

Optional

van Royen, F. S., Moons, K. G., Geersing, G. J., & van Smeden, M. (2022). Developing, validating, updating and judging the impact of prognostic models for respiratory diseases. European Respiratory Journal, 60(3).
Ben-Michael, E., Greiner, D. J., Huang, M., Imai, K., Jiang, Z., & Shin, S. (2024). Does AI help humans make better decisions? A methodological framework for experimental evaluation. arXiv preprint arXiv:2403.12108.
Coston, A., Kawakami, A., Zhu, H., Holstein, K., & Heidari, H. (2023). A validity perspective on evaluating the justified use of data-driven decision-making algorithms. In 2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML) (pp. 690-704). IEEE.
Karusala, N., Upadhyay, S., Veeraraghavan, R., & Gajos, K. Z. (2024). Understanding Contestability on the Margins: Implications for the Design of Algorithmic Decision-making in Public Services. In Proceedings of the CHI Conference on Human Factors in Computing Systems (pp. 1-16).

Week 5 – Data shifts and causality

Subbaswamy, A., & Saria, S. (2020). From development to deployment: Dataset shift, causality, and shift-stable models in health AI. Biostatistics, 21(2), 345-352.
Peters, J., Bühlmann, P., & Meinshausen, N. (2016). Causal inference by using invariant prediction: identification and confidence intervals. Journal of the Royal Statistical Society Series B: Statistical Methodology, 78(5), 947-1012.

Optional

Wald, Y., Feder, A., Greenfeld, D., & Shalit, U. (2021). On calibration and out-of-domain generalization. Advances in neural information processing systems, 34, 2215-2227.
Guo, L. L., Pfohl, S. R., Fries, J., Johnson, A. E., Posada, J., Aftandilian, C., … & Sung, L. (2022). Evaluation of domain generalization and adaptation on improving model robustness to temporal dataset shift in clinical medicine. Scientific reports, 12(1), 2726.
Mendler-Dünner, C., Ding, F., & Wang, Y. (2022). Anticipating performativity by predicting from predictions. Advances in Neural Information Processing Systems, 35, 31171-31185.
Guerdan, L., Coston, A., Holstein, K., & Wu, Z. S. (2023). Counterfactual prediction under outcome measurement error. In Proceedings of the ACM Conference on Fairness, Accountability, and Transparency (FAccT’23) (pp. 1584-1598). https://doi.org/10.1145/3593013.3594101
Van Parys, B. P., Esfahani, P. M., & Kuhn, D. (2021). From data to decisions: Distributionally robust optimization is optimal. Management Science, 67(6), 3387-3402.

Week 6 – Personalization and fairness

Shalit, U. (2020). Can we learn individual-level treatment policies from clinical data? Biostatistics, 21(2), 359-362.
Curth, A., Peck, R. W., McKinney, E., Weatherall, J., & van Der Schaar, M. (2024). Using machine learning to individualize treatment effect estimation: Challenges and opportunities. Clinical Pharmacology & Therapeutics, 115(4), 710-719.
Pleiss, G., Raghavan, M., Wu, F., Kleinberg, J., & Weinberger, K. Q. (2017). On fairness and calibration. Advances in neural information processing systems, 30.
Marx, C., Calmon, F., & Ustun, B. (2020). Predictive multiplicity in classification. International Conference on Machine Learning. PMLR.

Optional

Hedges, L. (2024). Chapter 6: Planning Experimental Designs. Unpublished manuscript.
Suriyakumar, V. M., Ghassemi, M., & Ustun, B. (2023). When personalization harms performance: Reconsidering the use of group attributes in prediction. International Conference on Machine Learning. PMLR.

Week 7 – Calibration for decision-making

Gneiting, T., Balabdaoui, F., & Raftery, A. E. (2007). Probabilistic forecasts, calibration and sharpness. Journal of the Royal Statistical Society Series B: Statistical Methodology, 69(2), 243-268.
Hébert-Johnson, U., Kim, M., Reingold, O., & Rothblum, G. (2018, July). Multicalibration: Calibration for the (computationally-identifiable) masses. In International Conference on Machine Learning (pp. 1939-1948). PMLR.
Gopalan, P., Kalai, A. T., Reingold, O., Sharan, V., & Wieder, U. (2021). Omnipredictors. arXiv preprint arXiv:2109.05389.

Optional

Dawid, P. (1982). The well-calibrated Bayesian. Journal of the American Statistical Association.
Dwork, C., Kim, M. P., Reingold, O., Rothblum, G. N., & Yona, G. (2021). Outcome indistinguishability. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing (pp. 1095-1108).
Gopalan, P., Hu, L., Kim, M. P., Reingold, O., & Wieder, U. (2022). Loss minimization through the lens of outcome indistinguishability. arXiv preprint arXiv:2210.08649.
Van Calster, B., Nieboer, D., Vergouwe, Y., De Cock, B., Pencina, M. J., & Steyerberg, E. W. (2016). A calibration hierarchy for risk models was defined: from utopia to empirical data. Journal of clinical epidemiology, 74, 167-176.

Week 8 – Communicating prediction uncertainty

Angelopoulos, A. N., Bates, S., Fisch, A., Lei, L., & Schuster, T. (2022). Conformal risk control.
- If you are not familiar with conformal prediction, you may wish to consult this tutorial: Angelopoulos, A. N., & Bates, S. (2021). A gentle introduction to conformal prediction and distribution-free uncertainty quantification.

Cortes-Gomez, S., Patiño, C., Byun, Y., Wu, S., Horvitz, E., & Wilder, B. (2024). Decision-Focused Uncertainty Quantification. arXiv preprint arXiv:2410.01767.
Corvelo Benz, N., & Rodriguez, M. (2024). Human-aligned calibration for AI-assisted decision making. Advances in Neural Information Processing Systems, 36.

Optional

Zhang, D., Chatzimparmpas, A., Kamali, N., & Hullman, J. (2024). Evaluating the utility of conformal prediction sets for AI-advised image labeling. In Proceedings of the CHI Conference on Human Factors in Computing Systems (pp. 1-19).

Week 9 – Designing human-AI workflows

Guo, Z., Wu, Y., Hartline, J. D., & Hullman, J. (2024). A Decision Theoretic Framework for Measuring AI Reliance. In The 2024 ACM Conference on Fairness, Accountability, and Transparency (pp. 221-236).
Alur, R., Laine, L., Li, D. K., Shung, D., Raghavan, M., & Shah, D. (2024). Integrating Expert Judgment and Algorithmic Decision Making: An Indistinguishability Framework. arXiv preprint arXiv:2410.08783.
Collina, N., Goel, S., Gupta, V., & Roth, A. (2024). Tractable Agreement Protocols. arXiv preprint arXiv:2411.19791.

Optional

Punzi, C., Pellungrini, R., Setzu, M., Giannotti, F., & Pedreschi, D. (2024). AI, Meet Human: Learning paradigms for hybrid decision making systems. arXiv preprint arXiv:2402.06287.
Mozannar, H., Lang, H., Wei, D., Sattigeri, P., Das, S., & Sontag, D. (2023). Who should predict? Exact algorithms for learning to defer to humans. In International conference on artificial intelligence and statistics (pp. 10520-10545). PMLR.
Hilgard, S., Rosenfeld, N., Banaji, M. R., Cao, J., & Parkes, D. (2021, July). Learning representations by humans, for humans. In International conference on machine learning (pp. 4227-4238). PMLR.
Karimi, A. H., Muandet, K., Kornblith, S., Schölkopf, B., & Kim, B. (2022). On the relationship between explanation and prediction: A causal view. arXiv preprint arXiv:2212.06925.
Fok, R., & Weld, D. S. (2024). In search of verifiability: Explanations rarely enable complementary performance in AI-advised decision making. AI Magazine, 45(3), 317-332.
Buçinca, Z., Swaroop, S., Paluch, A. E., Doshi-Velez, F., & Gajos, K. Z. (2024). Contrastive Explanations That Anticipate Human Misconceptions Can Improve Human Decision-Making Skills. arXiv preprint arXiv:2410.04253.