The following schedule runs from June 5 to June 9, 2023. (Last January’s institute schedule is available on the Winter page.) Each day will involve a mix of seminar-style discussion and hands-on technical work. Morning sessions run from 10:00a to 12:30p, and afternoon sessions run from 1:30p to 4:00p.

An orientation and tech setup day will be held on May 24, 2023, from 2:00p to 4:00p.

Monday: Introductions & Setting the Stage

How are data science methods being used in the humanities? How are humanists studying the automation of decision-making and cultural production?

Required Readings

*You must be logged in to a Princeton institutional Google account to access these PDFs.

Brian Beaton et al., “Debating Data Science: A Roundtable,” Radical History Review, no. 127 (January 2017): 133–48.

Wai Chee Dimock, “AI and the Humanities,” PMLA 135, no. 3 (May 2020): 449–54.

Anne Helmreich, Matthew Lincoln, and Charles van den Heuvel, “Data Ecosystems and Futures of Art History,” Histoire de l’art, no. 87 (June 29, 2021): 45–54.

Barbara McGillivray et al., “The Challenges and Prospects of the Intersection of Humanities and Data Science: A White Paper from The Alan Turing Institute,” 2020.


10:00 – 10:30
Sierra Eckert and Ryan Heuser, Center for Digital Humanities
Discussion on the history of data science and recent trends in computational methods in the humanities.
Preview of what will be covered on each day of the Institute.
[Opening Slides](/hds-institute/pdf/2023-06-5 Institute Opening Comments.pdf)
10:30 – 11:30
Participant introductions and project descriptions. What do you hope to learn at this Institute?
11:30 – 12:30
Discussion of readings.
Participants: come prepared to discuss how data science is intersecting with scholarship in your discipline.


1:30 – 2:00
Peter Ramadge, Gordon Y.S. Wu Professor of Engineering and Director of the Center for Statistics and Machine Learning.
Presentation on the data science ecosystem at Princeton.
2:00 – 2:30
Helmut Reimitz, Professor of History; Director, Program in Medieval Studies
Presentation on History Books and the History of the Book in the Middle Ages: applying computational tools to learn about the “history of histories” in medieval Europe.
2:30 – 3:00
Informal discussion.

Tuesday: Fundamentals

An introduction to the baseline statistical concepts used in machine learning, from distribution to regression. What does it mean to be “unsupervised”? How do we evaluate a model?


Meredith Broussard, “Machine Learning: The DL on ML,” in Artificial Unintelligence: How Computers Misunderstand the World (Cambridge, Massachusetts & London, England: The MIT Press, 2018).

Wendy Hui Kyong Chun, excerpt from Discriminating Data: Correlation, Neighborhoods, and the New Politics of Recognition (Cambridge, Massachusetts & London, England: The MIT Press, 2021).

Richard So, “‘All Models Are Wrong,’” PMLA 132, no. 3 (May 2017): 668–73.

Tony Chu and Stephanie Yee, “A Visual Introduction to Machine Learning,” R2D3.

Amy Winecoff, “Introduction to Machine Learning for the Humanities” (2023).


10:00 – 12:30
Sierra Eckert, Perkins Postdoctoral Fellow, Center for Digital Humanities
Humanities Data Fundamentals
Relevant files:


1:30 – 4:00
Amy Winecoff, DataX Fellow, Center for IT Policy & Center for Statistics and Machine Learning
Introduction to Machine Learning for the Humanities
Relevant files:

Wednesday: History of Measurement

On the historical epistemology of quantification. How did measurement practices in the past cross the now-accepted divide between the sciences and humanities?


10:00 – 12:30
David Kinney, Postdoctoral Research Associate in Cognitive Science of Values, University Center for Human Values
Lecture on the history of measurement across the disciplines


1:30 – 4:00
David Kinney
Hands-On: Word Embeddings
Relevant files:


Hasok Chang, “Spirit, Air, and Quicksilver: The Search for the ‘Real’ Scale of Temperature,” Historical Studies in the Physical and Biological Sciences 31, no. 2 (2001): 249–84.

Thursday: Unpacking Our Code Libraries

On the use of industry-created tools for influential humanities scholarship. How did software for speech recognition and image tagging end up in a literary text analysis tool?


Selections from Richard So, Redlining Culture: A Data History of Racial Inequality and Postwar Fiction (Columbia University Press, 2020).

Ted Underwood, “Machine Learning and Human Perspective,” PMLA 135, no. 1 (January 2020): 92–109.


10:00 – 12:30
Sierra Eckert, Perkins Postdoctoral Fellow, Center for Digital Humanities
Ryan Heuser, Research Software Engineer, Center for Digital Humanities
Useful Natural Language Processing
Relevant files:


1:30 – 4:00
Sierra Eckert, Ryan Heuser
Named Entity Recognition, Geocoding, and Mapping
Relevant files:

Friday: Participant Projects

An opportunity for participants to workshop their own project ideas. Moderated by Natalia Ermolaev and Meredith Martin.


10:00 – 11:00
Roundtable: Computational & Digital Resources on Campus
Natalia Ermolaev, moderator
Ian Cosden, Director, Research Software Engineering for Computational & Data Science [slides]
Jennifer Grayburn, Assistant Director of Digital and Open Scholarship [slides]
Ben Johnston, Senior Educational Technologist, McGraw Center for Teaching & Learning [slides]
Ariel Ackerly, Makerspace Specialist, Princeton University Library [slides]
Jonathan Halverson, Research Software and Computing Training Lead, Princeton Institute for Computational Science & Engineering
11:00 – 12:30
Participant Project Consultations
Grant Wythoff, moderator
How can participants apply what they have learned to their own discipline?
What research projects or introductory course ideas do participants want to explore?
CDH staff will help participants scope their idea to attainable goals while identifying what support or additional training they need.


1:30 – 3:00
Participant project consultations, continued.
3:00 – 4:00
Share feedback for next summer’s Institute in this exit survey.