We’ll maintain here a running list of resources at the intersection of . . .

Image created by Nick Budak. Inspired by the first issue of Computers & the Humanities, a journal that ran from 1966–2004.

. . . particularly those that focus on 1) the uses of machine learning in humanities research, and 2) critical, historical, and ethical frameworks for analyzing the consequences of machine learning techniques being deployed in the world today.


Readings 📖

A bibliography we recommend is the Critical Dataset Studies Reading List, run by the Knowing Machines project.

See also…

Benedetta Brevini, Is AI Good for the Planet? (Polity, 2022).

The Data Responsibly Series, comic books published by NYU’s Center for Responsible AI.

Lise Jaillant, ed., Archives, Access and Artificial Intelligence: Working with Born-Digital and Digitized Archival Collections (Bielefeld University Press, 2022).

Mimi Onuoha and Mother Cyborg, A People’s Guide to AI (Allied Media Projects, 2018).

Amandalynne Paullada et al., “Data and Its (Dis)Contents: A Survey of Dataset Development and Use in Machine Learning Research,” Patterns, no. 2 (November 12, 2021): 1–14.

Inioluwa Deborah Raji et al., “AI and the Everything in the Whole Wide World Benchmark,” 35th Conference on Neural Information Processing Systems (NeurIPS 2021).

John Tasioulas, “The Role of the Arts and Humanities in Thinking about Artificial Intelligence,” Blog of the APA (blog), August 11, 2022.

Matthew K. Gold, ed., Debates in the Digital Humanities (U of Minnesota Press, 2012).

Matthew K. Gold and Lauren Klein, eds., Debates in the Digital Humanities (U of Minnesota Press, 2016).

Matthew K. Gold and Lauren Klein, eds., Debates in the Digital Humanities (U of Minnesota Press, 2019).

Sandeep Soni, Lauren F. Klein, and Jacob Eisenstein, “Abolitionist Networks: Modeling Language Change in Nineteenth-Century Activist Newspapers,” Journal of Cultural Analytics 6, no. 1 (January 18, 2021).

Data Sets & Data Work 💾

The Center for Digital Humanities publishes datasets through research partnerships and curation grants. They can be viewed here. The CDH also curates a list of Humanities Datasets in Context, organized by discipline and annotated with information about data format and collection practices.

Tony Chu and Stephanie Yee, “A Visual Introduction to Machine Learning,” R2D3.

Amy A. Winecoff and Elizabeth Anne Watkins, “Artificial Concepts of Artificial Intelligence: Institutional Compliance and Resistance in AI Startups,” in Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society, 2022, 788–99.

Evan Odell’s Hansard Speeches 1979–2020, version 3.0.1.

Campus Resources

General Portals for Humanities Data

Tools ⚒️

The following tools and platforms were mentioned throughout the institute:

Technical Instruction ⚙️

Python Notebooks & Textbooks

Quinn Dombrowski curates a list of Jupyter Notebooks for digital humanities. These notebooks, in the spirit of “literate programming,” contain data files and executable code interwoven with textual descriptions of what’s happening. Dombrowski also curates a list of resources specifically for multilingual NLP.

Peace Ossom-Williamson and Kenton Rambsy, The Data Notebook (Mavs Open Press, 2021), a platform-agnostic “online suite of open interactive resources” on the principles of data storytelling.

Nick Montfort, Exploratory Programming for the Arts and Humanities (Cambridge, MA: MIT Press, 2016).

Stéfan Sinclair and Geoffrey Rockwell, The Art of Literary Text Analysis (2018), a series of linked Jupyter notebooks.

Folgert Karsdorp, Mike Kestemont, and Allen Riddell, Humanities Data Analysis: Case Studies With Python (Princeton University Press, 2021), available as a print book and an online Jupyter Book. Includes an introduction to probability.

Melanie Walsh’s Introduction to Cultural Analytics and Python [book-length course]

W.J.B. Mattingly’s Introduction to Python for Text Analysis [book-length course]


Most of these workshops are now conducted online via Zoom and require an application and registration fee. The CDH offers Graduate Training Grants to offset the cost of attending these workshops.

Online Tutorials

Be sure to check out Foundations and Applications of Humanities Analytics, co-taught at the Santa Fe Institute by David Kinney, one of the instructors at our Institute. This Foundations course “aims to empower scholars in the humanities by eliminating the black box of computational text analysis.” All lectures are available as a YouTube playlist.

CUNY’s Digital Humanities Resource Initiative (DHRI) includes guided tutorials on the following topics:

Programming Historian features lots of excellent peer-reviewed workshops in English, Spanish, French, and Portuguese. Some highlights from the 91 lessons currently available in English include:

Library Carpentry’s lessons are geared towards library and information science professionals. They include:

BERT for Humanists provides tutorials in natural language processing (NLP) and machine learning for a humanities audience. Workshops (with Google Colab Notebooks) include:

Constellate is a platform for text analytics developed by ITHAKA (the organization behind JSTOR). Tutorials include:

Finally, some miscellaneous tutorials we’d recommend include:

Examples of Humanities Research Using AI & ML 🧪

DeepMind’s Ithaca, a project that uses neural networks to restore and attribute texts from Ancient Greece as an aid to epigraphy.

MIT researchers using AI to decode the still-undeciphered Indus script. The paper published in Science concludes that the script is likely not just symbols, but language.

AEOLIAN (Artificial Intelligence for Cultural Organisations), including this writeup on use of AI with medieval manuscripts.

Ben Lee’s video walkthrough of the Newspaper Navigator ML tool and the data paper behind the Newspaper Navigator dataset, “Compounded Mediation: A Data Archaeology of the Newspaper Navigator Dataset.”