CS&E Colloquium: Social Science as a Problem Space for Natural Language Processing
The computer science colloquium takes place on Mondays from 11:15 a.m. to 12:15 p.m. This week's speaker, Alexander Hoyle (ETH Zürich AI Center), will give a talk titled "Social Science as a Problem Space for Natural Language Processing".
Abstract
Methods in natural language processing (NLP) have matured to the point where they can address complex real-world problems. However, the process of advancing machine learning and NLP relies on the evaluation of constrained and often artificial tasks that may bear no clearly valid relationship to real-world problems. This disconnect leads to failures in generalization and limits methods’ utility.
In contrast, the social sciences provide a rich problem space, where questions of validity are at the center: what and how should we measure? Here, moving from language data to quantifiable social constructs demands complex reasoning over language. The premise underpinning this talk is that an effective way to advance NLP as a field is to anchor it in the needs of social science. An emphasis on operational validity helps mitigate NLP's benchmark myopia while also advancing the study of social phenomena. The talk will focus on contributions to two core activities within computational social science (CSS): the inductive development of latent constructs in text and the measurement of those constructs. Crucially, both are underpinned by human-centered validation; I will show how such an orientation led to a rethinking of standard topic model evaluation practices.
Biography
Alexander Hoyle is a postdoctoral fellow at the ETH Zürich AI Center, working with groups at both the Center for Law and Economics and the Institute for Machine Learning. He received his Ph.D. in Computer Science from the University of Maryland in 2024, where he was advised by Philip Resnik. His research focuses on developing and evaluating methods for computational social science that remain sensitive to the need for interpretability and validity. His work has appeared in major NLP and machine learning conferences.