Minnesota Natural Language Processing Seminar Series: Diversity-Informed Dialogue Generation
The Minnesota Natural Language Processing (NLP) Seminar is a venue for faculty, postdocs, students, and anyone else interested in theoretical, computational, and human-centric aspects of natural language processing to exchange ideas and foster collaboration. Talks are held every other Friday from 12 p.m. to 1 p.m. during the Fall 2021 semester.
This week's speaker, Katie Stasaski (University of California, Berkeley), will give a talk titled "Diversity-Informed Dialogue Generation."
Automated generation of conversational dialogue often produces uninteresting, predictable responses; this is known as the diversity problem. We introduce a new strategy to address this problem, called Diversity-Informed Data Collection (DIDC). Unlike prior approaches, which modify model architectures to solve the problem, this method uses dynamically computed corpus-level statistics to determine which conversational participants to collect data from. DIDC produces significantly more diverse data than baseline data collection methods and produces better results on two downstream tasks. This method is generalizable and can be used with other corpus-level metrics.
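The abstract does not spell out which corpus-level statistic or selection rule DIDC uses, so the following is only a minimal sketch of the general idea, not the speaker's implementation. It assumes a distinct-n ratio as the diversity metric and a leave-one-out score for ranking participants; the names distinct_n, participant_scores, and select_participants are hypothetical.

```python
from collections import Counter


def distinct_n(utterances, n=1):
    """Ratio of unique n-grams to total n-grams in a corpus (0.0 if empty).

    Assumption: this stands in for whatever corpus-level metric is used.
    """
    counts = Counter()
    for text in utterances:
        tokens = text.lower().split()
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    total = sum(counts.values())
    return len(counts) / total if total else 0.0


def participant_scores(corpus, metric=distinct_n):
    """Score each participant by how much the corpus metric drops without them.

    corpus maps a participant id to the list of utterances they contributed.
    """
    full = metric([u for utts in corpus.values() for u in utts])
    scores = {}
    for pid in corpus:
        rest = [u for other, utts in corpus.items() if other != pid for u in utts]
        scores[pid] = full - metric(rest)
    return scores


def select_participants(corpus, k=1, metric=distinct_n):
    """Return the k participants whose data contributes most to the metric."""
    scores = participant_scores(corpus, metric)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

In a collection loop, the scores would be recomputed as new utterances arrive, and further data would be requested only from the selected participants. Because the metric is passed in as a parameter, another corpus-level statistic could be substituted for the distinct-n ratio.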
Katie Stasaski is a sixth-year Ph.D. student at UC Berkeley, advised by Marti Hearst. She is interested in the intersection of natural language processing and education. Her past work has focused on increasing the diversity of dialogue systems and generating complex questions. She is fortunate to be funded by an NSF GRFP and a Chancellor's Fellowship, in addition to an Amazon Machine Learning Research Award.