Quantifying the Gap: A Case Study of Wikidata Gender Disparities [conference paper]


17th International Symposium on Open Collaboration (OpenSym) - September 15-17, 2021


Charles Chuankai Zhang (Ph.D. student), Loren Terveen (professor)


Much prior research has found gender bias in peer production systems like Wikipedia and OpenStreetMap. This bias affects both women’s participation in these platforms and content about women on these platforms. We investigated the gender content gap in Wikidata, where fewer than 22% of items that represent people are about women. We asked: what is the source of this bias? Specifically, does it originate from the actions of Wikidata editors or from external factors; that is, does it simply reflect existing real-world gender bias? We conducted a quantitative case study that found: (i) the most popular categories of people included in Wikidata represent male-dominant professions, such as American football; (ii) within a selected set of professions for which we could obtain gender distribution data, Wikidata is no more biased than the real world: men and women are included at similar percentages, and the quality of items representing men and women is also similar. We provide possible explanations for our findings and discuss implications for addressing the Wikidata content gap.



human-computer interaction, Wikipedia