
Author ORCID Identifier

https://orcid.org/0000-0002-9861-3483

Access Type

Open Access Dissertation

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Degree Program

Computer Science

Year Degree Awarded

2021

Month Degree Awarded

February

First Advisor

Brendan O'Connor

Subject Categories

Artificial Intelligence and Robotics

Abstract

Natural language processing (NLP) systems are now ubiquitous. Yet the benefits of these language technologies do not accrue evenly to all users, and indeed they can be harmful; NLP systems reproduce stereotypes, prevent speakers of non-standard language varieties from participating fully in public discourse, and re-inscribe historical patterns of linguistic stigmatization and discrimination. How harms arise in NLP systems, and who is harmed by them, can only be understood at the intersection of work on NLP, fairness and justice in machine learning, and the relationships between language and social justice. In this thesis, we address two questions at this intersection: i) How can we conceptualize harms arising from NLP systems? and ii) How can we quantify such harms? We make the following contributions. First, we develop a model to collect the first large dataset of African American Language (AAL)-like social media text. We use this dataset to quantify the performance of two types of NLP systems, identifying disparities in model performance between Mainstream U.S. English (MUSE)- and AAL-like text. Turning to the landscape of bias in NLP more broadly, we then provide a critical survey of the emerging literature on bias in NLP and identify its limitations. Drawing on work across sociology, sociolinguistics, linguistic anthropology, social psychology, and education, we provide an account of the relationships between language and injustice, propose a taxonomy of harms arising from NLP systems grounded in those relationships, and propose a set of guiding research questions for work on bias in NLP. Finally, we adapt the measurement modeling framework from the quantitative social sciences to effectively evaluate approaches for quantifying bias in NLP systems. We conclude with a discussion of recent work on bias through the lens of style in NLP, raising a set of normative questions for future work.

DOI

https://doi.org/10.7275/20410631

Creative Commons License

Creative Commons Attribution 4.0 License
