Doctoral Dissertations

Sociolinguistically Driven Approaches for Just Natural Language Processing

Su Lin Blodgett, University of Massachusetts Amherst

Author ORCID Identifier

https://orcid.org/0000-0002-9861-3483

AccessType

Open Access Dissertation

Document Type

dissertation

Degree Name

Doctor of Philosophy (PhD)

Degree Program

Computer Science

Year Degree Awarded

2021

Month Degree Awarded

February

First Advisor

Brendan O'Connor

Subject Categories

Artificial Intelligence and Robotics

Abstract

Natural language processing (NLP) systems are now ubiquitous. Yet the benefits of these language technologies do not accrue evenly to all users, and indeed they can be harmful; NLP systems reproduce stereotypes, prevent speakers of non-standard language varieties from participating fully in public discourse, and re-inscribe historical patterns of linguistic stigmatization and discrimination. How harms arise in NLP systems, and who is harmed by them, can only be understood at the intersection of work on NLP, fairness and justice in machine learning, and the relationships between language and social justice. In this thesis, we propose to address two questions at this intersection: i) How can we conceptualize harms arising from NLP systems?, and ii) How can we quantify such harms? We propose the following contributions. First, we contribute a model in order to collect the first large dataset of African American Language (AAL)-like social media text. We use the dataset to quantify the performance of two types of NLP systems, identifying disparities in model performance between Mainstream U.S. English (MUSE)- and AAL-like text. Turning to the landscape of bias in NLP more broadly, we then provide a critical survey of the emerging literature on bias in NLP and identify its limitations. Drawing on work across sociology, sociolinguistics, linguistic anthropology, social psychology, and education, we provide an account of the relationships between language and injustice, propose a taxonomy of harms arising from NLP systems grounded in those relationships, and propose a set of guiding research questions for work on bias in NLP. Finally, we adapt the measurement modeling framework from the quantitative social sciences to effectively evaluate approaches for quantifying bias in NLP systems. We conclude with a discussion of recent work on bias through the lens of style in NLP, raising a set of normative questions for future work.

DOI

https://doi.org/10.7275/20410631

Recommended Citation

Blodgett, Su Lin, "Sociolinguistically Driven Approaches for Just Natural Language Processing" (2021). Doctoral Dissertations. 2092.
https://doi.org/10.7275/20410631 https://scholarworks.umass.edu/dissertations_2/2092

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Included in

Artificial Intelligence and Robotics Commons

ScholarWorks@UMass Amherst
