Document Type

Open Access Dissertation

Degree Name

Doctor of Philosophy (PhD)

First Advisor

Joe Pater

Second Advisor

Gaja Jarosz

Third Advisor

Cameron Musco

Subject Categories

Computational Linguistics | Phonetics and Phonology

Abstract

This dissertation explores the possibility that the phonological grammar manipulates phone representations that are based on learned distributional class memberships rather than on substantive linguistic features. In doing so, it makes three primary contributions. First, I propose three novel algorithms for learning a phonological class system from the distributional statistics of a language, all of which work by partitioning graph representations of phone distributions. Second, I propose a new method for fitting Maximum Entropy phonotactic grammars, MaxEntGrams, which offers theoretical complexity improvements over the widely adopted approach of Hayes and Wilson [2008]. Third, I present a series of computational experiments that fit MaxEntGram models built on learned phonological class systems to English, Polish, and Korean, and evaluate how well the resulting grammars predict existing experimental results on sonority projection. The results suggest that models with learned class systems predict human-like sonority projection behavior as well as the standard approach, which uses traditional linguistic feature specifications, in English and Korean, and better than the traditional approach in Polish. This success is attributed in part to the fact that combining phonological class learning with MaxEntGrams eliminates the need for constraint-induction heuristics. Altogether, none of the tested cases provides evidence that phonotactic models built on traditional, substantive linguistic feature systems predict human behavior better than models that use distributionally defined phone representations.
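
The abstract builds on Maximum Entropy phonotactic grammars in the style of Hayes and Wilson [2008], in which a form's harmony is the weighted sum of its constraint violations and its probability is proportional to exp(-harmony). The minimal sketch below illustrates only that scoring step, with constraints stated over phone classes. The class names (`V`, `C`, `LIQ`, `STOP`), the two bigram constraints, and their weights are hypothetical placeholders standing in for a learned class system; this is not the dissertation's MaxEntGram fitting method or any of its class-learning algorithms.

```python
import math
from itertools import product

# Hypothetical phone classes standing in for a learned class system
# (chosen by hand for illustration, not learned from data).
CLASSES = {
    "V": {"a", "i", "u"},
    "C": {"p", "t", "k", "l", "r"},
    "LIQ": {"l", "r"},
    "STOP": {"p", "t", "k"},
}

# Each hypothetical constraint penalizes an adjacent pair of classes;
# the weights are made up, not fitted.
CONSTRAINTS = [
    (("LIQ", "STOP"), 3.0),  # *[liquid][stop]
    (("C", "C"), 1.0),       # *[consonant][consonant]
]

def violations(word, pair):
    """Count adjacent phone pairs whose members fall in pair[0] then pair[1]."""
    first, second = pair
    return sum(
        1
        for x, y in zip(word, word[1:])
        if x in CLASSES[first] and y in CLASSES[second]
    )

def harmony(word):
    """Weighted sum of constraint violations."""
    return sum(w * violations(word, pair) for pair, w in CONSTRAINTS)

def distribution(candidates):
    """Normalize exp(-harmony) over a finite candidate set."""
    scores = {w: math.exp(-harmony(w)) for w in candidates}
    z = sum(scores.values())
    return {w: s / z for w, s in scores.items()}

# Toy usage: all three-phone strings over a single-character phone inventory.
PHONES = "aiuptklr"
candidates = ["".join(s) for s in product(PHONES, repeat=3)]
probs = distribution(candidates)
print(probs["tra"] > probs["rta"])  # True: harmony 1.0 vs 4.0
```

In this toy setup, "tra" violates only the consonant-consonant constraint while "rta" also violates the liquid-stop constraint, so the grammar prefers the rising-sonority cluster, loosely mirroring the kind of contrast that sonority projection experiments probe. The sketch normalizes over a small finite candidate set for simplicity; an actual phonotactic model normalizes over a much larger space of possible forms.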


Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Available for download on Friday, September 01, 2023