Off-campus UMass Amherst users: To download campus access dissertations, please use the following link to log into our proxy server with your UMass Amherst user name and password.

Non-UMass Amherst users: Please talk to your librarian about requesting this dissertation through interlibrary loan.

Dissertations that have an embargo placed on them will not be available to anyone until the embargo expires.

Date of Award


Access Type

Open Access Dissertation

Document type


Degree Name

Doctor of Philosophy (PhD)

Degree Program

Computer Science

First Advisor

Allen R. Hanson

Second Advisor

Erik G. Learned-Miller

Subject Categories

Computer Sciences


Although an automated reader for the blind first appeared nearly two-hundred years ago, computers can currently “read” document text about as well as a sevenyear-old. Scene text recognition brings many new challenges. A central limitation of current approaches is a feed-forward, bottom-up, pipelined architecture that isolates the many tasks and information involved in reading. The result is a system that commits errors from which it cannot recover and has components that lack access to relevant information.

We propose a system for scene text reading that in its design, training, and operation is more integrated. First, we present a simple contextual model for text detection that is ignorant of any recognition. Through the use of special features and data context, this model performs well on the detection task, but limitations remain due to the lack of interpretation. We then introduce a recognition model that integrates several information sources, including font consistency and a lexicon, and compare it to approaches using pipelined architectures with similar information. Next we examine a more unified detection and recognition framework where features are selected based on the joint task of detection and recognition, rather than each task individually. This approach yields better results with fewer features. Finally, we demonstrate a model that incorporates segmentation and recognition at both the character and word levels. Text with difficult layouts and low resolution are more accurately recognized by this integrated approach. By more tightly coupling several aspects of detection and recognition, we hope to establish a new unified way of approaching the problem that will lead to improved performance. We would like computers to become accomplished grammar-school level readers.