Publication Date

1996

Abstract

There are many historical manuscripts written in a single hand which it would be useful to index. Examples include the W. E. B. Du Bois collection at the University of Massachusetts and the early Presidential libraries at the Library of Congress. The standard technique for indexing documents is to scan them in, convert them to machine-readable form (ASCII) using Optical Character Recognition (OCR), and then index them using a text retrieval engine. However, OCR does not work well on handwriting. Here an alternative scheme is proposed for indexing such texts. Each page of the document is segmented into words. The images of the words are then matched against each other to create equivalence classes (each equivalence class contains multiple instances of the same word). The user then provides ASCII equivalents for, say, the top 2000 equivalence classes. This paper deals with the matching aspects of this process. Because even a single person's handwriting varies, matching is expected to be the most difficult step in the whole process. A matching technique based on Euclidean distance mapping is discussed. Experiments are shown demonstrating the feasibility of the approach.
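The abstract names Euclidean distance mapping as the matching technique but does not spell out the scoring, so the following is only a minimal Python sketch under stated assumptions, not the paper's implementation. It assumes word images are binary, already segmented, equal in size, and aligned. Two images are compared by XOR-ing them and summing the Euclidean distance map (EDM) of the XOR image, so that solid regions of disagreement cost more than scattered single-pixel noise. Equivalence classes are then formed by linking any pair whose score falls at or below a threshold. The function names and the threshold are hypothetical and chosen for illustration.

```python
from math import sqrt
from typing import Dict, List

# Binary word image: 1 = ink pixel, 0 = background (assumed encoding).
Image = List[List[int]]


def xor_image(a: Image, b: Image) -> Image:
    """Pixel-wise XOR of two word images.

    Assumes both images were already segmented, cropped to the same size,
    and aligned; that preprocessing is not shown here.
    """
    return [[pa ^ pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def euclidean_distance_map(img: Image) -> List[List[float]]:
    """Brute-force Euclidean distance map.

    Every 1-pixel receives its distance to the nearest 0-pixel inside the
    image; 0-pixels receive 0. Pixels deep inside a blob therefore score
    higher than isolated or edge pixels.
    """
    h, w = len(img), len(img[0])
    zeros = [(r, c) for r in range(h) for c in range(w) if img[r][c] == 0]
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if img[r][c]:
                if zeros:
                    out[r][c] = min(sqrt((r - zr) ** 2 + (c - zc) ** 2) for zr, zc in zeros)
                else:
                    # No background at all: use the image size as a cap (hypothetical choice).
                    out[r][c] = float(max(h, w))
    return out


def edm_dissimilarity(a: Image, b: Image) -> float:
    """Sum of the EDM of the XOR image, normalized by the number of pixels."""
    dm = euclidean_distance_map(xor_image(a, b))
    return sum(map(sum, dm)) / (len(a) * len(a[0]))


def equivalence_classes(images: List[Image], threshold: float) -> List[List[int]]:
    """Group image indices whose pairwise dissimilarity is <= threshold.

    Uses union-find, so grouping is transitive (single linkage); the
    threshold is a hypothetical tuning parameter.
    """
    parent = list(range(len(images)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(images)):
        for j in range(i + 1, len(images)):
            if edm_dissimilarity(images[i], images[j]) <= threshold:
                parent[find(i)] = find(j)

    classes: Dict[int, List[int]] = {}
    for i in range(len(images)):
        classes.setdefault(find(i), []).append(i)
    # Largest classes first, since those are the ones worth labelling by hand.
    return sorted(classes.values(), key=len, reverse=True)
```

With this scoring a single stray pixel adds 1 to the sum, while a solid 3x3 patch of disagreement adds 10, because its center pixel lies two pixels from the nearest agreeing pixel. The brute-force distance map is quadratic in the pixel count and is meant only to make the idea concrete; a practical system would use a linear-time distance transform such as `scipy.ndimage.distance_transform_edt`.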

Comments

This paper was harvested from CiteSeer

Recommended Citation

Manmatha, R., "Indexing Handwriting Using Word Matching" (1996). Computer Science Department Faculty Publication Series. 203.
Retrieved from https://scholarworks.umass.edu/cs_faculty_pubs/203


Included in

Computer Sciences Commons


ScholarWorks@UMass Amherst

Computer Science Department Faculty Publication Series

Indexing Handwriting Using Word Matching
