Access Type

Open Access

Document Type

Thesis

Degree Program

Electrical & Computer Engineering

Degree Type

Master of Science (M.S.)

Year Degree Awarded

2012

Month Degree Awarded

September

Keywords

search engine, image retrieval, random walk, book indexing, large scale, cloud computing

Abstract

Search engines play an important role in daily life. As multimedia content has become increasingly popular, search engines have been developed for images and videos. In the first part of this thesis, I propose a prototype book image search engine. I discuss tag representations for book images and how to apply a probabilistic model to generate image tags. I then propose a random walk refinement method that uses a tag similarity graph. The image search system is built on the Galago search engine developed in the UMass CIIR lab.
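The abstract does not give the exact formulation of the refinement step, so the following is only a minimal sketch of one common form of random walk refinement (a walk with restart over a weighted tag graph). The function name `refine_tag_scores`, the damping parameter `alpha`, and the example tags and weights are all assumptions made for illustration, not details taken from the thesis.

```python
def refine_tag_scores(initial, similarity, alpha=0.85, iterations=50):
    """Hypothetical random-walk-with-restart refinement of tag scores.

    initial:    dict tag -> initial score (e.g. from a probabilistic model)
    similarity: dict tag -> dict of neighbor tag -> non-negative weight
    alpha:      weight of the graph walk; (1 - alpha) restarts at the
                initial scores (assumed value, not from the thesis)
    """
    total = sum(initial.values())
    if total <= 0:
        raise ValueError("initial scores must have a positive sum")
    # Normalize the initial scores so they form a distribution.
    prior = {tag: score / total for tag, score in initial.items()}

    # Row-normalize similarity weights into transition probabilities,
    # ignoring neighbors that have no initial score.
    transition = {}
    for tag in prior:
        row = {u: w for u, w in similarity.get(tag, {}).items() if u in prior}
        row_sum = sum(row.values())
        transition[tag] = (
            {u: w / row_sum for u, w in row.items()} if row_sum > 0 else {}
        )

    scores = dict(prior)
    for _ in range(iterations):
        # Restart term: every tag keeps a share of its initial score.
        updated = {tag: (1 - alpha) * prior[tag] for tag in prior}
        # Walk term: each tag passes score to its similar tags.
        for tag, row in transition.items():
            for neighbor, prob in row.items():
                updated[neighbor] += alpha * scores[tag] * prob
        scores = updated
    return scores


# Made-up example: "cover" and "spine" are similar, "cat" is isolated.
initial = {"cover": 0.6, "spine": 0.3, "cat": 0.1}
similarity = {"cover": {"spine": 1.0}, "spine": {"cover": 1.0}}
refined = refine_tag_scores(initial, similarity, iterations=200)
```

In this toy example the isolated tag loses most of its score, while the two mutually similar tags reinforce each other. Tags with no outgoing edges simply drop their walk mass in this sketch; a real system would need to decide how to handle them.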

Considering the large amount of data that search engines need to process, in the second part of this thesis I bring in a cloud environment for large-scale distributed computing. I discuss two models: the MapReduce model, currently one of the most popular technologies in the IT industry, and the Maiter model. The asynchronous accumulative update mechanism of the Maiter model is a great fit for the random walk refinement process, which takes up 84% of the entire run time, and it accelerates the refinement process by a factor of 46.
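To illustrate why an accumulative update suits a random walk, here is a single-process sketch of a delta-based update: each tag keeps a running value plus a pending delta, and deltas can be applied in any order. It is not Maiter's API or the thesis implementation. The name `accumulative_refine`, the largest-delta-first scheduling, the `alpha` and `tolerance` values, and the example data are assumptions for illustration only.

```python
def accumulative_refine(prior, transition, alpha=0.85, tolerance=1e-12):
    """Hypothetical delta-accumulation version of a random walk with restart.

    prior:      dict tag -> restart probability (should sum to 1)
    transition: dict tag -> dict of neighbor tag -> transition probability;
                every neighbor must also be a key of `prior`
    """
    value = {tag: 0.0 for tag in prior}
    # Each tag starts with its restart share as a pending delta.
    delta = {tag: (1 - alpha) * p for tag, p in prior.items()}

    while True:
        # Assumed scheduling policy: process the largest pending delta.
        # Any order that eventually visits every pending tag also works,
        # which is what makes asynchronous execution possible.
        tag = max(delta, key=delta.get)
        pending = delta[tag]
        if pending < tolerance:
            break
        # Accumulate the pending change into the tag's value.
        value[tag] += pending
        delta[tag] = 0.0
        # Forward a damped share of the change to neighboring tags.
        for neighbor, prob in transition.get(tag, {}).items():
            delta[neighbor] += alpha * pending * prob
    return value


# Made-up example: "cover" and "spine" point at each other, "cat" is isolated.
prior = {"cover": 0.6, "spine": 0.3, "cat": 0.1}
transition = {"cover": {"spine": 1.0}, "spine": {"cover": 1.0}, "cat": {}}
values = accumulative_refine(prior, transition)
```

Because each update only adds a delta to a running total, no global barrier between iterations is needed. In a distributed setting, workers could apply deltas independently as they arrive, which is the property the abstract attributes to the asynchronous accumulative mechanism.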

DOI

https://doi.org/10.7275/3273265

First Advisor

James Allan

Second Advisor

Lixin Gao
