Off-campus UMass Amherst users: To download campus access dissertations, please use the following link to log into our proxy server with your UMass Amherst user name and password.

Non-UMass Amherst users: Please talk to your librarian about requesting this dissertation through interlibrary loan.

Dissertations that have an embargo placed on them will not be available to anyone until the embargo expires.



Access Type

Open Access Thesis

Document Type


Degree Program

Electrical & Computer Engineering

Degree Type

Master of Science in Electrical and Computer Engineering (M.S.E.C.E.)

Year Degree Awarded


Month Degree Awarded



In recent years with the exponential growth in web-based applications the amount of data generated has increased tremendously. Quick and accurate analysis of this 'big data' is indispensable to make better business decisions and reduce operational cost. The challenges faced by modern day data centers to process big data are multi fold: to keep up the pace of processing with increased data volume and increased data velocity, deal with system scalability and reduce energy costs. Today's data centers employ a variety of distributed computing frameworks running on a cluster of commodity hardware which include general purpose processors to process big data. Though better performance in terms of big data processing speed has been achieved with existing distributed computing frameworks, there is still an opportunity to increase processing speed further. FPGAs, which are designed for computationally intensive tasks, are promising processing elements that can increase processing speed. In this thesis, we discuss how FPGAs can be integrated into a cluster of general purpose processors running iterative algorithms and obtain high performance.

In this thesis, we designed a heterogeneous cluster comprised of FPGAs and CPUs and ran various benchmarks such as PageRank, Katz and Connected Components to measure the performance of the cluster. Performance improvement in terms of execution time was evaluated against a homogeneous cluster of general purpose processors and a homogeneous cluster of FPGAs. We built multiple four-node heterogeneous clusters with different configurations by varying the number of CPUs and FPGAs.

We studied the effects of load balancing between CPUs and FPGAs. We obtained a speedup of 20X, 11.5X and 2X for PageRank, Katz and Connected Components benchmarks on a cluster cluster configuration of 2 CPU + 2 FPGA for an unbalancing ratio against a 4-node homogeneous CPU cluster. We studied the effect of input graph partitioning, and showed that when the input is a Multilevel-KL partitioned graph we obtain an improvement of 11%, 26% and 9% over randomly partitioned graph for Katz, PageRank and Connected Components benchmarks on a 2 CPU + 2 FPGA cluster.


First Advisor

Russell G Tessier

Second Advisor

Lixin Gao

Third Advisor

David Irwin