Document Type

Campus Access

Degree Program

Electrical & Computer Engineering

Degree Type

Master of Science in Electrical and Computer Engineering (M.S.E.C.E.)

Year Degree Awarded

2013

Month Degree Awarded

May

Keywords

RDF, SPARQL, MapReduce, Cloud Computing

Abstract

Social network data analysis is becoming increasingly important. To improve the integration and reuse of their data, many social networks have started to use RDF to represent it. Accordingly, one common approach to social network data analysis is to query the RDF data with SPARQL.

As social networks grow rapidly, queries need to be executed in parallel, for example with the MapReduce framework. However, the state-of-the-art translation from SPARQL queries to MapReduce jobs mainly follows a two-layer rule, in which SPARQL is first translated to SQL joins, and this approach is not efficient. In this thesis, we introduce two primitives that enable automatic translation from SPARQL to MapReduce and efficient execution of SPARQL queries. We substitute multiple-join-with-filter for the traditional SQL multiple join when feasible, and we merge different stages in the MapReduce query workflow. Evaluation on social network benchmarks shows that these two primitives achieve up to a 2x speedup in query running time compared with the original two-layer scheme.
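
The abstract does not define the two primitives in detail, so the following is only an illustrative sketch of the general idea of combining a join with a filter in a single MapReduce-style job, not the thesis's implementation. It simulates the map, shuffle, and reduce phases in plain Python for a hypothetical query that joins the pattern "?p knows ?f" with "?f age ?a" under the filter ?a > min_age. The toy triples, the predicate names, and the functions map_phase, reduce_phase, and run_job are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("bob", "knows", "carol"),
    ("bob", "age", 25),
    ("carol", "age", 41),
]


def map_phase(triple, min_age):
    """Emit (join_key, tagged_value) pairs keyed on the shared variable ?f.

    The filter is applied here, in the map phase, so triples that cannot
    contribute to the result are never shuffled to a reducer.
    """
    s, p, o = triple
    if p == "knows":
        yield o, ("knows", s)   # ?f is the object of "knows"
    elif p == "age" and o > min_age:
        yield s, ("age", o)     # ?f is the subject of "age"


def reduce_phase(key, values):
    """Join the two tagged value lists that share the same ?f key."""
    people = [v for tag, v in values if tag == "knows"]
    ages = [v for tag, v in values if tag == "age"]
    for person in people:
        for age in ages:
            yield (person, key, age)


def run_job(triples, min_age):
    """Simulate one MapReduce job: map, group by key (shuffle), reduce."""
    groups = defaultdict(list)
    for triple in triples:
        for key, value in map_phase(triple, min_age):
            groups[key].append(value)
    results = []
    for key in sorted(groups):
        results.extend(reduce_phase(key, groups[key]))
    return sorted(results)


if __name__ == "__main__":
    for row in run_job(TRIPLES, 30):
        print(row)
```

In this sketch the filter and the join share one job; a two-step plan would instead join first and filter the joined rows afterwards, which moves more intermediate data between stages.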

First Advisor

Lixin Gao
