Off-campus UMass Amherst users: To download campus access theses, please use the following link to log into our proxy server with your UMass Amherst user name and password.
Non-UMass Amherst users: Please talk to your librarian about requesting this thesis through interlibrary loan.
Theses that have an embargo placed on them will not be available to anyone until the embargo expires.
Electrical & Computer Engineering
Master of Science in Electrical and Computer Engineering (M.S.E.C.E.)
Year Degree Awarded
Month Degree Awarded
RDF, SPARQL, MapReduce, Cloud Computing
Social network data analysis becomes increasingly important today. In order to improve the integration and reuse of their data, many social networks start to apply RDF to present the data. Accordingly, one common approach for social network data analysis is to employ SPARQL to query RDF data.
As the sizes of social networks expand rapidly, queries need to be executed in parallel such as using the MapReduce framework. However, the state-of-the-art translation from SPARQL queries to MapReduce jobs mainly follows a two layer rule, in which SPARQL is first translated to SQL join, is not efficient. In this thesis, we introduce two primitives to enable automatic translation from SPARQL to MapReduce, and to enable efficient execution of the SPARQL queries. We use multiple-join-with-filter to substitute traditional SQL multiple join when feasible, and merge different stages in the MapReduce query workflow. The evaluation on social network benchmarks shows that these two primitives can achieve up to 2x speedup in query running time compared with the original two layer scheme.