
Efficient Social Network Data Query Processing on MapReduce

Abstract
Social network data analysis is becoming increasingly important. To improve the integration and reuse of their data, many social networks have begun to use RDF to represent it. Accordingly, one common approach to social network data analysis is to query the RDF data with SPARQL. As social networks grow rapidly, these queries need to be executed in parallel, for example on the MapReduce framework. However, the state-of-the-art translation from SPARQL queries to MapReduce jobs mainly follows a two-layer scheme in which SPARQL is first translated into SQL joins, and this approach is not efficient. In this thesis, we introduce two primitives that enable automatic translation from SPARQL to MapReduce as well as efficient execution of SPARQL queries. Where feasible, we replace the traditional SQL multiple join with a multiple-join-with-filter primitive, and we merge different stages of the MapReduce query workflow. An evaluation on social network benchmarks shows that these two primitives achieve up to a 2x speedup in query running time compared with the original two-layer scheme.
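The abstract does not describe how multiple-join-with-filter is implemented. The sketch below is our own illustration of the general idea, under stated assumptions: several SPARQL triple patterns that share one variable are joined in a single MapReduce job, instead of as a chain of pairwise SQL joins with one job each. The map, shuffle, and reduce phases are simulated in memory, and all function names (`match`, `map_phase`, `multi_join_with_filter`, and so on) are hypothetical. This is not the thesis's implementation.

```python
from collections import defaultdict
from itertools import product

# Assumption: a triple pattern is (subject, predicate, object);
# terms starting with "?" are variables, everything else is a constant.
def match(pattern, triple):
    """Return the variable bindings if the triple matches the pattern, else None."""
    binding = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if binding.get(p, t) != t:  # a repeated variable must bind consistently
                return None
            binding[p] = t
        elif p != t:
            return None
    return binding

def map_phase(triples, patterns, join_var):
    """Map: emit (join key, (pattern index, binding)) for each pattern a triple matches."""
    for triple in triples:
        for i, pat in enumerate(patterns):
            b = match(pat, triple)
            if b is not None and join_var in b:
                yield b[join_var], (i, b)

def shuffle(pairs):
    """Group mapper output by key, as the MapReduce shuffle would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, n_patterns):
    """Reduce: filter out keys missing any pattern, then combine the bindings."""
    for _key, values in groups.items():
        per_pattern = [[] for _ in range(n_patterns)]
        for i, b in values:
            per_pattern[i].append(b)
        if any(not bs for bs in per_pattern):
            continue  # filter: this key cannot produce a complete answer
        for combo in product(*per_pattern):
            merged, ok = {}, True
            for b in combo:
                for var, val in b.items():
                    if merged.setdefault(var, val) != val:
                        ok = False  # conflicting bindings for a shared variable
            if ok:
                yield merged

def multi_join_with_filter(triples, patterns, join_var):
    """Hypothetical single-job star join over all patterns that share join_var."""
    mapped = map_phase(triples, patterns, join_var)
    return list(reduce_phase(shuffle(mapped), len(patterns)))

# Usage: find people (?x) who know someone (?y) and have a recorded age (?a).
triples = [
    ("alice", "knows", "bob"),
    ("alice", "age", "30"),
    ("bob", "knows", "carol"),
    ("carol", "age", "25"),
    ("alice", "name", "Alice"),
]
patterns = [("?x", "knows", "?y"), ("?x", "age", "?a")]
result = multi_join_with_filter(triples, patterns, "?x")
print(result)
```

Because every pattern is keyed on the same variable, a single shuffle serves the whole star-shaped pattern, whereas a pairwise plan would need one job per join. The filter in the reducer discards keys such as `bob` and `carol`, which match only one of the two patterns, before any combinations are built.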
Type
campus
article
thesis
Date
2013-01-01