Publication

The Billion Object Platform (BOP): a system to lower barriers to support big, streaming, spatio-temporal data sources

Abstract
With funding from the Sloan Foundation and Harvard Dataverse, the Harvard Center for Geographic Analysis (CGA) has developed a big spatio-temporal data visualization platform called the Billion Object Platform, or "BOP". The goal of the project is to lower barriers for scholars who wish to access large, streaming, spatio-temporal datasets. A new platform was needed because streaming data grows large quickly once it is archived, and because most GIS systems do not support interactive visualization of millions of objects. The BOP is loaded with the latest billion geo-tweets and is fed a real-time stream of about 1 million tweets per day. The CGA has been harvesting and archiving geo-tweets since 2012. As tweets flow into the BOP, they are enriched with sentiment and census information to support further analysis. Incoming and intermediate data are streamed and stored in Apache Kafka. The core of the BOP is Apache Solr, which supports fast search. Some significant enhancements were made to Solr and contributed back, notably 2D "heatmap faceting" to support spatial visualization. The BOP fronts Solr with a RESTful web service, which provides a friendly and secure API that is accessed from a browser-based client. The client dynamically displays temporal and spatial distributions of results for result sets containing hundreds of millions of features. The system is open source and runs on commodity hardware. It is hosted on the Massachusetts Open Cloud (MOC), an OpenStack environment. All components are deployed in Docker containers orchestrated by Kontena.
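As a rough illustration of the 2D heatmap faceting mentioned in the abstract, the sketch below builds the request parameters for a Solr heatmap facet and expands the sparse grid that Solr returns. It is not the BOP's own code or API. The field name `coord` and the helper names are hypothetical, and the sketch assumes Solr's default JSON output, in which the heatmap arrives as a flat list of alternating names and values and rows with no hits are given as null.

```python
from urllib.parse import urlencode


def heatmap_params(field, min_x, min_y, max_x, max_y, query="*:*", grid_level=None):
    """Build Solr query parameters for a 2D heatmap facet over a bounding box."""
    params = {
        "q": query,
        "rows": 0,  # only the facet grid is wanted, not the matching documents
        "wt": "json",
        "facet": "true",
        "facet.heatmap": field,
        # Rectangle range syntax: lower-left corner TO upper-right corner.
        "facet.heatmap.geom": f'["{min_x} {min_y}" TO "{max_x} {max_y}"]',
        "facet.heatmap.format": "ints2D",
    }
    if grid_level is not None:
        params["facet.heatmap.gridLevel"] = grid_level
    return params


def pairs_to_dict(flat):
    """Turn a flat [name, value, name, value, ...] list into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))


def expand_counts(heatmap):
    """Replace null rows (no hits) with explicit rows of zeros."""
    columns = heatmap["columns"]
    rows = heatmap["rows"]
    grid = heatmap.get("counts_ints2D") or [None] * rows
    return [list(row) if row is not None else [0] * columns for row in grid]


if __name__ == "__main__":
    # "coord" is a hypothetical spatial field name used only for illustration.
    params = heatmap_params("coord", -180, -90, 180, 90, grid_level=3)
    print(urlencode(params))

    # A hand-written sample in the shape of a heatmap response fragment.
    sample = ["gridLevel", 2, "columns", 3, "rows", 2,
              "counts_ints2D", [None, [0, 5, 1]]]
    print(expand_counts(pairs_to_dict(sample)))
```

Because the grid has a fixed number of cells regardless of how many documents match, a client can draw a density map of a very large result set without retrieving individual features.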
Type
paper
article
License
http://creativecommons.org/licenses/by-sa/4.0/