Off-campus UMass Amherst users: To download campus access dissertations, please use the following link to log into our proxy server with your UMass Amherst user name and password.

Non-UMass Amherst users: Please talk to your librarian about requesting this dissertation through interlibrary loan.

Dissertations that have an embargo placed on them will not be available to anyone until the embargo expires.

Document Type

Campus-Only Access for One (1) Year

Degree Name

Doctor of Philosophy (PhD)

Degree Program

Computer Science

Year Degree Awarded

2017

Month Degree Awarded

September

First Advisor

Gerome Miklau

Second Advisor

Alexandra Meliou

Subject Categories

Computer Sciences | Databases and Information Systems

Abstract

Database usability has become an important research topic over the last decade. In the early days, database management systems were maintained by sophisticated users like database administrators. Today, due to the availability of data and computing resources, more non-expert users are involved in database computation. From their point of view, database systems lack ease of use. So researchers believe that usability is as important as the performance and functionality of databases and therefore developed many techniques such as natural language interface to enhance the ease of use of databases. In this thesis, we find some deeper technical issues in database usability, so we look at several core database technologies to further improve the ease of use of databases in two dimensions: we help users process data and exploit computing capacities.

We start by helping users find the data. In the real world, public data is everywhere on the Web, but it is scattered around. We extract a prototype relational knowledge base to solve this problem. We start from the most basic binary mapping relationships (sometimes also named bridge tables) between entities from the web. This mapping relationship facilitates many data transformation applications such as auto-correct, auto-fill, and auto-join.

After finding the data, we help users explore the data. When users issue queries to explore the data, their query results may contain too many items. So the system designer has to present a small subset of representative and diverse items rather than all items. This is known as the query result diversification problem. We propose the RC-Index, which helps to solve the diversification problem by significantly reducing the number of items that must be retrieved by the database to form a diverse set of a desired size. It is nearly an order of magnitude faster than the state-of-the-art and has a good performance guarantee, which improves the ease of use of databases in terms of querying.

Finally, we shift our focus from data to computing capacities. We propose a framework to help users choose configurations in the cloud. Cloud computing has revolutionized data analysis, but choosing the right configuration is challenging because the common pricing mechanism of the public cloud is too complicated. Users have to consider low-level resources to find the best plan for their computational tasks. To address this issue, we propose a new market-based framework for pricing computational tasks in the cloud. We introduce agents to help users configure their personalized databases, which improves the ease of use of databases in the cloud.

Share

COinS