For this reason, learning from demonstration (LfD) has become a popular alternative to traditional robot programming methods, aiming to provide a natural mechanism for quickly teaching robots. By simply showing a robot how to perform a task, users can teach new tasks as needed, without any special knowledge of robotics. Unfortunately, LfD often yields little semantic knowledge about the world, and thus lacks robust generalization capabilities, especially for complex, multi-step tasks.

To address this shortcoming of LfD, we present a series of algorithms that draw from recent advances in Bayesian nonparametric statistics and control theory to automatically detect and leverage repeated structure at multiple levels of abstraction in demonstration data. The discovery of repeated structure provides critical insights into task invariants, features of importance, high-level task structure, and appropriate skills for the task. This culminates in the discovery of semantically meaningful skills that are flexible and reusable, providing robust generalization and transfer in complex, multi-step robotic tasks. These algorithms are tested and evaluated using a PR2 mobile manipulator, showing success on several complex real-world tasks, such as furniture assembly.

It is straightforward to construct differentially private algorithms for many common tasks, and published algorithms exist for a variety of tasks under differential privacy. However, methods for designing error-optimal algorithms for most non-trivial tasks are still unknown. In particular, we are interested in error-optimal algorithms for sets of linear queries. A linear query is a sum of counts of tuples that satisfy a certain condition, which covers many common aggregation tasks, including counts, sums, and histograms. We present the matrix mechanism, a novel mechanism for answering sets of linear queries under differential privacy. The matrix mechanism makes a clear distinction between the set of queries submitted by users, called the query workload, and an alternative set of queries to be answered under differential privacy, called the query strategy. The answer to the query workload can then be computed from the answer to the query strategy. Given a query workload, the query strategy determines the distribution of the output noise, and the power of the matrix mechanism comes from adaptively choosing a strategy that minimizes that noise.

Our analyses also provide a theoretical measure of the quality of different strategies for a given workload. This measure is then used in exact and approximate formulations of the optimization problem that outputs the error-optimal strategy. We present a lower bound on the error of answering each workload under the matrix mechanism. The bound reveals that the hardness of a query workload is related to the spectral properties of the workload when it is represented in matrix form. In addition, we design an approximation algorithm whose generated strategies outperform state-of-the-art mechanisms under (epsilon, delta)-differential privacy. These strategies lead to more accurate data analysis while preserving a rigorous privacy guarantee. Moreover, we combine the matrix mechanism with a novel data-dependent algorithm, which achieves differential privacy by adding noise that is adapted to both the input data and the given query workload.
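The mechanism described above admits a compact sketch: answer the strategy queries with calibrated Laplace noise, estimate the underlying data vector by least squares, and then evaluate the workload on that estimate. All names below are illustrative stand-ins, not the thesis's actual implementation.

```python
import numpy as np

def matrix_mechanism(x, W, A, epsilon, seed=0):
    """Answer workload W over data vector x via strategy A under
    epsilon-differential privacy. The L1 sensitivity of the strategy
    is the maximum column L1-norm of A."""
    rng = np.random.default_rng(seed)
    sensitivity = np.abs(A).sum(axis=0).max()
    noisy = A @ x + rng.laplace(scale=sensitivity / epsilon, size=A.shape[0])
    x_hat = np.linalg.pinv(A) @ noisy   # least-squares estimate of the data
    return W @ x_hat                    # noisy answers to the workload

# Example: three range queries over a 4-bin histogram, answered
# through the identity strategy (per-bin counts).
x = np.array([10., 5., 8., 2.])
W = np.array([[1., 1., 0., 0.],
              [0., 1., 1., 0.],
              [1., 1., 1., 1.]])
A = np.eye(4)
print(matrix_mechanism(x, W, A, epsilon=1.0))
```

Adaptivity enters by replacing the identity strategy `A` with one tuned to `W`, e.g. a hierarchical set of range queries.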

The goal of this work is to explore the effects of personalization and privacy preservation methods on three information retrieval applications, namely search task identification, task-aware query recommendation, and searcher frustration detection. We pursue this goal by first introducing a novel framework called CrowdLogging for logging and aggregating data privately over a distributed set of users. We then describe several privacy mechanisms for sanitizing global data, including one novel mechanism based on differential privacy. We present a template for describing how local user data and global aggregate data are collected, processed, and used within an application, and apply this template to our three applications.

We find that sanitizing feature vectors aggregated across users has a low impact on performance for classification applications (search task identification and searcher frustration detection). However, sanitizing free-text query reformulations is extremely detrimental to performance for the query recommendation application we consider. Personalization is useful to some degree in all the applications we explore when integrated with global information, achieving gains for search task identification, task-aware query recommendation, and searcher frustration detection.
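As a hedged illustration of the kind of sanitization evaluated for the classification applications, the sketch below adds Laplace noise to per-feature counts aggregated across users, assuming each user contributes at most one to each count; it is not the exact CrowdLogging mechanism, and all names are placeholders.

```python
import numpy as np

def sanitize_counts(counts, epsilon, seed=0):
    """Release per-feature counts aggregated across users via the
    Laplace mechanism, assuming each user contributes at most 1 to
    each count (sensitivity 1). Negative noisy counts are clipped."""
    rng = np.random.default_rng(seed)
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=len(counts))
    return np.clip(np.round(noisy), 0, None)

# Hypothetical aggregated feature counts for a classifier.
print(sanitize_counts(np.array([120., 3., 45.]), epsilon=0.5))
```

Large counts survive the noise nearly intact, which is consistent with the low impact observed for classification features; rare free-text strings do not, which is why query reformulations suffer.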

Finally, we introduce an open source system called CrowdLogger that implements the CrowdLogging framework and also serves as a platform for conducting in-situ user studies of search behavior, prototyping and evaluating information retrieval applications, and collecting labeled data.

This thesis demonstrates the potential for system-level power analysis---the inference of a computer's internal states based on power observations at the "plug." It also examines which hardware components and software workloads have the greatest impact on information leakage. This thesis identifies the potential for privacy violations by demonstrating that a malicious party could identify which webpage from a given corpus a user is viewing with greater than 99% accuracy. It also identifies constructive applications for power analysis, evaluating its use as an anomaly detection mechanism for embedded devices with greater than 94% accuracy for each device tested. Finally, this thesis includes modeling work that correlates AC and DC power consumption to pinpoint which components contribute most to information leakage, and analyzes software workloads to identify which classes of work lead to the most information leakage.

Understanding the security and privacy risks and opportunities that come with energy-proportional computing will allow future systems to either apply system-level power analysis fruitfully or thwart its malicious application.

This thesis presents three new optimization techniques designed to deal with different aspects of structured queries. The first technique involves manipulation of interpolated subqueries, a common structure found across a large number of retrieval models today. We then develop an alternative scoring formulation to make retrieval models more responsive to dynamic pruning techniques. The last technique is delayed execution, which focuses on the class of queries that utilize term dependencies and term conjunction operations. In each case, we empirically show that these optimizations can significantly improve query processing efficiency without negatively impacting retrieval effectiveness.
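To make the first two techniques concrete, the sketch below shows an interpolated subquery as a weighted sum of component scores, together with the kind of upper-bound test a dynamic pruning strategy applies. Names and signatures are illustrative, not Julien's API.

```python
def interpolated_score(scores, weights):
    """An interpolated subquery: a weighted sum of component scores,
    as in mixture-style smoothing of a document model."""
    return sum(w * s for w, s in zip(weights, scores))

def can_prune(threshold, accumulated, remaining_max):
    """Dynamic-pruning test: a document can be skipped if even the most
    optimistic completion of its score cannot beat the current top-k
    entry threshold."""
    return accumulated + remaining_max <= threshold

print(interpolated_score([1.0, 3.0], [0.25, 0.75]))  # 2.5
print(can_prune(10.0, 4.0, 5.0))                     # True: skip this document
```

A scoring formulation is "responsive" to pruning when tight `remaining_max` bounds are cheap to maintain, which is what the alternative formulation aims to provide.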

Additionally, we implement these optimizations in the context of a new retrieval system known as Julien. As opposed to implementing these techniques as one-off solutions hard-wired to specific retrieval models, we treat each technique as a "behavioral" extension to the original system. This allows us to flexibly stack the modifications to use the optimizations in conjunction, increasing efficiency even further. By focusing on the behaviors of the objects involved in the retrieval process instead of on the details of the retrieval algorithm itself, we can recast these techniques to be applied only when the conditions are appropriate. Finally, the modular design of these components illustrates a system design that allows improvements to be implemented without disturbing the existing retrieval infrastructure.

This thesis presents a system for uncertain data processing that has two key functionalities: (i) capturing and transforming raw noisy data into rich, queryable tuples that carry the attributes needed for query processing with quantified uncertainty, and (ii) performing query processing on such tuples, capturing changes in uncertainty as data passes through various query operators. The proposed system considers data naturally captured by continuous distributions, which is prevalent in sensing and scientific applications.

The first part of the thesis addresses data capture and transformation by proposing a probabilistic modeling and inference approach. Since this task is application-specific and requires domain knowledge, this approach is demonstrated for RFID data from mobile readers. More specifically, the proposed solution involves an inference and cleaning substrate to transform raw RFID data streams to object location tuple streams where locations are inferred from raw noisy data and their uncertain values are captured by probability distributions.

The second, and main, part of this thesis examines query processing for uncertain data modeled by continuous random variables. The proposed system includes new data models and algorithms for relational processing, with a focus on aggregation and conditioning operations. For operations of high complexity, optimizations including approximations with guaranteed error bounds are considered. Complex queries involving a mix of operations are then addressed by query planning, which, given a query, finds an efficient plan that meets user-defined accuracy requirements.
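As a minimal example of aggregation over continuous random variables, the sketch below computes the exact SUM of independent Gaussian-distributed attributes, where means and variances simply add. Real operators must also handle correlation and conditioning; the names here are illustrative.

```python
import math

def sum_gaussians(attrs):
    """Exact SUM aggregate over independent Gaussian attributes given as
    (mean, variance) pairs: the result is Gaussian with the means and
    the variances summed."""
    mean = sum(m for m, _ in attrs)
    var = sum(v for _, v in attrs)
    return mean, var

# Three uncertain sensor readings, each (mean, variance).
mean, var = sum_gaussians([(10.0, 1.0), (12.5, 0.25), (9.0, 4.0)])
print(mean, math.sqrt(var))  # 31.5 and std ~2.29
```

Closed forms like this are the easy case; it is operations without them, such as conditioning, that motivate the approximations with guaranteed error bounds.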

Besides relational processing, this thesis also provides support for user-defined functions (UDFs) on uncertain data, which aims to compute the output distribution given uncertain input and a black-box UDF. The proposed solution employs a learning-based approach using Gaussian processes to compute approximate output with error bounds, along with a suite of optimizations for high performance in online settings such as data stream processing and interactive data analysis.

The techniques proposed in this thesis are thoroughly evaluated using both synthetic data with controlled properties and various real-world datasets from the domains of severe weather monitoring, object tracking using RFID readers, and computational astrophysics. The experimental results show that these techniques can yield high accuracy, meet stream speeds, and outperform existing techniques such as Monte Carlo sampling for many important workloads.

Cloud platforms are particularly suited for such applications due to their ability to provision capacity when needed and charge for usage on a pay-per-use basis. Cloud environments enable elastic provisioning by providing a variety of hardware configurations as well as mechanisms to add or remove server capacity.

The first part of this thesis presents Kingfisher, a cost-aware system that provides a generalized provisioning framework for supporting elasticity in the cloud by (i) leveraging multiple mechanisms to reduce the time to transition to new configurations, and (ii) optimizing the selection of a virtual server configuration that minimizes cost.
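The configuration-selection step can be illustrated with a toy version of the cost-minimization problem: choose the cheapest combination of server types whose total capacity meets demand. This unbounded-knapsack sketch is only illustrative; Kingfisher's actual formulation is an integer program that also accounts for transition costs, and all names and prices below are hypothetical.

```python
def min_cost_provisioning(server_types, demand):
    """Cheapest combination of server types whose total capacity covers
    the demand, via unbounded-knapsack dynamic programming.
    server_types: list of (capacity, hourly_cost) pairs."""
    INF = float("inf")
    best = [0.0] + [INF] * demand
    for need in range(1, demand + 1):
        for cap, cost in server_types:
            covered = max(0, need - cap)  # demand left after adding one server
            best[need] = min(best[need], best[covered] + cost)
    return best[demand]

# Small, medium, and large instance types: (capacity in req/s, $/hour).
print(min_cost_provisioning([(100, 0.10), (250, 0.20), (600, 0.45)], 900))
```

Note that the cheapest mix here (one large plus smaller instances, $0.75/hour) beats simply replicating one instance type, which is exactly the kind of choice a cost-aware provisioner exploits.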

The majority of these enterprise applications, deployed as web applications, are distributed or replicated with a multi-tier architecture. SLAs for such applications are often expressed as a high percentile of a performance metric; e.g., the 99th percentile of end-to-end response time must be less than 1 second. In the second part of this thesis I present a model-driven technique that provisions a multi-tier application for such an SLA and is targeted at cloud platforms.

Enterprises critically depend on these applications and often own large IT infrastructures to support their regular operation. However, provisioning for a peak load, or for a high percentile of response time, can be prohibitively expensive. Thus there is a need for a hybrid cloud model, where the enterprise uses its own private resources for the majority of its computing but "bursts" into the cloud when local resources are insufficient. I discuss a new system, namely Seagull, which performs dynamic provisioning over a hybrid cloud model by enabling cloud bursting.

Finally, I describe a methodology to model the configuration patterns (i.e., deployment topologies) of the different control plane services of a cloud management system itself. I present a generic methodology, based on empirical profiling, that provides the initial deployment configuration of a control plane service, along with a mechanism that iteratively adjusts the configuration to avoid violating the control plane's Service Level Objective (SLO).

TPCs' dependence solely on transient, harvested power offers several important design-time benefits. For example, omitting batteries saves board space and weight while obviating the need to make devices physically accessible for maintenance. However, transient power may provide an unpredictable supply of energy that makes operation difficult. A predictable energy supply is a key abstraction underlying most electronic designs. TPCs discard this abstraction in favor of opportunistic computation that takes advantage of available resources. A crucial question is: how should a software-controlled computing device operate if it depends completely on external entities for power and other resources? The question poses challenges for computation, communication, storage, and other aspects of TPC design.

The main idea of this work is that software techniques can make energy harvesting a practicable form of power supply for electronic devices. Its overarching goal is to facilitate the design and operation of usable TPCs.

This thesis poses a set of challenges that are fundamental to TPCs, then pairs these challenges with approaches that use software techniques to address them. To address the challenge of computing steadily on harvested power, it describes Mementos, an energy-aware state-checkpointing system for TPCs. To address the dependence of opportunistic RF-harvesting TPCs on potentially untrustworthy RFID readers, it describes CCCP, a protocol and system for safely outsourcing data storage to RFID readers that may attempt to tamper with data. Additionally, it describes a simulator that facilitates experimentation with the TPC model, and a prototype computational RFID that implements the TPC model.
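The checkpointing idea behind Mementos can be sketched in a few lines: at trigger points inserted into the program, measure the remaining supply voltage and save state to non-volatile memory only when power failure appears imminent. The threshold, names, and data structures below are hypothetical stand-ins, not the real instrumented-C implementation.

```python
CHECKPOINT_THRESHOLD_V = 2.2   # hypothetical capacitor-voltage threshold

saved = []   # stand-in for non-volatile (flash) storage

def maybe_checkpoint(voltage, state):
    """At a trigger point, save program state iff the supply voltage has
    fallen below the threshold, i.e., power failure is imminent."""
    if voltage < CHECKPOINT_THRESHOLD_V:
        saved.append(dict(state))
        return True
    return False

# Voltage decays as the storage capacitor drains between trigger points.
checkpointed = False
for v in (3.0, 2.6, 2.1):
    checkpointed = maybe_checkpoint(v, {"pc": 0x1234, "volts": v})
print(saved)   # only the final, low-voltage trigger checkpoints
```

Checkpointing only near the threshold is the key trade-off: writing to flash costs energy, so saving too eagerly would itself hasten power loss.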

To show that TPCs can improve existing electronic devices, this thesis describes applications of TPCs to implantable medical devices (IMDs), a challenging design space in which some battery-constrained devices completely lack protection against radio-based attacks. TPCs can provide security and privacy benefits to IMDs by, for instance, cryptographically authenticating other devices that want to communicate with the IMD before allowing the IMD to use any of its battery power. This thesis describes a simplified IMD that lacks its own radio, saving precious battery energy and therefore size. The simplified IMD instead depends on an RFID-scale TPC for all of its communication functions.

TPCs are a natural area of exploration for future electronic design, given the parallel trends of energy harvesting and miniaturization. This work aims to establish and evaluate basic principles by which TPCs can operate.

We also explore several tertiary effects, focusing on the impact that gaming and game mechanics have on various aspects of this model acquisition process. We discuss explicit game mechanics that were implemented in the source ITS from which our data was collected. Students who are given our system with game mechanics contribute more data while also performing higher-quality work. Additionally, we define a novel type of game called a knowledge-refinement game (KRG), which motivates subject matter experts (SMEs) to contribute to an already-constructed EEKB for the purpose of refining the model in areas where confidence is low. Experimental work with the KRG provides strong evidence that: 1) the quality of the original EEKB was indeed strong, as validated by KRG players, and 2) both the quality and breadth of knowledge within the EEKB are increased when players use the KRG.

The first part of the thesis addresses multiagent single-step decision-making problems where a single joint decision is required for the plan. We examine these decision-theoretic problems within the broad frameworks of distributed constraint optimization and Markov random fields. Such models succinctly capture the structure of interaction among different decision variables, which is subsequently exploited by algorithms to enhance scalability. The algorithms presented in this thesis are rigorously grounded in concepts from mathematical programming and optimization.
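For intuition, the sketch below solves a tiny distributed constraint optimization problem by exhaustive enumeration: maximize the sum of factor utilities over all joint assignments. The thesis's algorithms exploit the interaction structure precisely to avoid this exponential search; all names and utilities here are illustrative.

```python
import itertools

def solve_dcop(domains, factors):
    """Maximize the sum of factor utilities over all joint assignments
    of the variables (exhaustive search over the joint domain)."""
    variables = sorted(domains)
    best_value, best_assignment = float("-inf"), None
    for values in itertools.product(*(domains[v] for v in variables)):
        assignment = dict(zip(variables, values))
        total = sum(f(assignment) for f in factors)
        if total > best_value:
            best_value, best_assignment = total, assignment
    return best_assignment, best_value

# Two agents pick radio channels; a shared factor rewards avoiding a
# collision, and a unary factor encodes agent a1's preference.
domains = {"a1": [0, 1], "a2": [0, 1]}
factors = [lambda s: 5 if s["a1"] != s["a2"] else 0,
           lambda s: 2 if s["a1"] == 1 else 0]
best, value = solve_dcop(domains, factors)
print(best, value)  # a1 takes channel 1, a2 takes channel 0
```

Because each factor touches only a few variables, message-passing algorithms on the factor graph can find the same optimum without enumerating the joint space.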

The second part of the thesis addresses multiagent sequential decision-making problems under uncertainty and partial observability. We use decentralized partially observable Markov decision processes (Dec-POMDPs) to formulate multiagent planning problems. To address the challenge of NEXP-hard complexity while pushing the envelope of scalability, we represent the domain structure in a multiagent system using graphical models such as dynamic Bayesian networks and constraint networks. By exploiting such graphical planning representations in an algorithmic framework composed of techniques from different sub-areas of artificial intelligence, machine learning, and operations research, we show impressive gains in scalability and in the range of problems addressed, and we enable quality-bounded solutions for multiagent decision-theoretic planning.

Our contributions for sequential decision making include: a) efficient dynamic programming algorithms for finite-horizon decision making, resulting in significantly increased scalability w.r.t. the number of agents and multiple orders-of-magnitude speedups over previous best approaches; b) probabilistic-inference-based algorithms for infinite-horizon decision making, yielding new insights connecting inference techniques from the machine learning literature to multiagent systems; and c) scalable mathematical-programming-based techniques for quality-bounded solutions in multiagent systems, a problem previously considered intractable.

Several of our contributions are among the first for their respective class of problems. For example, we show for the first time how machine learning is closely related to multiagent decision making via a maximum likelihood formulation of the planning problem. We develop new graphical models and machine-learning-based inference algorithms for large factored planning problems. We also show for the first time how the problem of optimizing agents' policies can be formulated as a compact mixed-integer program, yielding optimal solutions for a range of Dec-POMDP benchmarks.

In summary, we present a synthesis of different techniques from multiple sub-areas of AI, ML and OR to address the scalability and efficiency of algorithms for decision-theoretic reasoning and planning in multiagent systems. Such advances have already shown great promise to bridge the gap between multiagent systems and real-world applications.
