Computer Science Department Dissertations Collection

Permanent URI for this collection

https://hdl.handle.net/20.500.14394/213

Browse

From LoRa Sensing to Coexistence of LoRa Sensing and Communication
(2024-09) Xie, Binbin
Wireless sensing is an exciting new research area which can benefit a large spectrum of disciplines including elderly care, HCI, environment monitoring, and disaster response. The key difference between wireless sensing and traditional sensor-based sensing is that the target does not need to be equipped with any sensors and the wireless signal itself is utilized to sense the context information of humans. The rationale behind wireless sensing is that wireless signals vary with human movement. For instance, when a person moves in a room covered by WiFi, the WiFi signal reflected from this person varies with his/her movement. By analyzing the signal variation, the motion information such as target moving speed and respiration rate can be obtained. The contact-free and sensor-free nature of wireless sensing makes it particularly appealing in challenging scenarios such as pandemic and disaster survivor detection. During the COVID-19 pandemic, it is preferred that the patients' respiration rates can be monitored in a contact-free manner through walls. In disasters such as building collapse where the survivors do not have any sensors with them, wireless sensing can be crucial in detecting their presence and saving lives. % While promising in many aspects, there are several critical issues that hinder wireless sensing from being widely deployed in real-life scenarios. % critical issues still exist. These issues include (1) very limited sensing range due to the intrinsic nature of employing weak reflection signals for sensing; (2) strong interference from other objects in the environment; and (3) severe degradation of sensing performance in the presence of ongoing communication function of wireless technologies. This thesis explores the exciting opportunity of employing LoRa~--~the emerging wireless protocol designed for IoT device connections~--~to realize long-range wide-area wireless sensing. This thesis addresses these fundamental issues by making the following contributions. First, we adopt a chirp concentration scheme which fully exploits the property of LoRa chirp to improve the signal power and accordingly boost the sensing range. Second, to mitigate the impact of interference, we propose the concept of ``virtual fence'' to constrain sensing only within the area of interest. The location and size of virtual fence can be flexibly controlled in software to meet the requirements of different applications. Finally, to make LoRa-based wireless sensing work in the presence of ongoing communication, we propose to employ the reversed chirp, i.e., downchirp, for sensing and keep the original upchirp for communication. This design smartly leverages the orthogonality between downchirp and upchirp to address the issue of communication interference on sensing. While the upchirp-downchirp design can remove most of the interference, we further adopt a novel chirp rotation method to deal with the remaining power leakage interference from upchirp to downchip, enhancing the sensing performance.
LTE-Based Pervasive Sensing across Indoor and Outdoor
(2024-09) Feng, Yuda
Besides the communication function, wireless signals, such as WiFi, Bluetooth and UWB, are recently exploited for sensing purposes. However, designing a wireless sensing system that provides truly pervasive coverage at city or even national scale is still challenging. In this dissertation, we propose to involve the pervasive LTE signals into the ecosystem of wireless sensing, and enable various sensing applications on human, vehicles, and agriculture. In the first part, we exploit the unique advantages of downlink LTE sensing in movement detection across different scales, and resolve the corresponding challenges. We demonstrate the advantages of LTE sensing using two typical applications, finegrained indoor respiration monitoring and large-scale outdoor car speed estimation. The proposed system achieves highly accurate respiration sensing with the common problems, blind spot and orientation-sensitive issues, greatly mitigated. In the second part, we propose to combat inherent limitations of downlink LTE sensing, i.e., the low signal quality due to long propagation and significant variations across different areas. We found the key insight in the unique asymmetric downlink and uplink transmissions. Accordingly, we propose to leverage the complementary features of LTE uplink and downlink signals on signal power, bandwidth and sensing rate. We propose noise-resistant combination algorithms and develop robust LTE sensing, expanding the sensing coverage to more than 4 times and extending to general dynamic movement detection. In the last part, we enable LTE sensing on non-movement physical property detection, soil moisture in agriculture. Soil moisture sensing is a basic function in modern precision irrigation. Multiple wireless soil moisture sensing solutions such as WiFi and RFID have been proposed, which, however, can hardly support large scale deployment in farmfield environments. LTE signal provides a unique opportunity for soil moisture sensing as the ubiquitously deployed base stations. We for the first time propose low-cost and low-power LTE based soil moisture sensing. Our low-cost sensing system ($55) achieves a high accuracy (3.15%) comparable to high-end soil sensors ($850), wide coverage (2.4 km from the base station) and low power consumption (lasting 16 months using batteries).
Resource Management for Edge AI
(2024-09) Liang, Qianlin
With the proliferation of IoT devices and the continuous advancement of AI al- gorithms, edge AI, which represents the synergy of edge computing and artificial intelligence, has garnered increasing attention from both academia and industry. By pushing AI frontier to the edge ecosystem which is closer to users, edge AI provides substantial benefits such as low-latency inference, reduced network bandwidth us- age, and enhanced user privacy. However, deploying compute-intensive AI models on resource-constrained edge platforms presents substantial challenges to resource man- agement, which plays a key role in realizing the benefits and ensuring the success of edge systems. It is imperative to efficiently schedule and share the heterogeneous and limited edge resources, including emerging specialized AI accelerators such as GPUs and TPUs, to adapt to the dynamic edge workloads and satisfy their low-latency requirements. Additionally, energy, particularly for battery-powered edge devices, must be considered as a scarce resource, necessitating efficient operation to support the long-term execution of workloads. This thesis addresses pivotal challenges of resource management in Edge AI. By optimizing resource and energy efficiency for AI applications within the constraints of edge computing environments, this thesis aims to enhance hardware utilization, reduce costs, and improve application performance and reliability.
Developing Digital Biomarkers of Early Childhood Mental Health using Multimodal Sensor Data
(2024-09) Kalanadhabhatta, Manasa
Pediatric mental health is a growing concern around the world, with mental, emotional, and behavioral disorders affecting children's social-emotional development and increasing the risk of adverse behavioral outcomes later in life. However, diagnosing mental health disorders in early childhood remains challenging. Caregivers are often unable to accurately identify signs of problematic behavior, and many lack access to specialized screening services. Digital biomarkers from passively sensed signals collected using smartphones and wearable devices have shown remarkable promise for mental health screening at scale. Nevertheless, such digital mental health tools are yet to make a significant mark in pediatric settings. While this may partly be driven by caregivers' perspectives toward such tools, the fact that children rarely tend to be independent users of mobile and wearable devices is also a key deterrent to developing scalable digital biomarkers of mental health in younger populations. In this thesis, I attempt to bridge this pediatric mental health diagnosis gap by developing novel digital tools that enable screening for problem behaviors in a convenient and scalable manner. These screening tools leverage multimodal signals that can be recorded using ubiquitous devices in the home while children are engaged in brief, clinically validated play-based interactions. I establish the technical feasibility of developing machine learning models to detect interaction-based biomarkers of attention-deficit/hyperactivity, disruptive behavior, and other externalizing disorders using behavioral (audio, video) and physiological (heart rate, electrodermal activity) signals. I incorporate these biomarkers into three new home-based assessments that can be realized using off-the-shelf mobile and wearable devices to predict not just behavioral symptoms but also their neurophysiological underpinnings, thus providing richer insight into the trajectories of early problem behaviors. To facilitate the integration of these next-generation screening tools into existing mental healthcare ecosystems, I further outline design recommendations for such tools by distilling findings from stakeholder studies involving parents and child mental health practitioners. This work thus sets the stage for ubiquitous technologies that can obtain rich, multidimensional data in the wild and enable screening for early childhood mental health concerns at scale.
Leveraging Explanations for Information Retrieval Systems under Data Scarcity
(2024-09) Yu, Puxuan
The importance of explanations in the advance of information retrieval (IR) systems is on the rise. On one hand, this is driven by the increasing complexity of IR systems and the demand for transparency and interpretability from users; on the other hand, explanations can inherently improve the effectiveness of IR systems without necessarily being displayed to users. However, the scarcity of data poses significant challenges in developing these explanations, as acquiring high-quality explanations for relevance judgments is prohibitively expensive yet crucial for training neural network-based IR models and explanation generation models. To overcome these challenges, we utilize open-domain knowledge and generative language models to facilitate the generation of user-oriented explanations for various IR tasks limited by data availability. We start by introducing a novel model-agnostic task for search result explanations that emphasizes context-aware summaries, detailing each document's relevance to the query and other documents. To address this task, we design a novel Transformer-based encoder-decoder architecture. Next, we develop an inherently explainable IR model specifically designed to provide diversified reranking of retrieved documents. This model is pre-trained on open-domain data using explanation tasks, achieving state-of-the-art results in search result diversification with minimal domain-specific data. Additionally, we explore how natural language explanations can enhance the capabilities of generative language models to augment IR datasets through synthetic query generation, achieved by automatically identifying similarities and differences between document pairs. Finally, we utilize zero-shot generative language models to directly elicit natural language explanations of relevance between search queries and candidate documents, providing crucial auxiliary information for the calibration of neural ranking models and thus enhancing their ability to generate meaningful scores.
Context-Aware Query and Document Representation in Information Retrieval Systems
(2024-09) Naseri, Shahrzad
Input representation has a major impact on the effectiveness of Information Retrieval (IR) systems. Further, developing a context-aware input representation for IR systems is crucial to answering user's complicated information need. The goal of this work is to take advantage of the \textit{contextual features} to represent the query and document to enhance the information retrieval systems performance. We focus on three sources of \textit{contextual} features: 1. Entities, defined as things or concepts that exist in the world; 2. Context within pseudo-relevant feedback document in IR systems; and 3. Context within example documents provided by user as the IR system's input. We first introduce a dense entity representation based on the relationships between an entity and other entities described within its summary. We explore its use in the entity ranking task by representing both queries and documents using this model. By integrating this ranking methodology with a term-based ranking method, we achieved statistically significant improvements over the term-based ranking approach. Further, we developed a retrieval model that merges term-based language model retrieval, word-based embedding ranking, and entity-based embedding ranking, resulting in the best performance. Additionally, we introduce an entity-based query expansion framework employing local and global entity knowledge sources; i.e. corpus-based indexed entities and the summary-expanded entity embedding. Our results demonstrate our entity-based expansion framework outperforms the learned combination of word-based expansion techniques. Then we focus on leveraging the context of pseudo-relevance feedback documents (PRF) for ranking relevant terms to the user's query. To achieve this, we utilize transformer models, which excel at capturing context through their attention mechanisms, and expand the query with top-ranked terms. We propose both unsupervised and supervised frameworks. Our unsupervised model employs transformer-generated embeddings to calculate the similarity between a term (from a PRF document) and the query, while considering the term's context within the document. Our results demonstrate that this unsupervised approach outperforms static embedding-based expansion models and performs competitively with state-of-the-art word-based feedback models, relevance model variants, across multiple collections. The supervised framework approaches query expansion as a binary classification task, aiming to identify terms within the PRF documents relevant to the query. We utilize transformer models in a cross-attention architecture to predict relevancy scores for candidate terms. This supervised approach yields performance comparable to term frequency-based feedback models, relevance model variant. Moreover, combining it with the relevance model results in even greater improvement than either model used independently. Finally, we concentrate on leveraging the context of the example documents provided by the user in the query-by-example retrieval problem to formulate a latent query that represents the user's information needs. We construct three query-by-example datasets and develop several transformer-based re-ranking architectures. Our Passage Relevancy Representation by Multiple Examples (PRRIME) overcomes BERT's context window limitations by segmenting query example and candidate documents into passages. It then trains an end-to-end neural ranking architecture to aggregate passage-level relevance representations, demonstrating improvement over the first-stage ranking framework. Additionally, we explore a cross-encoder reranking architecture using the Longformer transformer model for query-by-example retrieval, aiming to capture cross-text relationship, particularly aligning or linking matching information elements across documents. This shows statistically significant improvement on the test set of the dataset which it is trained on but performs not as well as the baseline on the other two datasets which have limited fine-tuning data, indicating limited knowledge transferability. Finally, we investigate a dual-encoder reranking architecture that learns query and document representations through an auxiliary training paradigm. It uses query prediction as an auxiliary task alongside the ranking objective as the main task. It outperforms both the initial retrieval stage and the single-loss training method - i.e training the dual encoders solely with a ranking objective.
Advancing Acoustic Sensing from the Laboratory to Real World: Theories, Applications, and Practical Considerations
(2024-09) Li, Dong
With the proliferation of voice assistants, speakers and microphones are essential components in billions of smart devices that people interact with on a daily basis, such as smartphones, smart watches, smart speakers, home appliances, etc. This dissertation explores the transformation of these devices from simple audio tools into sophisticated acoustic radars, expanding their applications beyond basic audio playback and voice interactions to include gesture tracking, vital sign monitoring, and eye blink detection. We address fundamental technical challenges and practical considerations, which not only resolve existing system limitations but also facilitate the creation of new applications. One major challenge in acoustic sensing is tracking multiple targets simultaneously due to the inherent nature of contact-free tracking. Signals reflected from multiple targets are mixed at the microphone, and thus, it is difficult to separate them to obtain the context information of each individual target. FM-Track pioneers in enabling contact-free multi-target tracking using acoustic signals. A signal model is introduced to characterize the location and motion status of targets by fusing the information from multiple dimensions (i.e., range, velocity, and angle of targets). Then a series of techniques are developed to separate signals reflected from multiple targets and accurately track each individual target. FM-Track can successfully differentiate two targets with a spacing as small as 1 cm. Another significant challenge for acoustic sensing is the extremely limited sensing range, particularly for fine-grained activities due to weak signal reflections. LASense dramatically increases the sensing range for fine-grained human activities by introducing a virtual transceiver idea that purely leverages delicate signal processing techniques in software. LASense can significantly increase the sensing range of respiration monitoring from the state-of-the-art 2 m to 6 m, and enhance the sensing range for finger tapping and eye blink detection by 150% and 80%, respectively. Additionally, this dissertation demonstrates how to apply acoustic sensing techniques to enable new applications, i.e., “listening” to your hand gestures using smart speakers. In SpeakerGesture, we develop a series of novel signal processing techniques and implement our system on two commodity smart speaker prototypes. SpeakerGesture can achieve over 90% accuracy in gesture recognition even when the user is 4 m away from the smart speaker and there is strong interference. At last, this dissertation shares the experience and findings when transitioning acoustic sensing systems from laboratory settings to real-world environments. We identify multiple practical considerations that were not paid attention to in the research community and propose the corresponding solutions. The challenges include: (i) there exists annoying audible sound leakage caused by acoustic sensing; (ii) acoustic sensing actually affects music play and voice calls; (iii) acoustic sensing consumes a significant amount of power, degrading the battery life; (iv) real-world device mobility can fail acoustic sensing.
Modeling Cross-Lingual Knowledge in Multilingual Information Retrieval Systems
(2024-09) Huang, Zhiqi
In many search scenarios, language can become a barrier to comprehensively fulfilling users’ information needs. An Information Retrieval (IR) system equipped with an extra component of language translation is capable of mapping words in different languages, enabling it to retrieve documents according to the user's query regardless of the language in which the query and documents are expressed. Effectively incorporating multilingual knowledge is the key to building the translation component. Such knowledge can be obtained from dictionaries, machine translation modules, or multilingual pre-trained language models. For these different forms of multilingual knowledge, we present cross-lingual knowledge injection, transfer, and language debiasing techniques to enhance the effectiveness of Cross-lingual Information Retrieval (CLIR) and Multilingual Information Retrieval (MLIR). Specifically, by utilizing multilingual knowledge at various levels—from individual word translations to parallel and non-parallel corpora—we develop new model architectures and training goals tailored for information retrieval tasks across diverse linguistic settings. First, we introduce a mixed attention Transformer layer, which augments mutually translated words between query and document into the attention matrix and investigates its effectiveness on CLIR tasks. Next, we study cross-lingual transfer in the IR models and demonstrate a knowledge distillation framework to address the data scarcity problem in model training and improve retrieval effectiveness involving low-resource languages. Then, we focus on a special setting in MLIR, where the query is in one language, and the collection is a mixture of languages. To address the problem of inconsistent ranking results between languages, we design an encoder-decoder model that maps document representations from different languages into the same embedding space. We also present a decomposable soft prompt to capture unique and shared properties across languages. Finally, we introduce a language debiasing method to identify and remove linguistic features from a multilingual embedding space. This approach significantly diminishes the necessity for parallel data in constructing MLIR models, allowing for using non-parallel data instead. By reducing language-specific factors from the training process, we improve the retrieval effectiveness for all linguistic settings in retrieval tasks (e.g., monolingual, cross-lingual, and multilingual), thereby facilitating language-agnostic information retrieval.
Optimization with Intrinsic Diversity: Towards Generalizable, Safe, and Open-ended Learning
(2024-09) Ding, Li
Building general-purpose artificial intelligence (AI) systems and safely aligning them with human values remains a critical, unsolved problem in AI. A significant challenge in this domain is the inherent diversity of human thoughts and demands, reflecting the complexities of the real world. This diversity is often not adequately captured in existing optimization processes, which typically aim to optimize aggregated objectives or average human preferences. This dissertation investigates intrinsic mechanisms for integrating diversity into optimization. First, we introduce Gradient Lexicase Selection and Probabilistic Lexicase Selection to promote diversity in goal-oriented tasks, enhancing model generalization and efficiency. Second, we address diversity in human preferences with Pareto Optimal Preferences Learning (POPL), a reinforcement learning from human feedback (RLHF) framework that learns policies and reward models catering to distinct groups, ensuring safe and fair alignment of AI agents. Finally, we propose Quality Diversity through Human Feedback (QDHF), a novel approach that learns notions of diversity from human judgment of difference to simultaneously optimize for quality and novelty, thereby enhancing the creativity and user satisfaction of model responses in open-ended generative tasks.
Exploiting Structures in Interactive Decision Making
(2024-09) Cao, Tongyi
In this thesis we study several problems in interactive decision making. Interactive decision making plays an important role in many applications such as online advertisement and autonomous driving. Two classical problems are multi-armed bandits and reinforcement learning. Here and more broadly, the central challenge is the \emph{exploration-exploitation} tradeoff, whereby the agent must decide whether to explore uncertain actions that could potentially bring high reward or to stick to the known good actions. Resolving this challenge is particularly difficult in settings with large or continuous state and action spaces. For reinforcement learning, function approximation is a prevalent structure to manage large state and action spaces. However, misspecification of the function classes can have a detrimental effect on the statistical outcomes. These structured settings are the focus of this thesis. First we study the combinatorial pure exploration problem in the multi-arm bandit framework. In this problem, we are given $K$ distributions and a collection of subsets $\Vcal \subset 2^{[K]}$ of these distributions, and we would like to find the subset $v \in \Vcal$ that has largest mean, while collecting, in a sequential fashion, as few samples from the distributions as possible. We develop new algorithms with strong statistical and computational guarantees by leveraging precise concentration-of-measure arguments and a reduction to linear programming. Second we study reinforcement learning in continuous state and action spaces endowed with a metric. We provide a refined analysis of a variant of the algorithm of Sinclair, Banerjee, and Yu (2019) and show that its regret scales with the \emph{zooming dimension} of the instance. Our results are the first provably adaptive guarantees for reinforcement learning in metric spaces. Finally, we study a more fundamental problem of \emph{distribution shift}, where training and deployment conditions for a machine learning model differ. We study the effect of distribution shift in the presence of model misspecification, specifically focusing on $L_{\infty}$-misspecified regression and \emph{adversarial covariate shift}, where the regression target remains fixed while the covariate distribution changes arbitrarily. We develop a new algorithm---inspired by robust optimization techniques—that avoids misspecification amplification while still obtaining optimal statistical rates. As applications, we use this regression procedure to obtain new guarantees in offline and online reinforcement learning with misspecification and establish new separations between previously studied structural conditions and notions of coverage.
Improving Variational Inference through Advanced Stochastic Optimization Techniques
(2024-09) Burroni, Javier
Black-box variational inference (VI) is crucial in probabilistic machine learning, offering an alternative method for Bayesian inference. By requiring only black-box access to the model and its gradients, it recasts complex inference tasks into more manageable optimization problems, aiding in the approximation of intricate posterior distributions across a wide range of models. However, black-box VI faces a fundamental challenge: managing the noise introduced by using stochastic gradient optimization methods, which limits efficient approximations. This thesis presents new approaches to enhance the efficiency of black-box VI by improving different aspects of its optimization process. The first part of this thesis focuses on the importance-weighted evidence lower bound (IW-ELBO), an objective used in the VI optimization problem. The IW-ELBO, by incorporating importance sampling, augments the expressive power of the approximating distributions used in VI. However, it also introduces increased variance in gradient estimation, complicating the optimization process. To mitigate this, our thesis applies the theory of U-statistics, an approach that significantly reduces variance. Since fully implementing U-statistics can be impractical due to exponential growth in computation, we introduce approximate methods that effectively reduce variance with minimal computational overhead. The second part of this thesis addresses a central issue within black-box VI: its stochastic optimization process, i.e., Stochastic Gradient Descent or its variations, is highly sensitive to user-specified hyperparameter choices, often leading to poor results. We address this issue by introducing an algorithm specifically designed for VI, based on the sample average approximation (SAA). This method, SAA for VI, transforms the stochastic optimization task into a sequence of deterministic problems that can be easily solved using standard optimization techniques. As a result, it simplifies and automates the optimization process, reduces the burden of hyperparameter tuning, and exhibits robust performance, particularly in complex statistical models involving hundreds of latent variables. In the third part of this thesis, we shift our focus from the objective and optimization process to the approximating distributions used in VI and their gradient estimation. Specifically, we explore how to use reparameterization---a key technique in black-box VI---for mixture distributions. Due to the discrete nature of choices involved in sampling from mixture distributions, the standard reparameterization trick is not directly applicable. Although prior work has proposed several gradient estimators that use some form of reparameterization, there remains a noticeable lack of clarity regarding which estimators are available, in which contexts they are applicable, and how they compare. To address this gap, we introduce and evaluate the most relevant gradient estimators for mixture distributions using a consistent mathematical framework and, through this framework, we extend existing estimators to new settings. We then give a comprehensive performance comparison of different estimators---theoretically, where we can sometimes compare variance, and empirically, where we assess the estimators across different setups. Finally, we address the often overlooked computational aspect by introducing novel, efficient algorithms for some of the estimators. This thesis contributes to both the theoretical understanding and practical implementation of VI. By introducing new methods and techniques, we aim to enhance the accuracy and efficiency of VI and broaden its applicability.
Controllable Personalization for Information Access
(2024-09) Mysore, Sheshera
Information access systems mediate how we find and discover information in nearly every walk of life. The ranking models powering these systems base their predictions on users' historical interactions to cater to the wide variety of users and workflows that leverage them. However, during a task session, personalized predictions often fall short of user's expectations, with users desiring greater control over a system. Greater control, in turn, leads to greater user trust and satisfaction in using a system. In this thesis, I explore methods to dynamically update personalized rankings through user interaction with ranking models. I explore control in various retrieval tasks through (1) expressive natural language queries, (2) control over latent user representations, and (3) control over both queries and latent user representations. First, I explore long-form narrative queries as a way for users to express rich context-dependent preferences in a narrative-driven recommendation (NDR) task. Here, I propose MINT – a data augmentation strategy leveraging LLMs to generate long-form narrative queries from historical user interactions to allow the training of effective NDR models. Next, I propose LACE, a text recommendation model that represents users with a transparent concept-based user profile inferred from historical user documents. The concepts function as an interpretable bottleneck within a neural recommender, allowing users to control the underlying model. To allow control through queries and latent user representations, I introduce CtrlCE, a controllably personalized cross-encoder which leverages the concept-value profiles introduced in LACE. Specifically, I treat concept-value profiles as editable memories of a user's historical documents and augment a transformer cross-encoder with these memories. Allowing cross-encoders to condition on large amounts of user data while allowing users effective control over personalization. Further, in augmenting cross-encoders with editable memories I train a calibrated mixing model to combine non-personalized query-document scores with personalized user-document scores and only solicit user input when necessary. Finally, having leveraged concept-value memories as a user representation for controllable personalization, I explore such a corpus representation as an interactive topic model, introducing EdTM. I show EdTM to support a variety of user interactions, scale to large corpora, and effectively leverage expressive LLM scoring functions opening the possibility of a wider variety of user control mechanisms over personalized information access tasks in future work.
Graph Properties from Restricted Information
(2024-05) Sengupta, Rik
There are several settings in theoretical computer science where we gain some form of limited access to an unknown underlying graph (typically through a subgraph), and we try to infer something fundamental about the entire graph. This question of learning a structure from partial information is the cornerstone of many active areas of research. In this thesis, we study two very different models for learning graph properties. We show some surprising commonalities and differences in these two settings. In the first, probabilistic setting, we ask: suppose we have a fixed, unknown graph, which is passed through a ``noisy'' channel, that deletes the vertices with some uniform probability, deletes (or flips) the remaining edges with some uniform probability, and returns the resulting subgraph. How many such samples (or ``traces'') would we need to reconstruct the original graph? We show several results in this setting, including separations between random and arbitrary graphs, and vertex and edge deletions, even when all but an o(1)-fraction of the graph disappears in every trace. In the second, deterministic setting, we ask: how can we identify graphs efficiently using logical characterizations? This question is inextricably tied with complexity theory. A technique we have for showing lower bounds in these realms is by means of two-player logical games, where two perfect players (called ``Spoiler'' and ``Duplicator'') play pebble games on graphs and can only argue about the restricted ``views'' given by the induced subgraphs on those pebbles. We generalize the known games into a flexible framework characterizing any reasonable syntactic measure; we also play these games on canonical ordered structures such as linear orders and strings, a notoriously difficult endeavor historically. Here we prove some new bounds, some of which we show to be optimal. Despite being from quite distinct areas of research, the two problems show several notable similarities. They both ask about the fundamental structures of graphs, while having access only to partial ``views'' of the graph. They both rely on distinguishing a given pair of graphs as the main subroutine, and thereafter draw conclusions about identifying entire classes of graphs. They both exploit isomorphism testing as an intermediate subroutine. They both exhibit bottlenecks in the form of ``near-isomorphisms'' that are not true isomorphisms. And finally, they both show a remarkable phenomenon in which there being an inherent *ordering* in the universe makes the question provably easier than there being no such ordering.
Exploring Human-Centered AI Storytelling
(2024-05) Akoury, Nader S
Large language models (LLMs) have ushered in a multitude of new language generation capabilities, bringing AI-guided storytelling much closer to reality. Authors can now meaningfully engage with AI writing assistants to help during the story writing process. These same capabilities have also enabled the ability to add more diverse and complex interactions for narrative-driven role-playing games. Though due to the memory and processing that LLMs require, efficient inference techniques are needed as video games are real-time systems with demanding performance requirements. In this thesis I explore two overarching lines of inquiry: (1) What is the user experience with systems designed for language generation in storytelling and video games? and (2) How can we improve these systems to address limitations that adversely affect user experience? I investigate these questions through the lens of AI story writing assistants and video game dialogue systems. In Chapter 2 I discuss my explorations using LLMs as AI story writing assistants on the online collaborative writing platform Storium, where real authors on the Storium platform can query a model for suggested story continuations. Then in Chapter 3, I extract dialogue from the widely-acclaimed role-playing game Disco Elysium: The Final Cut, which contains 1.1M words of dialogue spread across a complex graph of utterances where node reachability depends on game state. Using a reusable research artifact — a web app that recreatse the dialogue system from the game — I explore real players’ experiences interacting with the game augmented by LLM-generated dialogue. In a natural follow-up, in Chapter 4 I examine how to enhance the player experience, while maintaining game’s existing structure, by introducing a virtual game master (GM) that allows players to type their desired response in a conversation, rather than choose from a set of pre-written options. To address the response time considerations of these real-time systems, in Chapter 5 I investigate efficient inference for the Transformer architecture by incorporating linguistic features into the decoding process. I conclude with Chapter 6, where I consider promising future directions for improving the virtual GM and ways for integrating LLMs into the video game dialogue writing process.
Retrieval Augmented Representation Learning for Information Retrieval
(2024-05) Hashemi, Helia
Information retrieval (IR) is a scientific discipline within the fields of computer and information sciences that enables billions of users to efficiently access the information they need. Applications of information retrieval include, but are not limited to, search engines, question answering, and recommender systems. Decades of IR research have demonstrated that learning accurate query and document representations plays a vital role in the effectiveness of IR systems. State-of the-art representation learning solutions for information retrieval heavily rely on deep neural networks. However, despite their effective performance, current approaches are not quite optimal for all IR settings. For example, information retrieval systems often deal with inputs that are not clear and self-sufficient, e.g., many queries submitted to search engines. In such cases, current state-of-the-art models cannot learn an optimal representation of the input or even an accurate set of all representations. To address this major issue, we develop novel approaches by augmenting neural representation learning models using a retrieval module that guides the model towards learning more effective representations. We study our retrieval augmentation approaches in a diverse set of somewhat novel and emerging information retrieval ap plications. First, we introduce Guided Transformer—an extension to the Transformer network that adjusts the input representations using multiple documents provided by a retrieval module—and demonstrate its effectiveness in learning representations for conversational search problems. Next, we propose novel representation learning models that learn multiple representations for queries that may carry multiple intents, including ambiguous and faceted queries. For doing so, we also introduce a novel optimization approach that enables encoder-decoder architectures to generate a per mutation invariant set of query intents. Furthermore, we study retrieval-augmented data generation for domain adaptation in IR, which concerns applying a retrieval model trained on a source domain to a target domain that often suffers from unavailability of training data. We introduce a novel adaptive IR task, in which only a textual description of the target domain is available. We define a taxonomy of domain attributes in information retrieval to identify different properties of a source domain that can be adapted to a target domain. We introduce a novel automatic data construction pipeline for adapting dense retrieval models to the target domain. We believe that the applications of the developed retrieval augmentation methods can be expanded to many more real-world IR tasks.
Unlocking Natural Language Generalization with Adaptive Retrieval-based Methods
(2024-05) Drozdov, Andrew
Progress in large language model (LLM) training and inference has contributed to the emergence of ``generative retrieval'' as a new sub-topic in the field of artificial intelligence. Generative retrieval encapsulates a family of methods that leverages the strengths of generative language models for information retrieval applications. These methods have particular utility when embedded in natural language interfaces such as conversational chatbots, which represent an extreme diversity of fine-grained tasks that require both the ability for models to quickly adapt and to generate fluent and relevant responses. In this dissertation, I propose three general methods to further advance the capabilities of generative retrieval systems:

1. I introduce a method for effective adaptation of large language models for retrieval through in-context learning. This technique leverages task-specific demonstrations to quickly learn to rank candidate passages. The criteria for demonstration selection is based on ``demonstration difficulty'' and is inspired by gradient-based learning, where difficult and informative data points often lead to higher magnitude gradients.

2. Generative retrieval enables a massive variety of tasks, including retrieval over structured data. Inspired by previous methods for learning compositional structure with recursive computation, I develop a novel extension of least-to-most prompting that dynamically selects demonstrations to cover the many aspects of the input query. This novel approach leads to state-of-the-art results on a challenging compositional generalization benchmark translating text to a SQL-like query language.

3. Retrieving relevant documents from an external datastore is an effective way for language models to automatically ground their predictions externally rather than solely rely on their internal memory. I design an adaptive algorithm that discards distracting or irrelevant documents, and more heavily weights the influence of relevant text. The more precise usage of the datastore leads to state-of-the-art performance on a language modeling benchmark for generating encyclopedic text.

4. Throughout retrieval augmented generation (RAG) many atomic facts are generated that pertain only to a subset of retrieved passages, leading to inefficient usage of the limited prompt context. I introduce a Retrieval-Driven Memory Manager (ReDMM) for RAG that adaptively selects which passages to include at each step of generation, bypassing context length limits. ReDMM is particularly helpful for generating complex answers as measured by a suite of benchmarks for long-form question answering.

I demonstrate that these methods address limitations of previous generative retrieval systems and provide a path forward for more effective language model use.
Efficient k-Nearest Neighbor Search with Black-Box Neural Similarity Functions
(2024-05) Yadav, Nishant
The goal of k-nearest neighbor (k-NN) search is to ﬁnd the top-k similar items for a query under a given similarity function. k-NN search is a widely-used sub-routine in search, recommendation, question-answering and many other applications in machine learning, and information retrieval systems to improve performance and robustness of models and adapt models to new domains. For many of these applications, the state-of-the-art query-item similarity models are black-box neural similarity functions such as cross-encoders that jointly encode a query-item pair to directly output a scalar similarity. However, unlike vector-based similarity functions (e.g., inner product), computing a single query-item score using cross-encoders can be computationally expensive as cross-encoders are typically parameterized using deep neural models such as transformers. For this reason, existing approaches perform k-NN search with cross-encoders using (heuristic) retrieve-and-rerank approaches that perform retrieval with a separate model (such as dual-encoder or BM25) followed by re-ranking using the cross-encoder. In this thesis, we propose efﬁcient matrix factorization-based approaches for ﬁtting query and item embeddings to approximate cross-encoder scores for query-item pairs, and use the approximate scores to perform k-NN search with cross-encoders. First, we introduce, ANNCUR, a CUR-decomposition-based method that computes latent item embeddings by comparing each item with a set of anchor queries. At test-time, ANNCUR computes test query embedding by comparing the test query with only a small set of anchor items chosen uniformly at random and performs retrieval using the approximate scores computed using dot-product of query and item embeddings. We next propose ADACUR that further improves the test-time recall-vs-cost trade-offs by comparing the test query with an incrementally and adaptively chosen set of anchor items, conditioned on the test query. The indexing step for ANNCUR and ADACUR computes a dense matrix by scoring all items against a set of anchor/train queries. With the goal of reducing the indexing complexity, in order to scale to millions of items, we propose methods based on factorization of sparse matrices containing cross-encoder scores between a set of train queries and all the items. Our proposed methods are signiﬁcantly more efﬁcient than CUR-based approaches at indexing the set of items and allow for efﬁciently leveraging existing dual-encoder models while avoiding expensive distillation-based training of dual-encoders. We perform k-NN search for a given query using the approximate scores with an adaptive search strategy that performs retrieval over multiple rounds and uses the feedback on retrieved items to improve the cross-encoder approximation and hence retrieval in subsequent rounds. Empirically, our proposed approaches provide signiﬁcant improvements in recall-vs-latency tradeoffs over existing retrieve-and-rerank pipelines on zero-shot entity linking and information retrieval benchmarks. In the ﬁnal chapter, we propose various loss functions and strategies for training cross-encoder models to improve k-NN search performance of the proposed methods by making the resulting cross-encoder score matrix easier to approximate without affecting the accuracy of the cross-encoder similarity on downstream tasks. We also use proposed k-NN search methods to dynamically sample negative items/examples while training cross-encoders to improve the robustness of cross-encoder models on downstream tasks.
FAST LINEAR ALGEBRA FOR GAUSSIAN PROCESSES
(2024-05) Yadav, Mohit
In machine learning, uncertainty calibration, and prediction interpretability are crucial. Gaussian processes (GPs) are widely recognized for their ability to model uncertainty and interoperability. However, their practical application is often limited by the computational intensity of operations like matrix inversion and determinant computation, which scale cubically with the number of data points. This thesis focuses on developing fast algorithms to tackle this computational challenge for GPs and enhance their scalability for large-scale datasets.

The first two chapters focus on the structured kernel interpolation (SKI) framework, which interpolates the kernel matrix using a dense grid in the input space and exploits iterative algorithms to accelerate GPs. First, we present a novel and fast iterative algorithm for performing GP inference within the SKI framework. After an initial linear computation in the number of data points, we show that the remaining computation sequence scales independently of the dataset size. Our method speeds up GP inference for several low-dimensional datasets.

Unfortunately, SKI’s scalability diminishes as the grid size exponentially increases with increasing input point dimensions. To mitigate this, we integrate sparse grids with the SKI framework, owing to their accurate interpolation and size growing more slowly than dense grids as the number of dimensions rises. Next, we introduce a novel matrix-vector multiplication algorithm for sparse grid kernel matrices, improving SKI’s scalability to higher dimensions. For example, we can scale GP inference to eleven dimensions with over five million points.

The final chapter explores GPs in bandit algorithms for optimizing the ranking of top-k items on platforms like online marketplaces and search engines. We introduce a contextual bandit algorithm using GPs with Kendall kernels, which sidesteps the restrictive assumptions typically required for reward feedback, addressing the challenge of many options. Additionally, we develop a quick algorithm for linear algebraic operations on the kernel matrix for top-k rankings, utilizing a sparse representation of the Kendall kernel. This method reduces inference time, leading to faster bandit algorithms with reduced latency.
Sublinear Algorithms for Matrices: Theory and Applications
(2024-05) Ray, Archan
Matrices are ubiquitous mathematical structures that arise throughout computer science. We study fast algorithms for several central problems involving matrices, including eigenvalue approximation, spectral approximation, and low-rank approximation. In particular, we focus on sublinear time or sublinear query algorithms that can scale to very large matrices. We focus on developing algorithms with theoretical bounds and demonstrate the applicability of these algorithms on real-world datasets. We first present a simple randomized algorithm for approximating textit{all} the eigenvalues of a bounded-entry matrix to a small additive error by querying a small random submatrix of the input matrix. Next, we give the first sublinear query deterministic algorithms that can approximate any symmetric matrix $mathbf{A}$ in the spectral norm -- i.e., that output $tilde{mathbf{A}}$ where $|mathbf{A}-tilde{mathbf{A}}|_2$ is bounded. Using this result, we give the first deterministic algorithm that can approximate all the singular values of any symmetric matrix to small additive error in faster than matrix multiplication time. We further extend the above results by improving the query complexity in the case when $mathbf{A}$ is positive semidefinite (PSD) with entries in ${-1,0,1}$. Then we empirically analyze eigenvalue approximation of symmetric matrices using various matrix-vector query algorithms. We explore the trade-offs between adaptivity and query complexity of such algorithms. This study complements our work on algorithms that read a sublinear number of entries of the input matrix. Finally, we present a generalization of the Nyström method for low-rank approximations of general symmetric matrices. We conduct a comparative study of this generalized Nyström method and other sublinear algorithms for computing low-rank approximations of similarity matrices arising in natural language processing (NLP) tasks.
Towards Effective Modeling of Long-range Context
(2024-05) Sun, Simeng
At the core of recent advancements in natural language processing are language models, which are trained to predict the next token given the preceding context. Recent developments in deep learning has led to the efficient scaling of context window in Transformer-based language models. Despite the progress, these models still exhibit severe limitations when tackling long-context tasks, such as book-level summarization and long-document question answering. While the context window size has been continuously increasing, there is a lack of understanding on how these models utilize long-range context, or context that spans at least several thousand tokens. As such, we first provide an analysis of long-range context modeling with both perplexity and segment-level task evaluations. Our results show that perplexity, the most commonly used intrinsic metric for language model evaluation, may obscure the evaluation of long-range context modeling. In contrast, segment-level evaluation, which involves computing the probability of a sequence of tokens rather than a single token as done in perplexity, proves to be a more suitable method for evaluating long-range context modeling. Based on this finding, we enhance the segment-level evaluation by proposing a challenge dataset ChapterBreak, and demonstrate that SuffixLM, a model trained with segment-level signals, outperforms the standard token-level language model in this task. The limited context modeling capability prompts us to investigate new ways to improve recent large language models. To this end, we first develop a prompting framework, PEARL, by leveraging large instruction fine-tuned language models to decompose complex reasoning into executable plans. We demonstrate the efficacy of PEARL on a subset of the long-document QA dataset, where the correct answer depends on the long-range context instead of a short excerpt. Our second approach builds on the benefits of modeling context at the segment level. Concretely, we propose a new training method, SuffixRL, by fine-tuning a token-level language model directly using segment-level signals. We show that training models with SuffixRL leads to more natural and coherent continuations in an open-ended generation setting. Finally, we conclude this thesis by identifying seven concrete topics that hold promise for future exploration. We hope this thesis can spur more principled research in long-context modeling.

Browse

Recent Submissions