Author ORCID Identifier



Open Access Dissertation

Document Type


Degree Name

Doctor of Philosophy (PhD)

Degree Program

Computer Science

Year Degree Awarded


Month Degree Awarded


First Advisor

Beverly Park Woolf

Subject Categories

Artificial Intelligence and Robotics | Computational Linguistics | Social Media | Statistical Models


We now live in an age of online communication. As social media becomes an integral part of our lives, online communication becomes an essential life skill. In this dissertation, we aim to understand how people communicate effectively online. We study the components of success in online communication and present scientific methods for studying the skill of effective communication. This research advances the state of the art in both machine learning and communication studies.

For communication studies, we pioneer the study of a phenomenon in online interactions that we call Communication Intelligence. We develop a theory of communication intelligence that measures ten high-order communication skills of participants, including restraint, self-reflection, perspective taking, and balance. We present a multi-perspective analysis of communication intelligence that covers its diverse language, the linguistic characteristics shared across people, its social dynamics, and the effects of communication modality.

For machine learning, we contribute new computational models and formulations for multi-label and multi-task learning problems. We develop a new hierarchical probabilistic model for simultaneously identifying multiple intelligence-embodied communication skills from natural language. The model learns a topic assignment for each sentence and provides a simple, practical way to determine document labels without relying on a threshold function. Its performance improves as the number of labels grows, which makes it a promising approach for large-scale data analysis.

We also develop a new multi-task formulation for simultaneously identifying multiple intelligence-embodied communication skills from lexical, discourse, and interaction features. The key merit of this model is that it is a general multi-task formulation that unifies many widely used regularization techniques, including Lasso, group Lasso, sparse-group Lasso, and the Dirty model. It expands the applicability of multi-task learning to real-world problems where the degree of task relatedness is uncertain and the true group structure of the data is not known ahead of time. Moreover, it can be applied to streaming data to perform large-scale analysis in real time. Beyond the study of communication intelligence, the models and formulations developed here can also benefit research in other areas where problems of simultaneously predicting multiple categories are abundant.
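
As a rough illustration of how a single penalty can subsume several regularizers, the sketch below uses the textbook sparse-group Lasso penalty. It is not the formulation developed in the dissertation. The function name `sparse_group_penalty`, the mixing weight `alpha`, and the toy weights and groups are assumptions made for illustration only. Setting `alpha` to 1 recovers the Lasso penalty and setting it to 0 recovers the group Lasso penalty; the Dirty model is not covered here.

```python
import math


def sparse_group_penalty(weights, groups, lam, alpha):
    """Textbook sparse-group Lasso penalty (illustrative, hypothetical helper).

    weights: list of float coefficients
    groups:  list of index lists that partition the coefficients
    lam:     overall regularization strength
    alpha:   mixing weight in [0, 1]; 1 gives Lasso, 0 gives group Lasso
    """
    # Lasso term: sum of absolute values of all coefficients.
    l1_term = sum(abs(w) for w in weights)
    # Group Lasso term: Euclidean norm of each group, scaled by sqrt(group size).
    group_term = sum(
        math.sqrt(len(g)) * math.sqrt(sum(weights[i] ** 2 for i in g))
        for g in groups
    )
    return lam * (alpha * l1_term + (1.0 - alpha) * group_term)


# Toy example (assumed values): four coefficients in two groups of two.
weights = [3.0, 4.0, 0.0, 0.0]
groups = [[0, 1], [2, 3]]

lasso = sparse_group_penalty(weights, groups, lam=1.0, alpha=1.0)
group_lasso = sparse_group_penalty(weights, groups, lam=1.0, alpha=0.0)
mixed = sparse_group_penalty(weights, groups, lam=1.0, alpha=0.5)
```

Intermediate values of `alpha` blend the two terms, which is one way to hedge when it is unclear in advance whether sparsity should act on individual features or on whole groups.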