Recent Submissions
Publication
InfoExtractor – A Tool for Social Media Data Mining (2011-01-01)
Shah, Chirag; File, Charles
We present InfoExtractor, a web-based tool for collecting data and metadata from focused social media content. InfoExtractor then provides this data in various structured and unstructured formats for easy manipulation and analysis. The tool allows social science researchers to easily collect data for quantitative analysis, and is designed to deliver data from popular and influential social media sites in a useful and easy-to-access way. InfoExtractor was designed to replace traditional means of content aggregation, such as page scraping and brute-force copying.

Publication
Facilitating Encounters with Political Difference: Engaging Voters with the Living Voters Guide (2011-01-01)
Freelon, Deen G.; Kriplean, Travis; Morgan, John; Bennett, W. Lance; Borning, Alan
Unlike 20th-century mass media, the Internet requires self-selection of content by its very nature. This has raised the normative concern that users may opt to encounter only political information and perspectives that accord with their preexisting views. This study examines the different ways that voters appropriated a new, purpose-built online engagement platform to engage with a wide variety of political opinions and arguments. In a deployment aimed at helping Washington state citizens make their 2010 election decisions, we find that users take significant advantage of three key opportunities to engage with political diversity: reading, acknowledging, and writing arguments on both sides of various policy proposals. Notably, engagement with each of these forms of participation drops off as the required level of commitment increases.
We conclude by discussing the implications of these results as well as directions for future research.

Publication
An automated snowball census of the political web (2011-01-01)
Gong, Abe
This paper solves a persistent methodological problem for social scientists studying the political web: representative sampling. Virtually all existing studies of the political web are based on incomplete samples, and therefore lack generalizability. In this paper, I combine methods from computer science and sampling theory to conduct an automated snowball census of the political web and construct an all-but-complete index of English political websites. I check the robustness of this index, use it to generate descriptive statistics for the entire political web, and demonstrate that studies based on ad hoc sampling strategies are likely to be biased in important ways. In future research, this bias can be eliminated by using this index as a sampling universe. In addition, the methods and open-source software presented here can be used to create similar sampling frames for other online content domains.

Publication
Politics 2.0 with Facebook – Collecting and Analyzing Public Comments on Facebook for Studying Political Discourses (2011-01-01)
Shah, Chirag; Yazdani nia, Tayebeh
Analyzing publicly available content on various social media sites such as YouTube and Twitter, as well as social network sites such as Facebook, has become an increasingly popular method for studying socio-political issues. Such public-contributed content, primarily available as comments, lets people express their opinions and sentiments on a given topic, news story, or post, while allowing social and political scientists to extend their analysis of a political discourse to the social sphere. We recognize the importance of Facebook in such analysis and present several approaches to, and observations on, collecting and analyzing public comments from it.
In particular, we demonstrate what it takes to do this manually, what we could learn from it, and how we can automate this process using a Facebook Harvester tool we have developed. In addition, we show how a hybrid approach can be formed, giving us quick and easy data collection and meaningful data analysis with substantially less effort than a manual approach. We believe these methods and tools will be highly valuable for political scientists in studying various political discourses as they take place in the Web 2.0 world.

Publication
Tradeoffs in Accuracy and Efficiency in Supervised Learning Methods (2011-01-01)
Collingwood, Loren; Wilkerson, John
Text is becoming a central source of data for social science research. With advances in digitization and open records practices, the central challenge has in large part shifted away from availability to usability. Automated text classification methodologies are becoming increasingly important within political science because they hold the promise of substantially reducing the costs of converting text to data for a variety of tasks. In this paper, we consider a number of questions of interest to prospective users of supervised learning methods, which are appropriate to classification tasks where known categories are applied. For the right task, supervised learning methods can dramatically lower the costs associated with labeling large volumes of textual data while maintaining high reliability and accuracy. Information science researchers devote considerable attention to comparing the performance of supervised learning algorithms and different feature representations, but the questions posed are often less directly relevant to the practical concerns of social science researchers. The first question prospective social science users are likely to ask is: how well do such methods work? The second is likely to be: how much do they cost in terms of human labeling effort?
Relatedly, how much do marginal improvements in performance cost? We address these questions in the context of a particular dataset, the Congressional Bills Project, which includes more than 400,000 labeled bill titles (19 policy topics). This corpus also provides opportunities to experiment with varying sample sizes and sampling methodologies. We are ultimately able to locate an accuracy/efficiency sweet spot of sorts for this dataset by leveraging results generated by an ensemble of supervised learning algorithms.

Publication
News Media Environment, Selective Perception, and the Survival of Preference Diversity within Communication Networks (2011-05-09)
Liu, Frank C.S.; Johnson, Paul E.
There is a natural tension between the effects on public opinion of social networks and the news media. It is widely believed that social networks tend to harmonize opinions within them, but the presence of media may accentuate diversity by inserting discordant messages. On the other hand, in a totalitarian state where the government controls the media, social networks may mitigate the homogenizing pressure of a regime’s propaganda. The tendency of opinion to follow the “official line” may be mitigated because opponents of the government interact on a personal level and bolster one another’s views. This paper employs agent-based modeling, an approach that allows researchers to observe preference change at the individual, social network, and society levels, to explore conditions under which social networks and news media influence citizens’ preferences. Citizen agents are embedded within networks of interpersonal communication and can be influenced by widely disseminated news media. The situations considered include one where there are no news media, one with polarized news media, and one where there is only a monolithic (state-controlled) media that broadcasts a single, consistent message. We also explore the role of selective perception in these conditions.
The results indicate that the overall impact of news media is contingent on the variety of preferences news media provide as well as on the willingness of agents to accept media messages at face value.

Publication
Researching Real-World Web Use with Roxy, A Research Proxy (2011-01-01)
Menchen-Trevino, Ericka; Karr, Chris
Outside of a lab environment, it has been difficult for researchers to collect both behavioral and self-reported Web-use data from the same participants. To address this challenge we created Roxy, open source software that collects real-world Web-use data with participants’ informed consent. Roxy gathers Web log data as well as the text and HTML code of each page visited by participants. We describe Roxy’s data gathering capabilities and search functions and then illustrate how we used the software in a multi-method study. The use case examines selective exposure to political communication during the November 2010 U.S. general election campaign.