Datasets
ICWSM Dataset Sharing Service
As part of the ICWSM Data Sharing Initiative, ICWSM provides a hosting service for new datasets used by papers published in the proceedings of the annual ICWSM conference. All datasets are released as openly available community resources. To gain access to the datasets, follow the registration instructions under Obtaining Datasets.
Available Datasets
ICWSM-16 is the fifth year of this data sharing initiative.
Here is the list of available datasets:
- iFeel 2.0: A Multilingual Benchmarking System for Sentence-Level Sentiment Analysis
Matheus Araújo, João P. Diniz, Lucas Bastos, Elias Soares, Manoel Júnior, Miller Ferreira, Filipe Ribeiro, Fabricio Benevenuto
- Analyzing Personality through Social Media Profile Picture Choice
Leqi Liu, Daniel Preotiuc-Pietro, Zahra Riahi Samani, Mohsen E. Moghaddam, Lyle Ungar
- The News Cycle's Influence on Social Media Activity
Andrew Yates, Jonah Joselow, Nazli Goharian
- Social Media Participation in an Activist Movement for Racial Equality
Munmun De Choudhury, Shagun Jhaver, Benjamin Sugar, Ingmar Weber
- EigenTransitions with Hypothesis Testing: The Anatomy of Urban Mobility
Ke Zhang, Yu-Ru Lin, Konstantinos Pelechrinis
- Discovering Response-Eliciting Factors in Social Question-Answering: A Reddit Inspired Study
Danish, Yogesh Dahiya, Partha Talukdar
- Distinguishing between Topical and Non-Topical Information Diffusion Mechanisms in Social Media
Przemyslaw A. Grabowicz, Niloy Ganguly, Krishna P. Gummadi
- The Road to Popularity: The Dilution of Growing Audience on Twitter
Przemyslaw A. Grabowicz, Mahmoudreza Babaei, Juhi Kulshrestha, Ingmar Weber
- Expertise in Social Networks: How Do Experts Differ from Other Users?
Benjamin D. Horne, Dorit Nevo, Jesse Freitas, Heng Ji, Sibel Adali
- On the Behaviour of Deviant Communities in Online Social Networks
Mauro Coletto, Luca Maria Aiello, Claudio Lucchese, Fabrizio Silvestri
- Your Age Is No Secret: Inferring Microbloggers' Ages via Content and Interaction Analysis
Jinxue Zhang, Xia Hu, Yanchao Zhang, Huan Liu
- Twitter's Glass Ceiling: The Effect of Perceived Gender on Online Visibility
Shirin Nilizadeh, Anne Groggel, Peter Lista, Srijita Das, Yong-Yeol Ahn, Apu Kapadia, Fabio Rojas
- Fusing Audio, Textual and Visual Features for Sentiment Analysis of News Videos
Moisés H. R. Pereira, Flávio L. C. Pádua, Adriano C. M. Pereira, Fabricio Benevenuto, Daniel H. Dalip
- Science, AskScience, and BadScience: On the Coexistence of Highly Related Communities
Jack Hessel, Chenhao Tan, Lillian Lee
- Comparing Overall and Targeted Sentiments in Social Media during Crises
Saúl Vargas, Richard McCreadie, Craig Macdonald, Iadh Ounis
- "Blissfully happy" or "ready to fight": Varying Interpretations of Emoji
Hannah Miller, Jacob Thebault-Spieker, Shuo Chang, Isaac Johnson, Loren Terveen, Brent Hecht
- Message Impartiality in Social Media Discussions
Muhammad Bilal Zafar, Krishna P. Gummadi, Cristian Danescu-Niculescu-Mizil
You can find the list of available datasets from ICWSM 2015 below.
- Is the Sample Good Enough? Comparing Data from Twitter's Streaming API with Twitter's Firehose
Fred Morstatter, Jürgen Pfeffer, Huan Liu, Kathleen M. Carley
- #Bigbirds Never Die: Understanding Social Dynamics of Emergent Hashtags
Yu-Ru Lin, Drew Margolin, Brian Keegan, Andrea Baronchelli, David Lazer
- Detecting Comments on News Articles in Microblogs
Alok Kothari, Walid Magdy, Kareem Darwish, Ahmed Mourad, Ahmed Taei
- A Data-Driven Analysis to Question Epidemic Models for Citation Cascades on the Blogosphere
Abdelhamid Salah Brahim, Lionel Tabourier, Bénédicte Le Grand
- Properties, Prediction, and Prevalence of Useful User-Generated Comments for Descriptive Annotation of Social Media Objects
Elaheh Momeni, Claire Cardie, Myle Ott
- A Multi-Indicator Approach for Geolocalization of Tweets
Axel Schulz, Aristotelis Hadjakos, Heiko Paulheim, Johannes Nachtwey, Max Mühlhäuser
- Competition and Success in the Meme Pool: A Case Study on Quickmeme.com
Michele Coscia
- Artist Popularity: Do Web and Social Music Services Agree?
Alejandro Bellogín, Arjen P. de Vries, Jiyin He
- Extracting Diurnal Patterns of Real-World Activities from Social Media
Nir Grinberg, Mor Naaman, Blake Shaw, Gilad Lotan
- Political Orientation Inference on Twitter: It's Not Easy!
Raviv Cohen, Derek Ruths
- Quantifying Political Leaning from Tweets and Retweets
Felix Ming Fai Wong, Chee Wei Tan, Soumya Sen, Mung Chiang
You can find the list of available datasets from ICWSM 2012 in the following table.
Paper | Description | # of files | # of observations (tweets/Facebook accounts/entries) | # of Twitter users | # of nodes | # of edges |
Z Luo, M Osborne, and T Wang. Opinion Retrieval in Twitter. | Tweets tagged as relevant or irrelevant for 50 specific queries | 1 | 5000 | | | |
L Chen, W Wang, M Nagarajan, S Wang and AP Sheth. Extracting Diverse Sentiment Expressions with Target-dependent Polarity from Twitter. | Tweets describing movies and persons | 4 | 426660 | | | |
J Mahmud, J Nichols, and C Drews. Where is This Tweet From? Inferring Home Locations of Twitter Users. | Geotagged tweets from 100 top cities | 2 | 1005259 | 19390 | | |
L Rossi and M Magnani. Conversation practices and network structure in Twitter. | Tweets about the fifth edition of the popular TV show XFactor Italia | 2 | 22287 | | | |
LM Aiello, M Deplano, R Schifanella, and G Ruffo. People are Strange when you're a Stranger: Impact and Influence of Bots on Social Networks. | Social info from anobii.com | 7 | 671585 | | 179654 | 1566369 |
J Park, M Cha, H Kim, and J Jeong. Managing Bad News in Social Media: A Case Study on Domino’s Pizza Crisis. | Tweet collection for Domino's pizza crisis | 10 | 4645 | | 6158 | 29987 |
Y He, C Lin, W Gao, and KF Wong. Tracking Sentiment and Topic Dynamics from Social Media. | Mozilla add-on review data | 1 | 9300 | | | |
D O'Callaghan, M Harrigan, J Carthy, and P Cunningham. Network Analysis of Recurring YouTube Spam Campaigns. | YouTube comments data | 1 | 6466882 | | | |
P Agarwal, R Vaithiyanathan, S Sharma, and G Shroff. Catching the Long-Tail: Extracting Local News Events from Twitter. | Tweets that possibly describe a fire in a factory | 354 | 20751 | | | |
FA Zamal, W Liu, and D Ruths. Homophily and Latent Attribute Inference: Inferring Latent Attributes of Twitter Users from Neighbors. | Tweet data for gender/age/political orientation labeled users | 9 | 233653995 | 340248 | | |
F Giglietto. If Likes Were Votes: An Empirical Study on the 2011 Italian Administrative Elections. | Facebook profiles of 104 Italian politicians | 1 | 104 | | | |
Obtaining Datasets
Download and sign the ICWSM Dataset Usage Agreement. This single agreement gives you access to all ICWSM-published datasets. By signing it, you agree not to redistribute the datasets. When you use a dataset in your own work, you must also abide by the citation requests of that dataset's authors.
Email the signed agreement, as a PDF file, to dataset-request@icwsm.org. In the body of your email, state clearly that you are requesting access to the ICWSM datasets and include:
- your name,
- your email address, and
- the name of your organization.
We will respond to your request with a URL, a username, and a password with which you can download the datasets. Please allow seven business days for a response.
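For readers who prefer to script the request, the steps above can be sketched in Python. This is only an illustration, not an official tool: the function name `build_dataset_request`, the subject line, and the attachment file name are assumptions. Only the recipient address and the required body contents come from the instructions above. The sketch builds the message but does not send it.

```python
from email.message import EmailMessage

# Address given in the instructions above.
REQUEST_ADDRESS = "dataset-request@icwsm.org"


def build_dataset_request(name, email_address, organization, agreement_pdf):
    """Build (but do not send) an access request with the signed agreement attached.

    agreement_pdf is the raw bytes of the signed agreement PDF.
    """
    msg = EmailMessage()
    msg["From"] = email_address
    msg["To"] = REQUEST_ADDRESS
    # The subject wording is an assumption; the instructions do not prescribe one.
    msg["Subject"] = "Request for access to the ICWSM datasets"
    # The body states the request and lists the three required details.
    msg.set_content(
        "I am requesting access to the ICWSM datasets.\n\n"
        f"Name: {name}\n"
        f"Email: {email_address}\n"
        f"Organization: {organization}\n"
    )
    # Attach the signed agreement as a PDF file (hypothetical file name).
    msg.add_attachment(
        agreement_pdf,
        maintype="application",
        subtype="pdf",
        filename="icwsm_dataset_usage_agreement_signed.pdf",
    )
    return msg
```

The returned message can be reviewed and then sent with any mail client or SMTP library you already use.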
Contact
If you have any questions or concerns regarding the terms of agreement or the available datasets, or need to report infringements of the terms of agreement, please contact Fabricio Benevenuto or Derek Ruths.
We consider both contributed and official ICWSM datasets to provide an empirical basis for excellent research that could appear at ICWSM and related venues. Please consider using them in work that you will be submitting to ICWSM this year!