As part of the ICWSM Data Sharing Initiative, ICWSM provides a hosting service for new datasets used by papers published in the proceedings of the annual ICWSM conference. All datasets are released as openly available community resources. Please see the instructions on the registration process in order to gain access to the datasets.
ICWSM-15 is the fourth year of this initiative. You can find the list of available datasets below. Detailed descriptions of the datasets will be uploaded soon.
You can find the list of available datasets from ICWSM 2012 in the following table.
Paper | Description | # of files | # of observations (tweets/Facebook accounts/entries) | # of Twitter users | # of nodes | # of edges |
Z Luo, M Osborne, and T Wang. Opinion Retrieval in Twitter. | Tweets tagged as relevant or irrelevant for 50 specific queries | 1 | 5000 | |||
L Chen, W Wang, M Nagarajan, S Wang and AP Sheth. Extracting Diverse Sentiment Expressions with Target-dependent Polarity from Twitter. | Tweets describing movies and persons | 4 | 426660 | |||
J Mahmud, J Nichols, and C Drews. Where is This Tweet From? Inferring Home Locations of Twitter Users. | Geotagged tweets from 100 top cities | 2 | 1005259 | 19390 | ||
L Rossi and M Magnani. Conversation practices and network structure in Twitter. | Tweets about the fifth edition of the popular TV show XFactor Italia | 2 | 22287 | |||
LM Aiello, M Deplano, R Schifanella, and G Ruffo. People are Strange when you're a Stranger: Impact and Influence of Bots on Social Networks. | Social info from anobii.com | 7 | 671585 | 179654 | 1566369 | |
J Park, M Cha, H Kim, and J Jeong. Managing Bad News in Social Media: A Case Study on Domino’s Pizza Crisis. | Tweet collection for Domino's pizza crisis | 10 | 4645 | 6158 | 29987 | |
Y He, C Lin, W Gao, and KF Wong. Tracking Sentiment and Topic Dynamics from Social Media. | Mozilla add-on review data | 1 | 9300 | |||
D O'Callaghan, M Harrigan, J Carthy, and P Cunningham. Network Analysis of Recurring YouTube Spam Campaigns. | YouTube comments data | 1 | 6466882 | |||
P Agarwal, R Vaithiyanathan, S Sharma, and G Shroff. Catching the Long-Tail: Extracting Local News Events from Twitter. | Tweets that possibly describe a fire in a factory. | 354 | 20751 | |||
FA Zamal, W Liu, and D Ruths. Homophily and Latent Attribute Inference: Inferring Latent Attributes of Twitter Users from Neighbors. | Tweet data for gender/age/political orientation labeled users | 9 | 233653995 | 340248 | ||
F Giglietto. If Likes Were Votes: An Empirical Study on the 2011 Italian Administrative Elections. | Facebook profiles of 104 Italian politicians | 1 | 104 |
Download and sign the ICWSM Dataset Usage Agreement. Please note that this agreement gives you access to all ICWSM-published datasets. In it, you agree not to redistribute the datasets. Furthermore, ensure that, when using a dataset in your own work, you abide by the citation requests of the authors of the dataset used.
Email the signed agreement, as a PDF file, to dataset-request@icwsm.org. In the body of your email,
We will respond to your request with a URL, a username, and a password with which you can download the datasets. Please allow seven business days for a response.
If you have questions or concerns about the terms of the agreement or the available datasets, or if you need to report infringements of the agreement, please contact Jimmy Lin.
We consider both contributed and official ICWSM datasets to provide an empirical basis for excellent research that could appear at ICWSM and related venues. Please consider using them in work that you will be submitting to ICWSM this year!