General Chairs

William W. Cohen
Carnegie Mellon/Google
Nicolas Nicolov
J.D. Power and Assoc., McGraw-Hill

Program Chairs

Natalie Glance
Google Inc
Matthew Hurst
Live Labs, Microsoft

Data Chairs

Ian Soboroff
NIST
Akshay Java
Live Labs, Microsoft

Local Chair

Cameron Marlow
Facebook

Tutorials Chair

Chris Diehl
Johns Hopkins University

3rd Int'l AAAI Conference on Weblogs and Social Media

May 17 - 20, 2009, San Jose, California

http://www.icwsm.org/2009/

Sponsored by the Association for the Advancement of Artificial Intelligence.


ICWSM 2009 Data Challenge

NOTE: we are currently experiencing some delays in processing requests for the dataset. Please allow 2-3 working days for us to respond.

Continuing the ICWSM tradition, ICWSM 2009 is making a dataset available to researchers in the blog and social media fields. We invite you to download the dataset, explore it, learn something interesting about it, and submit a paper about it to ICWSM 2009.

Good research topics might include...

But you should feel free to explore any aspect of the data that you feel would be of interest to the ICWSM community.

Authors are invited to submit papers to a special data challenge workshop, to be held on the last day of ICWSM. Details on paper submissions will appear here soon. The deadline for workshop submissions will be March 1st. The workshop itself will feature presentations by authors as well as a broader discussion of data issues and opportunities confronting the social media community.
We also welcome authors to submit papers on the dataset to the main ICWSM conference. Time permitting, we will invite authors of accepted ICWSM papers on the dataset to also briefly present their work at the workshop.
The best paper (main conference or workshop) on the dataset will be selected by the data chairs and will receive a prize at the conference.
Please note that the datasets made available through ICWSM are not restricted to ICWSM 2009, or even to ICWSM in general. Our long-term goal is to make weblog and social media datasets available to the research community, and while we hope that ICWSM will be a premier venue for presenting that research, we are happy to see the ICWSM datasets used far and wide.

ICWSM 2009 Spinn3r Blog Dataset

The dataset, provided by Spinn3r.com, is a set of 44 million blog posts made between August 1st and October 1st, 2008. Each post includes the text as syndicated, as well as metadata such as the blog's homepage, timestamps, etc. The data is formatted in XML and is further arranged into tiers that roughly approximate search engine ranking. The total size of the dataset is 142 GB uncompressed (27 GB compressed).
This dataset spans a number of big news events (the Olympics; both US presidential nominating conventions; the beginnings of the financial crisis; ...) as well as everything else you might expect to find posted to blogs.
To get access to the Spinn3r dataset, please download and sign the usage agreement, and email it to dataset-request (at) icwsm.org. Once your form has been processed, you will be sent a URL and password where you can download the collection.
Here is a sample of blog posts from the collection. The XML format is described on the Spinn3r website.
Spinn3r provides free access to researchers. If you are interested in making use of their data beyond the ICWSM collection, for example to crawl linked posts or earlier stories from certain blogs, visit their site, spinn3r.com.
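
Because the collection is distributed as XML and is large (142 GB uncompressed), it may be easier to read it as a stream than to load whole files into memory. The following is a minimal Python sketch of that idea, not a description of the actual Spinn3r schema: the element names `item`, `title`, and `link`, and the sample document, are assumptions made up for illustration. Consult the format description on the Spinn3r website for the real element names.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def iter_posts(source, post_tag="item"):
    """Yield one dict per post element, mapping child tag names to text.

    `post_tag` is an assumed element name, not taken from the Spinn3r
    documentation; adjust it to match the real schema.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) == post_tag:
            yield {local_name(child.tag): (child.text or "").strip() for child in elem}
            elem.clear()  # drop the post's children to reduce memory use


# Made-up sample, used only to exercise the function.
SAMPLE = b"""<dataset>
  <item><title>First post</title><link>http://example.org/1</link></item>
  <item><title>Second post</title><link>http://example.org/2</link></item>
</dataset>"""

posts = list(iter_posts(io.BytesIO(SAMPLE)))
print(len(posts), posts[0]["title"])
```

`iter_posts` accepts a filename or any binary file object, so the same loop could be pointed at a decompressed dataset file once the real element name is substituted for `item`.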

Community
We have a mailing list for discussing the datasets at http://groups.google.com/group/icwsm-data. Please join to talk about whatever you're doing with the data. In particular, if you are looking for groups to collaborate with, here's a forum for you. We also have a project at Google Code, http://code.google.com/p/icwsm-data/, where we can host tools and resources that you create to go along with the datasets.

Data Chairs
Ian Soboroff, NIST
Akshay Java, Live Labs, Microsoft



Google
J.D. Power and Associates Web Intelligence Division
Microsoft
Nielsen Online



For more info: icwsm09@aaai.org