TA1: Mining Smartphone Mobility Data
Spiros Papadimitriou, Tina Eliassi-Rad
Room Gereon, Maternushaus
TA2: Psychology for Computer Scientists: Fundamental Orientation and Frameworks
Johannes Eichstaedt, Michal Kosinski
Room Lambertus, Maternushaus
TA3: Using Crowdsourcing Effectively for Social Media Research
Ujwal Gadiraju, Gianluca Demartini, Djellel Eddine Difallah, Michele Catasta
Room A212, GESIS
TP4: Critical Review of Online Social Data: Limitations, Ethical Challenges, and Current Solutions
Carlos Castillo, Fernando Diaz, Emre Kiciman, Alexandra Olteanu
Room Lambertus, Maternushaus
TP5: The Lifecycle of Geotagged Social Media Data
Rossano Schifanella, Bart Thomee
Training room, GESIS
TAP6: The Web of Cities and Mobility
Bruno Goncalves, Anastasios Noulas, Konstantinos Pelechrinis, Daniele Quercia
Room Gereon, Maternushaus
The details of each tutorial can be found below:
The recent availability of reasonably fast wireless and mobile data networks has spurred demand for more capable mobile computing devices. Conversely, the emergence of new devices has created further demand for better networks, creating an innovation cycle. The current concept of a smartphone as an always-connected computing device with multiple sensing modalities was brought into the mainstream by the Apple iPhone less than a decade ago. Such devices are now seeing explosive growth. Additionally, for many people in the world, such devices are the first computers they use. Furthermore, small, cheap, always-connected devices with broad, extensible sensing capabilities have very recently begun to emerge, whether as standalone devices or as peripherals, further blurring the lines between the physical and virtual worlds. All of this opens up countless possibilities for data collection and analysis across a broad range of applications. In this tutorial, we survey the state of the art in mining smartphone mobility data across different application areas. Our tutorial consists of three parts. First, we discuss the possibilities and challenges in the collection of data from various sensing modalities on smartphones. Second, we present cross-cutting challenges and algorithms in sensing and localization. Third, we cover broad classes of applications, notably mobile health, location-based social networks, and mobile advertising. We conclude by showcasing the opportunities for new data collection techniques and new data-mining methods to meet the challenges and applications that are unique to the smartphone mobile arena.
Organizers
Spiros Papadimitriou is an Assistant Professor of Management Science & Information Systems at Rutgers University. Prior to joining academia, he was a Research Scientist at Google and a Research Staff Member at IBM Research. He received his PhD from Carnegie Mellon University. His main research interests are mining graph and streaming data, clustering, time series, large-scale data-processing systems, and mobile/embedded applications. For more details, visit http://bitquill.net/about.html.
Tina Eliassi-Rad is an Associate Professor of Computer Science and Network Science at Northeastern University in Boston, MA. She is also an Associate Professor of Computer Science at Rutgers University (on leave of absence). Prior to joining academia, she was a Member of Technical Staff at Lawrence Livermore National Laboratory. She received her PhD from the University of Wisconsin-Madison. Her current research is at the intersection of graph mining, network science, and computational social science. For more details, visit http://eliassi.org.
Prof. Dr. Katharina Morik (TU Dortmund University) is a member of the German National Academy of Science and Engineering and the North Rhine-Westphalia Academy of Science and Art. She leads the research center SFB876 on data analysis under resource constraints, comprising 14 projects, 20 professors and about 50 PhD students.
Most researchers who use social media are in the business of understanding behavior, often at scales unknown to traditional psychology. In all likelihood some of the great future breakthroughs in psychology will be discovered by computational scientists. Yet most computational researchers have limited training in the social sciences. We believe that a basic understanding of psychological processes would aid both in understanding findings and in identifying striking results that might be of interest to the social science community at large.
Fortunately, across the last 50 years, a number of influential theories have emerged in psychology that have helped organize the field tremendously and that set benchmarks for how individuals behave in a given context. The more computational scientists are familiar with these basic frameworks, the easier it will be to identify relevant points of contact with the existing literature.
This tutorial will review some of the most widely accepted fundamental theories of psychology, starting with basic brain function and spanning needs, lifespan and moral development, dimensional theories of emotion and interpersonal interaction, states vs. traits, personality models, and psychological assessment. Our hope is that this tutorial will leave the audience with a foundational orientation in psychology, to aid in the integration of computational findings with psychological theory.
Organizers
Michal Kosinski is an Assistant Professor in Organizational Behavior at the Graduate School of Business, Stanford University. After receiving his PhD in Psychology from the University of Cambridge (UK) in 2014, Kosinski spent a year as a Postdoctoral Scholar at the Computer Science Department at Stanford University. Kosinski's research has had a significant impact on both academia and industry. His findings were featured in The Economist's special report on the future of insurance (2015), inspired two TED talks, and prompted a discussion in the EU Parliament. In 2013, Kosinski was listed among the 50 most influential people in Big Data by DataIQ and IBM, while three of his papers were placed among Altmetrics' "Top 100 Papers That Most Caught the Public Imagination".
Johannes C. Eichstaedt is a Dean's Scholar and PhD candidate in psychology at the University of Pennsylvania under Martin Seligman. A former physicist, in 2011 he co-founded and led the World Well-Being Project, which is pioneering methods to measure the psychological states of large populations using social media, text mining and machine learning. This work has resulted in millions of dollars of grant funding and media attention around the world, including in The New Yorker, Washington Post and The Onion. He was elected a 2014 Emerging Leader in Science & Society by the American Association for the Advancement of Science, and has served as an expert for the United Nations and OECD to advise on the society-wide measurement of well-being.
Since the term crowdsourcing was coined in 2005, we have witnessed a surge in the adoption of the crowdsourcing paradigm. Crowdsourcing solutions are highly sought-after to solve problems that require human intelligence at a large scale. In the last decade there have been numerous applications of crowdsourcing, both in research and in practice, across disciplines from sociology to computer science. In the realm of research practice, crowdsourcing has unmistakably broken the barriers of qualitative and quantitative studies by providing a means to scale up previously constrained laboratory studies and controlled experiments. Today, one can easily build ground truths for evaluation and access demographically diverse participants around the clock, all within an unprecedentedly short amount of time. This also comes with a number of challenges related to the lack of control over research subjects and to data quality.
In this tutorial, we will introduce the crowdsourcing paradigm in its entirety. We will discuss altruistic and reward-based crowdsourcing, covering the needs of task requesters as well as the behavior of crowd workers. The tutorial will focus on paid microtask crowdsourcing, and reflect on the challenges and opportunities that confront us. In an interactive demonstration session, we will walk the audience through the entire life-cycle of creating and deploying microtasks on an established crowdsourcing platform, optimizing task settings in order to meet task needs, and aggregating results thereafter. We will present a selection of state-of-the-art methods to ensure high-quality results and inhibit malicious activity. The tutorial will be framed within the context of Social Media. The human element at the core of all Social Media breeds a rich ground for crowdsourcing, and we aim to spread the virtues of this growing field.
Organizers
Ujwal Gadiraju is a PhD Candidate at the L3S Research Center, Leibniz Universität Hannover in Germany. His primary research interests include human computation and crowdsourcing. He has published peer-reviewed papers in top-tier conferences in the realms of Information Retrieval, Social Computing, Web Mining and Crowdsourcing. His recent work deals with improving the effectiveness of crowdsourcing microtasks by considering task design and crowd worker behavior.
Dr. Gianluca Demartini is a Senior Lecturer in Data Science at the Information School of the University of Sheffield, UK. Previously, he was a post-doctoral researcher at the eXascale Infolab at the University of Fribourg, a visiting researcher at UC Berkeley, a junior researcher at the L3S Research Center, and an intern at Yahoo! Research. His research interests include Web Information Retrieval, Semantic Web, and Human Computation. He obtained a Ph.D. in Computer Science at the Leibniz University of Hannover in Germany focusing on Entity Retrieval. He has published more than 60 peer-reviewed scientific publications and given tutorials about Entity Retrieval and Crowdsourcing at research conferences. He has been a Distinguished ACM Speaker since 2015 and a part-time crowd worker since 2011.
Dr. Djellel Eddine Difallah is a senior researcher at the eXascale Infolab. During his Ph.D. at the University of Fribourg (Switzerland) he worked on combining the intelligence of humans in solving complex problems with the scalability of machines to process large amounts of data. His research interests also include data management and machine learning. Previously, he worked for Microsoft CISL, Google SoC, and Schlumberger, and he is a Fulbright alumnus.
Dr. Michele Catasta is a research scientist and lecturer at EPFL, Switzerland. During his PhD (EPFL, 2015), he let human memories and information systems have their first dance. To make this debut happen, he added new bells and whistles (human computation, machine learning, psychology) to his original researcher hat (big data analytics, information retrieval, semantic technologies). Michele was in the founding team of Sindice.com, the largest Semantic Web search engine (now SIREn Solutions). He also worked for MIT Media Lab, Google and Yahoo Labs. In the past years, he received several awards and recognitions - among them, a focused grant from Samsung Research USA.
This tutorial aims to carefully and critically scrutinize the use of online social datasets for research, against a variety of possible data, methodological, and ethical pitfalls, by systematically overviewing prior work that identifies, quantifies and provides solutions to them.
To set the context, we will first provide examples of typical limitations, trade-offs or mistakes in current research aims and practices. Then, we will scrutinize the representativeness of social datasets, covering major classes of data biases including population, behavioral, and collection biases, as well as other quality issues such as data decay and temporal variations. Particular attention will be given to issues related to the design and evaluation of methods for collecting or processing social datasets. Finally, we cover various ethical caveats such as algorithmic reinforcement of discriminatory treatment and existing prejudice, and the risk of privacy breaches.
The tutorial will also include two hands-on sessions, where participants will have the opportunity to explore and debate different types of data biases and effects of design decisions, and to jointly evaluate example research projects given by the tutors. Real-world datasets and code templates will be provided.
Organizers
Carlos Castillo is the director of research for data science at Eurecat. He is a web miner with a background in information retrieval, and has been influential in the areas of content quality and credibility. His current research focuses on mining the social web during time-critical situations, including humanitarian crises and natural disasters.
Fernando Diaz is a senior researcher at Microsoft Research. His primary research interest is formal information retrieval models. His experience includes distributed information retrieval approaches to web search, interaction logging and modeling, interactive and faceted retrieval, mining of temporal patterns from news and query logs, cross-lingual information retrieval, graph-based retrieval methods, and synthesizing information from multiple corpora.
Emre Kiciman is a senior researcher at Microsoft Research Redmond. He is broadly interested in using social data to help people find what they want and need. His research includes foundations and infrastructure for better social media analysis, observational studies through social media, and social systems engineering.
Alexandra Olteanu is a social computing researcher interested in how data and methodological limitations delimit what we can learn from online social traces about the world. The problems she tackles are often motivated by existing societal challenges such as racial discrimination, climate change or disaster relief.
In this tutorial we cover the four stages that are part of the lifecycle of geotagged social media data in research, namely representing, processing, analyzing, and visualizing. The tutorial aims to arm participants with both theoretical and practical knowledge about how to make sense of geospatial data for use in applications that range from computational social science and social media analysis to behavioral studies on digital platforms. We provide the basics on how to obtain, represent and combine different spatial data sources, with an emphasis on how to efficiently store, index and query a location-based dataset. We further discuss the main techniques for deriving insights from spatial data, how to avoid common pitfalls and how to exploit social media (e.g. user interests, user movements) for the purpose of gaining a deeper understanding of the phenomenon under study. The tutorial will end with an overview of the main libraries and paradigms to build interactive and dynamic visualizations of geographical data on a map.
Organizers
Rossano Schifanella is an Assistant Professor in Computer Science at the University of Turin, Italy. His research embraces the creative energy of a range of disciplines across technology, social media, data visualization, and urban informatics.
Bart Thomee is a Senior Research Scientist at Yahoo/Flickr in San Francisco, CA, USA. His research primarily focuses on the visual and spatiotemporal dimensions of media, in order to better understand how people experience the world and to better assist them with exploring the planet.
Mining spatial data has been a core subject of study in the data mining community over the past years. Most scholarly research has focused on the analysis of GPS traces and place recommendations. More recently, however, new layers (e.g. social, semantic, linguistic) of big location data have emerged. Given the unprecedented levels of urbanization experienced in the last decade, among the most challenging and crucial ones is the urban fabric layer. The latter includes information that ranges from data related to transportation and navigation in a city to data related to the local economy. To integrate urban studies with the research agendas revolving around traditional data mining conferences, it has become clear that a basic introduction to urban studies is needed. The goal of this tutorial is twofold: (a) to provide this introduction in a form that is focused on topics most relevant to the ICWSM community, and (b) to introduce its attendees to the state of the art in the analysis and modeling of this new regime of spatial data, with a special focus on urban applications.
Organizers
Anastasios Noulas is a Lecturer at the Data Science Institute at Lancaster University, where he leads projects on location-based technologies. Anastasios completed his PhD in 2013 at the Computer Laboratory of the University of Cambridge and has worked as a Data Scientist at Foursquare. In 2015 he helped launch OpenStreetCab.
Bruno Goncalves is a tenured faculty member at Aix-Marseille University and is currently at the NYU Center for Data Science. Bruno completed his joint PhD in Physics, MSc in C.S. at Emory University in Atlanta, GA in 2008. He is the author of over 50 publications with over 3000 Google Scholar citations and the editor of the forthcoming book Social Phenomena: From Data Analysis To Models.
Konstantinos Pelechrinis is an assistant professor at the School of Information Sciences at the University of Pittsburgh where he leads the Network and Data Science Lab. Kostas received his PhD from the Computer Science department at the University of California, Riverside. He received the ARO Young Investigator Award in 2015 for his research on composite networks.
Daniele Quercia is a computer scientist and is currently building the Social Dynamics team at Bell Labs in Cambridge, UK. He has been named one of Fortune magazine's 2014 Data All-Stars and spoke about "happy maps" at TED. He completed his PhD at University College London and was a post-doctoral associate at MIT.