Tutorial Program

Coffee Break Morning: 10:15AM - 10:45AM
Lunch Break: 12:30PM - 2:00PM
Coffee Break Afternoon: 3:45PM - 4:15PM

June 11, 2019

8:30am - 12:30pm

T1: Tools for WhatsApp data collection (Room H.002)
Kiran Garimella, Philipe Melo, Gareth Tyson, Jussara Almeida, Fabrício Benevenuto
T2: Generative models of online discussion threads (Room H.004)
Pablo Aragón, Vicenç Gómez, Andreas Kaltenbrunner, David García, Alberto Lumbreras

2:00pm - 6:00pm

T3: Measuring Information Spread Within and Across Social Platforms (Room H.004)
Emily Saldanha, Maria Glenski, Svitlana Volkova
T4: Introduction to Social Media Network Analysis with NodeXL (Room H.352)
Wasim Ahmed, Harald Meier

Tutorial T1: Tools for WhatsApp data collection

WhatsApp is the most popular communication application in many countries, including the US, UK, Germany and many developing countries such as Brazil, India, and Mexico (where many people use it as an interface to the web). Due to its encrypted and peer-to-peer nature, it is hard for researchers to study which content people share through WhatsApp at scale. Its relevance, however, means that it is important for researchers to study its growth, usage and impact. In an attempt to support the research community, this tutorial will present tools and methodologies for collecting WhatsApp public group data. It will be separated into two parts. In the first part, we will present tools and underlying techniques for collecting data from public WhatsApp groups. In the second part, we will showcase our system, WhatsApp Monitor, a web-based system that helps researchers and journalists to explore the nature of content shared on WhatsApp. Our tool monitors multiple content categories such as images, videos, audio, and textual messages posted on a set of WhatsApp groups and displays the most shared content per day. The tutorial covers practical as well as analytical techniques for obtaining and exploring WhatsApp data. The goal will be to train researchers and other interested parties in how they can collect and analyze WhatsApp group data within the context of their own studies.

  • Kiran Garimella (@gvrkiran) is a Michael Hammer postdoctoral fellow at the Institute for Data, Systems and Society (IDSS) at MIT. Before joining MIT, he was a postdoc at EPFL, Switzerland. His research focuses on using digital data for social good, including areas like polarization, misinformation and human migration. His work on studying and mitigating polarization on social media won the best student paper awards at WSDM 2017 and WebScience 2017. Kiran received his PhD from Aalto University, Finland, and his Master's and Bachelor's degrees from IIIT Hyderabad, India. Prior to his PhD, he worked as a Research Engineer at Yahoo Research, Barcelona, and QCRI, Doha. Website: https://users.ics.aalto.fi/kiran

  • Philipe Melo is a PhD student at UFMG, Brazil. His previous research was on methods of sentiment analysis, and he is currently studying fake news on social networks. Website: https://homepages.dcc.ufmg.br/~philipe

  • Gareth Tyson (@gareth_tyson) is a Lecturer at Queen Mary University of London, and a Fellow at the Alan Turing Institute. His research focuses on understanding illegal, illicit and unusual activities in social media and networked systems. His work has received coverage from news outlets such as MIT Tech Review, Washington Post, Slashdot, BBC, The Times, Daily Mail, Wired, Science Daily, Ars Technica, The Independent, Business Insider and The Register. He also serves as a reviewer and program committee member for a number of prominent conferences/journals such as IEEE/ACM ToN, IEEE ICDCS, ACM CoNEXT, and ICWSM. He was nominated for the Best Paper Award at the Web Conference 2018 (best paper in track), his recent INFOCOM paper won the Best Presentation, and he has twice been awarded the Outstanding Programme Committee Member Award from ICWSM (2016 and 2018). Website: http://www.eecs.qmul.ac.uk/~tysong/

  • Jussara M. Almeida is an Associate Professor in the Computer Science Department at UFMG and was an Affiliated Member of the Brazilian Academy of Sciences from 2011 to 2015. She is the leader of the Social Computing Lab at UFMG. Her research focuses on understanding how users interact with different applications, characterizing and modeling the workload patterns that emerge from such interactions, and exploiting those patterns to enhance current applications and services on the Web. Website: https://homepages.dcc.ufmg.br/~jussara/

  • Fabrício Benevenuto (@fbenevenuto) is an Associate Professor in the Computer Science Department of the Federal University of Minas Gerais (UFMG). Fabrício was an affiliated member of the Brazilian Academy of Sciences and was recently visiting faculty at MPI-SWS, Germany (2017-2018), through a fellowship from the Humboldt Foundation. Currently, Fabrício leads the Social Computing Lab at UFMG, and his research interests are focused on unveiling misinformation campaigns in social media. Website: https://homepages.dcc.ufmg.br/~fabricio/

Tutorial T2: Generative models of online discussion threads

Online discussion is a core feature of numerous social media platforms and has attracted increasing attention from academia for a range of reasons, e.g., the resolution of problems in collaborative editing, question answering and e-learning platforms, the response of online communities to news events, and online political and civic participation. Discussions on the Internet commonly occur as an exchange of written messages among two or more participants. These conversations are often represented as threads, which are initiated by a user posting a starting message (a post), after which other users reply either to the post or to earlier replies. Given this sequential posting behavior, online discussion threads follow a tree network structure. Different modeling approaches have been proposed to identify the governing mechanisms of the network structure of threads. Statistical models of this type aim to reproduce the growth of discussion threads through different features, often related to human behavior. This is why they are usually called generative models: they not only estimate the statistical significance of their corresponding features but also reproduce the temporal arrival patterns of the messages that form a discussion thread. The parameters of these models make it possible to compare different platforms and communities, and can even help to assess the impact of design choices and user interface changes on the way discussions unfold. Therefore, we aim to provide participants with state-of-the-art tools and methods for the analysis, diagnosis, management and improvement of online discussion platforms and communities.
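
As a rough illustration of the idea of growing a thread as a tree, the following Python sketch adds replies one at a time and lets each new reply choose its parent with probability proportional to a simple attractiveness score. It is not the presenters' model: the function name `grow_thread`, the parameters `alpha` and `root_bias`, and the scoring rule (number of replies already received, plus a bonus for the initial post) are assumptions made only for this example.

```python
import random


def grow_thread(n_replies, alpha=1.0, root_bias=2.0, seed=0):
    """Grow a discussion tree and return its parent list.

    parent[0] is None (the initial post); parent[i] is the index of the
    message that reply i answers. alpha and root_bias are hypothetical
    parameters chosen for illustration.
    """
    rng = random.Random(seed)
    parent = [None]      # node 0 is the initial post
    n_children = [0]     # replies received so far by each message
    for new in range(1, n_replies + 1):
        # Attractiveness of each existing message: a popularity term,
        # a constant so that unanswered messages can still be chosen,
        # and an extra weight for the initial post.
        weights = [
            alpha * n_children[i] + 1.0 + (root_bias if i == 0 else 0.0)
            for i in range(new)
        ]
        target = rng.choices(range(new), weights=weights, k=1)[0]
        parent.append(target)
        n_children.append(0)
        n_children[target] += 1
    return parent


def depth(parent, node):
    """Number of reply hops between a message and the initial post."""
    d = 0
    while parent[node] is not None:
        node = parent[node]
        d += 1
    return d


thread = grow_thread(50)
print(max(depth(thread, i) for i in range(len(thread))))
```

Raising `root_bias` in this sketch tends to produce shallow, star-like threads, while raising `alpha` concentrates replies on already popular messages; fitting such parameters to observed threads is what allows platforms to be compared.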

  • Pablo Aragón (@elaragon) is a doctoral researcher at Universitat Pompeu Fabra and the technology centre Eurecat. His research focuses on understanding social and political phenomena through the analysis of data from the Internet. Pablo has given multiple talks to academic and industrial audiences and published results in major conferences and journals in social network analysis and computational social science. Website: https://elaragon.wordpress.com/

  • Vicenç Gómez (@vicen__gomez) is a tenure-track professor in the Artificial Intelligence and Machine Learning group at Universitat Pompeu Fabra, Barcelona. His main research interests are machine learning and optimal control in applications to different areas such as social networks. He has been a research associate at the Radboud University Nijmegen (NL) and has held visiting appointments at Los Alamos National Laboratory (USA), the IAS group at Technische Universitaet Darmstadt (Germany), and University College London (UK). Website: https://www.mbfys.ru.nl/staff/v.gomez/

  • Andreas Kaltenbrunner (@akalten_bcn) is the Director of Data Analytics at NTENT, where he leads a team focusing on the analysis of user behaviour and improvements for ranking in mobile search. Prior to that, he directed and co-founded the Social Media research line at Barcelona Media Foundation, and the Digital Humanities research unit at the technology centre Eurecat. Andreas has co-authored more than 70 publications in the areas of computational social science, social media and social network analysis. Website: https://www.dtic.upf.edu/~akalten/

  • David García (@dgarcia_eu) has been a group leader at the Medical University of Vienna and a faculty member of the Complexity Science Hub Vienna since September 2017. He leads a research group funded by the Vienna Science and Technology Fund. He holds computer science degrees from Universidad Autónoma de Madrid (Spain) and ETH Zurich (Switzerland). David completed a PhD and a postdoc at ETH Zurich, working at the Chair of Systems Design. David’s research focuses on computational social science, designing models and analysing human behaviour through digital traces. Website: https://dgarcia.eu/

  • Alberto Lumbreras (@albertlumbreras) is a senior researcher at the Criteo Artificial Intelligence Lab, where he currently works on multi-task learning. Previously he has worked as a researcher for Technicolor, the CNRS, and Telefónica R&D. He obtained his PhD in 2016 from the University of Lyon. His main research interest is the application of Bayesian methods to different topics such as social network analysis, recommender systems, or deep learning. Website: http://www.albertolumbreras.net/

Tutorial T3: Measuring Information Spread Within and Across Social Platforms

Information is spreading on social media 24 hours a day, 7 days a week, and 365 days a year. This information may or may not be truthful and may be newsworthy or just consist of personal opinions. Regardless, it has the power to change the beliefs of an individual or the whole country. The danger is that information spreads through social media as quickly as the push of a button and may have significant impact on the real world. However, because of the large variation in social platform structure, types of information being spread, and domains of interest, researchers studying information spread in online social environments often apply different measures to quantify their observations. This leads to difficulty in making comparisons between results to understand the factors that influence diffusion patterns. In our proposed hands-on tutorial, we plan to demonstrate the development and application of a set of rigorous common approaches to measuring and quantifying information spread across diverse social platforms and domains of interest. We first provide a conceptual definition of information spread, diffusion mechanisms, and observables in different platforms. This will allow multiple social platforms to be analyzed using the same framework, despite widely varying structure and interaction mechanisms. Next, we discuss factors that influence information spread. We then define broad groups of measurements that allow us to effectively quantify information spread within and across social platforms including but not limited to Twitter, GitHub, Reddit and Telegram. All participants will get to experience first-hand a novel evaluation framework, a comprehensive Python package with 100+ measurements for quantifying many properties of online information spread. Measurements target several online behavioral phenomena fundamental to online information spread including cascades, recurrence, persistent groups, and gatekeepers. Together, these novel cross-platform measurements form the first comprehensive library to quantify how information spreads across multiple social environments.
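
To give a flavour of what a cascade measurement can look like, the Python sketch below computes three simple quantities (size, depth and maximum breadth) from a list of reshare records. It does not use or reproduce the presenters' package: the function name `cascade_measures`, the `(item_id, parent_id)` record format and the choice of measurements are assumptions made for this example, and the code assumes that every parent id also appears as an item id.

```python
from collections import Counter


def cascade_measures(records):
    """Summarise one cascade.

    records: list of (item_id, parent_id) pairs; parent_id is None for
    the original item. This record format is a hypothetical one chosen
    for illustration.
    """
    parent = dict(records)

    def depth_of(node):
        # Count hops from this item back to the original item.
        d = 0
        while parent[node] is not None:
            node = parent[node]
            d += 1
        return d

    depths = [depth_of(node) for node in parent]
    return {
        "size": len(parent),                           # items in the cascade
        "depth": max(depths),                          # longest reshare chain
        "max_breadth": max(Counter(depths).values()),  # widest level
    }


records = [("a", None), ("b", "a"), ("c", "a"), ("d", "b")]
print(cascade_measures(records))
```

Because the function only needs an item and the item it responds to, the same sketch applies to retweets, Reddit comments or forks, which is the kind of platform-independent framing the abstract describes.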

  • Emily Saldanha received her Ph.D. degree from Princeton University, Princeton, NJ, USA, in 2016. She is currently a data scientist in the Data Sciences and Analytics group, National Security Directorate, Pacific Northwest National Laboratory. Her research focuses on applying machine learning and deep learning techniques to identify and understand patterns in complex data sets with specific interests in natural language processing, social media analytics, and time series analysis.

  • Maria Glenski received the M.S. degree from the University of Notre Dame, Notre Dame, IN, USA, in 2018. She is currently pursuing the Ph.D. degree with the Department of Computer Science and Engineering, University of Notre Dame. She is also a member of the Interdisciplinary Center for Network Science and Applications (iCeNSA) at the University of Notre Dame. Her research in social media analysis and rating systems has been published in the ACM Conference on Hypertext and Social Media, ACM Transactions on Intelligent Systems and Technology, and the ACM Conference on Computer-Supported Cooperative Work and Social Computing.

  • Svitlana Volkova received the Ph.D. degree from Johns Hopkins University, Baltimore, MD, USA, in 2015. She is currently a senior scientist at the Data Sciences and Analytics group, National Security Directorate, Pacific Northwest National Laboratory. Her research focuses on advancing machine learning, deep learning and natural language processing techniques to build novel predictive and forecasting social media analytics. Her models advance understanding, analysis, and effective reasoning about extreme volumes of dynamic, multilingual, and diverse real-world social media data. She was awarded the Google Anita Borg Memorial Scholarship in 2010 and the Fulbright Scholarship in 2008. Dr. Volkova is a Vice Chair of the ACM Future of Computing Academy.

Tutorial T4: Introduction to Social Media Network Analysis with NodeXL

This short course provides an overview of social network analysis (SNA) and demonstrates through theory and practical case studies its application to research, particularly on social media and digital interaction and behaviour records. This topic has grown in importance with the increasing popularity of social networking websites in particular (e.g. Twitter, YouTube, Facebook, LinkedIn) and social computing in general. As people increasingly participate in online communities for social, commercial, and civic interaction, new methods are needed to study these phenomena. SNA makes a valuable contribution to social media research by providing a language and measures for sets of complex relationships created from patterns of online communication. Social network theory conceptualizes networks as a group of actors who are connected by a set of relationships. Actors are often people, but can also be nations, organizations, objects, etc. Social network analysis focuses on patterns of relations among actors that include humans. It seeks to describe networks of relations as fully as possible. This includes identifying prominent patterns in networks, tracing the flow of information through them, and discovering what effects these relations and networks have on people and organizations. It can therefore be used to study networks that are connected via various means in an online environment. Upon completion of this course, participants will:

  • understand the basics of SNA, its terminology and background.
  • be able to transform communication data (e.g. from Facebook, YouTube, Twitter, etc.) into network data.
  • understand the different possible data formats for social networks, e.g. a matrix or an edge list.
  • know how SNA can be applied to social media analysis.
  • be familiar with the use of standard SNA tools and software in general and the NodeXL social network analysis add-in for Excel in particular.
  • be able to derive practical and useful information through SNA that would help design and manage an innovative and successful online community.
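
The step from communication data to network data, and the difference between an edge list and a matrix, can be sketched in a few lines of Python. This is an illustration only and is unrelated to how NodeXL itself imports data: the function names, the `(sender, mentioned_user)` input format and the sample user names are assumptions made for this example.

```python
from collections import Counter


def build_edge_list(messages):
    """Turn (sender, mentioned_user) pairs into a weighted edge list.

    Each edge is (source, target, weight), where weight counts how often
    the source mentioned the target. Self-mentions are dropped.
    """
    counts = Counter((s, t) for s, t in messages if s != t)
    return sorted((s, t, w) for (s, t), w in counts.items())


def to_adjacency_matrix(edges):
    """Convert a weighted edge list into (node_names, matrix)."""
    nodes = sorted({n for s, t, _ in edges for n in (s, t)})
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for s, t, w in edges:
        # Row = source, column = target (a directed network).
        matrix[index[s]][index[t]] = w
    return nodes, matrix


# Hypothetical sample data: who mentioned whom.
messages = [("ana", "ben"), ("ana", "ben"), ("ben", "cai"), ("cai", "cai")]
edges = build_edge_list(messages)
nodes, matrix = to_adjacency_matrix(edges)
print(edges)
print(nodes, matrix)
```

The edge list stores only the relationships that exist, which suits sparse social media networks, whereas the matrix has one cell per ordered pair of actors and grows quadratically with the number of actors.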

  • Dr. Wasim Ahmed is a Lecturer in Digital Business at Northumbria University. As a member of the Social Media Research Foundation and Connected Action Consulting, he is an official representative of team NodeXL. Dr. Ahmed’s PhD thesis analysed user-generated content from Twitter related to disease outbreaks. He has delivered over 60 talks on his research, including a talk to the European Organization for Nuclear Research (CERN), as well as to the UK government. His engagement activities also include numerous keynote talks across the world, most recently at the event Making Social Media Data Matter, run by Boston University College of Communication, and at Durban University of Technology in South Africa. He serves on the committee of a Special Interest Group on Social Media for the Association for Information Science and Technology. More about Wasim, including his list of publications, can be found on his dedicated research blog (https://wasimahmed.org/about/). He has a strong social media following, and his social media blogs for the London School of Economics and Political Science have a strong readership.

  • Harald Meier is a Diploma-Geographer who graduated from the Department of Geography at the University of Münster, Germany, in 2010. His diploma thesis offered a new approach to world city network research, and in 2011 he founded the Digital Space Lab, where he conducts social network analyses with a special focus on the spatial configurations of online networks. He takes a particular research interest in economic sector and production network analysis of the global economy. In 2014 Harald teamed up with the Globalization and World Cities (GaWC) Research Network to promote his research project "Hyperlink Network Geographies". NodeXL has been his prime tool for social network analysis and data visualization.