T1: Characterization, Detection, and Mitigation of Cyberbullying
Charalampos Chelmis, Daphney-Stavroula Zois
T2: Analytics over the Hows and the Whys
Deepak P, Dinesh Garg, Shirish Shevade
T3: Causal Inference and Counterfactual Reasoning
Amit Sharma, Emre Kiciman
T4: Generative Models for Social Media Analytics: Networks, Text, and Time
Kevin Shuai Xu, James R. Foulds
T5: Trade-Offs in Social Media for Interpreting Unstructured Data
Deepak Ajwani, Sourav Dutta, Patrick Nicholson, Alice Marascu, Alessandra Sala
Details about each tutorial are below.
Bullying, once limited to physical spaces (e.g., schools, workplaces, or sports fields) and particular times of the day (e.g., school hours), can now occur anytime, anywhere. Cyberbullying can take many forms, but it typically refers to repeated, hostile behavior intended to harass or harm individuals. The consequences of cyberbullying can be devastating. This tutorial presents a systematic review of approaches to detect, characterize, and mitigate cyberbullying behavior on online social media. First, we discuss the concept of cyberbullying and the related nomenclature, drawing from the social and psychological sciences. Then, we perform an in-depth review of state-of-the-art cyberbullying research. We characterize the phenomenon of bullying on social media and present recent computational approaches for the detection, quantification, and mitigation of cyberbullying. We subsequently discuss the role of content and network structure of online social media interactions to these ends. Throughout the tutorial, we highlight the open challenges that need to be addressed in understanding, predicting, and preventing cyberbullying behavior, as well as promising research directions for researchers interested in this area.
Organizers
Charalampos Chelmis is an Assistant Professor in Computer Science at the University at Albany, SUNY. His research focus is on Network Science and Big Data analytics. Before joining UAlbany, he was Senior Research Associate at the University of Southern California. He obtained his Ph.D. from the University of Southern California in 2013.
Daphney-Stavroula Zois is an Assistant Professor in Electrical and Computer Engineering at the University at Albany, SUNY. She specializes in Machine Learning and Signal Processing methods for decision-making under uncertainty. Before joining UAlbany, she was a Postdoctoral Researcher at UIUC. She obtained her Ph.D. from the University of Southern California in 2014.
The past decade has seen enormous growth of Community Question Answering (CQA) systems. These systems, of which Yahoo! Answers, StackOverflow, Quora and Baidu Zhidao are popular examples, have evolved into dependable social media sources of knowledge for finding answers to how and why questions. Unlike factoid question answering, where the expected answer is a word or a noun phrase, the question-answer archives from these systems encompass experiential and reusable knowledge outlined in descriptive fashion. The community has witnessed enormous research interest in processing such experiential information, typically in the form of question-answer pairs augmented with additional metadata or content. This tutorial provides a scholarly overview of the literature in this burgeoning field within social media, positioning the advancements against the backdrop of more generic multi-view and multi-modal data analytics from machine learning and related communities, as well as more established paradigms of knowledge reuse such as case-based reasoning from AI. The expansion of CQA from purely textual information to various forms of information such as images, videos, a rich user network and links to ontologies makes this a fertile area for interdisciplinary research cutting across sub-disciplines in data analytics such as image/vision, graphs and scalable data systems.
Organizers
Deepak P is an Assistant Professor at Queen's University Belfast, UK. His interests lie in the general field of data analytics spanning NLP, IR and Data Mining, with a recent growing focus on ML/NLP methods for CQA analytics, the topic of the tutorial. In particular, he has published his research in CQA analytics at CIKM 2011, CIKM 2012, ECIR 2013, ACL 2014, EMNLP 2016 and EMNLP 2017. Further, he has authored more than 50 papers in top-tier venues in NLP, AI, IR and databases and is an inventor on seven USPTO-granted patents. He is a recipient of the INAE Young Engineer Award 2015 and is a Senior Member of the ACM and the IEEE.
Dinesh Garg is currently an Associate Professor in Computer Science & Engineering at IIT Gandhinagar (IITGN). Prior to joining IITGN, he was a Senior Researcher in the Business Analytics and Maths Sciences (BAMS) department of IBM India Research Lab, Bangalore. Dinesh completed his M.Sc. (Engg.) and Ph.D. degrees in Computer Science & Automation (CSA) at the Indian Institute of Science (IISc), Bangalore, in 2002 and 2006, respectively. Dinesh's research interests span a wide variety of topics within the broad umbrella of intelligent systems and artificial intelligence. His research has led to multiple best paper awards, and he was awarded the INAE Young Engineer Award in 2007. He is also a Senior Member of the IEEE.
Shirish Shevade received his Ph.D. from the Indian Institute of Science, Bangalore, India, in 2002. He is currently an Associate Professor in the Department of Computer Science and Automation at the Indian Institute of Science. His research interests span many areas of Machine Learning such as Support Vector Machines, Gaussian Processes and semi-supervised learning. He is a Senior Member of the IEEE.
Digital systems have provided new ways of collecting large-scale data about social questions, but also present new challenges for inferring causal mechanisms of behavior, a critical goal in social sciences research. This tutorial introduces participants to causal inference and counterfactual reasoning, drawing from a broad literature from statistics, social sciences and machine learning. We first motivate the use of causal inference with social and online data through examples drawn from online social networks, health, education and governance. To tackle such questions, we will introduce key concepts and intuitions and a range of analysis methods, including randomized experiments, observational methods like matching and stratification, and natural experiment-based methods such as instrumental variables and regression discontinuity. We discuss best practices for evaluation and validation, drawing from our experiences. Throughout, the emphasis is on special considerations with social data, such as dealing with high dimensionality or an underlying social network.
Organizers
Amit Sharma is a researcher at Microsoft Research India, focusing on understanding the mechanisms, such as recommendation systems and social influence, that shape people's interactions with algorithmic systems. His work contributes to methods for causal inference from large-scale data. He completed his Ph.D. in computer science at Cornell University.
Emre Kiciman is a Principal Researcher at Microsoft Research AI. His current research focuses on causal analysis of large-scale social media timelines. His interests include using social data to support individuals and policy-makers, and the impact of AI on people and society. Emre received his Ph.D. from Stanford University.
Traditional social network models aim to understand social phenomena by studying graphs which represent the connections between individuals. In the age of social media, in which many of our social interactions are recorded digitally, social network data have become much richer and more complex. It has become increasingly clear that our models need to go beyond a single network, to include aspects such as textual and temporal information, and to handle data with multiple relations. Generative probabilistic models are well suited for such analyses of social media data, as they provide a natural framework for reasoning collectively over multiple data modalities.
This tutorial presents recent advances in generative models for social media analytics, focusing on models that encode social phenomena with latent (i.e., hidden) attributes, which are subsequently recovered from data. The tutorial begins with a review of generative models for social networks, including latent space models, block models, and modern variants of these such as mixed membership models. The second part of the tutorial showcases richer models for social media data that include text and dynamics, alongside illustrative case studies. The tutorial aims to serve a multidisciplinary audience, including scholars from both the social and computational sciences.
Organizers
Kevin S. Xu is an assistant professor in the EECS Department at the University of Toledo. His main research interests are in machine learning and statistical signal processing with applications to network science and human dynamics. He received his PhD in 2012 from the University of Michigan.
James R. Foulds (Jimmy) is an assistant professor in the Department of Information Systems at the University of Maryland, Baltimore County. His research interests are in machine learning, focusing on probabilistic latent variable models and the inference algorithms to learn them from social networks and text data.
Social capabilities of future information retrieval (IR) and natural language processing (NLP) systems are expected to have enriched cognition for analyzing vast amounts of unstructured multi-modal content in social media. This would enable next-generation functionalities for applications like credibility, group behaviour, and network analysis, lying at the confluence of semantic search, NLP and text mining. To this end, our tutorial will provide the audience with a broad overview of the latest breakthroughs and state-of-the-art techniques for deep semantic linking, understanding and contextual interpretation of unstructured and social media data. With technical presentations on problems like named-entity disambiguation and linking (NED), assigning human-readable topical tags, and dynamically updating knowledge hierarchies with domain-specific vocabulary, as well as hands-on experience with the NED problem, the audience will be ready to tackle more complex problems on their own.
NED, a central problem in natural language interpretation, identifies the key entities, concepts, and domain-specific terms within unstructured data and links them to an external structured knowledge repository. A higher level of semantic understanding involves capturing the overall topic, context and points of view expressed within the content. This finds diverse applications such as domain-specific knowledge acquisition (e.g., social media lingo, abbreviations) and tracking the evolution of concepts and topics for studying community behaviour and semantic linking of information. Such cognitive blocks should be highly accurate, with scalable training, just-in-time prediction and computationally cheap updates. We show how current techniques in text mining, graph processing and machine learning can be leveraged to meet the above requirements by carefully breaking complex learning models into smaller models and using external knowledge. Furthermore, for advanced cognitive tasks, the contextual, semantic, and graph features used in the models need to be light-weight and update-friendly. As a real-world application, we showcase how the discussed methodologies can be used to retrieve and link topically related social and multimedia content segments for faster navigation and ingestion of diverse information.
Organizers
Deepak Ajwani is a research scientist (MTS) at Nokia Bell Labs, Ireland, carrying out research on algorithms for large graphs in parallel, external and distributed computation models. He is one of the lead researchers on a cognitive computing project to contextualize content. In recent years, he has worked and published extensively in the domains of AI, NLP, Algorithms and IR. He has given numerous invited talks and presented tutorials at WWW'15, the Porto Winter School 2015 and the German-Israeli Winter School on Big Data Algorithms in Tel Aviv, 2017.
Sourav Dutta is currently a Post-Doctoral Researcher on Data Analytics at Nokia Bell Labs, Ireland. His research interests span Data Mining, NLP, IR, Algorithms and AI, and he has several publications in major conferences in these areas (VLDB, IJCAI, WWW, TACL, EMNLP, EDBT, etc.). He previously worked at IBM Research, India and was awarded the Google European Doctoral Fellowship during his PhD studies. He has hosted multiple seminars and has prior teaching experience from both his masters and doctoral studies.
Patrick Nicholson is a research scientist (MTS) at Nokia Bell Labs, Ireland, and has research interests that span data compression, compressed and distributed data structures, and machine learning. Prior to Bell Labs, Patrick was a postdoctoral researcher at the Max-Planck-Institut für Informatik in Saarbrücken, Germany, and has taught courses there as well as at the University of Waterloo. He has given dozens of contributed and invited talks on both theoretical and experimental algorithmic results.
Alice Marascu is a research scientist at Nokia Bell Labs, Ireland. Previously, she was a research scientist at IBM Research-Ireland, and held post-doctoral research roles at the University of Trento, Italy, and INRIA Rennes Bretagne Atlantique, France. Her research spans natural language processing, large-scale streaming data processing, large-scale complex pattern recognition and mining, and time series analysis. She has given multiple talks to industrial and academic audiences and published results in major conferences in the areas of big data, data mining, machine learning, and query answering (VLDB, PVLDB, SIGMOD, Big Data Conference, etc.).
Alessandra Sala is the head of the Analytics Research Group at Nokia Bell Labs. In this role, she oversees numerous analytics research groups worldwide. In her prior appointment, she held a research associate position in the Department of Computer Science at the University of California, Santa Barbara. During this appointment, she was a key contributor to several proposals funded by the National Science Foundation in the USA, and her research received the Cisco Research Award in 2011. She has developed efficient distributed systems that support robust and flexible application-level services such as scalable search, flexible data dissemination, and reliable anonymous communication. She served as general chair of ACM COSN 2014 and has served on the TPC of IEEE INFOCOM, WWW, P2P, PETS, IEEE GLOBECOM and others.