Workshop on Challenges and Opportunities in Automated Coding of COntentious Political Events (COPE 2019) @Euro CSS 2019
Collecting protest and conflict event information from news sources enables historical and comparative studies of social movements in the social and political sciences. The utility of event data in social science applications multiplies as its collection covers more countries, longer time periods, and finer detail and granularity, which are more abundant in local sources than in international ones. Given the excessive time and human effort that manual data collection would require, there is an increasing tendency to rely on machine learning and natural language processing (NLP) methods to develop automated classification and extraction tools that may deal better with the enormous amount and variety of data to be collected.
As an interdisciplinary team of researchers (computer scientists, computational linguists and social scientists) in the Emerging Welfare Project (https://emw.ku.edu.tr), we have been working on automated protest information collection for over two years. While building our protest information collection models and designing our methodology, we have encountered many of the well-known challenges of automated event extraction, ranging from source selection to concerns about the completeness and validity of the data to issues of generalizability (Wang et al., 2016). This workshop will address these issues and explore new methodologies and task designs for tackling them.
The need for protest and conflict data has been met by manual, semi-automatic and automatic approaches. However, the results these approaches have yielded to date are either of insufficient quality or require tremendous effort to replicate on new data. Recent reviews point to major causes for concern in existing protest databases, such as insufficient validity and reliability, inconsistencies within and between corpora, and a lack of generalizability in both methodologies and results. On the one hand, manual and semi-automatic methods require high-quality human effort; on the other hand, text classification and information extraction systems tend to perform worse on corpora from settings that differ from the one used for training. These shortcomings stem mainly from insufficient regard for the variable nature of contentious politics, which takes slightly different forms in different countries and time periods, in line with the spatial and temporal variation of sociopolitical phenomena. Those who attempt to tackle this problem usually resort to methods that are not fully automated, such as key term-based filtering of sources, which makes the variability more manageable but sacrifices recall, so that an undetermined amount of information is missed from the outset. Moreover, training models on a single case or on filtered data yields static tools that are less capable of achieving comparable recall and precision when applied to contexts different from those they were trained on. This is also a significant factor in the validity, reliability and consistency problems facing existing protest databases.
This workshop will work collectively toward solutions to these methodological issues. In general, there is a lack of scientific collaboration among academic groups working on event-coding programs (Wang et al., 2016; Lorenzini et al., 2016), and the most important objective of this workshop is to fill this gap and connect researchers. We hope to contribute to the formation of a collaborative environment for automated event coding, which has increasingly become the target of ever-growing scientific interest, budgets and effort. We also aim to edit a special issue based on the workshop, to be published in a top computational linguistics or political science journal.
Call for Papers
We invite empirical, theoretical or methodological contributions as extended abstracts (at most 500 words and at most 1 page) on the following topics of interest:
- Assessment analyses of automated protest event coding projects
- Completeness and validity of the protest event databases
- Source selection problem
- Report selection problem
- Inconsistent corpora over time
- Information extraction for collecting protest event information
- Training data collection/annotation processes for machine learning
- Limits of data preparation, tool development, and automation
- Copyright of data used
The workshop will have a session for discussing the results of the CLEF 2019 Lab ProtestNews (https://emw.ku.edu.tr/clef-protestnews-2019/) (Hürriyetoğlu et al., 2019) and a new edition of this lab. We will accept abstracts about submissions to the new edition of the lab for presentation in this session. Registration for and submission to the new edition of the challenge will be managed at https://competitions.codalab.org/competitions/20288.
Please submit a one-page abstract of at most 500 words in PDF format.
The submission system can be accessed at https://easychair.org/conferences/?conf=cope2019.
Important Dates
Data challenge submission deadline: July 21, 2019
Abstract submission deadline: July 21, 2019
Notification of acceptance: July 28, 2019
Registration deadline: August 17, 2019
Workshop date: September 2, 2019
Erdem Yörük (Sociology, Koç University and University of Oxford)
Ali Hürriyetoğlu (Computational Linguistics, Koç University)
Çağrı Yoltar (Anthropology, Koç University)
Fırat Duruşan (Political Science, Ankara University and Koç University)
Osman Mutlu (Computer Science, Koç University)
Arda Akdemir (Computer Science, Koç University)
Aline Villavicencio (Computer Science, University of Essex)
References
Hürriyetoğlu, A., Yörük, E., Yüret, D., Yoltar, Ç., Gürel, B., Duruşan, F., & Mutlu, O. (2019, April). A Task Set Proposal for Automatic Protest Information Collection Across Multiple Countries. In European Conference on Information Retrieval (pp. 316-323). Springer, Cham. URL: https://link.springer.com/chapter/10.1007/978-3-030-15719-7_42
Lorenzini, J., Makarov, P., Kriesi, H., & Wueest, B. (2016). Towards a Dataset of Automatically Coded Protest Events from English-language Newswire Documents. Paper presented at the Amsterdam Text Analysis Conference. URL: http://bruno-wueest.ch/assets/files/Lorenzini_etal_2016.pdf
Wang, W., Kennedy, R., Lazer, D., & Ramakrishnan, N. (2016). Growing pains for global monitoring of societal events. Science, 353(6307), 1502-1503. URL: http://science.sciencemag.org/content/353/6307/1502