Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE) @ACL-IJCNLP 2021 (Online Event)
Today, the unprecedented quantity of easily accessible data on social, political, and economic processes offers ground-breaking potential for guiding data-driven analysis in the social and human sciences and for driving informed policy-making. The need for precise and high-quality information about a wide variety of events, ranging from political violence, environmental catastrophes, and conflict to international economic and health crises, has rapidly escalated (Della Porta and Diani, 2015; Coleman et al. 2014). Governments, multilateral organizations, local and global NGOs, and social movements increasingly demand this data to prevent or resolve conflicts, provide relief for those who are afflicted, or improve the lives of and protect citizens in a variety of ways. For instance, the Black Lives Matter protests and the conflict in Syria are only two examples where we must understand, analyze, and improve real-life situations using such data.
Event extraction has long been a challenge for the natural language processing (NLP) community as it requires sophisticated methods for defining event ontologies, creating language resources, and developing algorithmic approaches (Pustejovsky et al. 2003; Boroş, 2018; Chen et al. 2021). Social and political scientists have been working for decades to create socio-political event databases such as ACLED, EMBERS, GDELT, ICEWS, MMAD, PHOENIX, POLDEM, SPEED, TERRIER, and UCDP following similar steps. These projects and newer ones increasingly rely on machine learning (ML) and NLP methods to deal better with the vast amount and variety of data in this domain (Hürriyetoğlu et al. 2020). Automation offers scholars not only the opportunity to improve existing practices, but also to vastly expand the scope of data that can be collected and studied, thus potentially opening up new research frontiers within the field of socio-political events, such as political violence and social movements. However, automated approaches also suffer from major issues such as bias, limited generalizability, class imbalance, training data limitations, and ethical concerns that have the potential to affect the results and their use drastically (Lau and Baldwin 2020; Bhatia et al. 2020; Chang et al. 2019). Moreover, the results of automated systems for socio-political event information collection may not be comparable to each other or may not be of sufficient quality (Wang et al. 2016; Schrodt 2020).
Socio-political events are varied and nuanced. Both the political context and the local language used may affect whether and how they are reported. Therefore, all steps of information collection (event definition, language resources, and manual or algorithmic steps) may need to be constantly updated, leading to a series of challenging questions: Are events related to minority groups represented well? Are new types of events covered? Are the event definitions and their operationalization comparable across systems (Hürriyetoğlu et al. 2019, 2020a, 2020b)? This workshop aims to seek answers to these kinds of questions, to inspire innovative technological and scientific solutions for tackling the aforementioned issues, and to quantify the quality of automated event extraction systems. Moreover, the workshop will foster a deeper understanding of the performance of the computational tools used and the usability of the resulting socio-political event datasets.
We invite contributions from researchers in computer science, NLP, ML, AI, socio-political sciences, conflict analysis and forecasting, and peace studies, as well as computational social science scholars involved in the collection and utilization of socio-political event data. Social and political scientists will be interested in reporting and discussing their approaches and in observing what state-of-the-art text processing systems can achieve for their domain. Computational scholars will have the opportunity to illustrate the capacity of their approaches in this domain and benefit from being challenged by real-world use cases. Academic workshops dedicated to event information in general, or to analyzing text in specific domains such as health, law, finance, and biomedical sciences, have significantly accelerated progress in their respective topics and fields. However, there is no comparable effort for handling socio-political events. We hope to fill this gap and contribute to social and political sciences in a similar spirit. We invite work on all aspects of automated coding of socio-political events from mono- or multi-lingual text sources. This includes (but is not limited to) the following topics:
- Extracting events in and beyond a sentence
- Training data collection and annotation processes
- Event coreference detection
- Event-event relations, e.g., subevents, main events, causal relations
- Event dataset evaluation in light of reliability and validity metrics
- Defining, populating, and facilitating event schemas and ontologies
- Automated tools and pipelines for event collection related tasks
- Lexical, syntactic, and pragmatic aspects of event information manifestation
- Development and analysis of rule-based, ML, hybrid, and human-in-the-loop approaches for creating event datasets
- COVID-19 related socio-political events
- Applications of event databases
- Online social movements
- Bias and fairness of the sources and event datasets
- Estimating what is missing in event datasets using internal and external information
- Novel event detection
- Release of new event datasets
- Ethics, misinformation, privacy, and fairness concerns pertaining to event datasets
- Copyright issues on event dataset creation, dissemination, and sharing
- Qualities of the event information on various online and offline platforms
This call solicits full papers reporting original and unpublished research on the topics listed above. The papers should emphasize obtained results rather than intended work and should indicate clearly the state of completion of the reported results. Submissions should be between 4 and 8 pages in total, plus unlimited pages of references. Final versions of the papers will be given one additional page of content (up to 9 pages plus references) so that reviewers’ comments can be taken into account.
Authors are also invited to submit short papers not exceeding 4 pages (plus two additional pages for references). Short papers should describe:
- a small, focused contribution;
- work in progress;
- a negative result;
- a position paper;
- a report on shared task participation.
Papers should be submitted on the START page of the workshop (https://www.softconf.com/acl2021/w22_case2021) in PDF format, in compliance with the ACL 2021 author guidelines provided on https://2021.aclweb.org/calls/papers .
The reviewing process will be double-blind, so papers should not include the authors’ names and affiliations. If you do include any author names on the title page, your submission will be automatically rejected. In the body of your submission, you should eliminate all direct references to your own previous work. Each submission will be reviewed by at least three members of the program committee.
Workshop Proceedings will be published on ACL Anthology.
Event information detection consists of multiple subsequent steps, each of which can drastically affect the quality of the resulting event database. Thus, we believe one must consider a complete scenario that consists of classifying documents and sentences as relevant or not, event coreference resolution, event information extraction, and event classification in relation to an event taxonomy, and then test the results against a manually created list of events to determine the state-of-the-art performance on this task.
With this objective in mind, we organize a shared task on socio-political and crisis event detection at the workshop. Although the subtasks form a coherent flow, participants can focus on one or more of them and choose the tasks or subtasks they would like to participate in. Participants will have access to all of the data for all tasks and subtasks. Any combination of these resources to achieve high performance for any of the tasks is allowed. For instance, Task 1 data could be used to potentially improve the performance on Task 2 and vice versa. The tasks and subtasks are:
Task 1: Multilingual protest news detection
- Subtask 1: Document classification ⇒ Does a news article contain information about a past or ongoing event?
- Subtask 2: Sentence classification ⇒ Does a sentence contain information about a past or ongoing event?
- Subtask 3: Event sentence coreference identification ⇒ Which event sentences (subtask 2) are about the same event?
- Subtask 4: Event extraction ⇒ What is the event trigger and its arguments?
We particularly focus on events that are in the scope of contentious politics and characterized by riots and social movements, i.e., the “repertoire of contention” (Giugni 1998, Tarrow 1994, Tilly 1984), which we name GLOCON Gold in our operationalization (Hürriyetoğlu et al. 2020a). The aim of the shared task is to detect and classify socio-political and crisis event information at document, sentence, cross-sentence, and token levels in a multilingual setting. The detailed description of the subtasks can be found in Hürriyetoğlu et al. (2019, 2020b). The data size for English is increased and data for Portuguese, Spanish, and Hindi are added in this edition.
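To make the flow of the four Task 1 subtasks concrete, the following is a minimal sketch of how their outputs could chain together. It is not the shared task baseline: the cue-word list, all function names, and the name-overlap heuristic for coreference are illustrative assumptions that stand in for trained models.

```python
import re

# Hypothetical cue words standing in for a trained classifier (assumption).
PROTEST_CUES = {"protest", "protested", "protesters", "strike",
                "rally", "riot", "march", "marched"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def is_event_text(text):
    """Subtasks 1 and 2: binary decision for a document or a sentence."""
    return any(tok in PROTEST_CUES for tok in tokenize(text))


def split_sentences(document):
    """Naive sentence splitter on terminal punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]


def proper_nouns(sentence):
    """Capitalized, non-initial words: a crude proxy for names and places."""
    words = sentence.split()
    return {w.strip(".,;:!?") for w in words[1:] if w[:1].isupper()}


def cluster_event_sentences(sentences):
    """Subtask 3: greedily group sentences that share a proper noun."""
    clusters = []  # list of (names_seen_so_far, member_sentences)
    for sent in sentences:
        names = proper_nouns(sent)
        for cluster_names, members in clusters:
            if names & cluster_names:
                members.append(sent)
                cluster_names |= names  # in-place update of the cluster's set
                break
        else:
            clusters.append((set(names), [sent]))
    return [members for _, members in clusters]


def extract_trigger(sentence):
    """Subtask 4 (trigger only): the first cue word found in the sentence."""
    for tok in tokenize(sentence):
        if tok in PROTEST_CUES:
            return tok
    return None


def run_pipeline(document):
    """Chain document, sentence, cross-sentence, and token level steps."""
    if not is_event_text(document):
        return []
    event_sentences = [s for s in split_sentences(document) if is_event_text(s)]
    return [
        {
            "sentences": members,
            "triggers": [extract_trigger(s) for s in members],
            "names": sorted(set().union(*(proper_nouns(s) for s in members))),
        }
        for members in cluster_event_sentences(event_sentences)
    ]
```

The point of the sketch is the data flow: a document-level decision gates sentence-level filtering, the surviving sentences are grouped per event, and token-level information is extracted within each group. A real system would replace each heuristic with a model trained on the task data.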
Task 2: Fine-grained classification of socio-political events
The objective of this task is to evaluate generalized zero-shot learning approaches to event classification: systems classify short text snippets reporting socio-political events into fine-grained event types using the Armed Conflict Location & Event Data Project (ACLED) event taxonomy, which consists of 25 event subtypes pertaining to political violence, demonstrations (rioting and protesting), and selected non-violent, politically important events. The task is to label text snippets with ACLED types and potentially with other types of similar events not covered directly by ACLED (unseen classes). Note that the event definitions for Task 1 and Task 2 are not fully compatible.
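As an illustration of the generalized zero-shot setting, the sketch below assigns a snippet to whichever label has the most similar textual description, so a class with no training examples can still be predicted as long as it has a description. The label descriptions are invented for this example (they are not ACLED codebook text), "Natural disaster" is a hypothetical unseen class, and bag-of-words cosine similarity is a deliberately simple stand-in for a learned text encoder.

```python
import math
import re
from collections import Counter

# Illustrative label descriptions (assumptions, not ACLED codebook text).
LABEL_DESCRIPTIONS = {
    "Peaceful protest": "demonstrators gather march peacefully protest without violence",
    "Armed clash": "armed groups exchange fire battle fighting between forces",
    # Hypothetical unseen class: no training snippets, only a description.
    "Natural disaster": "flood earthquake storm destroys homes displaces residents",
}


def bag_of_words(text):
    """Count lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def classify(snippet, label_descriptions=LABEL_DESCRIPTIONS):
    """Return the best-matching label and the score of every label."""
    vec = bag_of_words(snippet)
    scores = {label: cosine(vec, bag_of_words(desc))
              for label, desc in label_descriptions.items()}
    return max(scores, key=scores.get), scores
```

Because seen and unseen labels are scored in the same way, adding a new event type only requires adding a description to the dictionary, which is the property the generalized zero-shot setup is meant to test.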
Task 3: Discovering Black Lives Matter events in the United States
This is an evaluation-only task in which participants of Task 1 can evaluate their systems on reproducing a manually curated list of Black Lives Matter (BLM) related protest events. Participants will use document collections provided by us to extract the place and date of the BLM events. The event definition applied for determining these events is the same as the one used for Task 1. Participants may utilize any other data source to improve the performance of their submissions.
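One simple way to compare an extracted event list with a curated one is sketched below. The scoring scheme is an assumption made for illustration, since the official metric is defined in the task repository rather than here: events are treated as (place, date) pairs, and precision, recall, and F1 are computed over exact matches. The function name and the placeholder inputs in the usage lines are hypothetical.

```python
from datetime import date


def score_event_list(predicted, gold):
    """Exact-match precision, recall, and F1 over (place, date) pairs.

    Place names are compared case-insensitively; duplicates are ignored
    because both lists are converted to sets.
    """
    pred_set = {(place.strip().lower(), day) for place, day in predicted}
    gold_set = {(place.strip().lower(), day) for place, day in gold}
    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Placeholder inputs, invented purely to show the call shape.
gold_events = [("City A", date(2020, 6, 1)), ("City B", date(2020, 6, 2)),
               ("City C", date(2020, 6, 2))]
system_events = [("city a", date(2020, 6, 1)), ("City B", date(2020, 6, 2)),
                 ("City D", date(2020, 6, 3))]
result = score_event_list(system_events, gold_events)
```

Exact matching is strict: an event placed one day off or in a neighbouring locality counts as both a false positive and a false negative, so a relaxed matching window may be preferable in practice.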
The detailed description of the tasks, the application form, sample data, baseline scripts, and submission formats are available in the dedicated repository (https://github.com/emerging-welfare/case-2021-shared-task).
Participants in the Shared Task are encouraged to submit a paper to the workshop, although submitting a paper is not mandatory for participating in the Shared Task. Papers must follow the CASE 2021 workshop submission instructions (ACL 2021 style template: https://2021.aclweb.org/calls/papers) and will undergo regular peer review. Their acceptance will not depend on the results obtained in the shared task, but on the quality of the paper. Authors of accepted papers will be informed about the evaluation results of their systems prior to the paper submission deadline (see the important dates).
Ali Hürriyetoğlu (Koc University, Turkey)
Hristo Tanev (Joint Research Centre (JRC), European Commission, Italy)
Vanni Zavarella (Joint Research Centre (JRC) of the European Commission, Italy)
Reyyan Yeniterzi (Sabancı University, Turkey)
Aline Villavicencio (University of Sheffield, the United Kingdom; and Institute of Informatics, Federal University of Rio Grande do Sul, Brazil)
Erdem Yörük (Koc University, Turkey)
Deniz Yuret (Koc University, Turkey)
Jakub Piskorski (Polish Academy of Sciences, Poland)
Gautam Kishore Shahi (University of Duisburg-Essen, Germany)
Tommaso Caselli (University of Groningen, the Netherlands),
Osman Mutlu (Koc University, Turkey),
Fırat Duruşan (Koc University, Turkey),
Ali Safaya (Koc University, Turkey),
Bharathi Raja Asoka Chakravarthi (Insight SFI Centre for Data Analytics, Ireland),
Gautam Kishore Shahi (University of Duisburg-Essen, Germany),
Jakub Piskorski (Polish Academy of Sciences, Poland),
Matina Halkia (European Commission – Joint Research Centre, Italy),
Benjamin J. Radford (UNC Charlotte, the United States),
Mark Lee (University of Birmingham, the United Kingdom),
YiJyun Lin (University of Nevada, the United States),
Fredrik Olsson (RISE, Sweden),
Kristine Eck (Uppsala University, Sweden),
Nelleke Oostdijk (Radboud University, the Netherlands),
Francielle Vargas (University of São Paulo, Brazil),
Farhana Liza (University of Essex, the United Kingdom),
Nicoletta Calzolari (Institute for Computational Linguistics, Italy),
Milena Slavcheva (Bulgarian Academy of Sciences, Bulgaria),
Harish Tayyar Madabushi (University of Birmingham, the United Kingdom),
Ritesh Kumar (Dr. Bhimrao Ambedkar University, India),
Alexandra DeLucia (Johns Hopkins University, the United States),
Jasmine Lorenzini (University of Geneva, Switzerland),
Kalliopi Zervanou (Eindhoven University of Technology, the Netherlands),
Andrew Lee Halterman (Massachusetts Institute of Technology, the United States),
Marijn Schraagen (Utrecht University, the Netherlands),
Niklas Stoehr (ETH Zürich, Switzerland).
Bhatia, S., Lau, J. H., & Baldwin, T. (2020). You are right. I am ALARMED–But by Climate Change Counter Movement. arXiv preprint arXiv:2004.14907.
Boroş, E. (2018). Neural Methods for Event Extraction. Ph.D. thesis, Université Paris-Saclay.
Chang, K. W., Prabhakaran, V., & Ordonez, V. (2019, November). Bias and fairness in natural language processing. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): Tutorial Abstracts.
Chen, M., Zhang, H., Ning, Q., Li, M., Ji, H., & Roth, D. (2021). Event-centric Natural Language Understanding. In Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI 2021): Tutorial. URL: https://blender.cs.illinois.edu/paper/eventtutorial2021.pdf
Coleman, P. T., Deutsch, M., & Marcus, E. C. (Eds.). (2014). The handbook of conflict resolution: Theory and practice. John Wiley & Sons.
Della Porta, D., & Diani, M. (Eds.). (2015). The Oxford handbook of social movements. Oxford University Press.
Hürriyetoğlu, A., Yörük, E., Yüret, D., Yoltar, Ç., Gürel, B., Duruşan, F., … & Akdemir, A. (2019, September). Overview of CLEF 2019 lab ProtestNews: extracting protests from news in a cross-context setting. In International Conference of the Cross-Language Evaluation Forum for European Languages (pp. 425-432). Springer, Cham. URL: http://ceur-ws.org/Vol-2380/paper_249.pdf
Hürriyetoğlu, A., Zavarella, V., Tanev, H., Yörük, E., Safaya, A., & Mutlu, O. (2020a, May). Automated Extraction of Socio-political Events from News (AESPEN): Workshop and Shared Task Report. In Proceedings of the Workshop on Automated Extraction of Socio-political Events from News 2020 (pp. 1-6).
Hürriyetoğlu, A., Yörük, E., Yüret, D., Mutlu, O., Yoltar, Ç., Duruşan, F., & Gürel, B. (2020b). Cross-context news corpus for protest events related knowledge base construction. In Automated Knowledge Base Construction (AKBC). arXiv preprint arXiv:2008.00351. URL: https://www.akbc.ws/2020/papers/7NZkNhLCjp
Lau, J. H., & Baldwin, T. (2020, July). Give Me Convenience and Give Her Death: Who Should Decide What Uses of NLP are Appropriate, and on What Basis?. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 2908-2913).
Pustejovsky, J., Castano, J. M., Ingria, R., Sauri, R., Gaizauskas, R. J., Setzer, A., … & Radev, D. R. (2003). TimeML: Robust specification of event and temporal expressions in text. New directions in question answering, 3, 28-34.
Schrodt, P. A. (2020, May). Keynote Abstract: Current Open Questions for Operational Event Data. In Proceedings of the Workshop on Automated Extraction of Socio-political Events from News 2020.
Wang, W., Kennedy, R., Lazer, D., & Ramakrishnan, N. (2016). Growing pains for global monitoring of societal events. Science, 353(6307), 1502-1503.
 https://www.cartercenter.org/peace/conflict_resolution/syria-conflict-resolution.html, accessed on September 28, 2020.
 https://en.wikipedia.org/wiki/Protests_over_responses_to_the_COVID-19_pandemic, accessed on September 28, 2020.