The 6th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE @ RANLP 2023)
Nowadays, the unprecedented quantity of easily accessible data on social, political, and economic processes offers ground-breaking potential for guiding data-driven analysis in the social and human sciences and for informing policy-making. Governments, multilateral organizations, and local and global NGOs have an increasing demand for high-quality information about a wide variety of events, ranging from political violence, environmental disasters, and conflict to international economic and health crises (Coleman et al., 2014; Della Porta and Diani, 2015), in order to prevent or resolve conflicts, provide relief for those who are affected, or improve the lives of and protect citizens in a variety of ways. Citizen actions against COVID-19 measures in the period 2020-2022 and the Russia–Ukraine war are only two examples where event-centered data can contribute to a better understanding of real-life situations. Finally, these efforts also respond to a “growing public interest in up-to-date information on crowds” [1].
Event extraction has long been a challenge for the natural language processing (NLP) community, as it requires sophisticated methods for defining event ontologies, creating language resources, and developing algorithmic approaches and ML models (Pustejovsky et al., 2003; Boroş, 2018; Chen et al., 2021). Previous editions of the CASE workshop series have featured work that uses BERT and other deep learning models, syntactic parsing, semantic argument structure analysis, temporal and spatial reasoning, lexical learning, and other NLP methods and algorithms. Detecting and extracting information about socio-political events is a complex NLP task: events can be described via elaborate syntactic and semantic language structures, and event descriptions may enter into different semantic relations with each other, such as coreference, causality, inclusion, and spatio-temporal proximity.
Events as linguistic phenomena are usually modeled through frames and ontologies, and event types are often represented via elaborate taxonomies. Detecting socio-political events in real-world texts poses problems that originate from the dynamics of governments, political parties, movements, and other socially active groups, which may frequently change their leading figures, strategies, and organization. These factors can make existing statistical models and knowledge bases less relevant as time passes and require the development of methods that rely on limited data, such as few-shot learning, human-in-the-loop approaches, or other specific learning strategies.
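To make the limited-data setting concrete, the sketch below frames event-type detection as nearest-prototype matching over sentence embeddings, one possible few-shot strategy; the encoder name, event types, and example sentences are illustrative assumptions and do not describe any particular CASE system.

```python
# Illustrative few-shot sketch: assign an event type by comparing a sentence
# to the mean embedding ("prototype") of a handful of labeled examples.
# Encoder, labels, and sentences are assumptions for illustration only.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed general-purpose encoder

# A few labeled examples per event type (the "shots")
support = {
    "protest": ["Workers rallied outside parliament demanding higher wages."],
    "armed clash": ["Clashes erupted between armed groups near the border town."],
}
prototypes = {label: model.encode(examples).mean(axis=0)
              for label, examples in support.items()}

query = "Students gathered in the main square against tuition increases."
query_emb = model.encode(query)

# Pick the event type whose prototype is most similar to the query sentence
scores = {label: float(util.cos_sim(query_emb, proto))
          for label, proto in prototypes.items()}
print(max(scores, key=scores.get), scores)
```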
Social and political scientists have been working for decades to create socio-political event (SPE) databases such as ACLED, EMBERS, GDELT, ICEWS, MMAD, PHOENIX, POLDEM, SPEED, TERRIER, and UCDP, following similar steps. These projects, and new ones, increasingly rely on machine learning (ML), deep learning (DL), and NLP methods to better handle the vast amount and variety of data in this domain (Hürriyetoğlu et al., 2020). Unfortunately, automated approaches suffer from major issues such as bias, limited generalizability, class imbalance, training data limitations, and ethical concerns that have the potential to affect the results and their use drastically (Lau and Baldwin, 2020; Bhatia et al., 2020; Chang et al., 2019). Moreover, the results of automated systems for SPE information collection have been neither comparable to each other nor of sufficient quality (Wang et al., 2016; Schrodt, 2020). SPEs are varied and nuanced, and both the political context and the local language used may affect whether and how they are reported.
We invite contributions from researchers in computer science, NLP, ML, DL, AI, socio-political sciences, conflict analysis and forecasting, peace studies, as well as computational social science scholars involved in the collection and utilization of SPE data. Academic workshops dedicated to event information in general, or to analyzing text in specific domains such as health, law, finance, and biomedical sciences, have significantly accelerated progress in their respective fields. However, there has not been a comparable effort for handling SPEs. We fill this gap. We invite work on all aspects of automated coding and analysis of SPEs and events in general from mono- or multi-lingual text sources. This includes (but is not limited to) the following topics:
1) Extracting events and their arguments within and beyond a sentence or document, event coreference resolution.
2) Research on NLP technologies related to event detection, such as geocoding, temporal reasoning, argument structure detection, syntactic and semantic analysis of event structures, text classification for event type detection, learning event-related lexica, coreference in event descriptions, machine translation for multilingual event detection, named entity recognition, fake news analysis, text similarity, and others with a focus on real or potential event detection applications.
3) New datasets, training data collection and annotation for event information.
4) Event-event relations, e.g., subevents, main events, spatio-temporal relations, causal relations.
5) Event dataset evaluation in light of reliability and validity metrics.
6) Defining, populating, and facilitating event schemas and ontologies.
7) Automated tools and pipelines for event collection related tasks.
8) Lexical, syntactic, semantic, discursive, and pragmatic aspects of event manifestation.
9) Methodologies for the development, evaluation, and analysis of event datasets.
10) Applications of event databases, e.g., early warning, conflict prediction, policymaking.
11) Estimating what is missing in event datasets using internal and external information.
12) Detection of new and emerging SPE types, e.g., creative protests.
13) Release of new event datasets.
14) Bias and fairness of the sources and event datasets.
15) Ethics, misinformation, privacy, and fairness concerns pertaining to event datasets.
16) Copyright issues in event dataset creation, dissemination, and sharing.
17) Cross-lingual, multilingual, and multimodal aspects of event analysis.
18) Climate change and conflict-related resources, and approaches related to contentious politics around climate change.
Moreover, we encourage submissions of new system description papers on our available benchmarks.
[1] https://sites.google.com/view/crowdcountingconsortium/faqs
[2] https://acleddata.com/political-violence-targeting-women/#curated
REFERENCES
Bhatia, S., Lau, J. H., & Baldwin, T. (2020). You are right. I am ALARMED–But by Climate Change Counter Movement.
Boroş, E. (2018). Neural Methods for Event Extraction.
Chang, K. W., Prabhakaran, V., & Ordonez, V. (2019, November). Bias and fairness in natural language processing.
Chen M., Zhang H., Ning Q., Li M., Ji H., Roth D. (2021). Event-centric Natural Language Understanding.
Coleman, P. T., Deutsch, M., & Marcus, E. C. (Eds.). (2014). The handbook of conflict resolution: Theory and practice.
Della Porta, D., & Diani, M. (Eds.). (2015). The Oxford handbook of social movements.
Hürriyetoğlu, A., Zavarella, V., Tanev, H., Yörük, E., Safaya, A., & Mutlu, O. (2020, May). Automated Extraction of Socio-political Events from News (AESPEN): Workshop and Shared Task Report.
Lau, J. H., & Baldwin, T. (2020, July). Give Me Convenience and Give Her Death: Who Should Decide What Uses of NLP are Appropriate, and on What Basis?.
Pustejovsky, J., Castano, J. M., Ingria, R., Sauri, R., Gaizauskas, R. J., Setzer, A., … & Radev, D. R. (2003). TimeML: Robust specification of event and temporal expressions in text.
Schrodt, P. A. (2020, May). Keynote Abstract: Current Open Questions for Operational Event Data.
Wang, W., Kennedy, R., Lazer, D., & Ramakrishnan, N. (2016). Growing pains for global monitoring of societal events.
⇒ Workshop paper submission deadline: 10 July 2023
⇒ Workshop paper acceptance notification: 5 August 2023
⇒ Workshop paper camera-ready versions: 25 August 2023
CALL FOR PAPERS
We invite contributions from researchers in computer science, NLP, ML, DL, AI, socio-political sciences, conflict analysis and forecasting, peace studies, as well as computational social science scholars involved in the collection and utilization of SPE data. Social and political scientists will be interested in reporting and discussing their approaches and in observing what state-of-the-art text processing systems can achieve for their domain. Computational scholars will have the opportunity to illustrate the capacity of their approaches in this domain and benefit from being challenged by real-world use cases. Academic workshops dedicated to event information in general, or to analyzing text in specific domains such as health, law, finance, and biomedical sciences, have significantly accelerated progress in their respective fields. However, there has not been a comparable effort for handling SPEs. We fill this gap. We invite work on all aspects of automated coding and analysis of SPEs and events in general from mono- or multi-lingual text sources. This includes (but is not limited to) the following topics:
- ⇒ Extracting events in and beyond a sentence, event coreference resolution
- ⇒ New datasets, training data collection and annotation for event information
- ⇒ Event-event relations, e.g., subevents, main events, causal relations
- ⇒ Event dataset evaluation in light of reliability and validity metrics
- ⇒ Defining, populating, and facilitating event schemas and ontologies
- ⇒ Automated tools and pipelines for event collection related tasks
- ⇒ Lexical, syntactic, discursive, and pragmatic aspects of event manifestation
- ⇒ Methodologies for development, evaluation, and analysis of event datasets
- ⇒ Applications of event databases, e.g. early warning, conflict prediction, policymaking
- ⇒ Estimating what is missing in event datasets using internal and external information
- ⇒ Detection of new SPE types, e.g. creative protests, cyber activism, COVID-19-related events
- ⇒ Release of new event datasets
- ⇒ Bias and fairness of the sources and event datasets
- ⇒ Ethics, misinformation, privacy, and fairness concerns pertaining to event datasets
- ⇒ Copyright issues on event dataset creation, dissemination, and sharing
We encourage submissions of new system description papers on our available benchmarks (ProtestNews @ CLEF 2019, AESPEN @ LREC 2020, and CASE 2021). Please contact the organizers if you would like to access the data.
Submission
CASE 2023 solicits short and long papers reporting original and unpublished research on the topics listed above. Papers should emphasize obtained results rather than intended work and should clearly indicate the state of completion of the reported results. The page limits and content structure announced on the ACL ARR page (https://aclrollingreview.org/cfp) should be followed for both short and long papers.
Papers should be submitted via the START page of the workshop (https://softconf.com/ranlp23/CASE/) or via the ARR page (TBA on the workshop website) in PDF format, in compliance with the author guidelines for ACL publications (https://acl-org.github.io/ACLPUB/formatting.html).
The reviewing process will be double-blind, and papers should not include the authors’ names and affiliations. Each submission will be reviewed by at least three members of the program committee. The workshop proceedings will be published in the ACL Anthology.
CASE 2023 will feature shared tasks we have prepared: 1) a multilingual SPE information classification and extraction task that extends the list of languages covered in CASE 2021 (English, Spanish, Portuguese, and Hindi) with Turkish, Mandarin, and Urdu; 2) a challenge on replicating the spatio-temporal distribution of protests pertaining to political violence targeting women (PVTW) [2]; and 3) a task on event causality identification. These tasks are a continuation of the tasks in CASE 2022, but with more or different data. Participating teams (15 in CASE 2021 and 20 in CASE 2022) will be required to submit a system description report, which will be peer reviewed by the program committee. We expect a comparable number of participants in CASE 2023. This shared task series has contributed significantly to the advancement of automated event extraction techniques. Contributions to multilinguality, zero-shot evaluation, event causality detection, and the comparison between manual and automated event information collection are the unique characteristics of this shared task series.
TASKS 1 & 2: MULTILINGUAL PROTEST EVENT DETECTION:
Task 1 - Multilingual protest news detection: This is the same shared task organized at CASE 2021 (for more info: https://aclanthology.org/2021.case-1.11/), but this time there will be additional data and languages at the evaluation stage. Contact person: Ali Hürriyetoğlu (ali.hurriyetoglu@gmail.com). Github: https://github.com/emerging-welfare/case-2022-multilingual-event
Task 2 - Automatically replicating manually created event datasets: The participants of Task 1 will be invited to run the systems they develop for Task 1 on a news archive (for more info: https://aclanthology.org/2021.case-1.27/). Contact person: Hristo Tanev (htanev@gmail.com). Github: https://github.com/zavavan/case2022_task2; please also see https://github.com/emerging-welfare/case-2022-multilingual-event
TASK 3: EVENT CAUSALITY IDENTIFICATION:
Task 3 - Event causality identification: Causality is a core cognitive concept and appears in many natural language processing (NLP) works that aim to tackle inference and understanding. We are interested in studying event causality in news and therefore introduce the Causal News Corpus. The Causal News Corpus consists of 3,559 event sentences, extracted from protest event news, annotated with sentence-level labels indicating whether or not they contain causal relations. Causal sentences are additionally annotated with Cause, Effect, and Signal spans. Our two subtasks (Sequence Classification and Span Detection) work on the Causal News Corpus, and we hope that accurate, automated solutions will be proposed for the detection and extraction of causal events in news. Contact person: Fiona Anting Tan (tan.f@u.nus.edu). Github: https://github.com/tanfiona/CausalNewsCorpus
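For illustration, the Sequence Classification subtask amounts to binary sentence classification (causal vs. non-causal). The sketch below shows a minimal setup with a generic pretrained encoder; the model name, label mapping, and example sentences are assumptions for illustration and do not represent the official baseline of the task.

```python
# Minimal sketch of the Sequence Classification subtask (causal vs. non-causal
# sentences). Model, label mapping, and examples are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-cased"  # assumption: any encoder fine-tuned on the corpus could be used

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

sentences = [
    "The rally was dispersed because the organizers had no permit.",  # likely causal
    "Thousands marched through the city centre on Saturday.",         # likely non-causal
]

inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits      # shape: (num_sentences, 2)
predictions = logits.argmax(dim=-1)      # assumed mapping: 1 = causal, 0 = non-causal
print(predictions.tolist())
```

The Span Detection subtask can be approached analogously, e.g. as token classification over Cause, Effect, and Signal tags.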
- ⇒ Shared Task Soft Launch: 15 April
- ⇒ Shared Task Official Launch: 01 May
- ⇒ Test period: 15-30 June
- ⇒ Paper submission deadline: 10 July
- ⇒ Paper acceptance notification: 5 August
- ⇒ Paper camera-ready versions: 25 August
- ⇒ Camera-ready proceedings ready: 31 August
- ⇒ CASE Workshop: 7-8 September
PUBLICATION:
Participants in the Shared Task are expected to submit a paper to the workshop. Submitting a paper is not mandatory for participating in the Shared Task. Papers must follow the CASE 2023 workshop submission instructions (ACL 2022 style template: T.B.D.) and will undergo regular peer review. Their acceptance will not depend on the results obtained in the shared task but on the quality of the paper. Authors of accepted papers will be informed about the evaluation results of their systems prior to the paper submission deadline (see the important dates).
We will continue our tradition of inviting one social scientist and one computational scientist as keynote speakers. The social science keynote will be delivered by Erdem Yörük with the title “Using Automated Text Processing to Understand Social Movements and Human Behaviour”, and the computational keynote will be delivered by Ruslan Mitkov (title TBD). Additionally, we will reserve a session for invited talks (as we did in CASE 2021 and CASE 2022), which will be selected from the “findings of <venue>” category of the conferences.
T.B.A
ORGANIZATION COMMITTEE
Erdem Yörük is an Associate Professor in the Department of Sociology at KU and an Associate Member of the Department of Social Policy and Intervention at the University of Oxford. His work focuses on social welfare and social policy, social movements, political sociology, and comparative and historical sociology.
Milena Slavcheva is a researcher in computational linguistics and human language technologies, more specifically in lexical semantics, object-oriented modeling, standardisation of language resources, and IT applications for evidence-based policy making.
PROGRAM COMMITTEE
T.B.A.
COLLABORATORS & CONTRIBUTORS
Moreover, the following collaborators have expressed their support and will be contributing to both the organization and program committees (in no particular order):