The 6th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE @ RANLP 2023, on September 7, 2023)
Please find the proceedings of CASE 2023 on https://aclanthology.org/
Nowadays, the unprecedented quantity of easily accessible data on social, political, and economic processes offers ground-breaking potential in guiding data-driven analysis in social and human sciences and in driving informed policy-making processes. Governments, multilateral organizations, and local and global NGOs present an increasing demand for high-quality information about a wide variety of events ranging from political violence, environmental disasters, and conflict, to international economic and health crises (Coleman et al. 2014; Della Porta and Diani 2015) to prevent or resolve conflicts, provide relief for those who are afflicted, or improve the lives of and protect citizens in a variety of ways. Citizen actions against the COVID measures in the period 2020-2022 and the Russia-Ukraine war are only two examples where event-centered data can contribute to a better understanding of real-life situations. Finally, these efforts respond to “growing public interest in up-to-date information on crowds” [1] as well.
Event extraction has long been a challenge for the natural language processing (NLP) community as it requires sophisticated methods in defining event ontologies, creating language resources, and developing algorithmic approaches and ML models (Pustejovsky et al. 2003; Boroş 2018; Chen et al. 2021). Previous editions of the CASE series of workshops have featured works which use BERT and other deep learning models, syntactic parsing, semantic argument structure analysis, temporal and spatial reasoning, lexical learning, and other NLP methods and algorithms. Detecting and extracting information about socio-political events is a complex NLP task: events can be described via elaborate syntactic and semantic language structures, and event descriptions may enter into different semantic relations with each other, such as coreference, causality, inclusion, spatio-temporal proximity and others.
Events as linguistic phenomena are usually modeled through frames and ontologies, and event types are often represented via elaborate taxonomies. Detecting socio-political events in real-world texts poses problems which originate from the dynamics of the activity of governments, political parties, movements and other socially active groups. They may frequently change their leading figures, strategies and organization. These factors can make existing statistical models and knowledge bases less relevant as time passes and require the development of methods which rely on limited data, such as few-shot learning, human-in-the-loop or other specific learning strategies.
Social and political scientists have been working to create socio-political event (SPE) databases such as ACLED, EMBERS, GDELT, ICEWS, MMAD, PHOENIX, POLDEM, SPEED, TERRIER, and UCDP following similar steps for decades. These projects and the new ones increasingly rely on machine learning (ML), deep learning (DL), and NLP methods to deal better with the vast amount and variety of data in this domain (Hürriyetoğlu et al. 2020). Unfortunately, automated approaches suffer from major issues like bias, limited generalizability, class imbalance, training data limitations, and ethical issues that have the potential to affect the results and their use drastically (Lau and Baldwin 2020; Bhatia et al. 2020; Chang et al. 2019). Moreover, the results of the automated systems for SPE information collection have neither been comparable to each other nor been of sufficient quality (Wang et al. 2016; Schrodt 2020). SPEs are varied and nuanced. Both the political context and the local language used may affect whether and how they are reported.
We invite contributions from researchers in computer science, NLP, ML, DL, AI, socio-political sciences, conflict analysis and forecasting, peace studies, as well as computational social science scholars involved in the collection and utilization of SPE data. Academic workshops specific to tackling event information in general or for analyzing text in specific domains such as health, law, finance, and biomedical sciences have significantly accelerated progress in these topics and fields, respectively. However, there has not been a comparable effort for handling SPEs. We fill this gap. We invite work on all aspects of automated coding and analysis of SPEs and events in general from mono- or multi-lingual text sources. This includes (but is not limited to) the following topics:
1) Extracting events and their arguments in and beyond a sentence or document, event coreference resolution.
2) Research in NLP technologies related to event detection, such as: geocoding, temporal reasoning, argument structure detection, syntactic and semantic analysis of event structures, text classification for event type detection, learning event-related lexica, co-reference in event descriptions, machine translation for multilingual event detection, named entity recognition, fake news analysis, text similarity and others with focus on real or potential event detection applications.
3) New datasets, training data collection and annotation for event information.
4) Event-event relations, e.g., subevents, main events, spatio-temporal relations, causal relations.
5) Event dataset evaluation in light of reliability and validity metrics.
6) Defining, populating, and facilitating event schemas and ontologies.
7) Automated tools and pipelines for event collection related tasks.
8) Lexical, syntactic, semantic, discursive, and pragmatic aspects of event manifestation.
9) Methodologies for development, evaluation, and analysis of event datasets.
10) Applications of event databases, e.g. early warning, conflict prediction, policymaking.
11) Estimating what is missing in event datasets using internal and external information.
12) Detection of new and emerging SPE types, e.g. creative protests.
13) Release of new event datasets.
14) Bias and fairness of the sources and event datasets.
15) Ethics, misinformation, privacy, and fairness concerns pertaining to event datasets.
16) Copyright issues on event dataset creation, dissemination, and sharing.
17) Cross-lingual, multilingual and multimodal aspects in event analysis.
18) Climate change and conflict-related resources and approaches related to contentious politics around climate change.
Moreover, we will encourage submissions of new system description papers on our available benchmarks.
[1] https://sites.google.com/view/crowdcountingconsortium/faqs
[2] https://acleddata.com/political-violence-targeting-women/#curated
REFERENCES
Bhatia, S., Lau, J. H., & Baldwin, T. (2020). You are right. I am ALARMED–But by Climate Change Counter Movement.
Boroş, E. (2018). Neural Methods for Event Extraction.
Chang, K. W., Prabhakaran, V., & Ordonez, V. (2019, November). Bias and fairness in natural language processing.
Chen, M., Zhang, H., Ning, Q., Li, M., Ji, H., & Roth, D. (2021). Event-centric Natural Language Understanding.
Coleman, P. T., Deutsch, M., & Marcus, E. C. (Eds.). (2014). The handbook of conflict resolution: Theory and practice.
Della Porta, D., & Diani, M. (Eds.). (2015). The Oxford handbook of social movements.
Hürriyetoğlu, A., Zavarella, V., Tanev, H., Yörük, E., Safaya, A., & Mutlu, O. (2020, May). Automated Extraction of Socio-political Events from News (AESPEN): Workshop and Shared Task Report.
Lau, J. H., & Baldwin, T. (2020, July). Give Me Convenience and Give Her Death: Who Should Decide What Uses of NLP are Appropriate, and on What Basis?
Pustejovsky, J., Castano, J. M., Ingria, R., Sauri, R., Gaizauskas, R. J., Setzer, A., … & Radev, D. R. (2003). TimeML: Robust specification of event and temporal expressions in text.
Schrodt, P. A. (2020, May). Keynote Abstract: Current Open Questions for Operational Event Data.
Wang, W., Kennedy, R., Lazer, D., & Ramakrishnan, N. (2016). Growing pains for global monitoring of societal events.
⇒ Workshop paper submission deadline: 24 July 2023 (AoE) (extended from 10 July 2023)
⇒ Workshop paper acceptance notification: 12 August 2023 (moved from 5 August 2023)
⇒ Workshop paper camera-ready versions: 25 August 2023
⇒ Workshop date: 7 September 2023
CALL FOR PAPERS
We invite contributions from researchers in computer science, NLP, ML, DL, AI, socio-political sciences, conflict analysis and forecasting, peace studies, as well as computational social science scholars involved in the collection and utilization of SPE data. Social and political scientists will be interested in reporting and discussing their approaches and observing what the state-of-the-art text processing systems can achieve for their domain. Computational scholars will have the opportunity to illustrate the capacity of their approaches in this domain and benefit from being challenged by real-world use cases. Academic workshops specific to tackling event information in general or for analyzing text in specific domains such as health, law, finance, and biomedical sciences have significantly accelerated progress in these topics and fields, respectively. However, there has not been a comparable effort for handling SPEs. We fill this gap. We invite work on all aspects of automated coding and analysis of SPEs and events in general from mono- or multi-lingual text sources. This includes (but is not limited to) the following topics:
- ⇒ Extracting events in and beyond a sentence, event coreference resolution
- ⇒ New datasets, training data collection and annotation for event information
- ⇒ Event-event relations, e.g., subevents, main events, causal relations
- ⇒ Event dataset evaluation in light of reliability and validity metrics
- ⇒ Defining, populating, and facilitating event schemas and ontologies
- ⇒ Automated tools and pipelines for event collection related tasks
- ⇒ Lexical, syntactic, discursive, and pragmatic aspects of event manifestation
- ⇒ Methodologies for development, evaluation, and analysis of event datasets
- ⇒ Applications of event databases, e.g. early warning, conflict prediction, policymaking
- ⇒ Estimating what is missing in event datasets using internal and external information
- ⇒ Detection of new SPE types, e.g. creative protests, cyber activism, COVID19 related
- ⇒ Release of new event datasets
- ⇒ Bias and fairness of the sources and event datasets
- ⇒ Ethics, misinformation, privacy, and fairness concerns pertaining to event datasets
- ⇒ Copyright issues on event dataset creation, dissemination, and sharing
We encourage submissions of new system description papers on our available benchmarks (ProtestNews @ CLEF 2019, AESPEN @ LREC 2020, and CASE @ 2021). Please contact the organizers if you would like to access the data.
Submission
This call solicits short and long papers reporting original and unpublished research on the topics listed above. The papers should emphasize obtained results rather than intended work and should indicate clearly the state of completion of the reported results. The page limits and content structure announced on the ACL ARR page (https://aclrollingreview.org/cfp) should be followed for both short and long papers.
Papers should be submitted in PDF format on the START page of the workshop (https://softconf.com/ranlp23/CASE/) or on the ARR page (TBA on the workshop website), in compliance with the ACL author guidelines (https://acl-org.github.io/ACLPUB/formatting.html).
The reviewing process will be double-blind, and papers should not include the authors’ names and affiliations. Each submission will be reviewed by at least three members of the program committee. The workshop proceedings will be published in the ACL Anthology.
PUBLICATION:
Participants in the Shared Task are expected to submit a paper to the workshop, although submitting a paper is not mandatory for participation. Papers must follow the CASE 2023 workshop submission instructions (ACL 2022 style template: T.B.D.) and will undergo regular peer review. Their acceptance will not depend on the results obtained in the shared task but on the quality of the paper. Authors of accepted papers will be informed about the evaluation results of their systems prior to the paper submission deadline (see the important dates).
TASK 1: MULTILINGUAL PROTEST NEWS DETECTION
The performance of an automated system depends on the target event type, as it may be broad or the event trigger(s) may be ambiguous. The context in which the trigger occurs may need to be handled as well. For instance, the ‘protest’ event type may or may not be synonymous with ‘demonstration’ in a specific context. Moreover, hypothetical cases such as future protest plans may need to be excluded from the results. Finally, the relevance of a protest depends on the actors, since only citizen-led events are in the scope of contentious political events. This challenge becomes even harder in a cross-lingual and zero-shot setting when training data are not available in new languages. We tackle the task in four steps and hope state-of-the-art approaches will yield optimal results.
Contact person: Ali Hürriyetoğlu (ali.hurriyetoglu@gmail.com)
Github: https://github.com/emerging-
TASK 2: AUTOMATICALLY REPLICATING MANUALLY CREATED EVENT DATASETS
There is a mismatch between the event information collected by automated and manual approaches. We aim to identify similarities and differences between the results of these paradigms for creating event datasets. The participants of Task 1 will be invited to run the systems they develop for Task 1 on a text archive. Participation in Task 1 is not a precondition for participating in Task 2.
Contact persons: Hristo Tanev (hristo.tanev@ec.europa.eu) and Onur Uca (onuruca@mersin.edu.tr)
Github: https://github.com/zavavan/
TASK 3: EVENT CAUSALITY IDENTIFICATION
Causality is a core cognitive concept and appears in many natural language processing (NLP) works that aim to tackle inference and understanding. We are interested in studying event causality in the news and, therefore, introduce the Causal News Corpus. The Causal News Corpus consists of 3,767 event sentences extracted from protest event news that have been annotated with sequence labels indicating whether they contain causal relations. Causal sentences are further annotated with Cause, Effect and Signal spans. Our subtasks work on the Causal News Corpus, and we hope that accurate, automated solutions may be proposed for the detection and extraction of causal events in news.
Contact person: Fiona Anting Tan (tan.f@u.nus.edu)
Github: https://github.com/tanfiona/
TASK 4: MULTIMODAL HATE SPEECH EVENT DETECTION
Hate speech detection is one of the most important aspects of event identification during political events like invasions. In the case of hate speech detection, the event is the occurrence of hate speech, the entity is the target of the hate speech, and the relationship is the connection between the two. Since multimodal content is widely prevalent across the internet, the detection of hate speech in text-embedded images is very important. Given a text-embedded image, this task aims to automatically identify the hate speech and its targets. This task will have two subtasks.
Contact person: Surendrabikram Thapa (surendrabikram@vt.edu)
Github: https://github.com/
**** Deadlines for the Shared tasks ****
TASK 1, 3, 4:
Training & Validation data available: May 1, 2023
Test data available: June 15, 2023
Test start: June 15, 2023
Test end: June 30, 2023
System Description Paper submissions due: July 10, 2023
Notification to authors after review: August 5, 2023
Camera-ready: August 25, 2023
TASK 2:
Sample Text archive is available: May 22, 2023
Text archive for evaluation is available: July 1, 2023
Evaluation period starts: July 1, 2023
Evaluation period ends: July 24, 2023
System Description Paper submissions due: July 31, 2023
Notification to authors after review: August 7, 2023
Camera-ready: August 25, 2023
We will continue our tradition of inviting keynote speakers both in social science and computer science. The social science keynote will be delivered by Prof. Erdem Yörük with the title “Using Automated Text Processing to Understand Social Movements and Human Behaviour” and the computational one will be delivered by Prof. Ruslan Mitkov with the title “TBD”. Additionally, we will reserve a session for invited talks (as we did in CASE 2021 and CASE 2022) that will be selected from the “findings of <venue>” category of the conferences.
—————————————————
Title: With a little help from NLP: My Language Technology applications with impact on society (and my thoughts on the future of NLP)
Abstract: The talk will present original methodologies developed by the speaker, underpinning implemented Language Technology tools which are already having an impact on the following areas of society: e-learning, translation and interpreting and care for people with language disabilities.
The first part of the presentation will introduce an original methodology and tool for generating multiple-choice tests from electronic textbooks. The application draws on a variety of Natural Language Processing (NLP) techniques which include term extraction, semantic computing and sentence transformation. The presentation will include an evaluation of the tool which demonstrates that generation of multiple-choice tests items with the help of this tool is almost four times faster than manual construction and the quality of the test items is not compromised. This application benefits e-learning users (both teachers and students) and is an example of how NLP can have a positive societal impact, in which the speaker passionately believes. The latest version of the system based on deep learning techniques will also be briefly introduced.
The talk will go on to discuss two other original recent projects which are also related to the application of NLP beyond academia. First, a project, whose objective is to develop next-generation translation memory tools for translators and, in the near future, for interpreters, will be briefly presented. Finally, a project will be outlined which focuses on helping users with autism to read and better understand texts. The speaker will put forward ideas as to what we can do next.
The presentation will finish with a brief outline of the latest and forthcoming research topics which the speaker plans to pursue and his vision of future NLP applications. In particular, he will share his views as to how NLP will develop and what should be done for NLP to be more successful, more inclusive and more ethical.
Abstract: The Bulgarian Event Corpus is being constructed within CLaDA-BG (Bulgarian National Interdisciplinary Research E-Infrastructure for Bulgarian Language and Cultural Heritage Resources and Technologies). In the spirit of the European CLARIN and DARIAH infrastructures, we aim to support researchers in Humanities and Social Sciences (H&SS) in accessing the datasets necessary for their research. The different types of objects of study, representation and search are integrated on the basis of common metadata and content categories. The approach for interlinking the datasets is called contextualization. The implementation of contextualization in CLaDA-BG will utilize a common Bulgaria-centered knowledge graph, BGKG. The knowledge facts within BGKG are constructed around events of different types. Thus, the construction of BGKG requires a set of appropriate language resources for training a Bulgarian language pipeline for the extraction of events from text documents. A key element within these language resources is the Bulgarian Event Corpus.
In the talk I will present the design of the annotation schema, the annotation process, relation to ontologies and RDF representation. We have started with the CIDOC-CRM ontology for the construction of the annotation schema. This ontology provides a good conceptualization of events motivated by the domain of museums which is appropriate for our goals. During the design of the annotation schema, we extended the ontology with new events depending on the content of the corpus. The documents to be annotated were selected from scientific and popular publications of the partners within CLaDA-BG and articles from Bulgarian Wikipedia. The annotation is done on several layers: Named Entities, Events, Roles, Linking, terms and keywords.
—————————————————
Title: Using Automated Text Processing to Understand Social Movements and Human Behaviour
Abstract: Erdem Yörük’s keynote will describe two large-scale ERC-funded projects that employ computational social science methods to extract data on protests and public opinion. The first is the Global Contentious Politics Dataset (GLOCON) Project. GLOCON (available at glocon.ku.edu.tr) is the first automated comparative protest event database on emerging markets using local news sources. The countries included in the GLOCON dataset are India, South Africa, Argentina, Brazil and Turkey. GLOCON has been created by using natural language processing and machine learning to extract protest data from online news sources. The project develops fully automated tools for document classification, sentence classification, and detailed protest event information extraction that will perform in a multi-source, multi-context protest event setting with consistent recall and precision in each country context. GLOCON counts the number of events such as strikes, rallies, boycotts, protests, riots, and demonstrations, i.e. the “repertoire of contention,” and operationalizes protest events by various social groups. The project has developed a novel bottom-up methodology that is based on a random sampling of news archives, as opposed to keyword filtering. The high-quality gold standard corpus (GSC) is designed in a way that can accommodate context variability from the outset, as it is compiled randomly from a variety of news sources from different countries. The second one, the Politus Project, aims at scaling up traditional survey polls for public opinion research with AI-based social data analytics. Politus develops an AI-based innovation that combines quantitative and computational methods to create a data platform that delivers representative, valid, instant, real-time, multi-country, and multi-language panel data on key political and social trends.
The project will collect content information from Twitter and process it with AI tools to generate a large set of indicators on political and social trends through its data platform. The deep learning models and NLP tools will be designed from the ground up as language-independent and generalizable systems. The platform will deliver geolocated hourly panel data on demography, ideology, topics, values, and beliefs, behavior, sentiment, emotion, attitudes, and stance of users aggregated at the district level. In this keynote, Dr. Yörük will describe the general methodology of the projects, including data collection, data analysis, and their approach for representativeness, which is based on multilevel regression with post-stratification.
Short bio: Erdem Yörük is an Associate Professor in the Department of Sociology at Koç University and an Associate Member in the Department of Social Policy and Intervention at University of Oxford. He serves as the principal investigator of the ERC-funded project “Emerging Welfare” (The New Politics of Welfare: Towards an “Emerging Markets” Welfare State Regime) (emw.ku.edu.tr), the ERC-funded Politus Project (politusanalytics.com) and the H2020 project Social Comquant (socialcomquant.ku.edu.tr). He holds a Ph.D. from the Department of Sociology at Johns Hopkins University (2012). His work focuses on social welfare and social policy, social movements, political sociology, and computational social sciences. His work has been supported by the National Science Foundation (NSF), Ford Foundation, FP7 Marie Curie CIG, European Research Council StG, ERC PoC, H2020, and the Science Academy of Turkey. His projects have created two datasets on welfare (glow.ku.edu.tr) and protest movements (glocon.ku.edu.tr). His articles have appeared in World Development, Governance, Politics & Society, Journal of European Social Policy, New Left Review, Current Sociology, South Atlantic Quarterly, American Behavioral Scientist, International Journal of Communication, Social Policy and Administration, and Social Indicators Research, among others. His book, “The Politics of the Welfare State in Turkey”, was published by the University of Michigan Press in May 2022.
Session I: Machine Learning for Event Extraction
(CHAIR: ERIK VELLDAL)
Speaker: Peter Ivanoc (in-person)
Speaker: Osman Mutlu (online)
Speaker: Alexandra DeLucia (online)
Speaker: Javier Osorio (in-person)
Speaker: Hristo Tanev (in-person)
“Using Automated Text Processing to Understand Social Movements and Human Behaviour”
Session II: Knowledge graphs
(CHAIR: HRISTO TANEV)
Speaker: Milena Slavcheva (in-person)
Speaker: Oktie Hassanzadeh (in-person)
Session III: Shared task 4: Multimodal Hate Speech Event Detection
(CHAIR: MILENA SLAVCHEVA)
Speaker: Surendrabikram Thapa (in-person)
Speaker: Cagri Toraman (in-person)
Speaker: Mohammad Zohair
Speaker: Mrithula KL
Session IV: Shared task 3: Geocoding and Causality Identification
(CHAIR: ÇAĞRI TORAMAN)
Speaker: Timo Pierre Schrader (in-person)
(included in the task overview) Amrita Bhatia, Ananya Thomas, Nitansh Jain and Jatin Bedi
ORGANIZATION COMMITTEE
Erdem Yörük is an Associate Professor in the Department of Sociology at Koç University and an Associate Member in the Department of Social Policy and Intervention at University of Oxford. His work focuses on social welfare and social policy, social movements, political sociology, and comparative and historical sociology.
Milena Slavcheva is a researcher in computational linguistics and human language technologies, more specifically in lexical semantics, object-oriented modeling, standardisation of language resources, IT applications for evidence-based policy making.
PROGRAM COMMITTEE
T.B.A.
COLLABORATORS & CONTRIBUTORS
Moreover, the following collaborators have expressed their support and will be contributing to both organization and program committees (without any particular order):
Andrew Halterman | Michigan State University |
Giuseppe Tirone | European Commission, Joint Research Centre |
Osman Mutlu | Koç University |
Tadashi Nomoto | National Institute of Japanese Literature |
Hristo Tanev | European Commission, Joint Research Centre |
Onur Uca | Mersin University |
Peratham Wiriyathammabhum | – |
Marijn Schraagen | Utrecht University |
Gaurav Singh | S&P Global |
Fiona Anting Tan | National University of Singapore |
Surendrabikram Thapa | Virginia Tech |
Alexandra DeLucia | Johns Hopkins University |
Kumari Neha | Indraprastha Institute of Information Technology Delhi |
Maria Eskevich | Huygens Institute |
Guanqun Yang | Stevens Institute of Technology |
Cagri Toraman | Aselsan, Turkey |
Debanjana Kar | IBM |
Man Luo | Arizona State University |
Nelleke Oostdijk | Radboud University |
Hansi Hettiarachchi | Birmingham City University |