We invite you to watch the sessions of the online Automatic Extraction of Socio-Political Events from News (AESPEN) workshop. You can find the links below:
May 12th, 2020 -> The workshop will be held online on June 9, 10, and 11, 2020, between 11:00 and 13:00 UTC. Please register by June 5th, 2020, 23:59 (AoE), using the registration form. Please find the workshop program and proceedings here.
Call for Papers
Automatic construction of event databases has long been a challenge for the natural language processing (NLP) community in terms of algorithmic approaches and language resources. At the same time, social and political scientists have been working on creating socio-political event databases for decades using manual, semi-automatic, and automatic approaches. However, the results yielded by these approaches to date are either not of sufficient quality or require tremendous effort to be replicated on new data. On the one hand, manual or semi-automatic methods require high-quality human effort; on the other hand, state-of-the-art automated event detection systems are not accurate enough for their output to be directly usable without human moderation. Finally, the NLP community has not achieved a consensus on the treatment of events, in terms of either task definition or appropriate techniques for their detection.
Given the aforementioned limitations, there is an increasing tendency to rely on machine learning (ML) and NLP methods to deal better with the vast amount and variety of data to be collected. This workshop aims to inspire the emergence of innovative technological and scientific solutions in the field of event detection and event metadata extraction from news, as well as the development of evaluation metrics for event recognition. Moreover, the workshop will aim at triggering a deeper understanding of the usability of socio-political event datasets.
Lorenzini, J., Makarov, P., Kriesi, H., & Wueest, B. (2016). Towards a Dataset of Automatically Coded Protest Events from English-language Newswire Documents. Paper presented at the Amsterdam Text Analysis Conference. URL: http://bruno-wueest.ch/assets/files/Lorenzini_etal_2016.pdf
Wang, W., Kennedy, R., Lazer, D., & Ramakrishnan, N. (2016). Growing pains for global monitoring of societal events. Science, 353(6307), 1502-1503. URL: http://science.sciencemag.org/content/353/6307/1502
Motivation and Topics of Interest
Automating political event collection requires the availability of gold-standard corpora that can be used for system development and evaluation. Moreover, the performance of automated tools needs to be reproducible and comparable. Although a tremendous effort is being spent on creating socio-political event databases such as ACLED, GDELT, MMAD, and ICEWS, there has not been much progress in harmonising event schemas and tasks. As a result, event definitions and the reported performance of automated event information collection tools remain specific to single projects. Consequently, the lack of comparable and reproducible settings hinders progress on this task.
We invite contributions from researchers in NLP, ML and AI involved in automated event data collection, as well as researchers in Social and Political Sciences, Conflict Analysis and Peace studies, who make use of this kind of data for their analytical work. Our goal is to enable the emergence of innovative NLP/IE solutions that can deal with the current stream of information, manage the risks of information overload, identify different sources and perspectives, and provide unitary and intelligible representations of the larger and long-term storylines behind news articles.
Our workshop will provide a venue for discussing the creation and facilitation of language resources in the social and political sciences domain. Social and political scientists will be interested in reporting and discussing their automated tools in comparison to their traditional coding approaches. Computational linguistics and machine learning practitioners and researchers will benefit from being challenged by real-world use cases, in terms of event data extraction, representation and aggregation.
We invite work on all aspects of automated coding of socio-political events from mono- or multi-lingual news sources. This includes (but is not limited to) the following topics:
- Event metadata extraction
- Source bias mitigation
- Event data schema and representation
- Event information duplication detection
- Extracting events beyond a sentence in a document
- Training data collection/annotation processes
- Event coreference (in- and cross-document)
- Sub-event and event subset relations
- Event dataset evaluation and validity metrics
- Event dataset quality assessment
- Defining, populating and facilitating event ontologies
- Automated tools for relevant tasks
- Understanding the limits that are introduced by copyright rules
- Ethical concerns and ethical design
We are organizing a shared task that will provide a setting consisting of data, a task definition, and an evaluation schema. Participants in this shared task will have the possibility to report their results in the workshop after peer review of their working notes. A session will be dedicated to discussing the results of the shared task during the workshop.
We introduce the event sentence coreference identification (ESCI) subtask within the scope of the protest event collection task. A news article may contain one or more events, each expressed in one or more sentences. Identifying the event sentences that are about the same event is necessary for collecting event information robustly, so we need methods that can determine whether a group of sentences is about the same event. Reliable identification of this relation will also enable us to determine how many events are reported in a news article. Moreover, it has the potential to facilitate cross-document event sentence relation identification in the long term. Participants in the data challenge will receive the event-related sentences of each news article together with their clustering, in which a cluster contains all sentences about one event. The task of the participants is to automatically learn this grouping and predict it on test data that will be delivered to them one week before the submission deadline. All sentences about an event should be in the same cluster.
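As a rough illustration of the input and output format described above, the following Python sketch groups the event sentences of one article into clusters. It links every pair of sentences judged to be about the same event and then takes the connected components. The pairwise decision used here, a token-overlap (Jaccard) threshold, is only a placeholder assumption. It is not the organizers' baseline or evaluation procedure, and the function names, the threshold value, and the example sentences are all hypothetical.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two sentences (placeholder scorer)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def cluster_event_sentences(sentences, same_event=None, threshold=0.3):
    """Return clusters of sentence indices, one cluster per predicted event.

    `same_event(a, b) -> bool` is a hypothetical pairwise predictor; when it
    is not given, the Jaccard score above is thresholded instead.
    """
    if same_event is None:
        def same_event(a, b):
            return jaccard(a, b) >= threshold

    parent = list(range(len(sentences)))  # union-find forest over indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every pair that the predictor considers to be the same event.
    for i, j in combinations(range(len(sentences)), 2):
        if same_event(sentences[i], sentences[j]):
            parent[find(i)] = find(j)

    # Collect the connected components as clusters of sentence indices.
    clusters = {}
    for i in range(len(sentences)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    article = [
        "Workers staged a protest outside the factory on Monday",
        "The protest outside the factory drew hundreds of workers",
        "Students marched to the parliament building on Tuesday",
    ]
    print(cluster_event_sentences(article))
```

Because clusters are formed by transitive linking, a single wrong pairwise decision can merge two events. A learned pairwise classifier passed in as `same_event` would replace the token-overlap placeholder without changing the grouping step.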
Please send an e-mail to email@example.com if you would like to participate in the shared task.
This call solicits full papers reporting original and unpublished research on the topics listed above. The papers should emphasize obtained results rather than intended work and should indicate clearly the state of completion of the reported results. Submissions should be between 4 and 8 pages in total.
Authors are also invited to submit short papers not exceeding 4 pages (plus two additional pages for references). Short papers should describe:
- a small, focused contribution;
- work in progress;
- a negative result;
- a position paper;
- a report on shared task participation.
Papers should be submitted on the START page of the workshop (https://www.softconf.com/lrec2020/AESPEN2020/) in PDF format, in compliance with the style sheet adopted for the LREC Proceedings (to be found here: https://lrec2020.lrec-conf.org/en/submission2020/authors-kit/).
The reviewing process will be double-blind, and papers should not include the authors’ names and affiliations. Each submission will be reviewed by at least three members of the program committee. If you do include any author names on the title page, your submission will be automatically rejected. In the body of your submission, you should eliminate all direct references to your own previous work.
Workshop Proceedings will be published on the LREC 2020 website.
Identify, Describe, and Share your LRs!
Describing your LRs in the LRE Map is now a normal practice in the submission procedure of LREC (introduced in 2010 and adopted by other conferences). To continue the efforts initiated at LREC 2014 about “Sharing LRs” (data, tools, web-services, etc.), authors will have the possibility, when submitting a paper, to upload LRs in a special LREC repository. This effort of sharing LRs, linked to the LRE Map for their description, may become a new “regular” feature for conferences in our field, thus contributing to creating a common repository where everyone can deposit and share data.
As scientific work requires accurate citations of referenced work so as to allow the community to understand the whole context and also replicate the experiments conducted by other researchers, LREC 2020 endorses the need to uniquely identify LRs through the use of the International Standard Language Resource Number (ISLRN, www.islrn.org), a Persistent Unique Identifier to be assigned to each Language Resource. The assignment of ISLRNs to LRs cited in LREC papers will be offered at submission time.
The first keynote speech will be delivered by Prof. Clionadh Raleigh.
Title: Too soon? The limitations of AI for event data
Not all conflict datasets offer equal levels of coverage, depth, use-ability, and content. A review of the inclusion criteria, methodology, and sourcing of leading publicly available conflict datasets demonstrates that there are significant discrepancies in the output produced by ostensibly similar projects. This keynote will question the presumption of substantial overlap between datasets, and identify a number of important gaps left by deficiencies across core criteria for effective conflict data collection and analysis, including:
Data Collection and Oversight: A rigorous, human coder is the best way to ensure reliable, consistent, and accurate events that are not false positives. Automated event data projects are still being refined and are not yet at the point where they can be used as accurate representations of reality. It is not appropriate to use these event datasets to present trends, maps, or distributions of violence in a state.
Inclusion: Inclusion criteria should allow for accurate representations of political violence, while being flexible to how political violence has changed. Who is considered a relevant and legitimate actor in conflict is pre-determined by the mandate of the dataset; the definitions, catchment, and categorization are critical, as they tell a user who and what is likely to be included.
Coverage and Classification: Clear, coherent, and correct classifications are important for users because conflicts are not homogenous: disorder events differ in their frequency, sequences, and intensity. Event types that reflect the variation of modalities common across conflicts and periods of disorder are basic, central components of insightful and useful analysis.
Use-ability and Transparency: Datasets must be useful and useable if they are to be relied upon for regular analysis, and users should be able to access every detail of how conflict data are coded and collected. Use-ability is closely tied to straightforward, consistent inclusion criteria and clear methodology.
Sourcing: Extensive sourcing — including from local partners and media in local languages — provides the most thorough and accurate information on political violence and demonstrations, as well as the most accurate presentation of the risks that citizens and civilians experience in their homes and communities.
Prof. Clionadh Raleigh is a professor of Political Geography focused on modern disorder and political elite networks in developing states. She is the director of the ACLED project, which produces and analyzes real-time data on political violence and protest in the world’s most unstable states. Moreover, she is the recipient of two European Research Council grants.
The second keynote speech will be delivered by Dr. Philip A. Schrodt from Parus Analytics, LLC.
Title: Current Open Questions for Operational Event Data
In this brief keynote, I will address what I see as five major issues in terms of development for operational event data sets (that is, event data intended for real-time monitoring and forecasting, rather than purely for academic research). First, there are no currently active real-time systems with fully open and transparent pipelines: instead, one or more components are proprietary. Ideally we need several of these, using different approaches (and in particular, comparisons between classical dictionary- and rule-based coders versus newer coders based on machine-learning approaches). Second, the CAMEO event ontology needs to be replaced by a more general system that includes, for example, political codes for electoral competition, legislative debate, and parliamentary coalition formation, as well as a robust set of codes for non-political events such as natural disasters, disease, and economic dislocations. Third, the issue of duplicate stories needs to be addressed (for example, the ICEWS system can generate as many as 150 coded events from a single occurrence on the ground), either to reduce these sets of related stories to a single set of events, or at least to label clusters of related stories as is already done in a number of systems (for example, European Media Monitor). Fourth, a systematic analysis needs to be done as to the additional information provided by hundreds of highly local sources (which have varying degrees of veracity and independence from states and local elites) as opposed to a relatively small number of international sources: obviously this will vary depending on the specific question being asked but has yet to be addressed at all.
Finally, and this will overlap with academic work, a number of open benchmarks need to be constructed for the calibration of both coding systems and resulting models: these could be historical but need to include an easily licensed (or open) very large set of texts covering a substantial period of time, probably along the lines of the Linguistic Data Consortium Gigaword sets; if licensed, these need to be accessible to individual researchers and NGOs, not just academic institutions.
Philip Schrodt is a senior research scientist at the statistical consulting firm Parus Analytical Systems. He received an M.A. in mathematics and a Ph.D. in political science from Indiana University in 1976, and has held permanent academic positions at Pennsylvania State University (4 years), the University of Kansas (21 years), and Northwestern University (12 years), where he helped develop Northwestern’s programs on mathematical methods in the social sciences. He has also held research appointments in the United Kingdom and Norway, and has taught and done field research in the Middle East. Dr. Schrodt’s major areas of research are quantitative models of political conflict and computational political methodology. His current research focuses on predicting political change using statistical and pattern recognition methods, research that has been supported by the U.S. National Science Foundation, the Defense Advanced Research Projects Agency, and the U.S. government’s multi-agency Political Instability Task Force. Dr. Schrodt has published more than 90 articles in political science, is past president and a fellow of the Society for Political Methodology, and his Kansas Event Data System computer program won the “Outstanding Computer Software Award” from the American Political Science Association in 1995.
All dates are in 2020, and all deadlines are at 23:59 (GMT+1):
January 14th: Announcing the shared task
February 22nd -> March 7th: Workshop paper submission deadline
March 2nd -> same: Cut-off date for the shared task results
March 7th -> same: Submission deadline for the working notes of the shared task
March 13th -> March 27th: Notification of acceptance
April 2nd -> same: Camera-ready deadline
May 12th -> June 9-11: The workshop date. The workshop will be held online on June 9, 10, and 11 between 11:00 and 13:00 UTC.
The proceedings can be found here.
June 9 (UTC time) – Shared task & datasets
11:00 – 11:05 Welcome, opening remarks, announcements (Announce the Google Sheet for tracking the questions, comments, etc.)
11:05 – 11:55 Keynote: Too soon? The limitations of AI for event data (Clionadh Raleigh)
11:55 – 12:05 Shared task – Event Sentence Coreference Identification (Ali Hürriyetoğlu, Ali Safaya, Osman Mutlu, Erdem Yörük)
12:05 – 12:30 Event Clustering within News Articles (Faik Kerem Örs, Süveyda Yeniterzi and Reyyan Yeniterzi)
12:30 – 12:55 Seeing the Forest and the Trees: Detection and Cross-Document Coreference Resolution of Militarized Interstate Disputes (Benjamin Radford)
12:55 – 13:00 Closing remarks, comments, feedback, discussion, announce the Google Docs that will be used for keeping track of the questions.
June 10 (UTC time) – Event information collection projects (Chair: Doug Bond (Harvard University & VRA, Inc.))
11:00 – 11:10 Welcome, opening remarks, announcements
11:10 – 11:35 Supervised Event Coding from Text Written in Arabic: Introducing Hadath (Javier Osorio, Alejandro Reyes, Alejandro Beltrán and Atal Ahmadzai)
11:35 – 12:00 Protest Event Analysis: A Longitudinal Analysis for Greece (Konstantina Papanikolaou and Haris Papageorgiou)
12:00 – 12:50 Keynote: Current Open Questions for Operational Event Data (Philip A. Schrodt)
12:50 – 13:00 Closing remarks, comments, feedback, discussion, announce the Google Docs that will be used for keeping track of the questions.
June 11 (UTC time) – State-of-the-art & use of datasets (Chair: Deniz Yüret (Koc University))
11:00 – 11:10 Welcome, opening remarks, announcements
11:10 – 11:35 Analyzing ELMo and DistilBERT on Socio-political News Classification (Berfu Büyüköz, Ali Hürriyetoğlu and Arzucan Özgür)
11:35 – 12:00 Text Categorization for Conflict Event Annotation (Fredrik Olsson, Magnus Sahlgren, Fehmi ben Abdesslem, Ariel Ekgren and Kristine Eck)
12:00 – 12:25 TF-IDF Character N-grams versus Word Embedding-based Models for Fine-grained Event Classification: A Preliminary Study (Jakub Piskorski and Guillaume Jacquet)
12:25 – 12:50 Conflict Event Modelling: Research Experiment and Event Data Limitations (Matina Halkia, Stefano Ferri, Michail Papazoglou, Marie-Sophie Van Damme and Dimitrios Thomakos)
12:50 – 13:00 Closing remarks, comments, feedback, discussion
13:30 – 14:30 Panel discussion. Announce a Google Docs for keeping track of the questions
Please consider joining the e-mail group “firstname.lastname@example.org” (https://groups.google.com/forum/#!forum/automated-political-event-collection) to keep in touch.
Ali Hürriyetoğlu (Koc University)
Hristo Tanev (European Commission – Joint Research Center)
Erdem Yörük (Koc University and University of Oxford)
Vanni Zavarella (European Commission – Joint Research Center)
Svetla Boycheva (Institute of Information and Communication Technologies, Bulgarian Academy of Sciences)
Fırat Durusan (Koc University)
Theresa Gessler (University of Zürich)
Christian Göbel (University of Vienna)
Burak Gürel (Koc University)
Matina Halkia (European Commission – Joint Research Center)
Sophia Hunger (European University Institute)
J. Craig Jenkins (The Ohio State University)
Liron Lavi (UCLA Y&S Nazarian Center for Israel Studies)
Jasmine Lorenzini (University of Geneva)
Bernardo Magnini (Fondazione Bruno Kessler (FBK))
Osman Mutlu (Koc University)
Nelleke Oostdijk (Radboud University)
Arzucan Özgür (Boğaziçi University)
Jakub Piskorski (Polish Academy of Sciences)
Lidia Pivovarova (University of Helsinki)
Benjamin J. Radford (UNC Charlotte)
Clionadh Raleigh (University of Sussex)
Ali Safaya (Koc University)
Parang Saraf (Virginia Tech)
Philip Schrodt (Parus Analytical Systems)
Manuela Speranza (Fondazione Bruno Kessler, Trento)
Çağrı Yoltar (Koc University)
Aline Villavicencio (The University of Sheffield)
Kalliopi Zervanou (Eindhoven University of Technology)