Call for workshop papers and shared task participation: Automated Extraction of Socio-political Events from News (AESPEN) @ LREC 2020

URL: https://emw.ku.edu.tr/aespen-2020/

Submission deadline: March 7th, 2020 (extended from February 22nd, 2020)

Cut-off date for the shared task results: March 7th, 2020

Submission page: https://www.softconf.com/lrec2020/AESPEN2020/

Workshop date: May 12th, 2020

—————————————————————–

Call for Papers

Automatic construction of event databases has long been a challenge for the natural language processing (NLP) community in terms of algorithmic approaches and language resources. At the same time, social and political scientists have been working on creating socio-political event databases for decades using manual, semi-automatic, and automatic approaches. However, the results yielded by these approaches to date are either not of sufficient quality or require tremendous effort to be replicated on new data. On the one hand, manual or semi-automatic methods require high-quality human effort; on the other hand, state-of-the-art automated event detection systems are not accurate enough for their output to be directly usable without human moderation. Finally, the NLP community has not reached a consensus on the treatment of events, in terms of either task definition or appropriate techniques for their detection.

Given the aforementioned limitations, there is an increasing tendency to rely on machine learning (ML) and NLP methods to deal better with the vast amount and variety of data to be collected. This workshop aims to inspire the emergence of innovative technological and scientific solutions in the field of event detection and event metadata extraction from news, as well as the development of evaluation metrics for event recognition. Moreover, the workshop will aim at triggering a deeper understanding of the usability of socio-political event datasets.

References:

Lorenzini, J., Makarov, P., Kriesi, H., & Wueest, B. (2016). Towards a Dataset of Automatically Coded Protest Events from English-language Newswire Documents. Paper presented at the Amsterdam Text Analysis Conference. URL: http://bruno-wueest.ch/assets/files/Lorenzini_etal_2016.pdf

Wang, W., Kennedy, R., Lazer, D., & Ramakrishnan, N. (2016). Growing pains for global monitoring of societal events. Science, 353(6307), 1502-1503. URL: http://science.sciencemag.org/content/353/6307/1502

Motivation and Topics of Interest

Automating political event collection requires the availability of gold-standard corpora that can be used for system development and evaluation. Moreover, the performance of automated tools needs to be reproducible and comparable. Although a tremendous effort is being spent on creating socio-political event databases such as ACLED, GDELT, MMAD, and ICEWS, there has not been much progress in harmonising event schemas and tasks. As a result, event definitions and the reported performance of automated event information collection tools remain specific to single projects. Consequently, the lack of comparable and reproducible settings hinders progress on this task.

We invite contributions from researchers in NLP, ML and AI involved in automated event data collection, as well as researchers in Social and Political Sciences, Conflict Analysis and Peace Studies who make use of this kind of data for their analytical work. Our goal is to enable the emergence of innovative NLP/IE solutions that can deal with the current stream of information, manage the risks of information overload, identify different sources and perspectives, and provide unitary and intelligible representations of the larger and long-term storylines behind news articles.

Our workshop will provide a venue for discussing the creation and facilitation of language resources in the social and political sciences domain. Social and political scientists will be interested in reporting and discussing their automated tools in comparison to their traditional coding approaches. Computational linguistics and machine learning practitioners and researchers will benefit from being challenged by real-world use cases, in terms of event data extraction, representation and aggregation.

We invite work on all aspects of automated coding of socio-political events from mono- or multi-lingual news sources. This includes (but is not limited to) the following topics:

  • Event metadata extraction
  • Source bias mitigation
  • Event data schema and representation
  • Event information duplication detection 
  • Extracting events beyond a sentence in a document
  • Training data collection/annotation processes 
  • Event coreference (in- and cross-document)
  • Sub-event and event subset relations
  • Event dataset evaluation and validity metrics
  • Event dataset quality assessment
  • Defining, populating and facilitating event ontologies
  • Automated tools for relevant tasks
  • Understanding the limits that are introduced by copyright rules
  • Ethical concerns and ethical design

Shared Task

We are organizing a shared task that will provide a setting consisting of data, a task definition, and an evaluation schema. Participants in this shared task will have the opportunity to report their results at the workshop after peer review of their working notes. A session of the workshop will be dedicated to discussing the results of the shared task.

We introduce the event sentence coreference identification (ESCI) subtask within the scope of the protest event collection task. A news article may contain one or more events, each expressed in one or more sentences. Identifying which event sentences are about the same event is necessary for collecting event information robustly. We therefore need methods that can determine whether a group of sentences is about the same event. Reliable identification of this relation will also enable us to determine how many events are reported in a news article. Moreover, it has the potential to facilitate cross-document event sentence relation identification in the long term.

Participants of the data challenge will receive the event-related sentences of each news article together with their clustering, in which a cluster contains all sentences about one event. The task is to automatically learn and predict the grouping of these sentences on test data, which will be delivered to participants one week before the submission deadline. All sentences about an event should be in the same cluster.
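To make the expected input and output of the ESCI subtask concrete, the following is a minimal illustrative sketch in Python. It is not the organizers' baseline or evaluation code: the function names, the token-overlap (Jaccard) similarity, the 0.3 threshold, and the example sentences are all assumptions chosen for illustration. It links any two sentences of an article whose lexical overlap reaches the threshold and returns the resulting groups of sentence indices, one group per predicted event.

```python
import re
from itertools import combinations
from typing import List, Set


def tokens(sentence: str) -> Set[str]:
    """Lowercased word tokens; a crude stand-in for real features (assumption)."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Token-set overlap in [0, 1]; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_sentences(sentences: List[str], threshold: float = 0.3) -> List[List[int]]:
    """Group sentence indices so that sentences linked by similarity share a cluster.

    Two sentences are linked when their Jaccard overlap is >= threshold
    (a hypothetical value); links are merged transitively with union-find.
    """
    parent = list(range(len(sentences)))

    def find(i: int) -> int:
        # Follow parent pointers to the cluster representative, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    token_sets = [tokens(s) for s in sentences]
    for i, j in combinations(range(len(sentences)), 2):
        if jaccard(token_sets[i], token_sets[j]) >= threshold:
            parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(sentences)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Invented example sentences from one hypothetical news article.
example = [
    "Workers staged a strike at the factory in Pune on Monday.",
    "The strike at the Pune factory continued as workers demanded higher pay.",
    "Students marched in Delhi against the fee hike.",
]
print(cluster_sentences(example))
```

In this sketch the first two sentences end up in one cluster and the third in its own, which also gives the number of events reported in the article. A real submission would presumably replace the lexical overlap with a learned pairwise model trained on the provided clusterings.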

Please send an e-mail to ahurriyetoglu@ku.edu.tr if you would like to participate in the shared task.

Submissions

This call solicits full papers reporting original and unpublished research on the topics listed above. The papers should emphasize obtained results rather than intended work and should indicate clearly the state of completion of the reported results. Submissions should be between 4 and 8 pages in total.

Authors are also invited to submit short papers not exceeding 4 pages (plus two additional pages for references). Short papers should describe:

  • a small, focused contribution;
  • work in progress; 
  • a negative result;
  • a position paper;
  • a report on shared task participation.

Papers should be submitted on the START page of the workshop (https://www.softconf.com/lrec2020/AESPEN2020/) in PDF format, in compliance with the style sheet adopted for the LREC Proceedings (available at https://lrec2020.lrec-conf.org/en/submission2020/authors-kit/).

The reviewing process will be double-blind, so papers should not include the authors’ names and affiliations. Each submission will be reviewed by at least three members of the program committee. If you do include any author names on the title page, your submission will be automatically rejected. In the body of your submission, you should eliminate all direct references to your own previous work.

Workshop Proceedings will be published on the LREC 2020 website.

Identify, Describe, and Share your LRs!

Describing your LRs in the LRE Map is now a normal practice in the submission procedure of LREC (introduced in 2010 and adopted by other conferences). To continue the efforts initiated at LREC 2014 about “Sharing LRs” (data, tools, web-services, etc.), authors will have the possibility, when submitting a paper, to upload LRs in a special LREC repository. This effort of sharing LRs, linked to the LRE Map for their description, may become a new “regular” feature for conferences in our field, thus contributing to creating a common repository where everyone can deposit and share data.

As scientific work requires accurate citations of referenced work so as to allow the community to understand the whole context and also replicate the experiments conducted by other researchers, LREC 2020 endorses the need to uniquely identify LRs through the use of the International Standard Language Resource Number (ISLRN, www.islrn.org), a Persistent Unique Identifier to be assigned to each Language Resource. The assignment of ISLRNs to LRs cited in LREC papers will be offered at submission time.

Keynote Speaker

The keynote speech will be delivered by Prof. Clionadh Raleigh.

Title: Too soon? The limitations of AI for event data

Bio

Prof. Clionadh Raleigh is a professor of Political Geography focused on modern disorder and political elite networks in developing states. She is the director of the ACLED project, which produces and analyzes real-time data on political violence and protest in the world’s most unstable states. Moreover, she is the recipient of two European Research Council grants.

Important dates

All dates are in 2020, and all deadlines are at 23:59 GMT+1:

January 14th: Announcing the shared task

March 7th (extended from February 22nd): Workshop paper submission deadline

March 2nd (unchanged): Cut-off date for the shared task results

March 7th (unchanged): Submission deadline for the working notes of the shared task

March 27th (moved from March 13th): Notification of acceptance

April 2nd (unchanged): Camera-ready deadline

May 12th (unchanged): Workshop date

Contact

Do not hesitate to contact ahurriyetoglu@ku.edu.tr or vanni.zavarella@ec.europa.eu for any questions or comments.

Organizing Committee

Ali Hürriyetoğlu (Koc University)

Hristo Tanev (European Commission – Joint Research Center)

Erdem Yörük (Koc University and University of Oxford)

Vanni Zavarella (European Commission – Joint Research Center)

Programme Committee

Svetla Boycheva (Institute of Information and Communication Technologies, Bulgarian Academy of Sciences)

Fırat Durusan (Koc University)

Theresa Gessler (University of Zürich)

Christian Göbel (University of Vienna)

Burak Gürel (Koc University)

Matina Halkia (European Commission – Joint Research Center)

Sophia Hunger (European University Institute)

J. Craig Jenkins (The Ohio State University)

Liron Lavi (UCLA Y&S Nazarian Center for Israel Studies)

Jasmine Lorenzini (University of Geneva)

Bernardo Magnini (Fondazione Bruno Kessler (FBK))

Osman Mutlu (Koc University)

Nelleke Oostdijk (Radboud University)

Arzucan Özgür (Boğaziçi University)

Jakub Piskorski (Polish Academy of Sciences)

Lidia Pivovarova (University of Helsinki)

Benjamin J. Radford (UNC Charlotte)

Clionadh Raleigh (University of Sussex)

Ali Safaya (Koc University)

Parang Saraf (Virginia Tech)

Philip Schrodt (Parus Analytical Systems)

Manuela Speranza (Fondazione Bruno Kessler, Trento)

Çağrı Yoltar (Koc University)

Aline Villavicencio (The University of Sheffield)

Kalliopi Zervanou (Eindhoven University of Technology)