CLEF ProtestNews 2019
Extracting Protests from News
The ProtestNews task aims to extract event information from news articles across multiple countries. We focus on events that fall within the scope of contentious politics and are characterized by riots and social movements, i.e. the “repertoire of contention” (Giugni 1998, Tarrow 1994, Tilly 1984). Our aim is to develop text classification and information extraction tools on data from one country and test them on data from other countries. The text data is in English and collected from India, China, and South Africa.
We believe our task will set a baseline for evaluating the generalizability of NLP tools. A further challenge is handling the nuanced protest definition used in social science studies, the differences in protest types and their expression across countries, and the target information to be extracted. The clues needed to discriminate between relevant and irrelevant information in this context may be implied without any explicit expression, or hinted at by a single word in the whole article. For instance, a news article about a protest threat, or about an open letter written by a single person, does not qualify as relevant. To be in scope, a protest must have happened, and an open letter must be supported by more than one person.
We split the ProtestNews task into three subtasks:
1) Task 1: News article classification (protest vs. non-protest) is a binary classification task that aims at discriminating between news articles related to protest events and all other news articles.
2) Task 2: Event sentence detection aims at determining event sentences that contain an event trigger or a mention of it.
3) Task 3: Event information extraction is an event information extraction task that targets mainly event
Subtasks 2 and 3 will be based on the news articles that are labelled as protest or non-protest for subtask 1. Participants can choose to take part in one or more of these subtasks independently of each other.
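To make the cross-country setting of Task 1 concrete, here is a minimal sketch in Python: a classifier is trained on articles from one country and scored on articles from another. Everything in it is an assumption made for illustration and is not part of the task definition or the released data: the toy sentences, the label encoding (1 = protest, 0 = non-protest), the choice of a small Naive Bayes model, and the use of F1 on the protest class as the score.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical preprocessing: lowercase and split on whitespace.
    return text.lower().split()


def train_nb(docs):
    """Fit a multinomial Naive Bayes model on (text, label) pairs.

    Labels are assumed to be 1 (protest) or 0 (non-protest), and both
    labels must occur at least once in the training data.
    """
    word_counts = {0: Counter(), 1: Counter()}
    doc_counts = Counter()
    for text, label in docs:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, doc_counts, vocab


def predict_nb(model, text):
    """Return 1 if the protest class scores higher, else 0."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in (0, 1):
        # Log prior plus add-one smoothed log likelihoods.
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokenize(text):
            if token in vocab:  # words unseen in training are ignored
                score += math.log((word_counts[label][token] + 1) / denom)
        scores[label] = score
    return 1 if scores[1] > scores[0] else 0


def f1_protest(gold, pred):
    """F1 on the protest class (an assumed metric for this sketch)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented toy data standing in for a source and a target country.
SOURCE_DOCS = [
    ("workers marched to parliament demanding higher wages", 1),
    ("students staged a sit-in outside the university", 1),
    ("the central bank kept interest rates unchanged", 0),
    ("the cricket team won the final match", 0),
]
TARGET_DOCS = [
    ("miners marched demanding safer conditions", 1),
    ("the bank reported higher quarterly profit", 0),
]

model = train_nb(SOURCE_DOCS)
target_gold = [label for _, label in TARGET_DOCS]
target_pred = [predict_nb(model, text) for text, _ in TARGET_DOCS]
print("cross-country F1:", f1_protest(target_gold, target_pred))
```

The point of the sketch is the split: the model never sees target-country text during training, so the score reflects how well it transfers across countries and not how well it fits the source country.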
We use online news archives from India as the data source to create the training and test corpora. Moreover, we are gathering available datasets that can be used as additional resources to tackle the proposed subtasks. We will guide the participants through obtaining these datasets.
Please register before April 26, 2019 (http://clef2019-labs-registration.dei.unipd.it/).
Data release: March 25, 2019.
(We have released the data. Please send an e-mail to firstname.lastname@example.org in case you have subscribed and have not received the instructions for obtaining the data.)
Submission deadline is May 10, 2019.
Ali Hürriyetoglu: email@example.com
Deniz Yüret: firstname.lastname@example.org
Erdem Yörük: email@example.com
Çağrı Yoltar: firstname.lastname@example.org
Burak Gürel: email@example.com
Fırat Duruşan: firstname.lastname@example.org
Osman Mutlu: email@example.com
Arda Akdemir: firstname.lastname@example.org
Theresa Gessler: Theresa.Gessler@EUI.eu
Peter Makarov: email@example.com