# Revise task description and details for enthymeme task
Updated the task description for enthymeme detection and proposition generation, including details on dataset, evaluation methodology, and target audience.

Changed file: `_editions/2026/tasks/enthymeme.md` (176 additions, 4 deletions)

<!-- # please respect the structure below-->

*See the [MediaEval 2026 webpage](https://multimediaeval.github.io/editions/2026/) for information on how to register and participate.*

## Task Description

The goal of this task is to develop AI models that are capable of detecting implicit arguments, known as [enthymemes](https://en.wikipedia.org/wiki/Enthymeme), in tweets. The dataset contains tweets and their annotations, and also includes reconstructions of the full arguments: propositions, premises, conclusions, and argument structures (schemes).

Participants are invited to complete two tasks. They may choose to complete only Task 1, but completion of Task 2 is conditional upon prior completion of Task 1.

**Task 1: "Enthymeme Detection"** — Detecting the absence or presence of enthymemes in tweets (binary classification); a minimal baseline sketch follows the run list below.

- *Constrained Run 1:* From the text only, predict the presence or absence of implicit arguments.
- *Constrained Run 2:* In addition to the text, the annotation labels from the three different annotators should be leveraged and investigated to increase performance.
- *Open Run:* Using any external data, predict the presence or absence of implicit arguments.
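
To make the run setup concrete, here is a minimal sketch of a Constrained Run 1 baseline. The file names and the `text`/`label` columns are assumptions, since the data format will only be specified with the data releases; the labels follow the evaluation section below.

```python
# Baseline sketch for Task 1 (Constrained Run 1): text-only classification.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.pipeline import make_pipeline

LABELS = ["implicit_premise", "implicit_conclusion", "none"]

# Hypothetical files and columns; the released data may differ.
train = pd.read_csv("train.csv")
dev = pd.read_csv("dev.csv")

# Word n-gram TF-IDF with a linear classifier is a reasonable text-only start.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),
    LogisticRegression(max_iter=1000),
)
model.fit(train["text"], train["label"])

print(classification_report(dev["label"], model.predict(dev["text"]), labels=LABELS))
```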

**Task 2: "Proposition Generation"** — Generate the missing piece, i.e., the full implicit proposition. In the example below, the AI system would not only detect the presence or absence of an implicit statement but, in case of detecting its presence, would also generate the text of the missing/implicit proposition (i.e., the text of Premise 1 below).

**Example:**

If the tweet contains the following text: *"Deterring the plans of illegal people smugglers is essential to controlled immigration. We should support all plans to stop them."*

This tweet was annotated with the presence of an implicit premise: *"Controlled immigration is desirable."*

The full argument can be reconstructed as:

- Premise 1 *(implicit)*: Controlled immigration is desirable
- Premise 2: Deterring the plans of illegal people smugglers is essential to controlled immigration
- Conclusion: We should support all plans to stop them.
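
One possible starting point for Task 2, sketched under heavy assumptions: the model choice and prompt wording below are illustrative only and are not part of the task definition.

```python
# Illustrative sketch: prompting a small seq2seq model to reconstruct the
# implicit proposition of the example tweet above.
from transformers import pipeline

generator = pipeline("text2text-generation", model="google/flan-t5-base")

tweet = (
    "Deterring the plans of illegal people smugglers is essential to "
    "controlled immigration. We should support all plans to stop them."
)
prompt = (
    "The following tweet is an argument with an unstated premise. "
    f"State the missing premise in one sentence.\nTweet: {tweet}"
)
print(generator(prompt, max_new_tokens=40)[0]["generated_text"])
```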

Participating teams will write short working-notes papers that are published in the workshop proceedings (optional). We welcome two types of papers: first, conventional benchmarking papers, which describe the methods that teams use to address the task (enthymeme detection and implicit proposition generation) and analyze the results across the constrained and open runs; and second, "Quest for Insight" papers, which address a research question aimed at gaining deeper understanding of implicit argumentation, but do not necessarily present complete task results. Example questions for "Quest for Insight" papers include: How do different annotators interpret implicit premises? What linguistic features best signal the presence of enthymemes?

---

## Motivation and Background

Enthymemes—arguments with missing components (premises or conclusions)—represent a fundamental challenge in understanding persuasive discourse and argumentation. These implicit arguments are particularly prevalent in social media contexts, where they serve as powerful means of persuasion (Lombardi Vallauri et al., 2020). By leaving key premises or conclusions unstated, enthymemes lead readers to perceive the implicit content as their own reasoning (Reboul, 2011), making them especially effective rhetorical devices.

The significance of detecting and reconstructing enthymemes extends beyond theoretical interest in argumentation theory. Enthymemes facilitate deceptive argumentation and manipulation, and help spread disinformation (Lombardi Vallauri et al., 2020). Understanding how implicit premises operate in controversial political discourse is therefore crucial for developing tools to combat misinformation and promote critical thinking.

The task of enthymeme detection can be framed as a binary classification problem: determining whether a given text contains an implicit argument or not. This simple formulation is interesting for several reasons. First, it provides a foundational step for more complex argument mining pipelines—before attempting to reconstruct missing propositions, systems must first identify where implicit argumentation occurs. Second, binary classification allows for systematic investigation of what linguistic and discourse characteristics signal the presence of enthymemes, enabling both interpretable models and empirical validation of theoretical claims about argumentation structure.

However, developing computational systems for enthymeme detection and reconstruction presents considerable challenges. The task is inherently interpretative, involving natural language inference and semantic interpretation where high human disagreement is common (Plank et al., 2014; Aroyo & Welty, 2015). Language tasks of this nature involve interpretation, multiple plausible answers, and indirect meanings (Pavlick & Kwiatkowski, 2019), and relying on a single "correct" label ignores rich variation in human judgments (Uma et al., 2021).

Our approach explicitly acknowledges the interpretative nature of the task by employing three independent annotators per instance, a design choice that enables us to treat human label variation as signal rather than noise. This resource builds on an existing dataset of tweets (Flaccavento et al., 2025) and provides the first annotated dataset specifically designed for investigating enthymemes in controversial political discourse, enabling research into how the discourse characteristics of enthymemes can improve their detection with NLP methods.
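
One way to quantify this label variation is chance-corrected agreement such as Fleiss' kappa; the sketch below uses toy labels, not values from the dataset.

```python
# Sketch: measuring agreement across three annotators with Fleiss' kappa.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# One row per tweet, one column per annotator; toy label codes:
# 0 = none, 1 = implicit_premise, 2 = implicit_conclusion.
labels = np.array([
    [1, 1, 0],
    [0, 0, 0],
    [2, 1, 2],
    [1, 1, 1],
])

table, _ = aggregate_raters(labels)  # tweets x categories count table
print(f"Fleiss' kappa: {fleiss_kappa(table):.3f}")
```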

---

## Target Group

This task will be of interest to anyone who works on text analysis. We expect it to attract people working in areas such as natural language processing, argument mining, computational linguistics, misinformation detection, and social media analysis, from both academic and industrial settings.

We especially welcome interdisciplinary teams, including researchers from argumentation theory, philosophy, rhetoric, communication studies, political science, and computational social science, as these perspectives are essential for understanding how implicit argumentation influences persuasion, shapes political discourse, and affects the processes by which audiences interpret and reason about controversial topics. The use of explicit structural modeling, linguistic feature-based approaches, and even rule-based systems of all sorts is encouraged.

---

## Data

The dataset consists of tweets that have been annotated by multiple annotators, who judged whether or not each tweet contains an enthymeme. In cases of enthymeme presence, the annotators also propose a reconstruction of the implicit and explicit propositional content and the argument structure. The tweets are a subset of the tropes dataset by Flaccavento et al. (2025), selected to include a balance of tweets on the topics of immigration in the UK and the COVID-19 vaccine.

The data will be released in three parts:

- **Data sample (1 March):** A small collection of tweets that have been annotated by two annotators, so that participants can read the data and understand the challenge of the task.
- **First data release (mid-March):** A larger collection of data that has been annotated by five annotators. It is split into train and dev sets.
- **Final dataset (beginning of April):** The full dataset, split into train, dev, and test sets. The train set is a superset of the training set released with the first data release; likewise, the dev set is a superset of the development set released with the first data release. Train and dev data have been annotated by five annotators. Participants are required to submit their predictions on the test set.

> ⚠️ Participants should be aware that the data contains language that is hurtful towards immigrants and should be prepared for this when reading the data.
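
Since each tweet carries labels from several annotators, many pipelines will want to collapse them into a single label. A minimal sketch, assuming a JSONL layout with hypothetical `text` and `annotator_labels` fields (the released format may differ):

```python
# Sketch: deriving a majority-vote label from multiple annotator labels.
import json
from collections import Counter

def majority_label(annotator_labels: list[str]) -> str:
    """Return the most frequent label, breaking ties arbitrarily."""
    return Counter(annotator_labels).most_common(1)[0][0]

with open("train.jsonl", encoding="utf-8") as f:  # hypothetical file name
    for line in f:
        example = json.loads(line)
        print(majority_label(example["annotator_labels"]), example["text"][:60])
```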

---

## Evaluation Methodology

**Task 1:** Since this is a label prediction task, we will evaluate using the F1 score for the presence or absence of enthymemes. Three labels are considered in the basic setting: `implicit_premise`, `implicit_conclusion`, and `none`.
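
For reference, a scoring sketch over the three labels; whether the official metric averages per label (macro) or globally (micro) is not yet specified, so the macro average here is an assumption.

```python
# Scoring sketch for Task 1 with scikit-learn.
from sklearn.metrics import f1_score

LABELS = ["implicit_premise", "implicit_conclusion", "none"]

gold = ["implicit_premise", "none", "none", "implicit_conclusion"]
pred = ["implicit_premise", "none", "implicit_premise", "none"]

print(f1_score(gold, pred, labels=LABELS, average="macro"))
```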

**Task 2:** The generated propositions will be evaluated in two ways. First, BERTScore will be used to compare the reconstructions provided by the annotators with the propositions generated by the participants. Second, a subset of the test set will be sampled and evaluated manually by experienced human annotators.
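
The automatic part of this evaluation can be reproduced with the `bert-score` package; the default model choice and any aggregation over multiple annotator reconstructions are assumptions here.

```python
# Sketch: comparing a generated proposition against an annotator reference.
from bert_score import score

generated = ["Controlled immigration is desirable."]
references = ["Controlled immigration is a desirable goal."]

# P, R, F1 are tensors with one entry per candidate/reference pair.
P, R, F1 = score(generated, references, lang="en")
print(f"BERTScore F1: {F1.mean().item():.3f}")
```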

---

## Quest for Insight

- What systematic patterns emerge in label variation across easy, medium, and hard cases, and do they reveal distinct interpretative frameworks?
- Does modeling the full distribution of human judgments improve performance on borderline cases compared to majority-vote labels? (A small soft-label sketch follows this list.)
- Can annotator disagreement patterns predict which instances will be hardest for NLP models to classify?
- What sorts of linguistic or discourse features can be leveraged to improve classification performance?
- What additional data can be used or is relevant to improve classification performance?
- How does average inter-annotator agreement compare to other evaluation schemes in assessing annotation performance when no single ground truth exists?
- What is the most effective way to leverage annotator reconstructions to evaluate implicit proposition generation performance?
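
As a starting point for the soft-label question, one can turn annotator votes into a probability distribution and train against it directly. A toy PyTorch sketch, with random logits standing in for a real model:

```python
# Sketch: annotator votes as a soft training target instead of a hard label.
import torch
import torch.nn.functional as F

LABELS = ["implicit_premise", "implicit_conclusion", "none"]

# Votes from three annotators for one tweet: two chose implicit_premise,
# one chose none -> soft target [2/3, 0, 1/3] rather than a majority label.
votes = torch.tensor([2.0, 0.0, 1.0])
soft_target = votes / votes.sum()

logits = torch.randn(1, len(LABELS))  # stand-in for classifier output
loss = F.cross_entropy(logits, soft_target.unsqueeze(0))
print(loss.item())
```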

---

## Task Organizers

- **Martial Pastor**, Radboud University — martial.pastor@ru.nl
- **Nelleke Oostdijk**, Radboud University — nelleke.oostdijk@ru.nl

*Data will be made available as of the 1st of March.*

---

## References

[1] Aroyo, L., & Welty, C. (2015). Truth is a lie: Crowd truth and the seven myths of human annotation. *AI Magazine, 36*(1), 15–24.

[2] Flaccavento, A., Peskine, Y., Papotti, P., Torlone, R., & Troncy, R. (2025). Automated detection of tropes in short texts. In *Proceedings of COLING 2025: 31st International Conference on Computational Linguistics*.

[3] Lombardi Vallauri, E., Baranzini, L., Cimmino, D., Cominetti, F., Coppola, C., & Mannaioli, G. (2020). Implicit argumentation and persuasion: A measuring model. *Journal of Argumentation in Context, 9,* 95–123.

[4] Pavlick, E., & Kwiatkowski, T. (2019). Inherent disagreements in human textual inferences. *Transactions of the Association for Computational Linguistics, 7,* 677–694.

[5] Plank, B., Hovy, D., & Søgaard, A. (2014). Linguistically debatable or just plain wrong? In *Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics* (Vol. 2: Short Papers, pp. 507–511). Association for Computational Linguistics.

[6] Reboul, A. (2011). A relevance-theoretic account of the evolution of implicit communication. *Studies in Pragmatics, 13*(1).

[7] Uma, A., Fornaciari, T., Hovy, D., Paun, S., Plank, B., & Poesio, M. (2021). Learning from disagreement: A survey. *Journal of Artificial Intelligence Research, 72,* 1385–1470.