ir_datasets: args.me
The args.me corpus is one of the largest argument resources available and contains arguments crawled from debate platforms and parliament discussions.
Bibtex:
@inproceedings{Wachsmuth2017Argument,
  author = {Henning Wachsmuth and Martin Potthast and Khalid Al-Khatib and Yamen Ajjour and Jana Puschmann and Jiani Qu and Jonas Dorsch and Viorel Morari and Janek Bevendorff and Benno Stein},
  booktitle = {4th Workshop on Argument Mining (ArgMining 2017) at EMNLP},
  editor = {Kevin Ashley and Claire Cardie and Nancy Green and Iryna Gurevych and Ivan Habernal and Diane Litman and Georgios Petasis and Chris Reed and Noam Slonim and Vern Walker},
  month = sep,
  pages = {49-59},
  publisher = {Association for Computational Linguistics},
  site = {Copenhagen, Denmark},
  title = {{Building an Argument Search Engine for the Web}},
  url = {https://www.aclweb.org/anthology/W17-5106},
  year = 2017
}
@inproceedings{Ajjour2019Acquisition,
  address = {Berlin Heidelberg New York},
  author = {Yamen Ajjour and Henning Wachsmuth and Johannes Kiesel and Martin Potthast and Matthias Hagen and Benno Stein},
  booktitle = {42nd German Conference on Artificial Intelligence (KI 2019)},
  doi = {10.1007/978-3-030-30179-8\_4},
  editor = {Christoph Benzm{\"u}ller and Heiner Stuckenschmidt},
  month = sep,
  pages = {48-59},
  publisher = {Springer},
  site = {Kassel, Germany},
  title = {{Data Acquisition for Argument Search: The args.me corpus}},
  year = 2019
}
Corpus version 1.0 with 387 606 arguments crawled from Debatewise, IDebate.org, Debatepedia, and Debate.org. It was released on July 9, 2019 on Zenodo.
This collection is licensed under the Creative Commons Attribution 4.0 International license. Individual rights to the content still apply.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/1.0")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>
You can find more details about the Python API here.
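For instance, an argument's conclusion and premise texts can be flattened into a single string before feeding it into a standard text indexer. The sketch below is only an illustration: the flatten_argument helper and the sample record are hypothetical stand-ins for a real record yielded by docs_iter(), and premises_texts is assumed here to be either one string or an iterable of strings.

```python
from collections import namedtuple

# Hypothetical helper: join an argument's conclusion with its premise
# text(s) into one flat string, e.g. for indexing.
def flatten_argument(doc):
    premises = doc.premises_texts
    if not isinstance(premises, str):  # assumption: may be one string or several
        premises = " ".join(premises)
    return doc.conclusion + " " + premises

# Stand-in for a record from dataset.docs_iter(); the real namedtuple
# has many more fields (source_id, topic, author, ...).
ArgsMeDoc = namedtuple("ArgsMeDoc", ["doc_id", "conclusion", "premises_texts"])
doc = ArgsMeDoc(
    doc_id="S21dc5a14-A8b896cb0",
    conclusion="School uniforms should be mandatory",
    premises_texts=("Uniforms reduce peer pressure.",),
)
print(flatten_argument(doc))
```
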
Corpus version 1.0-cleaned with 382 545 arguments crawled from Debatewise, IDebate.org, Debatepedia, and Debate.org. This version contains the same arguments as version 1.0, but cleaned as described in the corresponding publication. It was released on October 27, 2020 on Zenodo.
This collection is licensed under the Creative Commons Attribution 4.0 International license. Individual rights to the content still apply.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/1.0-cleaned")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>
You can find more details about the Python API here.
Version of argsme/2020-04-01/touche-2020-task-1 that uses the argsme/1.0 corpus with uncorrected relevance judgements derived from crowdworkers. This dataset's relevance judgements should not be used without preprocessing.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/1.0/touche-2020-task-1/uncorrected")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>
You can find more details about the Python API here.
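Because the uncorrected judgements can contain multiple, possibly conflicting crowdworker labels for the same query-document pair, one simple preprocessing step is to collapse them to a single label. The sketch below (keeping the maximum relevance per pair) is an illustrative choice, not the official correction procedure; the sample tuples are hypothetical stand-ins for records from dataset.qrels_iter(), assumed to expose query_id, doc_id, and relevance.

```python
# One possible preprocessing step for the uncorrected judgements:
# collapse duplicate (query_id, doc_id) labels to a single value.
# Keeping the maximum label is an arbitrary illustrative choice.
def collapse_qrels(qrels):
    best = {}
    for query_id, doc_id, relevance in qrels:
        key = (query_id, doc_id)
        best[key] = max(best.get(key, relevance), relevance)
    return best

# Hypothetical sample records; real ones come from dataset.qrels_iter().
raw = [
    ("1", "S1-A1", 2),
    ("1", "S1-A1", 0),  # conflicting duplicate label for the same pair
    ("1", "S2-A7", 1),
]
collapsed = collapse_qrels(raw)
print(collapsed)  # {('1', 'S1-A1'): 2, ('1', 'S2-A7'): 1}
```
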
Corpus version 2020-04-01 with 387 740 arguments crawled from Debatewise, IDebate.org, Debatepedia, and Debate.org, as well as from Canadian Parliament discussions. It was released on April 1, 2020 on Zenodo.
This collection is licensed under the Creative Commons Attribution 4.0 International license. Individual rights to the content still apply.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>
You can find more details about the Python API here.
Subset of the 338 620 arguments from args.me version 2020-04-01 that were crawled from the debate portal Debate.org.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/debateorg")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>
You can find more details about the Python API here.
Subset of the 21 197 arguments from args.me version 2020-04-01 that were crawled from the debate portal Debatepedia.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/debatepedia")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>
You can find more details about the Python API here.
Subset of the 14 353 arguments from args.me version 2020-04-01 that were crawled from the debate portal Debatewise.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/debatewise")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>
You can find more details about the Python API here.
Subset of the 13 522 arguments from args.me version 2020-04-01 that were crawled from the debate portal IDebate.org.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/idebate")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>
You can find more details about the Python API here.
Subset of the 48 arguments from args.me version 2020-04-01 that were crawled from Canadian Parliament discussions.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/parliamentary")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>
You can find more details about the Python API here.
Decision-making processes, be it at the societal or at the personal level, eventually come to a point where one side will challenge the other with a why-question, that is, a prompt to justify one's stance. Thus, technologies for argument mining and argumentation processing are maturing at a rapid pace, giving rise for the first time to argument retrieval. Touché 2020 is the first lab on argument retrieval at CLEF 2020, featuring two tasks.
Given a question on a controversial topic, retrieve relevant arguments from a focused crawl of online debate portals (argsme/2020-04-01).
Documents are judged based on their general topical relevance.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/touche-2020-task-1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>
You can find more details about the Python API here.
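When evaluating a retrieval run against these topical-relevance judgements, a typical first step is to group the judged relevant documents per query. The sketch below assumes qrels records expose query_id, doc_id, and relevance; the sample tuples are hypothetical stand-ins for records from dataset.qrels_iter().

```python
from collections import defaultdict

# Group judged-relevant documents per query, a common first step when
# scoring a retrieval run against graded relevance judgements.
def relevant_docs_by_query(qrels, min_relevance=1):
    grouped = defaultdict(set)
    for query_id, doc_id, relevance in qrels:
        if relevance >= min_relevance:
            grouped[query_id].add(doc_id)
    return dict(grouped)

# Hypothetical sample records; real ones come from dataset.qrels_iter().
raw = [
    ("1", "S1-A1", 2),
    ("1", "S2-A7", 0),  # judged non-relevant, filtered out below
    ("2", "S9-A3", 1),
]
print(relevant_docs_by_query(raw))  # {'1': {'S1-A1'}, '2': {'S9-A3'}}
```

Raising min_relevance restricts the grouping to highly relevant documents only.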
Version of argsme/2020-04-01/touche-2020-task-1 that uses uncorrected relevance judgements derived from crowdworkers. This dataset's relevance judgements should not be used without preprocessing.
Inherits queries from argsme/2020-04-01/touche-2020-task-1
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/touche-2020-task-1/uncorrected")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>
You can find more details about the Python API here.
Decision-making processes, be it at the societal or at the personal level, often come to a point where one side challenges the other with a why-question, that is, a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad hoc argument retrieval is also coming within reach. Touché 2021 is the second lab on argument retrieval at CLEF 2021, featuring two tasks.
Given a question on a controversial topic, retrieve relevant arguments from a focused crawl of online debate portals (argsme/2020-04-01).
Documents are judged based on their general topical relevance and for rhetorical quality, i.e., the "well-writtenness" of the document: (1) whether the text has a good style of speech (formal language is preferred over informal), (2) whether the text has a proper sentence structure and is easy to read, (3) whether it avoids profanity, typos, and other detrimental style choices.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/touche-2021-task-1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title>
You can find more details about the Python API here.