ir_datasets: args.me
The args.me corpus is one of the largest argument resources available and contains arguments crawled from debate platforms and parliament discussions.
Bibtex:
@inproceedings{Wachsmuth2017Argument,
  author = {Henning Wachsmuth and Martin Potthast and Khalid Al-Khatib and Yamen Ajjour and Jana Puschmann and Jiani Qu and Jonas Dorsch and Viorel Morari and Janek Bevendorff and Benno Stein},
  booktitle = {4th Workshop on Argument Mining (ArgMining 2017) at EMNLP},
  editor = {Kevin Ashley and Claire Cardie and Nancy Green and Iryna Gurevych and Ivan Habernal and Diane Litman and Georgios Petasis and Chris Reed and Noam Slonim and Vern Walker},
  month = sep,
  pages = {49-59},
  publisher = {Association for Computational Linguistics},
  site = {Copenhagen, Denmark},
  title = {{Building an Argument Search Engine for the Web}},
  url = {https://www.aclweb.org/anthology/W17-5106},
  year = 2017
}
@inproceedings{Ajjour2019Acquisition,
  address = {Berlin Heidelberg New York},
  author = {Yamen Ajjour and Henning Wachsmuth and Johannes Kiesel and Martin Potthast and Matthias Hagen and Benno Stein},
  booktitle = {42nd German Conference on Artificial Intelligence (KI 2019)},
  doi = {10.1007/978-3-030-30179-8\_4},
  editor = {Christoph Benzm{\"u}ller and Heiner Stuckenschmidt},
  month = sep,
  pages = {48-59},
  publisher = {Springer},
  site = {Kassel, Germany},
  title = {{Data Acquisition for Argument Search: The args.me corpus}},
  year = 2019
}
Corpus version 1.0 with 387 606 arguments crawled from Debatewise, IDebate.org, Debatepedia, Debate.org. It was released on July 9, 2019 on Zenodo.
This collection is licensed under the Creative Commons Attribution 4.0 International license. Individual rights to the content still apply.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/1.0")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>
You can find more details about the Python API here.
ir_datasets export argsme/1.0 docs
[doc_id] [conclusion] [premises] [premises_texts] [aspects] [aspects_names] [source_id] [source_title] [source_url] [source_previous_argument_id] [source_next_argument_id] [source_domain] [source_text] [source_text_conclusion_start] [source_text_conclusion_end] [source_text_premise_start] [source_text_premise_end] [topic] [acquisition] [date] [author] [author_image_url] [author_organization] [author_role] [mode]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/1.0')
# Index argsme/1.0
indexer = pt.IterDictIndexer('./indices/argsme_1.0', meta={"docno": 39})
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])
You can find more details about PyTerrier indexing here.
Metadata:
{ "docs": { "count": 387692, "fields": { "doc_id": { "max_len": 39, "common_prefix": "" } } } }
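The `meta={"docno": 39}` argument in the indexing example matches the `max_len` of 39 that the metadata reports for `doc_id` in this corpus version. A minimal sketch of deriving that setting from the metadata follows. The helper name `docno_meta` is hypothetical, and the default length of 20 is an assumption about the indexer's default `docno` size that should be checked against the installed PyTerrier version.

```python
import json

# Metadata as reported for argsme/1.0 (copied from this page).
METADATA = json.loads(
    '{ "docs": { "count": 387692, "fields": '
    '{ "doc_id": { "max_len": 39, "common_prefix": "" } } } }'
)

# Assumption: docno length used by the indexer when no meta argument is given.
ASSUMED_DEFAULT_DOCNO_LEN = 20


def docno_meta(metadata, default_len=ASSUMED_DEFAULT_DOCNO_LEN):
    """Return a meta dict sized for the longest doc_id, or None if the default suffices."""
    max_len = metadata["docs"]["fields"]["doc_id"]["max_len"]
    if max_len > default_len:
        return {"docno": max_len}
    return None


print(docno_meta(METADATA))  # {'docno': 39}
```

Under the same assumption, the 2020-04-01 corpus versions (reported `max_len` of 19) would need no explicit `meta` argument, which is consistent with their indexing examples.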
Corpus version 1.0-cleaned with 382 545 arguments crawled from Debatewise, IDebate.org, Debatepedia, Debate.org. This version contains the same arguments as version 1.0, but cleaned as described in the corresponding publication. It was released on October 27, 2020 on Zenodo.
This collection is licensed under the Creative Commons Attribution 4.0 International license. Individual rights to the content still apply.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/1.0-cleaned")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>
You can find more details about the Python API here.
ir_datasets export argsme/1.0-cleaned docs
[doc_id] [conclusion] [premises] [premises_texts] [aspects] [aspects_names] [source_id] [source_title] [source_url] [source_previous_argument_id] [source_next_argument_id] [source_domain] [source_text] [source_text_conclusion_start] [source_text_conclusion_end] [source_text_premise_start] [source_text_premise_end] [topic] [acquisition] [date] [author] [author_image_url] [author_organization] [author_role] [mode]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/1.0-cleaned')
# Index argsme/1.0-cleaned
indexer = pt.IterDictIndexer('./indices/argsme_1.0-cleaned', meta={"docno": 39})
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])
You can find more details about PyTerrier indexing here.
Metadata:
{ "docs": { "count": 382545, "fields": { "doc_id": { "max_len": 39, "common_prefix": "" } } } }
Version of argsme/2020-04-01/touche-2020-task-1 that uses the argsme/1.0 corpus with uncorrected relevance judgements derived from crowdworkers. This dataset's relevance judgements should not be used without preprocessing.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/1.0/touche-2020-task-1/uncorrected")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, title, description, narrative>
You can find more details about the Python API here.
ir_datasets export argsme/1.0/touche-2020-task-1/uncorrected queries
[query_id] [title] [description] [narrative]
...
You can find more details about the CLI here.
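The bracketed names above are the columns of each exported line. As a minimal sketch, assuming the export is tab-separated with one query per line, the output could be read back with Python's `csv` module. The `Query` tuple, the `read_queries` helper, and the sample line are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import namedtuple

# Column order as listed for the queries export.
Query = namedtuple("Query", ["query_id", "title", "description", "narrative"])

# Hypothetical stand-in for the output of:
#   ir_datasets export argsme/1.0/touche-2020-task-1/uncorrected queries
sample_export = "q1\tExample title\tExample description\tExample narrative\n"


def read_queries(stream):
    """Parse tab-separated query lines into Query tuples, skipping empty lines."""
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [Query(*row) for row in reader if row]


queries = read_queries(io.StringIO(sample_export))
print(queries[0].title)  # Example title
```

In practice the stream would be the redirected output of the export command rather than an in-memory string.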
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/1.0/touche-2020-task-1/uncorrected')
index_ref = pt.IndexRef.of('./indices/argsme_1.0') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('title'))
You can find more details about PyTerrier retrieval here.
Inherits docs from argsme/1.0
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/1.0/touche-2020-task-1/uncorrected")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>
You can find more details about the Python API here.
ir_datasets export argsme/1.0/touche-2020-task-1/uncorrected docs
[doc_id] [conclusion] [premises] [premises_texts] [aspects] [aspects_names] [source_id] [source_title] [source_url] [source_previous_argument_id] [source_next_argument_id] [source_domain] [source_text] [source_text_conclusion_start] [source_text_conclusion_end] [source_text_premise_start] [source_text_premise_end] [topic] [acquisition] [date] [author] [author_image_url] [author_organization] [author_role] [mode]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/1.0/touche-2020-task-1/uncorrected')
# Index argsme/1.0
indexer = pt.IterDictIndexer('./indices/argsme_1.0', meta={"docno": 39})
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
-2 | spam, non-argument | 551 | 18.6% |
1 | very low relevance | 186 | 6.3% |
2 | low relevance | 195 | 6.6% |
3 | moderate relevance | 628 | 21.2% |
4 | high relevance | 1006 | 33.9% |
5 | very high relevance | 398 | 13.4% |
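The percentage column follows directly from the counts. A short sketch that recomputes it from the per-label counts listed in this dataset's metadata (variable names are illustrative):

```python
# Counts per relevance label, as listed in the dataset metadata.
counts_by_value = {-2: 551, 1: 186, 2: 195, 3: 628, 4: 1006, 5: 398}

total = sum(counts_by_value.values())  # 2964 judgements in total

# Share of each label, rounded to one decimal place as in the table.
percentages = {
    rel: round(100 * count / total, 1)
    for rel, count in sorted(counts_by_value.items())
}

for rel, pct in percentages.items():
    print(f"{rel:>2}  {counts_by_value[rel]:>5}  {pct}%")
```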
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/1.0/touche-2020-task-1/uncorrected")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
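Because these judgements are described as unsuitable for use without preprocessing, here is one possible step, sketched under assumptions: treating the spam label (-2) as non-relevant by raising it to 0 before evaluation. This does not reproduce the corrected judgements and is not necessarily the preprocessing the dataset maintainers intend. The `Qrel` tuple, the `clip_negative` helper, and the sample rows are stand-ins with made-up values.

```python
from collections import namedtuple

# Stand-in with the same field names as the qrels namedtuple shown above.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])

# Made-up rows for illustration only.
sample_qrels = [
    Qrel("1", "doc-a", 4, "0"),
    Qrel("1", "doc-b", -2, "0"),
    Qrel("2", "doc-c", 1, "0"),
]


def clip_negative(qrels, floor=0):
    """Yield qrels with any relevance below `floor` raised to `floor`."""
    for qrel in qrels:
        yield qrel._replace(relevance=max(qrel.relevance, floor))


cleaned = list(clip_negative(sample_qrels))
print([q.relevance for q in cleaned])  # [4, 0, 1]
```

The same generator could wrap `dataset.qrels_iter()` directly, since it only relies on the `relevance` field and the namedtuple `_replace` method.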
ir_datasets export argsme/1.0/touche-2020-task-1/uncorrected qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:argsme/1.0/touche-2020-task-1/uncorrected')
index_ref = pt.IndexRef.of('./indices/argsme_1.0') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('title'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
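For readers unfamiliar with the `nDCG@20` measure requested above, the following is a simplified sketch of one common formulation (linear gain with a log2 rank discount). It is not PyTerrier's implementation, and evaluation tools differ in details such as the gain function and tie handling. Clipping negative labels to 0 is an assumption made here, and the judgements are made up.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: the gain at rank i is divided by log2(i + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranked_doc_ids, judged, k=20):
    """nDCG@k for a single query.

    ranked_doc_ids: doc ids in retrieval order.
    judged: dict mapping doc id -> graded relevance; unjudged docs count as 0.
    """
    def clip(rel):
        # Assumption: negative labels (e.g. spam) carry no gain.
        return max(rel, 0)

    gains = [clip(judged.get(doc_id, 0)) for doc_id in ranked_doc_ids[:k]]
    ideal = sorted((clip(r) for r in judged.values()), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Made-up judgements for a single query.
judged = {"a": 5, "b": 3, "c": -2}
best = ndcg_at_k(["a", "b", "c"], judged)   # ideal ordering
worst = ndcg_at_k(["c", "b", "a"], judged)  # reversed ordering
print(best, worst)
```

The ideal ordering scores 1.0 by construction, while the reversed ordering scores lower because the highly relevant document is discounted at rank 3.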
Bibtex:
@inproceedings{Bondarenko2020Touche,
  address = {Berlin Heidelberg New York},
  author = {Alexander Bondarenko and Maik Fr{\"o}be and Meriem Beloucif and Lukas Gienapp and Yamen Ajjour and Alexander Panchenko and Chris Biemann and Benno Stein and Henning Wachsmuth and Martin Potthast and Matthias Hagen},
  booktitle = {Experimental IR Meets Multilinguality, Multimodality, and Interaction. 11th International Conference of the CLEF Association (CLEF 2020)},
  doi = {10.1007/978-3-030-58219-7\_26},
  editor = {Avi Arampatzis and Evangelos Kanoulas and Theodora Tsikrika and Stefanos Vrochidis and Hideo Joho and Christina Lioma and Carsten Eickhoff and Aur{\'e}lie N{\'e}v{\'e}ol and Linda Cappellato and Nicola Ferro},
  month = sep,
  pages = {384-395},
  publisher = {Springer},
  series = {Lecture Notes in Computer Science},
  site = {Thessaloniki, Greece},
  title = {{Overview of Touch{\'e} 2020: Argument Retrieval}},
  url = {https://link.springer.com/chapter/10.1007/978-3-030-58219-7_26},
  volume = 12260,
  year = 2020,
}
@inproceedings{Wachsmuth2017Quality,
  author = {Henning Wachsmuth and Nona Naderi and Yufang Hou and Yonatan Bilu and Vinodkumar Prabhakaran and Tim Alberdingk Thijm and Graeme Hirst and Benno Stein},
  booktitle = {15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017)},
  editor = {Phil Blunsom and Alexander Koller and Mirella Lapata},
  month = apr,
  pages = {176-187},
  site = {Valencia, Spain},
  title = {{Computational Argumentation Quality Assessment in Natural Language}},
  url = {http://aclweb.org/anthology/E17-1017},
  year = 2017
}
Metadata:
{ "docs": { "count": 387692, "fields": { "doc_id": { "max_len": 39, "common_prefix": "" } } }, "queries": { "count": 49 }, "qrels": { "count": 2964, "fields": { "relevance": { "counts_by_value": { "4": 1006, "5": 398, "3": 628, "2": 195, "-2": 551, "1": 186 } } } } }
Corpus version 2020-04-01 with 387 740 arguments crawled from Debatewise, IDebate.org, Debatepedia, Debate.org, and from Canadian Parliament discussions. It was released on April 1, 2020 on Zenodo.
This collection is licensed under the Creative Commons Attribution 4.0 International license. Individual rights to the content still apply.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>
You can find more details about the Python API here.
ir_datasets export argsme/2020-04-01 docs
[doc_id] [conclusion] [premises] [premises_texts] [aspects] [aspects_names] [source_id] [source_title] [source_url] [source_previous_argument_id] [source_next_argument_id] [source_domain] [source_text] [source_text_conclusion_start] [source_text_conclusion_end] [source_text_premise_start] [source_text_premise_end] [topic] [acquisition] [date] [author] [author_image_url] [author_organization] [author_role] [mode]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01')
# Index argsme/2020-04-01
indexer = pt.IterDictIndexer('./indices/argsme_2020-04-01')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])
You can find more details about PyTerrier indexing here.
Metadata:
{ "docs": { "count": 387740, "fields": { "doc_id": { "max_len": 19, "common_prefix": "S" } } } }
Subset of args.me version 2020-04-01 containing the 338 620 arguments that were crawled from the debate portal Debate.org.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/debateorg")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>
You can find more details about the Python API here.
ir_datasets export argsme/2020-04-01/debateorg docs
[doc_id] [conclusion] [premises] [premises_texts] [aspects] [aspects_names] [source_id] [source_title] [source_url] [source_previous_argument_id] [source_next_argument_id] [source_domain] [source_text] [source_text_conclusion_start] [source_text_conclusion_end] [source_text_premise_start] [source_text_premise_end] [topic] [acquisition] [date] [author] [author_image_url] [author_organization] [author_role] [mode]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/debateorg')
# Index argsme/2020-04-01/debateorg
indexer = pt.IterDictIndexer('./indices/argsme_2020-04-01_debateorg')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])
You can find more details about PyTerrier indexing here.
Metadata:
{ "docs": { "count": 338620, "fields": { "doc_id": { "max_len": 19, "common_prefix": "S" } } } }
Subset of args.me version 2020-04-01 containing the 21 197 arguments that were crawled from the debate portal Debatepedia.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/debatepedia")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>
You can find more details about the Python API here.
ir_datasets export argsme/2020-04-01/debatepedia docs
[doc_id] [conclusion] [premises] [premises_texts] [aspects] [aspects_names] [source_id] [source_title] [source_url] [source_previous_argument_id] [source_next_argument_id] [source_domain] [source_text] [source_text_conclusion_start] [source_text_conclusion_end] [source_text_premise_start] [source_text_premise_end] [topic] [acquisition] [date] [author] [author_image_url] [author_organization] [author_role] [mode]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/debatepedia')
# Index argsme/2020-04-01/debatepedia
indexer = pt.IterDictIndexer('./indices/argsme_2020-04-01_debatepedia')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])
You can find more details about PyTerrier indexing here.
Metadata:
{ "docs": { "count": 21197, "fields": { "doc_id": { "max_len": 19, "common_prefix": "S" } } } }
Subset of args.me version 2020-04-01 containing the 14 353 arguments that were crawled from the debate portal Debatewise.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/debatewise")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>
You can find more details about the Python API here.
ir_datasets export argsme/2020-04-01/debatewise docs
[doc_id] [conclusion] [premises] [premises_texts] [aspects] [aspects_names] [source_id] [source_title] [source_url] [source_previous_argument_id] [source_next_argument_id] [source_domain] [source_text] [source_text_conclusion_start] [source_text_conclusion_end] [source_text_premise_start] [source_text_premise_end] [topic] [acquisition] [date] [author] [author_image_url] [author_organization] [author_role] [mode]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/debatewise')
# Index argsme/2020-04-01/debatewise
indexer = pt.IterDictIndexer('./indices/argsme_2020-04-01_debatewise')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])
You can find more details about PyTerrier indexing here.
Metadata:
{ "docs": { "count": 14353, "fields": { "doc_id": { "max_len": 19, "common_prefix": "S" } } } }
Subset of args.me version 2020-04-01 containing the 13 522 arguments that were crawled from the debate portal IDebate.org.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/idebate")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>
You can find more details about the Python API here.
ir_datasets export argsme/2020-04-01/idebate docs
[doc_id] [conclusion] [premises] [premises_texts] [aspects] [aspects_names] [source_id] [source_title] [source_url] [source_previous_argument_id] [source_next_argument_id] [source_domain] [source_text] [source_text_conclusion_start] [source_text_conclusion_end] [source_text_premise_start] [source_text_premise_end] [topic] [acquisition] [date] [author] [author_image_url] [author_organization] [author_role] [mode]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/idebate')
# Index argsme/2020-04-01/idebate
indexer = pt.IterDictIndexer('./indices/argsme_2020-04-01_idebate')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])
You can find more details about PyTerrier indexing here.
Metadata:
{ "docs": { "count": 13522, "fields": { "doc_id": { "max_len": 19, "common_prefix": "S" } } } }
Subset of args.me version 2020-04-01 containing the 48 arguments that were crawled from Canadian Parliament discussions.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/parliamentary")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>
You can find more details about the Python API here.
ir_datasets export argsme/2020-04-01/parliamentary docs
[doc_id] [conclusion] [premises] [premises_texts] [aspects] [aspects_names] [source_id] [source_title] [source_url] [source_previous_argument_id] [source_next_argument_id] [source_domain] [source_text] [source_text_conclusion_start] [source_text_conclusion_end] [source_text_premise_start] [source_text_premise_end] [topic] [acquisition] [date] [author] [author_image_url] [author_organization] [author_role] [mode]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/parliamentary')
# Index argsme/2020-04-01/parliamentary
indexer = pt.IterDictIndexer('./indices/argsme_2020-04-01_parliamentary')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])
You can find more details about PyTerrier indexing here.
Metadata:
{ "docs": { "count": 48, "fields": { "doc_id": { "max_len": 19, "common_prefix": "S" } } } }
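The document counts of the five source-specific subsets add up to the 387 740 arguments reported for the full 2020-04-01 corpus. A quick check, using only counts stated on this page (the variable names are illustrative):

```python
# Document counts of the argsme/2020-04-01 subsets, as stated on this page.
subset_counts = {
    "argsme/2020-04-01/debateorg": 338620,
    "argsme/2020-04-01/debatepedia": 21197,
    "argsme/2020-04-01/debatewise": 14353,
    "argsme/2020-04-01/idebate": 13522,
    "argsme/2020-04-01/parliamentary": 48,
}

# Document count reported for the full argsme/2020-04-01 corpus.
full_corpus_count = 387740

subset_total = sum(subset_counts.values())
print(subset_total == full_corpus_count)  # True
```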
Decision-making processes, whether at the societal or the personal level, eventually reach a point where one side challenges the other with a why-question, which is a prompt to justify one's stance. Technologies for argument mining and argumentation processing are maturing at a rapid pace, giving rise for the first time to argument retrieval. Touché 2020, held at CLEF 2020, is the first lab on argument retrieval and features two tasks.
Given a question on a controversial topic, retrieve relevant arguments from a focused crawl of online debate portals (argsme/2020-04-01).
Documents are judged based on their general topical relevance.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/touche-2020-task-1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>
You can find more details about the Python API here.
ir_datasets export argsme/2020-04-01/touche-2020-task-1 queries
[query_id] [title] [description] [narrative]
...
You can find more details about the CLI here.
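The exported rows can be read back with the standard library. This is a minimal sketch, assuming the export is tab-separated with the four fields in the order shown above; `read_queries` and the sample row are made up for illustration and are not part of ir_datasets.

```python
import csv
import io

# Field order as printed by the export header above.
FIELDS = ["query_id", "title", "description", "narrative"]

def read_queries(tsv_text):
    """Read tab-separated query rows into dicts keyed by the field names."""
    reader = csv.DictReader(io.StringIO(tsv_text), fieldnames=FIELDS, delimiter="\t")
    return {row["query_id"]: row for row in reader}

# Hypothetical row; the text is a placeholder, not a real Touché topic.
sample = "1\tExample question?\tA made-up description.\tA made-up narrative.\n"
queries = read_queries(sample)
```

Keying the result by `query_id` makes it easy to look up the title that belongs to a judged query.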
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/touche-2020-task-1')
index_ref = pt.IndexRef.of('./indices/argsme_2020-04-01') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('title'))
You can find more details about PyTerrier retrieval here.
Inherits docs from argsme/2020-04-01
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/touche-2020-task-1")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>
You can find more details about the Python API here.
ir_datasets export argsme/2020-04-01/touche-2020-task-1 docs
[doc_id] [conclusion] [premises] [premises_texts] [aspects] [aspects_names] [source_id] [source_title] [source_url] [source_previous_argument_id] [source_next_argument_id] [source_domain] [source_text] [source_text_conclusion_start] [source_text_conclusion_end] [source_text_premise_start] [source_text_premise_end] [topic] [acquisition] [date] [author] [author_image_url] [author_organization] [author_role] [mode]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/touche-2020-task-1')
# Index argsme/2020-04-01
indexer = pt.IterDictIndexer('./indices/argsme_2020-04-01')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
-2 | spam | 751 | 32.7% |
0 | not relevant | 615 | 26.8% |
1 | relevant | 296 | 12.9% |
2 | highly relevant | 636 | 27.7% |
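The percentage column follows directly from the counts. A minimal sketch that recomputes it, assuming the four counts above cover all judgements; the names `counts` and `share` are illustrative and not part of ir_datasets.

```python
# Counts per relevance level, copied from the table above.
counts = {-2: 751, 0: 615, 1: 296, 2: 636}

total = sum(counts.values())

def share(level):
    """Percentage of all judgements at the given level, rounded to one decimal."""
    return round(100 * counts[level] / total, 1)

for level in sorted(counts):
    print(level, counts[level], f"{share(level)}%")
```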
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/touche-2020-task-1")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export argsme/2020-04-01/touche-2020-task-1 qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
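The TSV export can be turned back into records that mirror the namedtuple shown above. This is a sketch under the assumption that each line holds the four fields separated by tabs; `parse_qrels` and the sample rows are hypothetical.

```python
from collections import namedtuple

# Field order matches the export header above.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])

def parse_qrels(tsv_text):
    """Parse tab-separated qrels lines into Qrel records with integer relevance."""
    records = []
    for line in tsv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        query_id, doc_id, relevance, iteration = line.split("\t")
        records.append(Qrel(query_id, doc_id, int(relevance), iteration))
    return records

# Hypothetical rows; the ids are placeholders, not real corpus entries.
sample = "1\tS-doc-a\t2\t0\n1\tS-doc-b\t-2\t0\n"
qrels = parse_qrels(sample)
```

Converting `relevance` to an integer keeps the spam level (-2) comparable with the other levels.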
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/touche-2020-task-1')
index_ref = pt.IndexRef.of('./indices/argsme_2020-04-01') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('title'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
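For orientation, nDCG@20 discounts each retrieved document's gain by the logarithm of its rank and normalises by the best achievable ordering. The sketch below assumes linear gains and clamps negative labels (spam, -2) to zero. Whether a given evaluation tool treats -2 this way is an assumption here, and `ndcg_at_k` is an illustrative helper rather than a PyTerrier function.

```python
import math

def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg_at_k(ranked_doc_ids, judged, k=20):
    """nDCG@k for one query.

    ranked_doc_ids: doc ids in retrieval order.
    judged: dict mapping doc_id to relevance label; unjudged docs count as 0.
    Negative labels (e.g. -2 for spam) are clamped to 0 (assumption).
    """
    gains = [max(judged.get(d, 0), 0) for d in ranked_doc_ids[:k]]
    ideal = sorted((max(r, 0) for r in judged.values()), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical judgements and rankings for a single query.
judged = {"a": 2, "b": 1, "c": -2, "d": 0}
perfect = ndcg_at_k(["a", "b", "d", "c"], judged)
swapped = ndcg_at_k(["b", "a", "c"], judged)
```

Ranking the highly relevant document first yields the ideal score, while swapping the top two documents lowers it.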
Bibtex:
@inproceedings{Bondarenko2020Touche, address = {Berlin Heidelberg New York}, author = {Alexander Bondarenko and Maik Fr{\"o}be and Meriem Beloucif and Lukas Gienapp and Yamen Ajjour and Alexander Panchenko and Chris Biemann and Benno Stein and Henning Wachsmuth and Martin Potthast and Matthias Hagen}, booktitle = {Experimental IR Meets Multilinguality, Multimodality, and Interaction. 11th International Conference of the CLEF Association (CLEF 2020)}, doi = {10.1007/978-3-030-58219-7\_26}, editor = {Avi Arampatzis and Evangelos Kanoulas and Theodora Tsikrika and Stefanos Vrochidis and Hideo Joho and Christina Lioma and Carsten Eickhoff and Aur{\'e}lie N{\'e}v{\'e}ol and Linda Cappellato and Nicola Ferro}, month = sep, pages = {384-395}, publisher = {Springer}, series = {Lecture Notes in Computer Science}, site = {Thessaloniki, Greece}, title = {{Overview of Touch{\'e} 2020: Argument Retrieval}}, url = {https://link.springer.com/chapter/10.1007/978-3-030-58219-7_26}, volume = 12260, year = 2020, }
@inproceedings{Wachsmuth2017Quality, author = {Henning Wachsmuth and Nona Naderi and Yufang Hou and Yonatan Bilu and Vinodkumar Prabhakaran and Tim Alberdingk Thijm and Graeme Hirst and Benno Stein}, booktitle = {15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017)}, editor = {Phil Blunsom and Alexander Koller and Mirella Lapata}, month = apr, pages = {176-187}, site = {Valencia, Spain}, title = {{Computational Argumentation Quality Assessment in Natural Language}}, url = {http://aclweb.org/anthology/E17-1017}, year = 2017 }
Metadata:
{ "docs": { "count": 387740, "fields": { "doc_id": { "max_len": 19, "common_prefix": "S" } } }, "queries": { "count": 49 }, "qrels": { "count": 2298, "fields": { "relevance": { "counts_by_value": { "0": 615, "1": 296, "-2": 751, "2": 636 } } } } }
Version of argsme/2020-04-01/touche-2020-task-1 that uses uncorrected relevance judgements derived from crowdworkers. This dataset's relevance judgements should not be used without preprocessing.
Inherits queries from argsme/2020-04-01/touche-2020-task-1
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/touche-2020-task-1/uncorrected")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>
You can find more details about the Python API here.
ir_datasets export argsme/2020-04-01/touche-2020-task-1/uncorrected queries
[query_id] [title] [description] [narrative]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/touche-2020-task-1/uncorrected')
index_ref = pt.IndexRef.of('./indices/argsme_2020-04-01') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('title'))
You can find more details about PyTerrier retrieval here.
Inherits docs from argsme/2020-04-01
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/touche-2020-task-1/uncorrected")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>
You can find more details about the Python API here.
ir_datasets export argsme/2020-04-01/touche-2020-task-1/uncorrected docs
[doc_id] [conclusion] [premises] [premises_texts] [aspects] [aspects_names] [source_id] [source_title] [source_url] [source_previous_argument_id] [source_next_argument_id] [source_domain] [source_text] [source_text_conclusion_start] [source_text_conclusion_end] [source_text_premise_start] [source_text_premise_end] [topic] [acquisition] [date] [author] [author_image_url] [author_organization] [author_role] [mode]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/touche-2020-task-1/uncorrected')
# Index argsme/2020-04-01
indexer = pt.IterDictIndexer('./indices/argsme_2020-04-01')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
-2 | spam, non-argument | 380 | 16.5% |
1 | very low relevance | 144 | 6.3% |
2 | low relevance | 199 | 8.7% |
3 | moderate relevance | 485 | 21.1% |
4 | high relevance | 665 | 28.9% |
5 | very high relevance | 425 | 18.5% |
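Since these judgements are meant to be preprocessed before use, the sketch below shows one possible step purely as an illustration: collapsing the crowdworker scale into coarser levels. The thresholds in `collapse` are hypothetical. They are not the procedure that produced the corrected judgements, and the resulting counts do not reproduce the corrected table.

```python
# Counts per crowdworker label, copied from the table above.
uncorrected_counts = {-2: 380, 1: 144, 2: 199, 3: 485, 4: 665, 5: 425}

def collapse(label):
    """Map a raw label to a coarser level. Thresholds are hypothetical."""
    if label < 0:
        return -2  # keep spam / non-argument as its own level
    if label <= 2:
        return 0   # very low / low relevance
    if label == 3:
        return 1   # moderate relevance
    return 2       # high / very high relevance

# Aggregate the raw counts under the collapsed levels.
collapsed_counts = {}
for label, count in uncorrected_counts.items():
    level = collapse(label)
    collapsed_counts[level] = collapsed_counts.get(level, 0) + count
```

The total number of judgements is unchanged by the mapping; only their distribution across levels differs.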
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/touche-2020-task-1/uncorrected")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export argsme/2020-04-01/touche-2020-task-1/uncorrected qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/touche-2020-task-1/uncorrected')
index_ref = pt.IndexRef.of('./indices/argsme_2020-04-01') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('title'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
Bibtex:
@inproceedings{Bondarenko2020Touche, address = {Berlin Heidelberg New York}, author = {Alexander Bondarenko and Maik Fr{\"o}be and Meriem Beloucif and Lukas Gienapp and Yamen Ajjour and Alexander Panchenko and Chris Biemann and Benno Stein and Henning Wachsmuth and Martin Potthast and Matthias Hagen}, booktitle = {Experimental IR Meets Multilinguality, Multimodality, and Interaction. 11th International Conference of the CLEF Association (CLEF 2020)}, doi = {10.1007/978-3-030-58219-7\_26}, editor = {Avi Arampatzis and Evangelos Kanoulas and Theodora Tsikrika and Stefanos Vrochidis and Hideo Joho and Christina Lioma and Carsten Eickhoff and Aur{\'e}lie N{\'e}v{\'e}ol and Linda Cappellato and Nicola Ferro}, month = sep, pages = {384-395}, publisher = {Springer}, series = {Lecture Notes in Computer Science}, site = {Thessaloniki, Greece}, title = {{Overview of Touch{\'e} 2020: Argument Retrieval}}, url = {https://link.springer.com/chapter/10.1007/978-3-030-58219-7_26}, volume = 12260, year = 2020, }
@inproceedings{Wachsmuth2017Quality, author = {Henning Wachsmuth and Nona Naderi and Yufang Hou and Yonatan Bilu and Vinodkumar Prabhakaran and Tim Alberdingk Thijm and Graeme Hirst and Benno Stein}, booktitle = {15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017)}, editor = {Phil Blunsom and Alexander Koller and Mirella Lapata}, month = apr, pages = {176-187}, site = {Valencia, Spain}, title = {{Computational Argumentation Quality Assessment in Natural Language}}, url = {http://aclweb.org/anthology/E17-1017}, year = 2017 }
Metadata:
{ "docs": { "count": 387740, "fields": { "doc_id": { "max_len": 19, "common_prefix": "S" } } }, "queries": { "count": 49 }, "qrels": { "count": 2298, "fields": { "relevance": { "counts_by_value": { "4": 665, "3": 485, "-2": 380, "5": 425, "2": 199, "1": 144 } } } } }
Decision-making processes, whether at the societal or the personal level, often reach a point where one side challenges the other with a why-question, which is a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad-hoc argument retrieval is also becoming a feasible task. Touché 2021 is the second lab on argument retrieval at CLEF 2021 and features two tasks.
Given a question on a controversial topic, retrieve relevant arguments from a focused crawl of online debate portals (argsme/2020-04-01).
Documents are judged on their general topical relevance and on their rhetorical quality, i.e., the "well-writtenness" of the document: (1) whether the text has a good style of speech (formal language is preferred over informal), (2) whether the text has a proper sentence structure and is easy to read, and (3) whether it includes profanity, has typos, or makes use of other detrimental style choices.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/touche-2021-task-1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title>
You can find more details about the Python API here.
ir_datasets export argsme/2020-04-01/touche-2021-task-1 queries
[query_id] [title]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/touche-2021-task-1')
index_ref = pt.IndexRef.of('./indices/argsme_2020-04-01') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from argsme/2020-04-01
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/touche-2021-task-1")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>
You can find more details about the Python API here.
ir_datasets export argsme/2020-04-01/touche-2021-task-1 docs
[doc_id] [conclusion] [premises] [premises_texts] [aspects] [aspects_names] [source_id] [source_title] [source_url] [source_previous_argument_id] [source_next_argument_id] [source_domain] [source_text] [source_text_conclusion_start] [source_text_conclusion_end] [source_text_premise_start] [source_text_premise_end] [topic] [acquisition] [date] [author] [author_image_url] [author_organization] [author_role] [mode]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/touche-2021-task-1')
# Index argsme/2020-04-01
indexer = pt.IterDictIndexer('./indices/argsme_2020-04-01')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
-2 | spam | 351 | 9.5% |
0 | not relevant | 1542 | 41.6% |
1 | relevant | 736 | 19.8% |
2 | highly relevant | 1082 | 29.2% |
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/touche-2021-task-1")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, quality, iteration>
You can find more details about the Python API here.
ir_datasets export argsme/2020-04-01/touche-2021-task-1 qrels --format tsv
[query_id] [doc_id] [relevance] [quality] [iteration]
...
You can find more details about the CLI here.
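Because each 2021 judgement carries both a relevance and a quality label, it can be convenient to split them into two lookups and evaluate each dimension separately. This is a sketch with made-up records; `Qrel` mirrors the field names above, and `split_dimensions` is an illustrative helper, not part of ir_datasets.

```python
from collections import namedtuple

# Field order matches the export header above.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "quality", "iteration"])

def split_dimensions(qrels):
    """Return two {query_id: {doc_id: label}} dicts, one per judged dimension."""
    relevance, quality = {}, {}
    for q in qrels:
        relevance.setdefault(q.query_id, {})[q.doc_id] = q.relevance
        quality.setdefault(q.query_id, {})[q.doc_id] = q.quality
    return relevance, quality

# Hypothetical records; ids and labels are placeholders.
records = [
    Qrel("51", "S-doc-a", 2, 1, "0"),
    Qrel("51", "S-doc-b", 0, 2, "0"),
    Qrel("52", "S-doc-a", 1, 0, "0"),
]
relevance, quality = split_dimensions(records)
```

Either dictionary can then be passed to an evaluation routine that expects a single label per query and document.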
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/touche-2021-task-1')
index_ref = pt.IndexRef.of('./indices/argsme_2020-04-01') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
Bibtex:
@inproceedings{Bondarenko2021Touche, address = {Berlin Heidelberg New York}, author = {Alexander Bondarenko and Lukas Gienapp and Maik Fr{\"o}be and Meriem Beloucif and Yamen Ajjour and Alexander Panchenko and Chris Biemann and Benno Stein and Henning Wachsmuth and Martin Potthast and Matthias Hagen}, booktitle = {Experimental IR Meets Multilinguality, Multimodality, and Interaction. 12th International Conference of the CLEF Association (CLEF 2021)}, doi = {10.1007/978-3-030-85251-1\_28}, editor = {{K. Sel{\c{c}}uk} Candan and Bogdan Ionescu and Lorraine Goeuriot and Henning M{\"u}ller and Alexis Joly and Maria Maistro and Florina Piroi and Guglielmo Faggioli and Nicola Ferro}, month = sep, pages = {450-467}, publisher = {Springer}, series = {Lecture Notes in Computer Science}, site = {Bucharest, Romania}, title = {{Overview of Touch{\'e} 2021: Argument Retrieval}}, url = {https://link.springer.com/chapter/10.1007/978-3-030-85251-1_28}, volume = 12880, year = 2021, }
Metadata:
{ "docs": { "count": 387740, "fields": { "doc_id": { "max_len": 19, "common_prefix": "S" } } }, "queries": { "count": 50 }, "qrels": { "count": 3711, "fields": { "relevance": { "counts_by_value": { "2": 1082, "0": 1542, "1": 736, "-2": 351 } } } } }