ir_datasets: args.me
The args.me corpus is one of the largest argument resources available and contains arguments crawled from debate platforms and parliament discussions.
Bibtex:
@inproceedings{Wachsmuth2017Argument,
  author    = {Henning Wachsmuth and Martin Potthast and Khalid Al-Khatib and Yamen Ajjour and Jana Puschmann and Jiani Qu and Jonas Dorsch and Viorel Morari and Janek Bevendorff and Benno Stein},
  booktitle = {4th Workshop on Argument Mining (ArgMining 2017) at EMNLP},
  editor    = {Kevin Ashley and Claire Cardie and Nancy Green and Iryna Gurevych and Ivan Habernal and Diane Litman and Georgios Petasis and Chris Reed and Noam Slonim and Vern Walker},
  month     = sep,
  pages     = {49-59},
  publisher = {Association for Computational Linguistics},
  site      = {Copenhagen, Denmark},
  title     = {{Building an Argument Search Engine for the Web}},
  url       = {https://www.aclweb.org/anthology/W17-5106},
  year      = 2017
}
@inproceedings{Ajjour2019Acquisition,
  address   = {Berlin Heidelberg New York},
  author    = {Yamen Ajjour and Henning Wachsmuth and Johannes Kiesel and Martin Potthast and Matthias Hagen and Benno Stein},
  booktitle = {42nd German Conference on Artificial Intelligence (KI 2019)},
  doi       = {10.1007/978-3-030-30179-8\_4},
  editor    = {Christoph Benzm{\"u}ller and Heiner Stuckenschmidt},
  month     = sep,
  pages     = {48-59},
  publisher = {Springer},
  site      = {Kassel, Germany},
  title     = {{Data Acquisition for Argument Search: The args.me corpus}},
  year      = 2019
}

Corpus version 1.0 with 387 606 arguments crawled from Debatewise, IDebate.org, Debatepedia, and Debate.org. It was released on July 9, 2019 on Zenodo.
This collection is licensed under the Creative Commons Attribution 4.0 International license. Individual rights to the content still apply.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/1.0")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>
You can find more details about the Python API here.
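Each document is a flat namedtuple, so fields are accessed by attribute. As a minimal sketch of turning an argument record into a single retrievable text field (using a hypothetical reduced record type for illustration; the real namedtuple carries all the fields listed above), one can concatenate the conclusion with the joined premise texts:

```python
from collections import namedtuple

# Hypothetical stand-in for the full args.me document record;
# only the two text-bearing fields used here are included.
ArgsMeDoc = namedtuple("ArgsMeDoc", ["doc_id", "conclusion", "premises_texts"])

def flatten(doc):
    """Join conclusion and premise text into one string for a bag-of-words index."""
    return f"{doc.conclusion} {doc.premises_texts}".strip()

doc = ArgsMeDoc("d1", "School uniforms should be mandatory.",
                "They reduce peer pressure.")
print(flatten(doc))
# School uniforms should be mandatory. They reduce peer pressure.
```

The same idea underlies the `fields=[...]` argument in the PyTerrier indexing example below: the listed fields are what gets indexed per document.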
ir_datasets export argsme/1.0 docs
[doc_id] [conclusion] [premises] [premises_texts] [aspects] [aspects_names] [source_id] [source_title] [source_url] [source_previous_argument_id] [source_next_argument_id] [source_domain] [source_text] [source_text_conclusion_start] [source_text_conclusion_end] [source_text_premise_start] [source_text_premise_end] [topic] [acquisition] [date] [author] [author_image_url] [author_organization] [author_role] [mode]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/1.0')
# Index argsme/1.0
indexer = pt.IterDictIndexer('./indices/argsme_1.0')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])
You can find more details about PyTerrier indexing here.
Metadata:
{ "docs": { "count": 387692, "fields": { "doc_id": { "max_len": 39, "common_prefix": "" } } } }
Corpus version 1.0-cleaned with 382 545 arguments crawled from Debatewise, IDebate.org, Debatepedia, Debate.org. This version contains the same arguments as version 1.0, but cleaned as described in the corresponding publication. It was released on October 27, 2020 on Zenodo.
This collection is licensed under the Creative Commons Attribution 4.0 International license. Individual rights to the content still apply.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/1.0-cleaned")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>
You can find more details about the Python API here.
ir_datasets export argsme/1.0-cleaned docs
[doc_id] [conclusion] [premises] [premises_texts] [aspects] [aspects_names] [source_id] [source_title] [source_url] [source_previous_argument_id] [source_next_argument_id] [source_domain] [source_text] [source_text_conclusion_start] [source_text_conclusion_end] [source_text_premise_start] [source_text_premise_end] [topic] [acquisition] [date] [author] [author_image_url] [author_organization] [author_role] [mode]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/1.0-cleaned')
# Index argsme/1.0-cleaned
indexer = pt.IterDictIndexer('./indices/argsme_1.0-cleaned')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])
You can find more details about PyTerrier indexing here.
Metadata:
{ "docs": { "count": 382545, "fields": { "doc_id": { "max_len": 39, "common_prefix": "" } } } }
Version of argsme/2020-04-01/touche-2020-task-1 that uses the argsme/1.0 corpus with uncorrected relevance judgements derived from crowdworkers. This dataset's relevance judgements should not be used without preprocessing.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/1.0/touche-2020-task-1/uncorrected")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>
You can find more details about the Python API here.
ir_datasets export argsme/1.0/touche-2020-task-1/uncorrected queries
[query_id] [title] [description] [narrative]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/1.0/touche-2020-task-1/uncorrected')
index_ref = pt.IndexRef.of('./indices/argsme_1.0') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('title'))
You can find more details about PyTerrier retrieval here.
Inherits docs from argsme/1.0
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/1.0/touche-2020-task-1/uncorrected")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>
You can find more details about the Python API here.
ir_datasets export argsme/1.0/touche-2020-task-1/uncorrected docs
[doc_id] [conclusion] [premises] [premises_texts] [aspects] [aspects_names] [source_id] [source_title] [source_url] [source_previous_argument_id] [source_next_argument_id] [source_domain] [source_text] [source_text_conclusion_start] [source_text_conclusion_end] [source_text_premise_start] [source_text_premise_end] [topic] [acquisition] [date] [author] [author_image_url] [author_organization] [author_role] [mode]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/1.0/touche-2020-task-1/uncorrected')
# Index argsme/1.0
indexer = pt.IterDictIndexer('./indices/argsme_1.0')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition | Count | %
---|---|---|---
-2 | spam, non-argument | 551 | 18.6%
1 | very low relevance | 186 | 6.3%
2 | low relevance | 195 | 6.6%
3 | moderate relevance | 628 | 21.2%
4 | high relevance | 1006 | 33.9%
5 | very high relevance | 398 | 13.4%
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/1.0/touche-2020-task-1/uncorrected")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
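Because the uncorrected judgements include a -2 label for spam/non-arguments, one possible preprocessing step (this is an illustrative choice, not an official mapping) is to clamp negative labels to 0 before computing graded metrics, so that spam documents contribute zero gain rather than negative gain:

```python
def preprocess_relevance(qrels):
    """Clamp the -2 spam label to 0 so graded metrics see non-negative gains.

    `qrels` is any iterable of (query_id, doc_id, relevance) triples, e.g.
    built from the qrel namedtuples yielded by dataset.qrels_iter().
    """
    return [(qid, did, max(rel, 0)) for qid, did, rel in qrels]

sample = [("1", "d1", 5), ("1", "d2", -2), ("2", "d3", 3)]
print(preprocess_relevance(sample))
# [('1', 'd1', 5), ('1', 'd2', 0), ('2', 'd3', 3)]
```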
ir_datasets export argsme/1.0/touche-2020-task-1/uncorrected qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:argsme/1.0/touche-2020-task-1/uncorrected')
index_ref = pt.IndexRef.of('./indices/argsme_1.0') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('title'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
Bibtex:
@inproceedings{Bondarenko2020Touche,
  address   = {Berlin Heidelberg New York},
  author    = {Alexander Bondarenko and Maik Fr{\"o}be and Meriem Beloucif and Lukas Gienapp and Yamen Ajjour and Alexander Panchenko and Chris Biemann and Benno Stein and Henning Wachsmuth and Martin Potthast and Matthias Hagen},
  booktitle = {Experimental IR Meets Multilinguality, Multimodality, and Interaction. 11th International Conference of the CLEF Association (CLEF 2020)},
  doi       = {10.1007/978-3-030-58219-7\_26},
  editor    = {Avi Arampatzis and Evangelos Kanoulas and Theodora Tsikrika and Stefanos Vrochidis and Hideo Joho and Christina Lioma and Carsten Eickhoff and Aur{\'e}lie N{\'e}v{\'e}ol and Linda Cappellato and Nicola Ferro},
  month     = sep,
  pages     = {384-395},
  publisher = {Springer},
  series    = {Lecture Notes in Computer Science},
  site      = {Thessaloniki, Greece},
  title     = {{Overview of Touch{\'e} 2020: Argument Retrieval}},
  url       = {https://link.springer.com/chapter/10.1007/978-3-030-58219-7_26},
  volume    = 12260,
  year      = 2020,
}
@inproceedings{Wachsmuth2017Quality,
  author    = {Henning Wachsmuth and Nona Naderi and Yufang Hou and Yonatan Bilu and Vinodkumar Prabhakaran and Tim Alberdingk Thijm and Graeme Hirst and Benno Stein},
  booktitle = {15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017)},
  editor    = {Phil Blunsom and Alexander Koller and Mirella Lapata},
  month     = apr,
  pages     = {176-187},
  site      = {Valencia, Spain},
  title     = {{Computational Argumentation Quality Assessment in Natural Language}},
  url       = {http://aclweb.org/anthology/E17-1017},
  year      = 2017
}

Metadata:
{ "docs": { "count": 387692, "fields": { "doc_id": { "max_len": 39, "common_prefix": "" } } }, "queries": { "count": 49 }, "qrels": { "count": 2964, "fields": { "relevance": { "counts_by_value": { "4": 1006, "5": 398, "3": 628, "2": 195, "-2": 551, "1": 186 } } } } }
Corpus version 2020-04-01 with 387 740 arguments crawled from Debatewise, IDebate.org, Debatepedia, Debate.org, and from Canadian Parliament discussions. It was released on April 1, 2020 on Zenodo.
This collection is licensed under the Creative Commons Attribution 4.0 International license. Individual rights to the content still apply.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>
You can find more details about the Python API here.
ir_datasets export argsme/2020-04-01 docs
[doc_id] [conclusion] [premises] [premises_texts] [aspects] [aspects_names] [source_id] [source_title] [source_url] [source_previous_argument_id] [source_next_argument_id] [source_domain] [source_text] [source_text_conclusion_start] [source_text_conclusion_end] [source_text_premise_start] [source_text_premise_end] [topic] [acquisition] [date] [author] [author_image_url] [author_organization] [author_role] [mode]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01')
# Index argsme/2020-04-01
indexer = pt.IterDictIndexer('./indices/argsme_2020-04-01')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])
You can find more details about PyTerrier indexing here.
Metadata:
{ "docs": { "count": 387740, "fields": { "doc_id": { "max_len": 19, "common_prefix": "S" } } } }
Subset of the 338 620 arguments from args.me version 2020-04-01 that were crawled from the debate portal Debate.org.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/debateorg")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>
You can find more details about the Python API here.
ir_datasets export argsme/2020-04-01/debateorg docs
[doc_id] [conclusion] [premises] [premises_texts] [aspects] [aspects_names] [source_id] [source_title] [source_url] [source_previous_argument_id] [source_next_argument_id] [source_domain] [source_text] [source_text_conclusion_start] [source_text_conclusion_end] [source_text_premise_start] [source_text_premise_end] [topic] [acquisition] [date] [author] [author_image_url] [author_organization] [author_role] [mode]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/debateorg')
# Index argsme/2020-04-01/debateorg
indexer = pt.IterDictIndexer('./indices/argsme_2020-04-01_debateorg')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])
You can find more details about PyTerrier indexing here.
Metadata:
{ "docs": { "count": 338620, "fields": { "doc_id": { "max_len": 19, "common_prefix": "S" } } } }
Subset of the 21 197 arguments from args.me version 2020-04-01 that were crawled from the debate portal Debatepedia.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/debatepedia")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>
You can find more details about the Python API here.
ir_datasets export argsme/2020-04-01/debatepedia docs
[doc_id] [conclusion] [premises] [premises_texts] [aspects] [aspects_names] [source_id] [source_title] [source_url] [source_previous_argument_id] [source_next_argument_id] [source_domain] [source_text] [source_text_conclusion_start] [source_text_conclusion_end] [source_text_premise_start] [source_text_premise_end] [topic] [acquisition] [date] [author] [author_image_url] [author_organization] [author_role] [mode]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/debatepedia')
# Index argsme/2020-04-01/debatepedia
indexer = pt.IterDictIndexer('./indices/argsme_2020-04-01_debatepedia')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])
You can find more details about PyTerrier indexing here.
Metadata:
{ "docs": { "count": 21197, "fields": { "doc_id": { "max_len": 19, "common_prefix": "S" } } } }
Subset of the 14 353 arguments from args.me version 2020-04-01 that were crawled from the debate portal Debatewise.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/debatewise")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>
You can find more details about the Python API here.
ir_datasets export argsme/2020-04-01/debatewise docs
[doc_id] [conclusion] [premises] [premises_texts] [aspects] [aspects_names] [source_id] [source_title] [source_url] [source_previous_argument_id] [source_next_argument_id] [source_domain] [source_text] [source_text_conclusion_start] [source_text_conclusion_end] [source_text_premise_start] [source_text_premise_end] [topic] [acquisition] [date] [author] [author_image_url] [author_organization] [author_role] [mode]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/debatewise')
# Index argsme/2020-04-01/debatewise
indexer = pt.IterDictIndexer('./indices/argsme_2020-04-01_debatewise')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])
You can find more details about PyTerrier indexing here.
Metadata:
{ "docs": { "count": 14353, "fields": { "doc_id": { "max_len": 19, "common_prefix": "S" } } } }
Subset of the 13 522 arguments from args.me version 2020-04-01 that were crawled from the debate portal IDebate.org.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/idebate")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>
You can find more details about the Python API here.
ir_datasets export argsme/2020-04-01/idebate docs
[doc_id] [conclusion] [premises] [premises_texts] [aspects] [aspects_names] [source_id] [source_title] [source_url] [source_previous_argument_id] [source_next_argument_id] [source_domain] [source_text] [source_text_conclusion_start] [source_text_conclusion_end] [source_text_premise_start] [source_text_premise_end] [topic] [acquisition] [date] [author] [author_image_url] [author_organization] [author_role] [mode]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/idebate')
# Index argsme/2020-04-01/idebate
indexer = pt.IterDictIndexer('./indices/argsme_2020-04-01_idebate')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])
You can find more details about PyTerrier indexing here.
Metadata:
{ "docs": { "count": 13522, "fields": { "doc_id": { "max_len": 19, "common_prefix": "S" } } } }
Subset of the 48 arguments from args.me version 2020-04-01 that were crawled from Canadian Parliament discussions.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/parliamentary")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>
You can find more details about the Python API here.
ir_datasets export argsme/2020-04-01/parliamentary docs
[doc_id] [conclusion] [premises] [premises_texts] [aspects] [aspects_names] [source_id] [source_title] [source_url] [source_previous_argument_id] [source_next_argument_id] [source_domain] [source_text] [source_text_conclusion_start] [source_text_conclusion_end] [source_text_premise_start] [source_text_premise_end] [topic] [acquisition] [date] [author] [author_image_url] [author_organization] [author_role] [mode]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/parliamentary')
# Index argsme/2020-04-01/parliamentary
indexer = pt.IterDictIndexer('./indices/argsme_2020-04-01_parliamentary')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])
You can find more details about PyTerrier indexing here.
Metadata:
{ "docs": { "count": 48, "fields": { "doc_id": { "max_len": 19, "common_prefix": "S" } } } }
Decision-making processes, be it at the societal or at the personal level, eventually come to a point where one side challenges the other with a why-question, that is, a prompt to justify one's stance. Technologies for argument mining and argumentation processing are maturing at a rapid pace, enabling argument retrieval for the first time. Touché 2020 is the first lab on argument retrieval, held at CLEF 2020 and featuring two tasks.
Given a question on a controversial topic, retrieve relevant arguments from a focused crawl of online debate portals (argsme/2020-04-01).
Documents are judged based on their general topical relevance.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/touche-2020-task-1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>
You can find more details about the Python API here.
ir_datasets export argsme/2020-04-01/touche-2020-task-1 queries
[query_id] [title] [description] [narrative]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/touche-2020-task-1')
index_ref = pt.IndexRef.of('./indices/argsme_2020-04-01') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('title'))
You can find more details about PyTerrier retrieval here.
Inherits docs from argsme/2020-04-01
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/touche-2020-task-1")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>
You can find more details about the Python API here.
ir_datasets export argsme/2020-04-01/touche-2020-task-1 docs
[doc_id] [conclusion] [premises] [premises_texts] [aspects] [aspects_names] [source_id] [source_title] [source_url] [source_previous_argument_id] [source_next_argument_id] [source_domain] [source_text] [source_text_conclusion_start] [source_text_conclusion_end] [source_text_premise_start] [source_text_premise_end] [topic] [acquisition] [date] [author] [author_image_url] [author_organization] [author_role] [mode]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/touche-2020-task-1')
# Index argsme/2020-04-01
indexer = pt.IterDictIndexer('./indices/argsme_2020-04-01')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
-2 | spam | 751 | 32.7% |
0 | not relevant | 615 | 26.8% |
1 | relevant | 296 | 12.9% |
2 | highly relevant | 636 | 27.7% |
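Because the scale includes a -2 spam level, these labels are not a plain 0-2 grading, and binary measures such as MAP need an explicit mapping. A minimal sketch of one possible binarization, where spam and not-relevant documents map to 0 and everything at or above a chosen threshold maps to 1 (the threshold is an illustrative choice, not part of the official evaluation):

```python
def binarize(relevance: int, threshold: int = 1) -> int:
    """Map a graded Touché label (-2, 0, 1, 2) to a binary one.

    Spam (-2) and not relevant (0) become 0; labels at or above
    `threshold` become 1. The threshold is an illustrative choice.
    """
    return 1 if relevance >= threshold else 0

# Example: applied to a few (query_id, doc_id, relevance) triples.
qrels = [("1", "S1", 2), ("1", "S2", -2), ("2", "S3", 0), ("2", "S4", 1)]
binary = [(q, d, binarize(r)) for q, d, r in qrels]
```

The same mapping could be applied on the fly while iterating `dataset.qrels_iter()` below.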
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/touche-2020-task-1")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export argsme/2020-04-01/touche-2020-task-1 qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
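The exported TSV can also be post-processed with standard Unix tools. For example, a per-label count of judgements (relevance is column 3 in the four-column layout shown above) can be produced with awk; this is a sketch assuming that layout:

```shell
# Count qrels per relevance label (column 3 of the exported TSV).
ir_datasets export argsme/2020-04-01/touche-2020-task-1 qrels --format tsv \
  | awk -F'\t' '{count[$3]++} END {for (rel in count) print rel, count[rel]}'
```

The resulting counts should match the relevance-level table above.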
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/touche-2020-task-1')
index_ref = pt.IndexRef.of('./indices/argsme_2020-04-01') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics('title'),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
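The Experiment above reports MAP and nDCG@20. As a reminder of what nDCG computes, and as a way to sanity-check results on toy data, here is a self-contained sketch using the common rel / log2(rank + 1) gain; clipping the negative spam labels to 0 is an illustrative choice, and the ideal ranking is taken over the same list rather than over all judged documents:

```python
import math

def ndcg(rels, k=20):
    """nDCG@k for a ranked list of graded relevance labels."""
    gains = [max(r, 0) for r in rels[:k]]  # clip spam (-2) to 0
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    ideal = sorted((max(r, 0) for r in rels), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

# A perfect ranking scores 1.0; moving relevant documents down lowers it.
```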
Bibtex:
@inproceedings{Bondarenko2020Touche,
  address   = {Berlin Heidelberg New York},
  author    = {Alexander Bondarenko and Maik Fr{\"o}be and Meriem Beloucif and Lukas Gienapp and Yamen Ajjour and Alexander Panchenko and Chris Biemann and Benno Stein and Henning Wachsmuth and Martin Potthast and Matthias Hagen},
  booktitle = {Experimental IR Meets Multilinguality, Multimodality, and Interaction. 11th International Conference of the CLEF Association (CLEF 2020)},
  doi       = {10.1007/978-3-030-58219-7\_26},
  editor    = {Avi Arampatzis and Evangelos Kanoulas and Theodora Tsikrika and Stefanos Vrochidis and Hideo Joho and Christina Lioma and Carsten Eickhoff and Aur{\'e}lie N{\'e}v{\'e}ol and Linda Cappellato and Nicola Ferro},
  month     = sep,
  pages     = {384-395},
  publisher = {Springer},
  series    = {Lecture Notes in Computer Science},
  site      = {Thessaloniki, Greece},
  title     = {{Overview of Touch{\'e} 2020: Argument Retrieval}},
  url       = {https://link.springer.com/chapter/10.1007/978-3-030-58219-7_26},
  volume    = 12260,
  year      = 2020
}
@inproceedings{Wachsmuth2017Quality,
  author    = {Henning Wachsmuth and Nona Naderi and Yufang Hou and Yonatan Bilu and Vinodkumar Prabhakaran and Tim Alberdingk Thijm and Graeme Hirst and Benno Stein},
  booktitle = {15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017)},
  editor    = {Phil Blunsom and Alexander Koller and Mirella Lapata},
  month     = apr,
  pages     = {176-187},
  site      = {Valencia, Spain},
  title     = {{Computational Argumentation Quality Assessment in Natural Language}},
  url       = {http://aclweb.org/anthology/E17-1017},
  year      = 2017
}
Metadata:
{ "docs": { "count": 387740, "fields": { "doc_id": { "max_len": 19, "common_prefix": "S" } } }, "queries": { "count": 49 }, "qrels": { "count": 2298, "fields": { "relevance": { "counts_by_value": { "0": 615, "1": 296, "-2": 751, "2": 636 } } } } }
Version of argsme/2020-04-01/touche-2020-task-1 that uses uncorrected relevance judgements derived from crowdworkers. This dataset's relevance judgements should not be used without preprocessing.
Inherits queries from argsme/2020-04-01/touche-2020-task-1
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/touche-2020-task-1/uncorrected")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>
You can find more details about the Python API here.
ir_datasets export argsme/2020-04-01/touche-2020-task-1/uncorrected queries
[query_id] [title] [description] [narrative]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/touche-2020-task-1/uncorrected')
index_ref = pt.IndexRef.of('./indices/argsme_2020-04-01') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('title'))
You can find more details about PyTerrier retrieval here.
Inherits docs from argsme/2020-04-01
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/touche-2020-task-1/uncorrected")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>
You can find more details about the Python API here.
ir_datasets export argsme/2020-04-01/touche-2020-task-1/uncorrected docs
[doc_id] [conclusion] [premises] [premises_texts] [aspects] [aspects_names] [source_id] [source_title] [source_url] [source_previous_argument_id] [source_next_argument_id] [source_domain] [source_text] [source_text_conclusion_start] [source_text_conclusion_end] [source_text_premise_start] [source_text_premise_end] [topic] [acquisition] [date] [author] [author_image_url] [author_organization] [author_role] [mode]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/touche-2020-task-1/uncorrected')
# Index argsme/2020-04-01
indexer = pt.IterDictIndexer('./indices/argsme_2020-04-01')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
-2 | spam, non-argument | 380 | 16.5% |
1 | very low relevance | 144 | 6.3% |
2 | low relevance | 199 | 8.7% |
3 | moderate relevance | 485 | 21.1% |
4 | high relevance | 665 | 28.9% |
5 | very high relevance | 425 | 18.5% |
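As noted above, these uncorrected judgements should not be used without preprocessing. A minimal sketch of one possible cleanup: keep the spam label and collapse the fine-grained 1-5 scale onto the coarse -2/0/1/2 scale used by the corrected qrels. The cut points here are an illustrative assumption, not the official correction procedure:

```python
def coarsen(relevance: int) -> int:
    """Collapse the uncorrected -2/1..5 scale onto a coarse -2/0/1/2 scale.

    Illustrative mapping, not the official Touché correction:
    spam stays spam, 1-2 -> not relevant, 3-4 -> relevant,
    5 -> highly relevant.
    """
    if relevance == -2:
        return -2
    if relevance <= 2:
        return 0
    if relevance <= 4:
        return 1
    return 2
```

Applied while iterating `dataset.qrels_iter()`, this yields qrels directly comparable in shape to the corrected variant.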
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/touche-2020-task-1/uncorrected")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export argsme/2020-04-01/touche-2020-task-1/uncorrected qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/touche-2020-task-1/uncorrected')
index_ref = pt.IndexRef.of('./indices/argsme_2020-04-01') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics('title'),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
Bibtex:
@inproceedings{Bondarenko2020Touche,
  address   = {Berlin Heidelberg New York},
  author    = {Alexander Bondarenko and Maik Fr{\"o}be and Meriem Beloucif and Lukas Gienapp and Yamen Ajjour and Alexander Panchenko and Chris Biemann and Benno Stein and Henning Wachsmuth and Martin Potthast and Matthias Hagen},
  booktitle = {Experimental IR Meets Multilinguality, Multimodality, and Interaction. 11th International Conference of the CLEF Association (CLEF 2020)},
  doi       = {10.1007/978-3-030-58219-7\_26},
  editor    = {Avi Arampatzis and Evangelos Kanoulas and Theodora Tsikrika and Stefanos Vrochidis and Hideo Joho and Christina Lioma and Carsten Eickhoff and Aur{\'e}lie N{\'e}v{\'e}ol and Linda Cappellato and Nicola Ferro},
  month     = sep,
  pages     = {384-395},
  publisher = {Springer},
  series    = {Lecture Notes in Computer Science},
  site      = {Thessaloniki, Greece},
  title     = {{Overview of Touch{\'e} 2020: Argument Retrieval}},
  url       = {https://link.springer.com/chapter/10.1007/978-3-030-58219-7_26},
  volume    = 12260,
  year      = 2020
}
@inproceedings{Wachsmuth2017Quality,
  author    = {Henning Wachsmuth and Nona Naderi and Yufang Hou and Yonatan Bilu and Vinodkumar Prabhakaran and Tim Alberdingk Thijm and Graeme Hirst and Benno Stein},
  booktitle = {15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017)},
  editor    = {Phil Blunsom and Alexander Koller and Mirella Lapata},
  month     = apr,
  pages     = {176-187},
  site      = {Valencia, Spain},
  title     = {{Computational Argumentation Quality Assessment in Natural Language}},
  url       = {http://aclweb.org/anthology/E17-1017},
  year      = 2017
}
Metadata:
{ "docs": { "count": 387740, "fields": { "doc_id": { "max_len": 19, "common_prefix": "S" } } }, "queries": { "count": 49 }, "qrels": { "count": 2298, "fields": { "relevance": { "counts_by_value": { "4": 665, "3": 485, "-2": 380, "5": 425, "2": 199, "1": 144 } } } } }
Decision-making processes, whether at the societal or the personal level, often reach a point where one side challenges the other with a why-question, a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad-hoc argument retrieval is now within reach. Touché 2021 is the second lab on argument retrieval at CLEF 2021 and features two tasks.
Given a question on a controversial topic, retrieve relevant arguments from a focused crawl of online debate portals (argsme/2020-04-01).
Documents are judged based on their general topical relevance and on rhetorical quality, i.e., the "well-writtenness" of the document: (1) whether the text has a good style of speech (formal language is preferred over informal), (2) whether the text has a proper sentence structure and is easy to read, and (3) whether it is free of profanity, typos, and other detrimental style choices.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/touche-2021-task-1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title>
You can find more details about the Python API here.
ir_datasets export argsme/2020-04-01/touche-2021-task-1 queries
[query_id] [title]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/touche-2021-task-1')
index_ref = pt.IndexRef.of('./indices/argsme_2020-04-01') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from argsme/2020-04-01
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/touche-2021-task-1")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>
You can find more details about the Python API here.
ir_datasets export argsme/2020-04-01/touche-2021-task-1 docs
[doc_id] [conclusion] [premises] [premises_texts] [aspects] [aspects_names] [source_id] [source_title] [source_url] [source_previous_argument_id] [source_next_argument_id] [source_domain] [source_text] [source_text_conclusion_start] [source_text_conclusion_end] [source_text_premise_start] [source_text_premise_end] [topic] [acquisition] [date] [author] [author_image_url] [author_organization] [author_role] [mode]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/touche-2021-task-1')
# Index argsme/2020-04-01
indexer = pt.IterDictIndexer('./indices/argsme_2020-04-01')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
-2 | spam | 351 | 9.5% |
0 | not relevant | 1542 | 41.6% |
1 | relevant | 736 | 19.8% |
2 | highly relevant | 1082 | 29.2% |
Examples:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/touche-2021-task-1")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, quality, iteration>
You can find more details about the Python API here.
ir_datasets export argsme/2020-04-01/touche-2021-task-1 qrels --format tsv
[query_id] [doc_id] [relevance] [quality] [iteration]
...
You can find more details about the CLI here.
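Unlike the 2020 qrels, the 2021 tuples carry a separate quality label alongside relevance (see the export columns above). When a single graded label is needed, the two can be combined; the min() combination below is an illustrative choice, not the official Touché aggregation:

```python
def combined_label(relevance: int, quality: int) -> int:
    """Combine a relevance and a quality label into one graded label.

    Spam (relevance -2) stays spam; otherwise take the weaker of
    the two judgements. Illustrative only, not the official scheme.
    """
    if relevance < 0:
        return relevance
    return min(relevance, quality)

# Example: a highly relevant but poorly written argument is demoted.
label = combined_label(2, 1)
```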
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/touche-2021-task-1')
index_ref = pt.IndexRef.of('./indices/argsme_2020-04-01') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics(),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
Bibtex:
@inproceedings{Bondarenko2021Touche,
  address   = {Berlin Heidelberg New York},
  author    = {Alexander Bondarenko and Lukas Gienapp and Maik Fr{\"o}be and Meriem Beloucif and Yamen Ajjour and Alexander Panchenko and Chris Biemann and Benno Stein and Henning Wachsmuth and Martin Potthast and Matthias Hagen},
  booktitle = {Experimental IR Meets Multilinguality, Multimodality, and Interaction. 12th International Conference of the CLEF Association (CLEF 2021)},
  doi       = {10.1007/978-3-030-85251-1\_28},
  editor    = {{K. Sel{\c{c}}uk} Candan and Bogdan Ionescu and Lorraine Goeuriot and Henning M{\"u}ller and Alexis Joly and Maria Maistro and Florina Piroi and Guglielmo Faggioli and Nicola Ferro},
  month     = sep,
  pages     = {450-467},
  publisher = {Springer},
  series    = {Lecture Notes in Computer Science},
  site      = {Bucharest, Romania},
  title     = {{Overview of Touch{\'e} 2021: Argument Retrieval}},
  url       = {https://link.springer.com/chapter/10.1007/978-3-030-85251-1_28},
  volume    = 12880,
  year      = 2021
}
Metadata:
{ "docs": { "count": 387740, "fields": { "doc_id": { "max_len": 19, "common_prefix": "S" } } }, "queries": { "count": 50 }, "qrels": { "count": 3711, "fields": { "relevance": { "counts_by_value": { "2": 1082, "0": 1542, "1": 736, "-2": 351 } } } } }