GitHub: datasets/argsme.py

ir_datasets: args.me

Index
  1. argsme
  2. argsme/1.0
  3. argsme/1.0-cleaned
  4. argsme/1.0/touche-2020-task-1/uncorrected
  5. argsme/2020-04-01
  6. argsme/2020-04-01/debateorg
  7. argsme/2020-04-01/debatepedia
  8. argsme/2020-04-01/debatewise
  9. argsme/2020-04-01/idebate
  10. argsme/2020-04-01/parliamentary
  11. argsme/2020-04-01/touche-2020-task-1
  12. argsme/2020-04-01/touche-2020-task-1/uncorrected
  13. argsme/2020-04-01/touche-2021-task-1

"argsme"

The args.me corpus is one of the largest argument resources available and contains arguments crawled from debate platforms and parliament discussions.

Citation

ir_datasets.bib:

\cite{Wachsmuth2017Argument,Ajjour2019Acquisition}

Bibtex:

@inproceedings{Wachsmuth2017Argument,
  author = {Henning Wachsmuth and Martin Potthast and Khalid Al-Khatib and Yamen Ajjour and Jana Puschmann and Jiani Qu and Jonas Dorsch and Viorel Morari and Janek Bevendorff and Benno Stein},
  booktitle = {4th Workshop on Argument Mining (ArgMining 2017) at EMNLP},
  editor = {Kevin Ashley and Claire Cardie and Nancy Green and Iryna Gurevych and Ivan Habernal and Diane Litman and Georgios Petasis and Chris Reed and Noam Slonim and Vern Walker},
  month = sep,
  pages = {49-59},
  publisher = {Association for Computational Linguistics},
  site = {Copenhagen, Denmark},
  title = {{Building an Argument Search Engine for the Web}},
  url = {https://www.aclweb.org/anthology/W17-5106},
  year = 2017
}

@inproceedings{Ajjour2019Acquisition,
  address = {Berlin Heidelberg New York},
  author = {Yamen Ajjour and Henning Wachsmuth and Johannes Kiesel and Martin Potthast and Matthias Hagen and Benno Stein},
  booktitle = {42nd German Conference on Artificial Intelligence (KI 2019)},
  doi = {10.1007/978-3-030-30179-8\_4},
  editor = {Christoph Benzm{\"u}ller and Heiner Stuckenschmidt},
  month = sep,
  pages = {48-59},
  publisher = {Springer},
  site = {Kassel, Germany},
  title = {{Data Acquisition for Argument Search: The args.me corpus}},
  year = 2019
}

"argsme/1.0"

Corpus version 1.0 with 387 606 arguments crawled from Debatewise, IDebate.org, Debatepedia, and Debate.org. It was released on Zenodo on July 9, 2019.

This collection is licensed under the Creative Commons Attribution 4.0 International license. Individual rights to the content still apply.

docs
388K docs

Language: en

Document type:
ArgsMeDoc: (namedtuple)
  1. doc_id: str
  2. conclusion: str
  3. premises: List[
    ArgsMePremise: (namedtuple)
    1. text: str
    2. stance: ArgsMeStance[PRO, CON]
    3. annotations: List[
      ArgsMePremiseAnnotation: (namedtuple)
      1. start: int
      2. end: int
      3. source: str
      ]
    ]
  4. premises_texts: str
  5. aspects: List[
    ArgsMeAspect: (namedtuple)
    1. name: str
    2. weight: float
    3. normalized_weight: float
    4. rank: int
    ]
  6. aspects_names: str
  7. source_id: str
  8. source_title: str
  9. source_url: Optional[str]
  10. source_previous_argument_id: Optional[str]
  11. source_next_argument_id: Optional[str]
  12. source_domain: Optional[ArgsMeSourceDomain[debateorg, debatepedia, debatewise, idebate, canadian_parliament]]
  13. source_text: Optional[str]
  14. source_text_conclusion_start: Optional[int]
  15. source_text_conclusion_end: Optional[int]
  16. source_text_premise_start: Optional[int]
  17. source_text_premise_end: Optional[int]
  18. topic: str
  19. acquisition: datetime
  20. date: Optional[datetime]
  21. author: Optional[str]
  22. author_image_url: Optional[str]
  23. author_organization: Optional[str]
  24. author_role: Optional[str]
  25. mode: Optional[ArgsMeMode[person, discussion]]
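
The nested premises field listed above can be illustrated with a small sketch. It is self-contained and does not download the corpus: `Stance`, `Premise`, and `Doc` are hypothetical stand-ins that only mirror a few of the listed fields, and the example texts are invented. Real documents would come from `docs_iter()`.

```python
from collections import namedtuple
from enum import Enum


# Hypothetical stand-ins mirroring part of the ArgsMeDoc layout listed above.
class Stance(Enum):
    PRO = "PRO"
    CON = "CON"


Premise = namedtuple("Premise", ["text", "stance", "annotations"])
Doc = namedtuple("Doc", ["doc_id", "conclusion", "premises"])


def premises_by_stance(doc):
    """Group the premise texts of one document by stance name."""
    grouped = {}
    for premise in doc.premises:
        grouped.setdefault(premise.stance.name, []).append(premise.text)
    return grouped


# Invented example document, for illustration only.
example_doc = Doc(
    doc_id="example-1",
    conclusion="School uniforms should be mandatory",
    premises=[
        Premise("Uniforms reduce peer pressure.", Stance.PRO, []),
        Premise("Uniforms limit self-expression.", Stance.CON, []),
    ],
)
grouped = premises_by_stance(example_doc)
```

The same function should work on a real document as long as its `stance` values expose a `name` attribute, which is an assumption about the library's enum type.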

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("argsme/1.0")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>

You can find more details about the Python API here.

CLI
ir_datasets export argsme/1.0 docs
[doc_id]    [conclusion]    [premises]    [premises_texts]    [aspects]    [aspects_names]    [source_id]    [source_title]    [source_url]    [source_previous_argument_id]    [source_next_argument_id]    [source_domain]    [source_text]    [source_text_conclusion_start]    [source_text_conclusion_end]    [source_text_premise_start]    [source_text_premise_end]    [topic]    [acquisition]    [date]    [author]    [author_image_url]    [author_organization]    [author_role]    [mode]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/1.0')
# Index argsme/1.0
indexer = pt.IterDictIndexer('./indices/argsme_1.0')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])

You can find more details about PyTerrier indexing here.

Citation

ir_datasets.bib:

\cite{Wachsmuth2017Argument,Ajjour2019Acquisition}


"argsme/1.0-cleaned"

Corpus version 1.0-cleaned with 382 545 arguments crawled from Debatewise, IDebate.org, Debatepedia, and Debate.org. This version is based on the arguments of version 1.0, cleaned as described in the corresponding publication. It was released on Zenodo on October 27, 2020.

This collection is licensed under the Creative Commons Attribution 4.0 International license. Individual rights to the content still apply.

docs
383K docs

Language: en

Document type:
ArgsMeDoc: (namedtuple)
  1. doc_id: str
  2. conclusion: str
  3. premises: List[
    ArgsMePremise: (namedtuple)
    1. text: str
    2. stance: ArgsMeStance[PRO, CON]
    3. annotations: List[
      ArgsMePremiseAnnotation: (namedtuple)
      1. start: int
      2. end: int
      3. source: str
      ]
    ]
  4. premises_texts: str
  5. aspects: List[
    ArgsMeAspect: (namedtuple)
    1. name: str
    2. weight: float
    3. normalized_weight: float
    4. rank: int
    ]
  6. aspects_names: str
  7. source_id: str
  8. source_title: str
  9. source_url: Optional[str]
  10. source_previous_argument_id: Optional[str]
  11. source_next_argument_id: Optional[str]
  12. source_domain: Optional[ArgsMeSourceDomain[debateorg, debatepedia, debatewise, idebate, canadian_parliament]]
  13. source_text: Optional[str]
  14. source_text_conclusion_start: Optional[int]
  15. source_text_conclusion_end: Optional[int]
  16. source_text_premise_start: Optional[int]
  17. source_text_premise_end: Optional[int]
  18. topic: str
  19. acquisition: datetime
  20. date: Optional[datetime]
  21. author: Optional[str]
  22. author_image_url: Optional[str]
  23. author_organization: Optional[str]
  24. author_role: Optional[str]
  25. mode: Optional[ArgsMeMode[person, discussion]]
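
Several of the source_text fields above are Optional, so code that slices the source text needs to guard against missing values. The following is a minimal sketch, assuming the start and end values are character offsets into `source_text` with an exclusive end (this is not confirmed by the listing above). `SourceSpanDoc` is a hypothetical stand-in carrying only the relevant fields.

```python
from typing import NamedTuple, Optional


class SourceSpanDoc(NamedTuple):
    """Hypothetical stand-in holding only the span-related fields."""
    doc_id: str
    source_text: Optional[str]
    source_text_conclusion_start: Optional[int]
    source_text_conclusion_end: Optional[int]


def conclusion_span(doc: SourceSpanDoc) -> Optional[str]:
    """Return the conclusion as it appears in source_text, or None.

    Assumption: start/end are character offsets and end is exclusive.
    """
    start = doc.source_text_conclusion_start
    end = doc.source_text_conclusion_end
    if doc.source_text is None or start is None or end is None:
        return None
    return doc.source_text[start:end]


# Invented examples: one document with span information, one without.
with_span = SourceSpanDoc("d1", "Ban plastic bags. They harm wildlife.", 0, 17)
without_span = SourceSpanDoc("d2", None, None, None)
```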

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("argsme/1.0-cleaned")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>

You can find more details about the Python API here.

CLI
ir_datasets export argsme/1.0-cleaned docs
[doc_id]    [conclusion]    [premises]    [premises_texts]    [aspects]    [aspects_names]    [source_id]    [source_title]    [source_url]    [source_previous_argument_id]    [source_next_argument_id]    [source_domain]    [source_text]    [source_text_conclusion_start]    [source_text_conclusion_end]    [source_text_premise_start]    [source_text_premise_end]    [topic]    [acquisition]    [date]    [author]    [author_image_url]    [author_organization]    [author_role]    [mode]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/1.0-cleaned')
# Index argsme/1.0-cleaned
indexer = pt.IterDictIndexer('./indices/argsme_1.0-cleaned')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])

You can find more details about PyTerrier indexing here.

Citation

ir_datasets.bib:

\cite{Wachsmuth2017Argument,Ajjour2019Acquisition}


"argsme/1.0/touche-2020-task-1/uncorrected"

Version of argsme/2020-04-01/touche-2020-task-1 that uses the argsme/1.0 corpus with uncorrected relevance judgements derived from crowdworkers. This dataset's relevance judgements should not be used without preprocessing.

queries
49 queries

Language: en

Query type:
ToucheQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. description: str
  4. narrative: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("argsme/1.0/touche-2020-task-1/uncorrected")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>

You can find more details about the Python API here.

CLI
ir_datasets export argsme/1.0/touche-2020-task-1/uncorrected queries
[query_id]    [title]    [description]    [narrative]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/1.0/touche-2020-task-1/uncorrected')
index_ref = pt.IndexRef.of('./indices/argsme_1.0') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('title'))

You can find more details about PyTerrier retrieval here.

docs
388K docs

Inherits docs from argsme/1.0

Language: en

Document type:
ArgsMeDoc: (namedtuple)
  1. doc_id: str
  2. conclusion: str
  3. premises: List[
    ArgsMePremise: (namedtuple)
    1. text: str
    2. stance: ArgsMeStance[PRO, CON]
    3. annotations: List[
      ArgsMePremiseAnnotation: (namedtuple)
      1. start: int
      2. end: int
      3. source: str
      ]
    ]
  4. premises_texts: str
  5. aspects: List[
    ArgsMeAspect: (namedtuple)
    1. name: str
    2. weight: float
    3. normalized_weight: float
    4. rank: int
    ]
  6. aspects_names: str
  7. source_id: str
  8. source_title: str
  9. source_url: Optional[str]
  10. source_previous_argument_id: Optional[str]
  11. source_next_argument_id: Optional[str]
  12. source_domain: Optional[ArgsMeSourceDomain[debateorg, debatepedia, debatewise, idebate, canadian_parliament]]
  13. source_text: Optional[str]
  14. source_text_conclusion_start: Optional[int]
  15. source_text_conclusion_end: Optional[int]
  16. source_text_premise_start: Optional[int]
  17. source_text_premise_end: Optional[int]
  18. topic: str
  19. acquisition: datetime
  20. date: Optional[datetime]
  21. author: Optional[str]
  22. author_image_url: Optional[str]
  23. author_organization: Optional[str]
  24. author_role: Optional[str]
  25. mode: Optional[ArgsMeMode[person, discussion]]
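
When only a few text fields are needed (for example, for indexing), a document can be flattened into a plain dict. This is a sketch under stated assumptions: `Doc` is a shortened, hypothetical stand-in for the 25-field type above, the `docno` key follows the naming PyTerrier commonly uses for document identifiers, and replacing `None` with an empty string is just one possible choice.

```python
from collections import namedtuple

# Hypothetical, shortened stand-in for ArgsMeDoc (only a few of its fields).
Doc = namedtuple("Doc", ["doc_id", "conclusion", "premises_texts", "topic", "author"])

TEXT_FIELDS = ("conclusion", "premises_texts", "topic")


def to_index_record(doc, fields=TEXT_FIELDS):
    """Build a flat dict with a 'docno' key plus the chosen text fields."""
    values = doc._asdict()
    record = {"docno": doc.doc_id}
    for field in fields:
        # Optional fields may be None; store an empty string instead.
        record[field] = values[field] or ""
    return record


# Invented example document, for illustration only.
record = to_index_record(
    Doc("d1", "Tax sugar", "Sugar causes harm.", "sugar tax", None)
)
```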

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("argsme/1.0/touche-2020-task-1/uncorrected")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>

You can find more details about the Python API here.

CLI
ir_datasets export argsme/1.0/touche-2020-task-1/uncorrected docs
[doc_id]    [conclusion]    [premises]    [premises_texts]    [aspects]    [aspects_names]    [source_id]    [source_title]    [source_url]    [source_previous_argument_id]    [source_next_argument_id]    [source_domain]    [source_text]    [source_text_conclusion_start]    [source_text_conclusion_end]    [source_text_premise_start]    [source_text_premise_end]    [topic]    [acquisition]    [date]    [author]    [author_image_url]    [author_organization]    [author_role]    [mode]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/1.0/touche-2020-task-1/uncorrected')
# Index argsme/1.0
indexer = pt.IterDictIndexer('./indices/argsme_1.0')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])

You can find more details about PyTerrier indexing here.

qrels
3.0K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str
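
The four fields above map directly onto the whitespace-separated TREC qrels file layout, which orders its columns as query ID, iteration, document ID, relevance. A minimal sketch of that conversion follows; the `TrecQrel` namedtuple here is a local stand-in with the same field names, and the example values are invented.

```python
from collections import namedtuple

# Local stand-in with the same field names as the type listed above.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def to_trec_line(qrel):
    """Format one judgment in the usual TREC qrels column order."""
    return f"{qrel.query_id} {qrel.iteration} {qrel.doc_id} {qrel.relevance}"


# Invented judgment, for illustration only.
line = to_trec_line(TrecQrel("1", "example-doc", 4, "0"))
```

Note that the namedtuple's field order (doc_id before iteration) differs from the column order in the file, which is why the fields are addressed by name rather than unpacked positionally.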

Relevance levels

Rel.    Definition             Count    %
-2      spam, non-argument     551      18.6%
1       very low relevance     186      6.3%
2       low relevance          195      6.6%
3       moderate relevance     628      21.2%
4       high relevance         1.0K     33.9%
5       very high relevance    398      13.4%
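
Because the scale includes a negative level (-2 for spam and non-arguments), gain-based measures may behave unexpectedly if the labels are used as-is. The sketch below shows one possible preprocessing step, clamping negative labels to 0. This is an illustrative assumption, not the correction procedure used by the task organizers; `TrecQrel` is a local stand-in and the sample judgments are invented.

```python
from collections import Counter, namedtuple

# Local stand-in with the same field names as the qrel type of this dataset.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def clamp_relevance(qrel, floor=0):
    """Return a copy of the judgment whose relevance is at least `floor`."""
    return qrel._replace(relevance=max(qrel.relevance, floor))


# Invented judgments, for illustration only.
sample = [
    TrecQrel("1", "a", -2, "0"),
    TrecQrel("1", "b", 3, "0"),
    TrecQrel("2", "c", 5, "0"),
]
cleaned = [clamp_relevance(q) for q in sample]
level_counts = Counter(q.relevance for q in cleaned)
```

Counting the labels afterwards, as in `level_counts`, is a quick way to check the resulting distribution against the table above.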

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("argsme/1.0/touche-2020-task-1/uncorrected")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export argsme/1.0/touche-2020-task-1/uncorrected qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:argsme/1.0/touche-2020-task-1/uncorrected')
index_ref = pt.IndexRef.of('./indices/argsme_1.0') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('title'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Bondarenko2020Touche,Wachsmuth2017Quality}

Bibtex:

@inproceedings{Bondarenko2020Touche,
  address = {Berlin Heidelberg New York},
  author = {Alexander Bondarenko and Maik Fr{\"o}be and Meriem Beloucif and Lukas Gienapp and Yamen Ajjour and Alexander Panchenko and Chris Biemann and Benno Stein and Henning Wachsmuth and Martin Potthast and Matthias Hagen},
  booktitle = {Experimental IR Meets Multilinguality, Multimodality, and Interaction. 11th International Conference of the CLEF Association (CLEF 2020)},
  doi = {10.1007/978-3-030-58219-7\_26},
  editor = {Avi Arampatzis and Evangelos Kanoulas and Theodora Tsikrika and Stefanos Vrochidis and Hideo Joho and Christina Lioma and Carsten Eickhoff and Aur{\'e}lie N{\'e}v{\'e}ol and Linda Cappellato and Nicola Ferro},
  month = sep,
  pages = {384-395},
  publisher = {Springer},
  series = {Lecture Notes in Computer Science},
  site = {Thessaloniki, Greece},
  title = {{Overview of Touch{\'e} 2020: Argument Retrieval}},
  url = {https://link.springer.com/chapter/10.1007/978-3-030-58219-7_26},
  volume = 12260,
  year = 2020,
}

@inproceedings{Wachsmuth2017Quality,
  author = {Henning Wachsmuth and Nona Naderi and Yufang Hou and Yonatan Bilu and Vinodkumar Prabhakaran and Tim Alberdingk Thijm and Graeme Hirst and Benno Stein},
  booktitle = {15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017)},
  editor = {Phil Blunsom and Alexander Koller and Mirella Lapata},
  month = apr,
  pages = {176-187},
  site = {Valencia, Spain},
  title = {{Computational Argumentation Quality Assessment in Natural Language}},
  url = {http://aclweb.org/anthology/E17-1017},
  year = 2017
}

"argsme/2020-04-01"

Corpus version 2020-04-01 with 387 740 arguments crawled from Debatewise, IDebate.org, Debatepedia, Debate.org, and from Canadian Parliament discussions. It was released on April 1, 2020 on Zenodo.

This collection is licensed under the Creative Commons Attribution 4.0 International license. Individual rights to the content still apply.

docs
388K docs

Language: en

Document type:
ArgsMeDoc: (namedtuple)
  1. doc_id: str
  2. conclusion: str
  3. premises: List[
    ArgsMePremise: (namedtuple)
    1. text: str
    2. stance: ArgsMeStance[PRO, CON]
    3. annotations: List[
      ArgsMePremiseAnnotation: (namedtuple)
      1. start: int
      2. end: int
      3. source: str
      ]
    ]
  4. premises_texts: str
  5. aspects: List[
    ArgsMeAspect: (namedtuple)
    1. name: str
    2. weight: float
    3. normalized_weight: float
    4. rank: int
    ]
  6. aspects_names: str
  7. source_id: str
  8. source_title: str
  9. source_url: Optional[str]
  10. source_previous_argument_id: Optional[str]
  11. source_next_argument_id: Optional[str]
  12. source_domain: Optional[ArgsMeSourceDomain[debateorg, debatepedia, debatewise, idebate, canadian_parliament]]
  13. source_text: Optional[str]
  14. source_text_conclusion_start: Optional[int]
  15. source_text_conclusion_end: Optional[int]
  16. source_text_premise_start: Optional[int]
  17. source_text_premise_end: Optional[int]
  18. topic: str
  19. acquisition: datetime
  20. date: Optional[datetime]
  21. author: Optional[str]
  22. author_image_url: Optional[str]
  23. author_organization: Optional[str]
  24. author_role: Optional[str]
  25. mode: Optional[ArgsMeMode[person, discussion]]
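
Since this corpus version mixes several sources, the source_domain field can be used to tally documents per source. The sketch below is self-contained: `SourceDomain` and `Doc` are hypothetical stand-ins (the member names are taken from the listing above), and documents without a domain are counted under an `"unknown"` key chosen for this example.

```python
from collections import Counter, namedtuple
from enum import Enum


# Hypothetical stand-in; member names follow the source_domain listing above.
class SourceDomain(Enum):
    debateorg = "debateorg"
    debatepedia = "debatepedia"
    debatewise = "debatewise"
    idebate = "idebate"
    canadian_parliament = "canadian_parliament"


Doc = namedtuple("Doc", ["doc_id", "source_domain"])


def count_by_domain(docs):
    """Count documents per source domain; None is counted as 'unknown'."""
    counts = Counter()
    for doc in docs:
        key = doc.source_domain.name if doc.source_domain is not None else "unknown"
        counts[key] += 1
    return counts


# Invented documents, for illustration only.
docs = [
    Doc("a", SourceDomain.debateorg),
    Doc("b", SourceDomain.debateorg),
    Doc("c", SourceDomain.canadian_parliament),
    Doc("d", None),
]
domain_counts = count_by_domain(docs)
```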

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>

You can find more details about the Python API here.

CLI
ir_datasets export argsme/2020-04-01 docs
[doc_id]    [conclusion]    [premises]    [premises_texts]    [aspects]    [aspects_names]    [source_id]    [source_title]    [source_url]    [source_previous_argument_id]    [source_next_argument_id]    [source_domain]    [source_text]    [source_text_conclusion_start]    [source_text_conclusion_end]    [source_text_premise_start]    [source_text_premise_end]    [topic]    [acquisition]    [date]    [author]    [author_image_url]    [author_organization]    [author_role]    [mode]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01')
# Index argsme/2020-04-01
indexer = pt.IterDictIndexer('./indices/argsme_2020-04-01')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])

You can find more details about PyTerrier indexing here.

Citation

ir_datasets.bib:

\cite{Wachsmuth2017Argument,Ajjour2019Acquisition}


"argsme/2020-04-01/debateorg"

Subset of args.me version 2020-04-01 containing the 338 620 arguments that were crawled from the debate portal Debate.org.

docs
339K docs

Language: en

Document type:
ArgsMeDoc: (namedtuple)
  1. doc_id: str
  2. conclusion: str
  3. premises: List[
    ArgsMePremise: (namedtuple)
    1. text: str
    2. stance: ArgsMeStance[PRO, CON]
    3. annotations: List[
      ArgsMePremiseAnnotation: (namedtuple)
      1. start: int
      2. end: int
      3. source: str
      ]
    ]
  4. premises_texts: str
  5. aspects: List[
    ArgsMeAspect: (namedtuple)
    1. name: str
    2. weight: float
    3. normalized_weight: float
    4. rank: int
    ]
  6. aspects_names: str
  7. source_id: str
  8. source_title: str
  9. source_url: Optional[str]
  10. source_previous_argument_id: Optional[str]
  11. source_next_argument_id: Optional[str]
  12. source_domain: Optional[ArgsMeSourceDomain[debateorg, debatepedia, debatewise, idebate, canadian_parliament]]
  13. source_text: Optional[str]
  14. source_text_conclusion_start: Optional[int]
  15. source_text_conclusion_end: Optional[int]
  16. source_text_premise_start: Optional[int]
  17. source_text_premise_end: Optional[int]
  18. topic: str
  19. acquisition: datetime
  20. date: Optional[datetime]
  21. author: Optional[str]
  22. author_image_url: Optional[str]
  23. author_organization: Optional[str]
  24. author_role: Optional[str]
  25. mode: Optional[ArgsMeMode[person, discussion]]
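
The date field is Optional while acquisition is always present, so chronological sorting needs a fallback. The sketch below falls back to the acquisition timestamp when no date is available. Treating acquisition as a usable substitute is an assumption made for this example, and `DatedDoc` is a hypothetical stand-in with invented values.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class DatedDoc(NamedTuple):
    """Hypothetical stand-in holding only the time-related fields."""
    doc_id: str
    acquisition: datetime
    date: Optional[datetime]


def effective_date(doc: DatedDoc) -> datetime:
    """Prefer the document's own date; fall back to the acquisition time."""
    return doc.date if doc.date is not None else doc.acquisition


# Invented documents, for illustration only.
docs = [
    DatedDoc("a", datetime(2019, 4, 18), datetime(2015, 6, 1)),
    DatedDoc("b", datetime(2019, 4, 18), None),
    DatedDoc("c", datetime(2019, 4, 18), datetime(2012, 1, 15)),
]
ordered_ids = [d.doc_id for d in sorted(docs, key=effective_date)]
```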

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/debateorg")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>

You can find more details about the Python API here.

CLI
ir_datasets export argsme/2020-04-01/debateorg docs
[doc_id]    [conclusion]    [premises]    [premises_texts]    [aspects]    [aspects_names]    [source_id]    [source_title]    [source_url]    [source_previous_argument_id]    [source_next_argument_id]    [source_domain]    [source_text]    [source_text_conclusion_start]    [source_text_conclusion_end]    [source_text_premise_start]    [source_text_premise_end]    [topic]    [acquisition]    [date]    [author]    [author_image_url]    [author_organization]    [author_role]    [mode]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/debateorg')
# Index argsme/2020-04-01/debateorg
indexer = pt.IterDictIndexer('./indices/argsme_2020-04-01_debateorg')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])

You can find more details about PyTerrier indexing here.

Citation

ir_datasets.bib:

\cite{Wachsmuth2017Argument,Ajjour2019Acquisition}


"argsme/2020-04-01/debatepedia"

Subset of args.me version 2020-04-01 containing the 21 197 arguments that were crawled from the debate portal Debatepedia.

docs
21K docs

Language: en

Document type:
ArgsMeDoc: (namedtuple)
  1. doc_id: str
  2. conclusion: str
  3. premises: List[
    ArgsMePremise: (namedtuple)
    1. text: str
    2. stance: ArgsMeStance[PRO, CON]
    3. annotations: List[
      ArgsMePremiseAnnotation: (namedtuple)
      1. start: int
      2. end: int
      3. source: str
      ]
    ]
  4. premises_texts: str
  5. aspects: List[
    ArgsMeAspect: (namedtuple)
    1. name: str
    2. weight: float
    3. normalized_weight: float
    4. rank: int
    ]
  6. aspects_names: str
  7. source_id: str
  8. source_title: str
  9. source_url: Optional[str]
  10. source_previous_argument_id: Optional[str]
  11. source_next_argument_id: Optional[str]
  12. source_domain: Optional[ArgsMeSourceDomain[debateorg, debatepedia, debatewise, idebate, canadian_parliament]]
  13. source_text: Optional[str]
  14. source_text_conclusion_start: Optional[int]
  15. source_text_conclusion_end: Optional[int]
  16. source_text_premise_start: Optional[int]
  17. source_text_premise_end: Optional[int]
  18. topic: str
  19. acquisition: datetime
  20. date: Optional[datetime]
  21. author: Optional[str]
  22. author_image_url: Optional[str]
  23. author_organization: Optional[str]
  24. author_role: Optional[str]
  25. mode: Optional[ArgsMeMode[person, discussion]]

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/debatepedia")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>

You can find more details about the Python API here.

CLI
ir_datasets export argsme/2020-04-01/debatepedia docs
[doc_id]    [conclusion]    [premises]    [premises_texts]    [aspects]    [aspects_names]    [source_id]    [source_title]    [source_url]    [source_previous_argument_id]    [source_next_argument_id]    [source_domain]    [source_text]    [source_text_conclusion_start]    [source_text_conclusion_end]    [source_text_premise_start]    [source_text_premise_end]    [topic]    [acquisition]    [date]    [author]    [author_image_url]    [author_organization]    [author_role]    [mode]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/debatepedia')
# Index argsme/2020-04-01/debatepedia
indexer = pt.IterDictIndexer('./indices/argsme_2020-04-01_debatepedia')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])

You can find more details about PyTerrier indexing here.

Citation

ir_datasets.bib:

\cite{Wachsmuth2017Argument,Ajjour2019Acquisition}


"argsme/2020-04-01/debatewise"

Subset of args.me version 2020-04-01 containing the 14 353 arguments that were crawled from the debate portal Debatewise.

docs
14K docs

Language: en

Document type:
ArgsMeDoc: (namedtuple)
  1. doc_id: str
  2. conclusion: str
  3. premises: List[
    ArgsMePremise: (namedtuple)
    1. text: str
    2. stance: ArgsMeStance[PRO, CON]
    3. annotations: List[
      ArgsMePremiseAnnotation: (namedtuple)
      1. start: int
      2. end: int
      3. source: str
      ]
    ]
  4. premises_texts: str
  5. aspects: List[
    ArgsMeAspect: (namedtuple)
    1. name: str
    2. weight: float
    3. normalized_weight: float
    4. rank: int
    ]
  6. aspects_names: str
  7. source_id: str
  8. source_title: str
  9. source_url: Optional[str]
  10. source_previous_argument_id: Optional[str]
  11. source_next_argument_id: Optional[str]
  12. source_domain: Optional[ArgsMeSourceDomain[debateorg, debatepedia, debatewise, idebate, canadian_parliament]]
  13. source_text: Optional[str]
  14. source_text_conclusion_start: Optional[int]
  15. source_text_conclusion_end: Optional[int]
  16. source_text_premise_start: Optional[int]
  17. source_text_premise_end: Optional[int]
  18. topic: str
  19. acquisition: datetime
  20. date: Optional[datetime]
  21. author: Optional[str]
  22. author_image_url: Optional[str]
  23. author_organization: Optional[str]
  24. author_role: Optional[str]
  25. mode: Optional[ArgsMeMode[person, discussion]]

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/debatewise")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>

You can find more details about the Python API here.
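
Because premises is a nested list of ArgsMePremise tuples, a common first step is to separate the PRO and CON premises of an argument. The following is a minimal sketch, assuming that stance values expose a `.name` attribute as an enum would. The `Stance`, `Premise`, and `Doc` types and the example argument are hypothetical stand-ins so that the snippet runs without downloading the corpus.

```python
from collections import namedtuple
from enum import Enum


# Hypothetical stand-ins that mirror the documented field names; with
# ir_datasets installed, real documents come from dataset.docs_iter().
class Stance(Enum):
    PRO = "PRO"
    CON = "CON"


Premise = namedtuple("Premise", ["text", "stance", "annotations"])
Doc = namedtuple("Doc", ["doc_id", "conclusion", "premises"])


def premises_by_stance(doc):
    """Group the premise texts of one argument by stance name."""
    grouped = {}
    for premise in doc.premises:
        grouped.setdefault(premise.stance.name, []).append(premise.text)
    return grouped


# Made-up example argument, not taken from the corpus.
example = Doc(
    doc_id="example-1",
    conclusion="School uniforms should be mandatory",
    premises=[
        Premise("Uniforms reduce peer pressure.", Stance.PRO, []),
        Premise("Uniforms limit self-expression.", Stance.CON, []),
    ],
)
print(premises_by_stance(example))
```

The same function should work on documents yielded by docs_iter() as long as the stance objects behave like enum members.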

CLI
ir_datasets export argsme/2020-04-01/debatewise docs
[doc_id]    [conclusion]    [premises]    [premises_texts]    [aspects]    [aspects_names]    [source_id]    [source_title]    [source_url]    [source_previous_argument_id]    [source_next_argument_id]    [source_domain]    [source_text]    [source_text_conclusion_start]    [source_text_conclusion_end]    [source_text_premise_start]    [source_text_premise_end]    [topic]    [acquisition]    [date]    [author]    [author_image_url]    [author_organization]    [author_role]    [mode]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/debatewise')
# Index argsme/2020-04-01/debatewise
indexer = pt.IterDictIndexer('./indices/argsme_2020-04-01_debatewise')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])

You can find more details about PyTerrier indexing here.

Citation

ir_datasets.bib:

\cite{Wachsmuth2017Argument,Ajjour2019Acquisition}


"argsme/2020-04-01/idebate"

Subset of args.me version 2020-04-01 containing the 13 522 arguments that were crawled from the debate portal IDebate.org.

docs
14K docs

Language: en

Document type:
ArgsMeDoc: (namedtuple)
  1. doc_id: str
  2. conclusion: str
  3. premises: List[
    ArgsMePremise: (namedtuple)
    1. text: str
    2. stance: ArgsMeStance[PRO, CON]
    3. annotations: List[
      ArgsMePremiseAnnotation: (namedtuple)
      1. start: int
      2. end: int
      3. source: str
      ]
    ]
  4. premises_texts: str
  5. aspects: List[
    ArgsMeAspect: (namedtuple)
    1. name: str
    2. weight: float
    3. normalized_weight: float
    4. rank: int
    ]
  6. aspects_names: str
  7. source_id: str
  8. source_title: str
  9. source_url: Optional[str]
  10. source_previous_argument_id: Optional[str]
  11. source_next_argument_id: Optional[str]
  12. source_domain: Optional[ArgsMeSourceDomain[debateorg, debatepedia, debatewise, idebate, canadian_parliament]]
  13. source_text: Optional[str]
  14. source_text_conclusion_start: Optional[int]
  15. source_text_conclusion_end: Optional[int]
  16. source_text_premise_start: Optional[int]
  17. source_text_premise_end: Optional[int]
  18. topic: str
  19. acquisition: datetime
  20. date: Optional[datetime]
  21. author: Optional[str]
  22. author_image_url: Optional[str]
  23. author_organization: Optional[str]
  24. author_role: Optional[str]
  25. mode: Optional[ArgsMeMode[person, discussion]]

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/idebate")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>

You can find more details about the Python API here.
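
Each document carries a list of ArgsMeAspect tuples with a name, two weights, and a rank. The sketch below picks the best-ranked aspect names of a document. It assumes that rank 1 denotes the most important aspect, which this page does not state. The `Aspect` type and the sample values are hypothetical stand-ins.

```python
from collections import namedtuple

# Hypothetical stand-in mirroring the documented ArgsMeAspect fields.
Aspect = namedtuple("Aspect", ["name", "weight", "normalized_weight", "rank"])


def top_aspects(aspects, k=3):
    """Return the names of the k best-ranked aspects (assumes rank 1 is best)."""
    return [aspect.name for aspect in sorted(aspects, key=lambda a: a.rank)[:k]]


# Made-up sample values for illustration only.
aspects = [
    Aspect("safety", 0.2, 0.25, 2),
    Aspect("cost", 0.5, 0.625, 1),
    Aspect("freedom", 0.1, 0.125, 3),
]
print(top_aspects(aspects, k=2))
```

If rank turns out to be ordered the other way round, passing `reverse=True` to `sorted` would be the only change needed.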

CLI
ir_datasets export argsme/2020-04-01/idebate docs
[doc_id]    [conclusion]    [premises]    [premises_texts]    [aspects]    [aspects_names]    [source_id]    [source_title]    [source_url]    [source_previous_argument_id]    [source_next_argument_id]    [source_domain]    [source_text]    [source_text_conclusion_start]    [source_text_conclusion_end]    [source_text_premise_start]    [source_text_premise_end]    [topic]    [acquisition]    [date]    [author]    [author_image_url]    [author_organization]    [author_role]    [mode]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/idebate')
# Index argsme/2020-04-01/idebate
indexer = pt.IterDictIndexer('./indices/argsme_2020-04-01_idebate')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])

You can find more details about PyTerrier indexing here.

Citation

ir_datasets.bib:

\cite{Wachsmuth2017Argument,Ajjour2019Acquisition}


"argsme/2020-04-01/parliamentary"

Subset of args.me version 2020-04-01 containing the 48 arguments that were crawled from Canadian Parliament discussions.

docs
48 docs

Language: en

Document type:
ArgsMeDoc: (namedtuple)
  1. doc_id: str
  2. conclusion: str
  3. premises: List[
    ArgsMePremise: (namedtuple)
    1. text: str
    2. stance: ArgsMeStance[PRO, CON]
    3. annotations: List[
      ArgsMePremiseAnnotation: (namedtuple)
      1. start: int
      2. end: int
      3. source: str
      ]
    ]
  4. premises_texts: str
  5. aspects: List[
    ArgsMeAspect: (namedtuple)
    1. name: str
    2. weight: float
    3. normalized_weight: float
    4. rank: int
    ]
  6. aspects_names: str
  7. source_id: str
  8. source_title: str
  9. source_url: Optional[str]
  10. source_previous_argument_id: Optional[str]
  11. source_next_argument_id: Optional[str]
  12. source_domain: Optional[ArgsMeSourceDomain[debateorg, debatepedia, debatewise, idebate, canadian_parliament]]
  13. source_text: Optional[str]
  14. source_text_conclusion_start: Optional[int]
  15. source_text_conclusion_end: Optional[int]
  16. source_text_premise_start: Optional[int]
  17. source_text_premise_end: Optional[int]
  18. topic: str
  19. acquisition: datetime
  20. date: Optional[datetime]
  21. author: Optional[str]
  22. author_image_url: Optional[str]
  23. author_organization: Optional[str]
  24. author_role: Optional[str]
  25. mode: Optional[ArgsMeMode[person, discussion]]

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/parliamentary")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>

You can find more details about the Python API here.
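
The source_text field and its start and end offsets are all Optional, so code that extracts a span has to guard against missing values. The following is a minimal sketch, assuming the offsets are character positions into source_text with an exclusive end, as in Python slicing. That interpretation is an assumption and should be checked against real documents. The `Doc` type and the sample text are hypothetical.

```python
from collections import namedtuple

# Hypothetical stand-in with only the fields this sketch needs.
Doc = namedtuple(
    "Doc",
    ["source_text", "source_text_conclusion_start", "source_text_conclusion_end"],
)


def conclusion_span(doc):
    """Return the conclusion slice of source_text, or None if data is missing."""
    start = doc.source_text_conclusion_start
    end = doc.source_text_conclusion_end
    if doc.source_text is None or start is None or end is None:
        return None
    # Assumption: character offsets, end exclusive.
    return doc.source_text[start:end]


# Made-up source text; the offsets are computed rather than hard-coded.
text = "Topic: Ban zoos. Animals suffer in captivity."
start = text.index("Ban zoos")
with_offsets = Doc(text, start, start + len("Ban zoos"))
without_offsets = Doc(text, None, None)
print(conclusion_span(with_offsets), conclusion_span(without_offsets))
```

The same pattern would apply to source_text_premise_start and source_text_premise_end.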

CLI
ir_datasets export argsme/2020-04-01/parliamentary docs
[doc_id]    [conclusion]    [premises]    [premises_texts]    [aspects]    [aspects_names]    [source_id]    [source_title]    [source_url]    [source_previous_argument_id]    [source_next_argument_id]    [source_domain]    [source_text]    [source_text_conclusion_start]    [source_text_conclusion_end]    [source_text_premise_start]    [source_text_premise_end]    [topic]    [acquisition]    [date]    [author]    [author_image_url]    [author_organization]    [author_role]    [mode]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/parliamentary')
# Index argsme/2020-04-01/parliamentary
indexer = pt.IterDictIndexer('./indices/argsme_2020-04-01_parliamentary')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])

You can find more details about PyTerrier indexing here.

Citation

ir_datasets.bib:

\cite{Wachsmuth2017Argument,Ajjour2019Acquisition}


"argsme/2020-04-01/touche-2020-task-1"

Decision-making processes, be it at the societal or at the personal level, eventually come to a point where one side challenges the other with a why-question, which is a prompt to justify one's stance. Technologies for argument mining and argumentation processing are maturing at a rapid pace, giving rise for the first time to argument retrieval. Touché 2020 is the first lab on argument retrieval, held at CLEF 2020 and featuring two tasks.

Given a question on a controversial topic, retrieve relevant arguments from a focused crawl of online debate portals (argsme/2020-04-01).

Documents are judged based on their general topical relevance.

queries
49 queries

Language: en

Query type:
ToucheQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. description: str
  4. narrative: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/touche-2020-task-1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>

You can find more details about the Python API here.

CLI
ir_datasets export argsme/2020-04-01/touche-2020-task-1 queries
[query_id]    [title]    [description]    [narrative]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/touche-2020-task-1')
index_ref = pt.IndexRef.of('./indices/argsme_2020-04-01') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('title'))

You can find more details about PyTerrier retrieval here.

docs
388K docs

Inherits docs from argsme/2020-04-01

Language: en

Document type:
ArgsMeDoc: (namedtuple)
  1. doc_id: str
  2. conclusion: str
  3. premises: List[
    ArgsMePremise: (namedtuple)
    1. text: str
    2. stance: ArgsMeStance[PRO, CON]
    3. annotations: List[
      ArgsMePremiseAnnotation: (namedtuple)
      1. start: int
      2. end: int
      3. source: str
      ]
    ]
  4. premises_texts: str
  5. aspects: List[
    ArgsMeAspect: (namedtuple)
    1. name: str
    2. weight: float
    3. normalized_weight: float
    4. rank: int
    ]
  6. aspects_names: str
  7. source_id: str
  8. source_title: str
  9. source_url: Optional[str]
  10. source_previous_argument_id: Optional[str]
  11. source_next_argument_id: Optional[str]
  12. source_domain: Optional[ArgsMeSourceDomain[debateorg, debatepedia, debatewise, idebate, canadian_parliament]]
  13. source_text: Optional[str]
  14. source_text_conclusion_start: Optional[int]
  15. source_text_conclusion_end: Optional[int]
  16. source_text_premise_start: Optional[int]
  17. source_text_premise_end: Optional[int]
  18. topic: str
  19. acquisition: datetime
  20. date: Optional[datetime]
  21. author: Optional[str]
  22. author_image_url: Optional[str]
  23. author_organization: Optional[str]
  24. author_role: Optional[str]
  25. mode: Optional[ArgsMeMode[person, discussion]]

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/touche-2020-task-1")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>

You can find more details about the Python API here.

CLI
ir_datasets export argsme/2020-04-01/touche-2020-task-1 docs
[doc_id]    [conclusion]    [premises]    [premises_texts]    [aspects]    [aspects_names]    [source_id]    [source_title]    [source_url]    [source_previous_argument_id]    [source_next_argument_id]    [source_domain]    [source_text]    [source_text_conclusion_start]    [source_text_conclusion_end]    [source_text_premise_start]    [source_text_premise_end]    [topic]    [acquisition]    [date]    [author]    [author_image_url]    [author_organization]    [author_role]    [mode]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/touche-2020-task-1')
# Index argsme/2020-04-01
indexer = pt.IterDictIndexer('./indices/argsme_2020-04-01')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])

You can find more details about PyTerrier indexing here.

qrels
2.3K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.    Definition         Count    %
-2      spam               751      32.7%
0       not relevant       615      26.8%
1       relevant           296      12.9%
2       highly relevant    636      27.7%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/touche-2020-task-1")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
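
Many evaluation tools expect judgments as a nested mapping from query to document to grade, and they may not handle the negative spam grade listed in the relevance levels. The sketch below builds such a mapping and clamps grades below a threshold. Clamping spam (-2) to 0 is a design choice made for this illustration, not something prescribed here. The `Qrel` type and the sample IDs are hypothetical stand-ins for the tuples yielded by qrels_iter().

```python
from collections import namedtuple

# Hypothetical stand-in mirroring the documented TrecQrel fields.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def qrels_to_dict(qrels, min_relevance=0):
    """Build {query_id: {doc_id: relevance}}, clamping grades below min_relevance."""
    table = {}
    for qrel in qrels:
        grade = max(qrel.relevance, min_relevance)
        table.setdefault(qrel.query_id, {})[qrel.doc_id] = grade
    return table


# Made-up judgments for illustration only.
sample_qrels = [
    Qrel("1", "doc-a", 2, "0"),
    Qrel("1", "doc-b", -2, "0"),
    Qrel("2", "doc-c", 1, "0"),
]
judged = qrels_to_dict(sample_qrels)
print(judged)
```

Passing `min_relevance=-2` keeps the spam grade intact for analyses that need to distinguish spam from merely irrelevant documents.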

CLI
ir_datasets export argsme/2020-04-01/touche-2020-task-1 qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/touche-2020-task-1')
index_ref = pt.IndexRef.of('./indices/argsme_2020-04-01') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('title'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Bondarenko2020Touche,Wachsmuth2017Quality}

Bibtex:

@inproceedings{Bondarenko2020Touche,
  address = {Berlin Heidelberg New York},
  author = {Alexander Bondarenko and Maik Fr{\"o}be and Meriem Beloucif and Lukas Gienapp and Yamen Ajjour and Alexander Panchenko and Chris Biemann and Benno Stein and Henning Wachsmuth and Martin Potthast and Matthias Hagen},
  booktitle = {Experimental IR Meets Multilinguality, Multimodality, and Interaction. 11th International Conference of the CLEF Association (CLEF 2020)},
  doi = {10.1007/978-3-030-58219-7\_26},
  editor = {Avi Arampatzis and Evangelos Kanoulas and Theodora Tsikrika and Stefanos Vrochidis and Hideo Joho and Christina Lioma and Carsten Eickhoff and Aur{\'e}lie N{\'e}v{\'e}ol and Linda Cappellato and Nicola Ferro},
  month = sep,
  pages = {384-395},
  publisher = {Springer},
  series = {Lecture Notes in Computer Science},
  site = {Thessaloniki, Greece},
  title = {{Overview of Touch{\'e} 2020: Argument Retrieval}},
  url = {https://link.springer.com/chapter/10.1007/978-3-030-58219-7_26},
  volume = 12260,
  year = 2020,
}
@inproceedings{Wachsmuth2017Quality,
  author = {Henning Wachsmuth and Nona Naderi and Yufang Hou and Yonatan Bilu and Vinodkumar Prabhakaran and Tim Alberdingk Thijm and Graeme Hirst and Benno Stein},
  booktitle = {15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017)},
  editor = {Phil Blunsom and Alexander Koller and Mirella Lapata},
  month = apr,
  pages = {176-187},
  site = {Valencia, Spain},
  title = {{Computational Argumentation Quality Assessment in Natural Language}},
  url = {http://aclweb.org/anthology/E17-1017},
  year = 2017
}

"argsme/2020-04-01/touche-2020-task-1/uncorrected"

Version of argsme/2020-04-01/touche-2020-task-1 that uses uncorrected relevance judgements derived from crowdworkers. This dataset's relevance judgements should not be used without preprocessing.

queries
49 queries

Inherits queries from argsme/2020-04-01/touche-2020-task-1

Language: en

Query type:
ToucheQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. description: str
  4. narrative: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/touche-2020-task-1/uncorrected")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>

You can find more details about the Python API here.

CLI
ir_datasets export argsme/2020-04-01/touche-2020-task-1/uncorrected queries
[query_id]    [title]    [description]    [narrative]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/touche-2020-task-1/uncorrected')
index_ref = pt.IndexRef.of('./indices/argsme_2020-04-01') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('title'))

You can find more details about PyTerrier retrieval here.

docs
388K docs

Inherits docs from argsme/2020-04-01

Language: en

Document type:
ArgsMeDoc: (namedtuple)
  1. doc_id: str
  2. conclusion: str
  3. premises: List[
    ArgsMePremise: (namedtuple)
    1. text: str
    2. stance: ArgsMeStance[PRO, CON]
    3. annotations: List[
      ArgsMePremiseAnnotation: (namedtuple)
      1. start: int
      2. end: int
      3. source: str
      ]
    ]
  4. premises_texts: str
  5. aspects: List[
    ArgsMeAspect: (namedtuple)
    1. name: str
    2. weight: float
    3. normalized_weight: float
    4. rank: int
    ]
  6. aspects_names: str
  7. source_id: str
  8. source_title: str
  9. source_url: Optional[str]
  10. source_previous_argument_id: Optional[str]
  11. source_next_argument_id: Optional[str]
  12. source_domain: Optional[ArgsMeSourceDomain[debateorg, debatepedia, debatewise, idebate, canadian_parliament]]
  13. source_text: Optional[str]
  14. source_text_conclusion_start: Optional[int]
  15. source_text_conclusion_end: Optional[int]
  16. source_text_premise_start: Optional[int]
  17. source_text_premise_end: Optional[int]
  18. topic: str
  19. acquisition: datetime
  20. date: Optional[datetime]
  21. author: Optional[str]
  22. author_image_url: Optional[str]
  23. author_organization: Optional[str]
  24. author_role: Optional[str]
  25. mode: Optional[ArgsMeMode[person, discussion]]

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/touche-2020-task-1/uncorrected")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>

You can find more details about the Python API here.

CLI
ir_datasets export argsme/2020-04-01/touche-2020-task-1/uncorrected docs
[doc_id]    [conclusion]    [premises]    [premises_texts]    [aspects]    [aspects_names]    [source_id]    [source_title]    [source_url]    [source_previous_argument_id]    [source_next_argument_id]    [source_domain]    [source_text]    [source_text_conclusion_start]    [source_text_conclusion_end]    [source_text_premise_start]    [source_text_premise_end]    [topic]    [acquisition]    [date]    [author]    [author_image_url]    [author_organization]    [author_role]    [mode]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/touche-2020-task-1/uncorrected')
# Index argsme/2020-04-01
indexer = pt.IterDictIndexer('./indices/argsme_2020-04-01')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])

You can find more details about PyTerrier indexing here.

qrels
2.3K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.    Definition             Count    %
-2      spam, non-argument     380      16.5%
1       very low relevance     144      6.3%
2       low relevance          199      8.7%
3       moderate relevance     485      21.1%
4       high relevance         665      28.9%
5       very high relevance    425      18.5%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/touche-2020-task-1/uncorrected")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
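
Before preprocessing the uncorrected judgments, it can help to inspect how the grades are distributed. The sketch below counts relevance grades and reports their percentage shares. To stay runnable offline, it rebuilds the list of grades from the counts in the relevance levels table above rather than from qrels_iter(). The function name `relevance_distribution` is introduced here for illustration.

```python
from collections import Counter


def relevance_distribution(relevances):
    """Map each relevance grade to (count, percentage rounded to one decimal)."""
    counts = Counter(relevances)
    total = sum(counts.values())
    return {
        level: (n, round(100 * n / total, 1))
        for level, n in sorted(counts.items())
    }


# Counts copied from the relevance levels table of this dataset.
table_counts = {-2: 380, 1: 144, 2: 199, 3: 485, 4: 665, 5: 425}
relevances = [level for level, n in table_counts.items() for _ in range(n)]

distribution = relevance_distribution(relevances)
for level, (count, percent) in distribution.items():
    print(level, count, f"{percent}%")
```

With the real dataset, the input would be `(qrel.relevance for qrel in dataset.qrels_iter())` instead of the reconstructed list.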

CLI
ir_datasets export argsme/2020-04-01/touche-2020-task-1/uncorrected qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/touche-2020-task-1/uncorrected')
index_ref = pt.IndexRef.of('./indices/argsme_2020-04-01') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('title'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Bondarenko2020Touche,Wachsmuth2017Quality}


"argsme/2020-04-01/touche-2021-task-1"

Decision-making processes, be it at the societal or at the personal level, often come to a point where one side challenges the other with a why-question, which is a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad-hoc argument retrieval is also becoming a feasible task. Touché 2021 is the second lab on argument retrieval, held at CLEF 2021 and featuring two tasks.

Given a question on a controversial topic, retrieve relevant arguments from a focused crawl of online debate portals (argsme/2020-04-01).

Documents are judged for their general topical relevance and for their rhetorical quality, i.e., the "well-writtenness" of the document: (1) whether the text has a good style of speech (formal language is preferred over informal), (2) whether the text has a proper sentence structure and is easy to read, and (3) whether it is free of profanity, typos, and other detrimental style choices.

queries
50 queries

Language: en

Query type:
ToucheTitleQuery: (namedtuple)
  1. query_id: str
  2. title: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/touche-2021-task-1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title>

You can find more details about the Python API here.

CLI
ir_datasets export argsme/2020-04-01/touche-2021-task-1 queries
[query_id]    [title]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/touche-2021-task-1')
index_ref = pt.IndexRef.of('./indices/argsme_2020-04-01') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs
388K docs

Inherits docs from argsme/2020-04-01

Language: en

Document type:
ArgsMeDoc: (namedtuple)
  1. doc_id: str
  2. conclusion: str
  3. premises: List[
    ArgsMePremise: (namedtuple)
    1. text: str
    2. stance: ArgsMeStance[PRO, CON]
    3. annotations: List[
      ArgsMePremiseAnnotation: (namedtuple)
      1. start: int
      2. end: int
      3. source: str
      ]
    ]
  4. premises_texts: str
  5. aspects: List[
    ArgsMeAspect: (namedtuple)
    1. name: str
    2. weight: float
    3. normalized_weight: float
    4. rank: int
    ]
  6. aspects_names: str
  7. source_id: str
  8. source_title: str
  9. source_url: Optional[str]
  10. source_previous_argument_id: Optional[str]
  11. source_next_argument_id: Optional[str]
  12. source_domain: Optional[ArgsMeSourceDomain[debateorg, debatepedia, debatewise, idebate, canadian_parliament]]
  13. source_text: Optional[str]
  14. source_text_conclusion_start: Optional[int]
  15. source_text_conclusion_end: Optional[int]
  16. source_text_premise_start: Optional[int]
  17. source_text_premise_end: Optional[int]
  18. topic: str
  19. acquisition: datetime
  20. date: Optional[datetime]
  21. author: Optional[str]
  22. author_image_url: Optional[str]
  23. author_organization: Optional[str]
  24. author_role: Optional[str]
  25. mode: Optional[ArgsMeMode[person, discussion]]

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/touche-2021-task-1")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>

You can find more details about the Python API here.
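Because premises are nested inside each document, a small helper can make them easier to work with. The following is a minimal sketch, not part of ir_datasets: `Stance`, `Premise`, and `Doc` are stand-ins that mirror only the fields used here, `premises_by_stance` is a hypothetical helper, and the example document is invented. It assumes the stance value is an Enum member that exposes a `.name`.

```python
from collections import namedtuple
from enum import Enum


# Stand-ins for illustration only; the real types are provided by ir_datasets.
class Stance(Enum):
    PRO = "PRO"
    CON = "CON"


Premise = namedtuple("Premise", ["text", "stance", "annotations"])
Doc = namedtuple("Doc", ["doc_id", "conclusion", "premises"])


def premises_by_stance(doc):
    """Group the premise texts of one document by stance name."""
    grouped = {}
    for premise in doc.premises:
        grouped.setdefault(premise.stance.name, []).append(premise.text)
    return grouped


# Invented example document (not taken from the corpus).
example = Doc(
    doc_id="hypothetical-id",
    conclusion="School uniforms should be mandatory",
    premises=[Premise("They reduce peer pressure.", Stance.PRO, [])],
)
print(premises_by_stance(example))
```

With a loaded dataset, the same helper could be applied to each `doc` yielded by `dataset.docs_iter()`, provided the assumption about the stance type holds.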

CLI
ir_datasets export argsme/2020-04-01/touche-2021-task-1 docs
[doc_id]    [conclusion]    [premises]    [premises_texts]    [aspects]    [aspects_names]    [source_id]    [source_title]    [source_url]    [source_previous_argument_id]    [source_next_argument_id]    [source_domain]    [source_text]    [source_text_conclusion_start]    [source_text_conclusion_end]    [source_text_premise_start]    [source_text_premise_end]    [topic]    [acquisition]    [date]    [author]    [author_image_url]    [author_organization]    [author_role]    [mode]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/touche-2021-task-1')
# Index argsme/2020-04-01
indexer = pt.IterDictIndexer('./indices/argsme_2020-04-01')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['conclusion', 'premises_texts', 'aspects_names', 'source_id', 'source_title', 'topic'])

You can find more details about PyTerrier indexing here.

qrels
3.7K qrels
Query relevance judgment type:
ToucheQualityQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. quality: int
  5. iteration: str

Relevance levels

Rel.  Definition       Count  %
-2    spam             351    9.5%
0     not relevant     1.5K   41.6%
1     relevant         736    19.8%
2     highly relevant  1.1K   29.2%
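The spam label is negative, which matters when relevance values are used as gains or thresholds. A minimal sketch of one way to handle this follows. The label names are copied from the relevance levels listed above; the helper names `to_gain` and `is_relevant`, and the choice to clamp spam to zero gain, are assumptions for illustration rather than anything prescribed by the dataset.

```python
# Label names as listed in the relevance levels above.
RELEVANCE_LABELS = {
    -2: "spam",
    0: "not relevant",
    1: "relevant",
    2: "highly relevant",
}


def to_gain(relevance):
    """Map a relevance label to a non-negative gain (assumption: spam counts as 0)."""
    return max(relevance, 0)


def is_relevant(relevance, min_rel=1):
    """Binarize a label; min_rel=1 treats both positive levels as relevant."""
    return relevance >= min_rel


for label, name in RELEVANCE_LABELS.items():
    print(label, name, to_gain(label), is_relevant(label))
```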

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/touche-2021-task-1")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, quality, iteration>

You can find more details about the Python API here.
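Each judgment carries both a `relevance` and a `quality` value, so it can be convenient to split them into two lookup tables keyed by query and document. The sketch below assumes only the five fields listed in the qrel type above; `Qrel` is a stand-in namedtuple, `split_judgments` is a hypothetical helper, and the sample judgments are invented.

```python
from collections import defaultdict, namedtuple

# Stand-in with the same field names as the qrel type; for illustration only.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "quality", "iteration"])


def split_judgments(qrels):
    """Return (relevance, quality) as nested dicts: query_id -> doc_id -> score."""
    relevance = defaultdict(dict)
    quality = defaultdict(dict)
    for qrel in qrels:
        relevance[qrel.query_id][qrel.doc_id] = qrel.relevance
        quality[qrel.query_id][qrel.doc_id] = qrel.quality
    return dict(relevance), dict(quality)


# Invented judgments (not taken from the dataset).
sample = [
    Qrel("51", "hypothetical-doc-a", 2, 1, "0"),
    Qrel("51", "hypothetical-doc-b", -2, 0, "0"),
]
relevance, quality = split_judgments(sample)
print(relevance)
print(quality)
```

The same function could be called on `dataset.qrels_iter()`, since it only reads attributes by name.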

CLI
ir_datasets export argsme/2020-04-01/touche-2021-task-1 qrels --format tsv
[query_id]    [doc_id]    [relevance]    [quality]    [iteration]
...

You can find more details about the CLI here.
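If the exported judgments are saved to a file, they can be read back with the standard library. This is a rough sketch under stated assumptions: each line holds the five fields in the order shown above, separated by tabs, with no header row. `parse_qrels_tsv` is a hypothetical helper and the sample text is invented.

```python
import csv
import io

# Column order as shown in the CLI example above.
FIELDS = ["query_id", "doc_id", "relevance", "quality", "iteration"]


def parse_qrels_tsv(text):
    """Parse tab-separated qrel lines into dicts with integer scores."""
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for record in reader:
        if len(record) != len(FIELDS):
            continue  # skip blank or malformed lines
        row = dict(zip(FIELDS, record))
        row["relevance"] = int(row["relevance"])
        row["quality"] = int(row["quality"])
        rows.append(row)
    return rows


# Invented sample lines (not real export output).
sample_text = "51\thypothetical-doc-a\t2\t1\t0\n51\thypothetical-doc-b\t-2\t0\t0\n"
for row in parse_qrels_tsv(sample_text):
    print(row)
```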

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:argsme/2020-04-01/touche-2021-task-1')
index_ref = pt.IndexRef.of('./indices/argsme_2020-04-01') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.
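To make the nDCG@20 measure used above more concrete, here is a small hand-rolled sketch of nDCG@k. It is not the implementation PyTerrier uses, and details may differ: this version assumes a linear gain, a log2 rank discount, an ideal ranking built from all judged documents of the query, and negative (spam) labels clamped to zero. The function names are hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranked_doc_ids, judgments, k=20):
    """nDCG@k for one query; judgments maps doc_id -> relevance label."""
    # Unjudged documents and negative labels contribute zero gain (assumption).
    gains = [max(judgments.get(doc_id, 0), 0) for doc_id in ranked_doc_ids[:k]]
    ideal = sorted((max(rel, 0) for rel in judgments.values()), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Invented judgments and rankings for a single query.
judged = {"a": 2, "b": 1, "c": -2}
print(ndcg_at_k(["a", "b", "c"], judged))
print(ndcg_at_k(["c", "b", "a"], judged))
```

The first ranking places documents in ideal order, so it scores higher than the second, which puts the spam document first.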

Citation

ir_datasets.bib:

\cite{Bondarenko2021Touche}

Bibtex:

@inproceedings{Bondarenko2021Touche,
  address = {Berlin Heidelberg New York},
  author = {Alexander Bondarenko and Lukas Gienapp and Maik Fr{\"o}be and Meriem Beloucif and Yamen Ajjour and Alexander Panchenko and Chris Biemann and Benno Stein and Henning Wachsmuth and Martin Potthast and Matthias Hagen},
  booktitle = {Experimental IR Meets Multilinguality, Multimodality, and Interaction. 12th International Conference of the CLEF Association (CLEF 2021)},
  doi = {10.1007/978-3-030-85251-1\_28},
  editor = {{K. Sel{\c{c}}uk} Candan and Bogdan Ionescu and Lorraine Goeuriot and Henning M{\"u}ller and Alexis Joly and Maria Maistro and Florina Piroi and Guglielmo Faggioli and Nicola Ferro},
  month = sep,
  pages = {450-467},
  publisher = {Springer},
  series = {Lecture Notes in Computer Science},
  site = {Bucharest, Romania},
  title = {{Overview of Touch{\'e} 2021: Argument Retrieval}},
  url = {https://link.springer.com/chapter/10.1007/978-3-030-85251-1_28},
  volume = 12880,
  year = 2021,
}
Metadata