GitHub: datasets/argsme.py

ir_datasets: args.me

Index
  1. argsme
  2. argsme/1.0
  3. argsme/1.0-cleaned
  4. argsme/1.0/touche-2020-task-1/uncorrected
  5. argsme/2020-04-01
  6. argsme/2020-04-01/debateorg
  7. argsme/2020-04-01/debatepedia
  8. argsme/2020-04-01/debatewise
  9. argsme/2020-04-01/idebate
  10. argsme/2020-04-01/parliamentary
  11. argsme/2020-04-01/touche-2020-task-1
  12. argsme/2020-04-01/touche-2020-task-1/uncorrected
  13. argsme/2020-04-01/touche-2021-task-1

"argsme"

The args.me corpus is one of the largest argument resources available and contains arguments crawled from debate platforms and parliament discussions.

Citation

ir_datasets.bib:

\cite{Wachsmuth2017Argument,Ajjour2019Acquisition}

Bibtex:

@inproceedings{Wachsmuth2017Argument,
  author = {Henning Wachsmuth and Martin Potthast and Khalid Al-Khatib and Yamen Ajjour and Jana Puschmann and Jiani Qu and Jonas Dorsch and Viorel Morari and Janek Bevendorff and Benno Stein},
  booktitle = {4th Workshop on Argument Mining (ArgMining 2017) at EMNLP},
  editor = {Kevin Ashley and Claire Cardie and Nancy Green and Iryna Gurevych and Ivan Habernal and Diane Litman and Georgios Petasis and Chris Reed and Noam Slonim and Vern Walker},
  month = sep,
  pages = {49-59},
  publisher = {Association for Computational Linguistics},
  site = {Copenhagen, Denmark},
  title = {{Building an Argument Search Engine for the Web}},
  url = {https://www.aclweb.org/anthology/W17-5106},
  year = 2017
}

@inproceedings{Ajjour2019Acquisition,
  address = {Berlin Heidelberg New York},
  author = {Yamen Ajjour and Henning Wachsmuth and Johannes Kiesel and Martin Potthast and Matthias Hagen and Benno Stein},
  booktitle = {42nd German Conference on Artificial Intelligence (KI 2019)},
  doi = {10.1007/978-3-030-30179-8\_4},
  editor = {Christoph Benzm{\"u}ller and Heiner Stuckenschmidt},
  month = sep,
  pages = {48-59},
  publisher = {Springer},
  site = {Kassel, Germany},
  title = {{Data Acquisition for Argument Search: The args.me corpus}},
  year = 2019
}

"argsme/1.0"

Corpus version 1.0, with 387,606 arguments crawled from Debatewise, IDebate.org, Debatepedia, and Debate.org. It was released on July 9, 2019 on Zenodo.

This collection is licensed under the Creative Commons Attribution 4.0 International license. Individual rights to the content still apply.

388K docs

Language: en

Document type:
ArgsMeDoc: (namedtuple)
  1. doc_id: str
  2. conclusion: str
  3. premises: List[
    ArgsMePremise: (namedtuple)
    1. text: str
    2. stance: ArgsMeStance[PRO, CON]
    3. annotations: List[
      ArgsMePremiseAnnotation: (namedtuple)
      1. start: int
      2. end: int
      3. source: str
      ]
    ]
  4. premises_texts: str
  5. aspects: List[
    ArgsMeAspect: (namedtuple)
    1. name: str
    2. weight: float
    3. normalized_weight: float
    4. rank: int
    ]
  6. aspects_names: str
  7. source_id: str
  8. source_title: str
  9. source_url: Optional[str]
  10. source_previous_argument_id: Optional[str]
  11. source_next_argument_id: Optional[str]
  12. source_domain: Optional[ArgsMeSourceDomain[debateorg, debatepedia, debatewise, idebate, canadian_parliament]]
  13. source_text: Optional[str]
  14. source_text_conclusion_start: Optional[int]
  15. source_text_conclusion_end: Optional[int]
  16. source_text_premise_start: Optional[int]
  17. source_text_premise_end: Optional[int]
  18. topic: str
  19. acquisition: datetime
  20. date: Optional[datetime]
  21. author: Optional[str]
  22. author_image_url: Optional[str]
  23. author_organization: Optional[str]
  24. author_role: Optional[str]
  25. mode: Optional[ArgsMeMode[person, discussion]]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("argsme/1.0")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>

You can find more details about the Python API here.
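
Since premises, annotations, and aspects are nested namedtuples, a short sketch of drilling into them may help; it uses only the fields from the type listing above (the text slicing and early break are just to keep the output small):

import ir_datasets
dataset = ir_datasets.load("argsme/1.0")
for doc in dataset.docs_iter():
    print(doc.doc_id, "->", doc.conclusion)
    for premise in doc.premises:
        # Each premise carries a PRO/CON stance plus source annotations.
        print("  premise:", premise.stance, premise.text[:80])
        for annotation in premise.annotations:
            print("    annotation:", annotation.source, annotation.start, annotation.end)
    for aspect in doc.aspects:
        # Aspects come with raw and normalized weights and a rank.
        print("  aspect:", aspect.name, aspect.normalized_weight, aspect.rank)
    break  # inspect only the first document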


"argsme/1.0-cleaned"

Corpus version 1.0-cleaned, with 382,545 arguments crawled from Debatewise, IDebate.org, Debatepedia, and Debate.org. This version contains the same arguments as version 1.0, but cleaned as described in the corresponding publication. It was released on October 27, 2020 on Zenodo.

This collection is licensed under the Creative Commons Attribution 4.0 International license. Individual rights to the content still apply.

383K docs

Language: en

Document type: ArgsMeDoc (namedtuple), with the same fields as listed under argsme/1.0 above.

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("argsme/1.0-cleaned")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>

You can find more details about the Python API here.


"argsme/1.0/touche-2020-task-1/uncorrected"

Version of argsme/2020-04-01/touche-2020-task-1 that uses the argsme/1.0 corpus together with uncorrected relevance judgements collected from crowdworkers. These judgements should not be used without preprocessing.

Provides: queries, docs, qrels
49 queries

Language: en

Query type:
ToucheQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. description: str
  4. narrative: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("argsme/1.0/touche-2020-task-1/uncorrected")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>

You can find more details about the Python API here.


"argsme/2020-04-01"

Corpus version 2020-04-01, with 387,740 arguments crawled from Debatewise, IDebate.org, Debatepedia, and Debate.org, and from Canadian Parliament discussions. It was released on April 1, 2020 on Zenodo.

This collection is licensed under the Creative Commons Attribution 4.0 International license. Individual rights to the content still apply.

388K docs

Language: en

Document type: ArgsMeDoc (namedtuple), with the same fields as listed under argsme/1.0 above.

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>

You can find more details about the Python API here.
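
If you need random access by doc_id rather than a full scan, ir_datasets also provides a docs_store. A minimal sketch; here the first ID from the iterator stands in for whatever ID you actually want to look up:

import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01")
# Grab one real ID, then fetch the document through the store.
first_id = next(iter(dataset.docs_iter())).doc_id
docstore = dataset.docs_store()
doc = docstore.get(first_id)  # get_many() accepts an iterable of IDs
print(doc.doc_id, doc.conclusion)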


"argsme/2020-04-01/debateorg"

Subset of the 338,620 arguments from args.me version 2020-04-01 that were crawled from the debate portal Debate.org.

339K docs

Language: en

Document type: ArgsMeDoc (namedtuple), with the same fields as listed under argsme/1.0 above.

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/debateorg")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>

You can find more details about the Python API here.


"argsme/2020-04-01/debatepedia"

Subset of the 21,197 arguments from args.me version 2020-04-01 that were crawled from the debate portal Debatepedia.

21K docs

Language: en

Document type: ArgsMeDoc (namedtuple), with the same fields as listed under argsme/1.0 above.

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/debatepedia")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>

You can find more details about the Python API here.


"argsme/2020-04-01/debatewise"

Subset of the 14,353 arguments from args.me version 2020-04-01 that were crawled from the debate portal Debatewise.

14K docs

Language: en

Document type: ArgsMeDoc (namedtuple), with the same fields as listed under argsme/1.0 above.

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/debatewise")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>

You can find more details about the Python API here.


"argsme/2020-04-01/idebate"

Subset of the 13,522 arguments from args.me version 2020-04-01 that were crawled from the debate portal IDebate.org.

14K docs

Language: en

Document type: ArgsMeDoc (namedtuple), with the same fields as listed under argsme/1.0 above.

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/idebate")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>

You can find more details about the Python API here.


"argsme/2020-04-01/parliamentary"

Subset of the 48 arguments from args.me version 2020-04-01 that were crawled from Canadian Parliament discussions.

48 docs

Language: en

Document type: ArgsMeDoc (namedtuple), with the same fields as listed under argsme/1.0 above.

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/parliamentary")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, conclusion, premises, premises_texts, aspects, aspects_names, source_id, source_title, source_url, source_previous_argument_id, source_next_argument_id, source_domain, source_text, source_text_conclusion_start, source_text_conclusion_end, source_text_premise_start, source_text_premise_end, topic, acquisition, date, author, author_image_url, author_organization, author_role, mode>

You can find more details about the Python API here.
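
Many of the source and author fields are Optional and come back as None when absent, so access should be defensive. A small sketch; the fallback strings are just illustrative:

import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/parliamentary")
for doc in dataset.docs_iter():
    # author, author_role, and date are Optional per the type listing.
    speaker = doc.author or "unknown speaker"
    role = doc.author_role or "unknown role"
    when = doc.date.isoformat() if doc.date else "undated"
    print(doc.doc_id, speaker, role, when)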


"argsme/2020-04-01/touche-2020-task-1"

Decision-making processes, be it at the societal or the personal level, eventually come to a point where one side challenges the other with a why-question, a prompt to justify one's stance. Technologies for argument mining and argumentation processing are maturing at a rapid pace, giving rise for the first time to argument retrieval. Touché 2020 is the first lab on argument retrieval at CLEF 2020, featuring two tasks.

Given a question on a controversial topic, retrieve relevant arguments from a focused crawl of online debate portals (argsme/2020-04-01).

Documents are judged based on their general topical relevance.

Provides: queries, docs, qrels
49 queries

Language: en

Query type:
ToucheQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. description: str
  4. narrative: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/touche-2020-task-1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>

You can find more details about the Python API here.
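
Relevance judgements are available through qrels_iter(). Assuming the standard ir_datasets qrels tuple (query_id, doc_id, relevance, iteration), a sketch that counts the judged documents per query:

import ir_datasets
from collections import Counter
dataset = ir_datasets.load("argsme/2020-04-01/touche-2020-task-1")
# Tally how many judgements each query received.
judged = Counter(qrel.query_id for qrel in dataset.qrels_iter())
for query in dataset.queries_iter():
    print(query.query_id, query.title, "->", judged[query.query_id], "judged docs")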


"argsme/2020-04-01/touche-2020-task-1/uncorrected"

Version of argsme/2020-04-01/touche-2020-task-1 that uses uncorrected relevance judgements collected from crowdworkers. These judgements should not be used without preprocessing.

Provides: queries, docs, qrels
49 queries

Inherits queries from argsme/2020-04-01/touche-2020-task-1

Language: en

Query type:
ToucheQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. description: str
  4. narrative: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/touche-2020-task-1/uncorrected")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>

You can find more details about the Python API here.
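
The appropriate preprocessing is left to you; as one illustrative starting point (not an official procedure), the raw judgements can be grouped by query–document pair to check for repeated or conflicting labels before deciding how to resolve them:

import ir_datasets
from collections import defaultdict
dataset = ir_datasets.load("argsme/2020-04-01/touche-2020-task-1/uncorrected")
by_pair = defaultdict(list)
for qrel in dataset.qrels_iter():
    by_pair[(qrel.query_id, qrel.doc_id)].append(qrel.relevance)
# Pairs judged more than once, possibly with disagreeing labels.
repeated = {pair: rels for pair, rels in by_pair.items() if len(rels) > 1}
print(len(repeated), "query-document pairs with multiple judgements")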


"argsme/2020-04-01/touche-2021-task-1"

Decision-making processes, be it at the societal or the personal level, often come to a point where one side challenges the other with a why-question, a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad hoc argument retrieval is also coming within reach. Touché 2021 is the second lab on argument retrieval at CLEF 2021, featuring two tasks.

Given a question on a controversial topic, retrieve relevant arguments from a focused crawl of online debate portals (argsme/2020-04-01).

Documents are judged based on their general topical relevance and their rhetorical quality, i.e., the "well-writtenness" of the document: (1) whether the text has a good style of speech (formal language is preferred over informal), (2) whether the text has a proper sentence structure and is easy to read, (3) whether it avoids profanity, typos, and other detrimental style choices.

Provides: queries, docs, qrels
50 queries

Language: en

Query type:
ToucheTitleQuery: (namedtuple)
  1. query_id: str
  2. title: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("argsme/2020-04-01/touche-2021-task-1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title>

You can find more details about the Python API here.
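
These datasets can also be loaded through PyTerrier's ir_datasets integration, which exposes them under an "irds:" prefix. A hedged sketch, worth checking against the PyTerrier documentation:

import pyterrier as pt
if not pt.started():
    pt.init()
# PyTerrier loads ir_datasets collections via the "irds:" prefix.
dataset = pt.get_dataset("irds:argsme/2020-04-01/touche-2021-task-1")
topics = dataset.get_topics()  # DataFrame of queries: qid, query
qrels = dataset.get_qrels()    # DataFrame of judgements: qid, docno, label
print(topics.head())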