← home
Github: datasets/mmarco.py

ir_datasets: mMARCO

Index
  1. mmarco
  2. mmarco/de
  3. mmarco/de/dev
  4. mmarco/de/dev/small
  5. mmarco/de/train
  6. mmarco/es
  7. mmarco/es/dev
  8. mmarco/es/dev/small
  9. mmarco/es/train
  10. mmarco/fr
  11. mmarco/fr/dev
  12. mmarco/fr/dev/small
  13. mmarco/fr/train
  14. mmarco/id
  15. mmarco/id/dev
  16. mmarco/id/dev/small
  17. mmarco/id/train
  18. mmarco/it
  19. mmarco/it/dev
  20. mmarco/it/dev/small
  21. mmarco/it/train
  22. mmarco/pt
  23. mmarco/pt/dev
  24. mmarco/pt/dev/small
  25. mmarco/pt/dev/small/v1.1
  26. mmarco/pt/dev/v1.1
  27. mmarco/pt/train
  28. mmarco/pt/train/v1.1
  29. mmarco/ru
  30. mmarco/ru/dev
  31. mmarco/ru/dev/small
  32. mmarco/ru/train
  33. mmarco/v2/ar
  34. mmarco/v2/ar/dev
  35. mmarco/v2/ar/dev/small
  36. mmarco/v2/ar/train
  37. mmarco/v2/de
  38. mmarco/v2/de/dev
  39. mmarco/v2/de/dev/small
  40. mmarco/v2/de/train
  41. mmarco/v2/dt
  42. mmarco/v2/dt/dev
  43. mmarco/v2/dt/dev/small
  44. mmarco/v2/dt/train
  45. mmarco/v2/es
  46. mmarco/v2/es/dev
  47. mmarco/v2/es/dev/small
  48. mmarco/v2/es/train
  49. mmarco/v2/fr
  50. mmarco/v2/fr/dev
  51. mmarco/v2/fr/dev/small
  52. mmarco/v2/fr/train
  53. mmarco/v2/hi
  54. mmarco/v2/hi/dev
  55. mmarco/v2/hi/dev/small
  56. mmarco/v2/hi/train
  57. mmarco/v2/id
  58. mmarco/v2/id/dev
  59. mmarco/v2/id/dev/small
  60. mmarco/v2/id/train
  61. mmarco/v2/it
  62. mmarco/v2/it/dev
  63. mmarco/v2/it/dev/small
  64. mmarco/v2/it/train
  65. mmarco/v2/ja
  66. mmarco/v2/ja/dev
  67. mmarco/v2/ja/dev/small
  68. mmarco/v2/ja/train
  69. mmarco/v2/pt
  70. mmarco/v2/pt/dev
  71. mmarco/v2/pt/dev/small
  72. mmarco/v2/pt/train
  73. mmarco/v2/ru
  74. mmarco/v2/ru/dev
  75. mmarco/v2/ru/dev/small
  76. mmarco/v2/ru/train
  77. mmarco/v2/vi
  78. mmarco/v2/vi/dev
  79. mmarco/v2/vi/dev/small
  80. mmarco/v2/vi/train
  81. mmarco/v2/zh
  82. mmarco/v2/zh/dev
  83. mmarco/v2/zh/dev/small
  84. mmarco/v2/zh/train
  85. mmarco/zh
  86. mmarco/zh/dev
  87. mmarco/zh/dev/small
  88. mmarco/zh/dev/small/v1.1
  89. mmarco/zh/dev/v1.1
  90. mmarco/zh/train

"mmarco"

A version of the MS MARCO passage dataset (msmarco-passage) with the queries and documents automatically translated into several languages.

  • Documents: Short passages (from web), translated from English
  • Queries: Natural language questions (from query log), translated from English
  • Repository
  • Dataset Paper
Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }

"mmarco/de"

Version of msmarco-passage, with documents translated into German.

docs
8.8M docs

Language: de

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/de")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/de docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.de')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/de/dev"

Version of msmarco-passage/dev, with queries and documents translated into German.

queries
101K queries

Language: de

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/de/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/de/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.de.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/de

Language: de

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/de/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/de/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.de.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
59K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant59K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/de/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/de/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.de.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/de/dev/small"

Version of msmarco-passage/dev/small, with queries and documents translated into German.

queries
7.0K queries

Language: de

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/de/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/de/dev/small queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.de.dev.small.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/de

Language: de

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/de/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/de/dev/small docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.de.dev.small')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
7.4K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant7.4K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/de/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/de/dev/small qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.de.dev.small.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

scoreddocs
6.6M scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/de/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/de/dev/small scoreddocs --format tsv
[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

run = datamaestro.prepare_dataset('irds.mmarco.de.dev.small.scoreddocs') # AdhocRun
# A run is a generic object, and is specialized into final classes
# e.g. TrecAdhocRun 

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/de/train"

Version of msmarco-passage/train, with queries and documents translated into German.

queries
809K queries

Language: de

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/de/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/de/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.de.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/de

Language: de

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/de/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/de/train docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.de.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
533K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant533K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/de/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/de/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.de.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

docpairs
40M docpairs
Document Pair type:
GenericDocPair: (namedtuple)
  1. query_id: str
  2. doc_id_a: str
  3. doc_id_b: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/de/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/de/train docpairs
[query_id]    [doc_id_a]    [doc_id_b]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

docpairs = datamaestro.prepare_dataset('irds.mmarco.de.train.docpairs')
next(docpairs.iter())  # Display the first triplet

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/es"

Version of msmarco-passage, with documents translated into Spanish.

docs
8.8M docs

Language: es

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/es")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/es docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.es')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/es/dev"

Version of msmarco-passage/dev, with queries and documents translated into Spanish.

queries
101K queries

Language: es

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/es/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/es/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.es.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/es

Language: es

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/es/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/es/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.es.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
59K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant59K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/es/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/es/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.es.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/es/dev/small"

Version of msmarco-passage/dev/small, with queries and documents translated into Spanish.

queries
7.0K queries

Language: es

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/es/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/es/dev/small queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.es.dev.small.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/es

Language: es

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/es/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/es/dev/small docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.es.dev.small')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
7.4K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant7.4K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/es/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/es/dev/small qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.es.dev.small.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

scoreddocs
6.8M scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/es/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/es/dev/small scoreddocs --format tsv
[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

run = datamaestro.prepare_dataset('irds.mmarco.es.dev.small.scoreddocs') # AdhocRun
# A run is a generic object, and is specialized into final classes
# e.g. TrecAdhocRun 

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/es/train"

Version of msmarco-passage/train, with queries and documents translated into Spanish.

queries
809K queries

Language: es

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/es/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/es/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.es.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/es

Language: es

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/es/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/es/train docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.es.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
533K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant533K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/es/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/es/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.es.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

docpairs
40M docpairs
Document Pair type:
GenericDocPair: (namedtuple)
  1. query_id: str
  2. doc_id_a: str
  3. doc_id_b: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/es/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/es/train docpairs
[query_id]    [doc_id_a]    [doc_id_b]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

docpairs = datamaestro.prepare_dataset('irds.mmarco.es.train.docpairs')
next(docpairs.iter())  # Display the first triplet

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/fr"

Version of msmarco-passage, with documents translated into French.

docs
8.8M docs

Language: fr

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/fr")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/fr docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.fr')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/fr/dev"

Version of msmarco-passage/dev, with queries and documents translated into French.

queries
101K queries

Language: fr

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/fr/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.fr.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/fr

Language: fr

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/fr/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.fr.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
59K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant59K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/fr/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.fr.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/fr/dev/small"

Version of msmarco-passage/dev/small, with queries and documents translated into French.

queries
7.0K queries

Language: fr

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/fr/dev/small queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.fr.dev.small.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/fr

Language: fr

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/fr/dev/small docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.fr.dev.small')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
7.4K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant7.4K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/fr/dev/small qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.fr.dev.small.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

scoreddocs
6.8M scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/fr/dev/small scoreddocs --format tsv
[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

run = datamaestro.prepare_dataset('irds.mmarco.fr.dev.small.scoreddocs') # AdhocRun
# A run is a generic object, and is specialized into final classes
# e.g. TrecAdhocRun 

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/fr/train"

Version of msmarco-passage/train, with queries and documents translated into French.

queries
809K queries

Language: fr

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/fr/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.fr.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/fr

Language: fr

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/fr/train docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.fr.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
533K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant533K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/fr/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.fr.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

docpairs
40M docpairs
Document Pair type:
GenericDocPair: (namedtuple)
  1. query_id: str
  2. doc_id_a: str
  3. doc_id_b: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/fr/train docpairs
[query_id]    [doc_id_a]    [doc_id_b]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

docpairs = datamaestro.prepare_dataset('irds.mmarco.fr.train.docpairs')
next(docpairs.iter())  # Display the first triplet

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/id"

Version of msmarco-passage, with documents translated into Indonesian.

docs
8.8M docs

Language: id

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/id")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/id docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.id')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/id/dev"

Version of msmarco-passage/dev, with queries and documents translated into Indonesian.

queries
101K queries

Language: id

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/id/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/id/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.id.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/id

Language: id

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/id/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/id/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.id.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
59K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant59K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/id/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/id/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.id.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/id/dev/small"

Version of msmarco-passage/dev/small, with queries and documents translated into Indonesian.

queries
7.0K queries

Language: id

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/id/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/id/dev/small queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.id.dev.small.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/id

Language: id

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/id/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/id/dev/small docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.id.dev.small')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
7.4K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant7.4K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/id/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/id/dev/small qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.id.dev.small.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

scoreddocs
6.8M scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/id/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/id/dev/small scoreddocs --format tsv
[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

run = datamaestro.prepare_dataset('irds.mmarco.id.dev.small.scoreddocs') # AdhocRun
# A run is a generic object, and is specialized into final classes
# e.g. TrecAdhocRun 

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/id/train"

Version of msmarco-passage/train, with queries and documents translated into Indonesian.

queries
809K queries

Language: id

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/id/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/id/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.id.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/id

Language: id

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/id/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/id/train docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.id.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
533K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant533K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/id/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/id/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.id.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

docpairs
40M docpairs
Document Pair type:
GenericDocPair: (namedtuple)
  1. query_id: str
  2. doc_id_a: str
  3. doc_id_b: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/id/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/id/train docpairs
[query_id]    [doc_id_a]    [doc_id_b]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

docpairs = datamaestro.prepare_dataset('irds.mmarco.id.train.docpairs')
next(docpairs.iter())  # Display the first triplet

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/it"

Version of msmarco-passage, with documents translated into Italian.

docs
8.8M docs

Language: it

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/it")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/it docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.it')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/it/dev"

Version of msmarco-passage/dev, with queries and documents translated into Italian.

queries
101K queries

Language: it

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/it/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/it/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.it.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/it

Language: it

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/it/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/it/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.it.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
59K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant59K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/it/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/it/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.it.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/it/dev/small"

Version of msmarco-passage/dev/small, with queries and documents translated into Italian.

queries
7.0K queries

Language: it

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/it/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/it/dev/small queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.it.dev.small.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/it

Language: it

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/it/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/it/dev/small docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.it.dev.small')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
7.4K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant7.4K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/it/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/it/dev/small qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.it.dev.small.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

scoreddocs
7.0M scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/it/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/it/dev/small scoreddocs --format tsv
[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

run = datamaestro.prepare_dataset('irds.mmarco.it.dev.small.scoreddocs') # AdhocRun
# A run is a generic object, and is specialized into final classes
# e.g. TrecAdhocRun 

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/it/train"

Version of msmarco-passage/train, with queries and documents translated into Italian.

queries
809K queries

Language: it

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/it/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/it/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.it.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/it

Language: it

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/it/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/it/train docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.it.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
533K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant533K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/it/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/it/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.it.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

docpairs
40M docpairs
Document Pair type:
GenericDocPair: (namedtuple)
  1. query_id: str
  2. doc_id_a: str
  3. doc_id_b: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/it/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/it/train docpairs
[query_id]    [doc_id_a]    [doc_id_b]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

docpairs = datamaestro.prepare_dataset('irds.mmarco.it.train.docpairs')
next(docpairs.iter())  # Display the first triplet

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/pt"

Version of msmarco-passage, with documents translated into Portuguese.

docs
8.8M docs

Language: pt

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/pt")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/pt docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.pt')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/pt/dev"

Version of msmarco-passage/dev, with queries and documents translated into Portuguese.

queries
102K queries

Language: pt

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/pt/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.pt.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/pt

Language: pt

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/pt/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.pt.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
59K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant59K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/pt/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.pt.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/pt/dev/small"

Version of msmarco-passage/dev/small, with queries and documents translated into Portuguese.

queries
7.0K queries

Language: pt

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/pt/dev/small queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.pt.dev.small.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/pt

Language: pt

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/pt/dev/small docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.pt.dev.small')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
7.4K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant7.4K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/pt/dev/small qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.pt.dev.small.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/pt/dev/small/v1.1"

Version of msmarco-passage/dev, with queries and documents translated into Portuguese.

Version 1.1 of this file includes manual corrections from the authorss of the translated files. See discussion here. It also removes some duplicated query IDs.

queries
7.0K queries

Language: pt

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev/small/v1.1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/pt/dev/small/v1.1 queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.pt.dev.small.v1.1.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/pt

Language: pt

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev/small/v1.1")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/pt/dev/small/v1.1 docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.pt.dev.small.v1.1')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
7.4K qrels

Inherits qrels from mmarco/pt/dev/small

Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant7.4K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev/small/v1.1")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/pt/dev/small/v1.1 qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.pt.dev.small.v1.1.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

scoreddocs
7.0M scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev/small/v1.1")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/pt/dev/small/v1.1 scoreddocs --format tsv
[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

run = datamaestro.prepare_dataset('irds.mmarco.pt.dev.small.v1.1.scoreddocs') # AdhocRun
# A run is a generic object, and is specialized into final classes
# e.g. TrecAdhocRun 

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/pt/dev/v1.1"

Version of msmarco-passage/dev, with queries and documents translated into Portuguese.

Version 1.1 of this file includes manual corrections from the authorss of the translated files. See discussion here. It also removes some duplicated query IDs.

queries
101K queries

Language: pt

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev/v1.1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/pt/dev/v1.1 queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.pt.dev.v1.1.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/pt

Language: pt

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev/v1.1")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/pt/dev/v1.1 docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.pt.dev.v1.1')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
59K qrels

Inherits qrels from mmarco/pt/dev

Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant59K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev/v1.1")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/pt/dev/v1.1 qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.pt.dev.v1.1.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/pt/train"

Version of msmarco-passage/train, with queries and documents translated into Portuguese.

queries
812K queries

Language: pt

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/pt/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.pt.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/pt

Language: pt

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/pt/train docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.pt.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
533K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant533K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/pt/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.pt.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

docpairs
40M docpairs
Document Pair type:
GenericDocPair: (namedtuple)
  1. query_id: str
  2. doc_id_a: str
  3. doc_id_b: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/pt/train docpairs
[query_id]    [doc_id_a]    [doc_id_b]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

docpairs = datamaestro.prepare_dataset('irds.mmarco.pt.train.docpairs')
next(docpairs.iter())  # Display the first triplet

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/pt/train/v1.1"

Version of msmarco-passage/train, with queries and documents translated into Portuguese.

Version 1.1 of this file includes manual corrections from the authorss of the translated files. See discussion here. It also removes some duplicated query IDs.

queries
809K queries

Language: pt

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/train/v1.1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/pt/train/v1.1 queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.pt.train.v1.1.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/pt

Language: pt

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/train/v1.1")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/pt/train/v1.1 docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.pt.train.v1.1')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
533K qrels

Inherits qrels from mmarco/pt/train

Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant533K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/train/v1.1")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/pt/train/v1.1 qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.pt.train.v1.1.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

docpairs
40M docpairs

Inherits docpairs from mmarco/pt/train

Document Pair type:
GenericDocPair: (namedtuple)
  1. query_id: str
  2. doc_id_a: str
  3. doc_id_b: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/train/v1.1")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/pt/train/v1.1 docpairs
[query_id]    [doc_id_a]    [doc_id_b]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

docpairs = datamaestro.prepare_dataset('irds.mmarco.pt.train.v1.1.docpairs')
next(docpairs.iter())  # Display the first triplet

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/ru"

Version of msmarco-passage, with documents translated into Russian.

docs
8.8M docs

Language: ru

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/ru")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/ru docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.ru')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/ru/dev"

Version of msmarco-passage/dev, with queries and documents translated into Russian.

queries
101K queries

Language: ru

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/ru/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.ru.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/ru

Language: ru

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/ru/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.ru.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
59K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant59K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/ru/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.ru.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/ru/dev/small"

Version of msmarco-passage/dev/small, with queries and documents translated into Russian.

queries
7.0K queries

Language: ru

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/ru/dev/small queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.ru.dev.small.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/ru

Language: ru

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/ru/dev/small docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.ru.dev.small')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
7.4K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant7.4K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/ru/dev/small qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.ru.dev.small.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

scoreddocs
7.0M scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/ru/dev/small scoreddocs --format tsv
[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

run = datamaestro.prepare_dataset('irds.mmarco.ru.dev.small.scoreddocs') # AdhocRun
# A run is a generic object, and is specialized into final classes
# e.g. TrecAdhocRun 

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/ru/train"

Version of msmarco-passage/train, with queries and documents translated into Russian.

queries
809K queries

Language: ru

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/ru/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.ru.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/ru

Language: ru

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/ru/train docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.ru.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
533K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant533K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/ru/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.ru.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

docpairs
40M docpairs
Document Pair type:
GenericDocPair: (namedtuple)
  1. query_id: str
  2. doc_id_a: str
  3. doc_id_b: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/ru/train docpairs
[query_id]    [doc_id_a]    [doc_id_b]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

docpairs = datamaestro.prepare_dataset('irds.mmarco.ru.train.docpairs')
next(docpairs.iter())  # Display the first triplet

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/ar"

Version of msmarco-passage, with queries and documents translated into Arabic.

docs
8.8M docs

Language: ar

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ar docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.ar')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/ar/dev"

Version of msmarco-passage/dev, with queries and documents translated into Arabic.

queries
101K queries

Language: ar

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ar/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.ar.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/ar

Language: ar

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ar/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.ar.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
59K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant59K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ar/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.ar.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/ar/dev/small"

Version of msmarco-passage/dev/small, with queries and documents translated into Arabic.

queries
7.0K queries

Language: ar

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ar/dev/small queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.ar.dev.small.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/ar

Language: ar

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ar/dev/small docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.ar.dev.small')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
7.4K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant7.4K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ar/dev/small qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.ar.dev.small.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

scoreddocs
6.8M scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ar/dev/small scoreddocs --format tsv
[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

run = datamaestro.prepare_dataset('irds.mmarco.v2.ar.dev.small.scoreddocs') # AdhocRun
# A run is a generic object, and is specialized into final classes
# e.g. TrecAdhocRun 

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/ar/train"

Version of msmarco-passage/train, with queries and documents translated into Arabic.

queries
809K queries

Language: ar

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ar/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.ar.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/ar

Language: ar

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ar/train docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.ar.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
533K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant533K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ar/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.ar.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

docpairs
40M docpairs
Document Pair type:
GenericDocPair: (namedtuple)
  1. query_id: str
  2. doc_id_a: str
  3. doc_id_b: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ar/train docpairs
[query_id]    [doc_id_a]    [doc_id_b]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

docpairs = datamaestro.prepare_dataset('irds.mmarco.v2.ar.train.docpairs')
next(docpairs.iter())  # Display the first triplet

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/de"

Version of msmarco-passage, with queries and documents translated into German.

docs
8.8M docs

Language: de

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/de docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.de')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/de/dev"

Version of msmarco-passage/dev, with queries and documents translated into German.

queries
101K queries

Language: de

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/de/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.de.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/de

Language: de

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/de/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.de.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
59K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant59K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/de/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.de.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/de/dev/small"

Version of msmarco-passage/dev/small, with queries and documents translated into German.

queries
7.0K queries

Language: de

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/de/dev/small queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.de.dev.small.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/de

Language: de

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/de/dev/small docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.de.dev.small')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
7.4K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant7.4K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/de/dev/small qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.de.dev.small.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

scoreddocs
6.6M scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/de/dev/small scoreddocs --format tsv
[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

run = datamaestro.prepare_dataset('irds.mmarco.v2.de.dev.small.scoreddocs') # AdhocRun
# A run is a generic object, and is specialized into final classes
# e.g. TrecAdhocRun 

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/de/train"

Version of msmarco-passage/train, with queries and documents translated into German.

queries
809K queries

Language: de

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/de/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.de.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/de

Language: de

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/de/train docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.de.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
533K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant533K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/de/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.de.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

docpairs
40M docpairs
Document Pair type:
GenericDocPair: (namedtuple)
  1. query_id: str
  2. doc_id_a: str
  3. doc_id_b: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/de/train docpairs
[query_id]    [doc_id_a]    [doc_id_b]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

docpairs = datamaestro.prepare_dataset('irds.mmarco.v2.de.train.docpairs')
next(docpairs.iter())  # Display the first triplet

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/dt"

Version of msmarco-passage, with queries and documents translated into Dutch.

docs
8.8M docs

Language: dt

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/dt docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.dt')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/dt/dev"

Version of msmarco-passage/dev, with queries and documents translated into Dutch.

queries
101K queries

Language: dt

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/dt/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.dt.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/dt

Language: dt

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/dt/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.dt.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
59K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant59K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/dt/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.dt.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/dt/dev/small"

Version of msmarco-passage/dev/small, with queries and documents translated into Dutch.

queries
7.0K queries

Language: dt

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/dt/dev/small queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.dt.dev.small.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/dt

Language: dt

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/dt/dev/small docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.dt.dev.small')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
7.4K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant7.4K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/dt/dev/small qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.dt.dev.small.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

scoreddocs
6.6M scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/dt/dev/small scoreddocs --format tsv
[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

run = datamaestro.prepare_dataset('irds.mmarco.v2.dt.dev.small.scoreddocs') # AdhocRun
# A run is a generic object, and is specialized into final classes
# e.g. TrecAdhocRun 

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/dt/train"

Version of msmarco-passage/train, with queries and documents translated into Dutch.

queries
809K queries

Language: dt

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/dt/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.dt.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/dt

Language: dt

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/dt/train docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.dt.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
533K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant533K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/dt/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.dt.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

docpairs
40M docpairs
Document Pair type:
GenericDocPair: (namedtuple)
  1. query_id: str
  2. doc_id_a: str
  3. doc_id_b: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/dt/train docpairs
[query_id]    [doc_id_a]    [doc_id_b]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

docpairs = datamaestro.prepare_dataset('irds.mmarco.v2.dt.train.docpairs')
next(docpairs.iter())  # Display the first triplet

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/es"

Version of msmarco-passage, with queries and documents translated into Spanish.

docs
8.8M docs

Language: es

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/es docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.es')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/es/dev"

Version of msmarco-passage/dev, with queries and documents translated into Spanish.

queries
101K queries

Language: es

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/es/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.es.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/es

Language: es

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/es/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.es.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
59K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant59K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/es/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.es.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/es/dev/small"

Version of msmarco-passage/dev/small, with queries and documents translated into Spanish.

queries
7.0K queries

Language: es

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/es/dev/small queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.es.dev.small.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/es

Language: es

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/es/dev/small docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.es.dev.small')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
7.4K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant7.4K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/es/dev/small qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.es.dev.small.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

scoreddocs
6.8M scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/es/dev/small scoreddocs --format tsv
[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

run = datamaestro.prepare_dataset('irds.mmarco.v2.es.dev.small.scoreddocs') # AdhocRun
# A run is a generic object, and is specialized into final classes
# e.g. TrecAdhocRun 

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/es/train"

Version of msmarco-passage/train, with queries and documents translated into Spanish.

queries
809K queries

Language: es

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/es/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.es.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/es

Language: es

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/es/train docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.es.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
533K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant533K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/es/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.es.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

docpairs
40M docpairs
Document Pair type:
GenericDocPair: (namedtuple)
  1. query_id: str
  2. doc_id_a: str
  3. doc_id_b: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/es/train docpairs
[query_id]    [doc_id_a]    [doc_id_b]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

docpairs = datamaestro.prepare_dataset('irds.mmarco.v2.es.train.docpairs')
next(docpairs.iter())  # Display the first triplet

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/fr"

Version of msmarco-passage, with queries and documents translated into French.

docs
8.8M docs

Language: fr

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/fr docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.fr')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/fr/dev"

Version of msmarco-passage/dev, with queries and documents translated into French.

queries
101K queries

Language: fr

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/fr/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.fr.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/fr

Language: fr

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/fr/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.fr.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
59K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant59K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/fr/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.fr.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/fr/dev/small"

Version of msmarco-passage/dev/small, with queries and documents translated into French.

queries
7.0K queries

Language: fr

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/fr/dev/small queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.fr.dev.small.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/fr

Language: fr

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/fr/dev/small docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.fr.dev.small')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
7.4K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant7.4K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/fr/dev/small qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.fr.dev.small.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

scoreddocs
6.8M scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/fr/dev/small scoreddocs --format tsv
[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

run = datamaestro.prepare_dataset('irds.mmarco.v2.fr.dev.small.scoreddocs') # AdhocRun
# A run is a generic object, and is specialized into final classes
# e.g. TrecAdhocRun 

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/fr/train"

Version of msmarco-passage/train, with queries and documents translated into French.

queries
809K queries

Language: fr

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/fr/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.fr.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/fr

Language: fr

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/fr/train docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.fr.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
533K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant533K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/fr/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.fr.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

docpairs
40M docpairs
Document Pair type:
GenericDocPair: (namedtuple)
  1. query_id: str
  2. doc_id_a: str
  3. doc_id_b: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/fr/train docpairs
[query_id]    [doc_id_a]    [doc_id_b]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

docpairs = datamaestro.prepare_dataset('irds.mmarco.v2.fr.train.docpairs')
next(docpairs.iter())  # Display the first triplet

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/hi"

Version of msmarco-passage, with queries and documents translated into Hindi.

docs
8.8M docs

Language: hi

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/hi docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.hi')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/hi/dev"

Version of msmarco-passage/dev, with queries and documents translated into Hindi.

queries
101K queries

Language: hi

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/hi/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.hi.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/hi

Language: hi

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/hi/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.hi.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
59K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant59K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/hi/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.hi.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/hi/dev/small"

Version of msmarco-passage/dev/small, with queries and documents translated into Hindi.

queries
7.0K queries

Language: hi

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/hi/dev/small queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.hi.dev.small.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/hi

Language: hi

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/hi/dev/small docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.hi.dev.small')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
7.4K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant7.4K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/hi/dev/small qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.hi.dev.small.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

scoreddocs
7.0M scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/hi/dev/small scoreddocs --format tsv
[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

run = datamaestro.prepare_dataset('irds.mmarco.v2.hi.dev.small.scoreddocs') # AdhocRun
# A run is a generic object, and is specialized into final classes
# e.g. TrecAdhocRun 

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/hi/train"

Version of msmarco-passage/train, with queries and documents translated into Hindi.

queries
809K queries

Language: hi

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/hi/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.hi.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/hi

Language: hi

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/hi/train docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.hi.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
533K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant533K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/hi/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.hi.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

docpairs
40M docpairs
Document Pair type:
GenericDocPair: (namedtuple)
  1. query_id: str
  2. doc_id_a: str
  3. doc_id_b: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/hi/train docpairs
[query_id]    [doc_id_a]    [doc_id_b]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

docpairs = datamaestro.prepare_dataset('irds.mmarco.v2.hi.train.docpairs')
next(docpairs.iter())  # Display the first triplet

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/id"

Version of msmarco-passage, with queries and documents translated into Indonesian.

docs
8.8M docs

Language: id

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/id docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.id')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/id/dev"

Version of msmarco-passage/dev, with queries and documents translated into Indonesian.

queries
101K queries

Language: id

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/id/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.id.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/id

Language: id

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/id/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.id.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
59K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant59K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/id/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.id.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/id/dev/small"

Version of msmarco-passage/dev/small, with queries and documents translated into Indonesian.

queries
7.0K queries

Language: id

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/id/dev/small queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.id.dev.small.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/id

Language: id

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/id/dev/small docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.id.dev.small')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
7.4K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant7.4K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/id/dev/small qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.id.dev.small.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

scoreddocs
6.8M scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/id/dev/small scoreddocs --format tsv
[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

run = datamaestro.prepare_dataset('irds.mmarco.v2.id.dev.small.scoreddocs') # AdhocRun
# A run is a generic object, and is specialized into final classes
# e.g. TrecAdhocRun 

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/id/train"

Version of msmarco-passage/train, with queries and documents translated into Indonesian.

queries
809K queries

Language: id

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/id/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.id.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/id

Language: id

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/id/train docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.id.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
533K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant533K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/id/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.id.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

docpairs
40M docpairs
Document Pair type:
GenericDocPair: (namedtuple)
  1. query_id: str
  2. doc_id_a: str
  3. doc_id_b: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/id/train docpairs
[query_id]    [doc_id_a]    [doc_id_b]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

docpairs = datamaestro.prepare_dataset('irds.mmarco.v2.id.train.docpairs')
next(docpairs.iter())  # Display the first triplet

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/it"

Version of msmarco-passage, with queries and documents translated into Italian.

docs
8.8M docs

Language: it

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/it docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.it')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/it/dev"

Version of msmarco-passage/dev, with queries and documents translated into Italian.

queries
101K queries

Language: it

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/it/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.it.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/it

Language: it

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/it/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.it.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
59K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant59K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/it/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.it.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/it/dev/small"

Version of msmarco-passage/dev/small, with queries and documents translated into Italian.

queries
7.0K queries

Language: it

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/it/dev/small queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.it.dev.small.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/it

Language: it

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/it/dev/small docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.it.dev.small')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
7.4K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant7.4K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/it/dev/small qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.it.dev.small.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

scoreddocs
7.0M scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/it/dev/small scoreddocs --format tsv
[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

run = datamaestro.prepare_dataset('irds.mmarco.v2.it.dev.small.scoreddocs') # AdhocRun
# A run is a generic object, and is specialized into final classes
# e.g. TrecAdhocRun 

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/it/train"

Version of msmarco-passage/train, with queries and documents translated into Italian.

queries
809K queries

Language: it

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/it/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.it.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/it

Language: it

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/it/train docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.it.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
533K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant533K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/it/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.it.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

docpairs
40M docpairs
Document Pair type:
GenericDocPair: (namedtuple)
  1. query_id: str
  2. doc_id_a: str
  3. doc_id_b: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/it/train docpairs
[query_id]    [doc_id_a]    [doc_id_b]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

docpairs = datamaestro.prepare_dataset('irds.mmarco.v2.it.train.docpairs')
next(docpairs.iter())  # Display the first triplet

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/ja"

Version of msmarco-passage, with queries and documents translated into Japanese.

docs
8.8M docs

Language: ja

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ja docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.ja')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/ja/dev"

Version of msmarco-passage/dev, with queries and documents translated into Japanese.

queries
101K queries

Language: ja

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ja/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.ja.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/ja

Language: ja

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ja/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.ja.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
59K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant59K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ja/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.ja.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/ja/dev/small"

Version of msmarco-passage/dev/small, with queries and documents translated into Japanese.

queries
7.0K queries

Language: ja

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ja/dev/small queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.ja.dev.small.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/ja

Language: ja

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ja/dev/small docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.ja.dev.small')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
7.4K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant7.4K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ja/dev/small qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.ja.dev.small.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

scoreddocs
6.8M scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ja/dev/small scoreddocs --format tsv
[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

run = datamaestro.prepare_dataset('irds.mmarco.v2.ja.dev.small.scoreddocs') # AdhocRun
# A run is a generic object, and is specialized into final classes
# e.g. TrecAdhocRun 

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/ja/train"

Version of msmarco-passage/train, with queries and documents translated into Japanese.

queries
809K queries

Language: ja

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ja/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.ja.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/ja

Language: ja

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ja/train docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.ja.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
533K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant533K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ja/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.ja.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

docpairs
40M docpairs
Document Pair type:
GenericDocPair: (namedtuple)
  1. query_id: str
  2. doc_id_a: str
  3. doc_id_b: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ja/train docpairs
[query_id]    [doc_id_a]    [doc_id_b]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

docpairs = datamaestro.prepare_dataset('irds.mmarco.v2.ja.train.docpairs')
next(docpairs.iter())  # Display the first triplet

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/pt"

Version of msmarco-passage, with queries and documents translated into Portuguese.

docs
8.8M docs

Language: pt

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/pt docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.pt')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/pt/dev"

Version of msmarco-passage/dev, with queries and documents translated into Portuguese.

queries
101K queries

Language: pt

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/pt/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.pt.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/pt

Language: pt

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/pt/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.pt.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
59K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant59K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/pt/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.pt.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/pt/dev/small"

Version of msmarco-passage/dev/small, with queries and documents translated into Portuguese.

queries
7.0K queries

Language: pt

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/pt/dev/small queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.pt.dev.small.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/pt

Language: pt

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/pt/dev/small docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.pt.dev.small')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
7.4K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant7.4K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/pt/dev/small qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.pt.dev.small.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

scoreddocs
7.0M scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/pt/dev/small scoreddocs --format tsv
[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

run = datamaestro.prepare_dataset('irds.mmarco.v2.pt.dev.small.scoreddocs') # AdhocRun
# A run is a generic object, and is specialized into final classes
# e.g. TrecAdhocRun 

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/pt/train"

Version of msmarco-passage/train, with queries and documents translated into Portuguese.

queries
809K queries

Language: pt

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/pt/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.pt.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/pt

Language: pt

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/pt/train docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.pt.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
533K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant533K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/pt/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.pt.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

docpairs
40M docpairs
Document Pair type:
GenericDocPair: (namedtuple)
  1. query_id: str
  2. doc_id_a: str
  3. doc_id_b: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/pt/train docpairs
[query_id]    [doc_id_a]    [doc_id_b]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

docpairs = datamaestro.prepare_dataset('irds.mmarco.v2.pt.train.docpairs')
next(docpairs.iter())  # Display the first triplet

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/ru"

Version of msmarco-passage, with queries and documents translated into Russian.

docs
8.8M docs

Language: ru

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ru docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.ru')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/ru/dev"

Version of msmarco-passage/dev, with queries and documents translated into Russian.

queries
101K queries

Language: ru

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ru/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.ru.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/ru

Language: ru

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ru/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.ru.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
59K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant59K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ru/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.ru.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/ru/dev/small"

Version of msmarco-passage/dev/small, with queries and documents translated into Russian.

queries
7.0K queries

Language: ru

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ru/dev/small queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.ru.dev.small.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/ru

Language: ru

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ru/dev/small docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.ru.dev.small')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
7.4K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant7.4K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ru/dev/small qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.ru.dev.small.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

scoreddocs
6.9M scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ru/dev/small scoreddocs --format tsv
[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

run = datamaestro.prepare_dataset('irds.mmarco.v2.ru.dev.small.scoreddocs') # AdhocRun
# A run is a generic object, and is specialized into final classes
# e.g. TrecAdhocRun 

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/ru/train"

Version of msmarco-passage/train, with queries and documents translated into Russian.

queries
809K queries

Language: ru

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ru/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.ru.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/ru

Language: ru

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ru/train docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.ru.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
533K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant533K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ru/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.ru.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

docpairs
40M docpairs
Document Pair type:
GenericDocPair: (namedtuple)
  1. query_id: str
  2. doc_id_a: str
  3. doc_id_b: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/ru/train docpairs
[query_id]    [doc_id_a]    [doc_id_b]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

docpairs = datamaestro.prepare_dataset('irds.mmarco.v2.ru.train.docpairs')
next(docpairs.iter())  # Display the first triplet

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/vi"

Version of msmarco-passage, with queries and documents translated into Vietnamese.

docs
8.8M docs

Language: vi

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/vi docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.vi')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/vi/dev"

Version of msmarco-passage/dev, with queries and documents translated into Vietnamese.

queries
101K queries

Language: vi

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/vi/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.vi.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/vi

Language: vi

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/vi/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.vi.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
59K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant59K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/vi/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.vi.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/vi/dev/small"

Version of msmarco-passage/dev/small, with queries and documents translated into Vietnamese.

queries
7.0K queries

Language: vi

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/vi/dev/small queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.vi.dev.small.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/vi

Language: vi

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/vi/dev/small docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.vi.dev.small')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
7.4K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant7.4K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/vi/dev/small qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.vi.dev.small.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

scoreddocs
7.0M scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/vi/dev/small scoreddocs --format tsv
[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

run = datamaestro.prepare_dataset('irds.mmarco.v2.vi.dev.small.scoreddocs') # AdhocRun
# A run is a generic object, and is specialized into final classes
# e.g. TrecAdhocRun 

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/vi/train"

Version of msmarco-passage/train, with queries and documents translated into Vietnamese.

queries
809K queries

Language: vi

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/vi/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.vi.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/vi

Language: vi

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/vi/train docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.vi.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
533K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant533K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/vi/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.vi.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

docpairs
40M docpairs
Document Pair type:
GenericDocPair: (namedtuple)
  1. query_id: str
  2. doc_id_a: str
  3. doc_id_b: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/vi/train docpairs
[query_id]    [doc_id_a]    [doc_id_b]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

docpairs = datamaestro.prepare_dataset('irds.mmarco.v2.vi.train.docpairs')
next(docpairs.iter())  # Display the first triplet

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/zh"

Version of msmarco-passage, with queries and documents translated into Chinese.

docs
8.8M docs

Language: zh

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/zh docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.zh')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/zh/dev"

Version of msmarco-passage/dev, with queries and documents translated into Chinese.

queries
101K queries

Language: zh

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/zh/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.zh.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/zh

Language: zh

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/zh/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.zh.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
59K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant59K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/zh/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.zh.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/zh/dev/small"

Version of msmarco-passage/dev/small, with queries and documents translated into Chinese.

queries
7.0K queries

Language: zh

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/zh/dev/small queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.zh.dev.small.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/zh

Language: zh

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/zh/dev/small docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.zh.dev.small')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
7.4K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant7.4K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/zh/dev/small qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.zh.dev.small.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

scoreddocs
7.0M scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/zh/dev/small scoreddocs --format tsv
[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

run = datamaestro.prepare_dataset('irds.mmarco.v2.zh.dev.small.scoreddocs') # AdhocRun
# A run is a generic object, and is specialized into final classes
# e.g. TrecAdhocRun 

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/v2/zh/train"

Version of msmarco-passage/train, with queries and documents translated into Chinese.

queries
809K queries

Language: zh

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/zh/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.zh.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/v2/zh

Language: zh

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/zh/train docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.zh.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
533K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant533K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/zh/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.zh.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

docpairs
40M docpairs
Document Pair type:
GenericDocPair: (namedtuple)
  1. query_id: str
  2. doc_id_a: str
  3. doc_id_b: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/v2/zh/train docpairs
[query_id]    [doc_id_a]    [doc_id_b]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

docpairs = datamaestro.prepare_dataset('irds.mmarco.v2.zh.train.docpairs')
next(docpairs.iter())  # Display the first triplet

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/zh"

Version of msmarco-passage, with documents translated into Chinese.

docs
8.8M docs

Language: zh

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/zh")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/zh docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.zh')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/zh/dev"

Version of msmarco-passage/dev, with queries and documents translated into Chinese.

queries
101K queries

Language: zh

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/zh/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.zh.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/zh

Language: zh

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/zh/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.zh.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
59K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant59K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/zh/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.zh.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/zh/dev/small"

Version of msmarco-passage/dev/small, with queries and documents translated into Chinese.

queries
7.0K queries

Language: zh

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/zh/dev/small queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.zh.dev.small.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/zh

Language: zh

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/zh/dev/small docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.zh.dev.small')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
7.4K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant7.4K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/zh/dev/small qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.zh.dev.small.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/zh/dev/small/v1.1"

Version of msmarco-passage/dev, with queries and documents translated into Chinese.

Version 1.1 of this file includes manual corrections from the authorss of the translated files. See discussion here.

queries
7.0K queries

Language: zh

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev/small/v1.1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/zh/dev/small/v1.1 queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.zh.dev.small.v1.1.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/zh

Language: zh

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev/small/v1.1")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/zh/dev/small/v1.1 docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.zh.dev.small.v1.1')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
7.4K qrels

Inherits qrels from mmarco/zh/dev/small

Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant7.4K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev/small/v1.1")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/zh/dev/small/v1.1 qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.zh.dev.small.v1.1.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

scoreddocs
1.0M scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev/small/v1.1")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/zh/dev/small/v1.1 scoreddocs --format tsv
[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

run = datamaestro.prepare_dataset('irds.mmarco.zh.dev.small.v1.1.scoreddocs') # AdhocRun
# A run is a generic object, and is specialized into final classes
# e.g. TrecAdhocRun 

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/zh/dev/v1.1"

Version of msmarco-passage/dev, with queries and documents translated into Chinese.

Version 1.1 of this file includes manual corrections from the authorss of the translated files. See discussion here.

queries
101K queries

Language: zh

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev/v1.1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/zh/dev/v1.1 queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.zh.dev.v1.1.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/zh

Language: zh

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev/v1.1")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/zh/dev/v1.1 docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.zh.dev.v1.1')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
59K qrels

Inherits qrels from mmarco/zh/dev

Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant59K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev/v1.1")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/zh/dev/v1.1 qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.zh.dev.v1.1.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata

"mmarco/zh/train"

Version of msmarco-passage/train, with queries and documents translated into Chinese.

queries
809K queries

Language: zh

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/zh/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.zh.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.8M docs

Inherits docs from mmarco/zh

Language: zh

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/zh/train docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.zh.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
533K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Labeled by crowd worker as relevant533K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/zh/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.zh.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

docpairs
40M docpairs
Document Pair type:
GenericDocPair: (namedtuple)
  1. query_id: str
  2. doc_id_a: str
  3. doc_id_b: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>

You can find more details about the Python API here.

CLI
ir_datasets export mmarco/zh/train docpairs
[query_id]    [doc_id_a]    [doc_id_b]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
import datamaestro # Supposes experimaestro-ir be installed

docpairs = datamaestro.prepare_dataset('irds.mmarco.zh.train.docpairs')
next(docpairs.iter())  # Display the first triplet

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets

Citation

ir_datasets.bib:

\cite{Bonifacio2021MMarco}

Bibtex:

@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata