ir_datasets
: mMARCOA version of the MS MARCO passage dataset (msmarco-passage) with the queries and documents automatically translated into several languages.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }Version of msmarco-passage, with documents translated into German.
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/de")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into German.
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/de/dev")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/de/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/de
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/de/dev")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/de/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/de/dev")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/de/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/train, with queries and documents translated into German.
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/de/train")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/de/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/de
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/de/train")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/de/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/de/train")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/de/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } } }
Version of msmarco-passage, with documents translated into Spanish.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/es")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Spanish.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/es/dev")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/es/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/es
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/es/dev")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/es/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/es/dev")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/es/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101092 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/train, with queries and documents translated into Spanish.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/es/train")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/es/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/es
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/es/train")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/es/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/es/train")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/es/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } } }
Version of msmarco-passage, with documents translated into French.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/fr")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into French.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/dev")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/fr/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/fr
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/dev")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/fr/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/dev")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/fr/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/train, with queries and documents translated into French.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/train")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/fr/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/fr
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/train")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/fr/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/train")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/fr/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } } }
Version of msmarco-passage, with documents translated into Indonesian.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/id")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Indonesian.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/id/dev")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/id/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/id
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/id/dev")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/id/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/id/dev")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/id/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/train, with queries and documents translated into Indonesian.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/id/train")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/id/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/id
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/id/train")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/id/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/id/train")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/id/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } } }
Version of msmarco-passage, with documents translated into Italian.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/it")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Italian.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/it/dev")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/it/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/it
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/it/dev")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/it/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/it/dev")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/it/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/train, with queries and documents translated into Italian.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/it/train")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/it/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/it
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/it/train")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/it/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/it/train")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/it/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } } }
Version of msmarco-passage, with documents translated into Portuguese.
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Portuguese.
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/pt
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101619 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/train, with queries and documents translated into Portuguese.
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/train")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/pt
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/train")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/train")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 811690 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } } }
Version of msmarco-passage, with documents translated into Russian.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/ru")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Russian.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/dev")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/ru/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/ru
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/dev")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/ru/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/dev")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/ru/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/train, with queries and documents translated into Russian.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/train")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/ru/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/ru
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/train")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/ru/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/train")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/ru/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } } }
Version of msmarco-passage, with documents translated into Chinese.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Chinese.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/zh
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/train, with queries and documents translated into Chinese.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/train")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/zh
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/train")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/train")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } } }