ir_datasets: mMARCO
A version of the MS MARCO passage dataset (msmarco-passage) with the queries and documents automatically translated into several languages.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Version of msmarco-passage, with documents translated into German.
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/de")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
No example available for PyTerrier
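The collection holds roughly 8.8 million passages, so it can help to look at only the first few records before iterating over everything. The following is a minimal sketch: `head` is a hypothetical helper (not part of ir_datasets), and the sample records are stand-ins that only mimic the `doc_id` and `text` fields shown above.

```python
from collections import namedtuple
from itertools import islice


def head(records, n=3):
    """Return the first n records of an iterator as a list (hypothetical helper)."""
    return list(islice(records, n))


# Stand-in records with the same fields as the docs shown above.
Doc = namedtuple("Doc", ["doc_id", "text"])
sample_docs = (Doc(str(i), f"Beispieltext {i}") for i in range(8))

first = head(sample_docs, n=3)
for doc in first:
    print(doc.doc_id, doc.text)

# With the real collection (requires ir_datasets and a download):
#   import ir_datasets
#   first = head(ir_datasets.load("mmarco/de").docs_iter(), n=3)
```

Because `islice` stops after `n` items, the remaining records are never read.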
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into German.
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/de/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/de/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/de
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/de/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/de/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/de/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/de/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
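For evaluation it is often convenient to group the relevance judgments by query. The sketch below shows one way to do that; `qrels_to_dict` is a hypothetical helper, and the sample records are stand-ins with the same fields as the qrels shown above.

```python
from collections import defaultdict, namedtuple


def qrels_to_dict(qrels):
    """Map each query_id to the set of doc_ids judged relevant (relevance > 0)."""
    relevant = defaultdict(set)
    for qrel in qrels:
        if qrel.relevance > 0:
            relevant[qrel.query_id].add(qrel.doc_id)
    return dict(relevant)


# Stand-in records; real ones would come from dataset.qrels_iter().
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])
sample_qrels = [
    Qrel("q1", "d7", 1, "0"),
    Qrel("q1", "d9", 1, "0"),
    Qrel("q2", "d3", 1, "0"),
]

relevant_by_query = qrels_to_dict(sample_qrels)
print(relevant_by_query)
```

Passing `ir_datasets.load("mmarco/de/dev").qrels_iter()` in place of `sample_qrels` should work the same way, since the helper relies only on the field names.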
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into German.
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/de/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/de/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/de
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/de/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/de/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/de/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/de/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/de/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/de/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
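Scored documents and qrels can be combined to measure how well the provided ranking does. The sketch below computes MRR@10 under two assumptions that are not stated on this page: a higher `score` means a better rank, and the average is taken over queries that have at least one relevant document. `mrr_at_k` is a hypothetical helper, and the records are stand-ins.

```python
from collections import defaultdict, namedtuple


def mrr_at_k(scoreddocs, relevant_by_query, k=10):
    """Mean reciprocal rank of the first relevant document within the top k."""
    by_query = defaultdict(list)
    for sd in scoreddocs:
        by_query[sd.query_id].append((sd.score, sd.doc_id))

    total = 0.0
    for query_id, relevant in relevant_by_query.items():
        # Sort by score, highest first (assumed ranking direction).
        ranked = sorted(by_query.get(query_id, []), key=lambda p: p[0], reverse=True)[:k]
        for rank, (_, doc_id) in enumerate(ranked, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(relevant_by_query) if relevant_by_query else 0.0


# Stand-in records; real ones would come from dataset.scoreddocs_iter().
ScoredDoc = namedtuple("ScoredDoc", ["query_id", "doc_id", "score"])
sample_scoreddocs = [
    ScoredDoc("q1", "d1", 3.0),
    ScoredDoc("q1", "d2", 2.0),
    ScoredDoc("q2", "d3", 1.0),
]
sample_relevant = {"q1": {"d2"}, "q2": {"d3"}}

mrr = mrr_at_k(sample_scoreddocs, sample_relevant, k=10)
print(mrr)
```

The helper keeps every scored document in memory, which is fine for a sketch but may be heavy for the several million scoreddocs in this subset; a streaming variant would be preferable at full scale.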
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6594126 } }
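The metadata record is plain JSON, so summary figures such as the percentages in the relevance table can be derived from it. The following sketch parses the mmarco/de/dev/small record quoted above; the variable names are illustrative only.

```python
import json

# Metadata record for mmarco/de/dev/small, copied from this page.
metadata = json.loads("""
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } },
  "queries": { "count": 6980 },
  "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } },
  "scoreddocs": { "count": 6594126 } }
""")

n_queries = metadata["queries"]["count"]
n_qrels = metadata["qrels"]["count"]

# Average number of judgments and of scored documents per query.
qrels_per_query = n_qrels / n_queries
scoreddocs_per_query = metadata["scoreddocs"]["count"] / n_queries

# Share of each relevance level, as in the "%" column of the table.
counts = metadata["qrels"]["fields"]["relevance"]["counts_by_value"]
share = {level: 100.0 * count / n_qrels for level, count in counts.items()}

print(round(qrels_per_query, 2), round(scoreddocs_per_query, 1), share)
```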
Version of msmarco-passage/train, with queries and documents translated into German.
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/de/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/de/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/de
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/de/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/de/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/de/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/de/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/de/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/de/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
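Document pairs carry only identifiers, so training code usually has to resolve them to text. The sketch below does this with plain dictionaries. It assumes, following the usual MS MARCO triples convention and not anything stated on this page, that `doc_id_a` is the relevant passage and `doc_id_b` the non-relevant one. `text_triples` is a hypothetical helper and the records are stand-ins.

```python
from collections import namedtuple


def text_triples(docpairs, query_text, doc_text):
    """Yield (query, passage_a, passage_b) text triples for each document pair."""
    for pair in docpairs:
        yield (
            query_text[pair.query_id],
            doc_text[pair.doc_id_a],  # assumed to be the relevant passage
            doc_text[pair.doc_id_b],  # assumed to be the non-relevant passage
        )


# Stand-in data; real pairs would come from dataset.docpairs_iter().
Docpair = namedtuple("Docpair", ["query_id", "doc_id_a", "doc_id_b"])
query_text = {"q1": "was ist ein Vulkan"}
doc_text = {
    "d1": "Ein Vulkan ist eine geologische Struktur.",
    "d2": "Berlin ist die Hauptstadt von Deutschland.",
}

triples = list(text_triples([Docpair("q1", "d1", "d2")], query_text, doc_text))
print(triples[0])
```

For the full collection, a dictionary holding every passage may not fit comfortably in memory; looking up passages on demand (for example through the `docs_store()` lookup that ir_datasets provides) is a likely alternative.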
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with documents translated into Spanish.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/es")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Spanish.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/es/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/es/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/es
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/es/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/es/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/es/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/es/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101092 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into Spanish.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/es/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/es/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/es
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/es/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/es/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/es/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/es/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/es/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/es/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6786720 } }
Version of msmarco-passage/train, with queries and documents translated into Spanish.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/es/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/es/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/es
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/es/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/es/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/es/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/es/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/es/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/es/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
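The dataset identifiers on this page follow a regular `mmarco/<language>[/<subset>]` pattern, which makes it easy to loop over languages in multilingual experiments. The sketch below generates the identifiers for the languages and subsets that appear in this section; the lists are taken from this page and may not be exhaustive, and `mmarco_ids` is a hypothetical helper.

```python
from itertools import product

# Language codes and subsets as they appear on this page (possibly incomplete).
LANGUAGES = ["de", "es", "fr", "id", "it"]
SUBSETS = ["", "dev", "dev/small", "train"]  # "" is the docs-only base dataset


def mmarco_ids(languages=LANGUAGES, subsets=SUBSETS):
    """Build dataset identifiers such as "mmarco/de/dev/small"."""
    return [
        "/".join(part for part in ("mmarco", lang, subset) if part)
        for lang, subset in product(languages, subsets)
    ]


ids = mmarco_ids()
for dataset_id in ids:
    print(dataset_id)

# Each identifier can then be passed to ir_datasets.load(dataset_id).
```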
Version of msmarco-passage, with documents translated into French.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/fr")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into French.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/fr/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/fr
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/fr/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/fr/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into French.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/fr/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/fr
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/fr/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/fr/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/fr/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6785763 } }
Version of msmarco-passage/train, with queries and documents translated into French.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/fr/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/fr
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/fr/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/fr/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/fr/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with documents translated into Indonesian.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/id")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Indonesian.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/id/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/id/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/id
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/id/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/id/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/id/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/id/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into Indonesian.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/id/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/id/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/id
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/id/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/id/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/id/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/id/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/id/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/id/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6841990 } }
Version of msmarco-passage/train, with queries and documents translated into Indonesian.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/id/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/id/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/id
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/id/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/id/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/id/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/id/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/id/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/id/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with documents translated into Italian.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/it")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Italian.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/it/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/it/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/it
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/it/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/it/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/it/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/it/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into Italian.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/it/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/it/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/it
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/it/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/it/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/it/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/it/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
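The TSV export shown above can be read back with the standard library. The following is a minimal sketch, assuming the four tab-separated columns appear in the order listed (`query_id`, `doc_id`, `relevance`, `iteration`). The helper `read_qrels_tsv` and the sample rows are invented for illustration and are not part of ir_datasets or the dataset.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample of `ir_datasets export ... qrels --format tsv` output.
# The column order is assumed to match the listing above.
SAMPLE_TSV = "q1\td10\t1\t0\nq1\td11\t1\t0\nq2\td20\t1\t0\n"


def read_qrels_tsv(handle):
    """Return {query_id: {doc_id: relevance}} from a TSV qrels stream."""
    qrels = defaultdict(dict)
    for query_id, doc_id, relevance, _iteration in csv.reader(handle, delimiter="\t"):
        qrels[query_id][doc_id] = int(relevance)
    return dict(qrels)


qrels = read_qrels_tsv(io.StringIO(SAMPLE_TSV))
print(qrels)
```

For a real export, the same function could be given an open file handle instead of the in-memory sample.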
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/it/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/it/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
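Scored documents are often consumed per query rather than as a flat stream. The sketch below groups records by `query_id` and keeps the `k` highest-scoring ones. It assumes that a higher `score` means a better match. `ScoredDoc` is a local stand-in for the records yielded by `scoreddocs_iter()`, and the sample values are invented.

```python
import heapq
from collections import defaultdict, namedtuple

# Local stand-in for the namedtuple<query_id, doc_id, score> shown above.
ScoredDoc = namedtuple("ScoredDoc", ["query_id", "doc_id", "score"])


def top_k_per_query(scoreddocs, k):
    """Group scored documents by query and keep the k highest-scoring ones."""
    by_query = defaultdict(list)
    for sd in scoreddocs:
        by_query[sd.query_id].append(sd)
    return {
        qid: heapq.nlargest(k, docs, key=lambda sd: sd.score)
        for qid, docs in by_query.items()
    }


# Invented records for illustration only.
sample = [
    ScoredDoc("q1", "d1", 11.2),
    ScoredDoc("q1", "d2", 14.7),
    ScoredDoc("q1", "d3", 9.0),
    ScoredDoc("q2", "d4", 3.5),
]
top = top_k_per_query(sample, k=2)
```

With roughly seven million records in this subset, grouping everything in memory may be costly. Processing one query at a time would be an alternative if the records arrive grouped by query, which is not stated here.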
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6966491 } }
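The counts in the metadata JSON above can be turned into per-query averages. This sketch copies the counts for mmarco/it/dev/small (the `fields` entries are left out for brevity). The helper `per_query` is hypothetical.

```python
import json

# Counts for mmarco/it/dev/small, copied from the metadata JSON above.
METADATA = json.loads("""
{ "docs": { "count": 8841823 },
  "queries": { "count": 6980 },
  "qrels": { "count": 7437 },
  "scoreddocs": { "count": 6966491 } }
""")


def per_query(metadata, key):
    """Average number of `key` records per query."""
    return metadata[key]["count"] / metadata["queries"]["count"]


qrels_per_query = per_query(METADATA, "qrels")
scoreddocs_per_query = per_query(METADATA, "scoreddocs")
print(f"{qrels_per_query:.2f} qrels and {scoreddocs_per_query:.0f} scored docs per query")
```

The result is about 1.07 relevance judgments and about 998 scored documents per query.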
Version of msmarco-passage/train, with queries and documents translated into Italian.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/it/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/it/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/it
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/it/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/it/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/it/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/it/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/it/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/it/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
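Docpairs carry only IDs, so using them usually means joining against the queries and documents. The sketch below resolves each pair to text with plain dictionaries. `DocPair` is a local stand-in for the records above, the lookups hold invented text, and `to_text_triples` is a hypothetical helper. This page does not state which of `doc_id_a` and `doc_id_b` is the relevant passage, so the sketch treats them neutrally.

```python
from collections import namedtuple

# Local stand-in for the namedtuple<query_id, doc_id_a, doc_id_b> shown above.
DocPair = namedtuple("DocPair", ["query_id", "doc_id_a", "doc_id_b"])

# Invented lookups standing in for the queries and docs of mmarco/it/train.
queries = {"q1": "testo della query"}
docs = {"d1": "primo passaggio", "d2": "secondo passaggio"}


def to_text_triples(docpairs, queries, docs):
    """Yield (query_text, text_a, text_b) for each docpair with known IDs."""
    for pair in docpairs:
        try:
            yield (queries[pair.query_id], docs[pair.doc_id_a], docs[pair.doc_id_b])
        except KeyError:
            # Skip pairs that reference an ID missing from the lookups.
            continue


pairs = [DocPair("q1", "d1", "d2"), DocPair("q9", "d1", "d2")]
triples = list(to_text_triples(pairs, queries, docs))
```

A dictionary of all 8.8 million passages may not fit comfortably in memory, so a real pipeline would likely need an on-disk lookup instead.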
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with documents translated into Portuguese.
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Portuguese.
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/pt
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101619 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into Portuguese.
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/pt
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 7000 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into Portuguese.
Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here. It also removes some duplicated query IDs.
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev/small/v1.1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/dev/small/v1.1 queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
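Version 1.1 is described as removing some duplicated query IDs. The sketch below shows one way such duplicates could be detected and dropped in a list of query records. It is an illustration only and not the procedure the maintainers used. Keeping the first record per ID is an assumption, and `Query`, `duplicated_query_ids`, and `drop_duplicates` are hypothetical names with invented sample data.

```python
from collections import Counter, namedtuple

# Local stand-in for the namedtuple<query_id, text> shown above.
Query = namedtuple("Query", ["query_id", "text"])


def duplicated_query_ids(queries):
    """Return the query IDs that occur more than once, in sorted order."""
    counts = Counter(q.query_id for q in queries)
    return sorted(qid for qid, n in counts.items() if n > 1)


def drop_duplicates(queries):
    """Keep the first record seen for each query ID (an assumed policy)."""
    seen = set()
    unique = []
    for q in queries:
        if q.query_id not in seen:
            seen.add(q.query_id)
            unique.append(q)
    return unique


# Invented records for illustration only.
sample = [Query("1", "a"), Query("2", "b"), Query("1", "a (repetida)")]
dupes = duplicated_query_ids(sample)
unique = drop_duplicates(sample)
```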
Inherits docs from mmarco/pt
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev/small/v1.1")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/dev/small/v1.1 docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits qrels from mmarco/pt/dev/small
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev/small/v1.1")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/dev/small/v1.1 qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev/small/v1.1")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/dev/small/v1.1 scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6976324 } }
Version of msmarco-passage/dev, with queries and documents translated into Portuguese.
Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here. It also removes some duplicated query IDs.
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev/v1.1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/dev/v1.1 queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/pt
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev/v1.1")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/dev/v1.1 docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits qrels from mmarco/pt/dev
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev/v1.1")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/dev/v1.1 qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/train, with queries and documents translated into Portuguese.
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/pt
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 811690 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage/train, with queries and documents translated into Portuguese.
Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here. It also removes some duplicated query IDs.
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/train/v1.1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/train/v1.1 queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/pt
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/train/v1.1")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/train/v1.1 docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits qrels from mmarco/pt/train
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/train/v1.1")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/train/v1.1 qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docpairs from mmarco/pt/train
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/train/v1.1")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/train/v1.1 docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
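The query counts reported for the Portuguese subsets differ between the original files and their v1.1 counterparts. This short sketch computes the difference per split from the counts listed in the metadata on this page. Whether the whole difference is due to duplicate removal is not stated, so the numbers should be read only as a difference in counts.

```python
# Query counts copied from the metadata JSON of each Portuguese subset.
QUERY_COUNTS = {
    "dev": {"v1": 101619, "v1.1": 101093},
    "dev/small": {"v1": 7000, "v1.1": 6980},
    "train": {"v1": 811690, "v1.1": 808731},
}

# Number of queries present in the original subset but not in v1.1.
removed = {split: c["v1"] - c["v1.1"] for split, c in QUERY_COUNTS.items()}

for split, n in removed.items():
    print(f"mmarco/pt/{split}: {n} fewer queries in v1.1")
```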
Version of msmarco-passage, with documents translated into Russian.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/ru")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Russian.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/ru/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/ru
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/ru/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/ru/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into Russian.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/ru/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/ru
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/ru/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/ru/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/ru/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
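Because this subset provides both qrels and scoreddocs, the two can be joined to see which judged passages appear among the scored candidates. The sketch below does this with in-memory sets. `Qrel` and `ScoredDoc` are local stand-ins for the records above, `judged_pairs_in_candidates` is a hypothetical helper, and the sample values are invented.

```python
from collections import namedtuple

# Local stand-ins for the record types shown above.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])
ScoredDoc = namedtuple("ScoredDoc", ["query_id", "doc_id", "score"])


def judged_pairs_in_candidates(qrels, scoreddocs):
    """Return the relevant (query_id, doc_id) pairs found among the scored
    candidates, plus the fraction of relevant pairs that were found."""
    candidates = {(sd.query_id, sd.doc_id) for sd in scoreddocs}
    judged = [(q.query_id, q.doc_id) for q in qrels if q.relevance > 0]
    found = [pair for pair in judged if pair in candidates]
    fraction = len(found) / len(judged) if judged else 0.0
    return found, fraction


# Invented records for illustration only.
sample_qrels = [Qrel("q1", "d2", 1, "0"), Qrel("q2", "d9", 1, "0")]
sample_scored = [
    ScoredDoc("q1", "d1", 5.0),
    ScoredDoc("q1", "d2", 4.0),
    ScoredDoc("q2", "d3", 2.0),
]
found, fraction = judged_pairs_in_candidates(sample_qrels, sample_scored)
```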
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6958739 } }
Version of msmarco-passage/train, with queries and documents translated into Russian.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/ru/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/ru
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/ru/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/ru/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/ru/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with queries and documents translated into Arabic.
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ar docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
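The exported docs listing has two columns, `doc_id` and `text`. The sketch below reads such output into a dictionary. It assumes the two fields are separated by a single tab and that each record occupies one line, neither of which is stated on this page. `read_docs_tsv` is a hypothetical helper, and the sample uses English placeholder text rather than real Arabic passages.

```python
import io

# Invented two-line sample in the "[doc_id] [text]" layout shown above,
# assuming a single tab between the fields.
SAMPLE = "0\tfirst passage text\n1\tsecond passage text\n"


def read_docs_tsv(handle):
    """Yield (doc_id, text) pairs, splitting each line on its first tab only."""
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        doc_id, text = line.split("\t", 1)
        yield doc_id, text


docs = dict(read_docs_tsv(io.StringIO(SAMPLE)))
```

Splitting on the first tab only keeps any later tab characters inside the passage text.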
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Arabic.
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ar/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/ar
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ar/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ar/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into Arabic.
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ar/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/ar
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ar/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ar/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ar/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6848687 } }
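The dev/small subsets that ship scoreddocs report different totals per language even though each lists 6980 queries. The sketch below compares the average number of scored candidates per query, using only counts copied from the metadata on this page. It says nothing about why the totals differ.

```python
# scoreddocs counts copied from the dev/small metadata on this page.
SCOREDDOCS = {
    "mmarco/it/dev/small": 6966491,
    "mmarco/pt/dev/small/v1.1": 6976324,
    "mmarco/ru/dev/small": 6958739,
    "mmarco/v2/ar/dev/small": 6848687,
}
NUM_QUERIES = 6980  # query count listed for each of these subsets

# Average number of scored candidates per query for each subset.
avg_candidates = {name: count / NUM_QUERIES for name, count in SCOREDDOCS.items()}

# Subset with the fewest scored candidates per query on average.
fewest = min(avg_candidates, key=avg_candidates.get)

for name, avg in avg_candidates.items():
    print(f"{name}: {avg:.1f} scored docs per query")
```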
Version of msmarco-passage/train, with queries and documents translated into Arabic.
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ar/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/ar
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ar/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ar/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ar/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with queries and documents translated into German.
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/de docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
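The dataset IDs on this page follow a regular pattern: `mmarco`, an optional `v2`, a language code, and an optional split. The helper below assembles such IDs when looping over several languages. `mmarco_id` is a hypothetical convenience function and not part of ir_datasets. It does not check that an ID exists, and only combinations listed on this page should be expected to load.

```python
def mmarco_id(lang, split=None, v2=False):
    """Build an ID such as "mmarco/v2/de/dev/small" from its parts."""
    parts = ["mmarco"]
    if v2:
        parts.append("v2")
    parts.append(lang)
    if split:
        parts.append(split)
    return "/".join(parts)


# IDs for the v2 dev/small subsets of two languages documented on this page.
ids = [mmarco_id(lang, "dev/small", v2=True) for lang in ("ar", "de")]
print(ids)
```

Each resulting string could then be passed to `ir_datasets.load(...)` as in the examples on this page.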
Version of msmarco-passage/dev, with queries and documents translated into German.
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/de/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/de
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/de/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/de/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
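Evaluation code usually needs the judgments grouped by query rather than as a flat stream. The following is a minimal sketch, assuming only the `query_id`, `doc_id`, and `relevance` fields listed above. `group_qrels` and the stand-in `Qrel` records are hypothetical and exist only for this illustration.

```python
from collections import defaultdict, namedtuple


def group_qrels(qrels):
    """Map each query_id to a {doc_id: relevance} dict."""
    by_query = defaultdict(dict)
    for qrel in qrels:
        by_query[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(by_query)


# Stand-in records that mirror the namedtuple layout shown above.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])
sample_qrels = [
    Qrel("q1", "d3", 1, "0"),
    Qrel("q1", "d7", 1, "0"),
    Qrel("q2", "d1", 1, "0"),
]
grouped = group_qrels(sample_qrels)

# With the real data (downloads on first use):
#   import ir_datasets
#   group_qrels(ir_datasets.load("mmarco/v2/de/dev").qrels_iter())
```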
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into German.
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/de/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/de
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/de/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/de/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/de/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
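The scored documents arrive as a flat stream of (query, document, score) records, so turning them into per-query rankings takes one grouping step. The following is a sketch under two assumptions: that a higher `score` means a better match, and that all records fit in memory. `top_k_by_query` and the stand-in `ScoredDoc` records are hypothetical names.

```python
from collections import defaultdict, namedtuple


def top_k_by_query(scoreddocs, k=10):
    """Return {query_id: [doc_id, ...]} with the k highest-scoring docs first."""
    per_query = defaultdict(list)
    for sd in scoreddocs:
        per_query[sd.query_id].append((sd.score, sd.doc_id))
    return {
        qid: [doc_id for _, doc_id in
              sorted(pairs, key=lambda p: p[0], reverse=True)[:k]]
        for qid, pairs in per_query.items()
    }


# Stand-in records that mirror the namedtuple layout shown above.
ScoredDoc = namedtuple("ScoredDoc", ["query_id", "doc_id", "score"])
sample_scored = [
    ScoredDoc("q1", "d1", 2.5),
    ScoredDoc("q1", "d2", 7.0),
    ScoredDoc("q1", "d3", 4.1),
    ScoredDoc("q2", "d9", 1.0),
]
rankings = top_k_by_query(sample_scored, k=2)
```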
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6586918 } }
Version of msmarco-passage/train, with queries and documents translated into German.
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/de/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/de
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/de/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/de/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/de/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
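Document pairs carry only IDs, so a training loop has to resolve them to text first. The following is a minimal sketch, assuming the query and document texts are already available as plain dictionaries keyed by ID. `to_text_triples`, the dictionaries, and the stand-in `DocPair` records are hypothetical and only illustrate the lookup.

```python
from collections import namedtuple


def to_text_triples(docpairs, query_text, doc_text):
    """Yield (query, doc_a, doc_b) strings, skipping pairs with unknown IDs."""
    for pair in docpairs:
        try:
            yield (
                query_text[pair.query_id],
                doc_text[pair.doc_id_a],
                doc_text[pair.doc_id_b],
            )
        except KeyError:
            continue  # an ID was not found in the lookup tables


# Stand-in data that mirrors the namedtuple layout shown above.
DocPair = namedtuple("DocPair", ["query_id", "doc_id_a", "doc_id_b"])
queries = {"q1": "Was ist Python?"}
docs = {
    "d1": "Python ist eine Programmiersprache.",
    "d2": "Berlin ist eine Stadt.",
}
pairs = [DocPair("q1", "d1", "d2"), DocPair("q1", "d1", "missing")]
triples = list(to_text_triples(pairs, queries, docs))
```

For the full collection, an in-memory dictionary of all passages may be too large, in which case a disk-backed lookup would be the more practical choice.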
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with queries and documents translated into Dutch.
Language: nl
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/dt docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Dutch.
Language: nl
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/dt/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/dt
Language: nl
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/dt/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/dt/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into Dutch.
Language: nl
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/dt/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/dt
Language: nl
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/dt/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/dt/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/dt/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6608183 } }
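The JSON record that closes each entry can be read programmatically, for example to see how many judgments and scored documents exist per query. The following is a small sketch that parses a trimmed copy of the counts listed for mmarco/v2/dt/dev/small. `per_query_averages` is a hypothetical helper, and the trimmed string omits the `fields` sub-objects for brevity.

```python
import json

# Trimmed copy of the counts shown for mmarco/v2/dt/dev/small.
metadata_json = (
    '{"docs": {"count": 8841823}, "queries": {"count": 6980}, '
    '"qrels": {"count": 7437}, "scoreddocs": {"count": 6608183}}'
)


def per_query_averages(raw):
    """Return average record counts per query for the record types present."""
    meta = json.loads(raw)
    n_queries = meta["queries"]["count"]
    return {
        name: meta[name]["count"] / n_queries
        for name in ("qrels", "scoreddocs", "docpairs")
        if name in meta
    }


averages = per_query_averages(metadata_json)
```

For this subset the result shows just over one relevance judgment per query, consistent with the single relevance level in the table.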
Version of msmarco-passage/train, with queries and documents translated into Dutch.
Language: nl
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/dt/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/dt
Language: nl
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/dt/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/dt/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/dt/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with queries and documents translated into Spanish.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/es docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Spanish.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/es/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/es
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/es/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/es/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into Spanish.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/es/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/es
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/es/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/es/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/es/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
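Since this subset provides both relevance judgments and scored documents, the two can be combined to evaluate a ranking. The sketch below computes mean reciprocal rank at a cutoff of 10, one common choice for passage ranking. It assumes rankings and judgments have already been grouped per query into plain dictionaries, and `mrr_at_k` is a hypothetical helper rather than an ir_datasets function.

```python
def mrr_at_k(rankings, relevant, k=10):
    """Mean reciprocal rank of the first relevant doc within the top k.

    rankings: {query_id: [doc_id, ...]} ordered best first
    relevant: {query_id: set of relevant doc_ids}
    Queries with no relevant doc in the top k contribute 0.
    """
    if not rankings:
        return 0.0
    total = 0.0
    for qid, ranked in rankings.items():
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in relevant.get(qid, ()):
                total += 1.0 / rank
                break
    return total / len(rankings)


# Toy data: q1 hits at rank 2, q2 at rank 1, q3 has no hit.
toy_rankings = {"q1": ["d2", "d3", "d1"], "q2": ["d9", "d8"], "q3": ["d5"]}
toy_relevant = {"q1": {"d3"}, "q2": {"d9"}, "q3": {"d4"}}
score = mrr_at_k(toy_rankings, toy_relevant)  # (1/2 + 1 + 0) / 3
```

Note that this version averages over the queries present in `rankings`; averaging over all judged queries instead is an equally reasonable convention.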
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6777044 } }
Version of msmarco-passage/train, with queries and documents translated into Spanish.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/es/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/es
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/es/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/es/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/es/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with queries and documents translated into French.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/fr docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into French.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/fr/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/fr
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/fr/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/fr/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
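Output saved from the `--format tsv` export can be read back with the standard library. The following is a minimal sketch, assuming the columns appear in the order listed above (query_id, doc_id, relevance, iteration) and are separated by tabs. `parse_qrels_tsv` and the sample string are hypothetical.

```python
import csv
import io


def parse_qrels_tsv(text):
    """Parse tab-separated qrels lines into (query_id, doc_id, relevance) tuples."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    # Rows with fewer than three columns (e.g. blank lines) are skipped.
    return [(row[0], row[1], int(row[2])) for row in reader if len(row) >= 3]


# Made-up sample in the assumed column order; not real export output.
sample_tsv = "q1\td3\t1\t0\nq2\td1\t1\t0\n"
parsed = parse_qrels_tsv(sample_tsv)
```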
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into French.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/fr/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/fr
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/fr/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/fr/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/fr/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6831783 } }
Version of msmarco-passage/train, with queries and documents translated into French.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/fr/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/fr
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/fr/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/fr/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/fr/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with queries and documents translated into Hindi.
Language: hi
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/hi docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
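The dataset IDs in this part of the catalog follow a regular `mmarco/v2/<language>/<subset>` pattern, which makes it easy to run the same experiment across languages. The following is a small sketch that builds those IDs. `mmarco_v2_id`, `LANGS`, and `SUBSETS` are hypothetical names, and the language list covers only the codes whose subsets are shown in this section, so it should not be read as exhaustive.

```python
# Language codes and subsets that appear in this section of the catalog.
LANGS = ("de", "dt", "es", "fr", "hi")
SUBSETS = ("dev", "dev/small", "train")


def mmarco_v2_id(lang, subset=None):
    """Build an ID such as 'mmarco/v2/de/dev/small'; no subset gives the docs-only ID."""
    if lang not in LANGS:
        raise ValueError(f"language code not listed here: {lang!r}")
    if subset is None:
        return f"mmarco/v2/{lang}"
    if subset not in SUBSETS:
        raise ValueError(f"subset not listed here: {subset!r}")
    return f"mmarco/v2/{lang}/{subset}"


dev_small_ids = [mmarco_v2_id(lang, "dev/small") for lang in LANGS]

# Each ID can then be passed to ir_datasets.load(...), as in the examples above.
```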
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Hindi.
Language: hi
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/hi/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/hi
Language: hi
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/hi/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/hi/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into Hindi.
Language: hi
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/hi/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/hi
Language: hi
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/hi/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/hi/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/hi/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
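The scoreddocs stream is flat, so a re-ranking experiment usually starts by grouping it per query. A minimal sketch under stated assumptions: `top_k_per_query` and the `ScoredDoc` stand-in are hypothetical, a higher score is assumed to mean a better match, and everything is held in memory, which may be too much for the full set of scored documents.

```python
import heapq
from collections import defaultdict, namedtuple

# Stand-in with the same fields as the scoreddoc namedtuple shown above.
ScoredDoc = namedtuple("ScoredDoc", ["query_id", "doc_id", "score"])


def top_k_per_query(scoreddocs, k=10):
    """Return query_id -> list of the k highest-scoring doc_ids, best first."""
    by_query = defaultdict(list)
    for sd in scoreddocs:
        by_query[sd.query_id].append((sd.score, sd.doc_id))
    return {
        qid: [doc_id for _, doc_id in heapq.nlargest(k, pairs, key=lambda p: p[0])]
        for qid, pairs in by_query.items()
    }


# Hypothetical sample records, not taken from the dataset.
sample = [
    ScoredDoc("q1", "d1", 1.0),
    ScoredDoc("q1", "d2", 3.0),
    ScoredDoc("q1", "d3", 2.0),
    ScoredDoc("q2", "d9", 0.5),
]
top = top_k_per_query(sample, k=2)
```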
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6961912 } }
Version of msmarco-passage/train, with queries and documents translated into Hindi.
Language: hi
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi/train")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/hi/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/hi
Language: hi
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi/train")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/hi/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi/train")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/hi/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi/train")
for docpair in dataset.docpairs_iter():
    docpair  # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/hi/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
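Docpairs carry only identifiers, so training code typically resolves them to text first. The sketch below is one possible way to do that; `iter_text_triples` is a hypothetical helper, the two lookups are plain dictionaries for illustration, and reading `doc_id_a` as the preferred passage and `doc_id_b` as the other is an assumption based on the usual MS MARCO triple layout.

```python
from collections import namedtuple

# Stand-in with the same fields as the docpair namedtuple shown above.
DocPair = namedtuple("DocPair", ["query_id", "doc_id_a", "doc_id_b"])


def iter_text_triples(docpairs, query_text, doc_text):
    """Yield (query, passage_a, passage_b) text triples for each docpair.

    query_text and doc_text are mappings from id to text.
    """
    for pair in docpairs:
        yield (
            query_text[pair.query_id],
            doc_text[pair.doc_id_a],
            doc_text[pair.doc_id_b],
        )


# Hypothetical sample data, not taken from the dataset.
queries = {"q1": "example query"}
docs = {"d1": "first passage", "d2": "second passage"}
triples = list(iter_text_triples([DocPair("q1", "d1", "d2")], queries, docs))
```

With the real data, `query_text` could be built from `dataset.queries_iter()`; for documents, a dictionary of all passages is large, so a lookup backed by `dataset.docs_store()` may be the better fit.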
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with queries and documents translated into Indonesian.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/id docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Indonesian.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id/dev")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/id/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/id
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id/dev")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/id/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id/dev")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/id/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
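The counts in the relevance table can be checked by tallying the `relevance` field over the qrels. A minimal sketch, assuming records shaped like the namedtuple above; `relevance_counts` and the `Qrel` stand-in are hypothetical names and the sample records are made up.

```python
from collections import Counter, namedtuple

# Stand-in with the same fields as the qrel namedtuple shown above.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def relevance_counts(qrels):
    """Count how many qrels carry each relevance value."""
    return Counter(qrel.relevance for qrel in qrels)


# Hypothetical sample records, not taken from the dataset.
sample = [Qrel("q1", "d1", 1, "0"), Qrel("q1", "d2", 1, "0"), Qrel("q2", "d3", 1, "0")]
counts = relevance_counts(sample)
total = sum(counts.values())
# Share of each relevance level, as in the "%" column of the table.
shares = {level: 100.0 * n / total for level, n in counts.items()}
```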
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into Indonesian.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id/dev/small")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/id/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/id
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id/dev/small")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/id/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id/dev/small")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/id/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc  # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/id/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
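Qrels and scoreddocs together are enough to score a ranking. The sketch below computes mean reciprocal rank at a cutoff, a metric commonly reported for MS MARCO passage ranking; `mrr_at_k` is a hypothetical helper, and the choice of metric and of k=10 is an assumption rather than something this page prescribes.

```python
def mrr_at_k(run, qrels, k=10):
    """Mean reciprocal rank of the first relevant document within the top k.

    run:   query_id -> list of doc_ids, best first
    qrels: query_id -> set of relevant doc_ids
    Queries in qrels that are missing from run contribute 0.
    """
    if not qrels:
        return 0.0
    total = 0.0
    for query_id, relevant in qrels.items():
        for rank, doc_id in enumerate(run.get(query_id, [])[:k], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(qrels)


# Hypothetical rankings and judgments, not taken from the dataset.
run = {"q1": ["d1", "d2"], "q2": ["d3", "d4"]}
qrels = {"q1": {"d2"}, "q2": {"d9"}}
score = mrr_at_k(run, qrels, k=10)
```

Here `q1` finds its relevant passage at rank 2 and `q2` finds none, so the mean is (0.5 + 0) / 2.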
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6791487 } }
Version of msmarco-passage/train, with queries and documents translated into Indonesian.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id/train")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/id/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/id
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id/train")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/id/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id/train")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/id/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id/train")
for docpair in dataset.docpairs_iter():
    docpair  # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/id/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with queries and documents translated into Italian.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/it docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Italian.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it/dev")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/it/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/it
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it/dev")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/it/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it/dev")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/it/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
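The metadata for this subset lists 101,093 queries but only 59,273 qrels, so at least 41,820 queries have no judgment at all. A minimal sketch for separating the two groups before evaluation; `split_by_judgment` and the namedtuple stand-ins are hypothetical, and the sample records are made up.

```python
from collections import namedtuple

# Stand-ins with the same fields as the namedtuples shown above.
Query = namedtuple("Query", ["query_id", "text"])
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def split_by_judgment(queries, qrels):
    """Return (judged, unjudged) lists of queries, based on qrels coverage."""
    judged_ids = {qrel.query_id for qrel in qrels}
    judged, unjudged = [], []
    for query in queries:
        target = judged if query.query_id in judged_ids else unjudged
        target.append(query)
    return judged, unjudged


# Hypothetical sample records, not taken from the dataset.
sample_queries = [Query("q1", "prima domanda"), Query("q2", "seconda domanda")]
sample_qrels = [Qrel("q1", "d1", 1, "0")]
judged, unjudged = split_by_judgment(sample_queries, sample_qrels)
```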
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into Italian.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it/dev/small")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/it/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/it
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it/dev/small")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/it/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it/dev/small")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/it/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc  # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/it/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6952771 } }
Version of msmarco-passage/train, with queries and documents translated into Italian.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it/train")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/it/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/it
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it/train")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/it/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it/train")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/it/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it/train")
for docpair in dataset.docpairs_iter():
    docpair  # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/it/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
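With roughly 39.8 million docpairs in the training subset, collecting the iterator into a list is rarely practical. The sketch below consumes any iterator in fixed-size batches instead; `iter_batches` is a hypothetical helper and the batch size is arbitrary.

```python
from itertools import islice


def iter_batches(iterable, batch_size):
    """Yield lists of up to batch_size items without materializing the input."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Stand-in input; with the real data this could be dataset.docpairs_iter().
batches = list(iter_batches(range(5), batch_size=2))
```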
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with queries and documents translated into Japanese.
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ja docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Japanese.
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja/dev")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ja/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/ja
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja/dev")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ja/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja/dev")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ja/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into Japanese.
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja/dev/small")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ja/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/ja
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja/dev/small")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ja/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja/dev/small")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ja/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc  # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ja/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6817446 } }
Version of msmarco-passage/train, with queries and documents translated into Japanese.
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja/train")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ja/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/ja
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja/train")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ja/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja/train")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ja/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja/train")
for docpair in dataset.docpairs_iter():
    docpair  # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ja/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with queries and documents translated into Portuguese.
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/pt docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
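Output from the CLI export can be read back without ir_datasets. This is a rough sketch that assumes the export writes one document per line with `doc_id` and `text` separated by a tab, as the `[doc_id] [text]` layout above suggests; `read_docs_tsv` is a hypothetical helper and the sample content is made up.

```python
import io


def read_docs_tsv(handle):
    """Yield (doc_id, text) pairs from a tab-separated docs export.

    Assumes the first tab on each line separates the id from the text.
    """
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        doc_id, _, text = line.partition("\t")
        yield doc_id, text


# Hypothetical two-line export; real input would be a file written by the CLI.
sample = io.StringIO("0\tprimeiro texto\n1\tsegundo texto\n")
docs = list(read_docs_tsv(sample))
```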
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Portuguese.
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt/dev")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/pt/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/pt
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt/dev")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/pt/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt/dev")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/pt/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into Portuguese.
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt/dev/small")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/pt/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/pt
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt/dev/small")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/pt/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt/dev/small")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/pt/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc  # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/pt/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6975268 } }
Version of msmarco-passage/train, with queries and documents translated into Portuguese.
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt/train")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/pt/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/pt
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt/train")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/pt/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt/train")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/pt/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt/train")
for docpair in dataset.docpairs_iter():
    docpair  # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/pt/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with queries and documents translated into Russian.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ru docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
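With roughly 8.8 million passages, it can help to consume the documents in fixed-size batches, for example when feeding an indexer or encoder. This is a small sketch using only the standard library; the helper name `in_batches` is hypothetical, and with the real dataset you would pass `dataset.docs_iter()` instead of the sample range.

```python
from itertools import islice


def in_batches(iterable, size):
    """Yield lists of up to `size` consecutive items from `iterable`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# A range stands in for the document stream here.
batches = list(in_batches(range(5), 2))
print(batches)
```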
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Russian.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ru/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/ru
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ru/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ru/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
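A file produced by the `--format tsv` export can be read back with the standard `csv` module. The sketch below assumes the export is tab-delimited, has no header row, and lists the columns in the order shown above (`query_id`, `doc_id`, `relevance`, `iteration`); verify this against your own export before relying on it. The helper name `read_qrels_tsv` is hypothetical.

```python
import csv
import io


def read_qrels_tsv(handle):
    """Parse tab-separated qrels rows into (query_id, doc_id, relevance, iteration)."""
    rows = []
    for record in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(record) != 4:
            # Skip blank or malformed lines instead of failing.
            continue
        query_id, doc_id, relevance, iteration = record
        rows.append((query_id, doc_id, int(relevance), iteration))
    return rows


# An in-memory sample stands in for an exported file; the IDs are made up.
sample_tsv = io.StringIO("1\t100\t1\t0\n2\t200\t1\t0\n\n")
parsed_qrels = read_qrels_tsv(sample_tsv)
print(parsed_qrels)
```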
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into Russian.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ru/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/ru
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ru/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ru/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ru/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
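A common use of scoreddocs is to pick the top candidates per query for re-ranking. The sketch below assumes that a higher `score` means a better match, which is not stated above; the helper name `top_k_per_query` and the stand-in `ScoredDoc` tuple are hypothetical. With the real dataset you would pass `dataset.scoreddocs_iter()` instead of the sample list.

```python
import heapq
from collections import defaultdict, namedtuple

# Stand-in for the records yielded by dataset.scoreddocs_iter() (illustrative only).
ScoredDoc = namedtuple("ScoredDoc", ["query_id", "doc_id", "score"])


def top_k_per_query(scoreddocs, k):
    """Return {query_id: [doc_id, ...]} with the k highest-scoring docs per query."""
    by_query = defaultdict(list)
    for scoreddoc in scoreddocs:
        by_query[scoreddoc.query_id].append((scoreddoc.score, scoreddoc.doc_id))
    return {
        query_id: [doc_id for _, doc_id in heapq.nlargest(k, items)]
        for query_id, items in by_query.items()
    }


# Made-up sample records.
sample_scoreddocs = [
    ScoredDoc("q1", "d1", 0.5),
    ScoredDoc("q1", "d2", 2.0),
    ScoredDoc("q1", "d3", 1.0),
    ScoredDoc("q2", "d4", 0.1),
]
candidates = top_k_per_query(sample_scoreddocs, k=2)
print(candidates)
```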
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6931773 } }
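The counts in the metadata record above can be turned into quick per-query averages, which gives a feel for how sparse the judgments are relative to the candidate lists. The sketch copies the counts for mmarco/v2/ru/dev/small (trimmed to the `count` fields) into a JSON string; the helper name `per_query` is hypothetical.

```python
import json

# Counts copied from the mmarco/v2/ru/dev/small metadata, trimmed to "count" fields.
metadata = json.loads(
    '{ "docs": { "count": 8841823 }, "queries": { "count": 6980 }, '
    '"qrels": { "count": 7437 }, "scoreddocs": { "count": 6931773 } }'
)


def per_query(record, key):
    """Average number of `key` records per query."""
    return record[key]["count"] / record["queries"]["count"]


qrels_per_query = per_query(metadata, "qrels")
scoreddocs_per_query = per_query(metadata, "scoreddocs")
print(f"qrels per query: {qrels_per_query:.3f}")
print(f"scoreddocs per query: {scoreddocs_per_query:.1f}")
```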
Version of msmarco-passage/train, with queries and documents translated into Russian.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ru/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/ru
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ru/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ru/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ru/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with queries and documents translated into Vietnamese.
Language: vi
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/vi docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Vietnamese.
Language: vi
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/vi/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/vi
Language: vi
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/vi/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/vi/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into Vietnamese.
Language: vi
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/vi/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/vi
Language: vi
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/vi/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/vi/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/vi/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6976219 } }
Version of msmarco-passage/train, with queries and documents translated into Vietnamese.
Language: vi
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/vi/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/vi
Language: vi
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/vi/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/vi/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/vi/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with queries and documents translated into Chinese.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/zh docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Chinese.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/zh/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/zh
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/zh/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/zh/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into Chinese.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/zh/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/zh
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/zh/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/zh/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/zh/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6979520 } }
Version of msmarco-passage/train, with queries and documents translated into Chinese.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/zh/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/v2/zh
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/zh/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/zh/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/zh/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with documents translated into Chinese.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Chinese.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/zh
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into Chinese.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/zh
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } } }
Version of msmarco-passage/dev, with queries and documents translated into Chinese.
Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev/small/v1.1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/dev/small/v1.1 queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
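Because version 1.1 contains manual corrections, one way to see what changed is to compare its queries with those of the uncorrected split. The sketch below assumes both splits use the same `query_id` values, which is not stated above; the helper name `changed_queries` and the stand-in `Query` tuple are hypothetical. With the real data you would pass the `queries_iter()` of mmarco/zh/dev/small and of mmarco/zh/dev/small/v1.1.

```python
from collections import namedtuple

# Stand-in for the records yielded by dataset.queries_iter() (illustrative only).
Query = namedtuple("Query", ["query_id", "text"])


def changed_queries(original, corrected):
    """Return {query_id: (old_text, new_text)} for queries whose text differs."""
    old_text = {query.query_id: query.text for query in original}
    changes = {}
    for query in corrected:
        previous = old_text.get(query.query_id)
        if previous is not None and previous != query.text:
            changes[query.query_id] = (previous, query.text)
    return changes


# Made-up sample queries.
original_queries = [Query("q1", "old wording"), Query("q2", "same wording")]
corrected_queries = [Query("q1", "new wording"), Query("q2", "same wording")]
query_changes = changed_queries(original_queries, corrected_queries)
print(query_changes)
```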
Inherits docs from mmarco/zh
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev/small/v1.1")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/dev/small/v1.1 docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits qrels from mmarco/zh/dev/small
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev/small/v1.1")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/dev/small/v1.1 qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev/small/v1.1")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/dev/small/v1.1 scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
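Since this split provides both qrels and scoreddocs, the two can be combined to score the candidate ranking. The sketch below computes MRR@10, the measure customarily reported for MS MARCO passage ranking; it averages over queries that have at least one relevant qrel and assumes a higher `score` ranks a document earlier. The helper name `mrr_at_k` and the stand-in tuples are hypothetical.

```python
from collections import defaultdict, namedtuple

# Stand-ins for records from dataset.qrels_iter() and dataset.scoreddocs_iter().
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])
ScoredDoc = namedtuple("ScoredDoc", ["query_id", "doc_id", "score"])


def mrr_at_k(qrels, scoreddocs, k=10):
    """Mean reciprocal rank of the first relevant doc within the top k per query."""
    relevant = defaultdict(set)
    for qrel in qrels:
        if qrel.relevance > 0:
            relevant[qrel.query_id].add(qrel.doc_id)
    ranked = defaultdict(list)
    for scoreddoc in scoreddocs:
        ranked[scoreddoc.query_id].append((scoreddoc.score, scoreddoc.doc_id))
    total = 0.0
    for query_id, relevant_docs in relevant.items():
        # Sort by score only, highest first, and keep the top k candidates.
        top = sorted(ranked.get(query_id, []), key=lambda item: item[0], reverse=True)[:k]
        for rank, (_, doc_id) in enumerate(top, start=1):
            if doc_id in relevant_docs:
                total += 1.0 / rank
                break
    return total / len(relevant) if relevant else 0.0


# Made-up sample: q1 hits at rank 2, q2 at rank 1, q3 has no candidates.
sample_qrels = [Qrel("q1", "d2", 1, "0"), Qrel("q2", "d9", 1, "0"), Qrel("q3", "d5", 1, "0")]
sample_run = [ScoredDoc("q1", "d1", 3.0), ScoredDoc("q1", "d2", 2.0), ScoredDoc("q2", "d9", 1.0)]
mrr = mrr_at_k(sample_qrels, sample_run)
print(mrr)
```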
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 1034597 } }
Version of msmarco-passage/dev, with queries and documents translated into Chinese.
Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev/v1.1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/dev/v1.1 queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/zh
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev/v1.1")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/dev/v1.1 docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits qrels from mmarco/zh/dev
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev/v1.1")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/dev/v1.1 qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/train, with queries and documents translated into Chinese.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from mmarco/zh
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }