ir_datasets: mMARCO
A version of the MS MARCO passage dataset (msmarco-passage) with the queries and documents automatically translated into several languages.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Version of msmarco-passage, with documents translated into German.
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/de")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.de')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
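With 8,841,823 documents in the collection, it is usually more practical to inspect a handful of records than to iterate over everything. The following is a minimal sketch of that idea, not part of ir_datasets itself: `Doc`, `peek`, and the sample records are hypothetical stand-ins that mirror the `namedtuple<doc_id, text>` shape shown above.

```python
from collections import namedtuple
from itertools import islice

# Stand-in for the records yielded by docs_iter(); the field names follow
# the namedtuple<doc_id, text> shape shown in the example above.
Doc = namedtuple("Doc", ["doc_id", "text"])


def peek(records, n=3):
    """Return the first n records of an iterator without reading the rest."""
    return list(islice(records, n))


# Hypothetical sample records, not real mMARCO passages.
sample_docs = iter([
    Doc("0", "Erster Beispieltext."),
    Doc("1", "Zweiter Beispieltext."),
    Doc("2", "Dritter Beispieltext."),
    Doc("3", "Vierter Beispieltext."),
])

first_two = peek(sample_docs, n=2)
for doc in first_two:
    print(doc.doc_id, doc.text)
```

With the real collection, the same helper could presumably be called as `peek(dataset.docs_iter(), 5)`.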
Version of msmarco-passage/dev, with queries and documents translated into German.
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/de/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/de/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.de.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/de
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/de/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/de/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.de.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/de/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/de/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.de.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
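The "Count" and "%" columns of the relevance table can be recomputed from the `counts_by_value` field of the metadata record above. The sketch below shows one way to do that; the helper name `relevance_rows` is hypothetical, and only the qrels portion of the record is reproduced.

```python
import json

# The qrels portion of the metadata record shown above.
metadata_json = (
    '{"qrels": {"count": 59273, "fields": '
    '{"relevance": {"counts_by_value": {"1": 59273}}}}}'
)


def relevance_rows(metadata):
    """Return (level, count, percent) tuples, one per relevance level."""
    qrels = metadata["qrels"]
    total = qrels["count"]
    counts = qrels["fields"]["relevance"]["counts_by_value"]
    rows = []
    # JSON object keys are strings, so sort them numerically.
    for level in sorted(counts, key=int):
        count = counts[level]
        rows.append((int(level), count, round(100.0 * count / total, 1)))
    return rows


rows = relevance_rows(json.loads(metadata_json))
for level, count, pct in rows:
    print(f"{level} | {count} | {pct}%")
```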
Version of msmarco-passage/dev/small, with queries and documents translated into German.
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/de/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/de/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.de.dev.small.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/de
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/de/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/de/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.de.dev.small')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/de/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/de/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.de.dev.small.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/de/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/de/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro # Supposes experimaestro-ir be installed
run = datamaestro.prepare_dataset('irds.mmarco.de.dev.small.scoreddocs') # AdhocRun
# A run is a generic object, and is specialized into final classes
# e.g. TrecAdhocRun
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun.
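Scored documents are typically combined with the qrels to evaluate an initial ranking. The following is a rough sketch of MRR@10 over records shaped like the `scoreddocs` and `qrels` namedtuples above; it is an illustration under stated assumptions rather than an official evaluation script. The `ScoredDoc` and `Qrel` types, the `mrr_at_k` helper, and all sample values are hypothetical, and the average is taken over queries that appear in the run.

```python
from collections import defaultdict, namedtuple

# Stand-ins mirroring the namedtuple shapes shown in the examples above.
ScoredDoc = namedtuple("ScoredDoc", ["query_id", "doc_id", "score"])
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def mrr_at_k(scoreddocs, qrels, k=10):
    """Mean reciprocal rank of the first relevant document in the top k."""
    relevant = defaultdict(set)
    for qrel in qrels:
        if qrel.relevance > 0:
            relevant[qrel.query_id].add(qrel.doc_id)

    by_query = defaultdict(list)
    for scored in scoreddocs:
        by_query[scored.query_id].append(scored)

    total = 0.0
    for query_id, candidates in by_query.items():
        # Higher score means better rank; keep only the top k.
        ranked = sorted(candidates, key=lambda s: s.score, reverse=True)[:k]
        for rank, scored in enumerate(ranked, start=1):
            if scored.doc_id in relevant[query_id]:
                total += 1.0 / rank
                break
    return total / len(by_query) if by_query else 0.0


# Hypothetical sample data, not real mMARCO records.
run = [
    ScoredDoc("q1", "d1", 3.0),
    ScoredDoc("q1", "d2", 2.0),
    ScoredDoc("q2", "d3", 1.5),
    ScoredDoc("q2", "d4", 0.5),
]
qrels = [Qrel("q1", "d2", 1, "0"), Qrel("q2", "d3", 1, "0")]

mrr = mrr_at_k(run, qrels, k=10)
print(mrr)  # 0.75: reciprocal ranks 1/2 and 1/1, averaged
```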
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6594126 } }
Version of msmarco-passage/train, with queries and documents translated into German.
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/de/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/de/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.de.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/de
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/de/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/de/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.de.train')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/de/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/de/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.de.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/de/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/de/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro # Supposes experimaestro-ir be installed
docpairs = datamaestro.prepare_dataset('irds.mmarco.de.train.docpairs')
next(docpairs.iter()) # Display the first triplet
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets.
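Document pairs carry only identifiers, so training code usually has to resolve them to text first. The sketch below illustrates that step with plain dictionaries. It assumes that `doc_id_a` is the preferred (relevant) passage and `doc_id_b` the other one, which is not stated on this page; `DocPair`, `to_text_triplets`, and the sample texts are hypothetical.

```python
from collections import namedtuple

# Stand-in mirroring the namedtuple<query_id, doc_id_a, doc_id_b> shape above.
DocPair = namedtuple("DocPair", ["query_id", "doc_id_a", "doc_id_b"])


def to_text_triplets(docpairs, query_text, doc_text):
    """Yield (query, passage_a, passage_b) text triplets from ID-only pairs."""
    for pair in docpairs:
        yield (
            query_text[pair.query_id],
            doc_text[pair.doc_id_a],  # assumed to be the relevant passage
            doc_text[pair.doc_id_b],  # assumed to be the non-relevant passage
        )


# Hypothetical lookup tables, not real mMARCO content.
query_text = {"q1": "was ist x"}
doc_text = {"d1": "x ist ein Beispiel", "d2": "y ist etwas anderes"}
pairs = [DocPair("q1", "d1", "d2")]

triplets = list(to_text_triplets(pairs, query_text, doc_text))
print(triplets[0])
```

For the full collection, an in-memory dictionary of all passages may be too large; a lookup backed by the dataset's document store would be the more realistic choice.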
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with documents translated into Spanish.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/es")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.es')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Spanish.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/es/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/es/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.es.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/es
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/es/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/es/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.es.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/es/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/es/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.es.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
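The TSV produced by the `ir_datasets export ... qrels --format tsv` command above can be read back with the standard library. This sketch assumes the file has exactly the four tab-separated columns listed above, in that order, and no header row; the helper `read_qrels_tsv` and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export output; real rows would come from the export command.
exported = "q1\td10\t1\t0\nq1\td11\t1\t0\nq2\td20\t1\t0\n"


def read_qrels_tsv(handle):
    """Map each query_id to a dict of doc_id -> integer relevance."""
    qrels = defaultdict(dict)
    for query_id, doc_id, relevance, _iteration in csv.reader(handle, delimiter="\t"):
        qrels[query_id][doc_id] = int(relevance)
    return dict(qrels)


qrels = read_qrels_tsv(io.StringIO(exported))
print(qrels)
```

For a file on disk, passing `open(path, newline="")` in place of the `io.StringIO` object should work the same way.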
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101092 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into Spanish.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/es/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/es/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.es.dev.small.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/es
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/es/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/es/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.es.dev.small')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/es/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/es/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.es.dev.small.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/es/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/es/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro # Supposes experimaestro-ir be installed
run = datamaestro.prepare_dataset('irds.mmarco.es.dev.small.scoreddocs') # AdhocRun
# A run is a generic object, and is specialized into final classes
# e.g. TrecAdhocRun
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun.
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6786720 } }
Version of msmarco-passage/train, with queries and documents translated into Spanish.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/es/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/es/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.es.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/es
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/es/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/es/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.es.train')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/es/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/es/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.es.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/es/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/es/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro # Supposes experimaestro-ir be installed
docpairs = datamaestro.prepare_dataset('irds.mmarco.es.train.docpairs')
next(docpairs.iter()) # Display the first triplet
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets.
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with documents translated into French.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/fr")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.fr')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into French.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/fr/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.fr.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/fr
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/fr/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.fr.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/fr/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.fr.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
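The dev-split metadata records on this page list 101093 queries for German and French but 101092 for Spanish. The following small sketch is one way to flag such a mismatch before aligning query sets across languages; the dictionary simply restates those counts, and the variable names are hypothetical.

```python
from collections import Counter

# Query counts for the dev splits, copied from the metadata records on this page.
dev_query_counts = {"de": 101093, "es": 101092, "fr": 101093}

# Treat the most frequent count as the reference and report any deviations.
reference_count, _ = Counter(dev_query_counts.values()).most_common(1)[0]
outliers = {
    lang: count
    for lang, count in dev_query_counts.items()
    if count != reference_count
}
print(outliers)  # {'es': 101092}
```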
Version of msmarco-passage/dev/small, with queries and documents translated into French.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/fr/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.fr.dev.small.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/fr
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/fr/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.fr.dev.small')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/fr/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.fr.dev.small.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/fr/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro # Supposes experimaestro-ir be installed
run = datamaestro.prepare_dataset('irds.mmarco.fr.dev.small.scoreddocs') # AdhocRun
# A run is a generic object, and is specialized into final classes
# e.g. TrecAdhocRun
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun.
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6785763 } }
Version of msmarco-passage/train, with queries and documents translated into French.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/fr/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.fr.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/fr
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/fr/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.fr.train')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/fr/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.fr.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/fr/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/fr/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro # Assumes experimaestro-ir is installed
docpairs = datamaestro.prepare_dataset('irds.mmarco.fr.train.docpairs')
next(docpairs.iter()) # Returns the first triplet
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets.
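The docpairs records only carry identifiers, so training code usually has to resolve them to text first. The following is a minimal sketch of that step, assuming records with the fields shown above (query_id, doc_id_a, doc_id_b). The `DocPair` stand-in, the `resolve_triples` helper, the two lookup callables, and the sample texts are all hypothetical.

```python
from collections import namedtuple

# Stand-in record with the same field names as the docpairs documented above.
DocPair = namedtuple("DocPair", ["query_id", "doc_id_a", "doc_id_b"])


def resolve_triples(docpairs, query_text, doc_text):
    """Yield (query, doc_a, doc_b) text triples.

    query_text and doc_text are callables mapping an ID to its text.
    """
    for pair in docpairs:
        yield (
            query_text(pair.query_id),
            doc_text(pair.doc_id_a),
            doc_text(pair.doc_id_b),
        )


# Made-up sample data; plain dicts stand in for the real lookups.
queries = {"q1": "capitale de la France"}
docs = {
    "d1": "Paris est la capitale de la France.",
    "d2": "Lyon est une ville de France.",
}
pairs = [DocPair("q1", "d1", "d2")]

triples = list(resolve_triples(pairs, queries.__getitem__, docs.__getitem__))
print(triples[0])
```

Passing the lookups as callables keeps the helper independent of where the text comes from. With about 8.8 million documents in the collection, an on-disk lookup is likely a better fit than an in-memory dict for the document side.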
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with documents translated into Indonesian.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/id")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.id')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Indonesian.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/id/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/id/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.id.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/id
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/id/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/id/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.id.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/id/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/id/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.id.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # The assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
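The metadata for mmarco/id/dev lists 101,093 queries but only 59,273 qrels, so a sizeable share of the queries cannot have any relevance judgment. The sketch below shows one way to separate judged from unjudged query IDs before evaluation. The `Query` and `Qrel` stand-ins, the `split_by_judgment` helper, and the sample records are illustrative assumptions; with ir_datasets installed the two arguments would be `dataset.queries_iter()` and `dataset.qrels_iter()`.

```python
from collections import namedtuple

# Stand-in records with the same field names as documented for this dataset.
Query = namedtuple("Query", ["query_id", "text"])
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def split_by_judgment(queries, qrels):
    """Return (judged, unjudged) sets of query IDs."""
    all_ids = {query.query_id for query in queries}
    judged = {qrel.query_id for qrel in qrels} & all_ids
    return judged, all_ids - judged


# Made-up sample records (hypothetical IDs and texts).
sample_queries = [
    Query("q1", "first query"),
    Query("q2", "second query"),
    Query("q3", "third query"),
]
sample_qrels = [Qrel("q1", "d10", 1, "0"), Qrel("q3", "d30", 1, "0")]

judged, unjudged = split_by_judgment(sample_queries, sample_qrels)
print(sorted(judged), sorted(unjudged))
```

Restricting an evaluation to the judged set avoids counting unjudged queries as failures, which would otherwise depress averaged metrics.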
Version of msmarco-passage/dev/small, with queries and documents translated into Indonesian.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/id/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/id/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.id.dev.small.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/id
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/id/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/id/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.id.dev.small')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/id/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/id/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.id.dev.small.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # The assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/id/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/id/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro # Assumes experimaestro-ir is installed
run = datamaestro.prepare_dataset('irds.mmarco.id.dev.small.scoreddocs') # AdhocRun
# A run is a generic object that is specialized into concrete classes,
# e.g. TrecAdhocRun
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun.
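Scored documents arrive as a flat stream of (query_id, doc_id, score) records, whereas re-ranking experiments usually want a ranked list per query. The sketch below groups such records and keeps the k best per query. It assumes that a higher score means a better match; the `ScoredDoc` stand-in, the `top_k_per_query` helper, and the sample scores are hypothetical.

```python
import heapq
from collections import defaultdict, namedtuple

# Stand-in record with the same field names as the scoreddocs documented above.
ScoredDoc = namedtuple("ScoredDoc", ["query_id", "doc_id", "score"])


def top_k_per_query(scoreddocs, k=10):
    """Return {query_id: [doc_id, ...]} with the k highest-scoring docs first."""
    by_query = defaultdict(list)
    for scored in scoreddocs:
        by_query[scored.query_id].append((scored.score, scored.doc_id))
    return {
        query_id: [doc_id for _, doc_id in heapq.nlargest(k, items)]
        for query_id, items in by_query.items()
    }


# Made-up sample records (hypothetical IDs and scores).
sample_scoreddocs = [
    ScoredDoc("q1", "d1", 0.2),
    ScoredDoc("q1", "d2", 0.9),
    ScoredDoc("q1", "d3", 0.5),
    ScoredDoc("q2", "d4", 1.0),
]

rankings = top_k_per_query(sample_scoreddocs, k=2)
print(rankings)
```

Grouping everything in memory is the simplest approach but holds every record at once; for the full set of several million scored documents, processing one query at a time may be preferable if the records turn out to be ordered by query.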
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6841990 } }
Version of msmarco-passage/train, with queries and documents translated into Indonesian.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/id/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/id/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.id.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/id
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/id/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/id/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.id.train')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/id/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/id/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.id.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # The assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/id/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/id/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro # Assumes experimaestro-ir is installed
docpairs = datamaestro.prepare_dataset('irds.mmarco.id.train.docpairs')
next(docpairs.iter()) # Returns the first triplet
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with documents translated into Italian.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/it")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.it')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Italian.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/it/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/it/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.it.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/it
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/it/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/it/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.it.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/it/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/it/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
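The export command above prints one record per line in the column order shown. The sketch below reads such output back into Python, assuming the fields are separated by tabs (as the `--format tsv` flag suggests) and that relevance is an integer. The `Qrel` stand-in, the `read_qrels_tsv` helper, and the sample lines are hypothetical.

```python
import csv
import io
from collections import namedtuple

# Stand-in record mirroring the exported columns.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def read_qrels_tsv(lines):
    """Parse tab-separated qrels lines into Qrel records."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        query_id, doc_id, relevance, iteration = row
        yield Qrel(query_id, doc_id, int(relevance), iteration)


# Made-up sample standing in for a file written by the export command.
exported = io.StringIO("q1\td10\t1\t0\nq2\td20\t1\t0\n")
records = list(read_qrels_tsv(exported))
print(records[0])
```

In practice the export output would be redirected to a file and that file passed to the helper through `open(path, newline="")`.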
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.it.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # The assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into Italian.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/it/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/it/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.it.dev.small.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/it
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/it/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/it/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.it.dev.small')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/it/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/it/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.it.dev.small.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # The assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/it/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/it/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro # Assumes experimaestro-ir is installed
run = datamaestro.prepare_dataset('irds.mmarco.it.dev.small.scoreddocs') # AdhocRun
# A run is a generic object that is specialized into concrete classes,
# e.g. TrecAdhocRun
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6966491 } }
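The rows of the Relevance levels table for mmarco/it/dev/small can be re-derived from the `counts_by_value` field of the metadata record above. The sketch below does this for the qrels part of that record, copied into a string literal; the `relevance_shares` helper is a hypothetical name, and the table's abbreviated count format (7.4K) is not reproduced.

```python
import json

# The "queries" and "qrels" entries of the metadata record shown above.
metadata_json = (
    '{ "queries": { "count": 6980 }, '
    '"qrels": { "count": 7437, "fields": { "relevance": '
    '{ "counts_by_value": { "1": 7437 } } } } }'
)


def relevance_shares(metadata):
    """Return {relevance_level: (count, percent_of_all_qrels)}."""
    qrels = metadata["qrels"]
    total = qrels["count"]
    counts = qrels["fields"]["relevance"]["counts_by_value"]
    return {int(level): (n, 100.0 * n / total) for level, n in counts.items()}


shares = relevance_shares(json.loads(metadata_json))
for level, (count, percent) in sorted(shares.items()):
    print(f"{level} | {count} | {percent:.1f}%")  # prints: 1 | 7437 | 100.0%
```

The same helper applies to the other subsets on this page, since their metadata records share this layout.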
Version of msmarco-passage/train, with queries and documents translated into Italian.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/it/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/it/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.it.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/it
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/it/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/it/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.it.train')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/it/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/it/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.it.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # The assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/it/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/it/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro # Assumes experimaestro-ir is installed
docpairs = datamaestro.prepare_dataset('irds.mmarco.it.train.docpairs')
next(docpairs.iter()) # Returns the first triplet
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with documents translated into Portuguese.
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.pt')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Portuguese.
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.pt.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/pt
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.pt.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.pt.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # The assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101619 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into Portuguese.
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.pt.dev.small.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/pt
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.pt.dev.small')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.pt.dev.small.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # The assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 7000 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } } }
Version of msmarco-passage/dev, with queries and documents translated into Portuguese.
Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here. It also removes some duplicated query IDs.
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev/small/v1.1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/dev/small/v1.1 queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.pt.dev.small.v1.1.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
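The description of this subset says that version 1.1 removes some duplicated query IDs; consistent with that, the metadata on this page lists 7,000 queries for mmarco/pt/dev/small and 6,980 for mmarco/pt/dev/small/v1.1. The sketch below shows one way such duplicates could be detected and filtered in a stream of query records. The `Query` stand-in, both helpers, and the sample records are hypothetical, and keeping the first occurrence is an arbitrary choice rather than a description of how v1.1 was produced.

```python
from collections import Counter, namedtuple

# Stand-in record with the same field names as the queries documented above.
Query = namedtuple("Query", ["query_id", "text"])


def duplicated_query_ids(queries):
    """Return {query_id: occurrences} for IDs that appear more than once."""
    counts = Counter(query.query_id for query in queries)
    return {query_id: n for query_id, n in counts.items() if n > 1}


def first_occurrences(queries):
    """Yield each query the first time its ID is seen, dropping later repeats."""
    seen = set()
    for query in queries:
        if query.query_id not in seen:
            seen.add(query.query_id)
            yield query


# Made-up sample records (hypothetical IDs and texts).
sample_queries = [
    Query("q1", "primeira consulta"),
    Query("q2", "segunda consulta"),
    Query("q1", "primeira consulta repetida"),
]

duplicates = duplicated_query_ids(sample_queries)
unique_queries = list(first_occurrences(sample_queries))
print(duplicates, len(unique_queries))
```

Running the first helper over the queries of a subset is a quick check before building a dictionary keyed by query_id, where repeated IDs would silently overwrite one another.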
Inherits docs from mmarco/pt
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev/small/v1.1")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/dev/small/v1.1 docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.pt.dev.small.v1.1')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Inherits qrels from mmarco/pt/dev/small
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev/small/v1.1")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/dev/small/v1.1 qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.pt.dev.small.v1.1.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
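The flat stream of qrels shown above is often easier to use as a nested lookup. A minimal sketch, assuming records with the fields listed above; `Qrel`, `build_qrels_lookup`, and the sample records are hypothetical stand-ins, not part of ir_datasets:

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the records yielded by dataset.qrels_iter();
# the field names follow the namedtuple shown above.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def build_qrels_lookup(qrels):
    """Map each query_id to a {doc_id: relevance} dictionary."""
    lookup = defaultdict(dict)
    for qrel in qrels:
        lookup[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(lookup)


# Made-up sample records, for illustration only.
sample_qrels = [
    Qrel("q1", "d10", 1, "0"),
    Qrel("q1", "d11", 1, "0"),
    Qrel("q2", "d20", 1, "0"),
]
qrels_lookup = build_qrels_lookup(sample_qrels)
print(qrels_lookup["q1"])
```

With the real data, passing `dataset.qrels_iter()` instead of `sample_qrels` should work the same way.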
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev/small/v1.1")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc  # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/dev/small/v1.1 scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro  # Assumes experimaestro-ir is installed
run = datamaestro.prepare_dataset('irds.mmarco.pt.dev.small.v1.1.scoreddocs')  # AdhocRun
# A run is a generic object that is specialized into final classes,
# e.g. TrecAdhocRun
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun.
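Scored documents arrive as one flat stream of (query_id, doc_id, score) records. The sketch below groups them into a ranked list per query. It assumes that a higher score means a better match; `ScoredDoc`, `top_k_per_query`, and the sample records are hypothetical names and data used only for illustration:

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the records yielded by dataset.scoreddocs_iter().
ScoredDoc = namedtuple("ScoredDoc", ["query_id", "doc_id", "score"])


def top_k_per_query(scoreddocs, k=10):
    """Group records by query and keep the k highest-scoring documents."""
    by_query = defaultdict(list)
    for record in scoreddocs:
        by_query[record.query_id].append((record.doc_id, record.score))
    # Assumption: a higher score means a better match.
    return {
        query_id: sorted(pairs, key=lambda pair: pair[1], reverse=True)[:k]
        for query_id, pairs in by_query.items()
    }


# Made-up sample records, for illustration only.
sample_scoreddocs = [
    ScoredDoc("q1", "d1", 0.2),
    ScoredDoc("q1", "d2", 0.9),
    ScoredDoc("q2", "d3", 0.5),
    ScoredDoc("q1", "d4", 0.4),
]
ranking = top_k_per_query(sample_scoreddocs, k=2)
print(ranking)
```

Grouping in memory is fine for a sample like this; the full scoreddocs collections listed here contain several million records, so a streaming or on-disk approach may be preferable there.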
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6976324 } }
Version of msmarco-passage/dev, with queries and documents translated into Portuguese.
Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here. It also removes some duplicated query IDs.
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev/v1.1")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/dev/v1.1 queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.pt.dev.v1.1.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
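The description of this dataset notes that version 1.1 removes some duplicated query IDs. One way to check a query collection for such duplicates is sketched below; `Query`, `duplicated_query_ids`, and the sample records are hypothetical and only mirror the `query_id`/`text` fields shown above:

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for the records yielded by dataset.queries_iter().
Query = namedtuple("Query", ["query_id", "text"])


def duplicated_query_ids(queries):
    """Return the sorted list of query IDs that occur more than once."""
    counts = Counter(query.query_id for query in queries)
    return sorted(query_id for query_id, n in counts.items() if n > 1)


# Made-up sample records, for illustration only.
sample_queries = [
    Query("1", "first query"),
    Query("2", "second query"),
    Query("1", "first query again"),
    Query("3", "third query"),
]
duplicates = duplicated_query_ids(sample_queries)
print(duplicates)
```

Running the same function over `dataset.queries_iter()` for a v1.1 dataset would be expected to return an empty list if the duplicates were indeed removed.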
Inherits docs from mmarco/pt
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev/v1.1")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/dev/v1.1 docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.pt.dev.v1.1')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Inherits qrels from mmarco/pt/dev
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/dev/v1.1")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/dev/v1.1 qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.pt.dev.v1.1.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/train, with queries and documents translated into Portuguese.
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/train")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.pt.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/pt
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/train")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.pt.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/train")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.pt.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/train")
for docpair in dataset.docpairs_iter():
    docpair  # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro  # Assumes experimaestro-ir is installed
docpairs = datamaestro.prepare_dataset('irds.mmarco.pt.train.docpairs')
next(docpairs.iter())  # Fetch the first triplet
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets.
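Document pairs carry only identifiers, so training code usually needs to resolve them to text. A minimal sketch of that step, assuming the query and document texts are already available as dictionaries keyed by ID; `DocPair`, `resolve_triplets`, and the sample data are hypothetical and not part of ir_datasets or datamaestro:

```python
from collections import namedtuple

# Hypothetical stand-in for the records yielded by dataset.docpairs_iter().
DocPair = namedtuple("DocPair", ["query_id", "doc_id_a", "doc_id_b"])


def resolve_triplets(docpairs, query_text, doc_text):
    """Yield (query text, text of doc_id_a, text of doc_id_b) tuples."""
    for pair in docpairs:
        yield (
            query_text[pair.query_id],
            doc_text[pair.doc_id_a],
            doc_text[pair.doc_id_b],
        )


# Made-up sample data, for illustration only.
sample_query_text = {"q1": "example query"}
sample_doc_text = {"d1": "first passage", "d2": "second passage"}
sample_docpairs = [DocPair("q1", "d1", "d2")]

triplets = list(resolve_triplets(sample_docpairs, sample_query_text, sample_doc_text))
print(triplets[0])
```

Because the function is a generator, it can be fed `dataset.docpairs_iter()` directly without materializing all pairs in memory.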
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 811690 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage/train, with queries and documents translated into Portuguese.
Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here. It also removes some duplicated query IDs.
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/train/v1.1")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/train/v1.1 queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.pt.train.v1.1.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/pt
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/train/v1.1")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/train/v1.1 docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.pt.train.v1.1')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Inherits qrels from mmarco/pt/train
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/train/v1.1")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/train/v1.1 qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.pt.train.v1.1.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Inherits docpairs from mmarco/pt/train
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/pt/train/v1.1")
for docpair in dataset.docpairs_iter():
    docpair  # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/pt/train/v1.1 docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro  # Assumes experimaestro-ir is installed
docpairs = datamaestro.prepare_dataset('irds.mmarco.pt.train.v1.1.docpairs')
next(docpairs.iter())  # Fetch the first triplet
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
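The metadata listed for mmarco/pt/train and mmarco/pt/train/v1.1 differs only in the query count, which is consistent with the note that version 1.1 removes duplicated query IDs. The sketch below works out that difference from the counts given on this page (the `docs` entry is omitted for brevity); `count_differences` is a hypothetical helper:

```python
import json

# Counts copied from the metadata of the two datasets listed on this page.
train_meta = json.loads(
    '{"queries": {"count": 811690}, "qrels": {"count": 532761}, '
    '"docpairs": {"count": 39780811}}'
)
train_v11_meta = json.loads(
    '{"queries": {"count": 808731}, "qrels": {"count": 532761}, '
    '"docpairs": {"count": 39780811}}'
)


def count_differences(old, new):
    """Return old count minus new count for every entity present in both."""
    return {key: old[key]["count"] - new[key]["count"] for key in old if key in new}


diff = count_differences(train_meta, train_v11_meta)
print(diff)  # {'queries': 2959, 'qrels': 0, 'docpairs': 0}
```

So 811690 - 808731 = 2959 query records are absent from version 1.1, while the qrels and docpairs counts are unchanged.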
Version of msmarco-passage, with documents translated into Russian.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/ru")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.ru')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Russian.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/dev")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/ru/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.ru.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/ru
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/dev")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/ru/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.ru.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/dev")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/ru/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.ru.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
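The relevance-level table for this dataset (a count and a percentage per relevance value) can be recomputed from the qrels themselves. A rough sketch, using hypothetical names (`Qrel`, `relevance_distribution`) and made-up sample records in place of `dataset.qrels_iter()`:

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for the records yielded by dataset.qrels_iter().
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def relevance_distribution(qrels):
    """Map each relevance value to a (count, percentage of all qrels) pair."""
    counts = Counter(qrel.relevance for qrel in qrels)
    total = sum(counts.values())
    return {
        relevance: (n, 100.0 * n / total)
        for relevance, n in sorted(counts.items())
    }


# Made-up sample records, for illustration only.
sample_qrels = [
    Qrel("q1", "d1", 1, "0"),
    Qrel("q2", "d2", 1, "0"),
    Qrel("q3", "d3", 1, "0"),
]
distribution = relevance_distribution(sample_qrels)
print(distribution)
```

For the datasets on this page every qrel has relevance 1, so the real output would be a single entry at 100%.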
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into Russian.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/dev/small")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/ru/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.ru.dev.small.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/ru
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/dev/small")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/ru/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.ru.dev.small')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/dev/small")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/ru/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.ru.dev.small.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc  # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/ru/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro  # Assumes experimaestro-ir is installed
run = datamaestro.prepare_dataset('irds.mmarco.ru.dev.small.scoreddocs')  # AdhocRun
# A run is a generic object that is specialized into final classes,
# e.g. TrecAdhocRun
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6958739 } }
Version of msmarco-passage/train, with queries and documents translated into Russian.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/train")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/ru/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.ru.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/ru
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/train")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/ru/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.ru.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/train")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/ru/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.ru.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/ru/train")
for docpair in dataset.docpairs_iter():
    docpair  # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/ru/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro  # Assumes experimaestro-ir is installed
docpairs = datamaestro.prepare_dataset('irds.mmarco.ru.train.docpairs')
next(docpairs.iter())  # Fetch the first triplet
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with queries and documents translated into Arabic.
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ar docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.ar')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
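With more than eight million documents in the collection, it is usually enough to inspect the first few records rather than iterate over everything, as the `break` in the example above suggests. A small sketch of that idea using `itertools.islice`; `Doc`, `head`, and the generated sample documents are hypothetical stand-ins for what `dataset.docs_iter()` yields:

```python
from collections import namedtuple
from itertools import islice

# Hypothetical stand-in for the records yielded by dataset.docs_iter().
Doc = namedtuple("Doc", ["doc_id", "text"])


def head(records, n=3):
    """Return the first n records of any iterable without consuming the rest."""
    return list(islice(records, n))


# Made-up sample documents, produced lazily like a real iterator would.
sample_docs = (Doc(str(i), f"passage {i}") for i in range(1000))
first_docs = head(sample_docs, n=2)
for doc in first_docs:
    print(doc.doc_id, doc.text)
```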
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Arabic.
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar/dev")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ar/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.ar.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/ar
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar/dev")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ar/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.ar.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar/dev")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ar/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.ar.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into Arabic.
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar/dev/small")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ar/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.ar.dev.small.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/ar
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar/dev/small")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ar/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.ar.dev.small')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar/dev/small")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ar/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.ar.dev.small.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc  # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ar/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro  # Assumes experimaestro-ir is installed
run = datamaestro.prepare_dataset('irds.mmarco.v2.ar.dev.small.scoreddocs')  # AdhocRun
# A run is a generic object that is specialized into final classes,
# e.g. TrecAdhocRun
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6848687 } }
Version of msmarco-passage/train, with queries and documents translated into Arabic.
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar/train")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ar/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.ar.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/ar
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar/train")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ar/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.ar.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar/train")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ar/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.ar.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ar/train")
for docpair in dataset.docpairs_iter():
    docpair  # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ar/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro  # Assumes experimaestro-ir is installed
docpairs = datamaestro.prepare_dataset('irds.mmarco.v2.ar.train.docpairs')
next(docpairs.iter())  # Display the first triplet
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets.
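Docpairs carry only identifiers, so using them for training usually means resolving each ID back to its text. The following is a minimal sketch of that step, not part of the ir_datasets API: the namedtuples are local stand-ins that mirror the documented fields, the helper name resolve_docpairs is hypothetical, and the sample records are made up.

```python
from collections import namedtuple

# Local stand-ins mirroring the documented record fields. Real records come
# from dataset.queries_iter(), dataset.docs_iter() and dataset.docpairs_iter().
Query = namedtuple("Query", ["query_id", "text"])
Doc = namedtuple("Doc", ["doc_id", "text"])
DocPair = namedtuple("DocPair", ["query_id", "doc_id_a", "doc_id_b"])


def resolve_docpairs(docpairs, queries_by_id, docs_by_id):
    """Yield (query_text, text_a, text_b) for every docpair whose IDs resolve."""
    for pair in docpairs:
        query = queries_by_id.get(pair.query_id)
        doc_a = docs_by_id.get(pair.doc_id_a)
        doc_b = docs_by_id.get(pair.doc_id_b)
        if query is None or doc_a is None or doc_b is None:
            continue  # skip pairs that reference unknown IDs
        yield query.text, doc_a.text, doc_b.text


# Made-up sample data, for illustration only.
queries = {"q1": Query("q1", "placeholder query")}
docs = {
    "d1": Doc("d1", "placeholder passage A"),
    "d2": Doc("d2", "placeholder passage B"),
}
pairs = [DocPair("q1", "d1", "d2"), DocPair("q1", "d1", "missing")]

triples = list(resolve_docpairs(pairs, queries, docs))
print(triples)
```

With 8.8M passages, an in-memory dict of documents may be too large; the lookup object returned by dataset.docs_store() is likely a better fit for the document side.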
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with queries and documents translated into German.
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/de docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.de')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into German.
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de/dev")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/de/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.de.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/de
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de/dev")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/de/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.de.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de/dev")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/de/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.de.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
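Qrels arrive as a flat stream of records, while most evaluation code wants them grouped by query. A minimal sketch of that grouping follows; the Qrel namedtuple is a local stand-in with the documented fields, group_qrels is a hypothetical helper name, and the sample values are made up.

```python
from collections import defaultdict, namedtuple

# Local stand-in mirroring namedtuple<query_id, doc_id, relevance, iteration>.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def group_qrels(qrels):
    """Return {query_id: {doc_id: relevance}} built from a flat qrels iterable."""
    grouped = defaultdict(dict)
    for qrel in qrels:
        grouped[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(grouped)


# Made-up sample data, for illustration only.
sample = [
    Qrel("q1", "d1", 1, "0"),
    Qrel("q1", "d7", 1, "0"),
    Qrel("q2", "d3", 1, "0"),
]
by_query = group_qrels(sample)
print(by_query)
```

With a real dataset, the same function would be called as group_qrels(dataset.qrels_iter()). The ir_datasets Python API also provides dataset.qrels_dict(), which returns a comparable nested mapping.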
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into German.
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de/dev/small")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/de/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.de.dev.small.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/de
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de/dev/small")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/de/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.de.dev.small')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de/dev/small")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/de/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.de.dev.small.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc  # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/de/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro  # Assumes experimaestro-ir is installed
run = datamaestro.prepare_dataset('irds.mmarco.v2.de.dev.small.scoreddocs')  # AdhocRun
# A run is a generic object that is specialized into final classes,
# e.g. TrecAdhocRun
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun.
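Scoreddocs are also a flat stream, so building a ranking means grouping them by query and sorting by score. The sketch below shows one way to keep the top k documents per query. It assumes that a higher score means a better match; the ScoredDoc namedtuple is a local stand-in, top_k_per_query is a hypothetical helper name, and the sample values are made up.

```python
import heapq
from collections import defaultdict, namedtuple

# Local stand-in mirroring namedtuple<query_id, doc_id, score>.
ScoredDoc = namedtuple("ScoredDoc", ["query_id", "doc_id", "score"])


def top_k_per_query(scoreddocs, k=10):
    """Return {query_id: [doc_id, ...]} with the k highest-scoring docs per query."""
    by_query = defaultdict(list)
    for scoreddoc in scoreddocs:
        by_query[scoreddoc.query_id].append(scoreddoc)
    return {
        query_id: [sd.doc_id for sd in heapq.nlargest(k, docs, key=lambda sd: sd.score)]
        for query_id, docs in by_query.items()
    }


# Made-up sample data, for illustration only.
sample = [
    ScoredDoc("q1", "d1", 1.5),
    ScoredDoc("q1", "d2", 3.0),
    ScoredDoc("q1", "d3", 2.0),
    ScoredDoc("q2", "d9", 0.1),
]
top2 = top_k_per_query(sample, k=2)
print(top2)
```

This version holds every record in memory before ranking, which may be costly for the several million scoreddocs in a dev/small subset; processing one query at a time would reduce that footprint if the records turn out to be grouped by query.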
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6586918 } }
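The metadata record for mmarco/v2/de/dev/small is plain JSON, so simple statistics can be derived from it directly. The sketch below parses a trimmed copy of that record (the nested "fields" entries are omitted for brevity) and computes per-query averages; the variable names are illustrative.

```python
import json

# Trimmed copy of the metadata record for mmarco/v2/de/dev/small.
raw = """
{ "docs": { "count": 8841823 },
  "queries": { "count": 6980 },
  "qrels": { "count": 7437 },
  "scoreddocs": { "count": 6586918 } }
"""
metadata = json.loads(raw)

n_queries = metadata["queries"]["count"]
qrels_per_query = metadata["qrels"]["count"] / n_queries
scoreddocs_per_query = metadata["scoreddocs"]["count"] / n_queries

print(f"qrels per query:      {qrels_per_query:.2f}")
print(f"scoreddocs per query: {scoreddocs_per_query:.1f}")
```

The averages come out to slightly more than one relevant passage per query and somewhat under a thousand scored passages per query.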
Version of msmarco-passage/train, with queries and documents translated into German.
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de/train")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/de/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.de.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/de
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de/train")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/de/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.de.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de/train")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/de/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.de.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
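Qrels exported through the CLI with --format tsv can be read back with the standard csv module. The sketch below assumes that the export has no header row and that its columns follow the bracketed order shown in the CLI example (query_id, doc_id, relevance, iteration); read_qrels_tsv is a hypothetical helper name and the sample text is made up.

```python
import csv
import io
from collections import namedtuple

Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def read_qrels_tsv(handle):
    """Yield Qrel records from a tab-separated text stream.

    Assumes four columns per row and no header line; rows with a different
    number of columns are skipped.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 4:
            continue
        query_id, doc_id, relevance, iteration = row
        yield Qrel(query_id, doc_id, int(relevance), iteration)


# Made-up sample standing in for an exported file.
sample = io.StringIO("q1\td1\t1\t0\nq2\td3\t1\t0\n")
rows = list(read_qrels_tsv(sample))
print(rows)
```

For a real export, the io.StringIO object would be replaced by a file opened with open(path, newline="", encoding="utf-8").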
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/de/train")
for docpair in dataset.docpairs_iter():
    docpair  # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/de/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro  # Assumes experimaestro-ir is installed
docpairs = datamaestro.prepare_dataset('irds.mmarco.v2.de.train.docpairs')
next(docpairs.iter())  # Display the first triplet
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with queries and documents translated into Dutch.
Language: dt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/dt docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.dt')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Dutch.
Language: dt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt/dev")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/dt/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.dt.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/dt
Language: dt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt/dev")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/dt/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.dt.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt/dev")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/dt/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.dt.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into Dutch.
Language: dt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt/dev/small")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/dt/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.dt.dev.small.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/dt
Language: dt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt/dev/small")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/dt/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.dt.dev.small')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt/dev/small")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/dt/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.dt.dev.small.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc  # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/dt/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro  # Assumes experimaestro-ir is installed
run = datamaestro.prepare_dataset('irds.mmarco.v2.dt.dev.small.scoreddocs')  # AdhocRun
# A run is a generic object that is specialized into final classes,
# e.g. TrecAdhocRun
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6608183 } }
Version of msmarco-passage/train, with queries and documents translated into Dutch.
Language: dt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt/train")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/dt/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.dt.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/dt
Language: dt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt/train")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/dt/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.dt.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt/train")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/dt/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.dt.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/dt/train")
for docpair in dataset.docpairs_iter():
    docpair  # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/dt/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro  # Assumes experimaestro-ir is installed
docpairs = datamaestro.prepare_dataset('irds.mmarco.v2.dt.train.docpairs')
next(docpairs.iter())  # Display the first triplet
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
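The identifiers in this section follow a regular pattern: mmarco/v2/<language>/<subset> for ir_datasets, and the same path with dots and an irds. prefix for datamaestro. The sketch below generates both forms. The language and subset lists cover only the values visible in this section, and the helper names dataset_ids and datamaestro_id are hypothetical.

```python
# Language codes and subsets that appear in this section (not an exhaustive list).
LANGUAGES = ["ar", "de", "dt", "es"]
SUBSETS = ["dev", "dev/small", "train"]


def dataset_ids(languages, subsets, version="v2"):
    """Return ir_datasets identifiers for each language and each of its subsets."""
    ids = []
    for language in languages:
        base = f"mmarco/{version}/{language}"
        ids.append(base)  # the docs-only parent dataset
        ids.extend(f"{base}/{subset}" for subset in subsets)
    return ids


def datamaestro_id(ir_datasets_id):
    """Convert an ir_datasets identifier to the datamaestro form used above."""
    return "irds." + ir_datasets_id.replace("/", ".")


for dataset_id in dataset_ids(["de"], SUBSETS):
    print(dataset_id, "->", datamaestro_id(dataset_id))
```

Each generated ir_datasets identifier can be passed to ir_datasets.load(), provided the corresponding dataset actually exists for that language.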
Version of msmarco-passage, with queries and documents translated into Spanish.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/es docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.es')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
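Every language version reports the same document count and the same doc_id format, which suggests that a given doc_id refers to the same passage in each translation. Under that assumption (not verified here), two language versions can be joined on doc_id to obtain parallel text. The sketch below uses a local Doc stand-in, a hypothetical helper name, and made-up sample records.

```python
from collections import namedtuple

# Local stand-in mirroring namedtuple<doc_id, text>.
Doc = namedtuple("Doc", ["doc_id", "text"])


def align_by_doc_id(docs_a, docs_b):
    """Yield (doc_id, text_a, text_b) for doc_ids present in both iterables."""
    b_by_id = {doc.doc_id: doc for doc in docs_b}
    for doc in docs_a:
        other = b_by_id.get(doc.doc_id)
        if other is not None:
            yield doc.doc_id, doc.text, other.text


# Made-up sample data, for illustration only.
german = [Doc("0", "Text A (de)"), Doc("1", "Text B (de)")]
spanish = [Doc("1", "Text B (es)"), Doc("2", "Text C (es)")]

aligned = list(align_by_doc_id(german, spanish))
print(aligned)
```

On the full collections, the second side would be too large for a plain dict on most machines; looking up IDs through dataset.docs_store() is the more practical route.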
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Spanish.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es/dev")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/es/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.es.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/es
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es/dev")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/es/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.es.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es/dev")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/es/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.es.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into Spanish.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es/dev/small")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/es/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.es.dev.small.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/es
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es/dev/small")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/es/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.es.dev.small')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es/dev/small")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/es/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.es.dev.small.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc  # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/es/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro  # Assumes experimaestro-ir is installed
run = datamaestro.prepare_dataset('irds.mmarco.v2.es.dev.small.scoreddocs')  # AdhocRun
# A run is a generic object that is specialized into final classes,
# e.g. TrecAdhocRun
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun.
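A dev/small subset provides both qrels and scoreddocs, so the two record types can be combined to score the supplied ranking. The sketch below computes mean reciprocal rank at a cutoff k as one possible measure. It assumes that higher scores rank first and averages over queries that have at least one relevant document; the namedtuples are local stand-ins, the helper name is hypothetical, and the sample values are made up.

```python
from collections import defaultdict, namedtuple

Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])
ScoredDoc = namedtuple("ScoredDoc", ["query_id", "doc_id", "score"])


def mean_reciprocal_rank(qrels, scoreddocs, k=10):
    """Mean of 1/rank of the first relevant doc in each query's top k (0 if none)."""
    relevant = defaultdict(set)
    for qrel in qrels:
        if qrel.relevance > 0:
            relevant[qrel.query_id].add(qrel.doc_id)

    ranked = defaultdict(list)
    for scoreddoc in scoreddocs:
        ranked[scoreddoc.query_id].append(scoreddoc)

    total = 0.0
    for query_id, relevant_ids in relevant.items():
        top = sorted(ranked.get(query_id, []), key=lambda sd: sd.score, reverse=True)[:k]
        for rank, scoreddoc in enumerate(top, start=1):
            if scoreddoc.doc_id in relevant_ids:
                total += 1.0 / rank
                break
    return total / len(relevant) if relevant else 0.0


# Made-up sample data: q1's relevant doc is ranked second, q2's is ranked first.
sample_qrels = [Qrel("q1", "d2", 1, "0"), Qrel("q2", "d5", 1, "0")]
sample_scoreddocs = [
    ScoredDoc("q1", "d1", 2.0),
    ScoredDoc("q1", "d2", 1.0),
    ScoredDoc("q2", "d5", 3.0),
]
mrr = mean_reciprocal_rank(sample_qrels, sample_scoreddocs, k=10)
print(mrr)
```

For reported results, an established evaluation tool is a safer choice than this sketch, since tie handling and query selection can differ between implementations.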
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6777044 } }
Version of msmarco-passage/train, with queries and documents translated into Spanish.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/es/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.es.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/es
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/es/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.es.train')
for doc in dataset.iter_documents():
    print(doc) # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/es/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.es.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
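For evaluation code it is often convenient to turn the flat qrels records into a nested lookup. This is a small sketch under stated assumptions: `Qrel` mirrors the `namedtuple<query_id, doc_id, relevance, iteration>` shape shown above, and `build_qrels_lookup` is a hypothetical helper, not an ir_datasets function.

```python
from collections import defaultdict, namedtuple

# Stand-in for the qrel records; the field names follow the namedtuple shown
# in the Python API example, the values below are invented.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def build_qrels_lookup(qrels):
    """Return {query_id: {doc_id: relevance}} from an iterable of qrels."""
    lookup = defaultdict(dict)
    for qrel in qrels:
        lookup[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(lookup)


sample = [
    Qrel("q1", "d1", 1, "0"),
    Qrel("q1", "d7", 1, "0"),
    Qrel("q2", "d3", 1, "0"),
]
qrels_lookup = build_qrels_lookup(sample)
print(qrels_lookup)
```

Because the only relevance level listed for this split is 1, every stored value would be 1 on the real data; the nested form mainly answers "is this document judged relevant for this query" in constant time.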
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/es/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/es/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro # assumes experimaestro-ir is installed
docpairs = datamaestro.prepare_dataset('irds.mmarco.v2.es.train.docpairs')
next(docpairs.iter()) # Display the first triplet
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets.
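Docpairs carry only identifiers, so training code usually needs to resolve them to text. The sketch below shows one way to do that with plain mappings; `DocPair` mirrors the `namedtuple<query_id, doc_id_a, doc_id_b>` shape, while `text_triples` and the sample strings are hypothetical.

```python
from collections import namedtuple

# Stand-in for the docpair records yielded by dataset.docpairs_iter().
DocPair = namedtuple("DocPair", ["query_id", "doc_id_a", "doc_id_b"])


def text_triples(docpairs, query_text, doc_text):
    """Yield (query text, text of doc a, text of doc b) for each docpair.

    query_text and doc_text are mappings from an identifier to its text.
    """
    for pair in docpairs:
        yield (
            query_text[pair.query_id],
            doc_text[pair.doc_id_a],
            doc_text[pair.doc_id_b],
        )


# Invented identifiers and texts, for illustration only.
queries = {"q1": "consulta de ejemplo"}
docs = {"d1": "texto a", "d2": "texto b"}
pairs = [DocPair("q1", "d1", "d2")]
triples = list(text_triples(pairs, queries, docs))
print(triples)
```

The generator form avoids materializing all 39,780,811 docpairs at once. On the real collection a dictionary of every passage may be too large; a keyed document store would be the natural replacement for `doc_text`, which is an assumption to check against the Python API documentation.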
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with queries and documents translated into French.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/fr docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.fr')
for doc in dataset.iter_documents():
    print(doc) # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into French.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/fr/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.fr.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/fr
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/fr/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.fr.dev')
for doc in dataset.iter_documents():
    print(doc) # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/fr/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.fr.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
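The dev split lists 101,093 queries but only 59,273 qrels, so it can be useful to check which queries have at least one judgment before evaluating. A minimal sketch, assuming records shaped like the namedtuples above; `split_by_judgment` is a hypothetical helper and the sample values are invented.

```python
from collections import namedtuple

# Stand-ins for the query and qrel records shown in the examples above.
Query = namedtuple("Query", ["query_id", "text"])
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def split_by_judgment(queries, qrels):
    """Return (judged, unjudged) lists of queries.

    A query counts as judged when at least one qrel refers to its query_id.
    """
    judged_ids = {qrel.query_id for qrel in qrels}
    judged, unjudged = [], []
    for query in queries:
        target = judged if query.query_id in judged_ids else unjudged
        target.append(query)
    return judged, unjudged


sample_queries = [Query("q1", "premier exemple"), Query("q2", "second exemple")]
sample_qrels = [Qrel("q1", "d1", 1, "0")]
judged, unjudged = split_by_judgment(sample_queries, sample_qrels)
print(len(judged), len(unjudged))
```

On the real data the two arguments would be `dataset.queries_iter()` and `dataset.qrels_iter()`.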
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into French.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/fr/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.fr.dev.small.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/fr
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/fr/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.fr.dev.small')
for doc in dataset.iter_documents():
    print(doc) # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/fr/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.fr.dev.small.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
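Output of the `ir_datasets export ... qrels --format tsv` command shown above can be read back with the standard `csv` module. The sketch assumes the export is tab-separated, has no header row, and uses the column order `[query_id] [doc_id] [relevance] [iteration]`; `read_qrels_tsv` is a hypothetical helper and the sample rows are invented.

```python
import csv
import io


def read_qrels_tsv(handle):
    """Parse TSV qrels lines into (query_id, doc_id, relevance, iteration).

    Assumptions: tab-separated, no header row, four columns per line.
    Lines with a different column count are skipped.
    """
    rows = []
    for record in csv.reader(handle, delimiter="\t"):
        if len(record) != 4:
            continue
        query_id, doc_id, relevance, iteration = record
        rows.append((query_id, doc_id, int(relevance), iteration))
    return rows


# In-memory stand-in for a file produced by the export command.
sample = io.StringIO("q1\td1\t1\t0\nq2\td3\t1\t0\n")
rows = read_qrels_tsv(sample)
print(rows)
```

For a real file, `handle` would be an open text file, for example one created by redirecting the export command's output.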
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/fr/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro # assumes experimaestro-ir is installed
run = datamaestro.prepare_dataset('irds.mmarco.v2.fr.dev.small.scoreddocs') # AdhocRun
# A run is a generic object that is specialized into concrete classes,
# e.g. TrecAdhocRun
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6831783 } }
Version of msmarco-passage/train, with queries and documents translated into French.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/fr/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.fr.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/fr
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/fr/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.fr.train')
for doc in dataset.iter_documents():
    print(doc) # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/fr/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.fr.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/fr/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/fr/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro # assumes experimaestro-ir is installed
docpairs = datamaestro.prepare_dataset('irds.mmarco.v2.fr.train.docpairs')
next(docpairs.iter()) # Display the first triplet
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with queries and documents translated into Hindi.
Language: hi
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/hi docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.hi')
for doc in dataset.iter_documents():
    print(doc) # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
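With 8,841,823 documents in the collection, it is usually enough to look at the first few records rather than exhaust the iterator. The sketch below uses `itertools.islice` for that; `Doc` mirrors the `namedtuple<doc_id, text>` shape, and `peek` and `fake_docs` are hypothetical names used only for illustration.

```python
from collections import namedtuple
from itertools import islice

# Stand-in for the doc records yielded by dataset.docs_iter().
Doc = namedtuple("Doc", ["doc_id", "text"])


def peek(records, n=3):
    """Return the first n records of any iterable without consuming the rest."""
    return list(islice(records, n))


def fake_docs():
    """Lazy generator standing in for a very large document iterator."""
    for i in range(1_000_000):
        yield Doc(str(i), f"passage {i}")


first = peek(fake_docs(), n=3)
for doc in first:
    print(doc.doc_id, doc.text)
```

The same call shape, `peek(dataset.docs_iter(), n=3)`, should work on the real iterator, since `islice` only needs an iterable.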
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Hindi.
Language: hi
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/hi/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.hi.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/hi
Language: hi
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/hi/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.hi.dev')
for doc in dataset.iter_documents():
    print(doc) # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/hi/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.hi.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into Hindi.
Language: hi
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/hi/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.hi.dev.small.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/hi
Language: hi
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/hi/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.hi.dev.small')
for doc in dataset.iter_documents():
    print(doc) # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/hi/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.hi.dev.small.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/hi/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro # assumes experimaestro-ir is installed
run = datamaestro.prepare_dataset('irds.mmarco.v2.hi.dev.small.scoreddocs') # AdhocRun
# A run is a generic object that is specialized into concrete classes,
# e.g. TrecAdhocRun
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6961912 } }
Version of msmarco-passage/train, with queries and documents translated into Hindi.
Language: hi
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/hi/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.hi.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/hi
Language: hi
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/hi/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.hi.train')
for doc in dataset.iter_documents():
    print(doc) # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/hi/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.hi.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/hi/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/hi/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro # assumes experimaestro-ir is installed
docpairs = datamaestro.prepare_dataset('irds.mmarco.v2.hi.train.docpairs')
next(docpairs.iter()) # Display the first triplet
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with queries and documents translated into Indonesian.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/id docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.id')
for doc in dataset.iter_documents():
    print(doc) # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Indonesian.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/id/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.id.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/id
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/id/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.id.dev')
for doc in dataset.iter_documents():
    print(doc) # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/id/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.id.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into Indonesian.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/id/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.id.dev.small.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/id
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/id/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.id.dev.small')
for doc in dataset.iter_documents():
    print(doc) # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/id/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.id.dev.small.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/id/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro # Assumes experimaestro-ir is installed
run = datamaestro.prepare_dataset('irds.mmarco.v2.id.dev.small.scoreddocs') # AdhocRun
# A run is a generic object that is specialized into final classes,
# e.g. TrecAdhocRun
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6791487 } }
Version of msmarco-passage/train, with queries and documents translated into Indonesian.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/id/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.id.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/id
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/id/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.id.train')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/id/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.id.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/id/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/id/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro # Assumes experimaestro-ir is installed
docpairs = datamaestro.prepare_dataset('irds.mmarco.v2.id.train.docpairs')
next(docpairs.iter()) # Display the first triplet
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets.
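Each docpair carries only identifiers, so training code usually has to resolve them to text first. The sketch below shows one way to do that under stated assumptions: `DocPair`, `resolve_triples`, and the two lookup dictionaries are hypothetical names with made-up contents, and the snippet does not assume which of `doc_id_a` and `doc_id_b` is the preferred passage, since the listing above does not say.

```python
from collections import namedtuple

# Hypothetical stand-in mirroring namedtuple<query_id, doc_id_a, doc_id_b>.
DocPair = namedtuple("DocPair", ["query_id", "doc_id_a", "doc_id_b"])


def resolve_triples(docpairs, query_text, doc_text):
    """Yield (query text, text of doc a, text of doc b) for each docpair."""
    for pair in docpairs:
        yield (
            query_text[pair.query_id],
            doc_text[pair.doc_id_a],
            doc_text[pair.doc_id_b],
        )


# Made-up lookups; in practice these would be built from queries_iter() and docs_iter().
query_text = {"q1": "contoh kueri"}
doc_text = {"d1": "teks pertama", "d2": "teks kedua"}
sample_pairs = [DocPair("q1", "d1", "d2")]

triples = list(resolve_triples(sample_pairs, query_text, doc_text))
print(triples)
```

Because the train split lists 39,780,811 docpairs, a generator like this keeps memory use flat. Holding all 8.8M passages in a plain dictionary is itself a simplification made for the sketch.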
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with queries and documents translated into Italian.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/it docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.it')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
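The collection holds 8,841,823 passages, so a full pass over `docs_iter()` takes a while. To inspect only the first few records, the iterator can be wrapped in `itertools.islice`. The sketch below uses a hypothetical `fake_docs_iter` generator and `Doc` namedtuple in place of the real dataset so that it runs on its own.

```python
from collections import namedtuple
from itertools import islice

# Hypothetical stand-in mirroring namedtuple<doc_id, text>.
Doc = namedtuple("Doc", ["doc_id", "text"])


def fake_docs_iter():
    """Hypothetical replacement for dataset.docs_iter(), yielding made-up passages."""
    for i in range(1000):
        yield Doc(str(i), f"passaggio {i}")


# islice stops after three items, so the rest of the iterator is never consumed.
first_three = list(islice(fake_docs_iter(), 3))
for doc in first_three:
    print(doc.doc_id, doc.text)
```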
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Italian.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/it/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.it.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/it
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/it/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.it.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/it/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.it.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
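The "Relevance levels" table for this split (59K, 100.0%) can be re-derived from the metadata record. The sketch below parses the relevant part of that record with the standard `json` module and computes one row per relevance level. The `relevance_rows` helper is a hypothetical name, and the JSON literal is copied from the counts listed for mmarco/v2/it/dev (the `fields` entry under `docs` is omitted for brevity).

```python
import json

# Counts copied from the metadata record of mmarco/v2/it/dev.
metadata = json.loads(
    '{"docs": {"count": 8841823}, "queries": {"count": 101093}, '
    '"qrels": {"count": 59273, "fields": {"relevance": '
    '{"counts_by_value": {"1": 59273}}}}}'
)


def relevance_rows(meta):
    """Return (level, count, percent of all qrels) tuples, sorted by level."""
    qrels = meta["qrels"]
    total = qrels["count"]
    counts = qrels["fields"]["relevance"]["counts_by_value"]
    return [
        (int(level), n, round(100.0 * n / total, 1))
        for level, n in sorted(counts.items(), key=lambda kv: int(kv[0]))
    ]


rows = relevance_rows(metadata)
print(rows)
```

The same check can be run against the other splits by swapping in their counts.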
Version of msmarco-passage/dev/small, with queries and documents translated into Italian.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/it/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.it.dev.small.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/it
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/it/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.it.dev.small')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/it/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.it.dev.small.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/it/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro # Assumes experimaestro-ir is installed
run = datamaestro.prepare_dataset('irds.mmarco.v2.it.dev.small.scoreddocs') # AdhocRun
# A run is a generic object that is specialized into final classes,
# e.g. TrecAdhocRun
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun.
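The scoreddocs records arrive as a flat stream, so building a ranking means grouping them by query and sorting each group. The sketch below does this for the top `k` documents per query. It assumes that a higher `score` means a better match, which the listing above does not state. `ScoredDoc`, `top_k_by_query`, and the sample records are hypothetical.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in mirroring namedtuple<query_id, doc_id, score>.
ScoredDoc = namedtuple("ScoredDoc", ["query_id", "doc_id", "score"])


def top_k_by_query(scoreddocs, k=10):
    """Group scored documents by query and keep the k highest-scoring doc_ids."""
    by_query = defaultdict(list)
    for sd in scoreddocs:
        by_query[sd.query_id].append(sd)
    return {
        query_id: [
            sd.doc_id
            for sd in sorted(docs, key=lambda sd: sd.score, reverse=True)[:k]
        ]
        for query_id, docs in by_query.items()
    }


# Made-up sample records, for illustration only.
sample_run = [
    ScoredDoc("q1", "d1", 2.5),
    ScoredDoc("q1", "d2", 7.0),
    ScoredDoc("q2", "d3", 1.0),
    ScoredDoc("q1", "d4", 4.2),
]
rankings = top_k_by_query(sample_run, k=2)
print(rankings)
```

This split lists 6,952,771 scoreddocs, so grouping everything in memory is a simplification. A streaming variant would be preferable if the records are known to be sorted by query.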
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6952771 } }
Version of msmarco-passage/train, with queries and documents translated into Italian.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/it/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.it.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/it
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/it/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.it.train')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/it/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.it.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/it/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/it/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro # Assumes experimaestro-ir is installed
docpairs = datamaestro.prepare_dataset('irds.mmarco.v2.it.train.docpairs')
next(docpairs.iter()) # Display the first triplet
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with queries and documents translated into Japanese.
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ja docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.ja')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Japanese.
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ja/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.ja.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/ja
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ja/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.ja.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ja/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.ja.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
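Output from `ir_datasets export ... qrels --format tsv` can be read back with the standard `csv` module. The sketch below assumes that each line holds the four fields in the order shown in the placeholder line above (`query_id`, `doc_id`, `relevance`, `iteration`), separated by tabs and without a header row. `parse_qrels_tsv` is a hypothetical helper and the sample text is made up.

```python
import csv
import io


def parse_qrels_tsv(stream):
    """Yield (query_id, doc_id, relevance, iteration) tuples from tab-separated lines."""
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for query_id, doc_id, relevance, iteration in reader:
        # Relevance is converted to int; the other fields stay as strings.
        yield query_id, doc_id, int(relevance), iteration


# Made-up sample standing in for a file written by the export command.
sample = io.StringIO("q1\td10\t1\t0\nq2\td20\t1\t0\n")
parsed = list(parse_qrels_tsv(sample))
print(parsed)
```

For a real export, an open file handle can be passed in place of the `io.StringIO` object.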
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into Japanese.
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ja/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.ja.dev.small.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/ja
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ja/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.ja.dev.small')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ja/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.ja.dev.small.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ja/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro # Assumes experimaestro-ir is installed
run = datamaestro.prepare_dataset('irds.mmarco.v2.ja.dev.small.scoreddocs') # AdhocRun
# A run is a generic object that is specialized into final classes,
# e.g. TrecAdhocRun
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun.
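Since this split provides both qrels and scoreddocs, a ranking can be scored against the judgments. The sketch below computes mean reciprocal rank at a cutoff of 10 (MRR@10), a metric commonly reported for MS MARCO passage ranking; using it here is an assumption rather than something this listing prescribes. `mrr_at_k` and the sample inputs are hypothetical, and the function expects rankings and judgments that have already been grouped by query.

```python
def mrr_at_k(rankings, relevant, k=10):
    """Mean reciprocal rank of the first relevant document within the top k.

    rankings: dict mapping query_id to a list of doc_ids, best first.
    relevant: dict mapping query_id to a set of relevant doc_ids.
    Queries with no relevant document in the top k contribute 0.
    """
    if not rankings:
        return 0.0
    total = 0.0
    for query_id, ranked in rankings.items():
        judged = relevant.get(query_id, set())
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in judged:
                total += 1.0 / rank
                break
    return total / len(rankings)


# Made-up inputs: q1 finds its relevant passage at rank 2, q2 finds none.
sample_rankings = {"q1": ["d3", "d1"], "q2": ["d9", "d8"]}
sample_relevant = {"q1": {"d1"}, "q2": {"d7"}}
score = mrr_at_k(sample_rankings, sample_relevant)
print(score)
```

The average here is taken over the queries present in `rankings`. Averaging over all judged queries instead is an equally defensible choice and changes the result when some queries have no retrieved documents.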
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6817446 } }
Version of msmarco-passage/train, with queries and documents translated into Japanese.
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ja/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.ja.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/ja
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ja/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.ja.train')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ja/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.ja.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ja/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ja/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro # Assumes experimaestro-ir is installed
docpairs = datamaestro.prepare_dataset('irds.mmarco.v2.ja.train.docpairs')
next(docpairs.iter()) # Display the first triplet
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with queries and documents translated into Portuguese.
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/pt docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.pt')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Portuguese.
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/pt/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.pt.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/pt
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/pt/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.pt.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/pt/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.pt.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into Portuguese.
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/pt/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.pt.dev.small.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/pt
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt/dev/small")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/pt/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.pt.dev.small')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt/dev/small")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/pt/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.pt.dev.small.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc  # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/pt/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro  # assumes experimaestro-ir is installed
run = datamaestro.prepare_dataset('irds.mmarco.v2.pt.dev.small.scoreddocs')  # AdhocRun
# A run is a generic object that is specialized into concrete classes,
# e.g. TrecAdhocRun
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun.
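Scored documents and qrels are typically combined to score a ranking. The sketch below computes mean reciprocal rank at a cutoff from records shaped like the `scoreddoc` namedtuple above. It assumes that a higher `score` means a better rank, which the listing above does not state; `mrr_at_k`, the `relevant` mapping, and the sample values are hypothetical. It also holds every record in memory, so it is meant for small samples rather than the full set of scored documents.

```python
from collections import defaultdict, namedtuple

# Stand-in for the records yielded by dataset.scoreddocs_iter().
ScoredDoc = namedtuple("ScoredDoc", ["query_id", "doc_id", "score"])


def mrr_at_k(scoreddocs, relevant, k=10):
    """Mean reciprocal rank at cutoff k over the queries listed in `relevant`.

    `relevant` maps query_id -> set of relevant doc_ids.
    Assumption: a higher score means a better rank.
    """
    by_query = defaultdict(list)
    for sd in scoreddocs:
        by_query[sd.query_id].append(sd)

    total = 0.0
    for query_id, rel_docs in relevant.items():
        ranked = sorted(by_query.get(query_id, []),
                        key=lambda sd: sd.score, reverse=True)
        for rank, sd in enumerate(ranked[:k], start=1):
            if sd.doc_id in rel_docs:
                total += 1.0 / rank  # only the first relevant hit counts
                break
    return total / len(relevant) if relevant else 0.0


sample_run = [
    ScoredDoc("q1", "d1", 3.0),
    ScoredDoc("q1", "d2", 5.0),
    ScoredDoc("q1", "d3", 1.0),
    ScoredDoc("q2", "d4", 2.0),
]
sample_relevant = {"q1": {"d1"}, "q2": {"d4"}}
mrr = mrr_at_k(sample_run, sample_relevant, k=10)
print(mrr)
```

In the sample, `d1` is ranked second for `q1` and `d4` first for `q2`, so the two reciprocal ranks are 1/2 and 1.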
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6975268 } }
Version of msmarco-passage/train, with queries and documents translated into Portuguese.
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt/train")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/pt/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.pt.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/pt
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt/train")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/pt/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.pt.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt/train")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/pt/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.pt.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/pt/train")
for docpair in dataset.docpairs_iter():
    docpair  # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/pt/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro  # assumes experimaestro-ir is installed
docpairs = datamaestro.prepare_dataset('irds.mmarco.v2.pt.train.docpairs')
next(docpairs.iter())  # display the first triplet
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets.
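The docpairs records carry only identifiers, so training code usually has to resolve them to text first. The sketch below shows one way to do that with plain dictionaries. `to_text_triples`, the two lookup dictionaries, and the sample strings are hypothetical; in practice the lookups would be built from the queries and documents of the same dataset.

```python
from collections import namedtuple

# Stand-in for the records yielded by dataset.docpairs_iter().
DocPair = namedtuple("DocPair", ["query_id", "doc_id_a", "doc_id_b"])


def to_text_triples(docpairs, query_text, doc_text):
    """Yield (query, doc_a, doc_b) text triples for each resolvable docpair."""
    for pair in docpairs:
        try:
            triple = (
                query_text[pair.query_id],
                doc_text[pair.doc_id_a],
                doc_text[pair.doc_id_b],
            )
        except KeyError:
            continue  # skip pairs whose IDs are missing from the lookups
        yield triple


# Made-up lookups; real ones would come from queries_iter() and docs_iter().
query_text = {"q1": "texto da consulta"}
doc_text = {"d1": "primeira passagem", "d2": "segunda passagem"}
pairs = [DocPair("q1", "d1", "d2"), DocPair("q1", "d1", "missing")]

triples = list(to_text_triples(pairs, query_text, doc_text))
print(triples)
```

Skipping unresolved pairs silently is a design choice for this sketch; raising an error instead may be preferable when the lookups are expected to be complete.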
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with queries and documents translated into Russian.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ru docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.ru')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
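Because the corpus holds about 8.8 million passages, it can help to look at only the first few records instead of iterating over everything. The following sketch uses `itertools.islice` for that; `peek`, `fake_docs_iter`, and the generated passages are hypothetical stand-ins for `dataset.docs_iter()` and its output.

```python
from collections import namedtuple
from itertools import islice

# Stand-in for the records yielded by dataset.docs_iter().
Doc = namedtuple("Doc", ["doc_id", "text"])


def peek(records, n=3):
    """Return the first n records of an iterator without reading the rest."""
    return list(islice(records, n))


def fake_docs_iter():
    """Unbounded generator of made-up documents; peek must stop on its own."""
    i = 0
    while True:
        yield Doc(str(i), f"passage {i}")
        i += 1


first_docs = peek(fake_docs_iter(), n=3)
for doc in first_docs:
    print(doc.doc_id, doc.text)
```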
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Russian.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru/dev")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ru/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.ru.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/ru
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru/dev")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ru/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.ru.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru/dev")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ru/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.ru.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
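A file written by the `--format tsv` qrels export above can be read back with the standard `csv` module. This is a minimal sketch that assumes one tab-separated record per line in the column order listed above (`query_id`, `doc_id`, `relevance`, `iteration`) and no header row; `read_qrels_tsv` and the sample rows are hypothetical.

```python
import csv
import io
from collections import namedtuple

Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def read_qrels_tsv(handle):
    """Parse tab-separated qrels lines from an open text handle."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        query_id, doc_id, relevance, iteration = row
        yield Qrel(query_id, doc_id, int(relevance), iteration)


# Made-up sample in the assumed layout; a real file would come from the CLI.
sample = "q1\td10\t1\t0\nq2\td20\t1\t0\n"
qrels = list(read_qrels_tsv(io.StringIO(sample)))
print(qrels[0])
```

For a file on disk, an open handle such as `open(path, newline="", encoding="utf-8")` can be passed in place of the `io.StringIO` object.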
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into Russian.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru/dev/small")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ru/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.ru.dev.small.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/ru
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru/dev/small")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ru/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.ru.dev.small')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru/dev/small")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ru/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.ru.dev.small.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc  # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ru/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro  # assumes experimaestro-ir is installed
run = datamaestro.prepare_dataset('irds.mmarco.v2.ru.dev.small.scoreddocs')  # AdhocRun
# A run is a generic object that is specialized into concrete classes,
# e.g. TrecAdhocRun
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6931773 } }
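The counts in the JSON metadata above can be turned into per-query averages with a few lines of Python. The sketch below parses a copy of that metadata trimmed to its `count` fields; `per_query_averages` is a hypothetical helper, not an ir_datasets function.

```python
import json

# Counts copied from the metadata above, trimmed to the "count" fields.
metadata_json = """
{ "docs": { "count": 8841823 },
  "queries": { "count": 6980 },
  "qrels": { "count": 7437 },
  "scoreddocs": { "count": 6931773 } }
"""


def per_query_averages(metadata):
    """Average number of records per query for every query-keyed record type."""
    n_queries = metadata["queries"]["count"]
    return {
        name: section["count"] / n_queries
        for name, section in metadata.items()
        if name not in ("docs", "queries")  # these are not keyed by query
    }


averages = per_query_averages(json.loads(metadata_json))
for name, value in averages.items():
    print(f"{name}: {value:.2f} per query")
```

The same function should work on the metadata of the other subsets, since they share the `queries` and per-type `count` layout.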
Version of msmarco-passage/train, with queries and documents translated into Russian.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru/train")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ru/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.ru.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/ru
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru/train")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ru/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.ru.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru/train")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ru/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.ru.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/ru/train")
for docpair in dataset.docpairs_iter():
    docpair  # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/ru/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro  # assumes experimaestro-ir is installed
docpairs = datamaestro.prepare_dataset('irds.mmarco.v2.ru.train.docpairs')
next(docpairs.iter())  # display the first triplet
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with queries and documents translated into Vietnamese.
Language: vi
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/vi docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.vi')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Vietnamese.
Language: vi
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi/dev")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/vi/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.vi.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/vi
Language: vi
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi/dev")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/vi/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.vi.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi/dev")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/vi/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.vi.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into Vietnamese.
Language: vi
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi/dev/small")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/vi/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.vi.dev.small.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/vi
Language: vi
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi/dev/small")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/vi/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.vi.dev.small')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi/dev/small")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/vi/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.vi.dev.small.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc  # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/vi/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro  # assumes experimaestro-ir is installed
run = datamaestro.prepare_dataset('irds.mmarco.v2.vi.dev.small.scoreddocs')  # AdhocRun
# A run is a generic object that is specialized into concrete classes,
# e.g. TrecAdhocRun
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6976219 } }
Version of msmarco-passage/train, with queries and documents translated into Vietnamese.
Language: vi
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi/train")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/vi/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.vi.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/vi
Language: vi
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi/train")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/vi/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.vi.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi/train")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/vi/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.vi.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/vi/train")
for docpair in dataset.docpairs_iter():
    docpair  # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/vi/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro  # assumes experimaestro-ir is installed
docpairs = datamaestro.prepare_dataset('irds.mmarco.v2.vi.train.docpairs')
next(docpairs.iter())  # display the first triplet
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with queries and documents translated into Chinese.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/zh docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.zh')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
Metadata:
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Chinese.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh/dev")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/zh/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.zh.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/zh
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh/dev")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/zh/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.zh.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh/dev")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/zh/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.zh.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into Chinese.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/zh/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.zh.dev.small.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/zh
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/zh/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.zh.dev.small')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/zh/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.zh.dev.small.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh/dev/small")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/zh/dev/small scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro # Assumes experimaestro-ir is installed
run = datamaestro.prepare_dataset('irds.mmarco.v2.zh.dev.small.scoreddocs') # AdhocRun
# A run is a generic object, and is specialized into final classes
# e.g. TrecAdhocRun
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 6979520 } }
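The metadata record that closes the mmarco/v2/zh/dev/small entry is plain JSON, so the percentages in the relevance table and a few per-query averages can be derived from it directly. The sketch below copies that record verbatim into a string and parses it with the standard library; the variable names are illustrative only.

```python
import json

# Metadata record for mmarco/v2/zh/dev/small, copied verbatim from the entry above.
metadata_json = """
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } },
  "queries": { "count": 6980 },
  "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } },
  "scoreddocs": { "count": 6979520 } }
"""
metadata = json.loads(metadata_json)

n_queries = metadata["queries"]["count"]
n_qrels = metadata["qrels"]["count"]

# Share of qrels at each relevance level (the "%" column of the relevance table).
counts_by_value = metadata["qrels"]["fields"]["relevance"]["counts_by_value"]
relevance_share = {level: count / n_qrels for level, count in counts_by_value.items()}

# Average number of judgments and of scored documents per query.
qrels_per_query = n_qrels / n_queries
scoreddocs_per_query = metadata["scoreddocs"]["count"] / n_queries
```

These are averages over all queries in the subset; the record alone does not say how the judgments or scored documents are distributed across individual queries.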
Version of msmarco-passage/train, with queries and documents translated into Chinese.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/zh/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.v2.zh.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/v2/zh
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/zh/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.v2.zh.train')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/zh/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.v2.zh.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/v2/zh/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/v2/zh/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro # Assumes experimaestro-ir is installed
docpairs = datamaestro.prepare_dataset('irds.mmarco.v2.zh.train.docpairs')
next(docpairs.iter()) # Display the first triplet
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets.
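A docpair record holds only identifiers, so training code usually has to resolve them to text before use. The following is a rough sketch of that step under stated assumptions: `DocPair`, `resolve_triplet` and the two dictionaries are hypothetical stand-ins with made-up contents, used here so the snippet runs without the 8.8M-passage collection.

```python
from collections import namedtuple

# Hypothetical stand-in mirroring namedtuple<query_id, doc_id_a, doc_id_b>.
DocPair = namedtuple("DocPair", ["query_id", "doc_id_a", "doc_id_b"])


def resolve_triplet(docpair, query_text, doc_text):
    """Replace the three identifiers of a docpair with their text."""
    return (
        query_text[docpair.query_id],
        doc_text[docpair.doc_id_a],
        doc_text[docpair.doc_id_b],
    )


# Made-up lookups; in practice they would be filled from the query and
# document records of the same dataset.
query_text = {"q1": "example query"}
doc_text = {"d1": "first paired passage", "d2": "second paired passage"}

triplet = resolve_triplet(DocPair("q1", "d1", "d2"), query_text, doc_text)
```

Plain dictionaries keep the sketch small; for the full collection a disk-backed lookup would likely be preferable to holding every passage in memory.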
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }
Version of msmarco-passage, with documents translated into Chinese.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.zh')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Version of msmarco-passage/dev, with queries and documents translated into Chinese.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.zh.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/zh
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.zh.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.zh.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/dev/small, with queries and documents translated into Chinese.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.zh.dev.small.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/zh
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.zh.dev.small')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.zh.dev.small.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } } }
Version of msmarco-passage/dev, with queries and documents translated into Chinese.
Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev/small/v1.1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/dev/small/v1.1 queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.zh.dev.small.v1.1.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/zh
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev/small/v1.1")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/dev/small/v1.1 docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.zh.dev.small.v1.1')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Inherits qrels from mmarco/zh/dev/small
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev/small/v1.1")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/dev/small/v1.1 qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.zh.dev.small.v1.1.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev/small/v1.1")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/dev/small/v1.1 scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro # Assumes experimaestro-ir is installed
run = datamaestro.prepare_dataset('irds.mmarco.zh.dev.small.v1.1.scoreddocs') # AdhocRun
# A run is a generic object, and is specialized into final classes
# e.g. TrecAdhocRun
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocRun.
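Scored documents arrive as a flat stream of (query_id, doc_id, score) records, while re-ranking experiments typically need a ranked candidate list per query. The sketch below shows one way to build such lists. It assumes that a higher score means a better match, which the text above does not state, and `ScoredDoc`, `top_k_per_query` and the sample records are hypothetical stand-ins.

```python
import heapq
from collections import defaultdict, namedtuple

# Hypothetical stand-in mirroring namedtuple<query_id, doc_id, score>.
ScoredDoc = namedtuple("ScoredDoc", ["query_id", "doc_id", "score"])


def top_k_per_query(scoreddocs, k):
    """Return {query_id: [doc_id, ...]} holding the k highest-scoring docs per query.

    Assumption: a higher score means a better match.
    """
    by_query = defaultdict(list)
    for scoreddoc in scoreddocs:
        by_query[scoreddoc.query_id].append(scoreddoc)
    return {
        query_id: [sd.doc_id for sd in heapq.nlargest(k, docs, key=lambda sd: sd.score)]
        for query_id, docs in by_query.items()
    }


# Made-up sample records; with ir_datasets installed, the iterator returned by
# dataset.scoreddocs_iter() could be passed in their place.
sample_scoreddocs = [
    ScoredDoc("q1", "d1", 0.2),
    ScoredDoc("q1", "d2", 0.9),
    ScoredDoc("q1", "d3", 0.5),
    ScoredDoc("q2", "d4", 0.1),
]
top2 = top_k_per_query(sample_scoreddocs, k=2)
```

Grouping everything in memory is fine for a sample; for the full set of scored documents a streaming approach may be more appropriate.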
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 6980 }, "qrels": { "count": 7437, "fields": { "relevance": { "counts_by_value": { "1": 7437 } } } }, "scoreddocs": { "count": 1034597 } }
Version of msmarco-passage/dev, with queries and documents translated into Chinese.
Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev/v1.1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/dev/v1.1 queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.zh.dev.v1.1.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/zh
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev/v1.1")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/dev/v1.1 docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.zh.dev.v1.1')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Inherits qrels from mmarco/zh/dev
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/dev/v1.1")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/dev/v1.1 qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.zh.dev.v1.1.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 101093 }, "qrels": { "count": 59273, "fields": { "relevance": { "counts_by_value": { "1": 59273 } } } } }
Version of msmarco-passage/train, with queries and documents translated into Chinese.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mmarco.zh.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mmarco/zh
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mmarco.zh.train')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mmarco.zh.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("mmarco/zh/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export mmarco/zh/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro # Assumes experimaestro-ir is installed
docpairs = datamaestro.prepare_dataset('irds.mmarco.zh.train.docpairs')
next(docpairs.iter()) # Display the first triplet
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets.
Bibtex:
@article{Bonifacio2021MMarco, title={{mMARCO}: A Multilingual Version of {MS MARCO} Passage Ranking Dataset}, author={Luiz Henrique Bonifacio and Israel Campiotti and Roberto Lotufo and Rodrigo Nogueira}, year={2021}, journal={arXiv:2108.13897} }
{ "docs": { "count": 8841823, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808731 }, "qrels": { "count": 532761, "fields": { "relevance": { "counts_by_value": { "1": 532761 } } } }, "docpairs": { "count": 39780811 } }