ir_datasets: MSMARCO (passage, version 2)
Version 2 of the MS MARCO passage ranking dataset. The corpus contains 138M passages, which can be linked up with documents in msmarco-document-v2.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage-v2")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, spans, msmarco_document_id>
You can find more details about the Python API here.
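Passages can also be fetched by ID through the dataset's docs_store; a minimal sketch (the passage ID below is a hypothetical placeholder, not a real corpus ID):
import ir_datasets
dataset = ir_datasets.load("msmarco-passage-v2")
docstore = dataset.docs_store()
doc = docstore.get("msmarco_passage_00_000000") # hypothetical ID; substitute a real msmarco_passage_* ID
doc.text # passage text
doc.msmarco_document_id # links the passage to its document in msmarco-document-v2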
Official dev1 set with 3,903 queries.
Note that the qrels in this dataset are not directly human-assessed: labels from msmarco-passage are mapped to documents via URL, those documents are re-passaged, and the best approximate passage match is identified.
Official evaluation measures: RR@10
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage-v2/dev1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
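The approximate qrels described above can be iterated in the same way; a minimal sketch of the standard qrels API:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage-v2/dev1")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>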
Official dev2 set with 4,281 queries.
Note that the qrels in this dataset are not directly human-assessed: labels from msmarco-passage are mapped to documents via URL, those documents are re-passaged, and the best approximate passage match is identified.
Official evaluation measures: RR@10
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage-v2/dev2")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
Official train set with 277,144 queries.
Note that the qrels in this dataset are not directly human-assessed: labels from msmarco-passage are mapped to documents via URL, those documents are re-passaged, and the best approximate passage match is identified.
Official evaluation measures: RR@10
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage-v2/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
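As a sketch of one way to use the train split, the queries, qrels, and docs_store can be combined to form positive query-passage pairs (the pairing logic below is illustrative, not an official recipe):
import ir_datasets
dataset = ir_datasets.load("msmarco-passage-v2/train")
docstore = dataset.docs_store()
# Collect the positively-judged passage IDs for each query
positives = {}
for qrel in dataset.qrels_iter():
    if qrel.relevance > 0:
        positives.setdefault(qrel.query_id, []).append(qrel.doc_id)
for query in dataset.queries_iter():
    for doc_id in positives.get(query.query_id, []):
        passage = docstore.get(doc_id)
        # (query.text, passage.text) is one positive training pair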
Official topics for the TREC Deep Learning (DL) 2021 shared task.
Note that at this time, qrels are only available to those with TREC active participant login credentials.
Official evaluation measures: AP@100, nDCG@10, P(rel=2)@10, RR(rel=2)
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage-v2/trec-dl-2021")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
msmarco-passage-v2/trec-dl-2021, but filtered down to the 53 queries with qrels.
Note that at this time, these qrels are only available to those with TREC active participant login credentials.
Official evaluation measures: AP@100, nDCG@10, P(rel=2)@10, RR(rel=2)
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage-v2/trec-dl-2021/judged")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
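If you have access to the qrels, the official measures listed above can be computed with the companion ir_measures package; a minimal sketch, assuming a TREC-format run file ("run.txt" is a hypothetical path):
import ir_datasets
import ir_measures
from ir_measures import AP, nDCG, P, RR
dataset = ir_datasets.load("msmarco-passage-v2/trec-dl-2021/judged")
run = ir_measures.read_trec_run("run.txt") # hypothetical run file
# Aggregate the official measures over the 53 judged queries
ir_measures.calc_aggregate([AP@100, nDCG@10, P(rel=2)@10, RR(rel=2)], dataset.qrels_iter(), run)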