ir_datasets: MSMARCO (passage)
A passage ranking benchmark with a collection of 8.8 million passages and question queries. Most relevance judgments are shallow (typically at most 1-2 per query), but the TREC Deep Learning track adds deep judgments. Evaluation is typically conducted using MRR@10.
Note that the original document source files for this collection contain a double-encoding error that causes strange sequences like "å¬" and "ðºð". These are automatically corrected (properly converting the previous examples to "公" and "🇺🇸").
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
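If you need random access to passages by ID rather than a full scan, ir_datasets also exposes a docs_store. A minimal sketch (the doc_id "7" is an arbitrary example):
import ir_datasets
dataset = ir_datasets.load("msmarco-passage")
docstore = dataset.docs_store() # builds/opens a local lookup structure
doc = docstore.get("7") # fetch a single passage by doc_id
print(doc.text)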
Official dev set.
scoreddocs are the top 1000 results from BM25. These are used for the "re-ranking" setting (see the example after the snippet below). Note that these are sub-sampled to about 1/8 of the total available dev queries by the MSMARCO authors for faster evaluation. The BM25 scores from scoreddocs are not available (all have a score of 0).
Official evaluation measures: RR@10
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
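To iterate over the re-ranking candidates mentioned above, use scoreddocs. A minimal sketch:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/dev")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>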
Subset of msmarco-passage/dev that only includes queries that have at least one qrel.
Official evaluation measures: RR@10
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/dev/judged")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
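Since every query in this subset has at least one judgment, it pairs naturally with the qrels. A minimal sketch (for this collection the qrels follow the TREC format):
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/dev/judged")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>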
Official "small" version of the dev set, consisting of 6,980 queries (6.9% of the full dev set).
Official evaluation measures: RR@10
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
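To score a run against this subset with the official RR@10 measure, the companion ir_measures package can consume the qrels directly. A minimal sketch, assuming run.txt is a TREC-format run file you have produced (the path is hypothetical):
import ir_datasets
import ir_measures
from ir_measures import RR
dataset = ir_datasets.load("msmarco-passage/dev/small")
run = ir_measures.read_trec_run("run.txt") # hypothetical run file
print(ir_measures.calc_aggregate([RR@10], dataset.qrels_iter(), run))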
Official eval set for submission to the MS MARCO leaderboard. Relevance judgments are hidden.
scoreddocs are the top 1000 results from BM25. These are used for the "re-ranking" setting. Note that these are sub-sampled to about 1/8 of the total available eval queries by the MSMARCO authors for faster evaluation. The BM25 scores from scoreddocs are not available (all have a score of 0).
Official evaluation measures: RR@10
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/eval")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
Official "small" version of the eval set, consisting of 6,837 queries (6.8% of the full eval set).
Official evaluation measures: RR@10
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/eval/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
Official train set.
Not all queries have relevance judgments. Use msmarco-passage/train/judged for a filtered list that only includes queries that have at least one qrel.
scoreddocs are the top 1000 results from BM25. These are used for the "re-ranking" setting. Note that these are sub-sampled to about 1/8 of the total available train queries by the MSMARCO authors for faster evaluation. The BM25 scores from scoreddocs are not available (all have a score of 0).
docpairs provides access to the "official" sequence for pairwise training (see the example after the snippet below).
Official evaluation measures: RR@10
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
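The pairwise training sequence mentioned above is available through docpairs; each item pairs a query with one more-relevant and one less-relevant passage. A minimal sketch:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>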
Subset of msmarco-passage/train that only includes queries that have at least one qrel.
Official evaluation measures: RR@10
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/train/judged")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
Subset of msmarco-passage/train that only includes queries that have a layman or expert medical term. Note that this includes about 20% false matches due to terms with multiple senses.
Official evaluation measures: RR@10
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/train/medical")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
Subset of msmarco-passage/train that excludes the 200 queries held out as a small validation set (see msmarco-passage/train/split200-valid). This split has been used in various works.
Official evaluation measures: RR@10
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/train/split200-train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
Subset of msmarco-passage/train containing only the 200 queries held out as a small validation set. This split has been used in various works.
Official evaluation measures: RR@10
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/train/split200-valid")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
Queries from the TREC Deep Learning (DL) 2019 shared task, which were sampled from msmarco-passage/eval. A subset of these queries was judged by NIST assessors (a filtered list is available in msmarco-passage/trec-dl-2019/judged).
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/trec-dl-2019")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
Subset of msmarco-passage/trec-dl-2019, only including queries with qrels.
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/trec-dl-2019/judged")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
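Because the NIST judgments here are graded, deep measures such as nDCG@10 are commonly reported for the DL tracks. A minimal sketch with ir_measures, again assuming run.txt is a TREC-format run file you have produced (the path is hypothetical):
import ir_datasets
import ir_measures
from ir_measures import nDCG
dataset = ir_datasets.load("msmarco-passage/trec-dl-2019/judged")
run = ir_measures.read_trec_run("run.txt") # hypothetical run file
print(ir_measures.calc_aggregate([nDCG@10], dataset.qrels_iter(), run))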
Queries from the TREC Deep Learning (DL) 2020 shared task, which were sampled from msmarco-passage/eval. A subset of these queries was judged by NIST assessors (a filtered list is available in msmarco-passage/trec-dl-2020/judged).
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/trec-dl-2020")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
Subset of msmarco-passage/trec-dl-2020, only including queries with qrels.
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/trec-dl-2020/judged")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
A more challenging subset of msmarco-passage/trec-dl-2019 and msmarco-passage/trec-dl-2020.
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/trec-dl-hard")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
Fold 1 of msmarco-passage/trec-dl-hard.
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/trec-dl-hard/fold1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
Fold 2 of msmarco-passage/trec-dl-hard.
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/trec-dl-hard/fold2")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
Fold 3 of msmarco-passage/trec-dl-hard.
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/trec-dl-hard/fold3")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
Fold 4 of msmarco-passage/trec-dl-hard.
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/trec-dl-hard/fold4")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
Fold 5 of msmarco-passage/trec-dl-hard.
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/trec-dl-hard/fold5")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.