ir_datasets: MSMARCO (passage)
A passage ranking benchmark with a collection of 8.8 million passages and question-style queries. Most relevance judgments are shallow (typically at most 1-2 per query), but the TREC Deep Learning track adds deep judgments. Evaluation is typically conducted using MRR@10.
Note that the original document source files for this collection contain a double-encoding error that causes strange sequences like "å¬" and "ðºð". These are automatically corrected (properly converting the previous examples to "公" and "🇺🇸").
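Since evaluation on this benchmark typically uses MRR@10, the metric can be sketched as follows: for each query, take the reciprocal rank of the first relevant passage within the top 10 results, then average over queries. The run and qrels dicts below are illustrative stand-ins, not real MS MARCO data.

```python
# Sketch of MRR@10: reciprocal rank of the first relevant passage
# within the top 10 results, averaged over all queries.

def mrr_at_10(run, qrels):
    total = 0.0
    for query_id, ranking in run.items():
        relevant = qrels.get(query_id, set())
        for rank, doc_id in enumerate(ranking[:10], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(run)

run = {
    "q1": ["d3", "d7", "d1"],  # first relevant passage at rank 2
    "q2": ["d9", "d4", "d2"],  # first relevant passage at rank 1
}
qrels = {"q1": {"d7"}, "q2": {"d9"}}
print(mrr_at_10(run, qrels))  # (1/2 + 1/1) / 2 = 0.75
```

For full-scale evaluation, a dedicated tool such as the official MS MARCO evaluation script is preferable; this sketch only shows the arithmetic.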
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
Official dev set.
scoreddocs are the top 1000 results from BM25. These are used for the "re-ranking" setting. Note that these are sub-sampled to about 1/8 of the total available dev queries by the MS MARCO authors for faster evaluation. The BM25 scores from scoreddocs are not available (all have a score of 0).
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/dev')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/dev')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
Relevance levels
Rel. | Definition |
---|---|
1 | Labeled by crowd worker as relevant |
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/dev')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/dev')
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
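For the "re-ranking" setting, the scoreddocs are typically grouped by query to form per-query candidate pools that a model then re-scores. The tuples below stand in for `dataset.scoreddocs_iter()`; the real scoreddoc namedtuples unpack the same way.

```python
from collections import defaultdict

# Group the BM25 candidates by query to obtain per-query pools
# for re-ranking. Illustrative tuples stand in for scoreddocs_iter().
scoreddocs = [
    ("q1", "d3", 0.0),
    ("q1", "d7", 0.0),
    ("q2", "d9", 0.0),
]

candidates = defaultdict(list)
for query_id, doc_id, score in scoreddocs:
    candidates[query_id].append(doc_id)

print(dict(candidates))  # {'q1': ['d3', 'd7'], 'q2': ['d9']}
```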
Subset of msmarco-passage/dev that only includes queries that have at least one qrel.
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/dev/judged')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/dev/judged')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
Relevance levels
Rel. | Definition |
---|---|
1 | Labeled by crowd worker as relevant |
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/dev/judged')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/dev/judged')
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
Official "small" version of the dev set, consisting of 6,980 queries (6.9% of the full dev set).
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/dev/small')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/dev/small')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
Relevance levels
Rel. | Definition |
---|---|
1 | Labeled by crowd worker as relevant |
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/dev/small')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
Official eval set for submission to MS MARCO leaderboard. Relevance judgments are hidden.
scoreddocs are the top 1000 results from BM25. These are used for the "re-ranking" setting. Note that these are sub-sampled to about 1/8 of the total available eval queries by the MS MARCO authors for faster evaluation. The BM25 scores from scoreddocs are not available (all have a score of 0).
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/eval')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/eval')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/eval')
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
Official "small" version of the eval set, consisting of 6,837 queries (6.8% of the full eval set).
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/eval/small')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/eval/small')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
Official train set.
Not all queries have relevance judgments. Use msmarco-passage/train/judged for a filtered list that only includes queries that have at least one qrel.
scoreddocs are the top 1000 results from BM25. These are used for the "re-ranking" setting. Note that these are sub-sampled to about 1/8 of the total available train queries by the MS MARCO authors for faster evaluation. The BM25 scores from scoreddocs are not available (all have a score of 0).
docpairs provides access to the "official" sequence for pairwise training.
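The /judged filtering described above amounts to keeping only queries whose query_id appears in at least one qrel. A minimal sketch, with illustrative tuples standing in for the real `queries_iter()` and `qrels_iter()` outputs:

```python
# Keep only queries that have at least one relevance judgment,
# as the /judged subsets do. Tuples below are illustrative.
qrels = [("q1", "d7", 1, "0"), ("q3", "d2", 1, "0")]
queries = [("q1", "what is x"), ("q2", "what is y"), ("q3", "what is z")]

judged_ids = {qrel[0] for qrel in qrels}  # query_ids with >= 1 qrel
judged_queries = [q for q in queries if q[0] in judged_ids]
print([q[0] for q in judged_queries])  # ['q1', 'q3']
```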
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/train')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/train')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
Relevance levels
Rel. | Definition |
---|---|
1 | Labeled by crowd worker as relevant |
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/train')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/train')
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/train')
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>
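Each docpair supplies one positive and one negative passage for a query (in the official triples, doc_id_a is the relevant passage and doc_id_b a sampled negative). A common way to consume them is a pairwise margin loss that pushes the positive's score above the negative's; the scores below are illustrative stand-ins for model outputs, not anything computed from the dataset.

```python
# Pairwise hinge loss sketch: the loss is zero once the positive
# passage outscores the negative by at least `margin`.

def pairwise_hinge_loss(score_pos, score_neg, margin=1.0):
    return max(0.0, margin - (score_pos - score_neg))

print(pairwise_hinge_loss(2.0, 0.5))   # 0.0: already separated by the margin
print(pairwise_hinge_loss(1.0, 0.75))  # 0.75: not yet separated
```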
Subset of msmarco-passage/train that only includes queries that have at least one qrel.
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/train/judged')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/train/judged')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
Relevance levels
Rel. | Definition |
---|---|
1 | Labeled by crowd worker as relevant |
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/train/judged')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/train/judged')
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/train/judged')
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>
Subset of msmarco-passage/train that only includes queries containing a layman or expert medical term. Note that about 20% of the matches are false positives due to terms with multiple senses.
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/train/medical')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/train/medical')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
Relevance levels
Rel. | Definition |
---|---|
1 | Labeled by crowd worker as relevant |
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/train/medical')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/train/medical')
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/train/medical')
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>
Subset of msmarco-passage/train that excludes 200 queries held out for use as a small validation set. Used in various works.
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/train/split200-train')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/train/split200-train')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
Relevance levels
Rel. | Definition |
---|---|
1 | Labeled by crowd worker as relevant |
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/train/split200-train')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/train/split200-train')
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/train/split200-train')
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>
Subset of msmarco-passage/train consisting of only the 200 held-out queries that are meant to be used as a small validation set. Used in various works.
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/train/split200-valid')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/train/split200-valid')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
Relevance levels
Rel. | Definition |
---|---|
1 | Labeled by crowd worker as relevant |
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/train/split200-valid')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/train/split200-valid')
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/train/split200-valid')
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>
Queries from the TREC Deep Learning (DL) 2019 shared task, which were sampled from msmarco-passage/eval. A subset of these queries were judged by NIST assessors (a filtered list is available in msmarco-passage/trec-dl-2019/judged).
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/trec-dl-2019')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/trec-dl-2019')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
Relevance levels
Rel. | Definition |
---|---|
0 | Irrelevant: The passage has nothing to do with the query. |
1 | Related: The passage seems related to the query but does not answer it. |
2 | Highly relevant: The passage has some answer for the query, but the answer may be a bit unclear, or hidden amongst extraneous information. |
3 | Perfectly relevant: The passage is dedicated to the query and contains the exact answer. |
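When binary metrics (such as MAP or MRR) are computed over these graded judgments, a common convention is to treat levels 2 and 3 as relevant and levels 0 and 1 as non-relevant. A minimal sketch, with illustrative qrel tuples standing in for `qrels_iter()`:

```python
# Binarize graded DL judgments: levels 2 ("highly relevant") and
# 3 ("perfectly relevant") count as relevant. Tuples are illustrative.
qrels = [("q1", "d1", 0, "0"), ("q1", "d2", 1, "0"),
         ("q1", "d3", 2, "0"), ("q1", "d4", 3, "0")]

relevant = {(qid, did) for qid, did, rel, _ in qrels if rel >= 2}
print(sorted(relevant))  # [('q1', 'd3'), ('q1', 'd4')]
```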
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/trec-dl-2019')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/trec-dl-2019')
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
Subset of msmarco-passage/trec-dl-2019, only including queries with qrels.
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/trec-dl-2019/judged')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/trec-dl-2019/judged')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
Relevance levels
Rel. | Definition |
---|---|
0 | Irrelevant: The passage has nothing to do with the query. |
1 | Related: The passage seems related to the query but does not answer it. |
2 | Highly relevant: The passage has some answer for the query, but the answer may be a bit unclear, or hidden amongst extraneous information. |
3 | Perfectly relevant: The passage is dedicated to the query and contains the exact answer. |
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/trec-dl-2019/judged')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/trec-dl-2019/judged')
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
Queries from the TREC Deep Learning (DL) 2020 shared task, which were sampled from msmarco-passage/eval. A subset of these queries were judged by NIST assessors (a filtered list is available in msmarco-passage/trec-dl-2020/judged).
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/trec-dl-2020')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/trec-dl-2020')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
Relevance levels
Rel. | Definition |
---|---|
0 | Irrelevant: The passage has nothing to do with the query. |
1 | Related: The passage seems related to the query but does not answer it. |
2 | Highly relevant: The passage has some answer for the query, but the answer may be a bit unclear, or hidden amongst extraneous information. |
3 | Perfectly relevant: The passage is dedicated to the query and contains the exact answer. |
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/trec-dl-2020')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/trec-dl-2020')
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
Subset of msmarco-passage/trec-dl-2020, only including queries with qrels.
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/trec-dl-2020/judged')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/trec-dl-2020/judged')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
Relevance levels
Rel. | Definition |
---|---|
0 | Irrelevant: The passage has nothing to do with the query. |
1 | Related: The passage seems related to the query but does not answer it. |
2 | Highly relevant: The passage has some answer for the query, but the answer may be a bit unclear, or hidden amongst extraneous information. |
3 | Perfectly relevant: The passage is dedicated to the query and contains the exact answer. |
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/trec-dl-2020/judged')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
Example
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/trec-dl-2020/judged')
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>