GitHub: datasets/msmarco_passage.py

ir_datasets: MSMARCO (passage)

Index
  1. msmarco-passage
  2. msmarco-passage/dev
  3. msmarco-passage/dev/judged
  4. msmarco-passage/dev/small
  5. msmarco-passage/eval
  6. msmarco-passage/eval/small
  7. msmarco-passage/train
  8. msmarco-passage/train/judged
  9. msmarco-passage/train/medical
  10. msmarco-passage/train/split200-train
  11. msmarco-passage/train/split200-valid
  12. msmarco-passage/trec-dl-2019
  13. msmarco-passage/trec-dl-2019/judged
  14. msmarco-passage/trec-dl-2020
  15. msmarco-passage/trec-dl-2020/judged
  16. msmarco-passage/trec-dl-hard
  17. msmarco-passage/trec-dl-hard/fold1
  18. msmarco-passage/trec-dl-hard/fold2
  19. msmarco-passage/trec-dl-hard/fold3
  20. msmarco-passage/trec-dl-hard/fold4
  21. msmarco-passage/trec-dl-hard/fold5

"msmarco-passage"

A passage ranking benchmark with a collection of 8.8 million passages and question queries. Most relevance judgments are shallow (typically at most 1-2 per query), but the TREC Deep Learning track adds deep judgments. Evaluation is typically conducted using MRR@10.

Note that the original document source files for this collection contain a double-encoding error that causes strange sequences like "å¬" and "ðºð". These are automatically corrected (properly converting the previous examples to "公" and "🇺🇸").

Provides: docs
8.8M docs

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.
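
Beyond sequential iteration, individual passages can be looked up by ID via the docs_store interface. A minimal sketch (the first call may take a while as it builds a local lookup structure; the doc_id below is an arbitrary illustration):

import ir_datasets
dataset = ir_datasets.load("msmarco-passage")
docstore = dataset.docs_store()
doc = docstore.get("0")  # arbitrary example doc_id
doc # namedtuple<doc_id, text>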


"msmarco-passage/dev"

Official dev set.

scoreddocs are the top 1000 results from BM25. These are used for the "re-ranking" setting. Note that these are sub-sampled to about 1/8 of the total available dev queries by the MS MARCO authors for faster evaluation. The BM25 scores from scoreddocs are not available (all have a score of 0).

Official evaluation measures: RR@10

Provides: queries, docs, qrels, scoreddocs
101K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.
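
For the re-ranking setting, the BM25 candidates from scoreddocs can be grouped by query before applying a stronger model. A minimal sketch (the scoring itself is left to your model):

import ir_datasets
from collections import defaultdict
dataset = ir_datasets.load("msmarco-passage/dev")
candidates = defaultdict(list)
for scoreddoc in dataset.scoreddocs_iter():
    # namedtuple<query_id, doc_id, score>; score is always 0 here
    candidates[scoreddoc.query_id].append(scoreddoc.doc_id)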


"msmarco-passage/dev/judged"

Subset of msmarco-passage/dev that only includes queries that have at least one qrel.

Official evaluation measures: RR@10

Provides: queries, docs, qrels, scoreddocs
56K queries

Language: multiple/other/unknown

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/dev/judged")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.
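
The qrels for this subset can be iterated in the same way; each entry follows the TREC qrel convention:

import ir_datasets
dataset = ir_datasets.load("msmarco-passage/dev/judged")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>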


"msmarco-passage/dev/small"

Official "small" version of the dev set, consisting of 6,980 queries (6.9% of the full dev set).

Official evaluation measures: RR@10

Provides: queries, docs, qrels
7.0K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.
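
Since RR@10 is the official measure, a run over this subset can be scored with the companion ir_measures package. A sketch, where run is a hypothetical {query_id: {doc_id: score}} dict produced by your ranker:

import ir_datasets
import ir_measures
from ir_measures import RR
dataset = ir_datasets.load("msmarco-passage/dev/small")
run = {}  # hypothetical: {query_id: {doc_id: score}} from your model
ir_measures.calc_aggregate([RR@10], dataset.qrels_iter(), run)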


"msmarco-passage/eval"

Official eval set for submission to MS MARCO leaderboard. Relevance judgments are hidden.

scoreddocs are the top 1000 results from BM25. These are used for the "re-ranking" setting. Note that these are sub-sampled to about 1/8 of the total available eval queries by the MS MARCO authors for faster evaluation. The BM25 scores from scoreddocs are not available (all have a score of 0).

Official evaluation measures: RR@10

Provides: queries, docs, scoreddocs
101K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/eval")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.
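
Because the judgments are hidden, runs over these queries are scored by submitting to the leaderboard. A sketch of writing a run in the tab-separated MS MARCO submission format (query_id, doc_id, rank); ranked_docs is a hypothetical mapping produced by your ranker:

import ir_datasets
dataset = ir_datasets.load("msmarco-passage/eval")
ranked_docs = {}  # hypothetical: {query_id: [doc_id, ...] in rank order}
with open("run.tsv", "wt") as f:
    for query in dataset.queries_iter():
        for rank, doc_id in enumerate(ranked_docs.get(query.query_id, []), start=1):
            f.write(f"{query.query_id}\t{doc_id}\t{rank}\n")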


"msmarco-passage/eval/small"

Official "small" version of the eval set, consisting of 6,837 queries (6.8% of the full eval set).

Official evaluation measures: RR@10

Provides: queries, docs
6.8K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/eval/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"msmarco-passage/train"

Official train set.

Not all queries have relevance judgments. Use msmarco-passage/train/judged for a filtered list that only includes queries that have at least one qrel.

scoreddocs are the top 1000 results from BM25. These are used for the "re-ranking" setting. Note that these are sub-sampled to about 1/8 of the total available train queries by the MS MARCO authors for faster evaluation. The BM25 scores from scoreddocs are not available (all have a score of 0).

docpairs provides access to the "official" sequence for pairwise training.

Official evaluation measures: RR@10

Provides: queries, docs, qrels, scoreddocs, docpairs
809K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.
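
The docpairs can be joined with the query text and a docs_store to produce (query, positive, negative) text triples for pairwise training. A minimal sketch (assuming doc_id_a is the positive passage, as in the official triples):

import ir_datasets
dataset = ir_datasets.load("msmarco-passage/train")
docstore = dataset.docs_store()
queries = {q.query_id: q.text for q in dataset.queries_iter()}
for pair in dataset.docpairs_iter():
    # namedtuple<query_id, doc_id_a, doc_id_b>
    triple = (queries[pair.query_id],
              docstore.get(pair.doc_id_a).text,  # positive
              docstore.get(pair.doc_id_b).text)  # negative
    break  # remove to consume the full official sequence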


"msmarco-passage/train/judged"

Subset of msmarco-passage/train that only includes queries that have at least one qrel.

Official evaluation measures: RR@10

Provides: queries, docs, qrels, scoreddocs, docpairs
503K queries

Language: multiple/other/unknown

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/train/judged")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"msmarco-passage/train/medical"

Subset of msmarco-passage/train that only includes queries containing a layman or expert medical term. Note that this includes about 20% false matches due to terms with multiple senses.

Official evaluation measures: RR@10

Provides: queries, docs, qrels, scoreddocs, docpairs
79K queries

Language: multiple/other/unknown

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/train/medical")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"msmarco-passage/train/split200-train"

Subset of msmarco-passage/train that excludes the 200 queries held out as a small validation set (see msmarco-passage/train/split200-valid). Used in various works.

Official evaluation measures: RR@10

Provides: queries, docs, qrels, scoreddocs, docpairs
809K queries

Language: multiple/other/unknown

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/train/split200-train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"msmarco-passage/train/split200-valid"

Subset of msmarco-passage/train containing only the 200 held-out queries, meant to be used as a small validation set. Used in various works.

Official evaluation measures: RR@10

Provides: queries, docs, qrels, scoreddocs, docpairs
200 queries

Language: multiple/other/unknown

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/train/split200-valid")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"msmarco-passage/trec-dl-2019"

Queries from the TREC Deep Learning (DL) 2019 shared task, which were sampled from msmarco-passage/eval. A subset of these queries were judged by NIST assessors (a filtered list is available in msmarco-passage/trec-dl-2019/judged).

Official evaluation measures: nDCG@10, RR(rel=2), AP(rel=2)

Provides: queries, docs, qrels, scoreddocs
200 queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/trec-dl-2019")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.
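
The official graded measures can be computed with the ir_measures package; note that RR and AP only treat documents with relevance >= 2 as relevant. A sketch, where run is a hypothetical {query_id: {doc_id: score}} dict from your model:

import ir_datasets
import ir_measures
from ir_measures import nDCG, RR, AP
dataset = ir_datasets.load("msmarco-passage/trec-dl-2019")
run = {}  # hypothetical: {query_id: {doc_id: score}} from your model
ir_measures.calc_aggregate([nDCG@10, RR(rel=2), AP(rel=2)], dataset.qrels_iter(), run)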


"msmarco-passage/trec-dl-2019/judged"

Subset of msmarco-passage/trec-dl-2019, only including queries with qrels.

Official evaluation measures: nDCG@10, RR(rel=2), AP(rel=2)

Provides: queries, docs, qrels, scoreddocs
43 queries

Language: multiple/other/unknown

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/trec-dl-2019/judged")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"msmarco-passage/trec-dl-2020"

Queries from the TREC Deep Learning (DL) 2020 shared task, which were sampled from msmarco-passage/eval. A subset of these queries were judged by NIST assessors (a filtered list is available in msmarco-passage/trec-dl-2020/judged).

Official evaluation measures: nDCG@10, RR(rel=2), AP(rel=2)

Provides: queries, docs, qrels, scoreddocs
200 queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/trec-dl-2020")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"msmarco-passage/trec-dl-2020/judged"

Subset of msmarco-passage/trec-dl-2020, only including queries with qrels.

Official evaluation measures: nDCG@10, RR(rel=2), AP(rel=2)

Provides: queries, docs, qrels, scoreddocs
54 queries

Language: multiple/other/unknown

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/trec-dl-2020/judged")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"msmarco-passage/trec-dl-hard"

A more challenging subset of msmarco-passage/trec-dl-2019 and msmarco-passage/trec-dl-2020.

Official evaluation measures: nDCG@10, RR(rel=2)

Provides: queries, docs, qrels
50 queries

Language: multiple/other/unknown

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/trec-dl-hard")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.
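
The five folds support cross-validation over the 50 queries. A sketch of loading each fold in turn:

import ir_datasets
for i in range(1, 6):
    fold = ir_datasets.load(f"msmarco-passage/trec-dl-hard/fold{i}")
    held_out = {q.query_id for q in fold.queries_iter()}
    # tune on the other folds, evaluate on held_out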


"msmarco-passage/trec-dl-hard/fold1"

Fold 1 of msmarco-passage/trec-dl-hard.

Official evaluation measures: nDCG@10, RR(rel=2)

Provides: queries, docs, qrels
10 queries

Language: multiple/other/unknown

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/trec-dl-hard/fold1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"msmarco-passage/trec-dl-hard/fold2"

Fold 2 of msmarco-passage/trec-dl-hard.

Official evaluation measures: nDCG@10, RR(rel=2)

Provides: queries, docs, qrels
10 queries

Language: multiple/other/unknown

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/trec-dl-hard/fold2")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"msmarco-passage/trec-dl-hard/fold3"

Fold 3 of msmarco-passage/trec-dl-hard.

Official evaluation measures: nDCG@10, RR(rel=2)

Provides: queries, docs, qrels
10 queries

Language: multiple/other/unknown

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/trec-dl-hard/fold3")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"msmarco-passage/trec-dl-hard/fold4"

Fold 4 of msmarco-passage/trec-dl-hard.

Official evaluation measures: nDCG@10, RR(rel=2)

Provides: queries, docs, qrels
10 queries

Language: multiple/other/unknown

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/trec-dl-hard/fold4")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"msmarco-passage/trec-dl-hard/fold5"

Fold 5 of msmarco-passage/trec-dl-hard.

Official evaluation measures: nDCG@10, RR(rel=2)

Provides: queries, docs, qrels
10 queries

Language: multiple/other/unknown

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("msmarco-passage/trec-dl-hard/fold5")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.