Github: datasets/msmarco_qna.py

ir_datasets: MSMARCO (QnA)

Index
  1. msmarco-qna
  2. msmarco-qna/dev
  3. msmarco-qna/eval
  4. msmarco-qna/train

"msmarco-qna"

The MS MARCO Question Answering dataset. This is the source collection of msmarco-passage and msmarco-document.

It is prohibited to use information from this dataset for submissions to the MS MARCO passage and document leaderboards or the TREC DL shared task.

Query IDs in this collection align with those found in msmarco-passage and msmarco-document, but document IDs do not (the QnA collection does not provide document IDs itself; they are assigned sequentially by ir_datasets).

docs

Language: en

Document type:
MsMarcoQnADoc: (namedtuple)
  1. doc_id: str
  2. text: str
  3. url: str

Example

import ir_datasets
dataset = ir_datasets.load('msmarco-qna')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, url>
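
Each doc record carries a url alongside its text, so standard URL handling applies, e.g. for grouping passages by source site. A minimal sketch, using a mock record with invented values rather than real dataset contents:

```python
from collections import namedtuple
from urllib.parse import urlparse

# Mock record with the MsMarcoQnADoc fields shown above;
# the doc_id, text, and url values are invented for illustration.
MsMarcoQnADoc = namedtuple('MsMarcoQnADoc', ['doc_id', 'text', 'url'])

doc = MsMarcoQnADoc('0', 'Example passage text.', 'https://example.com/page')

# The url field is a plain string, so the stdlib parser works directly.
host = urlparse(doc.url).netloc
print(host)  # example.com
```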
Citation
bibtex:
@inproceedings{Bajaj2016MSMA,
  title={MS MARCO: A Human Generated MAchine Reading COmprehension Dataset},
  author={Payal Bajaj and Daniel Campos and Nick Craswell and Li Deng and Jianfeng Gao and Xiaodong Liu and Rangan Majumder and Andrew McNamara and Bhaskar Mitra and Tri Nguyen and Mir Rosenberg and Xia Song and Alina Stoica and Saurabh Tiwary and Tong Wang},
  booktitle={InCoCo@NIPS},
  year={2016}
}

"msmarco-qna/dev"

Official dev set.

The scoreddocs provide the roughly 10 passages that were presented to the annotator, where the score indicates the order in which they were presented.

queries

Language: en

Query type:
MsMarcoQnAQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. type: str
  4. answers: Tuple[str, ...]

Example

import ir_datasets
dataset = ir_datasets.load('msmarco-qna/dev')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, type, answers>
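
Since the answers field is a tuple of strings, queries can be partitioned by whether a usable answer is present. A sketch on mock records with invented values; it assumes the convention from the raw MS MARCO QnA data that unanswerable queries carry the literal answer string 'No Answer Present.', which is worth verifying against the data you load:

```python
from collections import namedtuple

# Mock records with the MsMarcoQnAQuery fields shown above; values invented.
MsMarcoQnAQuery = namedtuple('MsMarcoQnAQuery',
                             ['query_id', 'text', 'type', 'answers'])

queries = [
    MsMarcoQnAQuery('1', 'what is x', 'DESCRIPTION', ('X is ...',)),
    MsMarcoQnAQuery('2', 'who is y', 'PERSON', ('No Answer Present.',)),
]

# Keep queries with a non-empty answer tuple that is not the
# "no answer" marker (an assumed convention, see lead-in).
answered = [q for q in queries
            if q.answers and q.answers != ('No Answer Present.',)]
print(len(answered))  # 1
```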
docs

Language: en

Document type:
MsMarcoQnADoc: (namedtuple)
  1. doc_id: str
  2. text: str
  3. url: str

Example

import ir_datasets
dataset = ir_datasets.load('msmarco-qna/dev')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, url>
qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
 0    Not marked by annotator as a contribution to their answer
 1    Marked by annotator as a contribution to their answer

Example

import ir_datasets
dataset = ir_datasets.load('msmarco-qna/dev')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
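
Given the binary relevance levels above, a common step is collecting, per query, the passages the annotator marked as contributing to their answer. A sketch on mock TrecQrel records with invented IDs:

```python
from collections import defaultdict, namedtuple

# Mock records with the TrecQrel fields shown above; IDs are invented.
TrecQrel = namedtuple('TrecQrel', ['query_id', 'doc_id', 'relevance', 'iteration'])

qrels = [
    TrecQrel('q1', 'd1', 1, '0'),
    TrecQrel('q1', 'd2', 0, '0'),
    TrecQrel('q2', 'd3', 1, '0'),
]

# Keep only passages marked as contributing to the answer (relevance 1).
relevant = defaultdict(set)
for qrel in qrels:
    if qrel.relevance > 0:
        relevant[qrel.query_id].add(qrel.doc_id)

print(dict(relevant))  # {'q1': {'d1'}, 'q2': {'d3'}}
```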
scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Example

import ir_datasets
dataset = ir_datasets.load('msmarco-qna/dev')
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
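
Because the score encodes presentation order, the list shown to each annotator can be recovered by grouping on query_id and sorting on score. A sketch on mock records with invented values; it assumes higher scores mean earlier presentation (the usual scoreddocs convention), which is worth checking against the data:

```python
from collections import namedtuple
from itertools import groupby

# Mock records with the GenericScoredDoc fields shown above; values invented.
GenericScoredDoc = namedtuple('GenericScoredDoc', ['query_id', 'doc_id', 'score'])

scoreddocs = [
    GenericScoredDoc('q1', 'd2', 1.0),
    GenericScoredDoc('q1', 'd1', 2.0),
    GenericScoredDoc('q2', 'd3', 1.0),
]

# Sort by query, then descending score (assumed presentation order),
# and collect each query's passages in that order.
ranked = {}
ordered = sorted(scoreddocs, key=lambda s: (s.query_id, -s.score))
for query_id, group in groupby(ordered, key=lambda s: s.query_id):
    ranked[query_id] = [s.doc_id for s in group]

print(ranked)  # {'q1': ['d1', 'd2'], 'q2': ['d3']}
```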

"msmarco-qna/eval"

Official eval set.

The scoreddocs provide the roughly 10 passages that were presented to the annotator, where the score indicates the order in which they were presented.

queries

Language: en

Query type:
MsMarcoQnAEvalQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. type: str

Example

import ir_datasets
dataset = ir_datasets.load('msmarco-qna/eval')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, type>
docs

Language: en

Document type:
MsMarcoQnADoc: (namedtuple)
  1. doc_id: str
  2. text: str
  3. url: str

Example

import ir_datasets
dataset = ir_datasets.load('msmarco-qna/eval')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, url>
scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Example

import ir_datasets
dataset = ir_datasets.load('msmarco-qna/eval')
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

"msmarco-qna/train"

Official train set.

The scoreddocs provide the roughly 10 passages that were presented to the annotator, where the score indicates the order in which they were presented.

queries

Language: en

Query type:
MsMarcoQnAQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. type: str
  4. answers: Tuple[str, ...]

Example

import ir_datasets
dataset = ir_datasets.load('msmarco-qna/train')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, type, answers>
docs

Language: en

Document type:
MsMarcoQnADoc: (namedtuple)
  1. doc_id: str
  2. text: str
  3. url: str

Example

import ir_datasets
dataset = ir_datasets.load('msmarco-qna/train')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, url>
qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
 0    Not marked by annotator as a contribution to their answer
 1    Marked by annotator as a contribution to their answer

Example

import ir_datasets
dataset = ir_datasets.load('msmarco-qna/train')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Example

import ir_datasets
dataset = ir_datasets.load('msmarco-qna/train')
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>