Github: datasets/car.py

ir_datasets: TREC CAR

Index
  1. car
  2. car/v1.5
  3. car/v1.5/test200
  4. car/v1.5/train/fold0
  5. car/v1.5/trec-y1
  6. car/v1.5/trec-y1/auto
  7. car/v1.5/trec-y1/manual

"car"

An ad-hoc passage retrieval collection, constructed from Wikipedia and used as the basis of the TREC Complex Answer Retrieval (CAR) task.


"car/v1.5"

Version 1.5 of the TREC CAR dataset. This version was used for year 1 (2017) of the TREC CAR shared task.

docs

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('car/v1.5')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

Citation
bibtex:
@article{Dietz2017,
  title={{TREC CAR}: A Data Set for Complex Answer Retrieval},
  author={Laura Dietz and Ben Gamari},
  year={2017},
  note={Version 1.5},
  url={http://trec-car.cs.unh.edu}
}

"car/v1.5/test200"

Unofficial test set consisting of manually selected articles. Sometimes used as a validation set.

queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('car/v1.5/test200')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

docs

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('car/v1.5/test200')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

qrels

Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
1     Paragraph appears under heading

Example

import ir_datasets
dataset = ir_datasets.load('car/v1.5/test200')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

Citation
bibtex:
@inproceedings{nanni2017benchmark,
  title={Benchmark for complex answer retrieval},
  author={Nanni, Federico and Mitra, Bhaskar and Magnusson, Matt and Dietz, Laura},
  booktitle={ICTIR},
  year={2017}
}

"car/v1.5/train/fold0"

Official large training set for TREC CAR 2017. Relevance is assumed from the hierarchical structure of pages (i.e., paragraphs under a header are assumed relevant).

queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('car/v1.5/train/fold0')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

docs

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('car/v1.5/train/fold0')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

qrels

Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
1     Paragraph appears under heading

Example

import ir_datasets
dataset = ir_datasets.load('car/v1.5/train/fold0')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
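
The "paragraphs under a header are assumed relevant" rule can be illustrated with a toy sketch. This is not the TREC CAR tooling: the function qrels_from_outline, the page layout passed to it, the "page/heading" query ID scheme, and the iteration value "0" are all hypothetical, chosen only to show how a page outline could turn into relevance-1 judgments. TrecQrel is redefined locally with the fields documented above so the sketch runs without downloading any data.

```python
from collections import namedtuple

# Local stand-in with the same fields as the TrecQrel type documented above.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def qrels_from_outline(page_title, sections):
    """Build relevance-1 qrels from a hypothetical page outline.

    sections maps each heading to the IDs of the paragraphs found under it.
    """
    qrels = []
    for heading, paragraph_ids in sections.items():
        # Hypothetical query ID scheme: page title and heading joined by "/".
        query_id = f"{page_title}/{heading}"
        for paragraph_id in paragraph_ids:
            # Every paragraph under the heading is assumed relevant (level 1).
            qrels.append(TrecQrel(query_id, paragraph_id, 1, "0"))
    return qrels


toy_sections = {"History": ["p1", "p2"], "Uses": ["p3"]}
toy_qrels = qrels_from_outline("Example page", toy_sections)
for qrel in toy_qrels:
    print(qrel)
```

Paragraphs that do not appear under a heading simply receive no judgment in this sketch, which matches the single relevance level listed above.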

"car/v1.5/trec-y1"

Official test set of TREC CAR 2017 (year 1).

queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('car/v1.5/trec-y1')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

docs

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('car/v1.5/trec-y1')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

Citation
bibtex:
@inproceedings{dietz2017trec,
  title={TREC Complex Answer Retrieval Overview.},
  author={Dietz, Laura and Verma, Manisha and Radlinski, Filip and Craswell, Nick},
  booktitle={TREC},
  year={2017}
}

"car/v1.5/trec-y1/auto"

Official test set of TREC CAR 2017 (year 1), using automatic relevance judgments. Relevance is assumed from the hierarchical structure of pages (i.e., paragraphs under a header are assumed relevant).

queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('car/v1.5/trec-y1/auto')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

docs

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('car/v1.5/trec-y1/auto')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

qrels

Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
1     Paragraph appears under heading

Example

import ir_datasets
dataset = ir_datasets.load('car/v1.5/trec-y1/auto')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
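
For evaluation it is often handy to have the judgments grouped by query rather than as a flat stream. The following is a minimal sketch of that grouping; the helper name relevant_docs_by_query, its min_relevance parameter, and the sample records are assumptions made for illustration. It accepts any iterable of objects with the TrecQrel fields shown above, so the stand-in namedtuple lets it run without downloading the dataset.

```python
from collections import defaultdict, namedtuple

# Local stand-in with the same fields as the TrecQrel type documented above.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def relevant_docs_by_query(qrels, min_relevance=1):
    """Group qrels into {query_id: set of doc_ids judged >= min_relevance}."""
    relevant = defaultdict(set)
    for qrel in qrels:
        if qrel.relevance >= min_relevance:
            relevant[qrel.query_id].add(qrel.doc_id)
    return dict(relevant)


# Made-up records; real ones would come from dataset.qrels_iter().
sample_qrels = [
    TrecQrel("q1", "d1", 1, "0"),
    TrecQrel("q1", "d2", 1, "0"),
    TrecQrel("q2", "d3", 1, "0"),
]
grouped = relevant_docs_by_query(sample_qrels)
print(grouped)
```

With the real data, the same helper could be called as relevant_docs_by_query(dataset.qrels_iter()), where dataset is the object loaded in the example above.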

"car/v1.5/trec-y1/manual"

Official test set of TREC CAR 2017 (year 1), using manual graded relevance judgments.

queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('car/v1.5/trec-y1/manual')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

docs

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('car/v1.5/trec-y1/manual')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

qrels

Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
-2    Trash
-1    NO, non-relevant
0     Non-relevant, but roughly on TOPIC
1     CAN be mentioned
2     SHOULD be mentioned
3     MUST be mentioned

Example

import ir_datasets
dataset = ir_datasets.load('car/v1.5/trec-y1/manual')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
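
Because the manual judgments are graded from -2 to 3, binary metrics need a cutoff. The sketch below counts how often each grade occurs and collapses grades to 0 or 1. The helper name binarize, the default threshold of 1 ("CAN be mentioned" and above), and the sample records are assumptions for illustration, not a cutoff prescribed by the task. TrecQrel is redefined locally so the sketch runs without downloading the dataset.

```python
from collections import Counter, namedtuple

# Local stand-in with the same fields as the TrecQrel type documented above.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def binarize(qrels, threshold=1):
    """Yield a copy of each qrel with relevance collapsed to 0 or 1."""
    for qrel in qrels:
        # Grades at or above the threshold become 1; everything else
        # (including the negative grades) becomes 0.
        yield qrel._replace(relevance=int(qrel.relevance >= threshold))


# Made-up records; real ones would come from dataset.qrels_iter().
sample_qrels = [
    TrecQrel("q1", "d1", 3, "0"),
    TrecQrel("q1", "d2", 0, "0"),
    TrecQrel("q1", "d3", -2, "0"),
    TrecQrel("q2", "d4", 1, "0"),
]

grade_counts = Counter(qrel.relevance for qrel in sample_qrels)
binary_qrels = list(binarize(sample_qrels))
print(grade_counts)
print(binary_qrels)
```

Raising the threshold to 2 or 3 gives a stricter notion of relevance; which cutoff is appropriate depends on the evaluation being run.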