ir_datasets: TREC CAR
An ad-hoc passage retrieval collection, constructed from Wikipedia and used as the basis of the TREC Complex Answer Retrieval (CAR) task.
car/v1.5
Version 1.5 of the TREC CAR dataset. This version was used for year 1 (2017) of the TREC CAR shared task.
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('car/v1.5')
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
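Because docs_iter() yields documents one at a time, corpus-level statistics can be gathered in a single pass without holding the paragraph collection in memory. The following is a minimal sketch that assumes only the (doc_id, text) fields listed above. `corpus_stats` is a hypothetical helper, not part of ir_datasets, and splitting on whitespace is a rough stand-in for a real tokenizer.

```python
from typing import Iterable, NamedTuple


class CorpusStats(NamedTuple):
    num_docs: int
    mean_tokens: float


def corpus_stats(docs: Iterable) -> CorpusStats:
    """Count documents and their mean whitespace-token length in one pass.

    `docs` is any iterable of objects with a `.text` attribute, such as the
    namedtuples yielded by `dataset.docs_iter()`. Only two running totals
    are kept in memory.
    """
    num_docs = 0
    total_tokens = 0
    for doc in docs:
        num_docs += 1
        # Whitespace splitting is a simplification, not a real tokenizer.
        total_tokens += len(doc.text.split())
    mean = total_tokens / num_docs if num_docs else 0.0
    return CorpusStats(num_docs, mean)


# Usage (requires the corpus to be available locally):
#   stats = corpus_stats(ir_datasets.load('car/v1.5').docs_iter())
```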
car/v1.5/test200
An unofficial test set consisting of manually selected articles. It is sometimes used as a validation set.
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('car/v1.5/test200')
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
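A query set is small enough to hold in memory, so a common first step is to build a dictionary keyed by query_id. The following is a minimal sketch that assumes the (query_id, text) fields listed above. `queries_to_dict` is a hypothetical helper. Raising on a duplicate ID is a defensive choice, not something the dataset is known to require.

```python
from typing import Dict, Iterable


def queries_to_dict(queries: Iterable) -> Dict[str, str]:
    """Map query_id -> text for quick lookup during ranking or evaluation."""
    lookup: Dict[str, str] = {}
    for query in queries:
        if query.query_id in lookup:
            # Refuse to silently overwrite a repeated ID.
            raise ValueError(f"duplicate query_id: {query.query_id!r}")
        lookup[query.query_id] = query.text
    return lookup


# Usage:
#   topics = queries_to_dict(ir_datasets.load('car/v1.5/test200').queries_iter())
```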
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('car/v1.5/test200')
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
Relevance levels
Rel. | Definition |
---|---|
1 | Paragraph appears under heading |
Example
import ir_datasets
dataset = ir_datasets.load('car/v1.5/test200')
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
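Most evaluation code needs qrels grouped by query rather than as a flat stream. The following is a minimal sketch that builds the nested {query_id: {doc_id: relevance}} layout, which is also the shape that tools such as pytrec_eval accept for their qrels argument. `qrels_to_dict` is a hypothetical helper. It drops the `iteration` field.

```python
from collections import defaultdict
from typing import Dict, Iterable


def qrels_to_dict(qrels: Iterable) -> Dict[str, Dict[str, int]]:
    """Group qrels as {query_id: {doc_id: relevance}}.

    Only query_id, doc_id and relevance are kept; `iteration` is ignored.
    If a (query_id, doc_id) pair repeats, the last label wins.
    """
    grouped: Dict[str, Dict[str, int]] = defaultdict(dict)
    for qrel in qrels:
        grouped[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(grouped)  # plain dict so missing queries raise KeyError


# Usage:
#   qrels = qrels_to_dict(ir_datasets.load('car/v1.5/test200').qrels_iter())
```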
car/v1.5/train/fold0
The official large training set for TREC CAR 2017. Relevance is inferred from the hierarchical structure of pages (i.e., paragraphs under a heading are assumed relevant).
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('car/v1.5/train/fold0')
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('car/v1.5/train/fold0')
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
Relevance levels
Rel. | Definition |
---|---|
1 | Paragraph appears under heading |
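The following sketch illustrates what "Paragraph appears under heading" means in practice. It derives binary labels from a page outline, marking each paragraph relevant (1) to the heading it sits directly beneath. This is a simplified illustration, not the official TREC CAR tooling. The `Section` and `Paragraph` types and the "/"-joined query_id format are hypothetical. The sketch also makes a design choice: a paragraph inside a subsection is not counted as relevant to the parent heading.

```python
from typing import Iterator, List, NamedTuple, Tuple, Union


class Paragraph(NamedTuple):
    para_id: str


class Section(NamedTuple):
    heading: str
    children: List[Union["Section", Paragraph]]


def auto_qrels(page_title: str, sections: List[Section]) -> Iterator[Tuple[str, str, int]]:
    """Yield (query_id, doc_id, 1) for every paragraph directly under a heading.

    The query_id joins the page title and heading path with '/'; this ID
    format is hypothetical and only for illustration.
    """
    def walk(path: List[str], section: Section) -> Iterator[Tuple[str, str, int]]:
        heading_path = path + [section.heading]
        query_id = "/".join(heading_path)
        for child in section.children:
            if isinstance(child, Paragraph):
                yield (query_id, child.para_id, 1)
            else:
                # Subsection: its paragraphs get the deeper heading path.
                yield from walk(heading_path, child)

    for section in sections:
        yield from walk([page_title], section)
```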
Example
import ir_datasets
dataset = ir_datasets.load('car/v1.5/train/fold0')
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
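One common use of a large training set like this is to sample (query, positive, negative) triples for a learned ranker. The following is a minimal sketch that assumes the qrels have already been grouped into a nested {query_id: {doc_id: relevance}} dictionary. Random negatives are only one simple strategy among several, and `sample_triples` and its parameters are hypothetical, not part of ir_datasets. Because the labels are inferred from page structure, an unjudged paragraph drawn as a negative may still be on topic.

```python
import random
from typing import Dict, List, Sequence, Tuple


def sample_triples(
    qrels: Dict[str, Dict[str, int]],
    doc_pool: Sequence[str],
    negatives_per_positive: int = 1,
    max_tries: int = 100,
    seed: int = 0,
) -> List[Tuple[str, str, str]]:
    """Build (query_id, positive_doc_id, negative_doc_id) training triples.

    Positives are judged paragraphs with relevance > 0. Negatives are drawn
    uniformly from `doc_pool` by rejection sampling, skipping anything judged
    for the same query; a negative is dropped if no valid candidate turns up
    within `max_tries` draws.
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    triples: List[Tuple[str, str, str]] = []
    for query_id, judged in qrels.items():
        positives = [doc_id for doc_id, rel in judged.items() if rel > 0]
        for pos in positives:
            for _ in range(negatives_per_positive):
                for _attempt in range(max_tries):
                    neg = rng.choice(doc_pool)
                    if neg not in judged:
                        triples.append((query_id, pos, neg))
                        break
    return triples


# Usage sketch, assuming the pool is every paragraph ID seen in the qrels:
#   pool = sorted({d for judged in qrels.values() for d in judged})
#   triples = sample_triples(qrels, pool, negatives_per_positive=4)
```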
car/v1.5/trec-y1
The official test set of TREC CAR 2017 (year 1).
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('car/v1.5/trec-y1')
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
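To score a system on these official year-1 topics with a tool such as trec_eval, rankings are usually written in the standard six-column TREC run format: query_id, the literal Q0, doc_id, rank, score, and run tag. The following is a minimal sketch. `format_run`, the default tag, and the depth cutoff of 1000 are illustrative choices, not requirements stated on this page.

```python
from typing import Dict, List


def format_run(
    scores: Dict[str, Dict[str, float]], tag: str = "my_run", depth: int = 1000
) -> List[str]:
    """Render {query_id: {doc_id: score}} as TREC run lines.

    Each line is 'query_id Q0 doc_id rank score tag'. Documents are sorted
    by descending score (ties broken by doc_id for determinism), ranks start
    at 1, and each query is cut off at `depth`. Fields are space-separated,
    so IDs must not contain whitespace.
    """
    lines: List[str] = []
    for query_id in sorted(scores):
        ranked = sorted(scores[query_id].items(), key=lambda kv: (-kv[1], kv[0]))
        for rank, (doc_id, score) in enumerate(ranked[:depth], start=1):
            lines.append(f"{query_id} Q0 {doc_id} {rank} {score:.4f} {tag}")
    return lines


# Usage: write "\n".join(format_run(scores)) to a file and pass it to trec_eval.
```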
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('car/v1.5/trec-y1')
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
car/v1.5/trec-y1/auto
The official test set of TREC CAR 2017 (year 1), with automatic relevance judgments inferred from the hierarchical structure of pages (i.e., paragraphs under a heading are assumed relevant).
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('car/v1.5/trec-y1/auto')
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('car/v1.5/trec-y1/auto')
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
Relevance levels
Rel. | Definition |
---|---|
1 | Paragraph appears under heading |
Example
import ir_datasets
dataset = ir_datasets.load('car/v1.5/trec-y1/auto')
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
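The same year-1 test set also has manual graded judgments (car/v1.5/trec-y1/manual, levels -2 to 3). This makes it possible to estimate how often a paragraph assumed relevant by the automatic labels was also judged relevant by assessors. The following is a rough sketch that assumes both qrel sets have been grouped into {query_id: {doc_id: relevance}} dictionaries. `auto_precision_vs_manual` is a hypothetical helper. It ignores pairs that lack a manual judgment, because manual assessment covers only a subset of pairs. Its default cutoff of 1 ("CAN be mentioned") is a choice, not an official rule.

```python
from typing import Dict


def auto_precision_vs_manual(
    auto: Dict[str, Dict[str, int]],
    manual: Dict[str, Dict[str, int]],
    threshold: int = 1,
) -> float:
    """Fraction of auto-relevant (query, paragraph) pairs whose manual grade
    is >= threshold, counting only pairs that have a manual judgment.

    Returns NaN when no auto-relevant pair was manually judged.
    """
    agree = total = 0
    for query_id, auto_judged in auto.items():
        manual_judged = manual.get(query_id, {})
        for doc_id, rel in auto_judged.items():
            if rel > 0 and doc_id in manual_judged:
                total += 1
                if manual_judged[doc_id] >= threshold:
                    agree += 1
    return agree / total if total else float("nan")
```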
car/v1.5/trec-y1/manual
The official test set of TREC CAR 2017 (year 1), with manual graded relevance judgments.
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('car/v1.5/trec-y1/manual')
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('car/v1.5/trec-y1/manual')
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
Relevance levels
Rel. | Definition |
---|---|
-2 | Trash |
-1 | NO, non-relevant |
0 | Non-relevant, but roughly on TOPIC |
1 | CAN be mentioned |
2 | SHOULD be mentioned |
3 | MUST be mentioned |
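Set-based measures such as MAP need binary labels, so the graded scale above has to be collapsed at some cutoff. The following is a minimal sketch that assumes qrels grouped as {query_id: {doc_id: relevance}}. `binarize` is a hypothetical helper. The default threshold of 1 treats "CAN be mentioned" and above as relevant, which, to our understanding, matches trec_eval's default minimum relevance level. Any other cutoff is equally a choice.

```python
from typing import Dict


def binarize(
    qrels: Dict[str, Dict[str, int]], threshold: int = 1
) -> Dict[str, Dict[str, int]]:
    """Map graded labels (-2..3) to 0/1: 1 if relevance >= threshold, else 0.

    With threshold=1, Trash (-2), NO (-1) and on-TOPIC but non-relevant (0)
    all become 0, while CAN/SHOULD/MUST be mentioned (1..3) become 1.
    """
    return {
        query_id: {doc_id: int(rel >= threshold) for doc_id, rel in judged.items()}
        for query_id, judged in qrels.items()
    }
```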
Example
import ir_datasets
dataset = ir_datasets.load('car/v1.5/trec-y1/manual')
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
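The graded judgments also support a graded metric such as nDCG. The following is a minimal per-query sketch that assumes linear gains equal to the relevance level, with the negative levels (Trash, NO) clipped to 0. Both the clipping and the linear gain (rather than the exponential 2^rel - 1 variant) are assumptions, not the official TREC CAR evaluation setup. `ndcg_at_k` is a hypothetical helper.

```python
import math
from typing import Dict, List


def ndcg_at_k(ranking: List[str], judged: Dict[str, int], k: int = 10) -> float:
    """nDCG@k for one query with gain = max(relevance, 0) and a log2 discount.

    Negative grades are clipped to 0 so they cannot lower the score, and
    unjudged documents also get gain 0. Returns 0.0 if no judged document
    has a positive grade.
    """
    def dcg(gains: List[int]) -> float:
        # Rank i (0-based) is discounted by log2(i + 2).
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

    gains = [max(judged.get(doc_id, 0), 0) for doc_id in ranking]
    ideal = sorted((max(rel, 0) for rel in judged.values()), reverse=True)
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0
```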