ir_datasets: TREC CAR
An ad-hoc passage retrieval collection, constructed from Wikipedia and used as the basis of the TREC Complex Answer Retrieval (CAR) task.
car/v1.5
Version 1.5 of the TREC CAR dataset. This version is used for year 1 (2017) of the TREC CAR shared task.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("car/v1.5")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
More details about the Python API are available in the ir_datasets documentation.
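Because `docs_iter()` streams the whole paragraph collection, it can be useful to peek at a few documents before processing everything. The following is a minimal sketch using only `itertools.islice` from the standard library; `preview_docs` is a hypothetical helper, and it assumes each document exposes the `doc_id` and `text` fields listed above.

```python
from itertools import islice
from typing import Iterable, List, Tuple


def preview_docs(docs: Iterable, n: int = 3, width: int = 60) -> List[Tuple[str, str]]:
    """Return (doc_id, truncated text) for the first n documents.

    Only the first n items are consumed, so the rest of the stream
    is never read.
    """
    preview = []
    for doc in islice(docs, n):
        # Truncate long paragraphs and mark the cut with "...".
        snippet = doc.text if len(doc.text) <= width else doc.text[:width] + "..."
        preview.append((doc.doc_id, snippet))
    return preview


# Hypothetical usage with ir_datasets (downloads the collection on first use):
# import ir_datasets
# dataset = ir_datasets.load("car/v1.5")
# for doc_id, snippet in preview_docs(dataset.docs_iter(), n=5):
#     print(doc_id, snippet)
```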
car/v1.5/test200
Unofficial test set consisting of manually selected articles. It is sometimes used as a validation set.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("car/v1.5/test200")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text, title, headings>
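Each query carries the article `title` and the `headings` leading to the target section, alongside a `text` field. Some retrieval setups build their own query string from these parts. The sketch below shows one illustrative formulation (a "title / heading / subheading" path); it is not how ir_datasets constructs `text`, `heading_path` is a hypothetical helper, and it assumes `headings` is a sequence of strings.

```python
from typing import Sequence


def heading_path(title: str, headings: Sequence[str], sep: str = " / ") -> str:
    """Join an article title and its section headings into one string.

    For example, title "Chocolate" with headings ("History", "Europe")
    becomes "Chocolate / History / Europe".
    """
    return sep.join([title, *headings])


# Hypothetical usage with ir_datasets:
# import ir_datasets
# dataset = ir_datasets.load("car/v1.5/test200")
# for query in dataset.queries_iter():
#     print(query.query_id, heading_path(query.title, query.headings))
```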
car/v1.5/train/fold0
Fold 0 of the official large training set for TREC CAR 2017. Relevance is inferred from the hierarchical structure of pages (i.e., paragraphs under a heading are assumed relevant).
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("car/v1.5/train/fold0")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text, title, headings>
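Since relevance in the training folds is derived automatically from page structure, a common first step is to collect each query's relevant paragraphs. The sketch below groups judgments into a dict from query ID to relevant document IDs. It assumes the dataset exposes `qrels_iter()` yielding records with `query_id`, `doc_id`, and `relevance` fields, as ir_datasets qrels generally do; `relevant_docs` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def relevant_docs(qrels: Iterable, min_relevance: int = 1) -> Dict[str, Set[str]]:
    """Map each query_id to the set of doc_ids judged at least min_relevance."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for qrel in qrels:
        if qrel.relevance >= min_relevance:
            grouped[qrel.query_id].add(qrel.doc_id)
    return dict(grouped)


# Hypothetical usage with ir_datasets (assumes qrels are available for this fold):
# import ir_datasets
# dataset = ir_datasets.load("car/v1.5/train/fold0")
# rel = relevant_docs(dataset.qrels_iter())
```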
car/v1.5/train/fold1
Fold 1 of the official large training set for TREC CAR 2017. Relevance is inferred from the hierarchical structure of pages (i.e., paragraphs under a heading are assumed relevant).
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("car/v1.5/train/fold1")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text, title, headings>
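The training set is split into five folds (fold0 through fold4), which makes a leave-one-fold-out setup straightforward. The sketch below returns the training and held-out dataset IDs for a chosen fold. `fold_split` is a hypothetical helper, and using one fold for validation is an illustrative choice, not a protocol prescribed by TREC CAR.

```python
from typing import List, Tuple

# Dataset IDs follow the naming pattern used in the load() examples.
FOLD_IDS = [f"car/v1.5/train/fold{i}" for i in range(5)]


def fold_split(held_out: int) -> Tuple[List[str], str]:
    """Return (training fold IDs, held-out fold ID) for 5-fold cross-validation."""
    if not 0 <= held_out < len(FOLD_IDS):
        raise ValueError(f"held_out must be in [0, {len(FOLD_IDS) - 1}]")
    train = [fid for i, fid in enumerate(FOLD_IDS) if i != held_out]
    return train, FOLD_IDS[held_out]


# Example: hold out fold 1 and train on the remaining four.
# train_ids, val_id = fold_split(1)
```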
car/v1.5/train/fold2
Fold 2 of the official large training set for TREC CAR 2017. Relevance is inferred from the hierarchical structure of pages (i.e., paragraphs under a heading are assumed relevant).
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("car/v1.5/train/fold2")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text, title, headings>
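When training on several folds at once, their queries can be streamed one after another without materializing them. The sketch below uses `itertools.chain.from_iterable`; `chained_queries` is a hypothetical helper, and the `load` parameter is assumed to be any callable that maps a dataset ID to an object with a `queries_iter()` method (for example `ir_datasets.load`). Datasets are only loaded when iteration reaches them.

```python
from itertools import chain
from typing import Callable, Iterable, Iterator


def chained_queries(dataset_ids: Iterable[str], load: Callable) -> Iterator:
    """Lazily yield queries from several datasets, one dataset after another.

    load(dataset_id) is only called once iteration reaches that dataset.
    """
    return chain.from_iterable(load(did).queries_iter() for did in dataset_ids)


# Hypothetical usage with ir_datasets:
# import ir_datasets
# ids = [f"car/v1.5/train/fold{i}" for i in (0, 1, 2, 3)]
# for query in chained_queries(ids, ir_datasets.load):
#     print(query.query_id)
```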
car/v1.5/train/fold3
Fold 3 of the official large training set for TREC CAR 2017. Relevance is inferred from the hierarchical structure of pages (i.e., paragraphs under a heading are assumed relevant).
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("car/v1.5/train/fold3")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text, title, headings>
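Training a ranker typically needs (query, relevant passage) pairs. The sketch below pairs each query's `text` with its relevant document IDs. It assumes `relevant` is a mapping from `query_id` to a set of relevant `doc_id`s (for example, built by grouping the dataset's relevance judgments); `positive_pairs` is a hypothetical helper, and fetching the passage text for each `doc_id` would be a separate lookup.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple


def positive_pairs(
    queries: Iterable, relevant: Dict[str, Set[str]]
) -> Iterator[Tuple[str, str]]:
    """Yield (query text, doc_id) for every judged-relevant document.

    Queries without any relevant documents are skipped. Doc IDs are
    sorted only to make the output order deterministic.
    """
    for query in queries:
        for doc_id in sorted(relevant.get(query.query_id, ())):
            yield query.text, doc_id
```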
car/v1.5/train/fold4
Fold 4 of the official large training set for TREC CAR 2017. Relevance is inferred from the hierarchical structure of pages (i.e., paragraphs under a heading are assumed relevant).
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("car/v1.5/train/fold4")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text, title, headings>
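The training folds only identify relevant paragraphs, while many learning-to-rank setups also need non-relevant examples. The sketch below draws uniform random negatives from a candidate pool. It assumes that unjudged paragraphs can be treated as non-relevant, which is a simplification; `sample_negatives` is a hypothetical helper, and choosing the candidate pool is left to the user.

```python
import random
from typing import List, Sequence, Set


def sample_negatives(
    candidates: Sequence[str], positives: Set[str], k: int, seed: int = 0
) -> List[str]:
    """Sample up to k distinct doc_ids from candidates that are not positives.

    Uniform random sampling is only one option; hard negatives taken from
    a first-stage ranker are a common alternative.
    """
    # Deduplicate and sort so the result depends only on the seed.
    pool = sorted(set(candidates) - positives)
    rng = random.Random(seed)
    return rng.sample(pool, min(k, len(pool)))
```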
car/v1.5/trec-y1
Official test set of TREC CAR 2017 (year 1).
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("car/v1.5/trec-y1")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text, title, headings>
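Rankings on a test set are commonly stored in the standard TREC run format (`query_id Q0 doc_id rank score tag`) so that evaluation tools such as trec_eval can score them. The sketch below formats per-query scores into such lines. `format_run` is a hypothetical helper; the nested-dict input, the default tag, and the depth limit are assumptions.

```python
from typing import Dict, List


def format_run(
    scores: Dict[str, Dict[str, float]], tag: str = "my-run", depth: int = 1000
) -> List[str]:
    """Format scores as TREC run lines: 'query_id Q0 doc_id rank score tag'.

    Documents are ranked by descending score (ties broken by doc_id),
    ranks start at 1, and at most `depth` documents are kept per query.
    """
    lines = []
    for query_id in sorted(scores):
        ranked = sorted(scores[query_id].items(), key=lambda item: (-item[1], item[0]))
        for rank, (doc_id, score) in enumerate(ranked[:depth], start=1):
            lines.append(f"{query_id} Q0 {doc_id} {rank} {score:.4f} {tag}")
    return lines
```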
car/v1.5/trec-y1/auto
Official test set of TREC CAR 2017 (year 1), using automatic relevance judgments (inferred from the hierarchical structure of pages, i.e., paragraphs under a heading are assumed relevant).
Inherits queries from car/v1.5/trec-y1.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("car/v1.5/trec-y1/auto")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text, title, headings>
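The automatic judgments mark paragraphs as relevant or not, so binary metrics apply directly. The sketch below computes R-precision, the precision within the top R results where R is the number of relevant documents, as one illustrative metric. `r_precision` is a hypothetical helper; for reported scores, a standard tool such as trec_eval is the safer choice.

```python
from typing import Sequence, Set


def r_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Precision at rank R, where R is the number of relevant documents.

    Returns 0.0 when there are no relevant documents.
    """
    r = len(relevant)
    if r == 0:
        return 0.0
    hits = sum(1 for doc_id in ranking[:r] if doc_id in relevant)
    return hits / r
```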
car/v1.5/trec-y1/manual
Official test set of TREC CAR 2017 (year 1), using manual graded relevance judgments.
Inherits queries from car/v1.5/trec-y1.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("car/v1.5/trec-y1/manual")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text, title, headings>
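Graded judgments are usually scored with a graded metric such as nDCG. The sketch below computes nDCG@k with linear gain and a log2 rank discount. It assumes integer grades in a dict keyed by `doc_id`; grades of zero or below (including any negative grades the manual scale may use) and unjudged documents contribute no gain. `ndcg_at_k` is a hypothetical helper, and exponential gain (2^grade - 1) is an equally common alternative.

```python
import math
from typing import Dict, Iterable, Sequence


def _dcg(gains: Iterable[float]) -> float:
    """Discounted cumulative gain: rank i (0-based) is divided by log2(i + 2)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(ranking: Sequence[str], grades: Dict[str, int], k: int = 10) -> float:
    """nDCG@k with linear gain; non-positive and missing grades count as 0."""
    gains = [max(grades.get(doc_id, 0), 0) for doc_id in ranking[:k]]
    ideal = sorted((max(g, 0) for g in grades.values()), reverse=True)[:k]
    ideal_dcg = _dcg(ideal)
    return _dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0
```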
car/v2.0
Version 2.0 of the TREC CAR dataset.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("car/v2.0")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text>
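The v2.0 collection also yields documents with `doc_id` and `text`, so the same streaming code applies. One common step is exporting the paragraphs to a tab-separated file for an external indexer. The sketch below writes one `doc_id<TAB>text` row per document; the TSV layout and the `write_docs_tsv` helper are assumptions, not part of ir_datasets.

```python
from typing import Iterable, TextIO


def write_docs_tsv(docs: Iterable, out: TextIO) -> int:
    """Write one 'doc_id<TAB>text' line per document and return the count.

    All runs of whitespace in the text (including tabs and newlines) are
    collapsed to single spaces so each document stays on one TSV row.
    """
    count = 0
    for doc in docs:
        text = " ".join(doc.text.split())
        out.write(f"{doc.doc_id}\t{text}\n")
        count += 1
    return count


# Hypothetical usage with ir_datasets:
# import ir_datasets
# with open("car_v2_docs.tsv", "w", encoding="utf-8") as f:
#     write_docs_tsv(ir_datasets.load("car/v2.0").docs_iter(), f)
```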