Github: datasets/car.py

ir_datasets: TREC CAR

Index
  1. car
  2. car/v1.5
  3. car/v1.5/test200
  4. car/v1.5/train/fold0
  5. car/v1.5/train/fold1
  6. car/v1.5/train/fold2
  7. car/v1.5/train/fold3
  8. car/v1.5/train/fold4
  9. car/v1.5/trec-y1
  10. car/v1.5/trec-y1/auto
  11. car/v1.5/trec-y1/manual
  12. car/v2.0

"car"

An ad-hoc passage retrieval collection, constructed from Wikipedia and used as the basis of the TREC Complex Answer Retrieval (CAR) task.


"car/v1.5"

Version 1.5 of the TREC CAR dataset. This version was used for year 1 (2017) of the TREC CAR shared task.

docs | Citation | Metadata
30M docs

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.
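
Because the collection holds roughly 30M documents, a full pass over docs_iter() can take a long time. The following is a minimal sketch, assuming you only want to inspect the first few records. The helper names peek and peek_car_docs are hypothetical, and ir_datasets is imported lazily so that peek itself can be tried on any iterator.

```python
from itertools import islice


def peek(iterable, n=3):
    """Return the first n items of an iterable as a list."""
    return list(islice(iterable, n))


def peek_car_docs(n=3, dataset_id="car/v1.5"):
    """Hypothetical helper: load the dataset and return its first n docs."""
    import ir_datasets  # imported here so peek() works without the package

    dataset = ir_datasets.load(dataset_id)
    return peek(dataset.docs_iter(), n)


# peek() works on any iterator, e.g. a stand-in for docs_iter():
print(peek(iter(["doc-a", "doc-b", "doc-c", "doc-d"]), 2))
```

Calling peek_car_docs() triggers whatever download and processing the first access to the collection requires, so it is kept separate from the lightweight peek helper.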


"car/v1.5/test200"

Unofficial test set consisting of manually selected articles. Sometimes used as a validation set.

queries | docs | qrels | Citation | Metadata
2.0K queries

Language: en

Query type:
CarQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. title: str
  4. headings: Tuple[str, ...]
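
To make the four fields concrete, here is a small sketch using a stand-in namedtuple with the same field names. The class CarQueryExample, the helper make_query, and the sample values are hypothetical. The assumption that text is the title followed by the headings joined with spaces is illustrative only, so check real records before relying on that relationship.

```python
from typing import NamedTuple, Tuple


class CarQueryExample(NamedTuple):
    """Stand-in mirroring the CarQuery fields listed above (hypothetical class)."""
    query_id: str
    text: str
    title: str
    headings: Tuple[str, ...]


def make_query(query_id: str, title: str, headings: Tuple[str, ...]) -> CarQueryExample:
    # Assumption: text is the page title followed by the heading path.
    text = " ".join((title,) + tuple(headings))
    return CarQueryExample(query_id, text, title, tuple(headings))


# Invented values, for illustration only.
q = make_query("example-id", "Green sea turtle", ("Habitat", "Nesting"))
print(q.text)
print(q.headings[-1])
```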

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5/test200")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, title, headings>



"car/v1.5/train/fold0"

Fold 0 of the official large training set for TREC CAR 2017. Relevance is assumed from the hierarchical structure of pages (i.e., paragraphs under a heading are assumed relevant).
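
The relevance assumption can be sketched on a toy page. This illustrates the idea only and is not the procedure used to build the official qrels: the page structure, the IDs, and the function name auto_qrels are invented, and the sketch credits a paragraph only to the heading directly above it.

```python
from typing import Dict, Iterator, List, Tuple

# Toy page: heading path -> paragraph IDs found directly under that heading.
# All IDs are invented for illustration.
toy_page: Dict[Tuple[str, ...], List[str]] = {
    ("Habitat",): ["p1", "p2"],
    ("Habitat", "Nesting"): ["p3"],
    ("Diet",): ["p4"],
}


def auto_qrels(title: str, page: Dict[Tuple[str, ...], List[str]]) -> Iterator[Tuple[str, str, int]]:
    """Yield (query_text, paragraph_id, relevance) with relevance fixed at 1."""
    for headings, paragraph_ids in page.items():
        query_text = " ".join((title,) + headings)
        for pid in paragraph_ids:
            yield (query_text, pid, 1)


for row in auto_qrels("Green sea turtle", toy_page):
    print(row)
```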

queries | docs | qrels | Citation | Metadata
468K queries

Language: en

Query type:
CarQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. title: str
  4. headings: Tuple[str, ...]

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5/train/fold0")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, title, headings>



"car/v1.5/train/fold1"

Fold 1 of the official large training set for TREC CAR 2017. Relevance is assumed from the hierarchical structure of pages (i.e., paragraphs under a heading are assumed relevant).

queries | docs | qrels | Citation | Metadata
467K queries

Language: en

Query type:
CarQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. title: str
  4. headings: Tuple[str, ...]

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5/train/fold1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, title, headings>



"car/v1.5/train/fold2"

Fold 2 of the official large training set for TREC CAR 2017. Relevance is assumed from the hierarchical structure of pages (i.e., paragraphs under a heading are assumed relevant).

queries | docs | qrels | Citation | Metadata
469K queries

Language: en

Query type:
CarQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. title: str
  4. headings: Tuple[str, ...]

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5/train/fold2")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, title, headings>



"car/v1.5/train/fold3"

Fold 3 of the official large training set for TREC CAR 2017. Relevance is assumed from the hierarchical structure of pages (i.e., paragraphs under a heading are assumed relevant).

queries | docs | qrels | Citation | Metadata
463K queries

Language: en

Query type:
CarQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. title: str
  4. headings: Tuple[str, ...]

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5/train/fold3")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, title, headings>



"car/v1.5/train/fold4"

Fold 4 of the official large training set for TREC CAR 2017. Relevance is assumed from the hierarchical structure of pages (i.e., paragraphs under a heading are assumed relevant).
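
The five fold IDs follow one naming pattern, which makes a leave-one-out split easy to express. The following is a minimal sketch, assuming you want to hold out one fold for validation and train on the rest. fold_split is a hypothetical helper, and the commented lines show how the resulting IDs could be passed to ir_datasets.load.

```python
from typing import List, Tuple

NUM_FOLDS = 5  # fold0 .. fold4, as listed in the index


def fold_split(held_out: int) -> Tuple[List[str], str]:
    """Return (training dataset IDs, held-out dataset ID) for one split."""
    if not 0 <= held_out < NUM_FOLDS:
        raise ValueError(f"held_out must be in 0..{NUM_FOLDS - 1}")
    ids = [f"car/v1.5/train/fold{i}" for i in range(NUM_FOLDS)]
    return [d for i, d in enumerate(ids) if i != held_out], ids[held_out]


train_ids, valid_id = fold_split(4)
print(train_ids)
print(valid_id)

# With ir_datasets installed, each ID can then be loaded, for example:
#   import ir_datasets
#   valid_queries = ir_datasets.load(valid_id).queries_iter()
```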

queries | docs | qrels | Citation | Metadata
469K queries

Language: en

Query type:
CarQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. title: str
  4. headings: Tuple[str, ...]

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5/train/fold4")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, title, headings>



"car/v1.5/trec-y1"

Official test set of TREC CAR 2017 (year 1).

queries | docs | Citation | Metadata
2.3K queries

Language: en

Query type:
CarQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. title: str
  4. headings: Tuple[str, ...]

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5/trec-y1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, title, headings>



"car/v1.5/trec-y1/auto"

Official test set of TREC CAR 2017 (year 1), using automatic relevance judgments (assumed from the hierarchical structure of pages, i.e., paragraphs under a heading are assumed relevant).

queries | docs | qrels | Citation | Metadata
2.3K queries

Inherits queries from car/v1.5/trec-y1

Language: en

Query type:
CarQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. title: str
  4. headings: Tuple[str, ...]

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5/trec-y1/auto")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, title, headings>



"car/v1.5/trec-y1/manual"

Official test set of TREC CAR 2017 (year 1), using manual graded relevance judgments.
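
Graded judgments are often collapsed to binary labels before computing measures that expect them. The following is a minimal sketch on stand-in records: the Qrel class, the sample grades, and the default threshold of 1 are assumptions chosen for illustration, so check the dataset's qrels definitions before picking a cutoff. With ir_datasets installed, dataset.qrels_iter() would supply the real records in place of the sample list.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Qrel(NamedTuple):
    """Stand-in for a relevance judgment record (hypothetical class)."""
    query_id: str
    doc_id: str
    relevance: int


def binarize(qrels: Iterable[Qrel], min_grade: int = 1) -> Set[Tuple[str, str]]:
    """Return the (query_id, doc_id) pairs judged at or above min_grade."""
    return {(q.query_id, q.doc_id) for q in qrels if q.relevance >= min_grade}


# Invented judgments, for illustration only.
sample = [
    Qrel("q1", "d1", 3),
    Qrel("q1", "d2", 0),
    Qrel("q2", "d3", -1),
    Qrel("q2", "d4", 1),
]
print(sorted(binarize(sample)))
```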

queries | docs | qrels | Citation | Metadata
2.3K queries

Inherits queries from car/v1.5/trec-y1

Language: en

Query type:
CarQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. title: str
  4. headings: Tuple[str, ...]

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v1.5/trec-y1/manual")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, title, headings>



"car/v2.0"

Version 2.0 of the TREC CAR dataset.

docs | Citation | Metadata
30M docs

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("car/v2.0")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
