GitHub: datasets/trec_robust04.py

ir_datasets: TREC Robust 2004

Index
  1. trec-robust04
  2. trec-robust04/fold1
  3. trec-robust04/fold2
  4. trec-robust04/fold3
  5. trec-robust04/fold4
  6. trec-robust04/fold5

Data Access Information

To use this dataset, you need a copy of TREC disks 4 and 5, provided by NIST.

Your organization may already have a copy. If so, you may only need to complete a new "Individual Agreement". Otherwise, your organization will need to file the "Organizational Agreement" with NIST. Processing can take some time, but you will end up with a password-protected download link.

ir_datasets needs the following directories from the source:

ir_datasets expects the above directories to be copied or linked under ~/.ir_datasets/trec-robust04/trec45. The source document files themselves can be either compressed or uncompressed (they appear to have been distributed both ways in the past). If ir_datasets does not find the files it expects, it will raise an error.
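
Before loading the collection, it can help to confirm that the expected location exists. The following is a minimal sketch, assuming only the path stated above; the helper names `expected_source_dir` and `describe_source_dir` are hypothetical and are not part of ir_datasets.

```python
from pathlib import Path


def expected_source_dir(home=None):
    """Build the path named in the text: <home>/.ir_datasets/trec-robust04/trec45."""
    base = Path(home) if home is not None else Path.home()
    return base / ".ir_datasets" / "trec-robust04" / "trec45"


def describe_source_dir(path):
    """Report whether the directory exists and list what it contains."""
    path = Path(path)
    if not path.is_dir():
        return f"missing: {path}"
    entries = sorted(p.name for p in path.iterdir())
    return f"found {len(entries)} entries in {path}: {entries}"


if __name__ == "__main__":
    # Hypothetical pre-flight check; ir_datasets performs its own validation.
    print(describe_source_dir(expected_source_dir()))
```

This only checks that the directory is present. It does not verify that the contents are what ir_datasets expects.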


"trec-robust04"

The TREC Robust retrieval task focuses on "improving the consistency of retrieval technology by focusing on poorly performing topics."

The TREC Robust document collection is from TREC disks 4 and 5. Because the documents are copyrighted, this collection is for research use only, which requires agreements to be filed with NIST. See Data Access Information above for details.

Language: en

Query type:
TrecQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. description: str
  4. narrative: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("trec-robust04")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>

You can find more details about the Python API here.
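
To illustrate how the fields of each query tuple can be consumed, here is a small sketch that maps query IDs to title text. So that it runs without ir_datasets or the data, it uses a stand-in namedtuple with the same four fields as the TrecQuery type listed above. The helper `titles_by_id` and the sample records are hypothetical placeholders, not real Robust04 topics.

```python
from collections import namedtuple

# Stand-in with the same field names as TrecQuery (assumption: for illustration only).
TrecQuery = namedtuple("TrecQuery", ["query_id", "title", "description", "narrative"])


def titles_by_id(queries):
    """Map each query_id to its title, the short keyword form of the topic."""
    return {q.query_id: q.title for q in queries}


# Placeholder records; real topics come from dataset.queries_iter().
sample = [
    TrecQuery("q1", "example title one", "example description", "example narrative"),
    TrecQuery("q2", "example title two", "example description", "example narrative"),
]

print(titles_by_id(sample))
```

With ir_datasets installed, passing `dataset.queries_iter()` to the same helper should work in the same way, since the fields are accessed by name.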


"trec-robust04/fold1"

Robust04 Fold 1 (Title), proposed by Huston & Croft (2014) and used in numerous works.

Language: en

Query type:
TrecQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. description: str
  4. narrative: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("trec-robust04/fold1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>

You can find more details about the Python API here.


"trec-robust04/fold2"

Robust04 Fold 2 (Title), proposed by Huston & Croft (2014) and used in numerous works.

Language: en

Query type:
TrecQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. description: str
  4. narrative: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("trec-robust04/fold2")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>

You can find more details about the Python API here.


"trec-robust04/fold3"

Robust04 Fold 3 (Title), proposed by Huston & Croft (2014) and used in numerous works.

Language: en

Query type:
TrecQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. description: str
  4. narrative: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("trec-robust04/fold3")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>

You can find more details about the Python API here.


"trec-robust04/fold4"

Robust04 Fold 4 (Title), proposed by Huston & Croft (2014) and used in numerous works.

Language: en

Query type:
TrecQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. description: str
  4. narrative: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("trec-robust04/fold4")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>

You can find more details about the Python API here.


"trec-robust04/fold5"

Robust04 Fold 5 (Title), proposed by Huston & Croft (2014) and used in numerous works.

Language: en

Query type:
TrecQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. description: str
  4. narrative: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("trec-robust04/fold5")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>

You can find more details about the Python API here.
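
The five fold subsets are often combined for cross-validation. The sketch below enumerates the dataset IDs for a simple hold-one-fold-out scheme. That protocol is an assumption for illustration, since this page does not prescribe one, and `cross_validation_splits` is a hypothetical helper rather than part of ir_datasets.

```python
# Dataset IDs exactly as listed in the index of this page.
FOLD_IDS = [f"trec-robust04/fold{i}" for i in range(1, 6)]


def cross_validation_splits(fold_ids):
    """Yield (train_ids, test_id) pairs, holding out each fold once."""
    for held_out in fold_ids:
        train = [fold for fold in fold_ids if fold != held_out]
        yield train, held_out


for train_ids, test_id in cross_validation_splits(FOLD_IDS):
    # Each ID could be passed to ir_datasets.load(...) to fetch that fold's queries.
    print(test_id, "<-", train_ids)
```

Some setups also reserve one of the training folds for validation; adapt the split to match the protocol of the work you are comparing against.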