Github: datasets/aquaint.py

ir_datasets: AQUAINT

Index
  1. aquaint
  2. aquaint/trec-robust-2005

"aquaint"

A collection of about 1M English newswire documents. The sources are the Xinhua News Service (People's Republic of China), the New York Times News Service, and the Associated Press Worldstream News Service.
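The three sources can usually be told apart by the prefix of a document ID. The sketch below assumes IDs of the form `APW19980601.0001`, `NYT19980601.0001` and `XIE19960101.0001` (AP Worldstream, New York Times, Xinhua English), which is how we understand the LDC release to label them; check this against your own copy before relying on it. The `SOURCES` mapping and the `doc_source` helper are hypothetical and not part of ir_datasets.

```python
# Hypothetical helper: maps an AQUAINT doc_id prefix to its news source.
# Assumes IDs start with a three-letter source code (APW, NYT or XIE).
SOURCES = {
    "APW": "Associated Press Worldstream News Service",
    "NYT": "New York Times News Service",
    "XIE": "Xinhua News Service",
}


def doc_source(doc_id: str) -> str:
    """Return the news source for doc_id, or 'unknown' if the prefix is not recognized."""
    return SOURCES.get(doc_id[:3], "unknown")


print(doc_source("XIE19960101.0001"))  # Xinhua News Service
```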

docs

Language: en

Document type:
TrecDoc: (namedtuple)
  1. doc_id: str
  2. text: str
  3. marked_up_doc: str

Example

import ir_datasets
dataset = ir_datasets.load('aquaint')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, marked_up_doc>
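Because docs_iter() yields documents one at a time, a collection of roughly a million documents can be summarized in a single pass without holding it all in memory. The sketch below computes a document count and the mean whitespace-token length over any iterable of records with a `text` field. The local `TrecDoc` namedtuple is a stand-in with the same field names, used only so the example runs without the licensed corpus; `corpus_stats` is a hypothetical helper, and the sample records are made up.

```python
from collections import namedtuple
from typing import Iterable, Tuple

# Stand-in with the same fields as ir_datasets' TrecDoc, for running offline.
TrecDoc = namedtuple("TrecDoc", ["doc_id", "text", "marked_up_doc"])


def corpus_stats(docs: Iterable) -> Tuple[int, float]:
    """Single pass over docs: return (document count, mean whitespace-token length)."""
    count = 0
    total_tokens = 0
    for doc in docs:
        count += 1
        total_tokens += len(doc.text.split())
    return count, (total_tokens / count if count else 0.0)


# Made-up records for illustration only.
sample = [
    TrecDoc("XIE19960101.0001", "made up text for illustration", ""),
    TrecDoc("NYT19980601.0001", "another example", ""),
]
print(corpus_stats(sample))  # (2, 3.5)
```

With the real dataset, the corresponding call would be `corpus_stats(dataset.docs_iter())`.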
Citation
bibtex: @misc{LDC2001T55, title={The AQUAINT Corpus of English News Text}, author={David Graff}, year={2002}, url={https://catalog.ldc.upenn.edu/LDC2002T31}, publisher={Linguistic Data Consortium} }

"aquaint/trec-robust-2005"

The TREC Robust 2005 dataset. It contains 50 "hard" queries, a subset of those used in trec-robust04, judged against the AQUAINT collection.

queries

Language: en

Query type:
TrecQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. description: str
  4. narrative: str

Example

import ir_datasets
dataset = ir_datasets.load('aquaint/trec-robust-2005')
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>
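TREC runs are commonly reported separately for title-only queries and for longer queries built from the title plus the description. A minimal sketch of choosing which TrecQuery fields form the query string follows; `query_text` and the sample query are hypothetical, and the stand-in namedtuple only mirrors the field names listed above.

```python
from collections import namedtuple
from typing import Sequence

# Stand-in with the same fields as ir_datasets' TrecQuery.
TrecQuery = namedtuple("TrecQuery", ["query_id", "title", "description", "narrative"])


def query_text(query, fields: Sequence[str] = ("title",)) -> str:
    """Join the chosen non-empty fields of a query into one whitespace-normalized string."""
    parts = (getattr(query, field) for field in fields)
    return " ".join(" ".join(part.split()) for part in parts if part)


# Made-up query for illustration only.
q = TrecQuery("0", "example title", "An example description.", "Not a real topic.")
print(query_text(q))                            # example title
print(query_text(q, ("title", "description")))  # example title An example description.
```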
docs

Language: en

Document type:
TrecDoc: (namedtuple)
  1. doc_id: str
  2. text: str
  3. marked_up_doc: str

Example

import ir_datasets
dataset = ir_datasets.load('aquaint/trec-robust-2005')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, marked_up_doc>
qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
0     not relevant
1     relevant
2     highly relevant
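Measures such as MAP or precision need binary judgments, so the graded levels are usually collapsed with a threshold. Treating level 1 and above as relevant matches trec_eval's default relevance level, to our understanding. A minimal sketch, where `is_relevant` and its `min_level` parameter are hypothetical names:

```python
# Hypothetical helper: collapse the graded levels (0, 1, 2) to binary relevance.
def is_relevant(relevance: int, min_level: int = 1) -> bool:
    """True if the judgment is at least min_level (1 corresponds to 'relevant')."""
    return relevance >= min_level


print([is_relevant(r) for r in (0, 1, 2)])               # [False, True, True]
print([is_relevant(r, min_level=2) for r in (0, 1, 2)])  # [False, False, True]
```

Raising `min_level` to 2 counts only "highly relevant" documents, which gives a stricter view of the same judgments.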

Example

import ir_datasets
dataset = ir_datasets.load('aquaint/trec-robust-2005')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
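Evaluation tools such as pytrec_eval take judgments as a nested mapping from query ID to document ID to relevance. A minimal sketch of building that mapping from records like those qrels_iter() yields; `build_qrels` is hypothetical, the stand-in namedtuple mirrors TrecQrel's fields, and the sample judgments are made up.

```python
from collections import defaultdict, namedtuple
from typing import Dict, Iterable

# Stand-in with the same fields as ir_datasets' TrecQrel.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def build_qrels(qrels: Iterable) -> Dict[str, Dict[str, int]]:
    """Group judgments as {query_id: {doc_id: relevance}}; a repeated pair keeps the last value."""
    nested = defaultdict(dict)
    for qrel in qrels:
        nested[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(nested)


# Made-up judgments for illustration only.
sample = [
    TrecQrel("q1", "d1", 2, "0"),
    TrecQrel("q1", "d2", 0, "0"),
    TrecQrel("q2", "d1", 1, "0"),
]
print(build_qrels(sample))  # {'q1': {'d1': 2, 'd2': 0}, 'q2': {'d1': 1}}
```

With the real dataset, the corresponding call would be `build_qrels(dataset.qrels_iter())`.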
Citation
bibtex: @inproceedings{Voorhees2005Robust, title={Overview of the TREC 2005 Robust Retrieval Track}, author={Ellen M. Voorhees}, booktitle={TREC}, year={2005} }