Github: datasets/aquaint.py

ir_datasets: AQUAINT

Index
  1. aquaint
  2. aquaint/trec-robust-2005

"aquaint"

A collection of about 1M English newswire documents. Sources are the Xinhua News Service (People's Republic of China), the New York Times News Service, and the Associated Press Worldstream News Service.

docs | Citation

Language: en

Document type:
TrecDoc: (namedtuple)
  1. doc_id: str
  2. text: str
  3. marked_up_doc: str

Example

import ir_datasets
dataset = ir_datasets.load('aquaint')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, marked_up_doc>
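
Since each document is a namedtuple, its fields can be read by name. The following is a minimal sketch, not part of ir_datasets: it defines a local stand-in TrecDoc with the three fields listed above and uses made-up sample records so that it runs without the AQUAINT files. The helper name preview_docs and its parameters are hypothetical. With the real collection, dataset.docs_iter() could be passed in place of sample_docs.

```python
from collections import namedtuple
from itertools import islice

# Local stand-in mirroring the three documented fields (assumption: the real
# TrecDoc from ir_datasets behaves like an ordinary namedtuple).
TrecDoc = namedtuple('TrecDoc', ['doc_id', 'text', 'marked_up_doc'])

def preview_docs(docs, limit=2, width=40):
    """Return (doc_id, truncated text) pairs for the first `limit` documents."""
    return [(doc.doc_id, doc.text[:width]) for doc in islice(docs, limit)]

# Made-up sample records for illustration only; not taken from AQUAINT.
sample_docs = [
    TrecDoc('SAMPLE-0001', 'First sample story.', '<DOC>First sample story.</DOC>'),
    TrecDoc('SAMPLE-0002', 'Second sample story.', '<DOC>Second sample story.</DOC>'),
    TrecDoc('SAMPLE-0003', 'Third sample story.', '<DOC>Third sample story.</DOC>'),
]

previews = preview_docs(sample_docs)
for doc_id, snippet in previews:
    print(doc_id, snippet)
```

Using islice keeps the preview lazy, so only the first few documents are read even when the input is a large iterator.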

"aquaint/trec-robust-2005"

The TREC Robust 2005 dataset. Contains a subset of 50 "hard" queries from trec-robust04.

queries | docs | qrels | Citation

Language: en

Query type:
TrecQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. description: str
  4. narrative: str

Example

import ir_datasets
dataset = ir_datasets.load('aquaint/trec-robust-2005')
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>
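
A common next step is to collect the short title field of each query, keyed by query_id. The sketch below is an illustration under assumptions, not part of ir_datasets: TrecQuery is a local stand-in with the four fields listed above, the sample queries are made up, and titles_by_id is a hypothetical helper. With the real dataset, dataset.queries_iter() could be passed in place of sample_queries.

```python
from collections import namedtuple

# Local stand-in mirroring the four documented fields (assumption: the real
# TrecQuery from ir_datasets behaves like an ordinary namedtuple).
TrecQuery = namedtuple(
    'TrecQuery', ['query_id', 'title', 'description', 'narrative']
)

def titles_by_id(queries):
    """Map each query_id to its title, stripping surrounding whitespace."""
    return {q.query_id: q.title.strip() for q in queries}

# Made-up sample queries for illustration only; not taken from TREC Robust 2005.
sample_queries = [
    TrecQuery('q1', ' sample topic one ', 'A made-up description.', 'A made-up narrative.'),
    TrecQuery('q2', 'sample topic two', 'Another made-up description.', 'Another made-up narrative.'),
]

titles = titles_by_id(sample_queries)
for query_id, title in titles.items():
    print(query_id, title)
```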