ir_datasets: AQUAINT
A document collection of about 1M English newswire documents. Sources are the Xinhua News Service (People's Republic of China), the New York Times News Service, and the Associated Press Worldstream News Service.
The TREC Robust 2005 dataset. Contains a subset of 50 "hard" queries from trec-robust04.
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('aquaint/trec-robust-2005')
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>
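Because each query is a namedtuple, it can be convenient to collect the queries into a lookup table keyed by query_id. The following is a minimal sketch, not part of the ir_datasets API: the helper names `queries_by_id` and `load_robust05_queries`, the stand-in `Query` type, and the sample records are all hypothetical and exist only for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in with the same fields as the query records shown above.
# ir_datasets supplies its own namedtuple type; this one is only for the demo.
Query = namedtuple("Query", ["query_id", "title", "description", "narrative"])


def queries_by_id(queries):
    """Build a dict mapping query_id to the query record.

    Accepts any iterable of objects that have a `query_id` attribute.
    """
    return {q.query_id: q for q in queries}


def load_robust05_queries():
    """Load the TREC Robust 2005 queries into a dict (hypothetical helper).

    ir_datasets is imported lazily so the rest of this sketch runs
    without the package installed or the query files available.
    """
    import ir_datasets
    dataset = ir_datasets.load('aquaint/trec-robust-2005')
    return queries_by_id(dataset.queries_iter())


# Made-up sample records, not actual TREC topics.
sample = [
    Query("q1", "example title", "example description", "example narrative"),
    Query("q2", "another title", "another description", "another narrative"),
]
lookup = queries_by_id(sample)
print(lookup["q1"].title)
```

With the package installed, `load_robust05_queries()` would return the same kind of dictionary built from the dataset's queries.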