ir_datasets: AQUAINT
A document collection of about 1M English newswire documents. Its sources are the Xinhua News Service (People's Republic of China), the New York Times News Service, and the Associated Press Worldstream News Service.
Docs
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('aquaint')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, marked_up_doc>
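`docs_iter()` yields documents lazily, so the roughly 1M AQUAINT documents never have to sit in memory at once. The following is a minimal sketch of consuming that stream a few documents at a time. It runs without a local copy of the corpus because it uses a hypothetical `Doc` namedtuple that mirrors the documented `doc_id, text, marked_up_doc` fields. The `preview_docs` helper is also hypothetical and is not part of ir_datasets.

```python
from collections import namedtuple
from itertools import islice
from typing import Iterable, Iterator, Tuple

# Hypothetical stand-in mirroring the documented AQUAINT doc fields.
Doc = namedtuple("Doc", ["doc_id", "text", "marked_up_doc"])


def preview_docs(docs: Iterable[Doc], limit: int = 3, width: int = 60) -> Iterator[Tuple[str, str]]:
    """Yield (doc_id, first `width` chars of text) for at most `limit` docs.

    islice stops pulling from the iterator after `limit` items, so a lazy
    source such as dataset.docs_iter() is not consumed any further.
    """
    for doc in islice(docs, limit):
        yield doc.doc_id, doc.text[:width]


if __name__ == "__main__":
    toy_docs = [
        Doc("DOC-1", "First toy newswire story.", "<DOC>...</DOC>"),
        Doc("DOC-2", "Second toy newswire story.", "<DOC>...</DOC>"),
        Doc("DOC-3", "Third toy newswire story.", "<DOC>...</DOC>"),
    ]
    for doc_id, snippet in preview_docs(iter(toy_docs), limit=2):
        print(doc_id, snippet)
```

Against the real dataset, the same call would be `preview_docs(dataset.docs_iter())`.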
aquaint/trec-robust-2005
The TREC Robust 2005 dataset. It contains a subset of 50 "hard" queries from trec-robust04.
Queries
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('aquaint/trec-robust-2005')
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>
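Each query record has three text fields, and a retrieval run usually takes one of them, often the short `title`, as the query string. The sketch below builds an id-to-text lookup for a chosen field. `Query` is a hypothetical stand-in that mirrors the documented fields. `queries_by_id` is a hypothetical helper and does not come from ir_datasets.

```python
from collections import namedtuple
from typing import Dict, Iterable

# Hypothetical stand-in mirroring the documented trec-robust-2005 query fields.
Query = namedtuple("Query", ["query_id", "title", "description", "narrative"])

TEXT_FIELDS = ("title", "description", "narrative")


def queries_by_id(queries: Iterable[Query], field: str = "title") -> Dict[str, str]:
    """Map each query_id to the text of one chosen field.

    getattr reads the field by name, so the same helper serves
    title-only runs as well as description or narrative runs.
    """
    if field not in TEXT_FIELDS:
        raise ValueError(f"unknown query field: {field!r}")
    return {q.query_id: getattr(q, field) for q in queries}


if __name__ == "__main__":
    toy_queries = [Query("q1", "toy title", "A toy description.", "A toy narrative.")]
    print(queries_by_id(toy_queries))
    print(queries_by_id(toy_queries, field="description"))
```

With the real dataset, `queries_by_id(dataset.queries_iter())` would return a title-keyed lookup.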
Docs
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('aquaint/trec-robust-2005')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, marked_up_doc>
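The TREC Robust 2005 track was run over the AQUAINT collection, so a full scan here covers about 1M documents. That is costly when you only need a handful, such as the documents one run retrieved. ir_datasets also provides `dataset.docs_store()` for lookups by id, and that is usually the better choice. The sketch below assumes only a plain document iterator. It collects the wanted ids in a single pass and stops as soon as every one has been found. `Doc` and `fetch_docs` are hypothetical.

```python
from collections import namedtuple
from typing import Dict, Iterable, Set

# Hypothetical stand-in mirroring the documented doc fields.
Doc = namedtuple("Doc", ["doc_id", "text", "marked_up_doc"])


def fetch_docs(docs: Iterable[Doc], wanted_ids: Iterable[str]) -> Dict[str, Doc]:
    """Return {doc_id: doc} for the wanted ids, scanning `docs` at most once.

    The scan stops early once every wanted id has been found.
    Ids that never appear are simply absent from the result.
    """
    remaining: Set[str] = set(wanted_ids)
    found: Dict[str, Doc] = {}
    if not remaining:
        return found
    for doc in docs:
        if doc.doc_id in remaining:
            found[doc.doc_id] = doc
            remaining.discard(doc.doc_id)
            if not remaining:
                break  # everything found; do not read the rest of the stream
    return found
```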
Qrels
Relevance levels
| Rel. | Definition |
|---|---|
| 0 | not relevant |
| 1 | relevant |
| 2 | highly relevant |
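Binary measures such as precision or average precision need a yes/no judgment, so the three grades in the table have to be collapsed somehow. A common convention is to count any grade of 1 or higher as relevant. That convention is an assumption here, not something this dataset prescribes. The helper `binarize` is hypothetical.

```python
from typing import Dict

# Grades as defined in the relevance table for this dataset.
RELEVANCE_LABELS: Dict[int, str] = {
    0: "not relevant",
    1: "relevant",
    2: "highly relevant",
}


def binarize(level: int, threshold: int = 1) -> int:
    """Collapse a graded judgment to 0/1: 1 if level >= threshold, else 0.

    threshold=1 treats "relevant" and "highly relevant" alike;
    threshold=2 keeps only "highly relevant" documents.
    """
    if level not in RELEVANCE_LABELS:
        raise ValueError(f"unexpected relevance level: {level}")
    return 1 if level >= threshold else 0
```

The threshold is the design choice to watch: results reported on this collection can differ depending on whether grade 1 counts as relevant.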
Example
import ir_datasets
dataset = ir_datasets.load('aquaint/trec-robust-2005')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
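Evaluation tools generally want judgments grouped by query instead of a flat stream. The sketch below folds qrels records into `{query_id: {doc_id: relevance}}`, which is the nested-dict layout that pytrec_eval accepts. It then counts relevant documents per query. `Qrel` is a hypothetical stand-in that mirrors the documented fields. `group_qrels` and `count_relevant` are hypothetical helpers.

```python
from collections import defaultdict, namedtuple
from typing import Dict, Iterable

# Hypothetical stand-in mirroring the documented qrels fields.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def group_qrels(qrels: Iterable[Qrel]) -> Dict[str, Dict[str, int]]:
    """Fold a qrels stream into {query_id: {doc_id: relevance}}.

    The `iteration` field is ignored; if a (query, doc) pair appeared
    twice, the later judgment would overwrite the earlier one.
    """
    grouped: Dict[str, Dict[str, int]] = defaultdict(dict)
    for qrel in qrels:
        grouped[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(grouped)


def count_relevant(grouped: Dict[str, Dict[str, int]], threshold: int = 1) -> Dict[str, int]:
    """Count judged docs per query whose relevance is >= threshold."""
    return {
        qid: sum(1 for rel in docs.values() if rel >= threshold)
        for qid, docs in grouped.items()
    }
```

With the real dataset, `group_qrels(dataset.qrels_iter())` would build the full judgment table in one pass.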