ir_datasets: TREC Mandarin
A collection of news articles in Mandarin Chinese, written in Simplified characters, used for multilingual evaluation in TREC 5 and TREC 6.
Document collection from LDC2000T52.
Language: zh
Example
import ir_datasets
dataset = ir_datasets.load('trec-mandarin')
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text, marked_up_doc>
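The document collection comes from LDC2000T52, so the sketch below does not load it. It is a minimal illustration, assuming you only want to peek at a few documents: Doc is a hypothetical stand-in namedtuple with the same fields as the docs example, preview_docs is a made-up helper, and the sample texts are invented. itertools.islice stops after n items, so a lazy iterator is never read to the end.

```python
from collections import namedtuple
from itertools import islice
from typing import Iterable, List, Tuple

# Hypothetical stand-in with the same fields as the docs namedtuple above
# (doc_id, text, marked_up_doc). Real documents would come from docs_iter().
Doc = namedtuple("Doc", ["doc_id", "text", "marked_up_doc"])


def preview_docs(docs: Iterable[Doc], n: int = 3, width: int = 20) -> List[Tuple[str, str]]:
    """Return (doc_id, first `width` characters of text) for the first n docs.

    islice consumes at most n items, so the rest of the iterator is untouched.
    """
    return [(doc.doc_id, doc.text[:width]) for doc in islice(docs, n)]


# Made-up sample documents (not corpus content); marked_up_doc left empty.
sample = iter([
    Doc("D1", "北京今天天气晴朗。", ""),
    Doc("D2", "上海举行国际会议。", ""),
    Doc("D3", "第三篇文章。", ""),
])
print(preview_docs(sample, n=2, width=4))  # [('D1', '北京今天'), ('D2', '上海举行')]
```

With the real data installed, dataset.docs_iter() could be passed in place of the sample iterator.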
trec-mandarin/trec5: the Mandarin Chinese benchmark from TREC 5.
Language: multiple (each query has English (_en) and Chinese (_zh) fields)
Example
import ir_datasets
dataset = ir_datasets.load('trec-mandarin/trec5')
for query in dataset.queries_iter():
    query  # namedtuple<query_id, title_en, title_zh, description_en, description_zh, narrative_en, narrative_zh>
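Each query carries English and Chinese versions of its title, description, and narrative. A minimal sketch, assuming you want to build query text for one language from a chosen set of fields: query_text is a hypothetical helper, Query is a stand-in namedtuple using the field names listed above, and the sample query is made up rather than taken from the TREC topics.

```python
from collections import namedtuple
from typing import Sequence

# Hypothetical stand-in mirroring the query fields listed above.
Query = namedtuple("Query", [
    "query_id", "title_en", "title_zh", "description_en",
    "description_zh", "narrative_en", "narrative_zh",
])


def query_text(query: Query, lang: str = "zh", fields: Sequence[str] = ("title",)) -> str:
    """Join the chosen fields (title/description/narrative) for one language.

    lang must be "en" or "zh", matching the field-name suffixes.
    Empty or whitespace-only fields are skipped.
    """
    if lang not in ("en", "zh"):
        raise ValueError(f"unsupported language suffix: {lang!r}")
    parts = [getattr(query, f"{field}_{lang}") for field in fields]
    return " ".join(p.strip() for p in parts if p and p.strip())


# Made-up sample query, not an actual TREC topic.
q = Query("1", "Trade policy", "贸易政策", "Find documents on trade.", "查找有关贸易的文档。", "", "")
print(query_text(q, "zh"))                            # 贸易政策
print(query_text(q, "en", ("title", "description")))  # Trade policy Find documents on trade.
```

Selecting fields by suffix keeps monolingual Chinese runs and English-query runs on the same code path; only the lang argument changes.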
Language: zh
Example
import ir_datasets
dataset = ir_datasets.load('trec-mandarin/trec5')
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text, marked_up_doc>
Relevance levels
Relevance | Definition |
---|---|
0 | not relevant |
1 | relevant |
Example
import ir_datasets
dataset = ir_datasets.load('trec-mandarin/trec5')
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
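Because the judgments are binary (0 not relevant, 1 relevant), a ranking can be scored by counting level-1 documents in its top k. A sketch under stated assumptions: Qrel is a stand-in namedtuple with the fields of the qrels example, build_qrels and precision_at_k are hypothetical helpers, unjudged documents are treated as not relevant (a common convention, but a choice), and the sample judgments are made up.

```python
from collections import defaultdict, namedtuple
from typing import Dict, Iterable, List

# Hypothetical stand-in with the qrels fields shown above.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def build_qrels(qrels: Iterable[Qrel]) -> Dict[str, Dict[str, int]]:
    """Group judgments as {query_id: {doc_id: relevance}}."""
    table: Dict[str, Dict[str, int]] = defaultdict(dict)
    for qrel in qrels:
        table[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(table)


def precision_at_k(ranking: List[str], judged: Dict[str, int], k: int) -> float:
    """Fraction of the top-k doc_ids judged relevant (level 1).

    Unjudged doc_ids count as not relevant; a ranking shorter than k
    is still divided by k.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranking[:k] if judged.get(doc_id, 0) >= 1)
    return hits / k


# Made-up judgments for one query.
sample = [
    Qrel("1", "D1", 1, "0"),
    Qrel("1", "D2", 0, "0"),
    Qrel("1", "D3", 1, "0"),
]
qrels = build_qrels(sample)
print(precision_at_k(["D1", "D2", "D4", "D3"], qrels["1"], k=3))  # 0.3333333333333333
```

With the real data, build_qrels(dataset.qrels_iter()) would group the TREC 5 judgments the same way, since each qrel exposes query_id, doc_id, and relevance attributes.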
trec-mandarin/trec6: the Mandarin Chinese benchmark from TREC 6.
Language: multiple (each query has English (_en) and Chinese (_zh) fields)
Example
import ir_datasets
dataset = ir_datasets.load('trec-mandarin/trec6')
for query in dataset.queries_iter():
    query  # namedtuple<query_id, title_en, title_zh, description_en, description_zh, narrative_en, narrative_zh>
Language: zh
Example
import ir_datasets
dataset = ir_datasets.load('trec-mandarin/trec6')
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, text, marked_up_doc>
Relevance levels
Relevance | Definition |
---|---|
0 | not relevant |
1 | relevant |
Example
import ir_datasets
dataset = ir_datasets.load('trec-mandarin/trec6')
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
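The TREC 6 judgments use the same 0/1 scale. Recall needs the number of relevant documents per query, which can be read off the qrels. A minimal sketch with hypothetical helpers relevant_counts and recall_at_k over stand-in Qrel tuples (same fields as the qrels example) and made-up data. Queries with no relevant judgment are left out of relevant_counts, and recall_at_k returns 0.0 for an empty relevant set by choice, since recall is undefined there.

```python
from collections import Counter, namedtuple
from typing import Dict, Iterable, List, Set

# Hypothetical stand-in with the qrels fields shown above.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def relevant_counts(qrels: Iterable[Qrel]) -> Dict[str, int]:
    """Number of documents judged relevant (relevance >= 1) per query_id.

    Queries with no relevant document do not appear in the result.
    """
    counts = Counter()
    for qrel in qrels:
        if qrel.relevance >= 1:
            counts[qrel.query_id] += 1
    return dict(counts)


def recall_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Share of a query's relevant doc_ids that appear in the top k of ranking."""
    if not relevant:
        return 0.0  # recall is undefined here; 0.0 is a convention of this sketch
    return len(set(ranking[:k]) & relevant) / len(relevant)


# Made-up judgments for two queries.
sample = [
    Qrel("2", "D5", 1, "0"),
    Qrel("2", "D6", 1, "0"),
    Qrel("2", "D7", 0, "0"),
    Qrel("3", "D8", 0, "0"),
]
print(relevant_counts(sample))  # {'2': 2}
relevant_q2 = {q.doc_id for q in sample if q.query_id == "2" and q.relevance >= 1}
print(recall_at_k(["D6", "D7", "D9"], relevant_q2, k=2))  # 0.5
```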