ir_datasets: CODEC
To use this dataset, you need a copy of the document corpus from here.
The process involves emailing a dataset author, who will provide instructions for downloading the dataset.
ir_datasets expects the source file to be copied or linked at ~/.ir_datasets/codec/v1/comets_documents.jsonl.
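One way to put the file in place is to symlink it rather than copy it. The sketch below is a minimal illustration, not part of ir_datasets: the function name `link_codec_corpus`, its `ir_datasets_home` parameter, and the location of your downloaded file are all assumptions made for this example. Only the target path comes from the text above.

```python
from pathlib import Path


def link_codec_corpus(source, ir_datasets_home=None):
    """Symlink a downloaded corpus file to the path ir_datasets expects.

    `source` is wherever you saved the file you received (hypothetical).
    `ir_datasets_home` defaults to ~/.ir_datasets; the parameter exists
    here only so the helper can be tried against a scratch directory.
    """
    home = Path(ir_datasets_home) if ir_datasets_home else Path.home() / ".ir_datasets"
    target = home / "codec" / "v1" / "comets_documents.jsonl"

    # Create ~/.ir_datasets/codec/v1/ if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)

    # Leave any existing file or link untouched.
    if target.is_symlink() or target.exists():
        return target

    target.symlink_to(Path(source).resolve())
    return target
```

Calling `link_codec_corpus("/path/to/your/download.jsonl")` would then create the link under your home directory. A plain copy works just as well if symlinks are not available on your system.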
CODEC Document Ranking sub-task.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("codec")
for query in dataset.queries_iter():
    query # namedtuple<query_id, query, domain, guidelines>
You can find more details about the Python API here.
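The query fields listed above can be used directly, for instance to group topics by their `domain`. The following is a sketch under stated assumptions: the `Query` namedtuple is a stand-in that mirrors the field names shown above, the sample rows are made up, and the domain values are assumed to match the subset names (economics, history, politics).

```python
from collections import defaultdict, namedtuple

# Stand-in for the query type; only the field names come from the docs above.
Query = namedtuple("Query", ["query_id", "query", "domain", "guidelines"])


def group_query_ids_by_domain(queries):
    """Map each domain to the list of query_ids that belong to it."""
    grouped = defaultdict(list)
    for q in queries:
        grouped[q.domain].append(q.query_id)
    return dict(grouped)


# Made-up sample rows, for illustration only.
sample = [
    Query("q1", "example economics topic", "economics", "..."),
    Query("q2", "example history topic", "history", "..."),
    Query("q3", "another economics topic", "economics", "..."),
]

by_domain = group_query_ids_by_domain(sample)
print(by_domain)
```

With the corpus in place, passing `dataset.queries_iter()` instead of `sample` should work the same way, since the helper only reads the `query_id` and `domain` attributes.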
Subset of codec that only contains topics about economics.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("codec/economics")
for query in dataset.queries_iter():
    query # namedtuple<query_id, query, domain, guidelines>
Subset of codec that only contains topics about history.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("codec/history")
for query in dataset.queries_iter():
    query # namedtuple<query_id, query, domain, guidelines>
Subset of codec that only contains topics about politics.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("codec/politics")
for query in dataset.queries_iter():
    query # namedtuple<query_id, query, domain, guidelines>