Github: datasets/codec.py

ir_datasets: CODEC

Index
  1. codec
  2. codec/economics
  3. codec/history
  4. codec/politics

Data Access Information

To use this dataset, you need a copy of the document corpus from here.

The process involves emailing a dataset author, who will provide instructions for downloading the dataset.

ir_datasets expects the source file to be copied/linked under ~/.ir_datasets/codec/v1/codec_documents.jsonl.
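
The linking step can be sketched as follows. This is a minimal sketch, assuming the corpus file has already been downloaded somewhere on the local machine. The helper name `link_corpus` is hypothetical and not part of ir_datasets; the target path should be the one stated above.

```python
from pathlib import Path


def link_corpus(source, target):
    """Symlink `source` at `target`, creating parent directories as needed.

    Hypothetical helper for illustration; not part of ir_datasets.
    """
    source = Path(source).expanduser().resolve()
    target = Path(target).expanduser()
    if not source.is_file():
        raise FileNotFoundError(f"corpus file not found: {source}")
    # Create the expected directory tree if it is missing.
    target.parent.mkdir(parents=True, exist_ok=True)
    if target.is_symlink():
        target.unlink()  # replace a stale link
    elif target.exists():
        raise FileExistsError(f"refusing to overwrite regular file: {target}")
    target.symlink_to(source)
    return target
```

Call it with the location of the downloaded file as `source` and the path given above as `target`. Copying the file instead of linking it works equally well, at the cost of extra disk space.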


"codec"

CODEC Document Ranking sub-task.

  • Documents: curated web articles
  • Queries: challenging, entity-focused queries
  • Task Repository
  • See also: kilt/codec, the entity ranking subtask
42 queries

Language: en

Query type:
CodecQuery: (namedtuple)
  1. query_id: str
  2. query: str
  3. domain: str
  4. guidelines: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("codec")
for query in dataset.queries_iter():
    query # namedtuple<query_id, query, domain, guidelines>

You can find more details about the Python API here.
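
Because every query carries a `domain` field, the queries can be grouped by domain on the client side. The following is a minimal sketch that uses a stand-in namedtuple with the same four fields as `CodecQuery`, so it runs without the dataset. The sample records and the helper name `group_by_domain` are made up for illustration; the real query IDs, query text, and domain strings may differ.

```python
from collections import defaultdict, namedtuple

# Stand-in with the same fields as the CodecQuery type listed above,
# so the sketch runs without ir_datasets or the corpus.
CodecQuery = namedtuple("CodecQuery", ["query_id", "query", "domain", "guidelines"])


def group_by_domain(queries):
    """Map each domain value to the list of query_ids that carry it."""
    groups = defaultdict(list)
    for q in queries:
        groups[q.domain].append(q.query_id)
    return dict(groups)


# Made-up records for illustration only.
sample = [
    CodecQuery("q1", "first example topic", "economics", "example guidelines"),
    CodecQuery("q2", "second example topic", "history", "example guidelines"),
    CodecQuery("q3", "third example topic", "economics", "example guidelines"),
]

print(group_by_domain(sample))
```

With the real data, the same function should accept the iterator directly: `group_by_domain(ir_datasets.load("codec").queries_iter())`.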


"codec/economics"

Subset of codec that only contains topics about economics.

14 queries

Language: en

Query type:
CodecQuery: (namedtuple)
  1. query_id: str
  2. query: str
  3. domain: str
  4. guidelines: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("codec/economics")
for query in dataset.queries_iter():
    query # namedtuple<query_id, query, domain, guidelines>

You can find more details about the Python API here.


"codec/history"

Subset of codec that only contains topics about history.

14 queries

Language: en

Query type:
CodecQuery: (namedtuple)
  1. query_id: str
  2. query: str
  3. domain: str
  4. guidelines: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("codec/history")
for query in dataset.queries_iter():
    query # namedtuple<query_id, query, domain, guidelines>

You can find more details about the Python API here.


"codec/politics"

Subset of codec that only contains topics about politics.

14 queries

Language: en

Query type:
CodecQuery: (namedtuple)
  1. query_id: str
  2. query: str
  3. domain: str
  4. guidelines: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("codec/politics")
for query in dataset.queries_iter():
    query # namedtuple<query_id, query, domain, guidelines>

You can find more details about the Python API here.
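
As a quick sanity check, the query counts listed on this page can be compared against what the API returns. This is a minimal sketch: `count_queries` and `CODEC_IDS` are hypothetical names, and the loader is passed in as an argument so the function can be exercised without ir_datasets installed.

```python
# Dataset IDs as listed in the index of this page.
CODEC_IDS = ["codec", "codec/economics", "codec/history", "codec/politics"]


def count_queries(dataset_ids, load):
    """Return {dataset_id: number of queries}.

    `load` is any callable that maps a dataset ID to an object with a
    queries_iter() method, for example ir_datasets.load.
    """
    counts = {}
    for ds_id in dataset_ids:
        # Exhaust the iterator and count its items without storing them.
        counts[ds_id] = sum(1 for _ in load(ds_id).queries_iter())
    return counts
```

With ir_datasets installed, `count_queries(CODEC_IDS, ir_datasets.load)` should report one count per dataset ID, which can then be checked against the figures above (42 for codec, 14 for each subset).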