GitHub: datasets/codec.py

ir_datasets: CODEC

Index
  1. codec
  2. codec/economics
  3. codec/history
  4. codec/politics

"codec"

CODEC Document Ranking sub-task.

  • Documents: curated web articles
  • Queries: challenging, entity-focused queries
  • Task Repository
  • See also: kilt/codec, the entity ranking subtask

queries
36 queries

Language: en

Query type:
CodecQuery: (namedtuple)
  1. query_id: str
  2. query: str
  3. narrative: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("codec")
for query in dataset.queries_iter():
    query # namedtuple<query_id, query, narrative>

You can find more details about the Python API here.
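
As a small illustration of working with these records, the sketch below builds a lookup from query_id to query text. The stand-in CodecQuery tuple and the sample values are hypothetical and only mirror the three fields listed above; with the dataset available, the result of dataset.queries_iter() could be passed in instead.

```python
from collections import namedtuple

# Stand-in mirroring the fields listed above (query_id, query, narrative).
# Hypothetical: the real records come from ir_datasets.
CodecQuery = namedtuple("CodecQuery", ["query_id", "query", "narrative"])


def build_query_lookup(queries):
    """Map each query_id to its query text."""
    return {q.query_id: q.query for q in queries}


# Made-up sample records, for illustration only.
sample_queries = [
    CodecQuery("q1", "example query one", "example narrative one"),
    CodecQuery("q2", "example query two", "example narrative two"),
]
lookup = build_query_lookup(sample_queries)
```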

CLI
ir_datasets export codec queries
[query_id]    [query]    [narrative]
...

You can find more details about the CLI here.
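
The exported lines can be read back in Python. This is a minimal sketch, assuming the export is tab-separated with one query per line and that no field contains a tab; the sample text and the parse_queries_tsv helper are made up for illustration.

```python
import csv
import io
from collections import namedtuple

# Field order follows the CLI output shown above.
CodecQuery = namedtuple("CodecQuery", ["query_id", "query", "narrative"])


def parse_queries_tsv(stream):
    """Parse tab-separated lines of query_id, query, narrative."""
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Skip empty lines; a row with the wrong number of fields raises TypeError.
    return [CodecQuery(*row) for row in reader if row]


# Made-up text standing in for saved CLI output.
sample = (
    "q1\texample query one\texample narrative one\n"
    "q2\texample query two\texample narrative two\n"
)
parsed = parse_queries_tsv(io.StringIO(sample))
```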

PyTerrier

No example available for PyTerrier

qrels
5.1K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.    Definition    Count    %
0    Not Relevant. Not useful or on topic.    2.1K    40.4%
1    Not Valuable. Consists of definitions or background.    1.8K    35.5%
2    Somewhat Valuable. Includes valuable topic-specific arguments, evidence, or knowledge.    924    18.0%
3    Very Valuable. Includes central topic-specific arguments, evidence, or knowledge. This does not include general definitions or background.    312    6.1%
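
The four grades above can be carried into code as a small mapping. The sketch below also shows one way to collapse them to a binary judgment; the cutoff of 2 is an assumption made for illustration, not something the dataset prescribes, and is_valuable is a hypothetical helper.

```python
# Short labels taken from the relevance level definitions above.
RELEVANCE_LABELS = {
    0: "Not Relevant",
    1: "Not Valuable",
    2: "Somewhat Valuable",
    3: "Very Valuable",
}


def is_valuable(relevance, min_rel=2):
    """Binarize a graded judgment.

    The default cutoff of 2 is an assumption, not part of the dataset.
    """
    return relevance >= min_rel


labels = [RELEVANCE_LABELS[r] for r in (0, 3)]
flags = [is_valuable(r) for r in range(4)]
```

Choosing min_rel=1 instead would also count background documents as positive, so the cutoff is worth stating wherever binary metrics are reported.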

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("codec")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
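
Judgments are often easier to use when grouped per query. A minimal sketch, using a stand-in tuple with the four fields listed above and made-up sample values; with the dataset available, the result of dataset.qrels_iter() could be passed in instead.

```python
from collections import defaultdict, namedtuple

# Stand-in mirroring the fields listed above; hypothetical sample data below.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def group_by_query(qrels):
    """Return {query_id: {doc_id: relevance}} for an iterable of qrels."""
    grouped = defaultdict(dict)
    for qrel in qrels:
        grouped[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(grouped)


sample_qrels = [
    TrecQrel("q1", "d1", 3, "0"),
    TrecQrel("q1", "d2", 0, "0"),
    TrecQrel("q2", "d1", 1, "0"),
]
judgments = group_by_query(sample_qrels)
```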

CLI
ir_datasets export codec qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

"codec/economics"

Subset of codec that only contains topics about economics.
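
Each topic subset is addressed by appending its name to the parent identifier. The sketch below builds the three identifiers from the index and, as an optional step, counts the queries in each; subset_ids and count_queries_per_subset are hypothetical helper names, and the counting step needs ir_datasets installed with the data available.

```python
# Topic subsets as listed in the index.
DOMAINS = ("economics", "history", "politics")


def subset_ids(base="codec"):
    """Return the dataset identifiers of the topic subsets."""
    return [f"{base}/{domain}" for domain in DOMAINS]


def count_queries_per_subset():
    """Count queries in each subset (requires ir_datasets and the data)."""
    import ir_datasets  # imported lazily so subset_ids() works without it

    return {
        dataset_id: sum(1 for _ in ir_datasets.load(dataset_id).queries_iter())
        for dataset_id in subset_ids()
    }


ids = subset_ids()
```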

queries
12 queries

Language: en

Query type:
CodecQuery: (namedtuple)
  1. query_id: str
  2. query: str
  3. narrative: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("codec/economics")
for query in dataset.queries_iter():
    query # namedtuple<query_id, query, narrative>

You can find more details about the Python API here.

CLI
ir_datasets export codec/economics queries
[query_id]    [query]    [narrative]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
1.6K qrels
Query relevance judgment type:
GenericQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int

Relevance levels

Rel.    Definition    Count    %
0    Not Relevant. Not useful or on topic.    596    37.4%
1    Not Valuable. Consists of definitions or background.    545    34.2%
2    Somewhat Valuable. Includes valuable topic-specific arguments, evidence, or knowledge.    330    20.7%
3    Very Valuable. Includes central topic-specific arguments, evidence, or knowledge. This does not include general definitions or background.    121    7.6%
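
The percentage column follows directly from the counts. A short check using the economics figures above, rounding to one decimal place:

```python
# Counts per relevance level, copied from the economics table above.
counts = {0: 596, 1: 545, 2: 330, 3: 121}

total = sum(counts.values())  # 1592, shown on the page as 1.6K
percentages = {rel: round(100 * n / total, 1) for rel, n in counts.items()}
# percentages == {0: 37.4, 1: 34.2, 2: 20.7, 3: 7.6}
```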

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("codec/economics")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance>

You can find more details about the Python API here.

CLI
ir_datasets export codec/economics qrels --format tsv
[query_id]    [doc_id]    [relevance]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

"codec/history"

Subset of codec that only contains topics about history.

queries
12 queries

Language: en

Query type:
CodecQuery: (namedtuple)
  1. query_id: str
  2. query: str
  3. narrative: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("codec/history")
for query in dataset.queries_iter():
    query # namedtuple<query_id, query, narrative>

You can find more details about the Python API here.

CLI
ir_datasets export codec/history queries
[query_id]    [query]    [narrative]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
1.7K qrels
Query relevance judgment type:
GenericQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int

Relevance levels

Rel.    Definition    Count    %
0    Not Relevant. Not useful or on topic.    870    51.3%
1    Not Valuable. Consists of definitions or background.    509    30.0%
2    Somewhat Valuable. Includes valuable topic-specific arguments, evidence, or knowledge.    235    13.9%
3    Very Valuable. Includes central topic-specific arguments, evidence, or knowledge. This does not include general definitions or background.    81    4.8%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("codec/history")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance>

You can find more details about the Python API here.

CLI
ir_datasets export codec/history qrels --format tsv
[query_id]    [doc_id]    [relevance]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

"codec/politics"

Subset of codec that only contains topics about politics.

queries
12 queries

Language: en

Query type:
CodecQuery: (namedtuple)
  1. query_id: str
  2. query: str
  3. narrative: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("codec/politics")
for query in dataset.queries_iter():
    query # namedtuple<query_id, query, narrative>

You can find more details about the Python API here.

CLI
ir_datasets export codec/politics queries
[query_id]    [query]    [narrative]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
1.8K qrels
Query relevance judgment type:
GenericQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int

Relevance levels

Rel.    Definition    Count    %
0    Not Relevant. Not useful or on topic.    609    33.0%
1    Not Valuable. Consists of definitions or background.    765    41.5%
2    Somewhat Valuable. Includes valuable topic-specific arguments, evidence, or knowledge.    359    19.5%
3    Very Valuable. Includes central topic-specific arguments, evidence, or knowledge. This does not include general definitions or background.    110    6.0%
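
The per-level counts of the three subsets can be added up and compared with the figures reported for codec as a whole. This sketch assumes the subsets partition the parent's judgments, which the rounded totals on this page are consistent with.

```python
# Counts per relevance level (0 to 3), copied from the three subset tables.
subset_counts = {
    "codec/economics": [596, 545, 330, 121],
    "codec/history": [870, 509, 235, 81],
    "codec/politics": [609, 765, 359, 110],
}

# Sum each relevance level across the subsets.
totals_per_level = [sum(level) for level in zip(*subset_counts.values())]
grand_total = sum(totals_per_level)
# totals_per_level == [2075, 1819, 924, 312]; grand_total == 5130
```

These totals agree with the 2.1K, 1.8K, 924 and 312 counts and the 5.1K qrels listed for codec.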

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("codec/politics")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance>

You can find more details about the Python API here.

CLI
ir_datasets export codec/politics qrels --format tsv
[query_id]    [doc_id]    [relevance]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier
