GitHub: datasets/kilt.py

ir_datasets: KILT

Index
  1. kilt
  2. kilt/codec
  3. kilt/codec/economics
  4. kilt/codec/history
  5. kilt/codec/politics

"kilt"

KILT is a Wikipedia-based corpus used for a range of knowledge-intensive language tasks (KILT), such as fact checking, question answering, and entity linking.

5.9M docs

Language: en

Document type:
KiltDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str
  4. text_pieces: Tuple[str, ...]
  5. anchors: Tuple[
    KiltDocAnchor: (namedtuple)
    1. text: str
    2. href: str
    3. paragraph_id: int
    4. start: int
    5. end: int
    , ...]
  6. categories: Tuple[str, ...]
  7. wikidata_id: str
  8. history_revid: str
  9. history_timestamp: str
  10. history_parentid: str
  11. history_pageid: str
  12. history_url: str
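
To illustrate how the nested anchors field relates to text_pieces, the sketch below rebuilds the two namedtuples locally as stand-ins (it does not import ir_datasets) and resolves an anchor to the text it covers. It assumes that start and end are character offsets into text_pieces[paragraph_id], which follows the KILT knowledge-source convention but is not stated on this page. The sample document, its IDs, and the helper anchor_surface are invented for illustration.

```python
from typing import NamedTuple, Tuple


class KiltDocAnchor(NamedTuple):
    # Local stand-in mirroring the documented anchor fields.
    text: str
    href: str
    paragraph_id: int
    start: int
    end: int


class KiltDoc(NamedTuple):
    # Local stand-in mirroring the documented document fields.
    doc_id: str
    title: str
    text: str
    text_pieces: Tuple[str, ...]
    anchors: Tuple[KiltDocAnchor, ...]
    categories: Tuple[str, ...]
    wikidata_id: str
    history_revid: str
    history_timestamp: str
    history_parentid: str
    history_pageid: str
    history_url: str


def anchor_surface(doc, anchor):
    """Return the span an anchor covers (hypothetical helper).

    ASSUMPTION: start/end are character offsets within
    doc.text_pieces[anchor.paragraph_id].
    """
    return doc.text_pieces[anchor.paragraph_id][anchor.start:anchor.end]


# Invented sample data; not taken from the real corpus.
pieces = ("Kilt\n", "A kilt is a garment from Scotland.\n")
start = pieces[1].index("Scotland")
sample_doc = KiltDoc(
    doc_id="0",
    title="Kilt",
    text="".join(pieces),
    text_pieces=pieces,
    anchors=(KiltDocAnchor("Scotland", "Scotland", 1, start, start + len("Scotland")),),
    categories=("Clothing",),
    wikidata_id="Q0",
    history_revid="",
    history_timestamp="",
    history_parentid="",
    history_pageid="",
    history_url="",
)

for anchor in sample_doc.anchors:
    print(anchor.href, "->", repr(anchor_surface(sample_doc, anchor)))
```

If the offset assumption holds, the same helper should work on documents yielded by dataset.docs_iter(), since it only relies on the field names listed above.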

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("kilt")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text, text_pieces, anchors, categories, wikidata_id, history_revid, history_timestamp, history_parentid, history_pageid, history_url>

You can find more details about the Python API here.
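
With 5.9M documents, a full pass over docs_iter() takes a while, so it can help to peek at only the first few records. The sketch below is one way to do that with itertools.islice. The take helper and the fake_docs_iter generator are hypothetical stand-ins so the snippet runs without downloading the corpus.

```python
from itertools import islice


def take(iterable, n):
    """Return the first n items of an iterable as a list (hypothetical helper).

    islice stops pulling from the iterator after n items, so the rest of
    the corpus is never read.
    """
    return list(islice(iterable, n))


def fake_docs_iter():
    # Stand-in for dataset.docs_iter(); yields invented records lazily.
    for i in range(1_000_000):
        yield {"doc_id": str(i)}


first_three = take(fake_docs_iter(), 3)
print([d["doc_id"] for d in first_three])  # ['0', '1', '2']
```

With the real dataset, the equivalent call would be take(dataset.docs_iter(), 3).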


"kilt/codec"

CODEC Entity Ranking sub-task.

42 queries

Language: en

Query type:
CodecQuery: (namedtuple)
  1. query_id: str
  2. query: str
  3. domain: str
  4. guidelines: str
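
The domain field is what separates the economics, history, and politics subsets listed in the index. As a rough sketch, the snippet below groups queries by that field using a local stand-in for CodecQuery. It assumes the domain values are the plain strings "economics", "history", and "politics", which this page does not confirm. The sample queries and both helper functions are invented for illustration.

```python
from collections import Counter
from typing import NamedTuple


class CodecQuery(NamedTuple):
    # Local stand-in mirroring the documented query fields.
    query_id: str
    query: str
    domain: str
    guidelines: str


def count_by_domain(queries):
    """Count queries per domain value (hypothetical helper)."""
    return Counter(q.domain for q in queries)


def only_domain(queries, domain):
    """Keep only the queries whose domain matches (hypothetical helper)."""
    return [q for q in queries if q.domain == domain]


# Invented sample queries; ASSUMPTION: real domain values look like these.
sample_queries = [
    CodecQuery("q1", "hypothetical economics topic", "economics", ""),
    CodecQuery("q2", "hypothetical history topic", "history", ""),
    CodecQuery("q3", "another economics topic", "economics", ""),
    CodecQuery("q4", "hypothetical politics topic", "politics", ""),
]

print(count_by_domain(sample_queries))
print([q.query_id for q in only_domain(sample_queries, "economics")])
```

Both helpers only read the domain attribute, so they should accept the output of dataset.queries_iter() directly. In practice, loading "kilt/codec/economics" and its siblings avoids the need to filter at all.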

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("kilt/codec")
for query in dataset.queries_iter():
    query # namedtuple<query_id, query, domain, guidelines>



"kilt/codec/economics"

Subset of kilt/codec that contains only the economics topics.

14 queries

Language: en

Query type:
CodecQuery: (namedtuple)
  1. query_id: str
  2. query: str
  3. domain: str
  4. guidelines: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("kilt/codec/economics")
for query in dataset.queries_iter():
    query # namedtuple<query_id, query, domain, guidelines>



"kilt/codec/history"

Subset of kilt/codec that contains only the history topics.

14 queries

Language: en

Query type:
CodecQuery: (namedtuple)
  1. query_id: str
  2. query: str
  3. domain: str
  4. guidelines: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("kilt/codec/history")
for query in dataset.queries_iter():
    query # namedtuple<query_id, query, domain, guidelines>



"kilt/codec/politics"

Subset of kilt/codec that contains only the politics topics.

14 queries

Language: en

Query type:
CodecQuery: (namedtuple)
  1. query_id: str
  2. query: str
  3. domain: str
  4. guidelines: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("kilt/codec/politics")
for query in dataset.queries_iter():
    query # namedtuple<query_id, query, domain, guidelines>
