GitHub: datasets/csl.py

ir_datasets: CSL

Index
  1. csl
  2. csl/trec-2023

"csl"

The CSL dataset, used for the TREC NeuCLIR technical document task.

docs
396K docs

Language: zh

Document type:
CslDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. abstract: str
  4. keywords: List[str]
  5. category: str
  6. category_eng: str
  7. discipline: str
  8. discipline_eng: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("csl")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, abstract, keywords, category, category_eng, discipline, discipline_eng>

You can find more details about the Python API here.
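
If you need random access to individual documents rather than a full scan, ir_datasets also provides a docs_store(). A minimal sketch (the looked-up id is simply taken from the first document of docs_iter(), purely for illustration):

import ir_datasets
dataset = ir_datasets.load("csl")
# Take one document id from the iterator, then fetch the same record
# through the random-access store.
first_id = next(iter(dataset.docs_iter())).doc_id
store = dataset.docs_store()
doc = store.get(first_id)  # returns the CslDoc namedtuple
print(doc.title)
print(doc.keywords)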

CLI
ir_datasets export csl docs
[doc_id]    [title]    [abstract]    [keywords]    [category]    [category_eng]    [discipline]    [discipline_eng]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.csl')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.


"csl/trec-2023"

The TREC NeuCLIR 2023 technical document task.

queries
41 queries

Language: multiple/other/unknown

Query type:
ExctractedCCNoReportQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. description: str
  4. narrative: str
  5. ht_title: str
  6. ht_description: str
  7. ht_narrative: str
  8. mt_title: str
  9. mt_description: str
  10. mt_narrative: str
  11. translation_lang: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("csl/trec-2023")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative, ht_title, ht_description, ht_narrative, mt_title, mt_description, mt_narrative, translation_lang>

You can find more details about the Python API here.
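
Each topic carries the original title/description/narrative together with human (ht_*) and machine (mt_*) translations. A small sketch that prefers the human-translated title and falls back to the machine translation (this field preference is an illustrative choice, not something prescribed by the dataset):

import ir_datasets
dataset = ir_datasets.load("csl/trec-2023")
for query in dataset.queries_iter():
    # Prefer the human translation of the title when it is non-empty,
    # otherwise fall back to the machine translation, then the original.
    title = query.ht_title or query.mt_title or query.title
    print(query.query_id, query.translation_lang, title)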

CLI
ir_datasets export csl/trec-2023 queries
[query_id]    [title]    [description]    [narrative]    [ht_title]    [ht_description]    [ht_narrative]    [mt_title]    [mt_description]    [mt_narrative]    [translation_lang]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.csl.trec-2023.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
396K docs

Inherits docs from csl

Language: zh

Document type:
CslDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. abstract: str
  4. keywords: List[str]
  5. category: str
  6. category_eng: str
  7. discipline: str
  8. discipline_eng: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("csl/trec-2023")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, abstract, keywords, category, category_eng, discipline, discipline_eng>

You can find more details about the Python API here.

CLI
ir_datasets export csl/trec-2023 docs
[doc_id]    [title]    [abstract]    [keywords]    [category]    [category_eng]    [discipline]    [discipline_eng]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.csl.trec-2023')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
11K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                                                                                                        Count  %
0     Not-valuable. Information in the document might be included in a report footnote, or omitted entirely.                            11K    94.3%
1     Somewhat-valuable. The most valuable information in the document would be found in the remainder of such a report.                419    3.7%
3     Very-valuable. Information in the document would be found in the lead paragraph of a report that is later written on the topic.   228    2.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("csl/trec-2023")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
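
For evaluation it is often convenient to hold the judgments as a query_id -> {doc_id: relevance} mapping. A minimal sketch that also tallies the relevance levels (0, 1, 3) listed above:

import collections
import ir_datasets
dataset = ir_datasets.load("csl/trec-2023")
qrels = collections.defaultdict(dict)
level_counts = collections.Counter()
for qrel in dataset.qrels_iter():
    # Nested mapping, a common input format for evaluation tooling.
    qrels[qrel.query_id][qrel.doc_id] = qrel.relevance
    level_counts[qrel.relevance] += 1
print(len(qrels), "judged queries")
print(dict(level_counts))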

CLI
ir_datasets export csl/trec-2023 qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.csl.trec-2023.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the relevance assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
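
Putting the pieces together with the ir_datasets Python API, the sketch below joins queries, qrels, and the document store into (relevance, query title, document title) triples; using the title fields and printing only a handful of rows are illustrative choices:

import ir_datasets
dataset = ir_datasets.load("csl/trec-2023")
queries = {q.query_id: q for q in dataset.queries_iter()}
store = dataset.docs_store()
for i, qrel in enumerate(dataset.qrels_iter()):
    if i >= 5:
        break  # keep the example cheap to run
    query = queries[qrel.query_id]
    doc = store.get(qrel.doc_id)
    print(qrel.relevance, query.title, "->", doc.title)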
