ir_datasets: CSL
The CSL dataset, used for the TREC NeuCLIR technical document task.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("csl")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, abstract, keywords, category, category_eng, discipline, discipline_eng>
You can find more details about the Python API here.
ir_datasets export csl docs
[doc_id]    [title]    [abstract]    [keywords]    [category]    [category_eng]    [discipline]    [discipline_eng]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.csl')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
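When indexing, the text fields of each document are often concatenated into a single string. The following is a minimal sketch of that step, not part of ir_datasets: `CslDoc` is a hypothetical stand-in for the namedtuple yielded by `docs_iter()`, `doc_text` is an illustrative helper, the sample values are invented, and the code assumes `keywords` may be either a string or a list of strings.

```python
from collections import namedtuple

# Hypothetical stand-in mirroring the fields listed for docs_iter() above.
CslDoc = namedtuple(
    "CslDoc",
    "doc_id title abstract keywords category category_eng discipline discipline_eng",
)


def doc_text(doc):
    """Join title, abstract, and keywords into one indexable string."""
    keywords = doc.keywords
    if not isinstance(keywords, str):
        # Assumption: keywords may arrive as a list of strings.
        keywords = " ".join(keywords)
    parts = [doc.title, doc.abstract, keywords]
    # Skip empty fields so no blank lines end up in the output.
    return "\n".join(part for part in parts if part)


# Invented sample record, for illustration only.
sample = CslDoc(
    "csl-000001", "标题", "摘要", ["关键词一", "关键词二"],
    "工学", "Engineering", "计算机", "Computer Science",
)
indexable_text = doc_text(sample)
```

With a real dataset, the same helper could be applied to each item from `dataset.docs_iter()`.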
{
  "docs": {
    "count": 395927,
    "fields": {
      "doc_id": {
        "max_len": 10,
        "common_prefix": "csl-"
      }
    }
  }
}
The TREC NeuCLIR 2023 technical document task.
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("csl/trec-2023")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative, ht_title, ht_description, ht_narrative, mt_title, mt_description, mt_narrative, translation_lang>
You can find more details about the Python API here.
ir_datasets export csl/trec-2023 queries
[query_id]    [title]    [description]    [narrative]    [ht_title]    [ht_description]    [ht_narrative]    [mt_title]    [mt_description]    [mt_narrative]    [translation_lang]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.csl.trec-2023.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
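Each query carries three variants of the title, description, and narrative fields: unprefixed, `ht_`, and `mt_`. The sketch below shows one way to select a variant by name. It assumes, without confirmation from this page, that `ht_` and `mt_` denote human and machine translations. `Query` is a hypothetical stand-in for the namedtuple yielded by `queries_iter()`, `query_text` is an illustrative helper, and the sample values are placeholders.

```python
from collections import namedtuple

# Hypothetical stand-in mirroring the fields listed for queries_iter() above.
Query = namedtuple(
    "Query",
    "query_id title description narrative "
    "ht_title ht_description ht_narrative "
    "mt_title mt_description mt_narrative translation_lang",
)


def query_text(query, field="title", variant=""):
    """Return one text field; variant is "" (unprefixed), "ht", or "mt"."""
    if field not in ("title", "description", "narrative"):
        raise ValueError(f"unknown field: {field}")
    if variant not in ("", "ht", "mt"):
        raise ValueError(f"unknown variant: {variant}")
    name = f"{variant}_{field}" if variant else field
    return getattr(query, name)


# Placeholder values, for illustration only.
sample_query = Query(
    "q1", "t", "d", "n",
    "ht-t", "ht-d", "ht-n",
    "mt-t", "mt-d", "mt-n", "zh",
)
selected = query_text(sample_query, field="description", variant="mt")
```

Keeping the variant as a parameter makes it straightforward to run the same retrieval pipeline over each query form and compare the results.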
Inherits docs from csl
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("csl/trec-2023")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, abstract, keywords, category, category_eng, discipline, discipline_eng>
You can find more details about the Python API here.
ir_datasets export csl/trec-2023 docs
[doc_id]    [title]    [abstract]    [keywords]    [category]    [category_eng]    [discipline]    [discipline_eng]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.csl.trec-2023')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not-valuable. Information in the document might be included in a report footnote, or omitted entirely. | 11K | 94.3% | 
| 1 | Somewhat-valuable. The most valuable information in the document would be found in the remainder of such a report. | 419 | 3.7% | 
| 3 | Very-valuable. Information in the document would be found in the lead paragraph of a report that is later written on the topic. | 228 | 2.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("csl/trec-2023")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export csl/trec-2023 qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.csl.trec-2023.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
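Evaluation code commonly needs the judgments grouped by query. The sketch below builds a nested `{query_id: {doc_id: relevance}}` mapping from any iterable of qrel tuples. `Qrel` is a hypothetical stand-in for the namedtuple yielded by `qrels_iter()`, `qrels_by_query` is an illustrative helper, and the sample judgments are invented.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in mirroring the fields listed for qrels_iter() above.
Qrel = namedtuple("Qrel", "query_id doc_id relevance iteration")


def qrels_by_query(qrels):
    """Group judgments into {query_id: {doc_id: relevance}}."""
    grouped = defaultdict(dict)
    for qrel in qrels:
        grouped[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(grouped)


# Invented sample judgments using the relevance levels 0, 1, and 3.
sample_qrels = [
    Qrel("q1", "csl-000001", 3, "0"),
    Qrel("q1", "csl-000002", 0, "0"),
    Qrel("q2", "csl-000001", 1, "0"),
]
judgments = qrels_by_query(sample_qrels)
```

With a real dataset, `dataset.qrels_iter()` could be passed in place of `sample_qrels`.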
{
  "docs": {
    "count": 395927,
    "fields": {
      "doc_id": {
        "max_len": 10,
        "common_prefix": "csl-"
      }
    }
  },
  "queries": {
    "count": 41
  },
  "qrels": {
    "count": 11291,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "0": 10644,
          "1": 419,
          "3": 228
        }
      }
    }
  }
}
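The percentages in the relevance table follow directly from the `counts_by_value` metadata above. As a short worked check, with the counts copied from that metadata by hand rather than loaded through any API:

```python
# Counts copied from the "counts_by_value" metadata for csl/trec-2023 qrels.
counts_by_value = {"0": 10644, "1": 419, "3": 228}

# Total number of judgments; matches the reported qrels count.
total = sum(counts_by_value.values())

# Share of each relevance level, as a percentage rounded to one decimal.
percentages = {
    level: round(100 * count / total, 1)
    for level, count in counts_by_value.items()
}
```

Here `total` is 11291, and the shares come out to 94.3%, 3.7%, and 2.0% for levels 0, 1, and 3 respectively.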