ir_datasets: CSL
The CSL dataset, used for the TREC NeuCLIR technical document task.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("csl")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, abstract, keywords, category, category_eng, discipline, discipline_eng>
You can find more details about the Python API here.
ir_datasets export csl docs
[doc_id] [title] [abstract] [keywords] [category] [category_eng] [discipline] [discipline_eng]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.csl')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{
  "docs": {
    "count": 395927,
    "fields": {
      "doc_id": {
        "max_len": 10,
        "common_prefix": "csl-"
      }
    }
  }
}
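The metadata above reports a common `doc_id` prefix of `csl-` and a maximum length of 10 characters. A minimal sketch of how those two values could serve as a sanity check on document IDs follows; the helper name and the sample IDs are hypothetical, and using the metadata as a filter is an assumption rather than something ir_datasets does itself.

```python
# Values copied from the metadata above; using them as a filter is an assumption.
DOC_ID_PREFIX = "csl-"
DOC_ID_MAX_LEN = 10


def looks_like_csl_doc_id(doc_id: str) -> bool:
    """Return True if doc_id has the reported prefix and fits the reported length bound."""
    return doc_id.startswith(DOC_ID_PREFIX) and len(doc_id) <= DOC_ID_MAX_LEN


# Hypothetical IDs, for illustration only.
candidates = ["csl-000001", "csl-42", "doc-000001", "csl-00000001"]
valid = [d for d in candidates if looks_like_csl_doc_id(d)]
print(valid)
```

A check like this only catches IDs that clearly belong to another collection; it cannot confirm that an ID actually exists in CSL.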
The TREC NeuCLIR 2023 technical document task.
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("csl/trec-2023")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, title, description, narrative, ht_title, ht_description, ht_narrative, mt_title, mt_description, mt_narrative, translation_lang>
You can find more details about the Python API here.
ir_datasets export csl/trec-2023 queries
[query_id] [title] [description] [narrative] [ht_title] [ht_description] [ht_narrative] [mt_title] [mt_description] [mt_narrative] [translation_lang]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.csl.trec-2023.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
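Each query carries several variants of the same field (`title`, `ht_title`, `mt_title`, and likewise for `description` and `narrative`). The sketch below shows one way to pick the first non-empty variant in a preferred order. It uses a stand-in namedtuple with the same field names so that it runs without downloading the dataset. The `pick_text` helper and the sample values are hypothetical, and reading `ht_` and `mt_` as human and machine translations is an assumption, since this page does not define the prefixes.

```python
from collections import namedtuple

# Stand-in with the same field names as the query namedtuple shown above.
Query = namedtuple("Query", [
    "query_id", "title", "description", "narrative",
    "ht_title", "ht_description", "ht_narrative",
    "mt_title", "mt_description", "mt_narrative",
    "translation_lang",
])


def pick_text(query, field="title", prefixes=("ht_", "mt_", "")):
    """Return the first non-empty variant of `field`, trying the prefixes in order."""
    for prefix in prefixes:
        value = getattr(query, prefix + field, None)
        if value:
            return value
    return ""


# Hypothetical query; every value is a placeholder.
q = Query(
    "1", "original title", "original description", "original narrative",
    "", "", "",
    "machine title", "machine description", "machine narrative",
    "zh",
)
print(pick_text(q))                   # ht_title is empty, so mt_title is used
print(pick_text(q, prefixes=("",)))   # only the unprefixed field is considered
```

The same helper should work on the objects yielded by `dataset.queries_iter()`, since it relies only on attribute names.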
Inherits docs from csl
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("csl/trec-2023")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, abstract, keywords, category, category_eng, discipline, discipline_eng>
You can find more details about the Python API here.
ir_datasets export csl/trec-2023 docs
[doc_id] [title] [abstract] [keywords] [category] [category_eng] [discipline] [discipline_eng]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.csl.trec-2023')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not-valuable. Information in the document might be included in a report footnote, or omitted entirely. | 10,644 | 94.3% |
| 1 | Somewhat-valuable. The most valuable information in the document would be found in the remainder of such a report. | 419 | 3.7% |
| 3 | Very-valuable. Information in the document would be found in the lead paragraph of a report that is later written on the topic. | 228 | 2.0% |
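Because the relevance labels are graded (0, 1, 3) rather than binary, graded measures such as nDCG apply naturally. The following is a minimal sketch of DCG and nDCG with linear gain, not the official NeuCLIR evaluation procedure. The ranking is hypothetical, and the ideal ordering is computed only from the labels passed in, which is a simplification compared with normalising over all judged documents for a query.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain: sum of rel_i / log2(i + 1), ranks from 1."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG divided by the DCG of the same labels sorted best-first (0.0 if nothing is relevant)."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical labels of the top five retrieved documents for one query.
ranking = [1, 3, 0, 0, 1]
score = ndcg(ranking)
print(f"nDCG@5 = {score:.3f}")
```

With this gain, a single "very-valuable" document outweighs a "somewhat-valuable" one at the same rank by a factor of three.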
Examples:
import ir_datasets
dataset = ir_datasets.load("csl/trec-2023")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export csl/trec-2023 qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
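The TSV export above can be read back into a nested mapping for evaluation scripts. The sketch below assumes the exported rows follow the column order shown (`query_id`, `doc_id`, `relevance`, `iteration`) with no header row. The function name and the sample rows are hypothetical and do not come from the dataset.

```python
import csv
import io
from collections import defaultdict


def read_qrels_tsv(handle):
    """Parse tab-separated qrels rows into {query_id: {doc_id: relevance}}."""
    qrels = defaultdict(dict)
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        query_id, doc_id, relevance, _iteration = row
        qrels[query_id][doc_id] = int(relevance)
    return dict(qrels)


# Hypothetical rows in the assumed column order.
sample = "1\tcsl-000001\t3\t0\n1\tcsl-000002\t0\t0\n2\tcsl-000003\t1\t0\n"
qrels = read_qrels_tsv(io.StringIO(sample))
print(qrels)
```

In practice the handle would be a file opened on the redirected CLI output rather than an in-memory string.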
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.csl.trec-2023.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
  "docs": {
    "count": 395927,
    "fields": {
      "doc_id": {
        "max_len": 10,
        "common_prefix": "csl-"
      }
    }
  },
  "queries": {
    "count": 41
  },
  "qrels": {
    "count": 11291,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "0": 10644,
          "1": 419,
          "3": 228
        }
      }
    }
  }
}
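The percentages in the relevance table can be re-derived from the `counts_by_value` entries in this metadata. A small sketch, with the counts copied from the JSON above:

```python
# Counts copied from the "qrels" metadata above.
counts_by_value = {"0": 10644, "1": 419, "3": 228}

total = sum(counts_by_value.values())
shares = {level: round(100 * n / total, 1) for level, n in counts_by_value.items()}

print(total)
print(shares)
```

The total matches the reported qrels count of 11291, and the rounded shares agree with the table (94.3%, 3.7%, 2.0%).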