ir_datasets: CSL
The CSL dataset, used for the TREC NeuCLIR technical document task.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("csl")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, abstract, keywords, category, category_eng, discipline, discipline_eng>
You can find more details about the Python API here.
ir_datasets export csl docs
[doc_id] [title] [abstract] [keywords] [category] [category_eng] [discipline] [discipline_eng]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.csl')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{ "docs": { "count": 395927, "fields": { "doc_id": { "max_len": 10, "common_prefix": "csl-" } } } }
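The metadata above records that every doc_id is at most 10 characters long and starts with "csl-". A minimal sketch of a sanity check built on those two facts follows. The helper name `looks_like_csl_doc_id` and the sample IDs are hypothetical, and passing the check does not mean an ID actually exists in the collection.

```python
import json

# Metadata copied from the listing above.
METADATA = json.loads(
    '{"docs": {"count": 395927, "fields": {"doc_id": '
    '{"max_len": 10, "common_prefix": "csl-"}}}}'
)


def looks_like_csl_doc_id(doc_id: str, metadata: dict = METADATA) -> bool:
    """Hypothetical helper: check a doc_id against the documented prefix and length."""
    field = metadata["docs"]["fields"]["doc_id"]
    has_prefix = doc_id.startswith(field["common_prefix"])
    fits_length = len(doc_id) <= field["max_len"]
    return has_prefix and fits_length


# Made-up IDs, for illustration only.
print(looks_like_csl_doc_id("csl-123456"))  # correct prefix, 10 characters
print(looks_like_csl_doc_id("doc-123456"))  # wrong prefix
```

A check like this can catch IDs from another collection before they are looked up in the document store.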
The TREC NeuCLIR 2023 technical document task.
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("csl/trec-2023")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative, ht_title, ht_description, ht_narrative, mt_title, mt_description, mt_narrative, translation_lang>
You can find more details about the Python API here.
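Each query carries three parallel sets of text fields: unprefixed, `ht_`, and `mt_`. A minimal sketch of choosing one set when building a search string follows. The `Query` stand-in only mirrors the field names listed above, `query_text` is a hypothetical helper, and reading `ht_` and `mt_` as human and machine translations is an assumption not stated on this page.

```python
from collections import namedtuple

# Stand-in with the same field names as the query namedtuple listed above.
Query = namedtuple(
    "Query",
    [
        "query_id", "title", "description", "narrative",
        "ht_title", "ht_description", "ht_narrative",
        "mt_title", "mt_description", "mt_narrative",
        "translation_lang",
    ],
)


def query_text(query, variant="", fields=("title", "description")):
    """Hypothetical helper: join the chosen fields of one variant.

    variant is "" for the unprefixed fields, or "ht" / "mt" for the
    prefixed ones (assumed to be human and machine translations).
    """
    prefix = f"{variant}_" if variant else ""
    parts = [getattr(query, prefix + name) for name in fields]
    # Skip empty fields so the result has no stray spaces.
    return " ".join(part for part in parts if part)


# Placeholder values, not real topic text.
sample = Query("1", "t", "d", "n", "ht t", "ht d", "ht n", "mt t", "mt d", "mt n", "zh")
print(query_text(sample, variant="mt"))
```

With a real dataset, the same helper could be applied to each item yielded by `dataset.queries_iter()`.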
ir_datasets export csl/trec-2023 queries
[query_id] [title] [description] [narrative] [ht_title] [ht_description] [ht_narrative] [mt_title] [mt_description] [mt_narrative] [translation_lang]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.csl.trec-2023.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from csl
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("csl/trec-2023")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, abstract, keywords, category, category_eng, discipline, discipline_eng>
You can find more details about the Python API here.
ir_datasets export csl/trec-2023 docs
[doc_id] [title] [abstract] [keywords] [category] [category_eng] [discipline] [discipline_eng]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.csl.trec-2023')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
0 | Not-valuable. Information in the document might be included in a report footnote, or omitted entirely. | 10,644 | 94.3% |
1 | Somewhat-valuable. The most valuable information in the document would be found in the remainder of such a report. | 419 | 3.7% |
3 | Very-valuable. Information in the document would be found in the lead paragraph of a report that is later written on the topic. | 228 | 2.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("csl/trec-2023")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export csl/trec-2023 qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
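The exported qrels can be read back with the standard library. The sketch below assumes the TSV has no header row and that its columns follow the order shown above (query_id, doc_id, relevance, iteration). `read_qrels_tsv` is a hypothetical helper, and the sample rows are made up.

```python
import csv
import io
from collections import defaultdict

# Column order taken from the export listing above.
FIELDS = ["query_id", "doc_id", "relevance", "iteration"]


def read_qrels_tsv(stream, min_relevance=1):
    """Hypothetical helper: map query_id -> {doc_id: relevance}.

    Judgments below min_relevance are dropped, so the default keeps
    levels 1 and 3 and discards level 0.
    """
    qrels = defaultdict(dict)
    for row in csv.reader(stream, delimiter="\t"):
        if not row:
            continue  # tolerate blank lines
        record = dict(zip(FIELDS, row))
        relevance = int(record["relevance"])
        if relevance >= min_relevance:
            qrels[record["query_id"]][record["doc_id"]] = relevance
    return dict(qrels)


# Made-up rows in the assumed format.
sample = io.StringIO(
    "1\tcsl-000001\t3\t0\n"
    "1\tcsl-000002\t0\t0\n"
    "2\tcsl-000003\t1\t0\n"
)
print(read_qrels_tsv(sample))
# → {'1': {'csl-000001': 3}, '2': {'csl-000003': 1}}
```

Passing `min_relevance=3` would keep only the very-valuable judgments, and `min_relevance=0` would keep every row.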
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.csl.trec-2023.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{ "docs": { "count": 395927, "fields": { "doc_id": { "max_len": 10, "common_prefix": "csl-" } } }, "queries": { "count": 41 }, "qrels": { "count": 11291, "fields": { "relevance": { "counts_by_value": { "0": 10644, "1": 419, "3": 228 } } } } }
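The percentages in the relevance table can be recomputed from `counts_by_value` in the metadata above. This is a small sketch using only the numbers on this page; the variable names are illustrative.

```python
import json

# The qrels portion of the metadata listed above.
metadata = json.loads(
    '{"qrels": {"count": 11291, "fields": {"relevance": '
    '{"counts_by_value": {"0": 10644, "1": 419, "3": 228}}}}}'
)

counts = metadata["qrels"]["fields"]["relevance"]["counts_by_value"]
total = sum(counts.values())

# The per-level counts should add up to the reported qrels count.
assert total == metadata["qrels"]["count"]

# Share of each relevance level, as a percentage rounded to one decimal.
shares = {level: round(100 * n / total, 1) for level, n in counts.items()}
print(shares)
# → {'0': 94.3, '1': 3.7, '3': 2.0}
```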