ir_datasets: Istella22
The Istella22 dataset facilitates comparisons between traditional and neural learning-to-rank approaches by including query and document text alongside LTR features (the features themselves are not included in ir_datasets).
Note that to use the dataset, you must read and accept the Istella22 License Agreement. By using the dataset, you agree to be bound by the terms of the license: the Istella dataset is solely for non-commercial use.
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text, extra_text, lang, lang_pct>
You can find more details about the Python API here.
ir_datasets export istella22 docs
[doc_id] [title] [url] [text] [extra_text] [lang] [lang_pct]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.istella22')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@inproceedings{Dato2022Istella,
  title={The Istella22 Dataset: Bridging Traditional and Neural Learning to Rank Evaluation},
  author={Domenico Dato and Sean MacAvaney and Franco Maria Nardini and Raffaele Perego and Nicola Tonellotto},
  booktitle={Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year={2022}
}
{
"docs": {
"count": 8421456,
"fields": {
"doc_id": {
"max_len": 16,
"common_prefix": "1990"
}
}
}
}
istella22/test
Official test query set.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export istella22/test queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.istella22.test.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from istella22
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text, extra_text, lang, lang_pct>
You can find more details about the Python API here.
ir_datasets export istella22/test docs
[doc_id] [title] [url] [text] [extra_text] [lang] [lang_pct]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.istella22.test')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Least relevant | 6.1K | 56.8% |
| 2 | Somewhat relevant | 1.0K | 9.4% |
| 3 | Mostly relevant | 2.6K | 24.1% |
| 4 | Perfectly relevant | 1.0K | 9.7% |
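These graded labels (1–4) are suited to graded evaluation measures such as nDCG. As a minimal sketch — not the official evaluation code, and using hypothetical toy labels rather than actual istella22 judgments — an nDCG implementation over this scale might look like:

```python
import math

def dcg(labels):
    """DCG with exponential gain (2**rel - 1) and a log2 rank discount."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(labels))

def ndcg(ranked_labels, k=10):
    """nDCG@k for a ranked list of graded relevance labels."""
    ideal = dcg(sorted(ranked_labels, reverse=True)[:k])
    return dcg(ranked_labels[:k]) / ideal if ideal > 0 else 0.0

# An ideally ordered ranking (labels 4 >= 3 >= 2 >= 1) scores exactly 1.0:
print(ndcg([4, 3, 2, 1]))  # 1.0
```

In practice you would build the label lists by joining a run's ranked doc_ids against the qrels from `dataset.qrels_iter()`, or use a dedicated evaluation library instead of hand-rolling the metric.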
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export istella22/test qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.istella22.test.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 8421456,
"fields": {
"doc_id": {
"max_len": 16,
"common_prefix": "1990"
}
}
},
"queries": {
"count": 2198
},
"qrels": {
"count": 10693,
"fields": {
"relevance": {
"counts_by_value": {
"3": 2573,
"4": 1040,
"1": 6070,
"2": 1010
}
}
}
}
}
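The percentages in the relevance-levels table above follow directly from these `counts_by_value` statistics; a quick sanity check:

```python
# Relevance label counts from the istella22/test qrels stats above.
counts = {1: 6070, 2: 1010, 3: 2573, 4: 1040}
total = sum(counts.values())  # matches the reported qrels count
pcts = {rel: round(100 * n / total, 1) for rel, n in counts.items()}
print(total, pcts)  # 10693 {1: 56.8, 2: 9.4, 3: 24.1, 4: 9.7}
```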
istella22/test/fold1
Fold 1 of the official test query set.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold1 queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.istella22.test.fold1.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from istella22
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold1")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text, extra_text, lang, lang_pct>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold1 docs
[doc_id] [title] [url] [text] [extra_text] [lang] [lang_pct]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.istella22.test.fold1')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Least relevant | 1.2K | 55.9% |
| 2 | Somewhat relevant | 201 | 9.3% |
| 3 | Mostly relevant | 560 | 25.9% |
| 4 | Perfectly relevant | 194 | 9.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold1")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold1 qrels --format tsv
[query_id] [doc_id] [relevance]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.istella22.test.fold1.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 8421456,
"fields": {
"doc_id": {
"max_len": 16,
"common_prefix": "1990"
}
}
},
"queries": {
"count": 440
},
"qrels": {
"count": 2164,
"fields": {
"relevance": {
"counts_by_value": {
"4": 194,
"1": 1209,
"3": 560,
"2": 201
}
}
}
}
}
istella22/test/fold2
Fold 2 of the official test query set.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold2")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold2 queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.istella22.test.fold2.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from istella22
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold2")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text, extra_text, lang, lang_pct>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold2 docs
[doc_id] [title] [url] [text] [extra_text] [lang] [lang_pct]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.istella22.test.fold2')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Least relevant | 1.3K | 58.5% |
| 2 | Somewhat relevant | 196 | 9.2% |
| 3 | Mostly relevant | 493 | 23.0% |
| 4 | Perfectly relevant | 200 | 9.3% |
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold2")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold2 qrels --format tsv
[query_id] [doc_id] [relevance]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.istella22.test.fold2.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 8421456,
"fields": {
"doc_id": {
"max_len": 16,
"common_prefix": "1990"
}
}
},
"queries": {
"count": 440
},
"qrels": {
"count": 2140,
"fields": {
"relevance": {
"counts_by_value": {
"3": 493,
"1": 1251,
"4": 200,
"2": 196
}
}
}
}
}
istella22/test/fold3
Fold 3 of the official test query set.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold3")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold3 queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.istella22.test.fold3.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from istella22
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold3")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text, extra_text, lang, lang_pct>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold3 docs
[doc_id] [title] [url] [text] [extra_text] [lang] [lang_pct]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.istella22.test.fold3')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Least relevant | 1.2K | 56.5% |
| 2 | Somewhat relevant | 216 | 9.8% |
| 3 | Mostly relevant | 532 | 24.2% |
| 4 | Perfectly relevant | 207 | 9.4% |
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold3")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold3 qrels --format tsv
[query_id] [doc_id] [relevance]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.istella22.test.fold3.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 8421456,
"fields": {
"doc_id": {
"max_len": 16,
"common_prefix": "1990"
}
}
},
"queries": {
"count": 440
},
"qrels": {
"count": 2197,
"fields": {
"relevance": {
"counts_by_value": {
"3": 532,
"1": 1242,
"4": 207,
"2": 216
}
}
}
}
}
istella22/test/fold4
Fold 4 of the official test query set.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold4")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold4 queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.istella22.test.fold4.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from istella22
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold4")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text, extra_text, lang, lang_pct>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold4 docs
[doc_id] [title] [url] [text] [extra_text] [lang] [lang_pct]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.istella22.test.fold4')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Least relevant | 1.2K | 56.1% |
| 2 | Somewhat relevant | 192 | 9.2% |
| 3 | Mostly relevant | 512 | 24.4% |
| 4 | Perfectly relevant | 216 | 10.3% |
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold4")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold4 qrels --format tsv
[query_id] [doc_id] [relevance]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.istella22.test.fold4.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 8421456,
"fields": {
"doc_id": {
"max_len": 16,
"common_prefix": "1990"
}
}
},
"queries": {
"count": 439
},
"qrels": {
"count": 2098,
"fields": {
"relevance": {
"counts_by_value": {
"1": 1178,
"4": 216,
"3": 512,
"2": 192
}
}
}
}
}
istella22/test/fold5
Fold 5 of the official test query set.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold5")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold5 queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.istella22.test.fold5.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from istella22
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold5")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text, extra_text, lang, lang_pct>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold5 docs
[doc_id] [title] [url] [text] [extra_text] [lang] [lang_pct]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.istella22.test.fold5')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Least relevant | 1.2K | 56.8% |
| 2 | Somewhat relevant | 205 | 9.8% |
| 3 | Mostly relevant | 476 | 22.7% |
| 4 | Perfectly relevant | 223 | 10.6% |
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold5")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold5 qrels --format tsv
[query_id] [doc_id] [relevance]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.istella22.test.fold5.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 8421456,
"fields": {
"doc_id": {
"max_len": 16,
"common_prefix": "1990"
}
}
},
"queries": {
"count": 439
},
"qrels": {
"count": 2094,
"fields": {
"relevance": {
"counts_by_value": {
"3": 476,
"1": 1190,
"4": 223,
"2": 205
}
}
}
}
}
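The five folds partition the official test set: their query and qrel counts sum to the istella22/test totals reported above. A quick check against the published statistics:

```python
# Per-fold counts taken from the fold1..fold5 statistics above.
fold_queries = [440, 440, 440, 439, 439]
fold_qrels = [2164, 2140, 2197, 2098, 2094]
print(sum(fold_queries), sum(fold_qrels))  # 2198 10693
```

Both sums match the istella22/test totals (2198 queries, 10693 qrels), consistent with the folds being a disjoint split of the test set.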