ir_datasets: Istella22

The Istella22 dataset facilitates comparisons between traditional and neural learning-to-rank evaluation by including query and document text alongside LTR features (the LTR features themselves are not distributed through ir_datasets).
Note that to use the dataset, you must read and accept the Istella22 License Agreement. By using the dataset, you agree to be bound by the terms of the license: the Istella dataset is solely for non-commercial use.
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text, extra_text, lang, lang_pct>
You can find more details about the Python API here.
ir_datasets export istella22 docs
[doc_id] [title] [url] [text] [extra_text] [lang] [lang_pct]
...
You can find more details about the CLI here.
No example available for PyTerrier
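Each document record carries language metadata (lang, a language code, and lang_pct, reported as a percentage). As a minimal sketch of filtering on these fields — using a hand-built stand-in namedtuple with invented sample values rather than the real 8.4M-document corpus — one might do:

```python
from collections import namedtuple

# Stand-in for the Istella22 doc type; field order matches the
# namedtuple shown above. All sample values are invented.
Istella22Doc = namedtuple(
    "Istella22Doc",
    ["doc_id", "title", "url", "text", "extra_text", "lang", "lang_pct"],
)

docs = [
    Istella22Doc("d1", "Pagina", "http://example.it", "testo", "", "it", 98.0),
    Istella22Doc("d2", "Page", "http://example.com", "text", "", "en", 91.5),
    Istella22Doc("d3", "Misto", "http://example.it", "testo", "", "it", 55.0),
]

# Keep only documents confidently identified as Italian.
italian = [d.doc_id for d in docs if d.lang == "it" and d.lang_pct >= 90.0]
print(italian)  # ['d1']
```

With the real dataset, the same predicate would be applied inside the dataset.docs_iter() loop shown above.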
Bibtex:
@inproceedings{Dato2022Istella,
  title={The Istella22 Dataset: Bridging Traditional and Neural Learning to Rank Evaluation},
  author={Domenico Dato and Sean MacAvaney and Franco Maria Nardini and Raffaele Perego and Nicola Tonellotto},
  booktitle={Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year={2022}
}

{ "docs": { "count": 8421456, "fields": { "doc_id": { "max_len": 16, "common_prefix": "1990" } } } }
Official test query set.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export istella22/test queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from istella22
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text, extra_text, lang, lang_pct>
You can find more details about the Python API here.
ir_datasets export istella22/test docs
[doc_id] [title] [url] [text] [extra_text] [lang] [lang_pct]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Least relevant | 6.1K | 56.8% |
| 2 | Somewhat relevant | 1.0K | 9.4% |
| 3 | Mostly relevant | 2.6K | 24.1% |
| 4 | Perfectly relevant | 1.0K | 9.7% |
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export istella22/test qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
{ "docs": { "count": 8421456, "fields": { "doc_id": { "max_len": 16, "common_prefix": "1990" } } }, "queries": { "count": 2198 }, "qrels": { "count": 10693, "fields": { "relevance": { "counts_by_value": { "3": 2573, "4": 1040, "1": 6070, "2": 1010 } } } } }
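The percentage column in the relevance-levels table above can be recomputed from the counts_by_value stats; a quick sanity check in plain Python (the counts are copied from the stats blob above):

```python
# Relevance counts for istella22/test, copied from the qrels stats above.
counts = {1: 6070, 2: 1010, 3: 2573, 4: 1040}
total = sum(counts.values())
assert total == 10693  # matches the reported qrels count

# Percentages as shown in the relevance-levels table (one decimal place).
pcts = {rel: round(100 * n / total, 1) for rel, n in counts.items()}
print(pcts)  # {1: 56.8, 2: 9.4, 3: 24.1, 4: 9.7}
```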
Fold 1 of the official test query set.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold1 queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from istella22
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold1")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text, extra_text, lang, lang_pct>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold1 docs
[doc_id] [title] [url] [text] [extra_text] [lang] [lang_pct]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Least relevant | 1.2K | 55.9% |
| 2 | Somewhat relevant | 201 | 9.3% |
| 3 | Mostly relevant | 560 | 25.9% |
| 4 | Perfectly relevant | 194 | 9.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold1")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold1 qrels --format tsv
[query_id] [doc_id] [relevance]
...
You can find more details about the CLI here.
No example available for PyTerrier
{ "docs": { "count": 8421456, "fields": { "doc_id": { "max_len": 16, "common_prefix": "1990" } } }, "queries": { "count": 440 }, "qrels": { "count": 2164, "fields": { "relevance": { "counts_by_value": { "4": 194, "1": 1209, "3": 560, "2": 201 } } } } }
Fold 2 of the official test query set.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold2")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold2 queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from istella22
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold2")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text, extra_text, lang, lang_pct>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold2 docs
[doc_id] [title] [url] [text] [extra_text] [lang] [lang_pct]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Least relevant | 1.3K | 58.5% |
| 2 | Somewhat relevant | 196 | 9.2% |
| 3 | Mostly relevant | 493 | 23.0% |
| 4 | Perfectly relevant | 200 | 9.3% |
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold2")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold2 qrels --format tsv
[query_id] [doc_id] [relevance]
...
You can find more details about the CLI here.
No example available for PyTerrier
{ "docs": { "count": 8421456, "fields": { "doc_id": { "max_len": 16, "common_prefix": "1990" } } }, "queries": { "count": 440 }, "qrels": { "count": 2140, "fields": { "relevance": { "counts_by_value": { "3": 493, "1": 1251, "4": 200, "2": 196 } } } } }
Fold 3 of the official test query set.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold3")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold3 queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from istella22
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold3")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text, extra_text, lang, lang_pct>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold3 docs
[doc_id] [title] [url] [text] [extra_text] [lang] [lang_pct]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Least relevant | 1.2K | 56.5% |
| 2 | Somewhat relevant | 216 | 9.8% |
| 3 | Mostly relevant | 532 | 24.2% |
| 4 | Perfectly relevant | 207 | 9.4% |
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold3")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold3 qrels --format tsv
[query_id] [doc_id] [relevance]
...
You can find more details about the CLI here.
No example available for PyTerrier
{ "docs": { "count": 8421456, "fields": { "doc_id": { "max_len": 16, "common_prefix": "1990" } } }, "queries": { "count": 440 }, "qrels": { "count": 2197, "fields": { "relevance": { "counts_by_value": { "3": 532, "1": 1242, "4": 207, "2": 216 } } } } }
Fold 4 of the official test query set.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold4")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold4 queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from istella22
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold4")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text, extra_text, lang, lang_pct>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold4 docs
[doc_id] [title] [url] [text] [extra_text] [lang] [lang_pct]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Least relevant | 1.2K | 56.1% |
| 2 | Somewhat relevant | 192 | 9.2% |
| 3 | Mostly relevant | 512 | 24.4% |
| 4 | Perfectly relevant | 216 | 10.3% |
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold4")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold4 qrels --format tsv
[query_id] [doc_id] [relevance]
...
You can find more details about the CLI here.
No example available for PyTerrier
{ "docs": { "count": 8421456, "fields": { "doc_id": { "max_len": 16, "common_prefix": "1990" } } }, "queries": { "count": 439 }, "qrels": { "count": 2098, "fields": { "relevance": { "counts_by_value": { "1": 1178, "4": 216, "3": 512, "2": 192 } } } } }
Fold 5 of the official test query set.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold5")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold5 queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from istella22
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold5")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text, extra_text, lang, lang_pct>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold5 docs
[doc_id] [title] [url] [text] [extra_text] [lang] [lang_pct]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Least relevant | 1.2K | 56.8% |
| 2 | Somewhat relevant | 205 | 9.8% |
| 3 | Mostly relevant | 476 | 22.7% |
| 4 | Perfectly relevant | 223 | 10.6% |
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold5")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold5 qrels --format tsv
[query_id] [doc_id] [relevance]
...
You can find more details about the CLI here.
No example available for PyTerrier
{ "docs": { "count": 8421456, "fields": { "doc_id": { "max_len": 16, "common_prefix": "1990" } } }, "queries": { "count": 439 }, "qrels": { "count": 2094, "fields": { "relevance": { "counts_by_value": { "3": 476, "1": 1190, "4": 223, "2": 205 } } } } }
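As a cross-check on the per-fold statistics above, the fold query and qrel counts sum exactly to the istella22/test totals, consistent with the five folds partitioning the official test set (all numbers copied from the stats blobs above):

```python
# Per-fold counts copied from the fold1..fold5 stats blobs above.
fold_queries = [440, 440, 440, 439, 439]
fold_qrels = [2164, 2140, 2197, 2098, 2094]

print(sum(fold_queries))  # 2198, the istella22/test query count
print(sum(fold_qrels))    # 10693, the istella22/test qrels count
```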