ir_datasets: Istella22
The Istella22 dataset facilitates comparisons between traditional and neural learning-to-rank by including query and document text along with LTR features (the features themselves are not included in ir_datasets).
Note that to use the dataset, you must read and accept the Istella22 License Agreement. By using the dataset, you agree to be bound by the terms of the license: the Istella dataset is solely for non-commercial use.
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text, extra_text, lang, lang_pct>
You can find more details about the Python API here.
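Since the corpus contains about 8.4M documents, iterating sequentially just to fetch a handful of them is wasteful; ir_datasets also exposes a document store for random access by doc_id. A minimal sketch (the doc_id below is a placeholder, not a real identifier):
import ir_datasets
dataset = ir_datasets.load("istella22")
docs_store = dataset.docs_store()  # builds a local lookup structure on first use
doc = docs_store.get("1990000000000001")  # hypothetical doc_id; real ids are up to 16 characters
print(doc.title, doc.url)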
ir_datasets export istella22 docs
[doc_id]    [title]    [url]    [text]    [extra_text]    [lang]    [lang_pct]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.istella22')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@inproceedings{Dato2022Istella,
  title={The Istella22 Dataset: Bridging Traditional and Neural Learning to Rank Evaluation},
  author={Dato, Domenico and MacAvaney, Sean and Nardini, Franco Maria and Perego, Raffaele and Tonellotto, Nicola},
  booktitle={Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year={2022}
}
{
  "docs": {
    "count": 8421456,
    "fields": {
      "doc_id": {
        "max_len": 16,
        "common_prefix": "1990"
      }
    }
  }
}
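Each document also records a detected language (lang) and a confidence score (lang_pct), which can be used to narrow the corpus, for example to confidently-Italian pages, before indexing. A hedged sketch, assuming ISO-style codes such as "it" and a 0-100 confidence scale (neither is guaranteed by this page):
import ir_datasets
dataset = ir_datasets.load("istella22")
italian_ids = []
for doc in dataset.docs_iter():
    # assumption: doc.lang holds a code like "it" and doc.lang_pct is a 0-100 confidence
    if doc.lang == "it" and doc.lang_pct >= 90:
        italian_ids.append(doc.doc_id)
    if len(italian_ids) >= 1000:  # stop early; the full corpus has ~8.4M documents
        break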
Official test query set.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export istella22/test queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.istella22.test.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
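A common pattern with the test queries is to group the relevance judgments (documented below) by query before scoring a run. A minimal sketch using the qrels_dict helper from the Python API:
import ir_datasets
dataset = ir_datasets.load("istella22/test")
qrels = dataset.qrels_dict()  # {query_id: {doc_id: relevance}}
for query in dataset.queries_iter():
    judged = qrels.get(query.query_id, {})
    print(query.query_id, query.text, len(judged), "judged documents")
    break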
Inherits docs from istella22
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text, extra_text, lang, lang_pct>
You can find more details about the Python API here.
ir_datasets export istella22/test docs
[doc_id]    [title]    [url]    [text]    [extra_text]    [lang]    [lang_pct]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.istella22.test')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Least relevant | 6.1K | 56.8% | 
| 2 | Somewhat relevant | 1.0K | 9.4% | 
| 3 | Mostly relevant | 2.6K | 24.1% | 
| 4 | Perfectly relevant | 1.0K | 9.7% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export istella22/test qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.istella22.test.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
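The relevance distribution in the table above can be recomputed directly from the qrels; the counts should match the statistics listed below. A short sketch:
import ir_datasets
from collections import Counter
dataset = ir_datasets.load("istella22/test")
counts = Counter(qrel.relevance for qrel in dataset.qrels_iter())
print(counts)  # expected: 1 -> 6070, 2 -> 1010, 3 -> 2573, 4 -> 1040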
{
  "docs": {
    "count": 8421456,
    "fields": {
      "doc_id": {
        "max_len": 16,
        "common_prefix": "1990"
      }
    }
  },
  "queries": {
    "count": 2198
  },
  "qrels": {
    "count": 10693,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "3": 2573,
          "4": 1040,
          "1": 6070,
          "2": 1010
        }
      }
    }
  }
}
Official test query set (fold 1).
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold1 queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.istella22.test.fold1.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from istella22
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold1")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text, extra_text, lang, lang_pct>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold1 docs
[doc_id]    [title]    [url]    [text]    [extra_text]    [lang]    [lang_pct]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.istella22.test.fold1')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Least relevant | 1.2K | 55.9% | 
| 2 | Somewhat relevant | 201 | 9.3% | 
| 3 | Mostly relevant | 560 | 25.9% | 
| 4 | Perfectly relevant | 194 | 9.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold1")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold1 qrels --format tsv
[query_id]    [doc_id]    [relevance]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.istella22.test.fold1.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
  "docs": {
    "count": 8421456,
    "fields": {
      "doc_id": {
        "max_len": 16,
        "common_prefix": "1990"
      }
    }
  },
  "queries": {
    "count": 440
  },
  "qrels": {
    "count": 2164,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "4": 194,
          "1": 1209,
          "3": 560,
          "2": 201
        }
      }
    }
  }
}
Official test query set (fold 2).
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold2")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold2 queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.istella22.test.fold2.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from istella22
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold2")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text, extra_text, lang, lang_pct>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold2 docs
[doc_id]    [title]    [url]    [text]    [extra_text]    [lang]    [lang_pct]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.istella22.test.fold2')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Least relevant | 1.3K | 58.5% | 
| 2 | Somewhat relevant | 196 | 9.2% | 
| 3 | Mostly relevant | 493 | 23.0% | 
| 4 | Perfectly relevant | 200 | 9.3% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold2")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold2 qrels --format tsv
[query_id]    [doc_id]    [relevance]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.istella22.test.fold2.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
  "docs": {
    "count": 8421456,
    "fields": {
      "doc_id": {
        "max_len": 16,
        "common_prefix": "1990"
      }
    }
  },
  "queries": {
    "count": 440
  },
  "qrels": {
    "count": 2140,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "3": 493,
          "1": 1251,
          "4": 200,
          "2": 196
        }
      }
    }
  }
}
Official test query set (fold 3).
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold3")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold3 queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.istella22.test.fold3.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from istella22
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold3")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text, extra_text, lang, lang_pct>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold3 docs
[doc_id]    [title]    [url]    [text]    [extra_text]    [lang]    [lang_pct]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.istella22.test.fold3')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Least relevant | 1.2K | 56.5% | 
| 2 | Somewhat relevant | 216 | 9.8% | 
| 3 | Mostly relevant | 532 | 24.2% | 
| 4 | Perfectly relevant | 207 | 9.4% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold3")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold3 qrels --format tsv
[query_id]    [doc_id]    [relevance]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.istella22.test.fold3.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
  "docs": {
    "count": 8421456,
    "fields": {
      "doc_id": {
        "max_len": 16,
        "common_prefix": "1990"
      }
    }
  },
  "queries": {
    "count": 440
  },
  "qrels": {
    "count": 2197,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "3": 532,
          "1": 1242,
          "4": 207,
          "2": 216
        }
      }
    }
  }
}
Official test query set (fold 4).
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold4")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold4 queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.istella22.test.fold4.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from istella22
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold4")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text, extra_text, lang, lang_pct>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold4 docs
[doc_id]    [title]    [url]    [text]    [extra_text]    [lang]    [lang_pct]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.istella22.test.fold4')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Least relevant | 1.2K | 56.1% | 
| 2 | Somewhat relevant | 192 | 9.2% | 
| 3 | Mostly relevant | 512 | 24.4% | 
| 4 | Perfectly relevant | 216 | 10.3% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold4")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold4 qrels --format tsv
[query_id]    [doc_id]    [relevance]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.istella22.test.fold4.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
  "docs": {
    "count": 8421456,
    "fields": {
      "doc_id": {
        "max_len": 16,
        "common_prefix": "1990"
      }
    }
  },
  "queries": {
    "count": 439
  },
  "qrels": {
    "count": 2098,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 1178,
          "4": 216,
          "3": 512,
          "2": 192
        }
      }
    }
  }
}
Official test query set (fold 5).
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold5")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold5 queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.istella22.test.fold5.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from istella22
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold5")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text, extra_text, lang, lang_pct>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold5 docs
[doc_id]    [title]    [url]    [text]    [extra_text]    [lang]    [lang_pct]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.istella22.test.fold5')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Least relevant | 1.2K | 56.8% | 
| 2 | Somewhat relevant | 205 | 9.8% | 
| 3 | Mostly relevant | 476 | 22.7% | 
| 4 | Perfectly relevant | 223 | 10.6% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold5")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold5 qrels --format tsv
[query_id]    [doc_id]    [relevance]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.istella22.test.fold5.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
  "docs": {
    "count": 8421456,
    "fields": {
      "doc_id": {
        "max_len": 16,
        "common_prefix": "1990"
      }
    }
  },
  "queries": {
    "count": 439
  },
  "qrels": {
    "count": 2094,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "3": 476,
          "1": 1190,
          "4": 223,
          "2": 205
        }
      }
    }
  }
}
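The five folds (fold1 through fold5) partition the 2,198 test queries into groups of 440, 440, 440, 439, and 439, which is convenient for cross-validation-style experiments. A minimal sketch that loads each fold and reports its size:
import ir_datasets
for i in range(1, 6):
    fold = ir_datasets.load(f"istella22/test/fold{i}")
    n_queries = sum(1 for _ in fold.queries_iter())
    n_qrels = sum(1 for _ in fold.qrels_iter())
    print(f"fold{i}: {n_queries} queries, {n_qrels} qrels")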