Github: datasets/istella22.py

ir_datasets: Istella22

Index
  1. istella22
  2. istella22/test
  3. istella22/test/fold1
  4. istella22/test/fold2
  5. istella22/test/fold3
  6. istella22/test/fold4
  7. istella22/test/fold5
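
The fold identifiers in the index follow a regular pattern, so they can be generated rather than typed out. The sketch below is illustrative only: the helper name `fold_dataset_ids` and the `FOLD_QUERY_COUNTS` table are not part of ir_datasets; the counts are copied from the per-fold sections of this page.

```python
# Query counts per fold, copied from the per-fold sections of this page.
FOLD_QUERY_COUNTS = {1: 440, 2: 440, 3: 440, 4: 439, 5: 439}


def fold_dataset_ids(prefix="istella22/test"):
    """Return the dataset identifiers of the five test folds, in order.

    Hypothetical helper: it only builds strings and does not load anything.
    """
    return [f"{prefix}/fold{i}" for i in sorted(FOLD_QUERY_COUNTS)]


fold_ids = fold_dataset_ids()
total_queries = sum(FOLD_QUERY_COUNTS.values())

print(fold_ids)
print(total_queries)  # 2198, in line with the "2.2K queries" listed for istella22/test
```

Each identifier can be passed to `ir_datasets.load(...)` as in the examples on this page. The fold counts summing to 2198 is consistent with the folds partitioning the test queries, although this page does not state that explicitly.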

"istella22"

The Istella22 dataset facilitates comparisons between traditional and neural learning-to-rank by including query and document text along with LTR features (not included in ir_datasets).

Note that to use the dataset, you must read and accept the Istella22 License Agreement. By using the dataset, you agree to be bound by the terms of the license: the Istella dataset is solely for non-commercial use.

docs
8.4M docs

Language: multiple/other/unknown

Document type:
Istella22Doc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. url: str
  4. text: str
  5. extra_text: str
  6. lang: str
  7. lang_pct: int
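
Because documents are namedtuples, their fields can be used directly for filtering. The following is a minimal sketch, assuming that `lang` holds a language code and `lang_pct` an integer percentage for that language; this page only gives the field names and types, so treat that reading as an assumption. The local `Istella22Doc` is a stand-in with the same field order, the sample values are invented, and `docs_in_language` is a hypothetical helper, not an ir_datasets function.

```python
from collections import namedtuple

# Stand-in with the same fields as the Istella22Doc type listed above,
# so the sketch runs without downloading the dataset.
Istella22Doc = namedtuple(
    "Istella22Doc",
    ["doc_id", "title", "url", "text", "extra_text", "lang", "lang_pct"],
)


def docs_in_language(docs, lang, min_pct=0):
    """Yield docs whose `lang` equals `lang` and whose `lang_pct` is >= min_pct."""
    for doc in docs:
        if doc.lang == lang and doc.lang_pct >= min_pct:
            yield doc


# Invented sample records, for illustration only.
sample_docs = [
    Istella22Doc("d1", "Titolo", "http://example.com/1", "testo", "", "it", 98),
    Istella22Doc("d2", "Title", "http://example.com/2", "text", "", "en", 95),
    Istella22Doc("d3", "Misto", "http://example.com/3", "testo text", "", "it", 55),
]

italian = [doc.doc_id for doc in docs_in_language(sample_docs, "it", min_pct=90)]
print(italian)
```

With the real dataset, the same generator could wrap `dataset.docs_iter()` in place of `sample_docs`.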

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("istella22")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text, extra_text, lang, lang_pct>

You can find more details about the Python API here.

CLI
ir_datasets export istella22 docs
[doc_id]    [title]    [url]    [text]    [extra_text]    [lang]    [lang_pct]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.istella22')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Dato2022Istella}

Bibtex:

@inproceedings{Dato2022Istella,
  title={The Istella22 Dataset: Bridging Traditional and Neural Learning to Rank Evaluation},
  author={Domenico Dato and Sean MacAvaney and Franco Maria Nardini and Raffaele Perego and Nicola Tonellotto},
  booktitle={Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year={2022}
}

"istella22/test"

Official test query set.

queries
2.2K queries

Language: it

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("istella22/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export istella22/test queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.istella22.test.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.4M docs

Inherits docs from istella22

Language: multiple/other/unknown

Document type:
Istella22Doc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. url: str
  4. text: str
  5. extra_text: str
  6. lang: str
  7. lang_pct: int

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("istella22/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text, extra_text, lang, lang_pct>

You can find more details about the Python API here.

CLI
ir_datasets export istella22/test docs
[doc_id]    [title]    [url]    [text]    [extra_text]    [lang]    [lang_pct]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.istella22.test')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
11K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition          Count  %
1     Least relevant      6.1K   56.8%
2     Somewhat relevant   1.0K   9.4%
3     Mostly relevant     2.6K   24.1%
4     Perfectly relevant  1.0K   9.7%
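
A distribution like the one above can be recomputed from the qrels by counting the `relevance` field. This is a minimal sketch rather than the code used to build this page: the local `TrecQrel` is a stand-in with the same fields as the type listed above, the sample judgments are invented, and `relevance_distribution` is a hypothetical helper.

```python
from collections import Counter, namedtuple

# Stand-in with the same fields as the TrecQrel type listed above.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def relevance_distribution(qrels):
    """Map each relevance level to (count, percentage of all judgments)."""
    counts = Counter(qrel.relevance for qrel in qrels)
    total = sum(counts.values())
    return {
        level: (counts[level], 100.0 * counts[level] / total)
        for level in sorted(counts)
    }


# Invented sample judgments, for illustration only.
sample_qrels = [
    TrecQrel("q1", "d1", 1, "0"),
    TrecQrel("q1", "d2", 1, "0"),
    TrecQrel("q2", "d3", 3, "0"),
    TrecQrel("q2", "d4", 4, "0"),
]

distribution = relevance_distribution(sample_qrels)
for level, (count, pct) in distribution.items():
    print(level, count, f"{pct:.1f}%")
```

With the real dataset, `dataset.qrels_iter()` would take the place of `sample_qrels`. The helper only reads `relevance`, so it would also work with the three-field GenericQrel records used by the folds.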

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("istella22/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
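
Evaluation code usually needs the judgments grouped by query rather than as a flat stream. The sketch below shows one way to build such a lookup from the namedtuples yielded by `qrels_iter()`. The local `TrecQrel` is a stand-in so the example runs without the dataset, the sample rows are invented, and `group_qrels` is a hypothetical helper name.

```python
from collections import defaultdict, namedtuple

# Stand-in with the same fields as the TrecQrel type listed above.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def group_qrels(qrels):
    """Build {query_id: {doc_id: relevance}} from an iterable of qrels."""
    grouped = defaultdict(dict)
    for qrel in qrels:
        grouped[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(grouped)


# Invented sample judgments, for illustration only.
sample_qrels = [
    TrecQrel("q1", "d1", 4, "0"),
    TrecQrel("q1", "d2", 1, "0"),
    TrecQrel("q2", "d3", 3, "0"),
]

by_query = group_qrels(sample_qrels)
print(by_query["q1"])
```

Note that this holds every judgment in memory at once, which is unlikely to be a concern for the roughly 11K qrels listed here.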

CLI
ir_datasets export istella22/test qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.
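
The exported qrels can be read back with the standard library. This sketch assumes the export is tab-separated with the four columns in the order shown above and no header row; the sample text is invented and `parse_qrels_tsv` is a hypothetical helper.

```python
import csv
import io


def parse_qrels_tsv(text):
    """Parse tab-separated qrels lines into (query_id, doc_id, relevance, iteration)."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        query_id, doc_id, relevance, iteration = row
        rows.append((query_id, doc_id, int(relevance), iteration))
    return rows


# Invented sample in the assumed export layout.
sample_tsv = "q1\td1\t3\t0\nq1\td2\t1\t0\n"

rows = parse_qrels_tsv(sample_tsv)
print(rows)
```

To read a saved export, pass the file contents instead of `sample_tsv`, for example `parse_qrels_tsv(open(path).read())` where `path` is wherever the CLI output was redirected.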

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.istella22.test.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # relevance assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.


"istella22/test/fold1"

Official test query set.

queries
440 queries

Language: it

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export istella22/test/fold1 queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.istella22.test.fold1.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.4M docs

Inherits docs from istella22

Language: multiple/other/unknown

Document type:
Istella22Doc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. url: str
  4. text: str
  5. extra_text: str
  6. lang: str
  7. lang_pct: int

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold1")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text, extra_text, lang, lang_pct>

You can find more details about the Python API here.

CLI
ir_datasets export istella22/test/fold1 docs
[doc_id]    [title]    [url]    [text]    [extra_text]    [lang]    [lang_pct]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.istella22.test.fold1')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
2.2K qrels
Query relevance judgment type:
GenericQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int

Relevance levels

Rel.  Definition          Count  %
1     Least relevant      1.2K   55.9%
2     Somewhat relevant   201    9.3%
3     Mostly relevant     560    25.9%
4     Perfectly relevant  194    9.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold1")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance>

You can find more details about the Python API here.

CLI
ir_datasets export istella22/test/fold1 qrels --format tsv
[query_id]    [doc_id]    [relevance]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.istella22.test.fold1.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # relevance assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.


"istella22/test/fold2"

Official test query set.

queries
440 queries

Language: it

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold2")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export istella22/test/fold2 queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.istella22.test.fold2.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.4M docs

Inherits docs from istella22

Language: multiple/other/unknown

Document type:
Istella22Doc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. url: str
  4. text: str
  5. extra_text: str
  6. lang: str
  7. lang_pct: int

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold2")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text, extra_text, lang, lang_pct>

You can find more details about the Python API here.

CLI
ir_datasets export istella22/test/fold2 docs
[doc_id]    [title]    [url]    [text]    [extra_text]    [lang]    [lang_pct]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.istella22.test.fold2')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
2.1K qrels
Query relevance judgment type:
GenericQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int

Relevance levels

Rel.  Definition          Count  %
1     Least relevant      1.3K   58.5%
2     Somewhat relevant   196    9.2%
3     Mostly relevant     493    23.0%
4     Perfectly relevant  200    9.3%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold2")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance>

You can find more details about the Python API here.

CLI
ir_datasets export istella22/test/fold2 qrels --format tsv
[query_id]    [doc_id]    [relevance]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.istella22.test.fold2.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # relevance assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.


"istella22/test/fold3"

Official test query set.

queries
440 queries

Language: it

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold3")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export istella22/test/fold3 queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.istella22.test.fold3.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.4M docs

Inherits docs from istella22

Language: multiple/other/unknown

Document type:
Istella22Doc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. url: str
  4. text: str
  5. extra_text: str
  6. lang: str
  7. lang_pct: int

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold3")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text, extra_text, lang, lang_pct>

You can find more details about the Python API here.

CLI
ir_datasets export istella22/test/fold3 docs
[doc_id]    [title]    [url]    [text]    [extra_text]    [lang]    [lang_pct]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.istella22.test.fold3')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
2.2K qrels
Query relevance judgment type:
GenericQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int

Relevance levels

Rel.  Definition          Count  %
1     Least relevant      1.2K   56.5%
2     Somewhat relevant   216    9.8%
3     Mostly relevant     532    24.2%
4     Perfectly relevant  207    9.4%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold3")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance>

You can find more details about the Python API here.

CLI
ir_datasets export istella22/test/fold3 qrels --format tsv
[query_id]    [doc_id]    [relevance]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.istella22.test.fold3.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # relevance assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.


"istella22/test/fold4"

Official test query set.

queries
439 queries

Language: it

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold4")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export istella22/test/fold4 queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.istella22.test.fold4.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.4M docs

Inherits docs from istella22

Language: multiple/other/unknown

Document type:
Istella22Doc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. url: str
  4. text: str
  5. extra_text: str
  6. lang: str
  7. lang_pct: int

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold4")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text, extra_text, lang, lang_pct>

You can find more details about the Python API here.

CLI
ir_datasets export istella22/test/fold4 docs
[doc_id]    [title]    [url]    [text]    [extra_text]    [lang]    [lang_pct]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.istella22.test.fold4')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
2.1K qrels
Query relevance judgment type:
GenericQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int

Relevance levels

Rel.  Definition          Count  %
1     Least relevant      1.2K   56.1%
2     Somewhat relevant   192    9.2%
3     Mostly relevant     512    24.4%
4     Perfectly relevant  216    10.3%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold4")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance>

You can find more details about the Python API here.

CLI
ir_datasets export istella22/test/fold4 qrels --format tsv
[query_id]    [doc_id]    [relevance]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.istella22.test.fold4.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # relevance assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.


"istella22/test/fold5"

Official test query set.

queries
439 queries

Language: it

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold5")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export istella22/test/fold5 queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.istella22.test.fold5.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
8.4M docs

Inherits docs from istella22

Language: multiple/other/unknown

Document type:
Istella22Doc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. url: str
  4. text: str
  5. extra_text: str
  6. lang: str
  7. lang_pct: int

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold5")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text, extra_text, lang, lang_pct>

You can find more details about the Python API here.

CLI
ir_datasets export istella22/test/fold5 docs
[doc_id]    [title]    [url]    [text]    [extra_text]    [lang]    [lang_pct]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.istella22.test.fold5')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
2.1K qrels
Query relevance judgment type:
GenericQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int

Relevance levels

Rel.  Definition          Count  %
1     Least relevant      1.2K   56.8%
2     Somewhat relevant   205    9.8%
3     Mostly relevant     476    22.7%
4     Perfectly relevant  223    10.6%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold5")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance>

You can find more details about the Python API here.

CLI
ir_datasets export istella22/test/fold5 qrels --format tsv
[query_id]    [doc_id]    [relevance]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.istella22.test.fold5.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # relevance assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
