ir_datasets: Istella22

The Istella22 dataset facilitates comparisons between traditional and neural learning-to-rank evaluation by including query and document text alongside LTR features (the LTR features themselves are not distributed through ir_datasets).
Note that to use the dataset, you must read and accept the Istella22 License Agreement. By using the dataset, you agree to be bound by the terms of the license: the Istella dataset is solely for non-commercial use.
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text, extra_text, lang, lang_pct>
You can find more details about the Python API here.
ir_datasets export istella22 docs
[doc_id] [title] [url] [text] [extra_text] [lang] [lang_pct]
...
You can find more details about the CLI here.
No example available for PyTerrier
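Each document record carries language metadata (lang, a language code, and lang_pct, reported as a percentage). As a minimal sketch of filtering on these fields — using a hand-built stand-in namedtuple with invented sample values rather than the real 8.4M-document corpus — one might do:

```python
from collections import namedtuple

# Stand-in for the Istella22 doc type; field order matches the
# namedtuple shown above. All sample values are invented.
Istella22Doc = namedtuple(
    "Istella22Doc",
    ["doc_id", "title", "url", "text", "extra_text", "lang", "lang_pct"],
)

docs = [
    Istella22Doc("d1", "Pagina", "http://example.it", "testo", "", "it", 98.0),
    Istella22Doc("d2", "Page", "http://example.com", "text", "", "en", 91.5),
    Istella22Doc("d3", "Misto", "http://example.it", "testo", "", "it", 55.0),
]

# Keep only documents confidently identified as Italian.
italian = [d.doc_id for d in docs if d.lang == "it" and d.lang_pct >= 90.0]
print(italian)  # ['d1']
```

With the real dataset, the same predicate would be applied inside the dataset.docs_iter() loop shown above.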
Bibtex:
@inproceedings{Dato2022Istella,
  title={The Istella22 Dataset: Bridging Traditional and Neural Learning to Rank Evaluation},
  author={Domenico Dato and Sean MacAvaney and Franco Maria Nardini and Raffaele Perego and Nicola Tonellotto},
  booktitle={Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year={2022}
}

{ "docs": { "count": 8421456, "fields": { "doc_id": { "max_len": 16, "common_prefix": "1990" } } } }
Official test query set.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export istella22/test queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from istella22
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text, extra_text, lang, lang_pct>
You can find more details about the Python API here.
ir_datasets export istella22/test docs
[doc_id] [title] [url] [text] [extra_text] [lang] [lang_pct]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Least relevant | 6.1K | 56.8% |
| 2 | Somewhat relevant | 1.0K | 9.4% |
| 3 | Mostly relevant | 2.6K | 24.1% |
| 4 | Perfectly relevant | 1.0K | 9.7% |
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export istella22/test qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
{ "docs": { "count": 8421456, "fields": { "doc_id": { "max_len": 16, "common_prefix": "1990" } } }, "queries": { "count": 2198 }, "qrels": { "count": 10693, "fields": { "relevance": { "counts_by_value": { "3": 2573, "4": 1040, "1": 6070, "2": 1010 } } } } }
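The percentage column in the relevance-levels table above can be recomputed from the counts_by_value stats; a quick sanity check in plain Python (the counts are copied from the stats blob above):

```python
# Relevance counts for istella22/test, copied from the qrels stats above.
counts = {1: 6070, 2: 1010, 3: 2573, 4: 1040}
total = sum(counts.values())
assert total == 10693  # matches the reported qrels count

# Percentages as shown in the relevance-levels table (one decimal place).
pcts = {rel: round(100 * n / total, 1) for rel, n in counts.items()}
print(pcts)  # {1: 56.8, 2: 9.4, 3: 24.1, 4: 9.7}
```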
Fold 1 of the official test query set.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold1 queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from istella22
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold1")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text, extra_text, lang, lang_pct>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold1 docs
[doc_id] [title] [url] [text] [extra_text] [lang] [lang_pct]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Least relevant | 1.2K | 55.9% |
| 2 | Somewhat relevant | 201 | 9.3% |
| 3 | Mostly relevant | 560 | 25.9% |
| 4 | Perfectly relevant | 194 | 9.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold1")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold1 qrels --format tsv
[query_id] [doc_id] [relevance]
...
You can find more details about the CLI here.
No example available for PyTerrier
{ "docs": { "count": 8421456, "fields": { "doc_id": { "max_len": 16, "common_prefix": "1990" } } }, "queries": { "count": 440 }, "qrels": { "count": 2164, "fields": { "relevance": { "counts_by_value": { "4": 194, "1": 1209, "3": 560, "2": 201 } } } } }
Fold 2 of the official test query set.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold2")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold2 queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from istella22
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold2")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text, extra_text, lang, lang_pct>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold2 docs
[doc_id] [title] [url] [text] [extra_text] [lang] [lang_pct]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Least relevant | 1.3K | 58.5% |
| 2 | Somewhat relevant | 196 | 9.2% |
| 3 | Mostly relevant | 493 | 23.0% |
| 4 | Perfectly relevant | 200 | 9.3% |
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold2")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold2 qrels --format tsv
[query_id] [doc_id] [relevance]
...
You can find more details about the CLI here.
No example available for PyTerrier
{ "docs": { "count": 8421456, "fields": { "doc_id": { "max_len": 16, "common_prefix": "1990" } } }, "queries": { "count": 440 }, "qrels": { "count": 2140, "fields": { "relevance": { "counts_by_value": { "3": 493, "1": 1251, "4": 200, "2": 196 } } } } }
Fold 3 of the official test query set.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold3")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold3 queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from istella22
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold3")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text, extra_text, lang, lang_pct>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold3 docs
[doc_id] [title] [url] [text] [extra_text] [lang] [lang_pct]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Least relevant | 1.2K | 56.5% |
| 2 | Somewhat relevant | 216 | 9.8% |
| 3 | Mostly relevant | 532 | 24.2% |
| 4 | Perfectly relevant | 207 | 9.4% |
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold3")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold3 qrels --format tsv
[query_id] [doc_id] [relevance]
...
You can find more details about the CLI here.
No example available for PyTerrier
{ "docs": { "count": 8421456, "fields": { "doc_id": { "max_len": 16, "common_prefix": "1990" } } }, "queries": { "count": 440 }, "qrels": { "count": 2197, "fields": { "relevance": { "counts_by_value": { "3": 532, "1": 1242, "4": 207, "2": 216 } } } } }
Fold 4 of the official test query set.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold4")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold4 queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from istella22
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold4")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text, extra_text, lang, lang_pct>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold4 docs
[doc_id] [title] [url] [text] [extra_text] [lang] [lang_pct]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Least relevant | 1.2K | 56.1% |
| 2 | Somewhat relevant | 192 | 9.2% |
| 3 | Mostly relevant | 512 | 24.4% |
| 4 | Perfectly relevant | 216 | 10.3% |
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold4")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold4 qrels --format tsv
[query_id] [doc_id] [relevance]
...
You can find more details about the CLI here.
No example available for PyTerrier
{ "docs": { "count": 8421456, "fields": { "doc_id": { "max_len": 16, "common_prefix": "1990" } } }, "queries": { "count": 439 }, "qrels": { "count": 2098, "fields": { "relevance": { "counts_by_value": { "1": 1178, "4": 216, "3": 512, "2": 192 } } } } }
Fold 5 of the official test query set.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold5")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold5 queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from istella22
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold5")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text, extra_text, lang, lang_pct>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold5 docs
[doc_id] [title] [url] [text] [extra_text] [lang] [lang_pct]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Least relevant | 1.2K | 56.8% |
| 2 | Somewhat relevant | 205 | 9.8% |
| 3 | Mostly relevant | 476 | 22.7% |
| 4 | Perfectly relevant | 223 | 10.6% |
Examples:
import ir_datasets
dataset = ir_datasets.load("istella22/test/fold5")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance>
You can find more details about the Python API here.
ir_datasets export istella22/test/fold5 qrels --format tsv
[query_id] [doc_id] [relevance]
...
You can find more details about the CLI here.
No example available for PyTerrier
{ "docs": { "count": 8421456, "fields": { "doc_id": { "max_len": 16, "common_prefix": "1990" } } }, "queries": { "count": 439 }, "qrels": { "count": 2094, "fields": { "relevance": { "counts_by_value": { "3": 476, "1": 1190, "4": 223, "2": 205 } } } } }
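As a cross-check on the per-fold statistics above, the fold query and qrel counts sum exactly to the istella22/test totals, consistent with the five folds partitioning the official test set (all numbers copied from the stats blobs above):

```python
# Per-fold counts copied from the fold1..fold5 stats blobs above.
fold_queries = [440, 440, 440, 439, 439]
fold_qrels = [2164, 2140, 2197, 2098, 2094]

print(sum(fold_queries))  # 2198, the istella22/test query count
print(sum(fold_qrels))    # 10693, the istella22/test qrels count
```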