ir_datasets: TREC Tip-of-the-Tongue

import ir_datasets
dataset = ir_datasets.load("trec-tot/2023")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, page_title, wikidata_id, wikidata_classes, text, sections, infoboxes>

You can find more details about the Python API here.

CLI

ir_datasets export trec-tot/2023 docs



[doc_id]    [page_title]    [wikidata_id]    [wikidata_classes]    [text]    [sections]    [infoboxes]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2023')
# Index trec-tot/2023
indexer = pt.IterDictIndexer('./indices/trec-tot_2023')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['page_title', 'wikidata_id', 'text'])

You can find more details about PyTerrier indexing here.

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.trec-tot.2023')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

{
  "docs": {
    "count": 231852,
    "fields": {
      "doc_id": {
        "max_len": 8,
        "common_prefix": ""
      }
    }
  }
}

`"trec-tot/2023/dev"`

Dev query set for TREC 2023 tip-of-the-tongue search track.

150 queries

Language: en

Query type:

TipOfTheTongueQuery: (namedtuple)

query_id: str
url: str
domain: str
title: str
text: str
sentence_annotations: List[Dict[str,str]]
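As a rough illustration of how these fields might be used, the sketch below tallies queries by `domain`. `TipOfTheTongueQuerySketch` is a hypothetical stand-in that only mirrors the field list above, and every sample value (including the domain names) is made up; with ir_datasets installed, the records from `dataset.queries_iter()` would take its place.

```python
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple


class TipOfTheTongueQuerySketch(NamedTuple):
    """Hypothetical stand-in mirroring the documented query fields."""
    query_id: str
    url: str
    domain: str
    title: str
    text: str
    sentence_annotations: List[Dict[str, str]]


def count_by_domain(queries: Iterable[TipOfTheTongueQuerySketch]) -> Counter:
    """Count how many queries fall into each domain."""
    return Counter(query.domain for query in queries)


# Made-up sample records; real values would come from dataset.queries_iter().
sample_queries = [
    TipOfTheTongueQuerySketch("q1", "https://example.org/1", "movie", "Title A", "a film where someone ...", []),
    TipOfTheTongueQuerySketch("q2", "https://example.org/2", "movie", "Title B", "an old film about ...", []),
    TipOfTheTongueQuerySketch("q3", "https://example.org/3", "landmark", "Title C", "a tower near ...", []),
]

domain_counts = count_by_domain(sample_queries)
```

Because the function only reads the `domain` attribute, it should work unchanged on any record type that exposes that field.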

Examples:

import ir_datasets
dataset = ir_datasets.load("trec-tot/2023/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, url, domain, title, text, sentence_annotations>

You can find more details about the Python API here.

CLI

ir_datasets export trec-tot/2023/dev queries



[query_id]    [url]    [domain]    [title]    [text]    [sentence_annotations]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2023/dev')
index_ref = pt.IndexRef.of('./indices/trec-tot_2023') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))  # use the query text, not the source URL, as the query

You can find more details about PyTerrier retrieval here.

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.trec-tot.2023.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

232K docs

Inherits docs from trec-tot/2023

Language: en

Document type:

TipOfTheTongueDoc: (namedtuple)

doc_id: str
page_title: str
wikidata_id: str
wikidata_classes: List[str]
text: str
sections: Dict[str,str]
infoboxes: List[Dict[str,str]]
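To make the nested fields more concrete, here is a minimal sketch that reports the length of each entry in `sections`. `TipOfTheTongueDocSketch` is a hypothetical stand-in that mirrors the field list above, and the sample record (section names, infobox keys, identifiers) is invented for illustration; real records would come from `dataset.docs_iter()`.

```python
from typing import Dict, List, NamedTuple


class TipOfTheTongueDocSketch(NamedTuple):
    """Hypothetical stand-in mirroring the documented document fields."""
    doc_id: str
    page_title: str
    wikidata_id: str
    wikidata_classes: List[str]
    text: str
    sections: Dict[str, str]
    infoboxes: List[Dict[str, str]]


def section_lengths(doc: TipOfTheTongueDocSketch) -> Dict[str, int]:
    """Map each section name to the number of whitespace-separated tokens in its body."""
    return {name: len(body.split()) for name, body in doc.sections.items()}


# Made-up sample record; every value here is a placeholder.
example_doc = TipOfTheTongueDocSketch(
    doc_id="0",
    page_title="Example film",
    wikidata_id="Q0",
    wikidata_classes=["Q0"],
    text="Example film is a made-up record. A made-up plot.",
    sections={
        "Abstract": "Example film is a made-up record.",
        "Plot": "A made-up plot.",
    },
    infoboxes=[{"name": "Example film"}],
)

lengths = section_lengths(example_doc)
```

A per-section view like this can help when deciding whether to index the whole `text` field or only selected sections.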

Examples:

import ir_datasets
dataset = ir_datasets.load("trec-tot/2023/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, page_title, wikidata_id, wikidata_classes, text, sections, infoboxes>

You can find more details about the Python API here.

CLI

ir_datasets export trec-tot/2023/dev docs



[doc_id]    [page_title]    [wikidata_id]    [wikidata_classes]    [text]    [sections]    [infoboxes]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2023/dev')
# Index trec-tot/2023
indexer = pt.IterDictIndexer('./indices/trec-tot_2023')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['page_title', 'wikidata_id', 'text'])

You can find more details about PyTerrier indexing here.

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.trec-tot.2023.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

150 qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
0	Not Relevant	`0`	0.0%
1	Relevant	`150`	100.0%
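The counts above (150 judgments for 150 queries, all with relevance 1) suggest a single known target document per query, in which case reciprocal rank is a natural way to score a ranking. The sketch below computes mean reciprocal rank under that assumption. `QrelSketch` is a hypothetical stand-in for `TrecQrel`, and the qrels and rankings are made-up examples rather than real data.

```python
from typing import Dict, Iterable, List, NamedTuple


class QrelSketch(NamedTuple):
    """Hypothetical stand-in mirroring the documented TrecQrel fields."""
    query_id: str
    doc_id: str
    relevance: int
    iteration: str


def mean_reciprocal_rank(
    qrels: Iterable[QrelSketch],
    rankings: Dict[str, List[str]],
) -> float:
    """Average 1/rank of the target document; a missing target contributes 0."""
    # Assumes one relevant document per query; a later qrel would overwrite an earlier one.
    targets = {q.query_id: q.doc_id for q in qrels if q.relevance > 0}
    if not targets:
        return 0.0
    total = 0.0
    for query_id, doc_id in targets.items():
        ranked = rankings.get(query_id, [])
        if doc_id in ranked:
            total += 1.0 / (ranked.index(doc_id) + 1)
    return total / len(targets)


# Made-up judgments and system output.
sample_qrels = [QrelSketch("q1", "d3", 1, "0"), QrelSketch("q2", "d7", 1, "0")]
sample_rankings = {"q1": ["d1", "d3", "d9"], "q2": ["d7"]}

mrr = mean_reciprocal_rank(sample_qrels, sample_rankings)
```

Here the target for `q1` sits at rank 2 and the target for `q2` at rank 1, so the two reciprocal ranks are 0.5 and 1.0.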

Examples:

import ir_datasets
dataset = ir_datasets.load("trec-tot/2023/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export trec-tot/2023/dev qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.
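The exported judgments can be read back with the standard library. The following is a sketch that assumes the TSV export contains exactly the four columns shown above, tab-separated and without a header row; that layout is an assumption and should be checked against the real output. The sample text is made up.

```python
import csv
import io
from typing import List, TextIO, Tuple

QrelRow = Tuple[str, str, int, str]


def parse_qrels_tsv(stream: TextIO) -> List[QrelRow]:
    """Parse tab-separated qrels lines into (query_id, doc_id, relevance, iteration)."""
    rows: List[QrelRow] = []
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 4:
            # Skip blank or malformed lines instead of guessing their meaning.
            continue
        query_id, doc_id, relevance, iteration = row
        rows.append((query_id, doc_id, int(relevance), iteration))
    return rows


# Made-up sample standing in for the exported output.
sample_tsv = "q1\td3\t1\t0\nq2\td7\t1\t0\n"
parsed = parse_qrels_tsv(io.StringIO(sample_tsv))
```

In practice the stream would be an open file holding the redirected output of the export command rather than an in-memory string.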

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2023/dev')
index_ref = pt.IndexRef.of('./indices/trec-tot_2023') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),  # use the query text, not the source URL, as the query
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.trec-tot.2023.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the relevance assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

{
  "docs": {
    "count": 231852,
    "fields": {
      "doc_id": {
        "max_len": 8,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 150
  },
  "qrels": {
    "count": 150,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 150
        }
      }
    }
  }
}
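The metadata block above can be checked programmatically. The sketch below parses a copy of that JSON with the standard library and derives the number of judgments per query; the function name and the keys of the returned summary are illustrative choices, not part of ir_datasets.

```python
import json

# Copy of the metadata shown above for trec-tot/2023/dev.
METADATA_JSON = """
{
  "docs": {"count": 231852, "fields": {"doc_id": {"max_len": 8, "common_prefix": ""}}},
  "queries": {"count": 150},
  "qrels": {"count": 150, "fields": {"relevance": {"counts_by_value": {"1": 150}}}}
}
"""


def summarize_metadata(raw: str) -> dict:
    """Derive a few sanity-check figures from the metadata JSON."""
    meta = json.loads(raw)
    n_queries = meta["queries"]["count"]
    n_qrels = meta["qrels"]["count"]
    counts_by_value = meta["qrels"]["fields"]["relevance"]["counts_by_value"]
    return {
        "docs": meta["docs"]["count"],
        "queries": n_queries,
        "qrels_per_query": n_qrels / n_queries,
        # True when the only relevance label present is "1".
        "all_relevant": set(counts_by_value) == {"1"},
    }


summary = summarize_metadata(METADATA_JSON)
```

A ratio of one judgment per query, with only relevance 1 present, matches the relevance table for this subset.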

`"trec-tot/2023/train"`

Train query set for TREC 2023 tip-of-the-tongue search track.

150 queries

Language: en

Query type:

TipOfTheTongueQuery: (namedtuple)

query_id: str
url: str
domain: str
title: str
text: str
sentence_annotations: List[Dict[str,str]]

Examples:

import ir_datasets
dataset = ir_datasets.load("trec-tot/2023/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, url, domain, title, text, sentence_annotations>

You can find more details about the Python API here.

CLI

ir_datasets export trec-tot/2023/train queries



[query_id]    [url]    [domain]    [title]    [text]    [sentence_annotations]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2023/train')
index_ref = pt.IndexRef.of('./indices/trec-tot_2023') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))  # use the query text, not the source URL, as the query

You can find more details about PyTerrier retrieval here.

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.trec-tot.2023.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

232K docs

Inherits docs from trec-tot/2023

Language: en

Document type:

TipOfTheTongueDoc: (namedtuple)

doc_id: str
page_title: str
wikidata_id: str
wikidata_classes: List[str]
text: str
sections: Dict[str,str]
infoboxes: List[Dict[str,str]]

Examples:

import ir_datasets
dataset = ir_datasets.load("trec-tot/2023/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, page_title, wikidata_id, wikidata_classes, text, sections, infoboxes>

You can find more details about the Python API here.

CLI

ir_datasets export trec-tot/2023/train docs



[doc_id]    [page_title]    [wikidata_id]    [wikidata_classes]    [text]    [sections]    [infoboxes]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2023/train')
# Index trec-tot/2023
indexer = pt.IterDictIndexer('./indices/trec-tot_2023')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['page_title', 'wikidata_id', 'text'])

You can find more details about PyTerrier indexing here.

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.trec-tot.2023.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

150 qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
0	Not Relevant	`0`	0.0%
1	Relevant	`150`	100.0%

Examples:

import ir_datasets
dataset = ir_datasets.load("trec-tot/2023/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export trec-tot/2023/train qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2023/train')
index_ref = pt.IndexRef.of('./indices/trec-tot_2023') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),  # use the query text, not the source URL, as the query
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.trec-tot.2023.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the relevance assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

{
  "docs": {
    "count": 231852,
    "fields": {
      "doc_id": {
        "max_len": 8,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 150
  },
  "qrels": {
    "count": 150,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 150
        }
      }
    }
  }
}

`"trec-tot/2024"`

Corpus for the TREC 2024 tip-of-the-tongue search track.

docs

3.2M docs

Language: en

Document type:

TipOfTheTongueDoc2024: (namedtuple)

doc_id: str
title: str
wikidata_id: str
text: str
sections: Dict[str,str]
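For custom indexing code, a record with these fields can be turned into a flat dictionary. The sketch below shows one way to do that, using the `docno` key that PyTerrier's `IterDictIndexer` expects as the document identifier. `TipOfTheTongueDoc2024Sketch` is a hypothetical stand-in mirroring the field list above, the sample values are made up, and dropping `sections` is an illustrative choice.

```python
from typing import Dict, NamedTuple


class TipOfTheTongueDoc2024Sketch(NamedTuple):
    """Hypothetical stand-in mirroring the documented 2024 document fields."""
    doc_id: str
    title: str
    wikidata_id: str
    text: str
    sections: Dict[str, str]


def to_index_record(doc: TipOfTheTongueDoc2024Sketch) -> Dict[str, str]:
    """Flatten a document into string fields, keyed by 'docno' for the identifier."""
    return {
        "docno": doc.doc_id,
        "title": doc.title,
        "wikidata_id": doc.wikidata_id,
        "text": doc.text,
    }


# Made-up sample record.
example_doc = TipOfTheTongueDoc2024Sketch(
    doc_id="42",
    title="Example page",
    wikidata_id="Q0",
    text="Example page body.",
    sections={"Abstract": "Example page body."},
)

record = to_index_record(example_doc)
```

When indexing through `dataset.get_corpus_iter()`, this conversion should not be needed; the sketch is only meant for pipelines that read `docs_iter()` directly.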

Examples:

import ir_datasets
dataset = ir_datasets.load("trec-tot/2024")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, wikidata_id, text, sections>

You can find more details about the Python API here.

CLI

ir_datasets export trec-tot/2024 docs



[doc_id]    [title]    [wikidata_id]    [text]    [sections]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2024')
# Index trec-tot/2024
indexer = pt.IterDictIndexer('./indices/trec-tot_2024')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'wikidata_id', 'text'])

You can find more details about PyTerrier indexing here.

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.trec-tot.2024')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

{
  "docs": {
    "count": 3185450,
    "fields": {
      "doc_id": {
        "max_len": 8,
        "common_prefix": ""
      }
    }
  }
}

`"trec-tot/2024/test"`

Test query set for TREC 2024 tip-of-the-tongue search track.

600 queries

Language: en

Query type:

TipOfTheTongueQuery2024: (namedtuple)

query_id: str
query: str

Examples:

import ir_datasets
dataset = ir_datasets.load("trec-tot/2024/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, query>

You can find more details about the Python API here.

CLI

ir_datasets export trec-tot/2024/test queries



[query_id]    [query]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2024/test')
index_ref = pt.IndexRef.of('./indices/trec-tot_2024') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.trec-tot.2024.test.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

3.2M docs

Inherits docs from trec-tot/2024

Language: en

Document type:

TipOfTheTongueDoc2024: (namedtuple)

doc_id: str
title: str
wikidata_id: str
text: str
sections: Dict[str,str]

Examples:

import ir_datasets
dataset = ir_datasets.load("trec-tot/2024/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, wikidata_id, text, sections>

You can find more details about the Python API here.

CLI

ir_datasets export trec-tot/2024/test docs



[doc_id]    [title]    [wikidata_id]    [text]    [sections]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2024/test')
# Index trec-tot/2024
indexer = pt.IterDictIndexer('./indices/trec-tot_2024')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'wikidata_id', 'text'])

You can find more details about PyTerrier indexing here.

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.trec-tot.2024.test')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

{
  "docs": {
    "count": 3185450,
    "fields": {
      "doc_id": {
        "max_len": 8,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 600
  }
}

`"trec-tot/2025"`

(no description provided)

docs

6.4M docs

Language: en

Document type:

TrecToT2025Doc: (namedtuple)

doc_id: str
title: str
url: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("trec-tot/2025")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text>

You can find more details about the Python API here.

CLI

ir_datasets export trec-tot/2025 docs



[doc_id]    [title]    [url]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2025')
# Index trec-tot/2025
indexer = pt.IterDictIndexer('./indices/trec-tot_2025')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'url', 'text'])

You can find more details about PyTerrier indexing here.

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.trec-tot.2025')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

{
  "docs": {
    "count": 6407814,
    "fields": {
      "doc_id": {
        "max_len": 8,
        "common_prefix": ""
      }
    }
  }
}

`"trec-tot/2025/dev1"`

(no description provided)

142 queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str
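Tip-of-the-tongue queries tend to be long free-text descriptions, and punctuation in such text can trip up some query parsers. As a hedged sketch of one common workaround, the helper below replaces punctuation with spaces and collapses whitespace before the text is passed to a retriever. The function name and the sample query are made up, and whether this step is needed depends on the retrieval system in use.

```python
import re


def sanitize_query(text: str) -> str:
    """Replace non-word, non-space characters with spaces and collapse whitespace."""
    without_punctuation = re.sub(r"[^\w\s]", " ", text)
    return " ".join(without_punctuation.split())


# Made-up query text for illustration.
raw_query = "What's that movie? It had a dog/cat..."
clean_query = sanitize_query(raw_query)
```

Note that this also splits contractions such as "What's" into two tokens, which may or may not be desirable for a given tokenizer.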

Examples:

import ir_datasets
dataset = ir_datasets.load("trec-tot/2025/dev1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export trec-tot/2025/dev1 queries



[query_id]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2025/dev1')
index_ref = pt.IndexRef.of('./indices/trec-tot_2025_dev1') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.trec-tot.2025.dev1.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

6.4M docs

Language: en

Document type:

TrecToT2025Doc: (namedtuple)

doc_id: str
title: str
url: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("trec-tot/2025/dev1")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text>

You can find more details about the Python API here.

CLI

ir_datasets export trec-tot/2025/dev1 docs



[doc_id]    [title]    [url]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2025/dev1')
# Index trec-tot/2025/dev1
indexer = pt.IterDictIndexer('./indices/trec-tot_2025_dev1')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'url', 'text'])

You can find more details about PyTerrier indexing here.

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.trec-tot.2025.dev1')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

142 qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
0	Not Relevant	`0`	0.0%
1	Relevant	`142`	100.0%

Examples:

import ir_datasets
dataset = ir_datasets.load("trec-tot/2025/dev1")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export trec-tot/2025/dev1 qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2025/dev1')
index_ref = pt.IndexRef.of('./indices/trec-tot_2025_dev1') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.trec-tot.2025.dev1.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the relevance assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

{
  "docs": {
    "count": 6407814,
    "fields": {
      "doc_id": {
        "max_len": 8,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 142
  },
  "qrels": {
    "count": 142,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 142
        }
      }
    }
  }
}

`"trec-tot/2025/dev2"`

(no description provided)

143 queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("trec-tot/2025/dev2")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export trec-tot/2025/dev2 queries



[query_id]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2025/dev2')
index_ref = pt.IndexRef.of('./indices/trec-tot_2025_dev2') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.trec-tot.2025.dev2.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

6.4M docs

Language: en

Document type:

TrecToT2025Doc: (namedtuple)

doc_id: str
title: str
url: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("trec-tot/2025/dev2")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text>

You can find more details about the Python API here.

CLI

ir_datasets export trec-tot/2025/dev2 docs



[doc_id]    [title]    [url]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2025/dev2')
# Index trec-tot/2025/dev2
indexer = pt.IterDictIndexer('./indices/trec-tot_2025_dev2')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'url', 'text'])

You can find more details about PyTerrier indexing here.

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.trec-tot.2025.dev2')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

143 qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
0	Not Relevant	`0`	0.0%
1	Relevant	`143`	100.0%

Examples:

import ir_datasets
dataset = ir_datasets.load("trec-tot/2025/dev2")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export trec-tot/2025/dev2 qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2025/dev2')
index_ref = pt.IndexRef.of('./indices/trec-tot_2025_dev2') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.trec-tot.2025.dev2.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the relevance assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

{
  "docs": {
    "count": 6407814,
    "fields": {
      "doc_id": {
        "max_len": 8,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 143
  },
  "qrels": {
    "count": 143,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 143
        }
      }
    }
  }
}

`"trec-tot/2025/dev3"`

(no description provided)

536 queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("trec-tot/2025/dev3")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export trec-tot/2025/dev3 queries



[query_id]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2025/dev3')
index_ref = pt.IndexRef.of('./indices/trec-tot_2025_dev3') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.trec-tot.2025.dev3.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

6.4M docs

Language: en

Document type:

TrecToT2025Doc: (namedtuple)

doc_id: str
title: str
url: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("trec-tot/2025/dev3")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text>

You can find more details about the Python API here.

CLI

ir_datasets export trec-tot/2025/dev3 docs



[doc_id]    [title]    [url]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2025/dev3')
# Index trec-tot/2025/dev3
indexer = pt.IterDictIndexer('./indices/trec-tot_2025_dev3')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'url', 'text'])

You can find more details about PyTerrier indexing here.

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.trec-tot.2025.dev3')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

536 qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
0	Not Relevant	`0`	0.0%
1	Relevant	`536`	100.0%

Examples:

import ir_datasets
dataset = ir_datasets.load("trec-tot/2025/dev3")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export trec-tot/2025/dev3 qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2025/dev3')
index_ref = pt.IndexRef.of('./indices/trec-tot_2025_dev3') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.trec-tot.2025.dev3.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the relevance assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

{
  "docs": {
    "count": 6407814,
    "fields": {
      "doc_id": {
        "max_len": 8,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 536
  },
  "qrels": {
    "count": 536,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 536
        }
      }
    }
  }
}

`"trec-tot/2025/test"`

(no description provided)

622 queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("trec-tot/2025/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export trec-tot/2025/test queries



[query_id]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2025/test')
index_ref = pt.IndexRef.of('./indices/trec-tot_2025_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.trec-tot.2025.test.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

6.4M docs

Language: en

Document type:

TrecToT2025Doc: (namedtuple)

doc_id: str
title: str
url: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("trec-tot/2025/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text>

You can find more details about the Python API here.

CLI

ir_datasets export trec-tot/2025/test docs



[doc_id]    [title]    [url]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2025/test')
# Index trec-tot/2025/test
indexer = pt.IterDictIndexer('./indices/trec-tot_2025_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'url', 'text'])

You can find more details about PyTerrier indexing here.

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.trec-tot.2025.test')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

{
  "docs": {
    "count": 6407814,
    "fields": {
      "doc_id": {
        "max_len": 8,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 622
  }
}

`"trec-tot/2025/train"`

(no description provided)

143 queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("trec-tot/2025/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export trec-tot/2025/train queries



[query_id]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2025/train')
index_ref = pt.IndexRef.of('./indices/trec-tot_2025_train') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.trec-tot.2025.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

6.4M docs

Language: en

Document type:

TrecToT2025Doc: (namedtuple)

doc_id: str
title: str
url: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("trec-tot/2025/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text>

You can find more details about the Python API here.

CLI

ir_datasets export trec-tot/2025/train docs



[doc_id]    [title]    [url]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2025/train')
# Index trec-tot/2025/train
indexer = pt.IterDictIndexer('./indices/trec-tot_2025_train')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'url', 'text'])

You can find more details about PyTerrier indexing here.

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.trec-tot.2025.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

143 qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
0	Not Relevant	`0`	0.0%
1	Relevant	`143`	100.0%

Examples:

import ir_datasets
dataset = ir_datasets.load("trec-tot/2025/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export trec-tot/2025/train qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2025/train')
index_ref = pt.IndexRef.of('./indices/trec-tot_2025_train') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.trec-tot.2025.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the relevance assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.