ir_datasets: TREC Tip-of-the-Tongue
Tip of the tongue: the phenomenon of failing to retrieve something from memory, combined with partial recall and the feeling that retrieval is imminent. More details are available on the official page for the TREC Tip-of-the-Tongue (ToT) Track.
Corpus for the TREC 2023 tip-of-the-tongue search track.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-tot/2023")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, page_title, wikidata_id, wikidata_classes, text, sections, infoboxes>
You can find more details about the Python API here.
ir_datasets export trec-tot/2023 docs
[doc_id] [page_title] [wikidata_id] [wikidata_classes] [text] [sections] [infoboxes]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2023')
# Index trec-tot/2023
indexer = pt.IterDictIndexer('./indices/trec-tot_2023')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['page_title', 'wikidata_id', 'text'])
You can find more details about PyTerrier indexing here.
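The indexing call above relies on `get_corpus_iter()` to yield one dictionary per document. As a rough illustration of that shape, the sketch below converts a document namedtuple into a dictionary keyed by `docno`, the identifier field that PyTerrier's iter-dict indexers expect. The `Doc` type, its sample values, and the helper `doc_to_dict` are hypothetical stand-ins, not part of ir_datasets or PyTerrier.

```python
from collections import namedtuple

# Hypothetical stand-in for a trec-tot/2023 document; only a subset of the
# real fields (doc_id, page_title, wikidata_id, text) is modelled here.
Doc = namedtuple("Doc", ["doc_id", "page_title", "wikidata_id", "text"])


def doc_to_dict(doc, fields=("page_title", "wikidata_id", "text")):
    """Return a dict with a 'docno' key plus the selected text fields."""
    record = {"docno": doc.doc_id}
    for name in fields:
        # Coerce to str so every selected field is plain text.
        record[name] = str(getattr(doc, name))
    return record


sample = Doc("42", "Example page", "Q0", "Example body text.")  # made-up values
print(doc_to_dict(sample))
```

A generator expression such as `(doc_to_dict(d) for d in docs)` could then feed an indexer in place of the built-in corpus iterator, for instance when fields need to be filtered or combined first.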
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.trec-tot.2023')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{
"docs": {
"count": 231852,
"fields": {
"doc_id": {
"max_len": 8,
"common_prefix": ""
}
}
}
}
Dev query set for TREC 2023 tip-of-the-tongue search track.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-tot/2023/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, url, domain, title, text, sentence_annotations>
You can find more details about the Python API here.
ir_datasets export trec-tot/2023/dev queries
[query_id] [url] [domain] [title] [text] [sentence_annotations]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2023/dev')
index_ref = pt.IndexRef.of('./indices/trec-tot_2023') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('url'))
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.trec-tot.2023.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from trec-tot/2023
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-tot/2023/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, page_title, wikidata_id, wikidata_classes, text, sections, infoboxes>
You can find more details about the Python API here.
ir_datasets export trec-tot/2023/dev docs
[doc_id] [page_title] [wikidata_id] [wikidata_classes] [text] [sections] [infoboxes]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2023/dev')
# Index trec-tot/2023
indexer = pt.IterDictIndexer('./indices/trec-tot_2023')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['page_title', 'wikidata_id', 'text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.trec-tot.2023.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 0 | 0.0% |
| 1 | Relevant | 150 | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-tot/2023/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export trec-tot/2023/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
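The exported qrels can be read back with the standard library. The sketch below assumes each line holds the four columns shown above, separated by tabs, and uses made-up sample rows; `parse_qrels_tsv` is a hypothetical helper, not part of ir_datasets.

```python
import csv
import io
from collections import defaultdict


def parse_qrels_tsv(handle):
    """Map query_id -> {doc_id: relevance} from tab-separated qrels rows."""
    qrels = defaultdict(dict)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        query_id, doc_id, relevance = row[0], row[1], int(row[2])
        qrels[query_id][doc_id] = relevance
    return dict(qrels)


# Made-up rows in the assumed column order:
# query_id, doc_id, relevance, iteration
sample = io.StringIO("q1\t100\t1\t0\nq2\t200\t1\t0\n")
qrels = parse_qrels_tsv(sample)
print(qrels)
```

In practice the handle could be a file produced by redirecting the export command's output.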
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2023/dev')
index_ref = pt.IndexRef.of('./indices/trec-tot_2023') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics('url'),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.trec-tot.2023.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 231852,
"fields": {
"doc_id": {
"max_len": 8,
"common_prefix": ""
}
}
},
"queries": {
"count": 150
},
"qrels": {
"count": 150,
"fields": {
"relevance": {
"counts_by_value": {
"1": 150
}
}
}
}
}
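The statistics above list 150 queries and 150 qrels, all with relevance 1, which is consistent with a single target document per query. Under that assumption, a known-item measure such as mean reciprocal rank is easy to compute by hand. The sketch below uses made-up query and document IDs; `mean_reciprocal_rank` is a hypothetical helper, offered alongside the measures in the PyTerrier example rather than as a replacement for them.

```python
def mean_reciprocal_rank(rankings, targets):
    """Average 1/rank of each query's target document (0 if not retrieved).

    rankings: query_id -> list of doc_ids, best first
    targets:  query_id -> the single relevant doc_id
    """
    if not targets:
        return 0.0
    total = 0.0
    for query_id, target in targets.items():
        ranked = rankings.get(query_id, [])
        if target in ranked:
            total += 1.0 / (ranked.index(target) + 1)
    return total / len(targets)


# Made-up example: q1 finds its target at rank 2, q2 misses it entirely.
rankings = {"q1": ["d9", "d1", "d5"], "q2": ["d3", "d4"]}
targets = {"q1": "d1", "q2": "d7"}
print(mean_reciprocal_rank(rankings, targets))  # 0.25
```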
Train query set for TREC 2023 tip-of-the-tongue search track.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-tot/2023/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, url, domain, title, text, sentence_annotations>
You can find more details about the Python API here.
ir_datasets export trec-tot/2023/train queries
[query_id] [url] [domain] [title] [text] [sentence_annotations]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2023/train')
index_ref = pt.IndexRef.of('./indices/trec-tot_2023') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('url'))
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.trec-tot.2023.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from trec-tot/2023
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-tot/2023/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, page_title, wikidata_id, wikidata_classes, text, sections, infoboxes>
You can find more details about the Python API here.
ir_datasets export trec-tot/2023/train docs
[doc_id] [page_title] [wikidata_id] [wikidata_classes] [text] [sections] [infoboxes]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2023/train')
# Index trec-tot/2023
indexer = pt.IterDictIndexer('./indices/trec-tot_2023')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['page_title', 'wikidata_id', 'text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.trec-tot.2023.train')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 0 | 0.0% |
| 1 | Relevant | 150 | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-tot/2023/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export trec-tot/2023/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2023/train')
index_ref = pt.IndexRef.of('./indices/trec-tot_2023') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics('url'),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.trec-tot.2023.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 231852,
"fields": {
"doc_id": {
"max_len": 8,
"common_prefix": ""
}
}
},
"queries": {
"count": 150
},
"qrels": {
"count": 150,
"fields": {
"relevance": {
"counts_by_value": {
"1": 150
}
}
}
}
}
Corpus for the TREC 2024 tip-of-the-tongue search track.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-tot/2024")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, wikidata_id, text, sections>
You can find more details about the Python API here.
ir_datasets export trec-tot/2024 docs
[doc_id] [title] [wikidata_id] [text] [sections]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2024')
# Index trec-tot/2024
indexer = pt.IterDictIndexer('./indices/trec-tot_2024')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'wikidata_id', 'text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.trec-tot.2024')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{
"docs": {
"count": 3185450,
"fields": {
"doc_id": {
"max_len": 8,
"common_prefix": ""
}
}
}
}
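At roughly 3.2 million documents, this corpus may be more comfortable to process in fixed-size batches than one document at a time or all at once. The sketch below shows a generic batching helper; `batched_docs` is a hypothetical name, and the integers stand in for the documents that `docs_iter()` would yield.

```python
from itertools import islice


def batched_docs(docs, batch_size):
    """Yield lists of up to batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(docs)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Integers stand in for documents here.
batches = list(batched_docs(range(7), 3))
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```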
Test query set for TREC 2024 tip-of-the-tongue search track.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-tot/2024/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, query>
You can find more details about the Python API here.
ir_datasets export trec-tot/2024/test queries
[query_id] [query]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2024/test')
index_ref = pt.IndexRef.of('./indices/trec-tot_2024') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.trec-tot.2024.test.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from trec-tot/2024
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-tot/2024/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, wikidata_id, text, sections>
You can find more details about the Python API here.
ir_datasets export trec-tot/2024/test docs
[doc_id] [title] [wikidata_id] [text] [sections]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2024/test')
# Index trec-tot/2024
indexer = pt.IterDictIndexer('./indices/trec-tot_2024')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'wikidata_id', 'text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.trec-tot.2024.test')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{
"docs": {
"count": 3185450,
"fields": {
"doc_id": {
"max_len": 8,
"common_prefix": ""
}
}
},
"queries": {
"count": 600
}
}
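The statistics above list queries but no qrels for this subset, so rankings cannot be scored with the data shown here. One common way to hand rankings to an external evaluation is the standard six-column TREC run format (`query_id Q0 doc_id rank score tag`). The sketch below writes that format from made-up results; `format_trec_run` and the tag `my_run` are hypothetical, and the track's actual submission requirements are not covered by this page.

```python
def format_trec_run(results, tag):
    """Render query_id -> [(doc_id, score), ...] as TREC run lines."""
    lines = []
    for query_id, scored_docs in results.items():
        # Sort by descending score so rank 1 is the best match.
        ordered = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)
        for rank, (doc_id, score) in enumerate(ordered, start=1):
            lines.append(f"{query_id} Q0 {doc_id} {rank} {score:.4f} {tag}")
    return lines


# Made-up scores for a single made-up query.
results = {"q1": [("d2", 1.5), ("d1", 3.25)]}
for line in format_trec_run(results, "my_run"):
    print(line)
```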
(no description provided)
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-tot/2025")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text>
You can find more details about the Python API here.
ir_datasets export trec-tot/2025 docs
[doc_id] [title] [url] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2025')
# Index trec-tot/2025
indexer = pt.IterDictIndexer('./indices/trec-tot_2025')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'url', 'text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.trec-tot.2025')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{
"docs": {
"count": 6407814,
"fields": {
"doc_id": {
"max_len": 8,
"common_prefix": ""
}
}
}
}
(no description provided)
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-tot/2025/dev1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export trec-tot/2025/dev1 queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2025/dev1')
index_ref = pt.IndexRef.of('./indices/trec-tot_2025_dev1') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.trec-tot.2025.dev1.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-tot/2025/dev1")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text>
You can find more details about the Python API here.
ir_datasets export trec-tot/2025/dev1 docs
[doc_id] [title] [url] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2025/dev1')
# Index trec-tot/2025/dev1
indexer = pt.IterDictIndexer('./indices/trec-tot_2025_dev1')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'url', 'text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.trec-tot.2025.dev1')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 0 | 0.0% |
| 1 | Relevant | 142 | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-tot/2025/dev1")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export trec-tot/2025/dev1 qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2025/dev1')
index_ref = pt.IndexRef.of('./indices/trec-tot_2025_dev1') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics(),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.trec-tot.2025.dev1.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 6407814,
"fields": {
"doc_id": {
"max_len": 8,
"common_prefix": ""
}
}
},
"queries": {
"count": 142
},
"qrels": {
"count": 142,
"fields": {
"relevance": {
"counts_by_value": {
"1": 142
}
}
}
}
}
(no description provided)
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-tot/2025/dev2")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export trec-tot/2025/dev2 queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2025/dev2')
index_ref = pt.IndexRef.of('./indices/trec-tot_2025_dev2') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.trec-tot.2025.dev2.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-tot/2025/dev2")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text>
You can find more details about the Python API here.
ir_datasets export trec-tot/2025/dev2 docs
[doc_id] [title] [url] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2025/dev2')
# Index trec-tot/2025/dev2
indexer = pt.IterDictIndexer('./indices/trec-tot_2025_dev2')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'url', 'text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.trec-tot.2025.dev2')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 0 | 0.0% |
| 1 | Relevant | 143 | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-tot/2025/dev2")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export trec-tot/2025/dev2 qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2025/dev2')
index_ref = pt.IndexRef.of('./indices/trec-tot_2025_dev2') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics(),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.trec-tot.2025.dev2.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 6407814,
"fields": {
"doc_id": {
"max_len": 8,
"common_prefix": ""
}
}
},
"queries": {
"count": 143
},
"qrels": {
"count": 143,
"fields": {
"relevance": {
"counts_by_value": {
"1": 143
}
}
}
}
}
(no description provided)
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-tot/2025/dev3")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export trec-tot/2025/dev3 queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2025/dev3')
index_ref = pt.IndexRef.of('./indices/trec-tot_2025_dev3') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.trec-tot.2025.dev3.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-tot/2025/dev3")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text>
You can find more details about the Python API here.
ir_datasets export trec-tot/2025/dev3 docs
[doc_id] [title] [url] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2025/dev3')
# Index trec-tot/2025/dev3
indexer = pt.IterDictIndexer('./indices/trec-tot_2025_dev3')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'url', 'text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.trec-tot.2025.dev3')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 0 | 0.0% |
| 1 | Relevant | 536 | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-tot/2025/dev3")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export trec-tot/2025/dev3 qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2025/dev3')
index_ref = pt.IndexRef.of('./indices/trec-tot_2025_dev3') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics(),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.trec-tot.2025.dev3.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 6407814,
"fields": {
"doc_id": {
"max_len": 8,
"common_prefix": ""
}
}
},
"queries": {
"count": 536
},
"qrels": {
"count": 536,
"fields": {
"relevance": {
"counts_by_value": {
"1": 536
}
}
}
}
}
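The three 2025 development subsets list 142, 143, and 536 qrels respectively. If they are pooled, for example into one larger validation set, it may be worth checking that no query ID appears in more than one subset. The sketch below does this for any iterables of qrel-like records with `query_id`, `doc_id`, and `relevance` attributes; the `Qrel` type and sample values are hypothetical stand-ins, and whether pooling is appropriate is left to the reader.

```python
from collections import namedtuple

# Hypothetical stand-in for the qrel records yielded by qrels_iter().
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def pool_qrels(*subsets):
    """Merge qrel iterables into query_id -> {doc_id: relevance}.

    Raises ValueError if a query_id occurs in more than one subset.
    """
    pooled = {}
    for subset in subsets:
        seen_here = {}
        for qrel in subset:
            seen_here.setdefault(qrel.query_id, {})[qrel.doc_id] = qrel.relevance
        overlap = pooled.keys() & seen_here.keys()
        if overlap:
            raise ValueError(f"query IDs in several subsets: {sorted(overlap)}")
        pooled.update(seen_here)
    return pooled


# Made-up records standing in for two subsets.
dev_a = [Qrel("q1", "d1", 1, "0")]
dev_b = [Qrel("q2", "d2", 1, "0")]
pooled = pool_qrels(dev_a, dev_b)
print(len(pooled))  # 2
```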
(no description provided)
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-tot/2025/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export trec-tot/2025/test queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2025/test')
index_ref = pt.IndexRef.of('./indices/trec-tot_2025_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.trec-tot.2025.test.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-tot/2025/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text>
You can find more details about the Python API here.
ir_datasets export trec-tot/2025/test docs
[doc_id] [title] [url] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2025/test')
# Index trec-tot/2025/test
indexer = pt.IterDictIndexer('./indices/trec-tot_2025_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'url', 'text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.trec-tot.2025.test')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{
"docs": {
"count": 6407814,
"fields": {
"doc_id": {
"max_len": 8,
"common_prefix": ""
}
}
},
"queries": {
"count": 622
}
}
(no description provided)
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-tot/2025/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export trec-tot/2025/train queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2025/train')
index_ref = pt.IndexRef.of('./indices/trec-tot_2025_train') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.trec-tot.2025.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-tot/2025/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text>
You can find more details about the Python API here.
ir_datasets export trec-tot/2025/train docs
[doc_id] [title] [url] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2025/train')
# Index trec-tot/2025/train
indexer = pt.IterDictIndexer('./indices/trec-tot_2025_train')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'url', 'text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.trec-tot.2025.train')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 0 | 0.0% |
| 1 | Relevant | 143 | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-tot/2025/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export trec-tot/2025/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:trec-tot/2025/train')
index_ref = pt.IndexRef.of('./indices/trec-tot_2025_train') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics(),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.trec-tot.2025.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 6407814,
"fields": {
"doc_id": {
"max_len": 8,
"common_prefix": ""
}
}
},
"queries": {
"count": 143
},
"qrels": {
"count": 143,
"fields": {
"relevance": {
"counts_by_value": {
"1": 143
}
}
}
}
}