ir_datasets: BRIGHT (benchmark suite)
BRIGHT is a retrieval benchmark in which finding relevant documents requires reasoning rather than surface-level lexical or semantic matching. It spans 12 diverse domains drawn from sources such as StackExchange, coding problems (LeetCode), and math competitions (AoPS, TheoremQA).
Each domain is available as a separate subset. The base subsets use short documents; the -long subsets provide the original long-form documents with relevance judgments mapped accordingly. Queries include the original reasoning rationale, gold answer, and LLM-generated reasoning fields (from Gemini, Claude 3 Opus, GPT-4, GRIT, and Llama3-70B).
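The naming scheme for base and -long subsets can be sketched as a small helper. This is an illustration only: bright_dataset_id is a hypothetical function, not part of ir_datasets, and a given domain is not guaranteed to have a -long variant.

```python
def bright_dataset_id(domain: str, long_docs: bool = False) -> str:
    """Build an ir_datasets identifier for a BRIGHT subset (hypothetical helper)."""
    # Base subsets are named bright/<domain>; long-document variants add "-long".
    suffix = "-long" if long_docs else ""
    return f"bright/{domain}{suffix}"


short_id = bright_dataset_id("biology")
long_id = bright_dataset_id("biology", long_docs=True)
print(short_id, long_id)
# With ir_datasets installed, either ID can be passed to ir_datasets.load().
```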
Bibtex:
@inproceedings{DBLP:conf/iclr/SuYXSMWLSST0YA025,
  author = {Hongjin Su and Howard Yen and Mengzhou Xia and Weijia Shi and Niklas Muennighoff and Han{-}yu Wang and Haisu Liu and Quan Shi and Zachary S. Siegel and Michael Tang and Ruoxi Sun and Jinsung Yoon and Sercan {\"{O}}. Arik and Danqi Chen and Tao Yu},
  title = {{BRIGHT:} {A} Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval},
  booktitle = {The Thirteenth International Conference on Learning Representations, {ICLR} 2025, Singapore, April 24-28, 2025},
  publisher = {OpenReview.net},
  year = {2025},
  url = {https://openreview.net/forum?id=ykuc5q381b},
  timestamp = {Thu, 15 May 2025 17:19:05 +0200},
  biburl = {https://dblp.org/rec/conf/iclr/SuYXSMWLSST0YA025.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
Math competition problems from the Art of Problem Solving (AoPS) forum, where relevance requires recognizing shared problem-solving techniques.
Official evaluation measures: nDCG@10
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/aops")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, reasoning, gold_answer, gemini_1_0_reason, claude_3_opus_reason, gpt4_reason, grit_reason, llama3_70b_reason>
You can find more details about the Python API here.
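Because each query carries several reasoning fields alongside the original text, a retrieval run can use any of them as the query string. A minimal sketch, assuming a stand-in namedtuple with the documented field names; query_text and its fall-back to the original text when a field is empty are hypothetical choices, and the sample values are made up.

```python
from collections import namedtuple

# Stand-in with the same field names as the documented query namedtuple.
BrightQuery = namedtuple(
    "BrightQuery",
    [
        "query_id", "text", "reasoning", "gold_answer",
        "gemini_1_0_reason", "claude_3_opus_reason", "gpt4_reason",
        "grit_reason", "llama3_70b_reason",
    ],
)


def query_text(query, field="text"):
    """Return the chosen field, falling back to the original text if it is empty."""
    value = getattr(query, field)
    return value if value else query.text


# Made-up example record, for illustration only.
example = BrightQuery(
    query_id="0",
    text="original question",
    reasoning="why the gold documents are relevant",
    gold_answer="answer",
    gemini_1_0_reason="",
    claude_3_opus_reason="",
    gpt4_reason="step-by-step reasoning",
    grit_reason="",
    llama3_70b_reason="",
)

print(query_text(example, "gpt4_reason"))
print(query_text(example, "grit_reason"))
```

With ir_datasets installed, the same function could be applied to each item yielded by dataset.queries_iter().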
ir_datasets export bright/aops queries
[query_id] [text] [reasoning] [gold_answer] [gemini_1_0_reason] [claude_3_opus_reason] [gpt4_reason] [grit_reason] [llama3_70b_reason]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/aops')
index_ref = pt.IndexRef.of('./indices/bright_aops') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.bright.aops.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/aops")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export bright/aops docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/aops')
# Index bright/aops
indexer = pt.IterDictIndexer('./indices/bright_aops', meta={"docno": 62})
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.bright.aops')
for doc in dataset.iter_documents():
    print(doc) # one document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| -100 | Excluded from evaluation | 623K | 99.9% |
| 1 | Relevant | 524 | 0.1% |
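The -100 level marks documents that are excluded from evaluation rather than judged non-relevant. A minimal sketch of one way to honor that, assuming excluded documents should be dropped from a ranking before scoring (the official BRIGHT evaluation code may handle this differently). The Qrel stand-in mirrors the documented qrel fields; split_qrels, drop_excluded, and the sample IDs are hypothetical.

```python
from collections import namedtuple

# Stand-in with the same field names as the documented qrel namedtuple.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])

EXCLUDED = -100  # relevance level meaning "excluded from evaluation"


def split_qrels(qrels):
    """Separate relevant and excluded doc IDs, grouped by query ID."""
    relevant, excluded = {}, {}
    for qrel in qrels:
        if qrel.relevance == EXCLUDED:
            excluded.setdefault(qrel.query_id, set()).add(qrel.doc_id)
        elif qrel.relevance > 0:
            relevant.setdefault(qrel.query_id, set()).add(qrel.doc_id)
    return relevant, excluded


def drop_excluded(ranking, excluded_for_query):
    """Remove excluded doc IDs from a ranked list, preserving order."""
    return [doc_id for doc_id in ranking if doc_id not in excluded_for_query]


# Made-up judgments and ranking, for illustration only.
sample_qrels = [
    Qrel("q1", "doc_a", 1, "0"),
    Qrel("q1", "doc_b", -100, "0"),
]
relevant, excluded = split_qrels(sample_qrels)
filtered = drop_excluded(["doc_b", "doc_a", "doc_c"], excluded.get("q1", set()))
print(filtered)
```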
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/aops")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export bright/aops qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
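The exported TSV can be read back with the standard library. A minimal sketch, assuming one tab-separated record per line in the column order shown above; read_qrels_tsv is a hypothetical helper and the sample rows are made up.

```python
import csv
import io


def read_qrels_tsv(handle):
    """Parse query_id, doc_id, relevance, iteration rows into a nested dict."""
    qrels = {}
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 4:
            continue  # skip blank or malformed lines
        query_id, doc_id, relevance, _iteration = row
        qrels.setdefault(query_id, {})[doc_id] = int(relevance)
    return qrels


# Made-up rows standing in for a real export file.
sample = io.StringIO("q1\tdoc_a\t1\t0\nq1\tdoc_b\t-100\t0\n")
parsed = read_qrels_tsv(sample)
print(parsed)
```

For a real export, the StringIO object would be replaced by an open file handle.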
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:bright/aops')
index_ref = pt.IndexRef.of('./indices/bright_aops') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [nDCG@10]
)
You can find more details about PyTerrier experiments here.
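For reference, what nDCG@10 measures for a single query can be sketched in plain Python. This assumes binary relevance (as in the judgments here), a log2 rank discount, and a ranking without duplicate document IDs; ndcg_at_k is a hypothetical helper, and evaluation libraries may differ in details such as tie handling.

```python
import math


def ndcg_at_k(ranking, relevant, k=10):
    """Binary-relevance nDCG@k for one query."""
    # Discounted gain of the relevant documents found in the top k.
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranking[:k], start=1)
        if doc_id in relevant
    )
    # Ideal DCG: all relevant documents placed at the top of the list.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Made-up ranking: relevant documents at ranks 1 and 3.
score = ndcg_at_k(["a", "b", "c"], {"a", "c"})
print(round(score, 4))
```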
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.bright.aops.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 188002,
"fields": {
"doc_id": {
"max_len": 62,
"common_prefix": ""
}
}
},
"queries": {
"count": 111
},
"qrels": {
"count": 623280,
"fields": {
"relevance": {
"counts_by_value": {
"1": 524,
"-100": 622756
}
}
}
}
}
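The metadata above ties back to two numbers used earlier: the doc_id max_len is the value passed as the docno length when indexing, and the counts by relevance value give the percentages in the relevance table. A minimal sketch that reads both from the JSON, with the metadata copied inline as a string:

```python
import json

# Metadata for bright/aops, copied from the block above.
metadata = json.loads("""
{
  "docs": {"count": 188002,
           "fields": {"doc_id": {"max_len": 62, "common_prefix": ""}}},
  "queries": {"count": 111},
  "qrels": {"count": 623280,
            "fields": {"relevance": {"counts_by_value": {"1": 524, "-100": 622756}}}}
}
""")

# Longest document ID, usable as the docno metadata length when indexing.
docno_length = metadata["docs"]["fields"]["doc_id"]["max_len"]

# Share of each relevance level among all qrels.
total = metadata["qrels"]["count"]
counts = metadata["qrels"]["fields"]["relevance"]["counts_by_value"]
shares = {int(level): count / total for level, count in counts.items()}

print(docno_length)
for level, share in sorted(shares.items()):
    print(level, f"{share:.1%}")
```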
Reasoning-intensive retrieval over biology content sourced from StackExchange.
Official evaluation measures: nDCG@10
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/biology")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, reasoning, gold_answer, gemini_1_0_reason, claude_3_opus_reason, gpt4_reason, grit_reason, llama3_70b_reason>
You can find more details about the Python API here.
ir_datasets export bright/biology queries
[query_id] [text] [reasoning] [gold_answer] [gemini_1_0_reason] [claude_3_opus_reason] [gpt4_reason] [grit_reason] [llama3_70b_reason]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/biology')
index_ref = pt.IndexRef.of('./indices/bright_biology') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.bright.biology.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/biology")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export bright/biology docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/biology')
# Index bright/biology
indexer = pt.IterDictIndexer('./indices/bright_biology', meta={"docno": 149})
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.bright.biology')
for doc in dataset.iter_documents():
    print(doc) # one document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| -100 | Excluded from evaluation | 0 | 0.0% |
| 1 | Relevant | 372 | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/biology")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export bright/biology qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:bright/biology')
index_ref = pt.IndexRef.of('./indices/bright_biology') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [nDCG@10]
)
You can find more details about PyTerrier experiments here.
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.bright.biology.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 57359,
"fields": {
"doc_id": {
"max_len": 149,
"common_prefix": ""
}
}
},
"queries": {
"count": 103
},
"qrels": {
"count": 372,
"fields": {
"relevance": {
"counts_by_value": {
"1": 372
}
}
}
}
}
Long-document variant of bright/biology, retrieving the original full-length documents.
Official evaluation measures: Success@1
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/biology-long")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, reasoning, gold_answer, gemini_1_0_reason, claude_3_opus_reason, gpt4_reason, grit_reason, llama3_70b_reason>
You can find more details about the Python API here.
ir_datasets export bright/biology-long queries
[query_id] [text] [reasoning] [gold_answer] [gemini_1_0_reason] [claude_3_opus_reason] [gpt4_reason] [grit_reason] [llama3_70b_reason]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/biology-long')
index_ref = pt.IndexRef.of('./indices/bright_biology-long') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.bright.biology-long.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/biology-long")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export bright/biology-long docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/biology-long')
# Index bright/biology-long
indexer = pt.IterDictIndexer('./indices/bright_biology-long', meta={"docno": 144})
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.bright.biology-long')
for doc in dataset.iter_documents():
    print(doc) # one document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| -100 | Excluded from evaluation | 0 | 0.0% |
| 1 | Relevant | 134 | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/biology-long")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export bright/biology-long qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:bright/biology-long')
index_ref = pt.IndexRef.of('./indices/bright_biology-long') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [Success@1]
)
You can find more details about PyTerrier experiments here.
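Success@1, the measure listed for the -long subsets, asks only whether the top-ranked document is relevant. A minimal sketch in plain Python, assuming binary relevance and averaging over the judged queries; success_at_k and mean_success are hypothetical helpers and the run and judgments are made up.

```python
def success_at_k(ranking, relevant, k=1):
    """1.0 if any of the top-k documents is relevant, else 0.0."""
    return 1.0 if any(doc_id in relevant for doc_id in ranking[:k]) else 0.0


def mean_success(run, qrels, k=1):
    """Average Success@k over the queries that have relevance judgments."""
    scores = [
        success_at_k(run.get(query_id, []), relevant, k)
        for query_id, relevant in qrels.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Made-up run and judgments, for illustration only.
run = {"q1": ["d1", "d2"], "q2": ["d9", "d3"]}
qrels = {"q1": {"d1"}, "q2": {"d3"}}
print(mean_success(run, qrels, k=1))
```

Queries missing from the run are scored as failures here, which is a design choice rather than documented behavior.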
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.bright.biology-long.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 524,
"fields": {
"doc_id": {
"max_len": 144,
"common_prefix": ""
}
}
},
"queries": {
"count": 103
},
"qrels": {
"count": 134,
"fields": {
"relevance": {
"counts_by_value": {
"1": 134
}
}
}
}
}
Reasoning-intensive retrieval over earth science content sourced from StackExchange.
Official evaluation measures: nDCG@10
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/earth-science")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, reasoning, gold_answer, gemini_1_0_reason, claude_3_opus_reason, gpt4_reason, grit_reason, llama3_70b_reason>
You can find more details about the Python API here.
ir_datasets export bright/earth-science queries
[query_id] [text] [reasoning] [gold_answer] [gemini_1_0_reason] [claude_3_opus_reason] [gpt4_reason] [grit_reason] [llama3_70b_reason]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/earth-science')
index_ref = pt.IndexRef.of('./indices/bright_earth-science') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.bright.earth-science.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/earth-science")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export bright/earth-science docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/earth-science')
# Index bright/earth-science
indexer = pt.IterDictIndexer('./indices/bright_earth-science', meta={"docno": 145})
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.bright.earth-science')
for doc in dataset.iter_documents():
    print(doc) # one document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| -100 | Excluded from evaluation | 0 | 0.0% |
| 1 | Relevant | 585 | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/earth-science")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export bright/earth-science qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:bright/earth-science')
index_ref = pt.IndexRef.of('./indices/bright_earth-science') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [nDCG@10]
)
You can find more details about PyTerrier experiments here.
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.bright.earth-science.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 121249,
"fields": {
"doc_id": {
"max_len": 145,
"common_prefix": ""
}
}
},
"queries": {
"count": 116
},
"qrels": {
"count": 585,
"fields": {
"relevance": {
"counts_by_value": {
"1": 585
}
}
}
}
}
Long-document variant of bright/earth-science, retrieving the original full-length documents.
Official evaluation measures: Success@1
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/earth-science-long")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, reasoning, gold_answer, gemini_1_0_reason, claude_3_opus_reason, gpt4_reason, grit_reason, llama3_70b_reason>
You can find more details about the Python API here.
ir_datasets export bright/earth-science-long queries
[query_id] [text] [reasoning] [gold_answer] [gemini_1_0_reason] [claude_3_opus_reason] [gpt4_reason] [grit_reason] [llama3_70b_reason]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/earth-science-long')
index_ref = pt.IndexRef.of('./indices/bright_earth-science-long') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.bright.earth-science-long.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/earth-science-long")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export bright/earth-science-long docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/earth-science-long')
# Index bright/earth-science-long
indexer = pt.IterDictIndexer('./indices/bright_earth-science-long', meta={"docno": 142})
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.bright.earth-science-long')
for doc in dataset.iter_documents():
    print(doc) # one document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| -100 | Excluded from evaluation | 0 | 0.0% |
| 1 | Relevant | 187 | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/earth-science-long")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export bright/earth-science-long qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:bright/earth-science-long')
index_ref = pt.IndexRef.of('./indices/bright_earth-science-long') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [Success@1]
)
You can find more details about PyTerrier experiments here.
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.bright.earth-science-long.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 601,
"fields": {
"doc_id": {
"max_len": 142,
"common_prefix": ""
}
}
},
"queries": {
"count": 116
},
"qrels": {
"count": 187,
"fields": {
"relevance": {
"counts_by_value": {
"1": 187
}
}
}
}
}
Reasoning-intensive retrieval over economics content sourced from StackExchange.
Official evaluation measures: nDCG@10
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/economics")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, reasoning, gold_answer, gemini_1_0_reason, claude_3_opus_reason, gpt4_reason, grit_reason, llama3_70b_reason>
You can find more details about the Python API here.
ir_datasets export bright/economics queries
[query_id] [text] [reasoning] [gold_answer] [gemini_1_0_reason] [claude_3_opus_reason] [gpt4_reason] [grit_reason] [llama3_70b_reason]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/economics')
index_ref = pt.IndexRef.of('./indices/bright_economics') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.bright.economics.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/economics")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export bright/economics docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/economics')
# Index bright/economics
indexer = pt.IterDictIndexer('./indices/bright_economics', meta={"docno": 156})
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.bright.economics')
for doc in dataset.iter_documents():
    print(doc) # one document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| -100 | Excluded from evaluation | 0 | 0.0% |
| 1 | Relevant | 800 | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/economics")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export bright/economics qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:bright/economics')
index_ref = pt.IndexRef.of('./indices/bright_economics') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [nDCG@10]
)
You can find more details about PyTerrier experiments here.
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.bright.economics.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 50220,
"fields": {
"doc_id": {
"max_len": 156,
"common_prefix": ""
}
}
},
"queries": {
"count": 103
},
"qrels": {
"count": 800,
"fields": {
"relevance": {
"counts_by_value": {
"1": 800
}
}
}
}
}
Long-document variant of bright/economics, retrieving the original full-length documents.
Official evaluation measures: Success@1
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/economics-long")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, reasoning, gold_answer, gemini_1_0_reason, claude_3_opus_reason, gpt4_reason, grit_reason, llama3_70b_reason>
You can find more details about the Python API here.
ir_datasets export bright/economics-long queries
[query_id] [text] [reasoning] [gold_answer] [gemini_1_0_reason] [claude_3_opus_reason] [gpt4_reason] [grit_reason] [llama3_70b_reason]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/economics-long')
index_ref = pt.IndexRef.of('./indices/bright_economics-long') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.bright.economics-long.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/economics-long")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export bright/economics-long docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/economics-long')
# Index bright/economics-long
indexer = pt.IterDictIndexer('./indices/bright_economics-long', meta={"docno": 152})
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.bright.economics-long')
for doc in dataset.iter_documents():
    print(doc) # one document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| -100 | Excluded from evaluation | 0 | 0.0% |
| 1 | Relevant | 109 | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/economics-long")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export bright/economics-long qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:bright/economics-long')
index_ref = pt.IndexRef.of('./indices/bright_economics-long') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [Success@1]
)
You can find more details about PyTerrier experiments here.
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.bright.economics-long.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the relevance assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{DBLP:conf/iclr/SuYXSMWLSST0YA025, author = {Hongjin Su and Howard Yen and Mengzhou Xia and Weijia Shi and Niklas Muennighoff and Han{-}yu Wang and Haisu Liu and Quan Shi and Zachary S. Siegel and Michael Tang and Ruoxi Sun and Jinsung Yoon and Sercan {\"{O}}. Arik and Danqi Chen and Tao Yu}, title = {{BRIGHT:} {A} Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval}, booktitle = {The Thirteenth International Conference on Learning Representations, {ICLR} 2025, Singapore, April 24-28, 2025}, publisher = {OpenReview.net}, year = {2025}, url = {https://openreview.net/forum?id=ykuc5q381b}, timestamp = {Thu, 15 May 2025 17:19:05 +0200}, biburl = {https://dblp.org/rec/conf/iclr/SuYXSMWLSST0YA025.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }
{
"docs": {
"count": 516,
"fields": {
"doc_id": {
"max_len": 152,
"common_prefix": ""
}
}
},
"queries": {
"count": 103
},
"qrels": {
"count": 109,
"fields": {
"relevance": {
"counts_by_value": {
"1": 109
}
}
}
}
}
Coding problems from LeetCode, where relevant documents share the underlying algorithmic approach needed to solve the query problem.
Official evaluation measures: nDCG@10
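As a rough illustration of the official measure, the sketch below computes nDCG@10 for a single query in plain Python. It assumes gains are taken directly from positive qrel grades and uses the common log2 rank discount; the function names and toy ids are hypothetical, and an established library such as ir_measures is the safer choice for reported numbers.

```python
import math

def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the first k gains (rank 1 first)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))

def ndcg_at_k(ranked_doc_ids, relevant, k=10):
    """nDCG@k for one query.

    ranked_doc_ids: doc ids in ranked order, best first.
    relevant: dict mapping doc_id -> positive relevance grade
              (non-positive grades such as -100 should be left out).
    """
    gains = [relevant.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal = sorted(relevant.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Toy data (hypothetical ids, not taken from the dataset).
relevant = {"d1": 1, "d4": 1}
ranking = ["d1", "d2", "d3", "d4"]
score = ndcg_at_k(ranking, relevant)
```

Here one relevant document sits at rank 1 and the other at rank 4, so the score falls below 1.0; a ranking that places both first would score exactly 1.0.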
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/leetcode")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, reasoning, gold_answer, gemini_1_0_reason, claude_3_opus_reason, gpt4_reason, grit_reason, llama3_70b_reason>
You can find more details about the Python API here.
ir_datasets export bright/leetcode queries
[query_id] [text] [reasoning] [gold_answer] [gemini_1_0_reason] [claude_3_opus_reason] [gpt4_reason] [grit_reason] [llama3_70b_reason]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/leetcode')
index_ref = pt.IndexRef.of('./indices/bright_leetcode') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.bright.leetcode.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/leetcode")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export bright/leetcode docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/leetcode')
# Index bright/leetcode
indexer = pt.IterDictIndexer('./indices/bright_leetcode', meta={"docno": 36})
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.bright.leetcode')
for doc in dataset.iter_documents():
    print(doc) # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| -100 | Excluded from evaluation | 33K | 99.2% |
| 1 | Relevant | 262 | 0.8% |
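The -100 level marks query-document pairs that are excluded from evaluation. One plausible way to apply it is to drop those documents from each query's ranking before scoring. The sketch below does this with stand-in tuples that mirror the qrel fields listed on this page; the helper names and sample ids are hypothetical, and the choice to filter before scoring is an assumption worth checking against the benchmark's own evaluation code.

```python
from collections import defaultdict, namedtuple

# Stand-in with the same fields as the qrels namedtuple shown on this page.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])

EXCLUDED = -100  # "Excluded from evaluation" in the relevance table

def split_qrels(qrels):
    """Return (relevant, excluded): dicts mapping query_id -> set of doc_ids."""
    relevant, excluded = defaultdict(set), defaultdict(set)
    for q in qrels:
        if q.relevance == EXCLUDED:
            excluded[q.query_id].add(q.doc_id)
        elif q.relevance > 0:
            relevant[q.query_id].add(q.doc_id)
    return relevant, excluded

def drop_excluded(ranked_doc_ids, excluded_for_query):
    """Remove excluded documents from a ranking, preserving order."""
    return [d for d in ranked_doc_ids if d not in excluded_for_query]

# Hypothetical sample; real input would come from dataset.qrels_iter().
sample = [
    Qrel("q1", "leetcode/a", 1, "0"),
    Qrel("q1", "leetcode/b", -100, "0"),
]
relevant, excluded = split_qrels(sample)
filtered = drop_excluded(["leetcode/b", "leetcode/a", "leetcode/c"], excluded["q1"])
```

With the real data, `split_qrels(dataset.qrels_iter())` would produce the same two mappings for all queries.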
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/leetcode")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export bright/leetcode qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:bright/leetcode')
index_ref = pt.IndexRef.of('./indices/bright_leetcode') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [nDCG@10]
)
You can find more details about PyTerrier experiments here.
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.bright.leetcode.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the relevance assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{DBLP:conf/iclr/SuYXSMWLSST0YA025, author = {Hongjin Su and Howard Yen and Mengzhou Xia and Weijia Shi and Niklas Muennighoff and Han{-}yu Wang and Haisu Liu and Quan Shi and Zachary S. Siegel and Michael Tang and Ruoxi Sun and Jinsung Yoon and Sercan {\"{O}}. Arik and Danqi Chen and Tao Yu}, title = {{BRIGHT:} {A} Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval}, booktitle = {The Thirteenth International Conference on Learning Representations, {ICLR} 2025, Singapore, April 24-28, 2025}, publisher = {OpenReview.net}, year = {2025}, url = {https://openreview.net/forum?id=ykuc5q381b}, timestamp = {Thu, 15 May 2025 17:19:05 +0200}, biburl = {https://dblp.org/rec/conf/iclr/SuYXSMWLSST0YA025.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }
{
"docs": {
"count": 413932,
"fields": {
"doc_id": {
"max_len": 36,
"common_prefix": "leetcode/"
}
}
},
"queries": {
"count": 142
},
"qrels": {
"count": 33747,
"fields": {
"relevance": {
"counts_by_value": {
"1": 262,
"-100": 33485
}
}
}
}
}
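The percentages in the bright/leetcode relevance table can be reproduced from the counts_by_value block in this metadata. A minimal sketch, with the counts copied from the metadata and a hypothetical helper name:

```python
import json

# The qrels portion of the bright/leetcode metadata.
metadata = json.loads("""
{"qrels": {"count": 33747,
           "fields": {"relevance": {"counts_by_value": {"1": 262, "-100": 33485}}}}}
""")

def relevance_shares(meta):
    """Map each relevance level to its share of all qrels, in percent (1 decimal)."""
    total = meta["qrels"]["count"]
    counts = meta["qrels"]["fields"]["relevance"]["counts_by_value"]
    return {int(level): round(100 * n / total, 1) for level, n in counts.items()}

shares = relevance_shares(metadata)
```

The result gives 0.8 for level 1 and 99.2 for level -100, in line with the table.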
Retrieval over documentation for the Pony programming language.
Official evaluation measures: nDCG@10
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/pony")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, reasoning, gold_answer, gemini_1_0_reason, claude_3_opus_reason, gpt4_reason, grit_reason, llama3_70b_reason>
You can find more details about the Python API here.
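Because each query carries several alternative texts, a retrieval run has to pick one of them. The sketch below builds a query_id to text mapping from a chosen field and falls back to the original text when that field is empty. The Query stand-in mirrors the fields listed above; the helper name, the fallback rule, and the sample values are assumptions made for illustration.

```python
from collections import namedtuple

# Stand-in with the same fields as the query namedtuple shown above.
Query = namedtuple("Query", [
    "query_id", "text", "reasoning", "gold_answer", "gemini_1_0_reason",
    "claude_3_opus_reason", "gpt4_reason", "grit_reason", "llama3_70b_reason",
])

def query_texts(queries, field="text"):
    """Map query_id -> the chosen field, falling back to query.text if it is empty."""
    if field not in Query._fields or field == "query_id":
        raise ValueError(f"unknown query text field: {field}")
    return {q.query_id: (getattr(q, field) or q.text) for q in queries}

# Hypothetical sample; real input would come from dataset.queries_iter().
sample = [
    Query("0", "original question", "", "", "", "", "expanded by an LLM", "", ""),
    Query("1", "another question", "", "", "", "", "", "", ""),
]
by_gpt4 = query_texts(sample, field="gpt4_reason")
```

Query "1" has no gpt4_reason value in this sample, so it keeps its original text.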
ir_datasets export bright/pony queries
[query_id] [text] [reasoning] [gold_answer] [gemini_1_0_reason] [claude_3_opus_reason] [gpt4_reason] [grit_reason] [llama3_70b_reason]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/pony')
index_ref = pt.IndexRef.of('./indices/bright_pony') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.bright.pony.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/pony")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export bright/pony docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/pony')
# Index bright/pony
indexer = pt.IterDictIndexer('./indices/bright_pony', meta={"docno": 60})
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.bright.pony')
for doc in dataset.iter_documents():
    print(doc) # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| -100 | Excluded from evaluation | 0 | 0.0% |
| 1 | Relevant | 2.2K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/pony")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export bright/pony qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:bright/pony')
index_ref = pt.IndexRef.of('./indices/bright_pony') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [nDCG@10]
)
You can find more details about PyTerrier experiments here.
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.bright.pony.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the relevance assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{DBLP:conf/iclr/SuYXSMWLSST0YA025, author = {Hongjin Su and Howard Yen and Mengzhou Xia and Weijia Shi and Niklas Muennighoff and Han{-}yu Wang and Haisu Liu and Quan Shi and Zachary S. Siegel and Michael Tang and Ruoxi Sun and Jinsung Yoon and Sercan {\"{O}}. Arik and Danqi Chen and Tao Yu}, title = {{BRIGHT:} {A} Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval}, booktitle = {The Thirteenth International Conference on Learning Representations, {ICLR} 2025, Singapore, April 24-28, 2025}, publisher = {OpenReview.net}, year = {2025}, url = {https://openreview.net/forum?id=ykuc5q381b}, timestamp = {Thu, 15 May 2025 17:19:05 +0200}, biburl = {https://dblp.org/rec/conf/iclr/SuYXSMWLSST0YA025.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }
{
"docs": {
"count": 7894,
"fields": {
"doc_id": {
"max_len": 60,
"common_prefix": "Pony/"
}
}
},
"queries": {
"count": 112
},
"qrels": {
"count": 2219,
"fields": {
"relevance": {
"counts_by_value": {
"1": 2219
}
}
}
}
}
Long-document variant of bright/pony, retrieving the original full-length documents.
Official evaluation measures: Success@1
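Success@1 asks only whether the top-ranked document is relevant. A minimal sketch under that reading, averaged over queries; the function names and toy ids are hypothetical, and queries without a ranking are counted as failures here by assumption.

```python
def success_at_1(ranked_doc_ids, relevant_doc_ids):
    """1.0 if the top-ranked document is relevant, else 0.0."""
    return 1.0 if ranked_doc_ids and ranked_doc_ids[0] in relevant_doc_ids else 0.0

def mean_success_at_1(rankings, relevant):
    """Average Success@1 over every query that has relevance judgments."""
    if not relevant:
        return 0.0
    hits = sum(success_at_1(rankings.get(qid, []), docs) for qid, docs in relevant.items())
    return hits / len(relevant)

# Toy data (hypothetical ids, not taken from the dataset).
relevant = {"q1": {"Pony/a"}, "q2": {"Pony/b"}}
rankings = {"q1": ["Pony/a", "Pony/c"], "q2": ["Pony/c", "Pony/b"]}
score = mean_success_at_1(rankings, relevant)
```

Only q1 has a relevant document in first place, so the mean over the two queries is 0.5.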
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/pony-long")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, reasoning, gold_answer, gemini_1_0_reason, claude_3_opus_reason, gpt4_reason, grit_reason, llama3_70b_reason>
You can find more details about the Python API here.
ir_datasets export bright/pony-long queries
[query_id] [text] [reasoning] [gold_answer] [gemini_1_0_reason] [claude_3_opus_reason] [gpt4_reason] [grit_reason] [llama3_70b_reason]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/pony-long')
index_ref = pt.IndexRef.of('./indices/bright_pony-long') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.bright.pony-long.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/pony-long")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export bright/pony-long docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/pony-long')
# Index bright/pony-long
indexer = pt.IterDictIndexer('./indices/bright_pony-long', meta={"docno": 58})
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.bright.pony-long')
for doc in dataset.iter_documents():
    print(doc) # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| -100 | Excluded from evaluation | 0 | 0.0% |
| 1 | Relevant | 769 | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/pony-long")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export bright/pony-long qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:bright/pony-long')
index_ref = pt.IndexRef.of('./indices/bright_pony-long') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [Success@1]
)
You can find more details about PyTerrier experiments here.
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.bright.pony-long.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the relevance assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{DBLP:conf/iclr/SuYXSMWLSST0YA025, author = {Hongjin Su and Howard Yen and Mengzhou Xia and Weijia Shi and Niklas Muennighoff and Han{-}yu Wang and Haisu Liu and Quan Shi and Zachary S. Siegel and Michael Tang and Ruoxi Sun and Jinsung Yoon and Sercan {\"{O}}. Arik and Danqi Chen and Tao Yu}, title = {{BRIGHT:} {A} Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval}, booktitle = {The Thirteenth International Conference on Learning Representations, {ICLR} 2025, Singapore, April 24-28, 2025}, publisher = {OpenReview.net}, year = {2025}, url = {https://openreview.net/forum?id=ykuc5q381b}, timestamp = {Thu, 15 May 2025 17:19:05 +0200}, biburl = {https://dblp.org/rec/conf/iclr/SuYXSMWLSST0YA025.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }
{
"docs": {
"count": 577,
"fields": {
"doc_id": {
"max_len": 58,
"common_prefix": "Pony/"
}
}
},
"queries": {
"count": 112
},
"qrels": {
"count": 769,
"fields": {
"relevance": {
"counts_by_value": {
"1": 769
}
}
}
}
}
Reasoning-intensive retrieval over psychology content sourced from StackExchange.
Official evaluation measures: nDCG@10
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/psychology")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, reasoning, gold_answer, gemini_1_0_reason, claude_3_opus_reason, gpt4_reason, grit_reason, llama3_70b_reason>
You can find more details about the Python API here.
ir_datasets export bright/psychology queries
[query_id] [text] [reasoning] [gold_answer] [gemini_1_0_reason] [claude_3_opus_reason] [gpt4_reason] [grit_reason] [llama3_70b_reason]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/psychology')
index_ref = pt.IndexRef.of('./indices/bright_psychology') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.bright.psychology.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/psychology")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export bright/psychology docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/psychology')
# Index bright/psychology
indexer = pt.IterDictIndexer('./indices/bright_psychology', meta={"docno": 165})
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.bright.psychology')
for doc in dataset.iter_documents():
    print(doc) # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| -100 | Excluded from evaluation | 0 | 0.0% |
| 1 | Relevant | 692 | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/psychology")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export bright/psychology qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:bright/psychology')
index_ref = pt.IndexRef.of('./indices/bright_psychology') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [nDCG@10]
)
You can find more details about PyTerrier experiments here.
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.bright.psychology.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the relevance assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{DBLP:conf/iclr/SuYXSMWLSST0YA025, author = {Hongjin Su and Howard Yen and Mengzhou Xia and Weijia Shi and Niklas Muennighoff and Han{-}yu Wang and Haisu Liu and Quan Shi and Zachary S. Siegel and Michael Tang and Ruoxi Sun and Jinsung Yoon and Sercan {\"{O}}. Arik and Danqi Chen and Tao Yu}, title = {{BRIGHT:} {A} Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval}, booktitle = {The Thirteenth International Conference on Learning Representations, {ICLR} 2025, Singapore, April 24-28, 2025}, publisher = {OpenReview.net}, year = {2025}, url = {https://openreview.net/forum?id=ykuc5q381b}, timestamp = {Thu, 15 May 2025 17:19:05 +0200}, biburl = {https://dblp.org/rec/conf/iclr/SuYXSMWLSST0YA025.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }
{
"docs": {
"count": 52835,
"fields": {
"doc_id": {
"max_len": 165,
"common_prefix": ""
}
}
},
"queries": {
"count": 101
},
"qrels": {
"count": 692,
"fields": {
"relevance": {
"counts_by_value": {
"1": 692
}
}
}
}
}
Long-document variant of bright/psychology, retrieving the original full-length documents.
Official evaluation measures: Success@1
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/psychology-long")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, reasoning, gold_answer, gemini_1_0_reason, claude_3_opus_reason, gpt4_reason, grit_reason, llama3_70b_reason>
You can find more details about the Python API here.
ir_datasets export bright/psychology-long queries
[query_id] [text] [reasoning] [gold_answer] [gemini_1_0_reason] [claude_3_opus_reason] [gpt4_reason] [grit_reason] [llama3_70b_reason]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/psychology-long')
index_ref = pt.IndexRef.of('./indices/bright_psychology-long') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.bright.psychology-long.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/psychology-long")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export bright/psychology-long docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/psychology-long')
# Index bright/psychology-long
indexer = pt.IterDictIndexer('./indices/bright_psychology-long', meta={"docno": 162})
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
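The docno length passed to the indexer above (162) matches the doc_id max_len reported in this subset's metadata, which suggests it is meant to be large enough to hold the longest document id. If the value has to be derived for another corpus, a sketch along these lines could compute it; the helper name and sample documents are hypothetical, and counting characters rather than bytes is an assumption.

```python
def max_doc_id_len(docs):
    """Length in characters of the longest doc_id among dict records with a doc_id key."""
    return max((len(doc["doc_id"]) for doc in docs), default=0)

# Hypothetical records; real ones would come from the corpus being indexed.
sample_docs = [
    {"doc_id": "short", "text": "..."},
    {"doc_id": "a_much_longer_document_id", "text": "..."},
]
docno_len = max_doc_id_len(sample_docs)
meta = {"docno": docno_len}  # could be passed as the indexer's meta argument
```

An empty corpus yields 0 here rather than raising, which keeps the helper safe to call before any documents are available.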
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.bright.psychology-long')
for doc in dataset.iter_documents():
    print(doc) # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| -100 | Excluded from evaluation | 0 | 0.0% |
| 1 | Relevant | 116 | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/psychology-long")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export bright/psychology-long qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:bright/psychology-long')
index_ref = pt.IndexRef.of('./indices/bright_psychology-long') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [Success@1]
)
You can find more details about PyTerrier experiments here.
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.bright.psychology-long.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the relevance assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{DBLP:conf/iclr/SuYXSMWLSST0YA025, author = {Hongjin Su and Howard Yen and Mengzhou Xia and Weijia Shi and Niklas Muennighoff and Han{-}yu Wang and Haisu Liu and Quan Shi and Zachary S. Siegel and Michael Tang and Ruoxi Sun and Jinsung Yoon and Sercan {\"{O}}. Arik and Danqi Chen and Tao Yu}, title = {{BRIGHT:} {A} Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval}, booktitle = {The Thirteenth International Conference on Learning Representations, {ICLR} 2025, Singapore, April 24-28, 2025}, publisher = {OpenReview.net}, year = {2025}, url = {https://openreview.net/forum?id=ykuc5q381b}, timestamp = {Thu, 15 May 2025 17:19:05 +0200}, biburl = {https://dblp.org/rec/conf/iclr/SuYXSMWLSST0YA025.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }
{
"docs": {
"count": 512,
"fields": {
"doc_id": {
"max_len": 162,
"common_prefix": ""
}
}
},
"queries": {
"count": 101
},
"qrels": {
"count": 116,
"fields": {
"relevance": {
"counts_by_value": {
"1": 116
}
}
}
}
}
Reasoning-intensive retrieval over robotics content sourced from StackExchange.
Official evaluation measures: nDCG@10
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/robotics")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, reasoning, gold_answer, gemini_1_0_reason, claude_3_opus_reason, gpt4_reason, grit_reason, llama3_70b_reason>
You can find more details about the Python API here.
ir_datasets export bright/robotics queries
[query_id] [text] [reasoning] [gold_answer] [gemini_1_0_reason] [claude_3_opus_reason] [gpt4_reason] [grit_reason] [llama3_70b_reason]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/robotics')
index_ref = pt.IndexRef.of('./indices/bright_robotics') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.bright.robotics.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/robotics")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export bright/robotics docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/robotics')
# Index bright/robotics
indexer = pt.IterDictIndexer('./indices/bright_robotics', meta={"docno": 54})
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.bright.robotics')
for doc in dataset.iter_documents():
    print(doc) # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| -100 | Excluded from evaluation | 0 | 0.0% |
| 1 | Relevant | 520 | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/robotics")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export bright/robotics qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:bright/robotics')
index_ref = pt.IndexRef.of('./indices/bright_robotics') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [nDCG@10]
)
You can find more details about PyTerrier experiments here.
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.bright.robotics.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the relevance assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{DBLP:conf/iclr/SuYXSMWLSST0YA025, author = {Hongjin Su and Howard Yen and Mengzhou Xia and Weijia Shi and Niklas Muennighoff and Han{-}yu Wang and Haisu Liu and Quan Shi and Zachary S. Siegel and Michael Tang and Ruoxi Sun and Jinsung Yoon and Sercan {\"{O}}. Arik and Danqi Chen and Tao Yu}, title = {{BRIGHT:} {A} Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval}, booktitle = {The Thirteenth International Conference on Learning Representations, {ICLR} 2025, Singapore, April 24-28, 2025}, publisher = {OpenReview.net}, year = {2025}, url = {https://openreview.net/forum?id=ykuc5q381b}, timestamp = {Thu, 15 May 2025 17:19:05 +0200}, biburl = {https://dblp.org/rec/conf/iclr/SuYXSMWLSST0YA025.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }
{
"docs": {
"count": 61961,
"fields": {
"doc_id": {
"max_len": 54,
"common_prefix": ""
}
}
},
"queries": {
"count": 101
},
"qrels": {
"count": 520,
"fields": {
"relevance": {
"counts_by_value": {
"1": 520
}
}
}
}
}
Long-document variant of bright/robotics, retrieving the original full-length documents.
Official evaluation measures: Success@1
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/robotics-long")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, reasoning, gold_answer, gemini_1_0_reason, claude_3_opus_reason, gpt4_reason, grit_reason, llama3_70b_reason>
You can find more details about the Python API here.
ir_datasets export bright/robotics-long queries
[query_id] [text] [reasoning] [gold_answer] [gemini_1_0_reason] [claude_3_opus_reason] [gpt4_reason] [grit_reason] [llama3_70b_reason]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/robotics-long')
index_ref = pt.IndexRef.of('./indices/bright_robotics-long') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.bright.robotics-long.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/robotics-long")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export bright/robotics-long docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/robotics-long')
# Index bright/robotics-long
indexer = pt.IterDictIndexer('./indices/bright_robotics-long', meta={"docno": 50})
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
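The `meta={"docno": 50}` argument in the indexing example equals the doc_id max_len that the metadata reports for this subset, presumably so that every ID fits the index's docno field. A minimal sketch of deriving such a value from a collection of IDs follows; `docno_meta_length` is a hypothetical helper and the sample IDs are made up.

```python
def docno_meta_length(doc_ids):
    """Length of the longest doc_id, i.e. the smallest docno field that holds every ID."""
    return max((len(doc_id) for doc_id in doc_ids), default=0)

# Hypothetical IDs for illustration; real ones come from dataset.docs_iter().
sample_ids = ["doc_1.txt", "a_much_longer_document_name.txt", "x"]
length = docno_meta_length(sample_ids)
print({"docno": length})
```

With ir_datasets installed, the same function could be fed `(doc.doc_id for doc in dataset.docs_iter())`.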
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.bright.robotics-long')
for doc in dataset.iter_documents():
    print(doc) # one document record from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| -100 | Excluded from evaluation | 0 | 0.0% |
| 1 | Relevant | 106 | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/robotics-long")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export bright/robotics-long qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
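The TSV export above lists one judgment per line in the column order query_id, doc_id, relevance, iteration. A minimal sketch of reading such output back into a nested dictionary is shown below; `read_qrels_tsv` is a hypothetical helper and the sample rows are invented for illustration.

```python
import csv
import io

def read_qrels_tsv(handle):
    """Parse tab-separated qrels rows into {query_id: {doc_id: relevance}}."""
    qrels = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        query_id, doc_id, relevance = row[0], row[1], int(row[2])
        qrels.setdefault(query_id, {})[doc_id] = relevance
    return qrels

# Hypothetical sample rows; real IDs come from the export command.
sample = io.StringIO("q1\tdocA\t1\t0\nq1\tdocB\t1\t0\nq2\tdocC\t1\t0\n")
qrels = read_qrels_tsv(sample)
print(qrels)
```

For a real export, pass an open file handle instead of the `io.StringIO` sample.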
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:bright/robotics-long')
index_ref = pt.IndexRef.of('./indices/bright_robotics-long') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics('text'),
dataset.get_qrels(),
[Success@1]
)
You can find more details about PyTerrier experiments here.
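Success@1, the official measure for the long-document subsets, checks whether the top-ranked document is relevant. The sketch below illustrates the idea in plain Python; it is not the implementation used by PyTerrier, and the function names and toy data are hypothetical.

```python
def success_at_k(ranking, relevant, k=1):
    """Return 1.0 if any of the top-k ranked doc IDs is relevant, else 0.0."""
    return 1.0 if any(doc_id in relevant for doc_id in ranking[:k]) else 0.0

def mean_success_at_k(run, qrels, k=1):
    """Average Success@k over all queries that have judgments."""
    scores = []
    for query_id, judged in qrels.items():
        relevant = {doc_id for doc_id, rel in judged.items() if rel > 0}
        scores.append(success_at_k(run.get(query_id, []), relevant, k))
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical toy data: q1 is answered at rank 1, q2 only at rank 2.
qrels = {"q1": {"docA": 1}, "q2": {"docB": 1}}
run = {"q1": ["docA", "docX"], "q2": ["docY", "docB"]}
score = mean_success_at_k(run, qrels)
print(score)
```

Queries missing from the run count as failures here, which is one possible convention rather than a documented one.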
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.bright.robotics-long.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for a single topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 508,
"fields": {
"doc_id": {
"max_len": 50,
"common_prefix": ""
}
}
},
"queries": {
"count": 101
},
"qrels": {
"count": 106,
"fields": {
"relevance": {
"counts_by_value": {
"1": 106
}
}
}
}
}
Reasoning-intensive retrieval over programming content sourced from StackOverflow.
Official evaluation measures: nDCG@10
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/stackoverflow")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, reasoning, gold_answer, gemini_1_0_reason, claude_3_opus_reason, gpt4_reason, grit_reason, llama3_70b_reason>
You can find more details about the Python API here.
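Because each query carries several alternative reasoning fields, one may want to choose which field is sent to the retriever. The sketch below uses a stand-in namedtuple with the field names listed above; `BrightQuery`, `query_text`, and the sample values are hypothetical and not part of ir_datasets.

```python
from collections import namedtuple

# Stand-in for the query namedtuple listed above; values are invented.
BrightQuery = namedtuple("BrightQuery", [
    "query_id", "text", "reasoning", "gold_answer", "gemini_1_0_reason",
    "claude_3_opus_reason", "gpt4_reason", "grit_reason", "llama3_70b_reason",
])

def query_text(query, field="text"):
    """Return the chosen field, falling back to the original text when it is empty."""
    value = getattr(query, field, "") or ""
    return value if value.strip() else query.text

sample = BrightQuery(
    query_id="0", text="original question", reasoning="", gold_answer="",
    gemini_1_0_reason="", claude_3_opus_reason="",
    gpt4_reason="expanded reasoning", grit_reason="", llama3_70b_reason="",
)
print(query_text(sample, "gpt4_reason"))
print(query_text(sample, "grit_reason"))
```

The fallback to `text` is a design choice of this sketch, so that an empty reasoning field never produces an empty query.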
ir_datasets export bright/stackoverflow queries
[query_id] [text] [reasoning] [gold_answer] [gemini_1_0_reason] [claude_3_opus_reason] [gpt4_reason] [grit_reason] [llama3_70b_reason]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/stackoverflow')
index_ref = pt.IndexRef.of('./indices/bright_stackoverflow') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.bright.stackoverflow.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/stackoverflow")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export bright/stackoverflow docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/stackoverflow')
# Index bright/stackoverflow
indexer = pt.IterDictIndexer('./indices/bright_stackoverflow', meta={"docno": 145})
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.bright.stackoverflow')
for doc in dataset.iter_documents():
    print(doc) # one document record from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| -100 | Excluded from evaluation | 0 | 0.0% |
| 1 | Relevant | 478 | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/stackoverflow")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export bright/stackoverflow qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:bright/stackoverflow')
index_ref = pt.IndexRef.of('./indices/bright_stackoverflow') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics('text'),
dataset.get_qrels(),
[nDCG@10]
)
You can find more details about PyTerrier experiments here.
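nDCG@10, the official measure for the short-document subsets, rewards relevant documents more the higher they are ranked. The following is a minimal sketch of the computation for a single query, assuming linear gains and a log2 discount; evaluation libraries may differ in details, and the function names and toy data are hypothetical.

```python
import math

def dcg(gains):
    """Discounted cumulative gain with a log2 rank discount (rank 1 is undiscounted)."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains))

def ndcg_at_k(ranking, judged, k=10):
    """nDCG@k for one query; unjudged or negatively judged documents gain 0."""
    gains = [max(judged.get(doc_id, 0), 0) for doc_id in ranking[:k]]
    ideal = sorted((rel for rel in judged.values() if rel > 0), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical toy data: two relevant documents, retrieved at ranks 1 and 3.
judged = {"docA": 1, "docB": 1}
ranking = ["docA", "docX", "docB"]
score = ndcg_at_k(ranking, judged)
print(round(score, 4))
```

With binary judgments such as those in these subsets, linear and exponential gain formulations give the same result.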
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.bright.stackoverflow.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for a single topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 107081,
"fields": {
"doc_id": {
"max_len": 145,
"common_prefix": ""
}
}
},
"queries": {
"count": 117
},
"qrels": {
"count": 478,
"fields": {
"relevance": {
"counts_by_value": {
"1": 478
}
}
}
}
}
Long-document variant of bright/stackoverflow, retrieving the original full-length documents.
Official evaluation measures: Success@1
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/stackoverflow-long")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, reasoning, gold_answer, gemini_1_0_reason, claude_3_opus_reason, gpt4_reason, grit_reason, llama3_70b_reason>
You can find more details about the Python API here.
ir_datasets export bright/stackoverflow-long queries
[query_id] [text] [reasoning] [gold_answer] [gemini_1_0_reason] [claude_3_opus_reason] [gpt4_reason] [grit_reason] [llama3_70b_reason]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/stackoverflow-long')
index_ref = pt.IndexRef.of('./indices/bright_stackoverflow-long') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.bright.stackoverflow-long.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/stackoverflow-long")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export bright/stackoverflow-long docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/stackoverflow-long')
# Index bright/stackoverflow-long
indexer = pt.IterDictIndexer('./indices/bright_stackoverflow-long', meta={"docno": 140})
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.bright.stackoverflow-long')
for doc in dataset.iter_documents():
    print(doc) # one document record from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| -100 | Excluded from evaluation | 0 | 0.0% |
| 1 | Relevant | 129 | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/stackoverflow-long")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export bright/stackoverflow-long qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:bright/stackoverflow-long')
index_ref = pt.IndexRef.of('./indices/bright_stackoverflow-long') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics('text'),
dataset.get_qrels(),
[Success@1]
)
You can find more details about PyTerrier experiments here.
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.bright.stackoverflow-long.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for a single topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 1858,
"fields": {
"doc_id": {
"max_len": 140,
"common_prefix": ""
}
}
},
"queries": {
"count": 117
},
"qrels": {
"count": 129,
"fields": {
"relevance": {
"counts_by_value": {
"1": 129
}
}
}
}
}
Reasoning-intensive retrieval over sustainable living content sourced from StackExchange.
Official evaluation measures: nDCG@10
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/sustainable-living")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, reasoning, gold_answer, gemini_1_0_reason, claude_3_opus_reason, gpt4_reason, grit_reason, llama3_70b_reason>
You can find more details about the Python API here.
ir_datasets export bright/sustainable-living queries
[query_id] [text] [reasoning] [gold_answer] [gemini_1_0_reason] [claude_3_opus_reason] [gpt4_reason] [grit_reason] [llama3_70b_reason]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/sustainable-living')
index_ref = pt.IndexRef.of('./indices/bright_sustainable-living') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.bright.sustainable-living.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/sustainable-living")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export bright/sustainable-living docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/sustainable-living')
# Index bright/sustainable-living
indexer = pt.IterDictIndexer('./indices/bright_sustainable-living', meta={"docno": 260})
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.bright.sustainable-living')
for doc in dataset.iter_documents():
    print(doc) # one document record from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| -100 | Excluded from evaluation | 0 | 0.0% |
| 1 | Relevant | 576 | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/sustainable-living")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export bright/sustainable-living qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:bright/sustainable-living')
index_ref = pt.IndexRef.of('./indices/bright_sustainable-living') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics('text'),
dataset.get_qrels(),
[nDCG@10]
)
You can find more details about PyTerrier experiments here.
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.bright.sustainable-living.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for a single topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 60792,
"fields": {
"doc_id": {
"max_len": 260,
"common_prefix": ""
}
}
},
"queries": {
"count": 108
},
"qrels": {
"count": 576,
"fields": {
"relevance": {
"counts_by_value": {
"1": 576
}
}
}
}
}
Long-document variant of bright/sustainable-living, retrieving the original full-length documents.
Official evaluation measures: Success@1
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/sustainable-living-long")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, reasoning, gold_answer, gemini_1_0_reason, claude_3_opus_reason, gpt4_reason, grit_reason, llama3_70b_reason>
You can find more details about the Python API here.
ir_datasets export bright/sustainable-living-long queries
[query_id] [text] [reasoning] [gold_answer] [gemini_1_0_reason] [claude_3_opus_reason] [gpt4_reason] [grit_reason] [llama3_70b_reason]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/sustainable-living-long')
index_ref = pt.IndexRef.of('./indices/bright_sustainable-living-long') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.bright.sustainable-living-long.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/sustainable-living-long")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export bright/sustainable-living-long docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/sustainable-living-long')
# Index bright/sustainable-living-long
indexer = pt.IterDictIndexer('./indices/bright_sustainable-living-long', meta={"docno": 257})
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.bright.sustainable-living-long')
for doc in dataset.iter_documents():
    print(doc) # one document record from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| -100 | Excluded from evaluation | 0 | 0.0% |
| 1 | Relevant | 129 | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/sustainable-living-long")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export bright/sustainable-living-long qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:bright/sustainable-living-long')
index_ref = pt.IndexRef.of('./indices/bright_sustainable-living-long') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics('text'),
dataset.get_qrels(),
[Success@1]
)
You can find more details about PyTerrier experiments here.
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.bright.sustainable-living-long.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for a single topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 554,
"fields": {
"doc_id": {
"max_len": 257,
"common_prefix": ""
}
}
},
"queries": {
"count": 108
},
"qrels": {
"count": 129,
"fields": {
"relevance": {
"counts_by_value": {
"1": 129
}
}
}
}
}
TheoremQA questions, where relevance requires retrieving questions that rely on the same underlying theorem.
Official evaluation measures: nDCG@10
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/theoremqa-questions")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, reasoning, gold_answer, gemini_1_0_reason, claude_3_opus_reason, gpt4_reason, grit_reason, llama3_70b_reason>
You can find more details about the Python API here.
ir_datasets export bright/theoremqa-questions queries
[query_id] [text] [reasoning] [gold_answer] [gemini_1_0_reason] [claude_3_opus_reason] [gpt4_reason] [grit_reason] [llama3_70b_reason]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/theoremqa-questions')
index_ref = pt.IndexRef.of('./indices/bright_theoremqa-questions') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.bright.theoremqa-questions.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/theoremqa-questions")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export bright/theoremqa-questions docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/theoremqa-questions')
# Index bright/theoremqa-questions
indexer = pt.IterDictIndexer('./indices/bright_theoremqa-questions', meta={"docno": 62})
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.bright.theoremqa-questions')
for doc in dataset.iter_documents():
    print(doc) # one document record from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| -100 | Excluded from evaluation | 607K | 99.9% |
| 1 | Relevant | 617 | 0.1% |
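Most judgments in this subset carry the value -100, which the table labels as excluded from evaluation. One plausible way to honor that before scoring is to drop those documents from each query's ranking; how a given evaluation tool treats the value is an assumption here, and `drop_excluded` and the toy data are hypothetical.

```python
EXCLUDED = -100  # relevance value labeled "Excluded from evaluation" in the table above

def drop_excluded(ranking, judged):
    """Remove documents judged EXCLUDED for this query, keeping the original order."""
    return [doc_id for doc_id in ranking if judged.get(doc_id) != EXCLUDED]

# Hypothetical toy data: docB is excluded for this query, docC is unjudged.
judged = {"docA": 1, "docB": -100}
ranking = ["docB", "docA", "docC"]
filtered = drop_excluded(ranking, judged)
print(filtered)
```

Unjudged documents stay in the ranking; only documents explicitly marked -100 for the query are removed.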
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/theoremqa-questions")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export bright/theoremqa-questions qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:bright/theoremqa-questions')
index_ref = pt.IndexRef.of('./indices/bright_theoremqa-questions') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics('text'),
dataset.get_qrels(),
[nDCG@10]
)
You can find more details about PyTerrier experiments here.
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.bright.theoremqa-questions.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for a single topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 188002,
"fields": {
"doc_id": {
"max_len": 62,
"common_prefix": ""
}
}
},
"queries": {
"count": 194
},
"qrels": {
"count": 607140,
"fields": {
"relevance": {
"counts_by_value": {
"1": 617,
"-100": 606523
}
}
}
}
}
TheoremQA theorems, where relevance requires retrieving the theorem(s) needed to answer the query.
Official evaluation measures: nDCG@10
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/theoremqa-theorems")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, reasoning, gold_answer, gemini_1_0_reason, claude_3_opus_reason, gpt4_reason, grit_reason, llama3_70b_reason>
You can find more details about the Python API here.
ir_datasets export bright/theoremqa-theorems queries
[query_id] [text] [reasoning] [gold_answer] [gemini_1_0_reason] [claude_3_opus_reason] [gpt4_reason] [grit_reason] [llama3_70b_reason]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/theoremqa-theorems')
index_ref = pt.IndexRef.of('./indices/bright_theoremqa-theorems') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.bright.theoremqa-theorems.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/theoremqa-theorems")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export bright/theoremqa-theorems docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:bright/theoremqa-theorems')
# Index bright/theoremqa-theorems
indexer = pt.IterDictIndexer('./indices/bright_theoremqa-theorems')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.bright.theoremqa-theorems')
for doc in dataset.iter_documents():
    print(doc) # one document record from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| -100 | Excluded from evaluation | 0 | 0.0% |
| 1 | Relevant | 151 | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("bright/theoremqa-theorems")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export bright/theoremqa-theorems qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:bright/theoremqa-theorems')
index_ref = pt.IndexRef.of('./indices/bright_theoremqa-theorems') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics('text'),
dataset.get_qrels(),
[nDCG@10]
)
You can find more details about PyTerrier experiments here.
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.bright.theoremqa-theorems.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for a single topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 23839,
"fields": {
"doc_id": {
"max_len": 5,
"common_prefix": ""
}
}
},
"queries": {
"count": 76
},
"qrels": {
"count": 151,
"fields": {
"relevance": {
"counts_by_value": {
"1": 151
}
}
}
}
}