ir_datasets: BEIR (benchmark suite)
A version of the ArguAna Counterargs dataset, for argument retrieval.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/arguana")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
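Beyond iterating, the query namedtuples can be collected into plain Python structures. A minimal hedged sketch (not part of the official examples), using only the queries_iter() API shown above:
import ir_datasets
dataset = ir_datasets.load("beir/arguana")
# Build an in-memory lookup of query text keyed by query_id.
query_text = {query.query_id: query.text for query in dataset.queries_iter()}
print(len(query_text), "queries loaded")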
ir_datasets export beir/arguana queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/arguana')
index_ref = pt.IndexRef.of('./indices/beir_arguana') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/arguana")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
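docs_iter() also supports slicing, which is convenient for previewing a corpus without scanning all of it. A short hedged sketch assuming the standard ir_datasets slicing support:
import ir_datasets
dataset = ir_datasets.load("beir/arguana")
# Look at the first three documents only; slicing avoids a full corpus scan.
for doc in dataset.docs_iter()[:3]:
    print(doc.doc_id, doc.title)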
ir_datasets export beir/arguana docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/arguana')
# Index beir/arguana
indexer = pt.IterDictIndexer('./indices/beir_arguana')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
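The IndexRef returned by the indexer above can be passed straight to BatchRetrieve, avoiding the separate IndexRef.of(...) step used in the retrieval example. A hedged sketch combining the two snippets:
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/arguana')
# Index the corpus, then retrieve with the returned reference in one script.
indexer = pt.IterDictIndexer('./indices/beir_arguana')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
bm25 = pt.BatchRetrieve(index_ref, wmodel='BM25')
results = bm25(dataset.get_topics('text'))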
Relevance levels
Rel. | Definition
---- | ----------
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/arguana")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export beir/arguana qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/arguana')
index_ref = pt.IndexRef.of('./indices/beir_arguana') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
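The qrels can also be scored outside PyTerrier. A hedged sketch assuming the companion ir_measures package is installed; the run dict below is purely hypothetical:
import ir_datasets
import ir_measures
from ir_measures import nDCG
dataset = ir_datasets.load("beir/arguana")
# Hypothetical run for illustration: query_id -> {doc_id: score}
run = {"some_query_id": {"some_doc_id": 1.5, "another_doc_id": 0.3}}
print(ir_measures.calc_aggregate([nDCG@10], dataset.qrels_iter(), run))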
A version of the CLIMATE-FEVER dataset, for fact verification on claims about climate.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/climate-fever")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/climate-fever queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/climate-fever')
index_ref = pt.IndexRef.of('./indices/beir_climate-fever') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/climate-fever")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/climate-fever docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/climate-fever')
# Index beir/climate-fever
indexer = pt.IterDictIndexer('./indices/beir_climate-fever')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition
---- | ----------
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/climate-fever")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export beir/climate-fever qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/climate-fever')
index_ref = pt.IndexRef.of('./indices/beir_climate-fever') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
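BEIR's headline metric is nDCG@10; it can be requested alongside the measures above in the same pt.Experiment call. A hedged sketch, under the same assumption of a pre-built index:
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/climate-fever')
index_ref = pt.IndexRef.of('./indices/beir_climate-fever') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [nDCG@10, nDCG@20, MAP]
)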
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the android StackExchange subforum.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/android")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/android queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/android')
index_ref = pt.IndexRef.of('./indices/beir_cqadupstack_android') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/android")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/android docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/android')
# Index beir/cqadupstack/android
indexer = pt.IterDictIndexer('./indices/beir_cqadupstack_android')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition
---- | ----------
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/android")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/android qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/android')
index_ref = pt.IndexRef.of('./indices/beir_cqadupstack_android') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the english StackExchange subforum.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/english")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/english queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/english')
index_ref = pt.IndexRef.of('./indices/beir_cqadupstack_english') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/english")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/english docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/english')
# Index beir/cqadupstack/english
indexer = pt.IterDictIndexer('./indices/beir_cqadupstack_english')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition
---- | ----------
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/english")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/english qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/english')
index_ref = pt.IndexRef.of('./indices/beir_cqadupstack_english') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the gaming StackExchange subforum.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/gaming")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/gaming queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/gaming')
index_ref = pt.IndexRef.of('./indices/beir_cqadupstack_gaming') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/gaming")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/gaming docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/gaming')
# Index beir/cqadupstack/gaming
indexer = pt.IterDictIndexer('./indices/beir_cqadupstack_gaming')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition
---- | ----------
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/gaming")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/gaming qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/gaming')
index_ref = pt.IndexRef.of('./indices/beir_cqadupstack_gaming') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the gis StackExchange subforum.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/gis")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/gis queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/gis')
index_ref = pt.IndexRef.of('./indices/beir_cqadupstack_gis') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/gis")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/gis docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/gis')
# Index beir/cqadupstack/gis
indexer = pt.IterDictIndexer('./indices/beir_cqadupstack_gis')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition
---- | ----------
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/gis")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/gis qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/gis')
index_ref = pt.IndexRef.of('./indices/beir_cqadupstack_gis') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the mathematica StackExchange subforum.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/mathematica")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/mathematica queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/mathematica')
index_ref = pt.IndexRef.of('./indices/beir_cqadupstack_mathematica') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/mathematica")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/mathematica docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/mathematica')
# Index beir/cqadupstack/mathematica
indexer = pt.IterDictIndexer('./indices/beir_cqadupstack_mathematica')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition
---- | ----------
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/mathematica")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/mathematica qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/mathematica')
index_ref = pt.IndexRef.of('./indices/beir_cqadupstack_mathematica') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the physics StackExchange subforum.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/physics")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/physics queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/physics')
index_ref = pt.IndexRef.of('./indices/beir_cqadupstack_physics') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/physics")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/physics docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/physics')
# Index beir/cqadupstack/physics
indexer = pt.IterDictIndexer('./indices/beir_cqadupstack_physics')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition
---- | ----------
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/physics")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/physics qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/physics')
index_ref = pt.IndexRef.of('./indices/beir_cqadupstack_physics') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the programmers StackExchange subforum.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/programmers")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/programmers queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/programmers')
index_ref = pt.IndexRef.of('./indices/beir_cqadupstack_programmers') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/programmers")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/programmers docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/programmers')
# Index beir/cqadupstack/programmers
indexer = pt.IterDictIndexer('./indices/beir_cqadupstack_programmers')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition
---- | ----------
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/programmers")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/programmers qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/programmers')
index_ref = pt.IndexRef.of('./indices/beir_cqadupstack_programmers') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the stats StackExchange subforum.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/stats")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/stats queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/stats')
index_ref = pt.IndexRef.of('./indices/beir_cqadupstack_stats') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/stats")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/stats docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/stats')
# Index beir/cqadupstack/stats
indexer = pt.IterDictIndexer('./indices/beir_cqadupstack_stats')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition
---- | ----------
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/stats")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/stats qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/stats')
index_ref = pt.IndexRef.of('./indices/beir_cqadupstack_stats') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the tex StackExchange subforum.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/tex")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/tex queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/tex')
index_ref = pt.IndexRef.of('./indices/beir_cqadupstack_tex') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/tex")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/tex docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/tex')
# Index beir/cqadupstack/tex
indexer = pt.IterDictIndexer('./indices/beir_cqadupstack_tex')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition
---- | ----------
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/tex")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/tex qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/tex')
index_ref = pt.IndexRef.of('./indices/beir_cqadupstack_tex') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the unix StackExchange subforum.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/unix")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/unix queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/unix')
index_ref = pt.IndexRef.of('./indices/beir_cqadupstack_unix') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/unix")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/unix docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/unix')
# Index beir/cqadupstack/unix
indexer = pt.IterDictIndexer('./indices/beir_cqadupstack_unix')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition
---- | ----------
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/unix")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/unix qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/unix')
index_ref = pt.IndexRef.of('./indices/beir_cqadupstack_unix') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the webmasters StackExchange subforum.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/webmasters")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/webmasters queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/webmasters')
index_ref = pt.IndexRef.of('./indices/beir_cqadupstack_webmasters') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/webmasters")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/webmasters docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/webmasters')
# Index beir/cqadupstack/webmasters
indexer = pt.IterDictIndexer('./indices/beir_cqadupstack_webmasters')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition
---- | ----------
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/webmasters")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/webmasters qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/webmasters')
index_ref = pt.IndexRef.of('./indices/beir_cqadupstack_webmasters') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the wordpress StackExchange subforum.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/wordpress")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/wordpress queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/wordpress')
index_ref = pt.IndexRef.of('./indices/beir_cqadupstack_wordpress') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/wordpress")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/wordpress docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/wordpress')
# Index beir/cqadupstack/wordpress
indexer = pt.IterDictIndexer('./indices/beir_cqadupstack_wordpress')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition
---- | ----------
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/wordpress")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export beir/cqadupstack/wordpress qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/cqadupstack/wordpress')
index_ref = pt.IndexRef.of('./indices/beir_cqadupstack_wordpress') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
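CQADupStack is distributed as the twelve per-forum subsets listed above, and results are commonly macro-averaged across forums (as in the BEIR paper). A hedged sketch of that loop; it assumes an index has already been built for each forum and that pt.Experiment names the measure column 'nDCG@10':
import pyterrier as pt
from pyterrier.measures import *
pt.init()
forums = ['android', 'english', 'gaming', 'gis', 'mathematica', 'physics',
          'programmers', 'stats', 'tex', 'unix', 'webmasters', 'wordpress']
scores = []
for forum in forums:
    dataset = pt.get_dataset(f'irds:beir/cqadupstack/{forum}')
    index_ref = pt.IndexRef.of(f'./indices/beir_cqadupstack_{forum}') # assumes you have already built an index
    bm25 = pt.BatchRetrieve(index_ref, wmodel='BM25')
    df = pt.Experiment([bm25], dataset.get_topics('text'), dataset.get_qrels(), [nDCG@10])
    scores.append(df['nDCG@10'].iloc[0]) # assumed column name for the measure
print('macro-averaged nDCG@10:', sum(scores) / len(scores))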
A version of the DBPedia-Entity-v2 dataset for entity retrieval.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/dbpedia-entity")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/dbpedia-entity queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/dbpedia-entity')
index_ref = pt.IndexRef.of('./indices/beir_dbpedia-entity') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/dbpedia-entity")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/dbpedia-entity docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/dbpedia-entity')
# Index beir/dbpedia-entity
indexer = pt.IterDictIndexer('./indices/beir_dbpedia-entity')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
A random sample of 67 queries from the official test set, used as a dev set.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/dbpedia-entity/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/dbpedia-entity/dev queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/dbpedia-entity/dev')
index_ref = pt.IndexRef.of('./indices/beir_dbpedia-entity') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Note: Uses docs from beir/dbpedia-entity
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/dbpedia-entity/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/dbpedia-entity/dev docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/dbpedia-entity/dev')
# Index beir/dbpedia-entity
indexer = pt.IterDictIndexer('./indices/beir_dbpedia-entity')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
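As the note above says, the /dev split shares its document collection with beir/dbpedia-entity, so a single index serves both /dev and /test. A hedged sketch reusing only calls shown elsewhere on this page:
import pyterrier as pt
pt.init()
index_ref = pt.IndexRef.of('./indices/beir_dbpedia-entity') # assumes the shared index was built once
bm25 = pt.BatchRetrieve(index_ref, wmodel='BM25')
dev = pt.get_dataset('irds:beir/dbpedia-entity/dev')
test = pt.get_dataset('irds:beir/dbpedia-entity/test')
dev_results = bm25(dev.get_topics('text'))
test_results = bm25(test.get_topics('text'))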
Relevance levels
Rel. | Definition
---- | ----------
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/dbpedia-entity/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export beir/dbpedia-entity/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/dbpedia-entity/dev')
index_ref = pt.IndexRef.of('./indices/beir_dbpedia-entity') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
The official test set, without the 67 queries used as a dev set.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/dbpedia-entity/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/dbpedia-entity/test queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/dbpedia-entity/test')
index_ref = pt.IndexRef.of('./indices/beir_dbpedia-entity') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Note: Uses docs from beir/dbpedia-entity
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/dbpedia-entity/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/dbpedia-entity/test docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/dbpedia-entity/test')
# Index beir/dbpedia-entity
indexer = pt.IterDictIndexer('./indices/beir_dbpedia-entity')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition
---- | ----------
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/dbpedia-entity/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export beir/dbpedia-entity/test qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/dbpedia-entity/test')
index_ref = pt.IndexRef.of('./indices/beir_dbpedia-entity') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
A version of the FEVER dataset for fact verification. Includes queries from the /train, /dev, and /test subsets.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/fever")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/fever queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/fever')
index_ref = pt.IndexRef.of('./indices/beir_fever') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/fever")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/fever docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/fever')
# Index beir/fever
indexer = pt.IterDictIndexer('./indices/beir_fever')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
The official dev set.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/fever/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/fever/dev queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/fever/dev')
index_ref = pt.IndexRef.of('./indices/beir_fever') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Note: Uses docs from beir/fever
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/fever/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/fever/dev docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/fever/dev')
# Index beir/fever
indexer = pt.IterDictIndexer('./indices/beir_fever')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition
---- | ----------
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/fever/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export beir/fever/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/fever/dev')
index_ref = pt.IndexRef.of('./indices/beir_fever') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
The official test set.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/fever/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/fever/test queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/fever/test')
index_ref = pt.IndexRef.of('./indices/beir_fever') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Note: Uses docs from beir/fever
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/fever/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/fever/test docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/fever/test')
# Index beir/fever
indexer = pt.IterDictIndexer('./indices/beir_fever')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition
---- | ----------
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/fever/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export beir/fever/test qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/fever/test')
index_ref = pt.IndexRef.of('./indices/beir_fever') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
The official train set.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/fever/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/fever/train queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/fever/train')
index_ref = pt.IndexRef.of('./indices/beir_fever') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Note: Uses docs from beir/fever
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/fever/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/fever/train docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/fever/train')
# Index beir/fever
indexer = pt.IterDictIndexer('./indices/beir_fever')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition
---- | ----------
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/fever/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export beir/fever/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/fever/train')
index_ref = pt.IndexRef.of('./indices/beir_fever') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
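The /train qrels are more often used to mine supervision for learned rankers than for evaluation. A hedged sketch that pairs each query with the text of its relevant documents, assuming the standard ir_datasets docs_store() random-access API:
import ir_datasets
dataset = ir_datasets.load("beir/fever/train")
queries = {q.query_id: q.text for q in dataset.queries_iter()}
docstore = dataset.docs_store() # random access to documents by doc_id
pairs = []
for qrel in dataset.qrels_iter():
    if qrel.relevance > 0:
        doc = docstore.get(qrel.doc_id)
        pairs.append((queries[qrel.query_id], doc.text))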
A version of the FIQA-2018 dataset (financial opinion question answering). Queries include those in the /train, /dev, and /test subsets.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/fiqa")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/fiqa queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/fiqa')
index_ref = pt.IndexRef.of('./indices/beir_fiqa') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/fiqa")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/fiqa docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/fiqa')
# Index beir/fiqa
indexer = pt.IterDictIndexer('./indices/beir_fiqa')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Random sample of 500 queries from the official dataset.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/fiqa/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/fiqa/dev queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/fiqa/dev')
index_ref = pt.IndexRef.of('./indices/beir_fiqa') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Note: Uses docs from beir/fiqa
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/fiqa/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/fiqa/dev docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/fiqa/dev')
# Index beir/fiqa
indexer = pt.IterDictIndexer('./indices/beir_fiqa')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition
---- | ----------
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/fiqa/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export beir/fiqa/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/fiqa/dev')
index_ref = pt.IndexRef.of('./indices/beir_fiqa') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
Random sample of 648 queries from the official dataset.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/fiqa/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/fiqa/test queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/fiqa/test')
index_ref = pt.IndexRef.of('./indices/beir_fiqa') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Note: Uses docs from beir/fiqa
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/fiqa/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/fiqa/test docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/fiqa/test')
# Index beir/fiqa
indexer = pt.IterDictIndexer('./indices/beir_fiqa')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition
---- | ----------
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/fiqa/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export beir/fiqa/test qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/fiqa/test')
index_ref = pt.IndexRef.of('./indices/beir_fiqa') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
Official dataset without the 1148 queries sampled for /dev and /test.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/fiqa/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/fiqa/train queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/fiqa/train')
index_ref = pt.IndexRef.of('./indices/beir_fiqa') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Note: Uses docs from beir/fiqa
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/fiqa/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/fiqa/train docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/fiqa/train')
# Index beir/fiqa
indexer = pt.IterDictIndexer('./indices/beir_fiqa')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition
---- | ----------
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/fiqa/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export beir/fiqa/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/fiqa/train')
index_ref = pt.IndexRef.of('./indices/beir_fiqa') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics('text'),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
A version of the HotpotQA dataset, for multi-hop question answering. Queries include all those in /train, /dev, and /test.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/hotpotqa")
for query in dataset.queries_iter():
query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/hotpotqa queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/hotpotqa')
index_ref = pt.IndexRef.of('./indices/beir_hotpotqa') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/hotpotqa")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/hotpotqa docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/hotpotqa')
# Index beir/hotpotqa
indexer = pt.IterDictIndexer('./indices/beir_hotpotqa')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
A random selection of 5447 queries from /train, used here as a dev set.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/hotpotqa/dev")
for query in dataset.queries_iter():
query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
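Because this dev split is sampled out of the official train set, its query IDs should not also appear in beir/hotpotqa/train. A quick sanity check (a sketch, assuming both splits are available locally):
import ir_datasets
dev_ids = {q.query_id for q in ir_datasets.load("beir/hotpotqa/dev").queries_iter()}
train_ids = {q.query_id for q in ir_datasets.load("beir/hotpotqa/train").queries_iter()}
# the sampled dev queries are removed from /train, so the overlap should be empty
print(len(dev_ids), len(train_ids), len(dev_ids & train_ids))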
ir_datasets export beir/hotpotqa/dev queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/hotpotqa/dev')
index_ref = pt.IndexRef.of('./indices/beir_hotpotqa') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Note: Uses docs from beir/hotpotqa
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/hotpotqa/dev")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/hotpotqa/dev docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/hotpotqa/dev')
# Index beir/hotpotqa
indexer = pt.IterDictIndexer('./indices/beir_hotpotqa')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition |
---|
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/hotpotqa/dev")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export beir/hotpotqa/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/hotpotqa/dev')
index_ref = pt.IndexRef.of('./indices/beir_hotpotqa') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics('text'),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
The official dev set from HotpotQA, used here as a test set.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/hotpotqa/test")
for query in dataset.queries_iter():
query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/hotpotqa/test queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/hotpotqa/test')
index_ref = pt.IndexRef.of('./indices/beir_hotpotqa') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Note: Uses docs from beir/hotpotqa
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/hotpotqa/test")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/hotpotqa/test docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/hotpotqa/test')
# Index beir/hotpotqa
indexer = pt.IterDictIndexer('./indices/beir_hotpotqa')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition |
---|
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/hotpotqa/test")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
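HotpotQA is a multi-hop task, so each question is usually judged against more than one supporting passage. A minimal sketch that summarises how many judged documents each test query has:
import ir_datasets
from collections import Counter
dataset = ir_datasets.load("beir/hotpotqa/test")
per_query = Counter(qrel.query_id for qrel in dataset.qrels_iter())
print("queries:", len(per_query))
print("average judged docs per query:", sum(per_query.values()) / len(per_query))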
ir_datasets export beir/hotpotqa/test qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/hotpotqa/test')
index_ref = pt.IndexRef.of('./indices/beir_hotpotqa') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics('text'),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
The official train set, without the 5447 randomly selected queries used for /dev.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/hotpotqa/train")
for query in dataset.queries_iter():
query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/hotpotqa/train queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/hotpotqa/train')
index_ref = pt.IndexRef.of('./indices/beir_hotpotqa') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Note: Uses docs from beir/hotpotqa
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/hotpotqa/train")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/hotpotqa/train docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/hotpotqa/train')
# Index beir/hotpotqa
indexer = pt.IterDictIndexer('./indices/beir_hotpotqa')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition |
---|
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/hotpotqa/train")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export beir/hotpotqa/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/hotpotqa/train')
index_ref = pt.IndexRef.of('./indices/beir_hotpotqa') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics('text'),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
A version of the MS MARCO passage ranking dataset. Includes queries from the /train, /dev, and /test sub-datasets.
Note that this version differs from msmarco-passage, in that it does not correct the encoding problems in the source documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/msmarco")
for query in dataset.queries_iter():
query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/msmarco queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/msmarco')
index_ref = pt.IndexRef.of('./indices/beir_msmarco') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/msmarco")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
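Because this version keeps the original (uncorrected) text encoding, a passage here can differ textually from its msmarco-passage counterpart. A sketch that fetches one passage from each version for a side-by-side look; the shared passage ID is an assumption used only for illustration (both corpora derive from the same collection, but verify the IDs you care about):
import ir_datasets
beir_docs = ir_datasets.load("beir/msmarco").docs_store()
msp_docs = ir_datasets.load("msmarco-passage").docs_store()
doc_id = "0"  # hypothetical example ID; substitute a passage you want to inspect
print(beir_docs.get(doc_id).text)
print(msp_docs.get(doc_id).text)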
ir_datasets export beir/msmarco docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/msmarco')
# Index beir/msmarco
indexer = pt.IterDictIndexer('./indices/beir_msmarco')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
A version of the MS MARCO passage ranking dev set.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/msmarco/dev")
for query in dataset.queries_iter():
query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/msmarco/dev queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/msmarco/dev')
index_ref = pt.IndexRef.of('./indices/beir_msmarco') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Note: Uses docs from beir/msmarco
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/msmarco/dev")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/msmarco/dev docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/msmarco/dev')
# Index beir/msmarco
indexer = pt.IterDictIndexer('./indices/beir_msmarco')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition |
---|
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/msmarco/dev")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export beir/msmarco/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/msmarco/dev')
index_ref = pt.IndexRef.of('./indices/beir_msmarco') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics('text'),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
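MS MARCO dev is commonly reported with MRR@10 alongside other measures; a variant of the experiment above with that cutoff added (same assumed index as above):
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/msmarco/dev')
index_ref = pt.IndexRef.of('./indices/beir_msmarco') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
pt.Experiment(
[pipeline],
dataset.get_topics('text'),
dataset.get_qrels(),
[RR@10, MAP, nDCG@10]
)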
A version of the TREC Deep Learning 2019 (passage ranking) queries and judgments, used here as the test set.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/msmarco/test")
for query in dataset.queries_iter():
query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/msmarco/test queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/msmarco/test')
index_ref = pt.IndexRef.of('./indices/beir_msmarco') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Note: Uses docs from beir/msmarco
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/msmarco/test")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/msmarco/test docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/msmarco/test')
# Index beir/msmarco
indexer = pt.IterDictIndexer('./indices/beir_msmarco')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition |
---|
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/msmarco/test")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
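The TREC Deep Learning 2019 judgments are typically graded rather than binary, so it can be worth inspecting the grade distribution before choosing evaluation measures. A minimal sketch that tallies whatever relevance values this version exposes:
import ir_datasets
from collections import Counter
dataset = ir_datasets.load("beir/msmarco/test")
grades = Counter(qrel.relevance for qrel in dataset.qrels_iter())
print(sorted(grades.items()))  # (relevance grade, number of judgments)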
ir_datasets export beir/msmarco/test qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/msmarco/test')
index_ref = pt.IndexRef.of('./indices/beir_msmarco') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics('text'),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
A version of the MS MARCO passage ranking train set.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/msmarco/train")
for query in dataset.queries_iter():
query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/msmarco/train queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/msmarco/train')
index_ref = pt.IndexRef.of('./indices/beir_msmarco') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Note: Uses docs from beir/msmarco
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/msmarco/train")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/msmarco/train docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/msmarco/train')
# Index beir/msmarco
indexer = pt.IterDictIndexer('./indices/beir_msmarco')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition |
---|
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/msmarco/train")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export beir/msmarco/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/msmarco/train')
index_ref = pt.IndexRef.of('./indices/beir_msmarco') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics('text'),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
A version of the NF Corpus (Nutrition Facts). Queries use the "title" variant of the query, which here is often a natural-language question. Queries include all those from /train, /dev, and /test.
Data pre-processing may differ from what is done in nfcorpus.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/nfcorpus")
for query in dataset.queries_iter():
query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
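To see what the "title" query variant looks like (often a natural-language question, as noted above), a quick peek at the first few queries:
import ir_datasets
from itertools import islice
dataset = ir_datasets.load("beir/nfcorpus")
for query in islice(dataset.queries_iter(), 3):
    print(query.query_id, query.text)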
ir_datasets export beir/nfcorpus queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/nfcorpus')
index_ref = pt.IndexRef.of('./indices/beir_nfcorpus') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/nfcorpus")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/nfcorpus docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/nfcorpus')
# Index beir/nfcorpus
indexer = pt.IterDictIndexer('./indices/beir_nfcorpus')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Combined dev set of NFCorpus.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/nfcorpus/dev")
for query in dataset.queries_iter():
query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/nfcorpus/dev queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/nfcorpus/dev')
index_ref = pt.IndexRef.of('./indices/beir_nfcorpus') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Note: Uses docs from beir/nfcorpus
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/nfcorpus/dev")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/nfcorpus/dev docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/nfcorpus/dev')
# Index beir/nfcorpus
indexer = pt.IterDictIndexer('./indices/beir_nfcorpus')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition |
---|
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/nfcorpus/dev")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export beir/nfcorpus/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/nfcorpus/dev')
index_ref = pt.IndexRef.of('./indices/beir_nfcorpus') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics('text'),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
Combined test set of NFCorpus.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/nfcorpus/test")
for query in dataset.queries_iter():
query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/nfcorpus/test queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/nfcorpus/test')
index_ref = pt.IndexRef.of('./indices/beir_nfcorpus') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Note: Uses docs from beir/nfcorpus
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/nfcorpus/test")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/nfcorpus/test docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/nfcorpus/test')
# Index beir/nfcorpus
indexer = pt.IterDictIndexer('./indices/beir_nfcorpus')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition |
---|
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/nfcorpus/test")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export beir/nfcorpus/test qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/nfcorpus/test')
index_ref = pt.IndexRef.of('./indices/beir_nfcorpus') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics('text'),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
Combined train set of NFCorpus.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/nfcorpus/train")
for query in dataset.queries_iter():
query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/nfcorpus/train queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/nfcorpus/train')
index_ref = pt.IndexRef.of('./indices/beir_nfcorpus') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Note: Uses docs from beir/nfcorpus
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/nfcorpus/train")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/nfcorpus/train docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/nfcorpus/train')
# Index beir/nfcorpus
indexer = pt.IterDictIndexer('./indices/beir_nfcorpus')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition |
---|
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/nfcorpus/train")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export beir/nfcorpus/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/nfcorpus/train')
index_ref = pt.IndexRef.of('./indices/beir_nfcorpus') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics('text'),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
A version of the Natural Questions dev dataset.
Data pre-processing differs from what is done in both natural-questions and dpr-w100/natural-questions, especially with respect to the document collection and the filtering applied to the queries. See the Beir paper for details.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/nq")
for query in dataset.queries_iter():
query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/nq queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/nq')
index_ref = pt.IndexRef.of('./indices/beir_nq') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/nq")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/nq docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/nq')
# Index beir/nq
indexer = pt.IterDictIndexer('./indices/beir_nq')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition |
---|
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/nq")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export beir/nq qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/nq')
index_ref = pt.IndexRef.of('./indices/beir_nq') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics('text'),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
A version of the Quora duplicate question detection dataset (QQP). Includes queries from the /dev and /test sets.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/quora")
for query in dataset.queries_iter():
query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/quora queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/quora')
index_ref = pt.IndexRef.of('./indices/beir_quora') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/quora")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/quora docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/quora')
# Index beir/quora
indexer = pt.IterDictIndexer('./indices/beir_quora')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
A 5,000-question subset of the original dataset, with no overlap with the other subsets.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/quora/dev")
for query in dataset.queries_iter():
query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/quora/dev queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/quora/dev')
index_ref = pt.IndexRef.of('./indices/beir_quora') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Note: Uses docs from beir/quora
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/quora/dev")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/quora/dev docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/quora/dev')
# Index beir/quora
indexer = pt.IterDictIndexer('./indices/beir_quora')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition |
---|
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/quora/dev")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
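Since Quora documents are themselves questions, a judged pair is simply a question together with one of its marked duplicates. A sketch that prints one such pair by combining the qrels above with the query and document lookups (standard ir_datasets API only):
import ir_datasets
dataset = ir_datasets.load("beir/quora/dev")
queries = {q.query_id: q.text for q in dataset.queries_iter()}
docs = dataset.docs_store()
qrel = next(iter(dataset.qrels_iter()))  # take an arbitrary judged pair
print("query:", queries[qrel.query_id])
print("judged duplicate:", docs.get(qrel.doc_id).text)
print("relevance:", qrel.relevance)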
ir_datasets export beir/quora/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/quora/dev')
index_ref = pt.IndexRef.of('./indices/beir_quora') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics('text'),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
A 10,000-question subset of the original dataset, with no overlap with the other subsets.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/quora/test")
for query in dataset.queries_iter():
query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/quora/test queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/quora/test')
index_ref = pt.IndexRef.of('./indices/beir_quora') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Note: Uses docs from beir/quora
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/quora/test")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/quora/test docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/quora/test')
# Index beir/quora
indexer = pt.IterDictIndexer('./indices/beir_quora')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition |
---|
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/quora/test")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export beir/quora/test qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/quora/test')
index_ref = pt.IndexRef.of('./indices/beir_quora') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics('text'),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
A version of the SciDocs dataset, used for citation retrieval.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/scidocs")
for query in dataset.queries_iter():
query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/scidocs queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/scidocs')
index_ref = pt.IndexRef.of('./indices/beir_scidocs') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/scidocs")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/scidocs docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/scidocs')
# Index beir/scidocs
indexer = pt.IterDictIndexer('./indices/beir_scidocs')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition |
---|
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/scidocs")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export beir/scidocs qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/scidocs')
index_ref = pt.IndexRef.of('./indices/beir_scidocs') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics('text'),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
A version of the SciFact dataset, for fact verification. Queries include those from the /train and /test sets.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/scifact")
for query in dataset.queries_iter():
query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/scifact queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/scifact')
index_ref = pt.IndexRef.of('./indices/beir_scifact') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/scifact")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/scifact docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/scifact')
# Index beir/scifact
indexer = pt.IterDictIndexer('./indices/beir_scifact')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
The official dev set, used here as the test set.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/scifact/test")
for query in dataset.queries_iter():
query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/scifact/test queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/scifact/test')
index_ref = pt.IndexRef.of('./indices/beir_scifact') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Note: Uses docs from beir/scifact
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/scifact/test")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/scifact/test docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/scifact/test')
# Index beir/scifact
indexer = pt.IterDictIndexer('./indices/beir_scifact')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition |
---|
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/scifact/test")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
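In this fact-verification setup a judged document is an abstract cited as evidence for the claim. A sketch that shows one claim together with the title of one of its judged abstracts (standard ir_datasets API only):
import ir_datasets
dataset = ir_datasets.load("beir/scifact/test")
queries = {q.query_id: q.text for q in dataset.queries_iter()}
docs = dataset.docs_store()
qrel = next(iter(dataset.qrels_iter()))  # an arbitrary claim / evidence pair
print("claim:", queries[qrel.query_id])
print("evidence title:", docs.get(qrel.doc_id).title)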
ir_datasets export beir/scifact/test qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/scifact/test')
index_ref = pt.IndexRef.of('./indices/beir_scifact') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics('text'),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
The official train set.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/scifact/train")
for query in dataset.queries_iter():
query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/scifact/train queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/scifact/train')
index_ref = pt.IndexRef.of('./indices/beir_scifact') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Note: Uses docs from beir/scifact
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/scifact/train")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/scifact/train docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/scifact/train')
# Index beir/scifact
indexer = pt.IterDictIndexer('./indices/beir_scifact')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition |
---|
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/scifact/train")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export beir/scifact/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/scifact/train')
index_ref = pt.IndexRef.of('./indices/beir_scifact') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics('text'),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
A version of the TREC COVID (complete) dataset, with titles and abstracts as documents. Queries are the question variant.
Data pre-processing may differ from what is done in cord19/trec-covid.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/trec-covid")
for query in dataset.queries_iter():
query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/trec-covid queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/trec-covid')
index_ref = pt.IndexRef.of('./indices/beir_trec-covid') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/trec-covid")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/trec-covid docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/trec-covid')
# Index beir/trec-covid
indexer = pt.IterDictIndexer('./indices/beir_trec-covid')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition |
---|
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/trec-covid")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export beir/trec-covid qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/trec-covid')
index_ref = pt.IndexRef.of('./indices/beir_trec-covid') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics('text'),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
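BEIR reports nDCG@10 as its headline measure, which makes it a natural addition for this dataset; a variant of the experiment above with that measure included (same assumed index):
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/trec-covid')
index_ref = pt.IndexRef.of('./indices/beir_trec-covid') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
pt.Experiment(
[pipeline],
dataset.get_topics('text'),
dataset.get_qrels(),
[nDCG@10, nDCG@20, MAP]
)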
A version of the Touché-2020 dataset, for argument retrieval.
Negative relevance judgments from the original dataset are replaced with 0.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/webis-touche2020")
for query in dataset.queries_iter():
query # namedtuple<query_id, text, metadata>
You can find more details about the Python API here.
ir_datasets export beir/webis-touche2020 queries
[query_id] [text] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/webis-touche2020')
index_ref = pt.IndexRef.of('./indices/beir_webis-touche2020') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/webis-touche2020")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text, title, metadata>
You can find more details about the Python API here.
ir_datasets export beir/webis-touche2020 docs
[doc_id] [text] [title] [metadata]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:beir/webis-touche2020')
# Index beir/webis-touche2020
indexer = pt.IterDictIndexer('./indices/beir_webis-touche2020')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text', 'title'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition |
---|
Examples:
import ir_datasets
dataset = ir_datasets.load("beir/webis-touche2020")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
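As noted in the description above, the negative grades in the original Touché judgments are mapped to 0 in this version. A quick check that no negative values remain:
import ir_datasets
dataset = ir_datasets.load("beir/webis-touche2020")
lowest = min(qrel.relevance for qrel in dataset.qrels_iter())
print("lowest relevance grade:", lowest)  # expected to be 0 after the remapping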
ir_datasets export beir/webis-touche2020 qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:beir/webis-touche2020')
index_ref = pt.IndexRef.of('./indices/beir_webis-touche2020') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics('text'),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.