Github: datasets/beir.py

ir_datasets: Beir (benchmark suite)

Index
  1. beir
  2. beir/arguana
  3. beir/climate-fever
  4. beir/cqadupstack/android
  5. beir/cqadupstack/english
  6. beir/cqadupstack/gaming
  7. beir/cqadupstack/gis
  8. beir/cqadupstack/mathematica
  9. beir/cqadupstack/physics
  10. beir/cqadupstack/programmers
  11. beir/cqadupstack/stats
  12. beir/cqadupstack/tex
  13. beir/cqadupstack/unix
  14. beir/cqadupstack/webmasters
  15. beir/cqadupstack/wordpress
  16. beir/dbpedia-entity
  17. beir/dbpedia-entity/dev
  18. beir/dbpedia-entity/test
  19. beir/fever
  20. beir/fever/dev
  21. beir/fever/test
  22. beir/fever/train
  23. beir/fiqa
  24. beir/fiqa/dev
  25. beir/fiqa/test
  26. beir/fiqa/train
  27. beir/hotpotqa
  28. beir/hotpotqa/dev
  29. beir/hotpotqa/test
  30. beir/hotpotqa/train
  31. beir/msmarco
  32. beir/msmarco/dev
  33. beir/msmarco/test
  34. beir/msmarco/train
  35. beir/nfcorpus
  36. beir/nfcorpus/dev
  37. beir/nfcorpus/test
  38. beir/nfcorpus/train
  39. beir/nq
  40. beir/quora
  41. beir/quora/dev
  42. beir/quora/test
  43. beir/scidocs
  44. beir/scifact
  45. beir/scifact/test
  46. beir/scifact/train
  47. beir/trec-covid
  48. beir/webis-touche2020

"beir"

Beir is a suite of benchmarks for testing the zero-shot transfer of information retrieval models.

Citation

ir_datasets.bib:

\cite{Thakur2021Beir}

Bibtex:

@article{Thakur2021Beir,
  title = "BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models",
  author = "Thakur, Nandan and Reimers, Nils and Rücklé, Andreas and Srivastava, Abhishek and Gurevych, Iryna",
  journal = "arXiv preprint arXiv:2104.08663",
  month = "4",
  year = "2021",
  url = "https://arxiv.org/abs/2104.08663",
}

"beir/arguana"

A version of the ArguAna Counterargs dataset, for argument retrieval.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/arguana")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>

You can find more details about the Python API here.
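
Each record yielded by queries_iter() can be read by field name. The sketch below uses a local stand-in namedtuple with the same three fields as BeirQuery, so it runs without downloading anything. The QueryRecord type, the summarize helper, and the sample values are illustrative assumptions, not part of ir_datasets.

```python
from typing import Dict, Iterable, List, NamedTuple


class QueryRecord(NamedTuple):
    """Local stand-in with the same fields as BeirQuery (hypothetical)."""
    query_id: str
    text: str
    metadata: Dict[str, str]


def summarize(queries: Iterable[QueryRecord], max_chars: int = 40) -> List[str]:
    """Return one 'query_id: text' line per query, truncating long text."""
    lines = []
    for query in queries:
        text = query.text
        if len(text) > max_chars:
            text = text[:max_chars] + "..."
        lines.append(f"{query.query_id}: {text}")
    return lines


# Made-up sample records; with the real data you would pass
# ir_datasets.load("beir/arguana").queries_iter() instead.
sample = [
    QueryRecord("q1", "short query", {}),
    QueryRecord("q2", "x" * 50, {"source": "example"}),
]
summary = summarize(sample)
```

Because the helper only relies on the query_id and text attributes, it should work unchanged on any of the datasets listed on this page.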


"beir/climate-fever"

A version of the CLIMATE-FEVER dataset, for fact verification on claims about climate.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/climate-fever")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/cqadupstack/android"

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the android StackExchange subforum.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/android")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/cqadupstack/english"

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the english StackExchange subforum.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/english")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/cqadupstack/gaming"

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the gaming StackExchange subforum.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/gaming")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/cqadupstack/gis"

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the gis StackExchange subforum.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/gis")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/cqadupstack/mathematica"

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the mathematica StackExchange subforum.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/mathematica")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/cqadupstack/physics"

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the physics StackExchange subforum.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/physics")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/cqadupstack/programmers"

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the programmers StackExchange subforum.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/programmers")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/cqadupstack/stats"

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the stats StackExchange subforum.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/stats")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/cqadupstack/tex"

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the tex StackExchange subforum.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/tex")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/cqadupstack/unix"

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the unix StackExchange subforum.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/unix")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/cqadupstack/webmasters"

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the webmasters StackExchange subforum.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/webmasters")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/cqadupstack/wordpress"

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the wordpress StackExchange subforum.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/cqadupstack/wordpress")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/dbpedia-entity"

A version of the DBPedia-Entity-v2 dataset for entity retrieval.

queries | docs | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/dbpedia-entity")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/dbpedia-entity/dev"

A random sample of 67 queries from the official test set, used as a dev set.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/dbpedia-entity/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/dbpedia-entity/test"

The official test set, without the 67 queries used as a dev set.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/dbpedia-entity/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>
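
The /dev and /test subsets partition one query pool: a random sample is held out as /dev and the remainder forms /test. A minimal sketch of that kind of split follows. The seed, the split_holdout helper, and the placeholder IDs are assumptions for illustration, and this is not necessarily the procedure that produced the actual subsets.

```python
import random
from typing import List, Sequence, Tuple


def split_holdout(
    query_ids: Sequence[str], n_dev: int, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Hold out n_dev randomly chosen IDs as dev; the rest stay in test."""
    rng = random.Random(seed)  # fixed seed (an assumption) keeps the split reproducible
    dev = set(rng.sample(list(query_ids), n_dev))
    # Preserve the original ordering within each side of the split.
    dev_ids = [qid for qid in query_ids if qid in dev]
    test_ids = [qid for qid in query_ids if qid not in dev]
    return dev_ids, test_ids


# Placeholder pool of 100 IDs; 67 matches the dev sample size stated for /dev.
pool = [f"q{i}" for i in range(100)]
dev_ids, test_ids = split_holdout(pool, n_dev=67)
```

The two returned lists are disjoint by construction and together cover the whole pool, which mirrors how the /test description excludes exactly the queries moved to /dev.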



"beir/fever"

A version of the FEVER dataset for fact verification. Includes queries from the /train, /dev, and /test subsets.

queries | docs | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/fever")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>
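
This parent dataset exposes the queries of all three subsets at once. The sketch below shows one way such a combined query list could be assembled from per-subset lists, dropping repeated IDs. The merge_queries helper and the sample tuples are hypothetical, and ir_datasets may build the combined set differently.

```python
from typing import Dict, Iterable, List, Tuple

Query = Tuple[str, str]  # (query_id, text); simplified stand-in for BeirQuery


def merge_queries(subsets: Dict[str, Iterable[Query]]) -> List[Query]:
    """Concatenate subset queries, keeping the first record seen for each ID."""
    seen = set()
    merged = []
    for name in subsets:  # dicts preserve insertion order
        for query_id, text in subsets[name]:
            if query_id not in seen:
                seen.add(query_id)
                merged.append((query_id, text))
    return merged


# Made-up records standing in for the three subsets.
subsets = {
    "train": [("1", "claim a"), ("2", "claim b")],
    "dev": [("3", "claim c")],
    "test": [("4", "claim d"), ("1", "claim a")],
}
all_queries = merge_queries(subsets)
```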



"beir/fever/dev"

The official dev set.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/fever/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/fever/test"

The official test set.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/fever/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/fever/train"

The official train set.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/fever/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/fiqa"

A version of the FIQA-2018 dataset (financial opinion question answering). Queries include those in the /train, /dev, and /test subsets.

queries | docs | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/fiqa")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/fiqa/dev"

Random sample of 500 queries from the official dataset.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/fiqa/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/fiqa/test"

Random sample of 648 queries from the official dataset.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/fiqa/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/fiqa/train"

Official dataset without the 1148 queries sampled for /dev and /test.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/fiqa/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/hotpotqa"

A version of the Hotpot QA dataset for multi-hop question answering. Queries include all those in /train, /dev, and /test.

queries | docs | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/hotpotqa")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/hotpotqa/dev"

Random selection of 5447 queries from /train.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/hotpotqa/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/hotpotqa/test"

Official dev set from HotpotQA, here used as a test set.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/hotpotqa/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/hotpotqa/train"

Official train set, without the 5447 randomly selected queries used for /dev.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/hotpotqa/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/msmarco"

A version of the MS MARCO passage ranking dataset. Includes queries from the /train, /dev, and /test sub-datasets.

Note that this version differs from msmarco-passage, in that it does not correct the encoding problems in the source documents.

queries | docs | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/msmarco")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>
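
The note on this dataset mentions uncorrected encoding problems but does not say what they look like. A common form is UTF-8 text that was decoded with a single-byte codec (mojibake). Assuming that is the kind of damage involved, the sketch below shows how such text arises and one way to reverse it. The repair_mojibake helper and the cp1252 default are assumptions, and this is not presented as how msmarco-passage corrects its text.

```python
def repair_mojibake(text: str, wrong_codec: str = "cp1252") -> str:
    """Undo UTF-8 bytes mis-decoded as wrong_codec; return text unchanged on failure."""
    try:
        return text.encode(wrong_codec).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not mojibake of this kind, so leave it alone.
        return text


# Simulate the damage: encode correctly, then decode with the wrong codec.
original = "caf\u00e9"
damaged = original.encode("utf-8").decode("cp1252")
repaired = repair_mojibake(damaged)
```

A round trip like this is only safe when the whole string was damaged the same way, which is why the helper falls back to the input rather than guessing.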



"beir/msmarco/dev"

A version of the MS MARCO passage ranking dev set.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/msmarco/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/msmarco/test"

A version of the TREC Deep Learning 2019 set.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/msmarco/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/msmarco/train"

A version of the MS MARCO passage ranking train set.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/msmarco/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/nfcorpus"

A version of the NF Corpus (Nutrition Facts). Queries use the "title" variant, which here is often a natural language question. Queries include all those from /train, /dev, and /test.

Data pre-processing may differ from what is done in nfcorpus.

queries | docs | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/nfcorpus")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/nfcorpus/dev"

Combined dev set of NFCorpus.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/nfcorpus/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/nfcorpus/test"

Combined test set of NFCorpus.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/nfcorpus/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/nfcorpus/train"

Combined train set of NFCorpus.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/nfcorpus/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/nq"

A version of the Natural Questions dev dataset.

Data pre-processing differs from what is done in both natural-questions and dpr-w100/natural-questions, especially with respect to the document collection and the filtering applied to the queries. See the Beir paper for details.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/nq")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/quora"

A version of the Quora duplicate question detection dataset (QQP). Includes queries from the /dev and /test sets.

queries | docs | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/quora")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/quora/dev"

A 5,000-question subset of the original dataset, with no overlap with the other subsets.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/quora/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/quora/test"

A 10,000-question subset of the original dataset, with no overlap with the other subsets.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/quora/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>
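
The Quora subsets are described as not overlapping one another. A quick way to check a claim like that is to intersect the query-ID sets pairwise. The sketch below does so on placeholder IDs. The find_overlaps helper is hypothetical, and in practice the ID lists would come from each subset's queries_iter().

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def find_overlaps(splits: Dict[str, Iterable[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Map each pair of split names to the IDs they share (non-empty overlaps only)."""
    id_sets = {name: set(ids) for name, ids in splits.items()}
    overlaps = {}
    for left, right in combinations(id_sets, 2):
        shared = id_sets[left] & id_sets[right]
        if shared:
            overlaps[(left, right)] = shared
    return overlaps


# Placeholder IDs: the first pair of splits is disjoint, the second is not.
clean = find_overlaps({"dev": ["a", "b"], "test": ["c", "d"]})
leaky = find_overlaps({"dev": ["a", "b"], "test": ["b", "c"]})
```

An empty result means no query ID appears in more than one split.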



"beir/scidocs"

A version of the SciDocs dataset, used for citation retrieval.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/scidocs")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/scifact"

A version of the SciFact dataset, for fact verification. Queries include those from the /train and /test sets.

queries | docs | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/scifact")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/scifact/test"

The official dev set.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/scifact/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/scifact/train"

The official train set.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/scifact/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/trec-covid"

A version of the TREC COVID (complete) dataset, with titles and abstracts as documents. Queries are the question variant.

Data pre-processing may differ from what is done in cord19/trec-covid.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/trec-covid")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>



"beir/webis-touche2020"

A version of the Touché-2020 dataset, for argument retrieval.

Negative relevance judgments from the original dataset are replaced with 0.

queries | docs | qrels | Citation

Language: en

Query type:
BeirQuery: (namedtuple)
  1. query_id: str
  2. text: str
  3. metadata: Dict[str,str]

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("beir/webis-touche2020")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, metadata>
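
Replacing negative judgments with 0, as described for this dataset, amounts to clamping each relevance label at zero. A minimal sketch follows, using a local stand-in for a qrel record. The Qrel tuple, the clamp_negative helper, and the sample labels are assumptions for illustration rather than the code used to build this dataset.

```python
from typing import Iterable, List, NamedTuple


class Qrel(NamedTuple):
    """Simplified stand-in for a relevance judgment (hypothetical)."""
    query_id: str
    doc_id: str
    relevance: int


def clamp_negative(qrels: Iterable[Qrel]) -> List[Qrel]:
    """Replace negative relevance labels with 0; leave all others unchanged."""
    return [q._replace(relevance=max(q.relevance, 0)) for q in qrels]


# Made-up judgments; only the negative label is changed.
raw = [Qrel("1", "d1", 2), Qrel("1", "d2", -2), Qrel("2", "d3", 0)]
clamped = clamp_negative(raw)
```

After clamping, a document that was judged negatively is indistinguishable from one judged as simply not relevant.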
