GitHub: datasets/miracl.py

ir_datasets: MIRACL

Index
  1. miracl
  2. miracl/ar
  3. miracl/ar/dev
  4. miracl/ar/test-a
  5. miracl/ar/test-b
  6. miracl/ar/train
  7. miracl/bn
  8. miracl/bn/dev
  9. miracl/bn/test-a
  10. miracl/bn/test-b
  11. miracl/bn/train
  12. miracl/de
  13. miracl/de/dev
  14. miracl/de/test-b
  15. miracl/en
  16. miracl/en/dev
  17. miracl/en/test-a
  18. miracl/en/test-b
  19. miracl/en/train
  20. miracl/es
  21. miracl/es/dev
  22. miracl/es/test-b
  23. miracl/es/train
  24. miracl/fa
  25. miracl/fa/dev
  26. miracl/fa/test-b
  27. miracl/fa/train
  28. miracl/fi
  29. miracl/fi/dev
  30. miracl/fi/test-a
  31. miracl/fi/test-b
  32. miracl/fi/train
  33. miracl/fr
  34. miracl/fr/dev
  35. miracl/fr/test-b
  36. miracl/fr/train
  37. miracl/hi
  38. miracl/hi/dev
  39. miracl/hi/test-b
  40. miracl/hi/train
  41. miracl/id
  42. miracl/id/dev
  43. miracl/id/test-a
  44. miracl/id/test-b
  45. miracl/id/train
  46. miracl/ja
  47. miracl/ja/dev
  48. miracl/ja/test-a
  49. miracl/ja/test-b
  50. miracl/ja/train
  51. miracl/ko
  52. miracl/ko/dev
  53. miracl/ko/test-a
  54. miracl/ko/test-b
  55. miracl/ko/train
  56. miracl/ru
  57. miracl/ru/dev
  58. miracl/ru/test-a
  59. miracl/ru/test-b
  60. miracl/ru/train
  61. miracl/sw
  62. miracl/sw/dev
  63. miracl/sw/test-a
  64. miracl/sw/test-b
  65. miracl/sw/train
  66. miracl/te
  67. miracl/te/dev
  68. miracl/te/test-a
  69. miracl/te/test-b
  70. miracl/te/train
  71. miracl/th
  72. miracl/th/dev
  73. miracl/th/test-a
  74. miracl/th/test-b
  75. miracl/th/train
  76. miracl/yo
  77. miracl/yo/dev
  78. miracl/yo/test-b
  79. miracl/zh
  80. miracl/zh/dev
  81. miracl/zh/test-b
  82. miracl/zh/train

"miracl"

MIRACL is a multilingual adhoc retrieval dataset covering 18 languages. The document corpora are based on Wikipedia dumps, which are split into passages.
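The index above shows that not every language has every split. For example, miracl/de and miracl/yo have only dev and test-b. The sketch below builds and validates dataset IDs before they are passed to `ir_datasets.load`. The helper `miracl_dataset_id` and the table `MIRACL_SPLITS` are hypothetical names. The table is transcribed from this page's index, not read from ir_datasets.

```python
from typing import Optional

# Hypothetical table transcribed from the index on this page (not part of ir_datasets).
MIRACL_SPLITS = {
    "ar": ("dev", "test-a", "test-b", "train"),
    "bn": ("dev", "test-a", "test-b", "train"),
    "de": ("dev", "test-b"),
    "en": ("dev", "test-a", "test-b", "train"),
    "es": ("dev", "test-b", "train"),
    "fa": ("dev", "test-b", "train"),
    "fi": ("dev", "test-a", "test-b", "train"),
    "fr": ("dev", "test-b", "train"),
    "hi": ("dev", "test-b", "train"),
    "id": ("dev", "test-a", "test-b", "train"),
    "ja": ("dev", "test-a", "test-b", "train"),
    "ko": ("dev", "test-a", "test-b", "train"),
    "ru": ("dev", "test-a", "test-b", "train"),
    "sw": ("dev", "test-a", "test-b", "train"),
    "te": ("dev", "test-a", "test-b", "train"),
    "th": ("dev", "test-a", "test-b", "train"),
    "yo": ("dev", "test-b"),
    "zh": ("dev", "test-b", "train"),
}


def miracl_dataset_id(lang: str, split: Optional[str] = None) -> str:
    """Build an ID such as "miracl/de/dev", rejecting language/split pairs not in the index."""
    if lang not in MIRACL_SPLITS:
        raise ValueError(f"unknown MIRACL language: {lang!r}")
    if split is None:
        # The bare language ID refers to the corpus (docs only).
        return f"miracl/{lang}"
    if split not in MIRACL_SPLITS[lang]:
        raise ValueError(f"miracl/{lang} has no {split!r} split")
    return f"miracl/{lang}/{split}"
```

The result can be used as `ir_datasets.load(miracl_dataset_id("de", "dev"))`. A call such as `miracl_dataset_id("de", "train")` fails early instead of at load time.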

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/ar"

The Arabic corpus.

docs
2.1M docs

Language: ar

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ar")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.
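Iterating over all 2.1M Arabic passages is unnecessary when only a few are needed, for example to display retrieved results. ir_datasets offers random access through `dataset.docs_store()`, whose `get_many(doc_ids)` returns a dict keyed by doc ID. The sketch below assumes only that interface. `passages_in_order` and `DictDocstore` are hypothetical names. The in-memory class stands in for the real store so the example runs offline.

```python
from collections import namedtuple

# Stand-in with the same fields as the MiraclDoc type documented above.
MiraclDoc = namedtuple("MiraclDoc", ["doc_id", "title", "text"])


class DictDocstore:
    """Hypothetical in-memory stand-in for dataset.docs_store(), for offline testing."""

    def __init__(self, docs):
        self._docs = {d.doc_id: d for d in docs}

    def get_many(self, doc_ids):
        # Mirrors the assumed behaviour: a dict of the IDs that were found.
        return {d: self._docs[d] for d in doc_ids if d in self._docs}


def passages_in_order(docstore, doc_ids):
    """Look up passages by ID, keep the requested order, and skip IDs not found."""
    found = docstore.get_many(doc_ids)
    return [found[d] for d in doc_ids if d in found]
```

With real data, the call would be `passages_in_order(ir_datasets.load("miracl/ar").docs_store(), ranked_ids)`, where `ranked_ids` is a list of doc IDs from a retrieval run.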

CLI
ir_datasets export miracl/ar docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.
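The CLI writes one document per line with tab-separated columns, as the header `[doc_id] [title] [text]` above indicates. Below is a minimal parser for that output when it is piped into Python. It assumes that no field contains a newline. Any tabs beyond the second are kept inside the text field. `parse_exported_docs` is a hypothetical helper.

```python
from collections import namedtuple

MiraclDoc = namedtuple("MiraclDoc", ["doc_id", "title", "text"])


def parse_exported_docs(lines):
    """Yield MiraclDoc tuples from tab-separated `ir_datasets export ... docs` output."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        # Split into at most three fields so stray tabs stay in the text.
        doc_id, title, text = line.split("\t", 2)
        yield MiraclDoc(doc_id, title, text)
```

For example, a script run as `ir_datasets export miracl/ar docs | python parse_docs.py` could loop over `parse_exported_docs(sys.stdin)`.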

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ar')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/ar/dev"

The dev set for Arabic.

queries
2.9K queries

Language: ar

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ar/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/ar/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ar.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
2.1M docs

Inherits docs from miracl/ar

Language: ar

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ar/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/ar/dev docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ar.dev')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
29K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition    Count  %
0     Not Relevant  24K    80.6%
1     Relevant      5.7K   19.4%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ar/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
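The relevance-level table for this split can be recomputed from `qrels_iter()`. The sketch below relies only on the `TrecQrel` fields documented above. The local namedtuple stands in for the real type so the example runs without downloading data. `relevance_distribution` is a hypothetical helper.

```python
from collections import Counter, namedtuple

TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def relevance_distribution(qrels):
    """Return {relevance_level: (count, percent)}, with percent rounded to one decimal."""
    counts = Counter(q.relevance for q in qrels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {rel: (n, round(100.0 * n / total, 1)) for rel, n in sorted(counts.items())}
```

Applied to `ir_datasets.load("miracl/ar/dev").qrels_iter()`, the output should correspond to the counts in the table above, before the table's K rounding.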

CLI
ir_datasets export miracl/ar/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.ar.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/ar/test-a"

The held-out test set (version a) for Arabic.
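The dev and train splits include qrels. The test-a and test-b entries on this page list only queries and docs. Evaluation code should therefore check for qrels before calling `qrels_iter()`. ir_datasets dataset objects expose `has_docs()`, `has_queries()` and `has_qrels()` for this purpose. The helper name `available_entities` is hypothetical.

```python
def available_entities(dataset):
    """List which entity types an ir_datasets dataset provides, in a fixed order."""
    checks = [
        ("docs", dataset.has_docs),
        ("queries", dataset.has_queries),
        ("qrels", dataset.has_qrels),
    ]
    return [name for name, has in checks if has()]
```

For example, `available_entities(ir_datasets.load("miracl/ar/test-a"))` lets a script skip evaluation when `"qrels"` is absent.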

queries
936 queries

Language: ar

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ar/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/ar/test-a queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ar.test-a.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
2.1M docs

Inherits docs from miracl/ar

Language: ar

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ar/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/ar/test-a docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ar.test-a')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/ar/test-b"

The held-out test set (version b) for Arabic.

queries
1.4K queries

Language: ar

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ar/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/ar/test-b queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ar.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
2.1M docs

Inherits docs from miracl/ar

Language: ar

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ar/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/ar/test-b docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ar.test-b')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/ar/train"

The train set for Arabic.

queries
3.5K queries

Language: ar

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ar/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/ar/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ar.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
2.1M docs

Inherits docs from miracl/ar

Language: ar

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ar/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/ar/train docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ar.train')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
25K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition    Count  %
0     Not Relevant  19K    75.5%
1     Relevant      6.2K   24.5%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ar/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
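The train qrels contain judged non-relevant passages (relevance 0) as well as relevant ones, so they can supply hard negatives for training data. The sketch below builds (query, positive, negative) triples on that basis. `training_triples` is a hypothetical helper. The namedtuples stand in for `GenericQuery` and `TrecQrel` with the fields documented on this page.

```python
import random
from collections import defaultdict, namedtuple

GenericQuery = namedtuple("GenericQuery", ["query_id", "text"])
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def training_triples(queries, qrels, seed=0):
    """Yield (query_text, positive_doc_id, negative_doc_id) triples.

    Positives are judged relevant (relevance > 0). Each positive is paired with one
    randomly chosen judged non-relevant passage of the same query. Queries with no
    judged negative are skipped.
    """
    rng = random.Random(seed)  # seeded for reproducible sampling
    positives, negatives = defaultdict(list), defaultdict(list)
    for q in qrels:
        bucket = positives if q.relevance > 0 else negatives
        bucket[q.query_id].append(q.doc_id)
    for query in queries:
        negs = negatives.get(query.query_id)
        if not negs:
            continue
        for pos in positives.get(query.query_id, []):
            yield query.text, pos, rng.choice(negs)
```

With real data: `ds = ir_datasets.load("miracl/ar/train")` followed by `training_triples(ds.queries_iter(), ds.qrels_iter())`. The triples hold doc IDs, and the passage text can then be fetched through the docs store.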

CLI
ir_datasets export miracl/ar/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.ar.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/bn"

The Bengali corpus.

docs
297K docs

Language: bn

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/bn")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/bn docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.bn')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/bn/dev"

The dev set for Bengali.

queries
411 queries

Language: bn

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/bn/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/bn/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.bn.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
297K docs

Inherits docs from miracl/bn

Language: bn

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/bn/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/bn/dev docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.bn.dev')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
4.2K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition    Count  %
0     Not Relevant  3.3K   79.5%
1     Relevant      863    20.5%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/bn/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
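Evaluation code usually needs qrels as a nested mapping from query ID to judged documents rather than a flat stream. The minimal conversion below uses the `TrecQrel` fields shown above. `qrels_by_query` is a hypothetical helper. Setting `min_relevance=1` keeps only the passages judged relevant.

```python
from collections import defaultdict, namedtuple

TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def qrels_by_query(qrels, min_relevance=0):
    """Group qrels into {query_id: {doc_id: relevance}}, keeping judgments >= min_relevance."""
    grouped = defaultdict(dict)
    for q in qrels:
        if q.relevance >= min_relevance:
            grouped[q.query_id][q.doc_id] = q.relevance
    return dict(grouped)
```

For instance, `qrels_by_query(ir_datasets.load("miracl/bn/dev").qrels_iter())` consumes the iterator once and returns an ordinary dict.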

CLI
ir_datasets export miracl/bn/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.bn.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/bn/test-a"

The held-out test set (version a) for Bengali.

queries
102 queries

Language: bn

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/bn/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/bn/test-a queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.bn.test-a.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
297K docs

Inherits docs from miracl/bn

Language: bn

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/bn/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/bn/test-a docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.bn.test-a')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/bn/test-b"

The held-out test set (version b) for Bengali.

queries
1.1K queries

Language: bn

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/bn/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/bn/test-b queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.bn.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
297K docs

Inherits docs from miracl/bn

Language: bn

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/bn/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/bn/test-b docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.bn.test-b')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/bn/train"

The train set for Bengali.

queries
1.6K queries

Language: bn

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/bn/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/bn/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.bn.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
297K docs

Inherits docs from miracl/bn

Language: bn

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/bn/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/bn/train docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.bn.train')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
17K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition    Count  %
0     Not Relevant  13K    77.0%
1     Relevant      3.9K   23.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/bn/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/bn/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.bn.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/de"

The German corpus.

docs
16M docs

Language: de

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/de")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.
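The German corpus holds 16M passages, so scanning it in full is expensive. For a quick look, `itertools.islice` reads a prefix of `docs_iter()` and stops. The sketch below estimates passage length from that prefix. `mean_token_count` is a hypothetical helper. Two caveats apply. First, the first passages in corpus order are not a random sample. Second, whitespace splitting is a rough proxy that does not suit unsegmented scripts such as Chinese, Japanese or Thai.

```python
from collections import namedtuple
from itertools import islice

MiraclDoc = namedtuple("MiraclDoc", ["doc_id", "title", "text"])


def mean_token_count(docs, n=1000):
    """Average whitespace-token count of the text field over the first n docs."""
    sample = list(islice(docs, n))  # reads at most n items; the rest is never touched
    if not sample:
        return 0.0
    return sum(len(d.text.split()) for d in sample) / len(sample)
```

For example: `mean_token_count(ir_datasets.load("miracl/de").docs_iter(), n=10_000)`.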

CLI
ir_datasets export miracl/de docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.de')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/de/dev"

The dev set for German.

queries
305 queries

Language: de

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/de/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/de/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.de.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
16M docs

Inherits docs from miracl/de

Language: de

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/de/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/de/dev docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.de.dev')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
3.1K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition    Count  %
0     Not Relevant  2.3K   74.2%
1     Relevant      811    25.8%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/de/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
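A run is a ranked list of doc IDs per query. Given a run and the dev qrels, each query can be scored. MIRACL results are commonly reported with nDCG@10. The sketch below computes that metric for a single query. It uses linear gains (gain equals the relevance value) and discounts rank r by log2(r + 1). `ndcg_at_k` is a hypothetical helper for illustration. For reported numbers, a dedicated evaluation library such as ir_measures is the safer choice.

```python
import math


def ndcg_at_k(ranking, judged, k=10):
    """nDCG@k for a single query.

    ranking: doc IDs ordered best first.
    judged:  {doc_id: relevance}; unjudged documents count as relevance 0.
    The item at 1-based rank r is discounted by log2(r + 1).
    """
    dcg = sum(judged.get(doc_id, 0) / math.log2(i + 2) for i, doc_id in enumerate(ranking[:k]))
    # Ideal DCG: the judged relevance values sorted from best to worst.
    ideal = sorted(judged.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```

Averaging `ndcg_at_k(run[qid], qrels[qid])` over the dev query IDs gives a mean nDCG@10. Here `run` and `qrels` are per-query mappings built by the caller.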

CLI
ir_datasets export miracl/de/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.de.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/de/test-b"

The held-out test set (version b) for German.

queries
712 queries

Language: de

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/de/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/de/test-b queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.de.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
16M docs

Inherits docs from miracl/de

Language: de

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/de/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/de/test-b docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.de.test-b')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/en"

The English corpus.

docs
33M docs

Language: en

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/en")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.
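With roughly 33M passages in miracl/en, a full pass over docs_iter() is expensive. For a quick look at the corpus, here is a minimal sketch that consumes only the first few documents with itertools.islice and reports their average length in whitespace-separated tokens. The helper names and the whitespace tokenization are illustrative assumptions; the code only relies on each document exposing the title and text fields listed above, mirrored here by a local MiraclDoc stand-in.

```python
from itertools import islice
from typing import Iterable, List, NamedTuple


class MiraclDoc(NamedTuple):
    """Local stand-in mirroring the documented MiraclDoc fields."""
    doc_id: str
    title: str
    text: str


def peek_docs(docs: Iterable[MiraclDoc], n: int = 5) -> List[MiraclDoc]:
    """Return at most the first n documents, leaving the rest of the iterator unread."""
    return list(islice(docs, n))


def mean_text_tokens(docs: List[MiraclDoc]) -> float:
    """Average number of whitespace-separated tokens in the text field."""
    lengths = [len(doc.text.split()) for doc in docs]
    return sum(lengths) / len(lengths) if lengths else 0.0
```

With a loaded dataset, `sample = peek_docs(ir_datasets.load('miracl/en').docs_iter(), 100)` followed by `mean_text_tokens(sample)` inspects 100 passages. Whitespace splitting is only a rough proxy and undercounts for languages written without spaces between words.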

CLI
ir_datasets export miracl/en docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en')
# Index miracl/en
indexer = pt.IterDictIndexer('./indices/miracl_en')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'text'])

You can find more details about PyTerrier indexing here.

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.en')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/en/dev"

The dev set for English.

queries
799 queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/en/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/en/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en/dev')
index_ref = pt.IndexRef.of('./indices/miracl_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.en.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
33M docs

Inherits docs from miracl/en

Language: en

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/en/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.
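When only a handful of passages are needed, for example the ones judged for a single query, scanning all 33M documents is wasteful. ir_datasets offers random access through dataset.docs_store(); the sketch below assumes its get_many(doc_ids) method returns a dict keyed by doc_id. The helper name is hypothetical, and it takes the loaded dataset as a parameter so it needs no import of its own.

```python
from typing import Dict


def judged_docs_for_query(dataset, query_id: str) -> Dict[str, object]:
    """Fetch the documents judged for one query via random access.

    Assumes `dataset` provides qrels_iter() (TrecQrel tuples) and
    docs_store(), whose get_many() maps doc_id -> document.
    """
    doc_ids = [q.doc_id for q in dataset.qrels_iter() if q.query_id == query_id]
    return dataset.docs_store().get_many(doc_ids)
```

For example, `judged_docs_for_query(ir_datasets.load('miracl/en/dev'), qid)` with a query ID taken from queries_iter() returns only that query's judged passages instead of iterating the whole corpus.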

CLI
ir_datasets export miracl/en/dev docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en/dev')
# Index miracl/en
indexer = pt.IterDictIndexer('./indices/miracl_en')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'text'])

You can find more details about PyTerrier indexing here.
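Every miracl/en split inherits the same corpus, so the indexing snippets for these splits all write to ./indices/miracl_en and only one of them needs to run. Below is a sketch of a guard that skips re-indexing when an index is already present. It assumes a finished Terrier index directory contains a data.properties file at its root, which is the usual Terrier layout but worth confirming for your version; the helper name is hypothetical.

```python
from pathlib import Path


def terrier_index_exists(index_dir: str) -> bool:
    """True if `index_dir` looks like a completed Terrier index.

    Assumption: Terrier writes a data.properties file at the index root.
    """
    return (Path(index_dir) / "data.properties").is_file()


# Usage with the PyTerrier snippet above (not executed here):
# if not terrier_index_exists('./indices/miracl_en'):
#     indexer = pt.IterDictIndexer('./indices/miracl_en')
#     indexer.index(dataset.get_corpus_iter(), fields=['title', 'text'])
```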

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.en.dev')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
8.3K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition     Count   %
0     Not Relevant   6.0K    72.1%
1     Relevant       2.3K    27.9%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/en/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
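The relevance-level table above can be recomputed from qrels_iter(). Here is a minimal sketch that assumes only that each qrel exposes a relevance attribute, as the TrecQrel type does. It returns, for each relevance level, the count and percentage of judgments.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def relevance_distribution(qrels: Iterable) -> Dict[int, Tuple[int, float]]:
    """Map each relevance level to (count, percentage of all judgments)."""
    counts = Counter(q.relevance for q in qrels)
    total = sum(counts.values())
    return {
        level: (n, 100.0 * n / total)
        for level, n in sorted(counts.items())
    }
```

Called as `relevance_distribution(dataset.qrels_iter())` on miracl/en/dev, it should agree with the table above, which rounds the counts.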

CLI
ir_datasets export miracl/en/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.
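The TSV export can be read back without ir_datasets installed. The sketch below assumes tab-separated lines in the column order shown above (query_id, doc_id, relevance, iteration) with no header row. TrecQrel is redefined locally to mirror the documented type, and the file name in the usage note is hypothetical.

```python
import csv
from typing import Iterator, NamedTuple, TextIO


class TrecQrel(NamedTuple):
    """Local mirror of the documented TrecQrel fields."""
    query_id: str
    doc_id: str
    relevance: int
    iteration: str


def read_qrels_tsv(handle: TextIO) -> Iterator[TrecQrel]:
    """Parse `ir_datasets export ... qrels --format tsv` output line by line."""
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue  # tolerate blank lines
        query_id, doc_id, relevance, iteration = row
        yield TrecQrel(query_id, doc_id, int(relevance), iteration)
```

After redirecting the CLI output to a file (for example `en-dev.qrels.tsv`), opening it with `encoding="utf-8"` and passing the handle to `read_qrels_tsv` yields the same fields as qrels_iter().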

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:miracl/en/dev')
index_ref = pt.IndexRef.of('./indices/miracl_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.
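pt.Experiment above reports nDCG@20 among its measures. To make the metric concrete, here is a from-scratch sketch for a single query, using linear gain and a log2(rank + 1) discount. With the binary labels in these qrels (0 or 1), linear and exponential gain give the same result. This illustrates the metric and is not the implementation PyTerrier uses. Scores from an evaluation library may differ in edge cases such as tied scores.

```python
import math
from typing import Dict, List


def ndcg_at_k(ranking: List[str], qrels: Dict[str, int], k: int = 20) -> float:
    """nDCG@k for one query.

    ranking: doc_ids in retrieved order (best first).
    qrels: doc_id -> relevance label for this query (unjudged docs count as 0).
    """
    dcg = sum(
        qrels.get(doc_id, 0) / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranking[:k], start=1)
    )
    # Ideal DCG: the best possible ordering of the judged relevant documents.
    ideal = sorted((rel for rel in qrels.values() if rel > 0), reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 1) for rank, rel in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0
```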

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.en.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/en/test-a"

The held-out test set (version a) for English.

queries
734 queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/en/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/en/test-a queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en/test-a')
index_ref = pt.IndexRef.of('./indices/miracl_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.en.test-a.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
33M docs

Inherits docs from miracl/en

Language: en

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/en/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/en/test-a docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en/test-a')
# Index miracl/en
indexer = pt.IterDictIndexer('./indices/miracl_en')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'text'])

You can find more details about PyTerrier indexing here.

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.en.test-a')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/en/test-b"

The held-out test set (version b) for English.

queries
1.8K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/en/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.
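This page lists queries and docs for miracl/en/test-b but no qrels, so a run on this split cannot be scored with the snippets shown here. A common way to save results for later scoring is the six-column TREC run format (query_id Q0 doc_id rank score tag). The sketch below assumes results are held as a dict from query_id to a list of (doc_id, score) pairs. The function name, run tag and output destination are placeholders.

```python
from typing import Dict, List, TextIO, Tuple


def write_trec_run(
    results: Dict[str, List[Tuple[str, float]]],
    out: TextIO,
    tag: str = "my_run",
) -> None:
    """Write results in TREC run format: qid Q0 docid rank score tag.

    Documents are re-sorted by descending score so ranks start at 1.
    """
    for query_id in sorted(results):
        ranked = sorted(results[query_id], key=lambda pair: pair[1], reverse=True)
        for rank, (doc_id, score) in enumerate(ranked, start=1):
            out.write(f"{query_id} Q0 {doc_id} {rank} {score} {tag}\n")
```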

CLI
ir_datasets export miracl/en/test-b queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en/test-b')
index_ref = pt.IndexRef.of('./indices/miracl_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.en.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
33M docs

Inherits docs from miracl/en

Language: en

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/en/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/en/test-b docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en/test-b')
# Index miracl/en
indexer = pt.IterDictIndexer('./indices/miracl_en')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'text'])

You can find more details about PyTerrier indexing here.

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.en.test-b')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/en/train"

The train set for English.

queries
2.9K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/en/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/en/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en/train')
index_ref = pt.IndexRef.of('./indices/miracl_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.en.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
33M docs

Inherits docs from miracl/en

Language: en

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/en/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/en/train docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en/train')
# Index miracl/en
indexer = pt.IterDictIndexer('./indices/miracl_en')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'text'])

You can find more details about PyTerrier indexing here.

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.en.train')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
29K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition     Count   %
0     Not Relevant   22K     73.1%
1     Relevant       7.9K    26.9%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/en/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
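Evaluation and training code usually needs judgments grouped by query rather than as a flat stream. The sketch below builds the nested {query_id: {doc_id: relevance}} mapping, which is the shape accepted by evaluation tools such as pytrec_eval. It relies only on the TrecQrel fields listed above.

```python
from collections import defaultdict
from typing import Dict, Iterable


def group_qrels(qrels: Iterable) -> Dict[str, Dict[str, int]]:
    """Nest TrecQrel-like records as {query_id: {doc_id: relevance}}.

    If a (query_id, doc_id) pair repeats, the last record wins.
    """
    grouped: Dict[str, Dict[str, int]] = defaultdict(dict)
    for q in qrels:
        grouped[q.query_id][q.doc_id] = q.relevance
    return dict(grouped)
```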

CLI
ir_datasets export miracl/en/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:miracl/en/train')
index_ref = pt.IndexRef.of('./indices/miracl_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.
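MAP, the other measure in the experiment above, is the mean over queries of average precision (AP). Here is a from-scratch sketch of AP and MAP. It assumes binary relevance, where any label above 0 counts as relevant, which matches the levels listed for this dataset. Relevant documents that are never retrieved contribute zero precision. This is an illustration of the metric, not PyTerrier's implementation.

```python
from typing import Dict, List


def average_precision(ranking: List[str], qrels: Dict[str, int]) -> float:
    """AP for one query: sum of precision@rank at each relevant hit,
    divided by the total number of relevant documents in qrels."""
    num_relevant = sum(1 for rel in qrels.values() if rel > 0)
    if num_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if qrels.get(doc_id, 0) > 0:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / num_relevant


def mean_average_precision(
    run: Dict[str, List[str]], qrels: Dict[str, Dict[str, int]]
) -> float:
    """MAP over the queries present in qrels (a query with no run scores 0)."""
    if not qrels:
        return 0.0
    total = sum(average_precision(run.get(q, []), rels) for q, rels in qrels.items())
    return total / len(qrels)
```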

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.en.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/es"

The Spanish corpus.

docs
10M docs

Language: es

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/es")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.
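MiraclDoc keeps the article title separate from the passage text. The PyTerrier examples for miracl/en index them as two fields. For models that expect a single input string, such as a dense passage encoder, one common choice is to join the two. In the sketch below, the separator and the handling of an empty title are assumptions to adapt to the model in use, and MiraclDoc is a local mirror of the documented type.

```python
from typing import NamedTuple


class MiraclDoc(NamedTuple):
    """Local mirror of the documented MiraclDoc fields."""
    doc_id: str
    title: str
    text: str


def passage_string(doc: MiraclDoc, sep: str = " ") -> str:
    """Join title and text into one string, skipping an empty title."""
    title = doc.title.strip()
    return f"{title}{sep}{doc.text}" if title else doc.text
```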

CLI
ir_datasets export miracl/es docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.es')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/es/dev"

The dev set for Spanish.

queries
648 queries

Language: es

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/es/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/es/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.es.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
10M docs

Inherits docs from miracl/es

Language: es

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/es/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/es/dev docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.es.dev')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
6.4K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition     Count   %
0     Not Relevant   3.5K    53.6%
1     Relevant       3.0K    46.4%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/es/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
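Because the qrels include explicit non-relevant judgments (label 0), a query can in principle appear in the qrels without any relevant passage. Before computing metrics, it can be worth checking for such queries, since measures like nDCG and AP are 0 or undefined for them. The sketch below lists them and relies only on the TrecQrel fields above.

```python
from typing import Iterable, List


def queries_without_relevant(qrels: Iterable) -> List[str]:
    """Sorted query_ids whose judgments are all non-relevant (relevance <= 0)."""
    seen = set()
    with_relevant = set()
    for q in qrels:
        seen.add(q.query_id)
        if q.relevance > 0:
            with_relevant.add(q.query_id)
    return sorted(seen - with_relevant)
```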

CLI
ir_datasets export miracl/es/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.es.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/es/test-b"

The held-out test set (version b) for Spanish.

queries
1.5K queries

Language: es

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/es/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/es/test-b queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.es.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
10M docs

Inherits docs from miracl/es

Language: es

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/es/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/es/test-b docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.es.test-b')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/es/train"

The train set for Spanish.

queries
2.2K queries

Language: es

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/es/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/es/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.es.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
10M docs

Inherits docs from miracl/es

Language: es

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/es/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/es/train docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.es.train')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
22K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition     Count   %
0     Not Relevant   12K     53.4%
1     Relevant       10K     46.6%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/es/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
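For training a retriever, the train qrels supply both positives (label 1) and judged non-relevant passages (label 0). Using the latter as hard negatives is one common option, not something this page prescribes. The sketch below enumerates (query_id, positive doc_id, negative doc_id) triples per query. With many judgments per query the number of pairs grows quickly, so in practice one might sample from them instead of using all of them.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Tuple


def training_triples(qrels: Iterable) -> Iterator[Tuple[str, str, str]]:
    """Yield (query_id, positive doc_id, negative doc_id) for every
    positive/negative pair judged for the same query."""
    positives = defaultdict(list)
    negatives = defaultdict(list)
    for q in qrels:
        (positives if q.relevance > 0 else negatives)[q.query_id].append(q.doc_id)
    for query_id in sorted(positives):
        for pos in positives[query_id]:
            # Queries with positives but no judged negatives yield nothing.
            for neg in negatives.get(query_id, []):
                yield query_id, pos, neg
```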

CLI
ir_datasets export miracl/es/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.es.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/fa"

The Persian corpus.

docs
2.2M docs

Language: fa

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/fa")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/fa docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fa')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/fa/dev"

The dev set for Persian.

queries
632 queries

Language: fa

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/fa/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/fa/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fa.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
2.2M docs

Inherits docs from miracl/fa

Language: fa

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/fa/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/fa/dev docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fa.dev')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
6.6K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition     Count   %
0     Not Relevant   5.3K    80.0%
1     Relevant       1.3K    20.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/fa/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/fa/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.fa.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/fa/test-b"

The held-out test set (version b) for Persian.

queries
1.5K queries

Language: fa

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/fa/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/fa/test-b queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fa.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
2.2M docs

Inherits docs from miracl/fa

Language: fa

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/fa/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.
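MiraclDoc keeps the passage title and body in separate fields. Retrieval pipelines frequently concatenate them into one string before indexing or encoding; that is a common convention, not something this page prescribes. A minimal sketch with a hypothetical passage_text helper and a local stand-in for the documented MiraclDoc fields:

```python
from collections import namedtuple

# Stand-in mirroring the documented MiraclDoc fields (doc_id, title, text).
MiraclDoc = namedtuple("MiraclDoc", ["doc_id", "title", "text"])


def passage_text(doc, sep: str = " ") -> str:
    """Join title and body into one string for indexing or encoding.

    Prepending the title is an assumption, not a rule from this page.
    An empty title is skipped so no leading separator is produced.
    """
    title = doc.title.strip()
    return f"{title}{sep}{doc.text}" if title else doc.text


# Toy document for illustration only.
doc = MiraclDoc("d1", "Helsinki", "Helsinki is the capital of Finland.")
print(passage_text(doc))  # Helsinki Helsinki is the capital of Finland.
```

A generator such as (passage_text(d) for d in ir_datasets.load("miracl/fa/test-b").docs_iter()) streams the 2.2M-passage corpus instead of holding it in memory.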

CLI
ir_datasets export miracl/fa/test-b docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fa.test-b')
for doc in dataset.iter_documents():
    print(doc)  # one document from the collection
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/fa/train"

The train set for Persian.

queries
2.1K queries

Language: fa

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/fa/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/fa/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fa.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
2.2M docs

Inherits docs from miracl/fa

Language: fa

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/fa/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/fa/train docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fa.train')
for doc in dataset.iter_documents():
    print(doc)  # one document from the collection
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
22K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition    Count  %
0     Not Relevant  18K    80.4%
1     Relevant      4.3K   19.6%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/fa/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
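The Count and % columns in the relevance-level table above can be recomputed from qrels_iter(). A minimal sketch, assuming only the documented TrecQrel fields; relevance_distribution is a hypothetical helper and the toy judgments are made up for illustration:

```python
from collections import Counter, namedtuple
from typing import Dict, Iterable, Tuple

# Stand-in mirroring the documented TrecQrel fields.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def relevance_distribution(qrels: Iterable) -> Dict[int, Tuple[int, float]]:
    """Return {relevance level: (count, percentage of all judgments)}."""
    counts = Counter(q.relevance for q in qrels)
    total = sum(counts.values())
    return {level: (n, 100.0 * n / total) for level, n in sorted(counts.items())}


# Toy judgments for illustration only.
toy = [
    TrecQrel("q1", "d1", 1, "0"),
    TrecQrel("q1", "d2", 0, "0"),
    TrecQrel("q1", "d3", 0, "0"),
    TrecQrel("q2", "d4", 0, "0"),
]
for level, (n, pct) in relevance_distribution(toy).items():
    print(f"{level}\t{n}\t{pct:.1f}%")
```

Passing ir_datasets.load("miracl/fa/train").qrels_iter() instead of the toy list should give figures comparable to the table, up to the rounding used there.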

CLI
ir_datasets export miracl/fa/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.
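When working from the CLI, the exported qrels can be redirected to a file (for example ir_datasets export miracl/fa/train qrels --format tsv > fa-train-qrels.tsv, where the file name is arbitrary) and read back with the standard library. This sketch assumes the column order shown above and no header row; read_qrels_tsv is a hypothetical helper.

```python
import csv
import io
from typing import Dict, TextIO


def read_qrels_tsv(fh: TextIO) -> Dict[str, Dict[str, int]]:
    """Parse tab-separated qrels into {query_id: {doc_id: relevance}}.

    Assumed column order: query_id, doc_id, relevance, iteration.
    Blank lines are skipped; the iteration column is ignored.
    """
    qrels: Dict[str, Dict[str, int]] = {}
    for row in csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue
        query_id, doc_id, relevance = row[0], row[1], int(row[2])
        qrels.setdefault(query_id, {})[doc_id] = relevance
    return qrels


# Toy TSV content for illustration only.
sample = "q1\td1\t1\t0\nq1\td2\t0\t0\nq2\td3\t1\t0\n"
print(read_qrels_tsv(io.StringIO(sample)))  # {'q1': {'d1': 1, 'd2': 0}, 'q2': {'d3': 1}}
```

For a real export, open the file with encoding="utf-8" and newline="" and pass the handle to read_qrels_tsv.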

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.fa.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/fi"

The Finnish corpus.

docs
1.9M docs

Language: fi

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/fi")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/fi docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fi')
for doc in dataset.iter_documents():
    print(doc)  # one document from the collection
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/fi/dev"

The dev set for Finnish.

queries
1.3K queries

Language: fi

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/fi/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/fi/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fi.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
1.9M docs

Inherits docs from miracl/fi

Language: fi

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/fi/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/fi/dev docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fi.dev')
for doc in dataset.iter_documents():
    print(doc)  # one document from the collection
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
12K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition    Count  %
0     Not Relevant  9.6K   79.6%
1     Relevant      2.4K   20.4%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/fi/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
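Because the qrels include explicit judgments at level 0 (Not Relevant), counting every judged document as a positive would be a mistake. A minimal sketch of collecting only the relevant doc_ids per query; relevant_docs is a hypothetical helper, and queries whose judgments are all 0 simply do not appear in the result:

```python
from collections import defaultdict, namedtuple
from typing import Dict, Iterable, Set

# Stand-in mirroring the documented TrecQrel fields.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def relevant_docs(qrels: Iterable, min_relevance: int = 1) -> Dict[str, Set[str]]:
    """Map each query_id to the doc_ids judged at or above min_relevance.

    Judgments below the threshold (e.g. level 0, Not Relevant) are ignored,
    so a query whose judgments are all 0 is absent from the result.
    """
    positives: Dict[str, Set[str]] = defaultdict(set)
    for qrel in qrels:
        if qrel.relevance >= min_relevance:
            positives[qrel.query_id].add(qrel.doc_id)
    return dict(positives)


# Toy judgments for illustration only.
toy = [
    TrecQrel("q1", "d1", 1, "0"),
    TrecQrel("q1", "d2", 0, "0"),
    TrecQrel("q2", "d3", 0, "0"),
]
print(relevant_docs(toy))  # {'q1': {'d1'}}
```

With ir_datasets, pass ir_datasets.load("miracl/fi/dev").qrels_iter() in place of the toy list.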

CLI
ir_datasets export miracl/fi/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.fi.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/fi/test-a"

The held-out test set (version a) for Finnish.

queries
1.1K queries

Language: fi

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/fi/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/fi/test-a queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fi.test-a.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
1.9M docs

Inherits docs from miracl/fi

Language: fi

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/fi/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.
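Iterating the full 1.9M-passage corpus is wasteful when only a handful of documents are needed, for example the ones named in a run or in the qrels. ir_datasets provides random access through dataset.docs_store(); the sketch below wraps its get_many() method in a small, hypothetical fetch_docs helper. It is written against any object exposing that method, so it can be exercised with a stub.

```python
from typing import Dict, Iterable


def fetch_docs(dataset, doc_ids: Iterable[str]) -> Dict[str, object]:
    """Look up specific documents by ID instead of scanning docs_iter().

    dataset is expected to provide docs_store(), whose get_many()
    returns a dict keyed by doc_id (hypothetical wrapper).
    """
    store = dataset.docs_store()
    return store.get_many(list(doc_ids))
```

A typical call would be fetch_docs(ir_datasets.load("miracl/fi/test-a"), ids), where ids comes from your own run file.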

CLI
ir_datasets export miracl/fi/test-a docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fi.test-a')
for doc in dataset.iter_documents():
    print(doc)  # one document from the collection
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/fi/test-b"

The held-out test set (version b) for Finnish.

queries
711 queries

Language: fi

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/fi/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/fi/test-b queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fi.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
1.9M docs

Inherits docs from miracl/fi

Language: fi

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/fi/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/fi/test-b docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fi.test-b')
for doc in dataset.iter_documents():
    print(doc)  # one document from the collection
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/fi/train"

The train set for Finnish.

queries
2.9K queries

Language: fi

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/fi/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/fi/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fi.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
1.9M docs

Inherits docs from miracl/fi

Language: fi

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/fi/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/fi/train docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fi.train')
for doc in dataset.iter_documents():
    print(doc)  # one document from the collection
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
20K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition    Count  %
0     Not Relevant  15K    75.8%
1     Relevant      4.9K   24.2%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/fi/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/fi/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.fi.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/fr"

The French corpus.

docs
15M docs

Language: fr

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/fr")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.
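With roughly 15M passages, iterating the whole French corpus just to inspect a few documents is slow. Since docs_iter() is an iterator, itertools.islice can stop after the first n items. This only limits how many documents are read; it does not skip any download ir_datasets may perform on first access. peek is a hypothetical helper, and the toy generator stands in for docs_iter():

```python
from itertools import islice
from typing import Iterable, List


def peek(items: Iterable, n: int = 3) -> List:
    """Return the first n items, leaving the rest of the iterator unread."""
    return list(islice(items, n))


def counting_stream():
    # Toy unbounded generator standing in for a multi-million-document iterator.
    i = 0
    while True:
        yield f"doc-{i}"
        i += 1


print(peek(counting_stream(), 2))  # ['doc-0', 'doc-1']
```

Against the real corpus this would read peek(ir_datasets.load("miracl/fr").docs_iter(), 3).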

CLI
ir_datasets export miracl/fr docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fr')
for doc in dataset.iter_documents():
    print(doc)  # one document from the collection
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/fr/dev"

The dev set for French.

queries
343 queries

Language: fr

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/fr/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/fr/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fr.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
15M docs

Inherits docs from miracl/fr

Language: fr

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/fr/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/fr/dev docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fr.dev')
for doc in dataset.iter_documents():
    print(doc)  # one document from the collection
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
3.4K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition    Count  %
0     Not Relevant  2.7K   78.7%
1     Relevant      731    21.3%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/fr/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
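The dev qrels make it possible to score a run offline. MIRACL results are commonly reported with nDCG@10 and Recall@100; the sketch below implements only mean Recall@k in plain Python, as an illustration rather than a replacement for an established evaluator such as trec_eval. recall_at_k is hypothetical; qrels uses the {query_id: {doc_id: relevance}} shape returned by dataset.qrels_dict(), and run maps each query_id to a ranked list of doc_ids.

```python
from typing import Dict, List, Sequence


def recall_at_k(qrels: Dict[str, Dict[str, int]],
                run: Dict[str, Sequence[str]],
                k: int = 100) -> float:
    """Mean Recall@k over queries that have at least one relevant document.

    Relevance > 0 counts as relevant. Queries missing from the run score 0;
    queries with no relevant documents are skipped (recall is undefined).
    """
    scores: List[float] = []
    for qid, judgments in qrels.items():
        relevant = {d for d, rel in judgments.items() if rel > 0}
        if not relevant:
            continue
        retrieved = set(run.get(qid, [])[:k])
        scores.append(len(relevant & retrieved) / len(relevant))
    return sum(scores) / len(scores) if scores else 0.0


# Toy data for illustration only.
qrels = {"q1": {"d1": 1, "d2": 1, "d3": 0}, "q2": {"d4": 1}, "q3": {"d5": 0}}
run = {"q1": ["d3", "d1", "d9"], "q2": ["d8"]}
print(recall_at_k(qrels, run, k=2))  # 0.25
```

Here q1 finds one of its two relevant passages in the top 2 (0.5), q2 finds none (0.0), and q3 has no relevant passage, so the mean is 0.25.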

CLI
ir_datasets export miracl/fr/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.fr.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/fr/test-b"

The held-out test set (version b) for French.

queries
801 queries

Language: fr

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/fr/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/fr/test-b queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fr.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
15M docs

Inherits docs from miracl/fr

Language: fr

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/fr/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/fr/test-b docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fr.test-b')
for doc in dataset.iter_documents():
    print(doc)  # one document from the collection
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/fr/train"

The train set for French.

queries
1.1K queries

Language: fr

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/fr/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/fr/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fr.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
15M docs

Inherits docs from miracl/fr

Language: fr

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/fr/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/fr/train docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fr.train')
for doc in dataset.iter_documents():
    print(doc)  # one document from the collection
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
11K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition    Count  %
0     Not Relevant  9.1K   79.7%
1     Relevant      2.3K   20.3%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/fr/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
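Since the train qrels contain both relevant (1) and judged non-relevant (0) passages, the 0-labelled passages can serve as hard negatives when building contrastive training data. This is a design choice, not a procedure given by this page. A minimal sketch with a hypothetical training_triples helper that pairs each positive with one randomly chosen judged negative from the same query:

```python
import random
from typing import Dict, List, Tuple


def training_triples(qrels: Dict[str, Dict[str, int]],
                     seed: int = 0) -> List[Tuple[str, str, str]]:
    """Return (query_id, positive_doc_id, negative_doc_id) triples.

    Each judged-relevant doc is paired with one judged-non-relevant doc of
    the same query. Queries lacking a positive or a judged negative are
    skipped. A fixed seed keeps the sampling reproducible.
    """
    rng = random.Random(seed)
    triples = []
    for qid in sorted(qrels):
        judgments = qrels[qid]
        positives = sorted(d for d, rel in judgments.items() if rel > 0)
        negatives = sorted(d for d, rel in judgments.items() if rel == 0)
        if not positives or not negatives:
            continue
        for pos in positives:
            triples.append((qid, pos, rng.choice(negatives)))
    return triples


# Toy qrels for illustration only; q2 has no judged negative and is skipped.
toy = {"q1": {"d1": 1, "d2": 0}, "q2": {"d3": 1}}
print(training_triples(toy))  # [('q1', 'd1', 'd2')]
```

The IDs would then be resolved to text with the queries and the docs store before training.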

CLI
ir_datasets export miracl/fr/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.fr.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/hi"

The Hindi corpus.

docs
506K docs

Language: hi

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/hi")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/hi docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.hi')
for doc in dataset.iter_documents():
    print(doc)  # one document from the collection
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/hi/dev"

The dev set for Hindi.

queries
350 queries

Language: hi

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/hi/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/hi/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.hi.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
506K docs

Inherits docs from miracl/hi

Language: hi

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/hi/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.
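Each split such as miracl/hi/dev inherits its documents from the language corpus (here miracl/hi), so an index built once per language can be reused across the dev, train and test subsets. A minimal sketch of mapping a subset ID to its corpus ID; corpus_id is hypothetical and simply assumes the miracl/<lang>/<split> naming pattern used on this page:

```python
def corpus_id(dataset_id: str) -> str:
    """Return the ID of the corpus a MIRACL subset inherits its docs from.

    Assumes "miracl/<lang>/<split>" inherits from "miracl/<lang>";
    any other ID is returned unchanged.
    """
    parts = dataset_id.split("/")
    if len(parts) == 3 and parts[0] == "miracl":
        return "/".join(parts[:2])
    return dataset_id


print(corpus_id("miracl/hi/dev"))  # miracl/hi
```

Grouping subset IDs by corpus_id tells you how many distinct indexes are actually needed.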

CLI
ir_datasets export miracl/hi/dev docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.hi.dev')
for doc in dataset.iter_documents():
    print(doc)  # one document from the collection
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
3.5K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition    Count  %
0     Not Relevant  2.7K   78.5%
1     Relevant      752    21.5%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/hi/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
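For a per-query view of ranking quality, nDCG@10 can be computed directly from the judgments of one query. The sketch uses one common formulation (gain equal to the relevance label, discount log2(rank + 1)); with the binary 0/1 labels shown above, the linear and exponential gain variants coincide. ndcg_at_k is a hypothetical helper, and reported numbers should come from an established evaluator.

```python
import math
from typing import Dict, Sequence


def ndcg_at_k(judgments: Dict[str, int], ranking: Sequence[str], k: int = 10) -> float:
    """nDCG@k for one query.

    judgments: {doc_id: relevance}; unjudged docs count as relevance 0.
    ranking:   doc_ids, best first.
    Returns 0.0 when the query has no relevant documents.
    """
    dcg = sum(
        judgments.get(doc_id, 0) / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranking[:k], start=1)
    )
    ideal = sorted((r for r in judgments.values() if r > 0), reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 1) for rank, rel in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


# Toy judgments for illustration only.
print(ndcg_at_k({"d1": 1, "d2": 1, "d3": 0}, ["d1", "d2"]))  # 1.0
```

Averaging ndcg_at_k over the queries in dataset.qrels_dict() gives a mean score for a run.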

CLI
ir_datasets export miracl/hi/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.hi.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/hi/test-b"

The held-out test set (version b) for Hindi.

queries
819 queries

Language: hi

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/hi/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/hi/test-b queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.hi.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
506K docs

Inherits docs from miracl/hi

Language: hi

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/hi/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/hi/test-b docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.hi.test-b')
for doc in dataset.iter_documents():
    print(doc)  # one document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/hi/train"

The train set for Hindi.

queries
1.2K queries

Language: hi

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/hi/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/hi/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.hi.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
506K docs

Inherits docs from miracl/hi

Language: hi

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/hi/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/hi/train docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.hi.train')
for doc in dataset.iter_documents():
    print(doc)  # one document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
12K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition     Count   %
0     Not Relevant   9.2K    78.8%
1     Relevant       2.5K    21.2%
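The Count and % columns are a tally of the relevance field over all qrels. As a minimal sketch, the tally could be recomputed as below. The TrecQrel namedtuple is a stand-in with the same fields listed above, and the four sample judgments are made up for illustration; with ir_datasets installed you would pass `ir_datasets.load("miracl/hi/train").qrels_iter()` instead.

```python
from collections import Counter, namedtuple

# Stand-in with the same fields as the TrecQrel namedtuple listed above.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])

def relevance_distribution(qrels):
    """Return {relevance_level: (count, percentage)} for an iterable of qrels."""
    counts = Counter(qrel.relevance for qrel in qrels)
    total = sum(counts.values())
    return {level: (n, round(100 * n / total, 1)) for level, n in sorted(counts.items())}

# Invented sample judgments, for illustration only.
sample = [
    TrecQrel("q1", "d1", 1, "0"),
    TrecQrel("q1", "d2", 0, "0"),
    TrecQrel("q1", "d3", 0, "0"),
    TrecQrel("q2", "d4", 0, "0"),
]
print(relevance_distribution(sample))  # {0: (3, 75.0), 1: (1, 25.0)}
```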

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/hi/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/hi/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.hi.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/id"

The Indonesian corpus.

docs
1.4M docs

Language: id

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/id")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.
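Each MiraclDoc splits a passage into a title and a text field. A common choice when indexing (an assumption here, not something this page prescribes) is to prepend the title to the passage text. The `indexable_text` helper below is hypothetical, and the namedtuple is a stand-in with the MiraclDoc fields so the sketch runs without downloading the corpus.

```python
from collections import namedtuple

# Stand-in mirroring the MiraclDoc fields (doc_id, title, text) listed above.
MiraclDoc = namedtuple("MiraclDoc", ["doc_id", "title", "text"])

def indexable_text(doc, sep=" "):
    """Hypothetical helper: join title and text, skipping whichever is empty."""
    return sep.join(part for part in (doc.title, doc.text) if part)

# Invented passage ("Title", "Paragraph content." in Indonesian).
doc = MiraclDoc("example-1", "Judul", "Isi paragraf.")
print(indexable_text(doc))  # Judul Isi paragraf.
```

With the real corpus, the helper would be applied to each `doc` yielded by `ir_datasets.load("miracl/id").docs_iter()`.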

CLI
ir_datasets export miracl/id docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.id')
for doc in dataset.iter_documents():
    print(doc)  # one document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/id/dev"

The dev set for Indonesian.

queries
960 queries

Language: id

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/id/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/id/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.
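The column line above suggests the export writes one query per line: the query_id, a tab, then the text. Assuming that tab-separated layout, a minimal reader could look like this. The sample queries are invented, and splitting only on the first tab keeps any tab inside the query text intact.

```python
import io

def read_query_tsv(stream):
    """Map query_id -> text from lines of the form 'query_id<TAB>text'."""
    queries = {}
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        query_id, text = line.split("\t", 1)  # split on the first tab only
        queries[query_id] = text
    return queries

# Invented sample standing in for the output of:
#   ir_datasets export miracl/id/dev queries > queries.tsv
sample = io.StringIO("q1\tApa ibu kota Indonesia?\nq2\tSiapa penemu telepon?\n")
print(read_query_tsv(sample))
```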

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.id.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
1.4M docs

Inherits docs from miracl/id

Language: id

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/id/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/id/dev docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.id.dev')
for doc in dataset.iter_documents():
    print(doc)  # one document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
9.7K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition     Count   %
0     Not Relevant   6.6K    68.1%
1     Relevant       3.1K    31.9%
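With 9.7K judgments over 960 queries, miracl/id/dev averages roughly ten judged passages per query, but the split between relevant and non-relevant passages varies from query to query. A sketch of a per-query breakdown, using a stand-in TrecQrel and invented sample judgments:

```python
from collections import defaultdict, namedtuple

# Stand-in with the TrecQrel fields listed above.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])

def judgments_per_query(qrels):
    """Return {query_id: (judged_count, relevant_count)} for an iterable of qrels."""
    judged = defaultdict(int)
    relevant = defaultdict(int)
    for qrel in qrels:
        judged[qrel.query_id] += 1
        if qrel.relevance > 0:  # label 1 means relevant
            relevant[qrel.query_id] += 1
    # relevant[qid] yields 0 for queries without any relevant passage.
    return {qid: (n, relevant[qid]) for qid, n in judged.items()}

# Invented sample judgments, for illustration only.
sample = [
    TrecQrel("q1", "d1", 1, "0"),
    TrecQrel("q1", "d2", 0, "0"),
    TrecQrel("q2", "d3", 0, "0"),
]
print(judgments_per_query(sample))  # {'q1': (2, 1), 'q2': (1, 0)}
```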

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/id/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/id/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.
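Assuming the `--format tsv` export writes the four columns shown above, tab-separated and in that order (the bracketed line is read here as a description of the columns, not a literal header row), the output can be loaded into a nested `{query_id: {doc_id: relevance}}` dictionary. That is the shape evaluation libraries such as pytrec_eval accept. A minimal sketch with invented sample lines; the iteration column is read and discarded.

```python
import io

def read_qrels_tsv(stream):
    """Build {query_id: {doc_id: relevance}} from tab-separated qrels lines."""
    qrels = {}
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        query_id, doc_id, relevance, _iteration = line.split("\t")
        qrels.setdefault(query_id, {})[doc_id] = int(relevance)
    return qrels

# Invented lines in the assumed column order: query_id, doc_id, relevance, iteration.
sample = io.StringIO("q1\td1\t1\t0\nq1\td2\t0\t0\nq2\td3\t1\t0\n")
print(read_qrels_tsv(sample))  # {'q1': {'d1': 1, 'd2': 0}, 'q2': {'d3': 1}}
```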

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.id.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/id/test-a"

The held-out test set (version a) for Indonesian.

queries
731 queries

Language: id

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/id/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.
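No qrels are listed for miracl/id/test-a on this page, so results on its queries cannot be scored locally from this dataset and are typically saved as a run file instead. A sketch of writing the standard six-column TREC run format (query_id, the literal Q0, doc_id, rank, score, run tag); the `results` input shape and the `my-run` tag are hypothetical.

```python
import io

def write_trec_run(results, out, tag="my-run"):
    """Write {query_id: [(doc_id, score), ...]} as six-column TREC run lines."""
    for query_id, scored in results.items():
        # Highest score first; ranks start at 1.
        ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
        for rank, (doc_id, score) in enumerate(ranked, start=1):
            out.write(f"{query_id} Q0 {doc_id} {rank} {score:.4f} {tag}\n")

buf = io.StringIO()
write_trec_run({"q1": [("d2", 0.5), ("d1", 1.25)]}, buf)
print(buf.getvalue(), end="")
# q1 Q0 d1 1 1.2500 my-run
# q1 Q0 d2 2 0.5000 my-run
```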

CLI
ir_datasets export miracl/id/test-a queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.id.test-a.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
1.4M docs

Inherits docs from miracl/id

Language: id

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/id/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/id/test-a docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.id.test-a')
for doc in dataset.iter_documents():
    print(doc)  # one document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/id/test-b"

The held-out test set (version b) for Indonesian.

queries
611 queries

Language: id

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/id/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/id/test-b queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.id.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
1.4M docs

Inherits docs from miracl/id

Language: id

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/id/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/id/test-b docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.id.test-b')
for doc in dataset.iter_documents():
    print(doc)  # one document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/id/train"

The train set for Indonesian.

queries
4.1K queries

Language: id

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/id/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/id/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.id.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
1.4M docs

Inherits docs from miracl/id

Language: id

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/id/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/id/train docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.id.train')
for doc in dataset.iter_documents():
    print(doc)  # one document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
41K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition     Count   %
0     Not Relevant   29K     69.8%
1     Relevant       13K     30.2%
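About 70% of the miracl/id/train judgments carry label 0, meaning passages that were judged and found not relevant. One common use of such passages (a choice made here, not a procedure defined by MIRACL) is as hard negatives when building training triples. A sketch over plain (query_id, doc_id, relevance) tuples with invented IDs:

```python
import random
from collections import defaultdict

def training_triples(qrels, seed=0):
    """Pair every relevant passage with one judged non-relevant passage.

    qrels: iterable of (query_id, doc_id, relevance) tuples.
    Returns a list of (query_id, positive_doc_id, negative_doc_id).
    """
    positives = defaultdict(list)
    negatives = defaultdict(list)
    for query_id, doc_id, relevance in qrels:
        (positives if relevance > 0 else negatives)[query_id].append(doc_id)
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    triples = []
    for query_id, pos_docs in positives.items():
        neg_docs = negatives.get(query_id)
        if not neg_docs:
            continue  # no judged negatives for this query
        for pos in pos_docs:
            triples.append((query_id, pos, rng.choice(neg_docs)))
    return triples

# Invented judgments: q2 has no judged negative, so it yields no triple.
sample = [("q1", "d1", 1), ("q1", "d2", 0), ("q1", "d3", 0), ("q2", "d4", 1)]
print(training_triples(sample))
```

With ir_datasets, the input could be `((q.query_id, q.doc_id, q.relevance) for q in dataset.qrels_iter())`. Queries without a judged non-relevant passage are skipped here rather than paired with random corpus passages.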

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/id/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/id/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.id.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/ja"

The Japanese corpus.

docs
7.0M docs

Language: ja

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ja")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.
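At 7.0M passages, iterating the full Japanese corpus is slow, so for a quick look it can help to take only the first few documents. A minimal sketch using `itertools.islice`; the `fake_docs` generator is a hypothetical stand-in for `docs_iter()` that records how many items were actually produced.

```python
from itertools import islice

def peek(iterable, n=3):
    """Return the first n items; islice stops there, so the rest is never read."""
    return list(islice(iterable, n))

produced = []

def fake_docs():
    # Stand-in for a very large document stream.
    for i in range(1_000_000):
        produced.append(i)
        yield f"doc-{i}"

print(peek(fake_docs()))  # ['doc-0', 'doc-1', 'doc-2']
print(produced)           # [0, 1, 2]
```

With ir_datasets installed, `peek(ir_datasets.load("miracl/ja").docs_iter())` would return the first three MiraclDoc tuples, although the corpus may need to be downloaded first.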

CLI
ir_datasets export miracl/ja docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ja')
for doc in dataset.iter_documents():
    print(doc)  # one document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/ja/dev"

The dev set for Japanese.

queries
860 queries

Language: ja

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ja/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/ja/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ja.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
7.0M docs

Inherits docs from miracl/ja

Language: ja

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ja/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/ja/dev docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ja.dev')
for doc in dataset.iter_documents():
    print(doc)  # one document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
8.4K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition     Count   %
0     Not Relevant   6.6K    78.6%
1     Relevant       1.8K    21.4%
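Because every judgment here is either 0 or 1, nDCG uses binary gains: linear gains and exponential gains (2^rel - 1) give the same value. A self-contained sketch of nDCG@10 for a single query, where the ideal ranking is built from the judged passages only; the ranking and judgments are invented. For reported numbers, an established evaluation library is the safer choice.

```python
import math

def ndcg_at_k(ranking, judgments, k=10):
    """nDCG@k for one query.

    ranking: list of doc_ids, best first.
    judgments: {doc_id: relevance}; unjudged docs get gain 0.
    """
    def dcg(gains):
        # Position i (0-based) is discounted by log2(i + 2).
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

    gains = [judgments.get(doc_id, 0) for doc_id in ranking[:k]]
    ideal = sorted(judgments.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

judgments = {"d1": 1, "d2": 0, "d3": 1}
score = ndcg_at_k(["d2", "d1", "d4"], judgments)
print(round(score, 4))  # 0.3869
```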

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ja/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/ja/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.ja.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/ja/test-a"

The held-out test set (version a) for Japanese.

queries
650 queries

Language: ja

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ja/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/ja/test-a queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ja.test-a.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
7.0M docs

Inherits docs from miracl/ja

Language: ja

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ja/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/ja/test-a docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ja.test-a')
for doc in dataset.iter_documents():
    print(doc)  # one document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/ja/test-b"

The held-out test set (version b) for Japanese.

queries
1.1K queries

Language: ja

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ja/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/ja/test-b queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ja.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
7.0M docs

Inherits docs from miracl/ja

Language: ja

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ja/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/ja/test-b docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ja.test-b')
for doc in dataset.iter_documents():
    print(doc)  # one document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/ja/train"

The train set for Japanese.

queries
3.5K queries

Language: ja

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ja/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/ja/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ja.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
7.0M docs

Inherits docs from miracl/ja

Language: ja

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ja/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/ja/train docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ja.train')
for doc in dataset.iter_documents():
    print(doc)  # one document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
34K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition     Count   %
0     Not Relevant   27K     79.7%
1     Relevant       7.0K    20.3%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ja/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/ja/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.ja.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/ko"

The Korean corpus.

docs
1.5M docs

Language: ko

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ko")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/ko docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ko')
for doc in dataset.iter_documents():
    print(doc)  # one document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/ko/dev"

The dev set for Korean.

queries
213 queries

Language: ko

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ko/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/ko/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ko.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
1.5M docs

Inherits docs from miracl/ko

Language: ko

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ko/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/ko/dev docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ko.dev')
for doc in dataset.iter_documents():
    print(doc)  # one document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
3.1K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition     Count   %
0     Not Relevant   2.5K    82.1%
1     Relevant       547     17.9%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ko/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
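Recall at a cutoff measures how many of a query's relevant passages appear in the top k results. Label 0 marks passages that were judged but not relevant, so they must not count towards the denominator. A sketch, assuming judgments are held as `{doc_id: relevance}` per query (for example, built from `dataset.qrels_iter()` for miracl/ko/dev); the sample values are invented.

```python
def recall_at_k(ranking, judgments, k=100):
    """Fraction of relevant (relevance > 0) passages found in the top k of ranking."""
    relevant = {doc_id for doc_id, rel in judgments.items() if rel > 0}
    if not relevant:
        return None  # undefined when a query has no relevant passage
    retrieved = set(ranking[:k])
    return len(relevant & retrieved) / len(relevant)

judgments = {"d1": 1, "d2": 1, "d3": 0}
print(recall_at_k(["d3", "d1", "d9"], judgments, k=2))  # 0.5
```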

CLI
ir_datasets export miracl/ko/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.ko.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # assessments for a single topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/ko/test-a"

The held-out test set (version a) for Korean.

queries
263 queries

Language: ko

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ko/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.
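This split lists queries and documents but no qrels, so results on it are typically shared as a run file. The widely used six-column TREC run format is `query_id Q0 doc_id rank score tag`. A minimal sketch that serializes ranked results into that format; the `results` mapping, its scores and the `my-run` tag are hypothetical.

```python
def to_trec_run(results, tag="my-run"):
    """Format {query_id: [(doc_id, score), ...]} as TREC run lines.

    Documents are sorted by descending score and ranked from 1.
    """
    lines = []
    for query_id in sorted(results):
        ranked = sorted(results[query_id], key=lambda pair: pair[1], reverse=True)
        for rank, (doc_id, score) in enumerate(ranked, start=1):
            lines.append(f"{query_id} Q0 {doc_id} {rank} {score:.4f} {tag}")
    return "\n".join(lines)

# Hypothetical scores for two queries.
results = {"q2": [("d9", 0.5)], "q1": [("d1", 0.2), ("d3", 0.9)]}
print(to_trec_run(results))
# q1 Q0 d3 1 0.9000 my-run
# q1 Q0 d1 2 0.2000 my-run
# q2 Q0 d9 1 0.5000 my-run
```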

CLI
ir_datasets export miracl/ko/test-a queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ko.test-a.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
1.5M docs

Inherits docs from miracl/ko

Language: ko

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ko/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.
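All splits for a language share one corpus (here, the 1.5M miracl/ko passages), so fetching specific passages by ID is usually cheaper than scanning `docs_iter()`. ir_datasets offers random access through `dataset.docs_store()`, whose `get_many(doc_ids)` returns a dict keyed by document ID. The wrapper below keeps the caller's order; to stay runnable offline it is demonstrated with a hypothetical dict-backed `FakeStore` rather than with ir_datasets itself.

```python
from collections import namedtuple

Doc = namedtuple("Doc", ["doc_id", "title", "text"])  # mirrors MiraclDoc's fields

def fetch_passages(docs_store, doc_ids):
    """Return documents in the order of doc_ids, skipping IDs the store lacks."""
    found = docs_store.get_many(doc_ids)  # dict: doc_id -> document
    return [found[d] for d in doc_ids if d in found]

class FakeStore:
    """Hypothetical offline stand-in exposing a get_many method."""
    def __init__(self, docs):
        self._docs = {d.doc_id: d for d in docs}

    def get_many(self, doc_ids):
        return {d: self._docs[d] for d in doc_ids if d in self._docs}

store = FakeStore([Doc("d1", "T", "a"), Doc("d2", "U", "b")])
print([d.doc_id for d in fetch_passages(store, ["d2", "d9", "d1"])])  # ['d2', 'd1']
```

With ir_datasets, the store argument would be `ir_datasets.load("miracl/ko/test-a").docs_store()`.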

CLI
ir_datasets export miracl/ko/test-a docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ko.test-a')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/ko/test-b"

The held-out test set (version b) for Korean.

queries
1.4K queries

Language: ko

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ko/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.
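Neural retrievers usually encode queries in fixed-size batches rather than one at a time. A minimal, library-agnostic sketch that chunks any iterator (such as `dataset.queries_iter()`) into lists of at most `batch_size` items; the encoder that would consume each batch is left out, and any particular batch size is an assumption.

```python
from itertools import islice

def batched(iterable, batch_size):
    """Yield lists of up to batch_size items from iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(iterable)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

print([len(b) for b in batched(range(7), 3)])  # [3, 3, 1]
```

Python 3.12 adds `itertools.batched`, which yields tuples; this version also runs on older interpreters.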

CLI
ir_datasets export miracl/ko/test-b queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ko.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
1.5M docs

Inherits docs from miracl/ko

Language: ko

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ko/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.
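Each `MiraclDoc` splits a passage into `title` and `text`. Many pipelines index the two together; the helper below uses one assumed convention (title, a separator, then text), which neither MIRACL nor ir_datasets prescribes.

```python
from collections import namedtuple

MiraclDocLike = namedtuple("MiraclDocLike", ["doc_id", "title", "text"])  # illustration only

def passage_string(doc, sep=" "):
    """Join title and text with sep; an empty title yields just the text."""
    title = doc.title.strip()
    text = doc.text.strip()
    return f"{title}{sep}{text}" if title else text

print(passage_string(MiraclDocLike("d1", "Seoul", "Capital city.")))  # Seoul Capital city.
```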

CLI
ir_datasets export miracl/ko/test-b docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ko.test-b')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/ko/train"

The train set for Korean.

queries
868 queries

Language: ko

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ko/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/ko/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ko.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
1.5M docs

Inherits docs from miracl/ko

Language: ko

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ko/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/ko/train docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ko.train')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
13K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition    Count  %
0     Not Relevant  11K    84.5%
1     Relevant      2.0K   15.5%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ko/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
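The training qrels contain explicit non-relevant judgments (relevance 0) alongside relevant ones, so the judged negatives can serve as hard negatives. A minimal sketch that turns grouped judgments into (query_id, positive_doc_id, negative_doc_id) triples; pairing every positive with every judged negative is one simple choice among many, and the toy data is made up.

```python
from itertools import product

def training_triples(judgments):
    """judgments: {query_id: {doc_id: relevance}} -> list of (qid, pos, neg)."""
    triples = []
    for query_id, docs in sorted(judgments.items()):
        positives = sorted(d for d, rel in docs.items() if rel > 0)
        negatives = sorted(d for d, rel in docs.items() if rel == 0)
        # Queries lacking positives or judged negatives contribute no triples.
        for pos, neg in product(positives, negatives):
            triples.append((query_id, pos, neg))
    return triples

toy = {"q1": {"d1": 1, "d2": 0, "d3": 0}, "q2": {"d4": 0}}
print(training_triples(toy))  # [('q1', 'd1', 'd2'), ('q1', 'd1', 'd3')]
```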

CLI
ir_datasets export miracl/ko/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.ko.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # assessments for a single topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/ru"

The Russian corpus.

docs
9.5M docs

Language: ru

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ru")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.
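With roughly 9.5M passages, the Russian corpus is best handled as a stream. `docs_iter()` yields documents lazily, so taking a prefix with `itertools.islice` avoids iterating over the whole corpus (the first call may still need to download it). The sketch works on any iterator and uses a hypothetical generator as a stand-in.

```python
from collections import namedtuple
from itertools import islice

Doc = namedtuple("Doc", ["doc_id", "title", "text"])  # mirrors MiraclDoc's fields

def preview(docs, n=3, width=40):
    """Return (doc_id, text truncated to width chars) for the first n docs only."""
    return [(d.doc_id, d.text[:width]) for d in islice(docs, n)]

def fake_corpus():
    # Infinite generator: islice stops after n items, so preview still terminates.
    i = 0
    while True:
        yield Doc(f"d{i}", "title", "word " * 20)
        i += 1

print(preview(fake_corpus(), n=2, width=9))  # [('d0', 'word word'), ('d1', 'word word')]
```

Against the real corpus the call would be `preview(ir_datasets.load("miracl/ru").docs_iter())`.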

CLI
ir_datasets export miracl/ru docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ru')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/ru/dev"

The dev set for Russian.

queries
1.3K queries

Language: ru

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ru/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/ru/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ru.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
9.5M docs

Inherits docs from miracl/ru

Language: ru

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ru/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/ru/dev docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ru.dev')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
13K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition    Count  %
0     Not Relevant  9.5K   72.8%
1     Relevant      3.6K   27.2%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ru/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
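The MIRACL paper evaluates retrieval with nDCG@10 (alongside Recall@100). For quick checks against dev qrels, here is a minimal single-query sketch using linear gain, which coincides with the common exponential gain 2^rel - 1 when labels are 0 or 1 as here. Established tools such as trec_eval or pytrec_eval remain the safer choice for reported numbers; the toy judgments are made up.

```python
import math

def ndcg_at_k(ranked_doc_ids, judgments, k=10):
    """nDCG@k for a single query.

    ranked_doc_ids: the system's ranking, best first.
    judgments: {doc_id: relevance} for this query; unjudged documents count as 0.
    Gain is the relevance label and the discount is 1 / log2(rank + 1).
    """
    def dcg(gains):
        return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

    actual = dcg(judgments.get(doc_id, 0) for doc_id in ranked_doc_ids[:k])
    # Ideal ranking: all judged relevance labels in descending order.
    ideal = dcg(sorted(judgments.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0

toy = {"d1": 1, "d2": 0, "d3": 1}
print(round(ndcg_at_k(["d2", "d1", "d3"], toy), 4))  # 0.6934
```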

CLI
ir_datasets export miracl/ru/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.ru.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # assessments for a single topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/ru/test-a"

The held-out test set (version a) for Russian.

queries
911 queries

Language: ru

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ru/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/ru/test-a queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ru.test-a.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
9.5M docs

Inherits docs from miracl/ru

Language: ru

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ru/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/ru/test-a docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ru.test-a')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/ru/test-b"

The held-out test set (version b) for Russian.

queries
718 queries

Language: ru

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ru/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/ru/test-b queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ru.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
9.5M docs

Inherits docs from miracl/ru

Language: ru

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ru/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/ru/test-b docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ru.test-b')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/ru/train"

The train set for Russian.

queries
4.7K queries

Language: ru

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ru/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/ru/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ru.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
9.5M docs

Inherits docs from miracl/ru

Language: ru

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ru/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/ru/train docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ru.train')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
34K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition    Count  %
0     Not Relevant  24K    70.5%
1     Relevant      10K    29.5%

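Most judgments in this table are non-relevant. Before sampling positives for training, it can help to count relevant judgments per query and keep only queries that have at least one; this page does not say whether any query actually lacks a positive, so treat the filter as a defensive step. The toy data is made up.

```python
from collections import namedtuple

# Stand-in with the same fields as TrecQrel (illustration only).
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])

def positives_per_query(qrels):
    """Map each judged query_id to its number of relevant (relevance > 0) judgments."""
    counts = {}
    for q in qrels:
        counts[q.query_id] = counts.get(q.query_id, 0) + (1 if q.relevance > 0 else 0)
    return counts

def trainable_queries(qrels):
    """Query IDs with at least one relevant judgment, sorted for stable output."""
    return sorted(qid for qid, n in positives_per_query(qrels).items() if n > 0)

toy = [Qrel("q1", "d1", 1, "0"), Qrel("q1", "d2", 0, "0"), Qrel("q2", "d3", 0, "0")]
print(positives_per_query(toy))  # {'q1': 1, 'q2': 0}
print(trainable_queries(toy))    # ['q1']
```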
Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/ru/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/ru/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.ru.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # assessments for a single topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/sw"

The Swahili corpus.

docs
132K docs

Language: sw

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/sw")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/sw docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.sw')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/sw/dev"

The dev set for Swahili.

queries
482 queries

Language: sw

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/sw/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/sw/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.sw.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
132K docs

Inherits docs from miracl/sw

Language: sw

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/sw/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/sw/dev docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.sw.dev')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
5.1K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition    Count  %
0     Not Relevant  4.2K   82.1%
1     Relevant      910    17.9%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/sw/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
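Besides nDCG@10, the MIRACL paper reports Recall@100, the fraction of a query's relevant passages that appear in the top 100 results. A minimal single-query sketch; returning 0.0 for a query with no relevant judgments is an assumed convention (some tools skip such queries instead), and the toy data is made up.

```python
def recall_at_k(ranked_doc_ids, judgments, k=100):
    """Fraction of relevant docs (relevance > 0) found in the top k of the ranking."""
    relevant = {d for d, rel in judgments.items() if rel > 0}
    if not relevant:
        return 0.0  # assumed convention for queries without positives
    retrieved = set(ranked_doc_ids[:k])
    return len(relevant & retrieved) / len(relevant)

toy = {"d1": 1, "d2": 1, "d3": 0}
print(recall_at_k(["d3", "d1", "d9"], toy, k=2))  # 0.5
```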

CLI
ir_datasets export miracl/sw/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.
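The `--format tsv` export writes one tab-separated qrel per line in the column order shown above, so its output can be saved (for example with a shell redirect) and read back later. A minimal sketch of a reader; it assumes exactly those four columns and no header row, which matches the sample output above but is not otherwise verified here.

```python
import csv
import io

def parse_qrels_tsv(stream):
    """Parse (query_id, doc_id, relevance, iteration) rows from a TSV stream."""
    rows = []
    for row in csv.reader(stream, delimiter="\t"):
        if not row:
            continue  # tolerate blank lines
        query_id, doc_id, relevance, iteration = row
        rows.append((query_id, doc_id, int(relevance), iteration))
    return rows

sample = io.StringIO("q1\td1\t1\t0\nq1\td2\t0\t0\n")
print(parse_qrels_tsv(sample))  # [('q1', 'd1', 1, '0'), ('q1', 'd2', 0, '0')]
```

For a saved export, open the file with `open(path, newline="", encoding="utf-8")` and pass the file object instead of `sample`.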

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.sw.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # assessments for a single topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/sw/test-a"

The held-out test set (version a) for Swahili.

queries
638 queries

Language: sw

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/sw/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/sw/test-a queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.sw.test-a.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
132K docs

Inherits docs from miracl/sw

Language: sw

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/sw/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/sw/test-a docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.sw.test-a')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/sw/test-b"

The held-out test set (version b) for Swahili.

queries
465 queries

Language: sw

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/sw/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/sw/test-b queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.sw.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
132K docs

Inherits docs from miracl/sw

Language: sw

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/sw/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/sw/test-b docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.sw.test-b')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/sw/train"

The train set for Swahili.

queries
1.9K queries

Language: sw

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/sw/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/sw/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.sw.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
132K docs

Inherits docs from miracl/sw

Language: sw

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/sw/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/sw/train docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.sw.train')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
9.4K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition    Count  %
0     Not Relevant  6.7K   71.3%
1     Relevant      2.7K   28.7%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/sw/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/sw/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.sw.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/te"

The Telugu corpus.

docs
518K docs

Language: te

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/te")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
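Each MiraclDoc is one passage from a Wikipedia article. The sketch below groups passages back into their articles. It assumes that `doc_id` values have the form `<article_id>#<passage_index>`. This page does not state that id scheme, so check a few ids before relying on it. The helpers `split_doc_id` and `group_by_article` are hypothetical, and MiraclDoc is redefined locally with the fields listed above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class MiraclDoc(NamedTuple):
    # Local stand-in with the same fields as the MiraclDoc type listed above.
    doc_id: str
    title: str
    text: str


def split_doc_id(doc_id: str) -> Tuple[str, int]:
    """Split an id assumed to look like '<article_id>#<passage_index>'."""
    article, sep, passage = doc_id.rpartition("#")
    if not sep:
        raise ValueError(f"unexpected doc_id format: {doc_id!r}")
    return article, int(passage)


def group_by_article(docs: Iterable[MiraclDoc]) -> Dict[str, List[MiraclDoc]]:
    """Collect passages that share an article id, ordered by passage index."""
    groups: Dict[str, List[MiraclDoc]] = defaultdict(list)
    for doc in docs:
        article, _ = split_doc_id(doc.doc_id)
        groups[article].append(doc)
    for passages in groups.values():
        passages.sort(key=lambda d: split_doc_id(d.doc_id)[1])
    return dict(groups)


if __name__ == "__main__":
    sample = [
        MiraclDoc("7#1", "Title A", "second passage"),
        MiraclDoc("7#0", "Title A", "first passage"),
        MiraclDoc("9#0", "Title B", "only passage"),
    ]
    for article, passages in group_by_article(sample).items():
        print(article, [p.text for p in passages])
```

`group_by_article` keeps every passage it sees in memory. For the full 518K-passage Telugu corpus, consider passing in a filtered subset instead.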

You can find more details about the Python API here.

CLI
ir_datasets export miracl/te docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.te')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the store
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/te/dev"

The dev set for Telugu.

queries
828 queries

Language: te

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/te/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/te/dev queries
[query_id]    [text]
...
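The CLI writes one record per line. The sketch below reads the queries export back into Python. It assumes that the default output is tab-separated as `query_id<TAB>text` and that `export` writes to standard output, so it can be redirected to a file. The name `read_queries_tsv` is hypothetical.

```python
import io
from typing import Dict, TextIO


def read_queries_tsv(stream: TextIO) -> Dict[str, str]:
    """Parse assumed 'query_id<TAB>text' lines into {query_id: text}."""
    queries: Dict[str, str] = {}
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the first tab only, so any later tabs stay in the text.
        query_id, _, text = line.partition("\t")
        queries[query_id] = text
    return queries


if __name__ == "__main__":
    # In practice the stream could be a file produced with (assumed):
    #   ir_datasets export miracl/te/dev queries > te-dev-queries.tsv
    sample = io.StringIO("1\tfirst query\n2\tsecond query\n")
    print(read_queries_tsv(sample))
```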

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.te.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
518K docs

Inherits docs from miracl/te

Language: te

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/te/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/te/dev docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.te.dev')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the store
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
1.6K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition    Count  %
0     Not Relevant  752    46.8%
1     Relevant      854    53.2%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/te/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
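Many evaluation tools expect judgments keyed first by query and then by document. For example, pytrec_eval takes a `{query_id: {doc_id: relevance}}` dict. The minimal sketch below builds that layout from any iterable of TrecQrel-like records. The namedtuple is redefined locally, and the name `qrels_to_dict` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class TrecQrel(NamedTuple):
    # Local stand-in with the same fields as the TrecQrel type listed above.
    query_id: str
    doc_id: str
    relevance: int
    iteration: str


def qrels_to_dict(qrels: Iterable[TrecQrel]) -> Dict[str, Dict[str, int]]:
    """Build {query_id: {doc_id: relevance}} from flat judgment records."""
    nested: Dict[str, Dict[str, int]] = defaultdict(dict)
    for q in qrels:
        nested[q.query_id][q.doc_id] = q.relevance
    return dict(nested)
```

Because the function only reads `query_id`, `doc_id`, and `relevance`, you can also pass `dataset.qrels_iter()` to it directly.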

You can find more details about the Python API here.

CLI
ir_datasets export miracl/te/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.te.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/te/test-a"

The held-out test set (version a) for Telugu.

queries
594 queries

Language: te

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/te/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/te/test-a queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.te.test-a.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
518K docs

Inherits docs from miracl/te

Language: te

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/te/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/te/test-a docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.te.test-a')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the store
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/te/test-b"

The held-out test set (version b) for Telugu.

queries
793 queries

Language: te

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/te/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/te/test-b queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.te.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
518K docs

Inherits docs from miracl/te

Language: te

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/te/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/te/test-b docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.te.test-b')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the store
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/te/train"

The train set for Telugu.

queries
3.5K queries

Language: te

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/te/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/te/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.te.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
518K docs

Inherits docs from miracl/te

Language: te

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/te/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/te/train docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.te.train')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the store
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
19K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition    Count  %
0     Not Relevant  14K    77.9%
1     Relevant      4.1K   22.1%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/te/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
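The train qrels contain both relevant (1) and judged non-relevant (0) passages. One common way to use them is to treat the judged non-relevant passages as hard negatives when building training triples. That is a modeling choice, not something MIRACL prescribes. The sketch below uses hypothetical helper names and a local TrecQrel stand-in.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple, Tuple


class TrecQrel(NamedTuple):
    # Local stand-in with the same fields as the TrecQrel type listed above.
    query_id: str
    doc_id: str
    relevance: int
    iteration: str


def split_judgments(qrels: Iterable[TrecQrel]) -> Dict[str, Tuple[List[str], List[str]]]:
    """Per query: (judged-relevant doc ids, judged-non-relevant doc ids)."""
    out: Dict[str, Tuple[List[str], List[str]]] = {}
    for q in qrels:
        positives, negatives = out.setdefault(q.query_id, ([], []))
        (positives if q.relevance > 0 else negatives).append(q.doc_id)
    return out


def training_triples(
    split: Dict[str, Tuple[List[str], List[str]]]
) -> Iterator[Tuple[str, str, str]]:
    """Yield (query_id, positive_doc_id, negative_doc_id) for every pairing."""
    for query_id, (positives, negatives) in split.items():
        for pos in positives:
            for neg in negatives:
                yield query_id, pos, neg
```

A query with no judged positives or no judged negatives yields no triples. Such queries would need negatives from another source, for example sampled from the corpus.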

You can find more details about the Python API here.

CLI
ir_datasets export miracl/te/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.te.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/th"

The Thai corpus.

docs
542K docs

Language: th

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/th")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/th docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.th')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the store
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/th/dev"

The dev set for Thai.

queries
733 queries

Language: th

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/th/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/th/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.th.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
542K docs

Inherits docs from miracl/th

Language: th

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/th/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/th/dev docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.th.dev')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the store
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
7.6K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition    Count  %
0     Not Relevant  6.2K   82.3%
1     Relevant      1.3K   17.7%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/th/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
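Given a system's ranked output and the dev qrels, recall@k is the fraction of each query's relevant passages that appear in the top k. The sketch below assumes both inputs are already dicts, with the qrels in the nested `{query_id: {doc_id: relevance}}` layout. The name `recall_at_k` and the sample data are hypothetical. For reported numbers, established tools such as trec_eval or ir_measures are preferable.

```python
from typing import Dict, List


def recall_at_k(
    run: Dict[str, List[str]], qrels: Dict[str, Dict[str, int]], k: int = 100
) -> float:
    """Mean recall@k over queries with at least one relevant document.

    run:   {query_id: [doc_id, ...]} ranked best-first
    qrels: {query_id: {doc_id: relevance}}
    """
    scores = []
    for query_id, judged in qrels.items():
        relevant = {d for d, rel in judged.items() if rel > 0}
        if not relevant:
            continue  # recall is undefined without relevant documents
        retrieved = set(run.get(query_id, [])[:k])
        scores.append(len(relevant & retrieved) / len(relevant))
    return sum(scores) / len(scores) if scores else 0.0
```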

You can find more details about the Python API here.

CLI
ir_datasets export miracl/th/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.th.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/th/test-a"

The held-out test set (version a) for Thai.
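This entry lists queries and documents but no qrels, so it cannot be scored locally with judgments from this page. Retrieval results for a set like this are often stored in the six-column TREC run format: `query_id Q0 doc_id rank score tag`. The minimal writer below assumes results are held as `{query_id: [(doc_id, score), ...]}`. The name `write_trec_run` and the default tag are hypothetical, and this page does not describe how runs on test-a are scored.

```python
import io
from typing import Dict, List, TextIO, Tuple


def write_trec_run(
    results: Dict[str, List[Tuple[str, float]]], out: TextIO, tag: str = "my-run"
) -> None:
    """Write results in the TREC run format, highest score first per query."""
    for query_id, scored in results.items():
        ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
        for rank, (doc_id, score) in enumerate(ranked, start=1):
            out.write(f"{query_id} Q0 {doc_id} {rank} {score} {tag}\n")


if __name__ == "__main__":
    buf = io.StringIO()
    write_trec_run({"q1": [("a#0", 0.5), ("b#1", 1.5)]}, buf, tag="bm25")
    print(buf.getvalue(), end="")
```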

queries
992 queries

Language: th

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/th/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/th/test-a queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.th.test-a.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
542K docs

Inherits docs from miracl/th

Language: th

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/th/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/th/test-a docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.th.test-a')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the store
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/th/test-b"

The held-out test set (version b) for Thai.

queries
650 queries

Language: th

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/th/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/th/test-b queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.th.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
542K docs

Inherits docs from miracl/th

Language: th

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/th/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/th/test-b docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.th.test-b')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the store
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/th/train"

The train set for Thai.

queries
3.0K queries

Language: th

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/th/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/th/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.th.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
542K docs

Inherits docs from miracl/th

Language: th

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/th/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/th/train docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.th.train')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the store
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
21K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition    Count  %
0     Not Relevant  17K    77.6%
1     Relevant      4.8K   22.4%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/th/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/th/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.th.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/yo"

The Yoruba corpus.
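Not every language has every split. The Yoruba entries here have only dev and test-b, with no train or test-a. The small sketch below builds dataset ids and rejects splits that do not exist. Its split table is transcribed from this page's index, and only for the languages covered in this section. The names `MIRACL_SPLITS` and `miracl_id` are hypothetical.

```python
from typing import Dict, Tuple

# Splits per language, copied from this page's index (subset of languages only).
MIRACL_SPLITS: Dict[str, Tuple[str, ...]] = {
    "sw": ("dev", "test-a", "test-b", "train"),
    "te": ("dev", "test-a", "test-b", "train"),
    "th": ("dev", "test-a", "test-b", "train"),
    "yo": ("dev", "test-b"),
    "zh": ("dev", "test-b", "train"),
}


def miracl_id(lang: str, split: str = "") -> str:
    """Build an id such as 'miracl/yo/dev'; an empty split means the corpus."""
    if lang not in MIRACL_SPLITS:
        raise ValueError(f"language not covered here: {lang!r}")
    if not split:
        return f"miracl/{lang}"
    if split not in MIRACL_SPLITS[lang]:
        raise ValueError(
            f"miracl/{lang} has no {split!r} split; choose from {MIRACL_SPLITS[lang]}"
        )
    return f"miracl/{lang}/{split}"
```

The resulting string can be passed to `ir_datasets.load(...)`. A typo or a missing split then fails early with a clear message.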

docs
49K docs

Language: yo

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/yo")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/yo docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.yo')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the store
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/yo/dev"

The dev set for Yoruba.

queries
119 queries

Language: yo

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/yo/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/yo/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.yo.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
49K docs

Inherits docs from miracl/yo

Language: yo

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/yo/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/yo/dev docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.yo.dev')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the store
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels
1.2K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition    Count  %
0     Not Relevant  1.0K   87.9%
1     Relevant      144    12.1%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/yo/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
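nDCG@10 is the metric most commonly reported for MIRACL. Below is a per-query sketch that uses linear gains and a log2(rank + 1) discount. The relevance levels listed here are only 0 and 1, so linear and exponential gains give the same result. The name `ndcg_at_k` is hypothetical. Unjudged passages are scored as non-relevant, and a system score would average the per-query values.

```python
import math
from typing import Dict, List


def ndcg_at_k(ranking: List[str], judged: Dict[str, int], k: int = 10) -> float:
    """nDCG@k for one query; documents missing from `judged` count as 0."""

    def dcg(gains: List[int]) -> float:
        # Rank 1 has discount log2(2) = 1, rank 2 has log2(3), and so on.
        return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

    gains = [judged.get(doc_id, 0) for doc_id in ranking[:k]]
    ideal = sorted((rel for rel in judged.values() if rel > 0), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0
```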

You can find more details about the Python API here.

CLI
ir_datasets export miracl/yo/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.yo.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/yo/test-b"

The held-out test set (version b) for Yoruba.

queries
288 queries

Language: yo

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/yo/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/yo/test-b queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.yo.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
49K docs

Inherits docs from miracl/yo

Language: yo

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/yo/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/yo/test-b docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.yo.test-b')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the store
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/zh"

The Chinese corpus.

docs
4.9M docs

Language: zh

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/zh")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
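At 4.9M passages, the Chinese corpus is large enough to be worth processing lazily. One example is encoding passages in fixed-size batches for a dense retriever. Below is a sketch of a batching helper that works over any iterator. Python 3.12 and later also provide `itertools.batched`, which yields tuples instead of lists. The `batched` function here is a local helper.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items, consuming lazily."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

For example, `for batch in batched(dataset.docs_iter(), 256): ...` would hand 256 MiraclDoc records at a time to an encoder (hypothetical here) without loading the whole corpus into memory.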

You can find more details about the Python API here.

CLI
ir_datasets export miracl/zh docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.
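The `ir_datasets export ... docs` command writes one document per line with the columns shown above. The following sketch reads that output back into records. It assumes the columns are tab-separated and that there is no header row, so check your own output before relying on it. The `parse_doc_line` and `parse_docs` helpers are hypothetical, and the stand-in `MiraclDoc` mirrors the documented fields. Each line is split on at most two tabs, so any stray tab inside the passage text stays in the `text` field.

```python
from typing import Iterable, Iterator, NamedTuple


# Stand-in with the documented MiraclDoc fields.
class MiraclDoc(NamedTuple):
    doc_id: str
    title: str
    text: str


def parse_doc_line(line: str) -> MiraclDoc:
    """Hypothetical helper: split one exported line into its three columns."""
    # maxsplit=2 keeps any extra tabs inside the text column.
    # A line with fewer than two tabs raises ValueError on unpacking.
    doc_id, title, text = line.rstrip("\r\n").split("\t", 2)
    return MiraclDoc(doc_id, title, text)


def parse_docs(lines: Iterable[str]) -> Iterator[MiraclDoc]:
    """Yield one MiraclDoc per non-blank line."""
    for line in lines:
        if line.strip():  # ignore blank lines
            yield parse_doc_line(line)


# Made-up exported lines, for illustration only.
exported = ["doc-1\tExample title\tExample passage text.\n", "\n"]
for doc in parse_docs(exported):
    print(doc.doc_id, doc.title)
```

To use it on real output, save the export to a file, for example `ir_datasets export miracl/zh docs > zh-docs.tsv`, open that file, and pass the file object to `parse_docs`.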

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.zh')
for doc in dataset.iter_documents():
    print(doc)  # one document from the AdhocDocumentStore
    break

This example requires experimaestro-ir to be installed. For more information about the returned object, see the documentation for AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/zh/dev"

The dev set for Chinese.

queries
393 queries

Language: zh

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str
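Evaluation code usually looks queries up by ID instead of iterating over them. The sketch below shows one way to build that lookup. It uses a stand-in `GenericQuery` with the two fields listed above and made-up sample queries. The `queries_by_id` helper is hypothetical. With the real dataset, you would pass it `dataset.queries_iter()`.

```python
from typing import Dict, Iterable, NamedTuple


# Stand-in with the documented GenericQuery fields.
class GenericQuery(NamedTuple):
    query_id: str
    text: str


def queries_by_id(queries: Iterable[GenericQuery]) -> Dict[str, str]:
    """Hypothetical helper: map query_id -> query text."""
    lookup: Dict[str, str] = {}
    for q in queries:
        if q.query_id in lookup:
            # Report duplicate IDs instead of silently overwriting them.
            raise ValueError(f"duplicate query_id: {q.query_id}")
        lookup[q.query_id] = q.text
    return lookup


# Made-up sample queries, for illustration only.
sample = [
    GenericQuery("q1", "first example query"),
    GenericQuery("q2", "second example query"),
]
lookup = queries_by_id(sample)
print(lookup["q2"])
```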

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/zh/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/zh/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.zh.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires experimaestro-ir to be installed. For more information about the returned object, see the documentation for AdhocTopics.

docs
4.9M docs

Inherits docs from miracl/zh

Language: zh

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/zh/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/zh/dev docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.zh.dev')
for doc in dataset.iter_documents():
    print(doc)  # one document from the AdhocDocumentStore
    break

This example requires experimaestro-ir to be installed. For more information about the returned object, see the documentation for AdhocDocumentStore.

qrels
3.9K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition    Count  %
0     Not Relevant  2.9K   74.7%
1     Relevant      994    25.3%
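The counts and percentages in the table above can be recomputed from the qrels. The following sketch tallies relevance labels over any iterable of TrecQrel-style records. It uses a stand-in namedtuple and a few made-up judgments, so it runs offline. The `relevance_distribution` helper is hypothetical. If you pass it `ir_datasets.load("miracl/zh/dev").qrels_iter()` instead, it should give the same proportions as the table, up to rounding. That call requires downloading the qrels.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


# Stand-in with the documented TrecQrel fields.
class TrecQrel(NamedTuple):
    query_id: str
    doc_id: str
    relevance: int
    iteration: str


def relevance_distribution(qrels: Iterable[TrecQrel]) -> Dict[int, float]:
    """Hypothetical helper: percentage of judgments at each relevance level."""
    counts = Counter(q.relevance for q in qrels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {level: 100.0 * n / total for level, n in sorted(counts.items())}


# Made-up judgments, for illustration only.
sample = [
    TrecQrel("q1", "d1", 1, "0"),
    TrecQrel("q1", "d2", 0, "0"),
    TrecQrel("q1", "d3", 0, "0"),
    TrecQrel("q2", "d4", 0, "0"),
]
for level, pct in relevance_distribution(sample).items():
    print(f"{level}: {pct:.1f}%")
```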

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/zh/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/zh/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.
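With `--format tsv`, each qrel is written on its own line with the four columns shown above. The following sketch turns that output back into typed records. It assumes the columns are tab-separated and that there is no header row. The `parse_qrels_tsv` helper is hypothetical. It converts `relevance` to an int to match the TrecQrel definition and rejects lines that do not have exactly four columns.

```python
from typing import Iterable, Iterator, NamedTuple


# Stand-in with the documented TrecQrel fields.
class TrecQrel(NamedTuple):
    query_id: str
    doc_id: str
    relevance: int
    iteration: str


def parse_qrels_tsv(lines: Iterable[str]) -> Iterator[TrecQrel]:
    """Hypothetical parser for tab-separated qrels export output."""
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) != 4:
            raise ValueError(f"line {line_no}: expected 4 columns, got {len(fields)}")
        query_id, doc_id, relevance, iteration = fields
        yield TrecQrel(query_id, doc_id, int(relevance), iteration)


# Made-up exported lines, for illustration only.
example = ["q1\td1\t1\t0\n", "q1\td2\t0\t0\n"]
for qrel in parse_qrels_tsv(example):
    print(qrel)
```

For example, run `ir_datasets export miracl/zh/dev qrels --format tsv > zh-dev-qrels.tsv`, then open the file and pass the file object to `parse_qrels_tsv`.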

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.zh.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires experimaestro-ir to be installed. For more information about the returned object, see the documentation for AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/zh/test-b"

The held-out test set (version b) for Chinese.

queries
920 queries

Language: zh

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/zh/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/zh/test-b queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.zh.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires experimaestro-ir to be installed. For more information about the returned object, see the documentation for AdhocTopics.

docs
4.9M docs

Inherits docs from miracl/zh

Language: zh

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/zh/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/zh/test-b docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.zh.test-b')
for doc in dataset.iter_documents():
    print(doc)  # one document from the AdhocDocumentStore
    break

This example requires experimaestro-ir to be installed. For more information about the returned object, see the documentation for AdhocDocumentStore.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }

"miracl/zh/train"

The train set for Chinese.

queries
1.3K queries

Language: zh

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/zh/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/zh/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.zh.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires experimaestro-ir to be installed. For more information about the returned object, see the documentation for AdhocTopics.

docs
4.9M docs

Inherits docs from miracl/zh

Language: zh

Document type:
MiraclDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/zh/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export miracl/zh/train docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.zh.train')
for doc in dataset.iter_documents():
    print(doc)  # one document from the AdhocDocumentStore
    break

This example requires experimaestro-ir to be installed. For more information about the returned object, see the documentation for AdhocDocumentStore.

qrels
13K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition    Count  %
0     Not Relevant  9.9K   75.7%
1     Relevant      3.2K   24.3%
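Many evaluation tools, including pytrec_eval, take qrels as a nested mapping `{query_id: {doc_id: relevance}}`. The sketch below builds that mapping from TrecQrel-style records, using a stand-in namedtuple and a made-up sample. It keeps the relevance-0 judgments, because most judgments in the table above are explicitly non-relevant. Whether your metric should use those judgments is a separate decision. The `qrels_to_dict` helper is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


# Stand-in with the documented TrecQrel fields.
class TrecQrel(NamedTuple):
    query_id: str
    doc_id: str
    relevance: int
    iteration: str


def qrels_to_dict(qrels: Iterable[TrecQrel]) -> Dict[str, Dict[str, int]]:
    """Hypothetical helper: group judgments as {query_id: {doc_id: relevance}}."""
    nested: Dict[str, Dict[str, int]] = defaultdict(dict)
    for q in qrels:
        # If a (query_id, doc_id) pair repeats, the last judgment wins.
        nested[q.query_id][q.doc_id] = q.relevance
    return dict(nested)


# Made-up judgments, for illustration only.
sample = [
    TrecQrel("q1", "d1", 1, "0"),
    TrecQrel("q1", "d2", 0, "0"),
    TrecQrel("q2", "d3", 1, "0"),
]
print(qrels_to_dict(sample))
```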

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("miracl/zh/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
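The train split includes explicit relevance-0 judgments as well as relevant ones. One common use is to pair each judged-relevant passage with judged-non-relevant passages for the same query when training a retriever. This is a generic technique, not something ir_datasets provides. The `training_pairs` helper below is hypothetical and works on any TrecQrel-style records. The stand-in namedtuple and sample judgments are made up.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, Iterable, List, NamedTuple, Tuple


# Stand-in with the documented TrecQrel fields.
class TrecQrel(NamedTuple):
    query_id: str
    doc_id: str
    relevance: int
    iteration: str


def training_pairs(qrels: Iterable[TrecQrel]) -> List[Tuple[str, str, str]]:
    """Hypothetical helper: (query_id, positive_doc_id, negative_doc_id) triples."""
    positives: Dict[str, List[str]] = defaultdict(list)
    negatives: Dict[str, List[str]] = defaultdict(list)
    for q in qrels:
        # Relevance > 0 counts as positive; anything else is a judged negative.
        target = positives if q.relevance > 0 else negatives
        target[q.query_id].append(q.doc_id)
    triples: List[Tuple[str, str, str]] = []
    # Queries with no judged negatives produce no triples.
    for qid in sorted(positives):
        for pos, neg in product(positives[qid], negatives.get(qid, [])):
            triples.append((qid, pos, neg))
    return triples


# Made-up judgments, for illustration only.
sample = [
    TrecQrel("q1", "d1", 1, "0"),
    TrecQrel("q1", "d2", 0, "0"),
    TrecQrel("q1", "d3", 0, "0"),
    TrecQrel("q2", "d4", 1, "0"),  # no judged negatives, so no triples for q2
]
for triple in training_pairs(sample):
    print(triple)
```

With the real data, you would call `training_pairs(ir_datasets.load("miracl/zh/train").qrels_iter())`. The full cross product can grow quickly for queries with many judgments, so sampling a few negatives per positive is a common alternative.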

CLI
ir_datasets export miracl/zh/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.zh.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires experimaestro-ir to be installed. For more information about the returned object, see the documentation for AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2022Miracl}

Bibtex:

@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }