ir_datasets: MIRACL
MIRACL is a multilingual adhoc retrieval dataset covering 18 languages. The document corpora are based on Wikipedia dumps, which are split into passages.
Bibtex:
@article{Zhang2022Miracl,
  title={Making a MIRACL: Multilingual information retrieval across a continuum of languages},
  author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy},
  journal={arXiv preprint arXiv:2210.09984},
  year={2022}
}
The Arabic corpus.
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ar")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ar docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
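The tab-separated output of the export command above can also be consumed from Python. The following is a minimal sketch, assuming the export was redirected to a file or stream with one `doc_id`, `title`, `text` record per line. The `ExportedDoc` type and the `read_exported_docs` helper are hypothetical names introduced for illustration; they are not part of ir_datasets.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class ExportedDoc(NamedTuple):
    """Hypothetical record mirroring the doc_id/title/text columns."""
    doc_id: str
    title: str
    text: str


def read_exported_docs(stream: TextIO) -> Iterator[ExportedDoc]:
    """Yield one ExportedDoc per tab-separated line, skipping malformed lines."""
    # QUOTE_NONE keeps quotation marks inside passages as ordinary characters.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 3:
            continue  # assumption: well-formed lines have exactly three columns
        yield ExportedDoc(*row)


# Toy input standing in for the output of `ir_datasets export miracl/ar docs`.
sample = io.StringIO("d1\tTitle one\tFirst passage.\nd2\tTitle two\tSecond passage.\n")
docs = list(read_exported_docs(sample))
```

Real passages can be long, so a real export may need a larger limit set through `csv.field_size_limit`.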
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ar')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{
  "docs": {
    "count": 2061414,
    "fields": {
      "doc_id": {
        "max_len": 11,
        "common_prefix": ""
      }
    }
  }
}
The dev set for Arabic.
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ar/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ar/dev queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ar.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ar
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ar/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ar/dev docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ar.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 24K | 80.6% | 
| 1 | Relevant | 5.7K | 19.4% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ar/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/ar/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.ar.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for a single topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
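For evaluation it is often convenient to look up judgments by query. The sketch below groups qrel records into a nested dictionary. It uses a stand-in `Qrel` namedtuple with the same fields as the records shown above, so it runs without downloading the dataset; the `group_qrels` helper and the toy IDs are hypothetical.

```python
from collections import defaultdict, namedtuple
from typing import Dict, Iterable

# Stand-in for the records yielded by dataset.qrels_iter(); the field names
# follow the namedtuple shown in the Python example above.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def group_qrels(qrels: Iterable) -> Dict[str, Dict[str, int]]:
    """Map each query_id to a {doc_id: relevance} dictionary."""
    grouped: Dict[str, Dict[str, int]] = defaultdict(dict)
    for qrel in qrels:
        grouped[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(grouped)


# Toy judgments; the IDs are made up for illustration.
toy_qrels = [
    Qrel("q1", "d1", 1, "0"),
    Qrel("q1", "d2", 0, "0"),
    Qrel("q2", "d3", 1, "0"),
]
by_query = group_qrels(toy_qrels)
```

With the real dataset, the same function could be called as `group_qrels(dataset.qrels_iter())`.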
{
  "docs": {
    "count": 2061414,
    "fields": {
      "doc_id": {
        "max_len": 11,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 2896
  },
  "qrels": {
    "count": 29197,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 5658,
          "0": 23539
        }
      }
    }
  }
}
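The percentages in the relevance table for miracl/ar/dev follow directly from the `counts_by_value` entry in the metadata above. As a small worked example, the sketch below recomputes them; `relevance_shares` is a hypothetical helper name.

```python
from typing import Dict


def relevance_shares(counts_by_value: Dict[str, int]) -> Dict[str, float]:
    """Return each relevance level's share of all judgments, in percent."""
    total = sum(counts_by_value.values())
    return {
        level: round(100 * count / total, 1)
        for level, count in counts_by_value.items()
    }


# Counts copied from the miracl/ar/dev metadata above (5658 + 23539 = 29197).
shares = relevance_shares({"1": 5658, "0": 23539})
print(shares)  # {'1': 19.4, '0': 80.6}
```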
The held-out test set (version a) for Arabic.
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ar/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ar/test-a queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ar.test-a.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ar
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ar/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ar/test-a docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ar.test-a')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{
  "docs": {
    "count": 2061414,
    "fields": {
      "doc_id": {
        "max_len": 11,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 936
  }
}
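The metadata above lists docs and queries for miracl/ar/test-a but no qrels, which is consistent with a held-out test set: rankings for these queries cannot be scored locally. A common way to hand rankings to an external evaluator is a TREC-style run file. The sketch below writes one; the choice of this format, the `write_trec_run` helper, and the `my-run` tag are assumptions for illustration, and the source does not say which format MIRACL expects.

```python
import io
from typing import Dict, List, TextIO, Tuple


def write_trec_run(
    results: Dict[str, List[Tuple[str, float]]],
    out: TextIO,
    tag: str = "my-run",
) -> None:
    """Write `query_id Q0 doc_id rank score tag` lines, best score first."""
    for query_id, scored_docs in results.items():
        ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)
        for rank, (doc_id, score) in enumerate(ranked, start=1):
            out.write(f"{query_id} Q0 {doc_id} {rank} {score:.4f} {tag}\n")


# Toy scores for one query; the IDs are made up for illustration.
buffer = io.StringIO()
write_trec_run({"q1": [("d2", 1.5), ("d1", 3.0)]}, buffer)
run_text = buffer.getvalue()
```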
The held-out test set (version b) for Arabic.
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ar/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ar/test-b queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ar.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ar
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ar/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ar/test-b docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ar.test-b')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{
  "docs": {
    "count": 2061414,
    "fields": {
      "doc_id": {
        "max_len": 11,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 1405
  }
}
The train set for Arabic.
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ar/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ar/train queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ar.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ar
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ar/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ar/train docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ar.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 19K | 75.5% | 
| 1 | Relevant | 6.2K | 24.5% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ar/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/ar/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.ar.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for a single topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
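Because the train qrels contain both relevance levels (0 and 1), they can be turned into (query, positive, negative) triples for training a ranker. This is only one possible use, sketched under the assumption that every judged non-relevant passage is an acceptable negative. The `Qrel` stand-in, the `training_triples` helper, and the toy IDs are hypothetical.

```python
from collections import namedtuple
from itertools import product
from typing import Dict, Iterable, Iterator, List, Tuple

# Stand-in for the records yielded by dataset.qrels_iter().
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def training_triples(qrels: Iterable) -> Iterator[Tuple[str, str, str]]:
    """Yield (query_id, positive_doc_id, negative_doc_id) for each query."""
    positives: Dict[str, List[str]] = {}
    negatives: Dict[str, List[str]] = {}
    for qrel in qrels:
        bucket = positives if qrel.relevance > 0 else negatives
        bucket.setdefault(qrel.query_id, []).append(qrel.doc_id)
    for query_id, pos_ids in positives.items():
        # Queries without a judged negative produce no triples.
        for pos_id, neg_id in product(pos_ids, negatives.get(query_id, [])):
            yield query_id, pos_id, neg_id


# Toy judgments; the IDs are made up for illustration.
toy_qrels = [
    Qrel("q1", "d1", 1, "0"),
    Qrel("q1", "d2", 0, "0"),
    Qrel("q1", "d3", 0, "0"),
    Qrel("q2", "d4", 1, "0"),
]
triples = list(training_triples(toy_qrels))
```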
{
  "docs": {
    "count": 2061414,
    "fields": {
      "doc_id": {
        "max_len": 11,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 3495
  },
  "qrels": {
    "count": 25382,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 6217,
          "0": 19165
        }
      }
    }
  }
}
The Bengali corpus.
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/bn")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/bn docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.bn')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{
  "docs": {
    "count": 297265,
    "fields": {
      "doc_id": {
        "max_len": 10,
        "common_prefix": ""
      }
    }
  }
}
The dev set for Bengali.
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/bn/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/bn/dev queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.bn.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/bn
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/bn/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/bn/dev docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.bn.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 3.3K | 79.5% | 
| 1 | Relevant | 863 | 20.5% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/bn/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/bn/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.bn.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for a single topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
  "docs": {
    "count": 297265,
    "fields": {
      "doc_id": {
        "max_len": 10,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 411
  },
  "qrels": {
    "count": 4206,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 863,
          "0": 3343
        }
      }
    }
  }
}
The held-out test set (version a) for Bengali.
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/bn/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/bn/test-a queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.bn.test-a.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/bn
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/bn/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/bn/test-a docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.bn.test-a')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{
  "docs": {
    "count": 297265,
    "fields": {
      "doc_id": {
        "max_len": 10,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 102
  }
}
The held-out test set (version b) for Bengali.
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/bn/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/bn/test-b queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.bn.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/bn
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/bn/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/bn/test-b docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.bn.test-b')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{
  "docs": {
    "count": 297265,
    "fields": {
      "doc_id": {
        "max_len": 10,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 1130
  }
}
The train set for Bengali.
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/bn/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/bn/train queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.bn.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/bn
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/bn/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/bn/train docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.bn.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 13K | 77.0% | 
| 1 | Relevant | 3.9K | 23.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/bn/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/bn/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.bn.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for a single topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
  "docs": {
    "count": 297265,
    "fields": {
      "doc_id": {
        "max_len": 10,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 1631
  },
  "qrels": {
    "count": 16754,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 3859,
          "0": 12895
        }
      }
    }
  }
}
The German corpus.
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/de")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/de docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.de')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{
  "docs": {
    "count": 15866222,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  }
}
The dev set for German.
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/de/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/de/dev queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.de.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/de
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/de/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/de/dev docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.de.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 2.3K | 74.2% | 
| 1 | Relevant | 811 | 25.8% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/de/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/de/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.de.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for a single topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
  "docs": {
    "count": 15866222,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 305
  },
  "qrels": {
    "count": 3144,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 811,
          "0": 2333
        }
      }
    }
  }
}
The held-out test set (version b) for German.
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/de/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/de/test-b queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.de.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/de
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/de/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/de/test-b docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.de.test-b')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{
  "docs": {
    "count": 15866222,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 712
  }
}
The English corpus.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/en")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/en docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en')
# Index miracl/en
indexer = pt.IterDictIndexer('./indices/miracl_en')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'text'])
You can find more details about PyTerrier indexing here.
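The indexing example above passes an iterator of documents to `IterDictIndexer` and names the `title` and `text` fields. As a rough illustration of what such an iterator carries, the sketch below converts `doc_id`/`title`/`text` records into dictionaries keyed by `docno`, the document-identifier key that PyTerrier indexers conventionally use. The `Doc` stand-in and the `to_indexer_dicts` helper are hypothetical, and the exact records produced by `get_corpus_iter()` may contain additional details.

```python
from collections import namedtuple
from typing import Dict, Iterable, Iterator

# Stand-in for the records yielded by dataset.docs_iter().
Doc = namedtuple("Doc", ["doc_id", "title", "text"])


def to_indexer_dicts(docs: Iterable) -> Iterator[Dict[str, str]]:
    """Convert doc records into dicts with a 'docno' key plus text fields."""
    for doc in docs:
        yield {"docno": doc.doc_id, "title": doc.title, "text": doc.text}


# Toy document; the ID and contents are made up for illustration.
records = list(to_indexer_dicts([Doc("d1", "A title", "Some passage text.")]))
```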
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.en')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{
  "docs": {
    "count": 32893221,
    "fields": {
      "doc_id": {
        "max_len": 13,
        "common_prefix": ""
      }
    }
  }
}
The dev set for English.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/en/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/en/dev queries
[query_id]    [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en/dev')
index_ref = pt.IndexRef.of('./indices/miracl_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.en.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/en
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/en/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/en/dev docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en/dev')
# Index miracl/en
indexer = pt.IterDictIndexer('./indices/miracl_en')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.en.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 6.0K | 72.1% | 
| 1 | Relevant | 2.3K | 27.9% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/en/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
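Evaluation code usually wants the judgments grouped by query. A minimal sketch of that grouping follows, assuming records with the four fields named above; the `Qrel` tuple, the `group_qrels` helper, and the toy ids are hypothetical, and with ir_datasets installed you would pass `dataset.qrels_iter()` instead.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the records yielded by dataset.qrels_iter().
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def group_qrels(qrels):
    """Build {query_id: {doc_id: relevance}} from an iterable of qrels."""
    by_query = defaultdict(dict)
    for qrel in qrels:
        by_query[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(by_query)


# Toy judgments for illustration only.
toy = [
    Qrel("q1", "d1", 1, "0"),
    Qrel("q1", "d2", 0, "0"),
    Qrel("q2", "d9", 1, "0"),
]
print(group_qrels(toy))
```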
ir_datasets export miracl/en/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:miracl/en/dev')
index_ref = pt.IndexRef.of('./indices/miracl_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
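The experiment above reports nDCG@20. As a rough illustration of what that measure computes on binary labels such as these, here is a small standalone sketch. It uses linear gains and a log2 rank discount; the helper names are hypothetical, and PyTerrier's own implementation may differ in details such as tie handling.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k gains (ranks start at 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranked_doc_ids, relevance, k=20):
    """nDCG@k for one query; `relevance` maps doc_id -> label (0 or 1 here)."""
    gains = [relevance.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal = sorted(relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Toy ranking: the only relevant passage is retrieved at rank 2.
print(ndcg_at_k(["d2", "d1"], {"d1": 1, "d2": 0}))
```

Unjudged passages are treated as not relevant in this sketch.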
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.en.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 32893221,
    "fields": {
      "doc_id": {
        "max_len": 13,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 799
  },
  "qrels": {
    "count": 8350,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 2326,
          "0": 6024
        }
      }
    }
  }
}
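The percentages in the relevance-levels table for miracl/en/dev follow directly from `counts_by_value` in the metadata above. A short sketch of that arithmetic, with the relevant part of the metadata copied into a string (the helper name `relevance_shares` is hypothetical):

```python
import json

# Subset of the miracl/en/dev metadata shown above.
metadata = json.loads(
    '{"qrels": {"count": 8350, "fields": {"relevance": '
    '{"counts_by_value": {"1": 2326, "0": 6024}}}}}'
)


def relevance_shares(meta):
    """Return {relevance level: percentage of all qrels}, rounded to 0.1."""
    counts = meta["qrels"]["fields"]["relevance"]["counts_by_value"]
    total = sum(counts.values())
    return {int(level): round(100 * n / total, 1) for level, n in counts.items()}


print(relevance_shares(metadata))  # {1: 27.9, 0: 72.1}
```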
The held-out test set (version a) for English.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/en/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/en/test-a queries
[query_id]    [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en/test-a')
index_ref = pt.IndexRef.of('./indices/miracl_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.en.test-a.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/en
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/en/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/en/test-a docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en/test-a')
# Index miracl/en
indexer = pt.IterDictIndexer('./indices/miracl_en')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.en.test-a')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 32893221,
    "fields": {
      "doc_id": {
        "max_len": 13,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 734
  }
}
The held-out test set (version b) for English.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/en/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/en/test-b queries
[query_id]    [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en/test-b')
index_ref = pt.IndexRef.of('./indices/miracl_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.en.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/en
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/en/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/en/test-b docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en/test-b')
# Index miracl/en
indexer = pt.IterDictIndexer('./indices/miracl_en')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.en.test-b')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 32893221,
    "fields": {
      "doc_id": {
        "max_len": 13,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 1790
  }
}
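The held-out test sets list queries but no qrels in their metadata, so effectiveness cannot be computed locally from these records alone. If rankings need to be saved for scoring elsewhere, one widely used choice is the six-column TREC run format. Whether a given evaluation service expects exactly this format is an assumption here, and the `write_trec_run` helper, the run tag, and the toy ids are hypothetical.

```python
import io


def write_trec_run(results, out, tag="my-run"):
    """Write `query_id Q0 doc_id rank score tag` lines, best score first.

    `results` maps query_id -> list of (doc_id, score) pairs.
    """
    for query_id, scored in results.items():
        ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
        for rank, (doc_id, score) in enumerate(ranked, start=1):
            out.write(f"{query_id} Q0 {doc_id} {rank} {score:.4f} {tag}\n")


# Toy scores for illustration only.
buffer = io.StringIO()
write_trec_run({"q1": [("d1", 0.5), ("d2", 1.5)]}, buffer)
print(buffer.getvalue(), end="")
```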
The train set for English.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/en/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/en/train queries
[query_id]    [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en/train')
index_ref = pt.IndexRef.of('./indices/miracl_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.en.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/en
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/en/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/en/train docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en/train')
# Index miracl/en
indexer = pt.IterDictIndexer('./indices/miracl_en')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.en.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 22K | 73.1% | 
| 1 | Relevant | 7.9K | 26.9% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/en/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
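Because the training qrels contain both relevant (1) and not-relevant (0) judgments, they can be turned into (query, positive, negative) triples for training a ranker. The sketch below shows one possible way to do this and is not a procedure prescribed by the dataset. The `Qrel` tuple, the `build_triples` helper, and the toy ids are hypothetical; with ir_datasets installed you would pass `dataset.qrels_iter()`.

```python
import random
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the records yielded by dataset.qrels_iter().
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def build_triples(qrels, seed=0):
    """Pair every relevant passage with one randomly chosen judged negative."""
    rng = random.Random(seed)  # fixed seed keeps the sampling reproducible
    positives, negatives = defaultdict(list), defaultdict(list)
    for qrel in qrels:
        bucket = positives if qrel.relevance > 0 else negatives
        bucket[qrel.query_id].append(qrel.doc_id)
    triples = []
    for query_id, pos_ids in positives.items():
        neg_ids = negatives.get(query_id)
        if not neg_ids:
            continue  # no judged negative for this query
        for pos_id in pos_ids:
            triples.append((query_id, pos_id, rng.choice(neg_ids)))
    return triples


# Toy judgments for illustration only.
toy = [
    Qrel("q1", "d1", 1, "0"),
    Qrel("q1", "d2", 0, "0"),
    Qrel("q1", "d3", 0, "0"),
    Qrel("q2", "d4", 1, "0"),
]
print(build_triples(toy))
```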
ir_datasets export miracl/en/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:miracl/en/train')
index_ref = pt.IndexRef.of('./indices/miracl_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
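The experiment above also reports MAP. As a rough standalone illustration of that measure on binary judgments, the sketch below computes average precision per query and then the mean over queries. The helper names and toy data are hypothetical, rankings are assumed to contain no duplicate doc_ids, and PyTerrier's implementation may differ in edge cases.

```python
def average_precision(ranked_doc_ids, relevant):
    """Average precision for one query; `relevant` is a set of doc_ids."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant hit
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs, relevant_by_query):
    """Mean of per-query average precision over all judged queries."""
    scores = [
        average_precision(runs.get(query_id, []), relevant)
        for query_id, relevant in relevant_by_query.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Toy run: relevant passages are retrieved at ranks 1 and 3.
print(average_precision(["a", "x", "b"], {"a", "b"}))
```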
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.en.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 32893221,
    "fields": {
      "doc_id": {
        "max_len": 13,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 2863
  },
  "qrels": {
    "count": 29416,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 7899,
          "0": 21517
        }
      }
    }
  }
}
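The query and qrels counts in the metadata give a quick sense of judgment depth: both English splits with qrels have roughly ten judged passages per query. A small sketch of that arithmetic, with the counts copied from the miracl/en/dev and miracl/en/train metadata (the `stats` dict and helper name are hypothetical):

```python
# Counts copied from the metadata blocks for the two English splits with qrels.
stats = {
    "miracl/en/dev": {"queries": 799, "qrels": 8350},
    "miracl/en/train": {"queries": 2863, "qrels": 29416},
}


def judgments_per_query(entry):
    """Average number of judged passages per query."""
    return entry["qrels"] / entry["queries"]


for name, entry in stats.items():
    print(f"{name}: {judgments_per_query(entry):.2f} judged passages per query")
```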
The Spanish corpus.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/es")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/es docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.es')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 10373953,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  }
}
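Since the corpora are Wikipedia articles split into passages, it is sometimes useful to group passages by their source article. The sketch below assumes that a `doc_id` has the form `<article id>#<passage index>`. That layout is not stated in the metadata above (which only reports a maximum length of 12 and no common prefix), so verify it against real records before relying on it. The `split_doc_id` helper and the example ids are hypothetical.

```python
def split_doc_id(doc_id, sep="#"):
    """Split 'article#passage' into (article_id, passage_index).

    ASSUMPTION: ids use `sep` between article id and passage index.
    Returns (doc_id, None) when the separator is absent.
    """
    article_id, _, passage_idx = doc_id.partition(sep)
    return article_id, (int(passage_idx) if passage_idx else None)


# Hypothetical ids for illustration only.
print(split_doc_id("12#0"))
print(split_doc_id("12"))
```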
The dev set for Spanish.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/es/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/es/dev queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.es.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/es
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/es/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/es/dev docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.es.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 3.5K | 53.6% | 
| 1 | Relevant | 3.0K | 46.4% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/es/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/es/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
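Qrels exported with `--format tsv` can be loaded back without any extra dependency. This is a minimal sketch, assuming tab-separated lines with the four columns shown above; the `parse_qrels_tsv` helper and the toy input are hypothetical, and the `iteration` column is ignored.

```python
import io


def parse_qrels_tsv(stream):
    """Build {query_id: {doc_id: relevance}} from exported TSV lines."""
    qrels = {}
    for line in stream:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        query_id, doc_id, relevance = parts[0], parts[1], int(parts[2])
        qrels.setdefault(query_id, {})[doc_id] = relevance
    return qrels


# Toy input; a real file would come from redirecting the export command.
example = io.StringIO("q1\td1\t1\t0\nq1\td2\t0\t0\nq2\td3\t1\t0\n")
print(parse_qrels_tsv(example))
```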
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.es.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 10373953,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 648
  },
  "qrels": {
    "count": 6443,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 2987,
          "0": 3456
        }
      }
    }
  }
}
The held-out test set (version b) for Spanish.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/es/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/es/test-b queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.es.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/es
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/es/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/es/test-b docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.es.test-b')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 10373953,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 1515
  }
}
The train set for Spanish.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/es/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/es/train queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.es.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/es
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/es/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/es/train docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.es.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 12K | 53.4% | 
| 1 | Relevant | 10K | 46.6% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/es/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/es/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.es.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 10373953,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 2162
  },
  "qrels": {
    "count": 21531,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 10025,
          "0": 11506
        }
      }
    }
  }
}
The Persian corpus.
Language: fa
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fa")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fa docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fa')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 2207172,
    "fields": {
      "doc_id": {
        "max_len": 11,
        "common_prefix": ""
      }
    }
  }
}
The dev set for Persian.
Language: fa
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fa/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/fa/dev queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fa.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/fa
Language: fa
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fa/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fa/dev docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fa.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 5.3K | 80.0% | 
| 1 | Relevant | 1.3K | 20.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fa/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/fa/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.fa.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 2207172,
    "fields": {
      "doc_id": {
        "max_len": 11,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 632
  },
  "qrels": {
    "count": 6571,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 1314,
          "0": 5257
        }
      }
    }
  }
}
The held-out test set (version b) for Persian.
Language: fa
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fa/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/fa/test-b queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fa.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/fa
Language: fa
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fa/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fa/test-b docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fa.test-b')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 2207172,
    "fields": {
      "doc_id": {
        "max_len": 11,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 1476
  }
}
The train set for Persian.
Language: fa
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fa/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/fa/train queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fa.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/fa
Language: fa
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fa/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fa/train docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fa.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 18K | 80.4% | 
| 1 | Relevant | 4.3K | 19.6% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fa/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/fa/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.fa.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 2207172,
    "fields": {
      "doc_id": {
        "max_len": 11,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 2107
  },
  "qrels": {
    "count": 21844,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 4277,
          "0": 17567
        }
      }
    }
  }
}
The Finnish corpus.
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fi")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fi docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fi')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 1883509,
    "fields": {
      "doc_id": {
        "max_len": 11,
        "common_prefix": ""
      }
    }
  }
}
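Because the corpus consists of passages rather than whole articles, each `doc_id` names one passage. The sketch below assumes the MIRACL convention that a passage ID has the form `<article id>#<passage index>`; that layout is not stated on this page, and `split_doc_id` and `PassageRef` are hypothetical helpers, not part of ir_datasets.

```python
from typing import NamedTuple


class PassageRef(NamedTuple):
    article_id: str
    passage_index: int


def split_doc_id(doc_id: str) -> PassageRef:
    """Split an ID such as '12345#6' into its article and passage parts.

    Assumes the '<article id>#<passage index>' layout and raises
    ValueError for anything else.
    """
    article_id, sep, passage = doc_id.rpartition("#")
    if not sep or not article_id or not passage.isdigit():
        raise ValueError(f"unexpected doc_id format: {doc_id!r}")
    return PassageRef(article_id, int(passage))


ref = split_doc_id("12345#6")  # illustrative ID, not taken from the corpus
print(ref.article_id, ref.passage_index)
```

Grouping passages by `article_id` is one way to recover article-level units, if the assumed layout holds for the corpus at hand.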
The dev set for Finnish.
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fi/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/fi/dev queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fi.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/fi
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fi/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fi/dev docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fi.dev')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 9.6K | 79.6% | 
| 1 | Relevant | 2.4K | 20.4% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fi/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/fi/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.fi.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the relevance assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
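The counts and percentages in a relevance-levels table can be recomputed from any iterable of qrels. The following is a minimal sketch: `relevance_distribution` is a hypothetical helper, and the sample records are made-up stand-ins that only share the field names of the qrels namedtuple.

```python
from collections import Counter, namedtuple


def relevance_distribution(qrels):
    """Return {relevance: (count, percentage)} for an iterable of qrels."""
    counts = Counter(qrel.relevance for qrel in qrels)
    total = sum(counts.values())
    return {
        rel: (n, round(100 * n / total, 1))
        for rel, n in sorted(counts.items())
    }


# Stand-in records with the same fields as the qrels namedtuple.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])
sample = [
    Qrel("q1", "10#0", 1, "Q0"),
    Qrel("q1", "11#2", 0, "Q0"),
    Qrel("q2", "12#1", 0, "Q0"),
    Qrel("q2", "13#0", 0, "Q0"),
]
print(relevance_distribution(sample))
```

With ir_datasets installed, passing `dataset.qrels_iter()` instead of `sample` should give the distribution for the loaded split.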
{
  "docs": {
    "count": 1883509,
    "fields": {
      "doc_id": {
        "max_len": 11,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 1271
  },
  "qrels": {
    "count": 12008,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 2447,
          "0": 9561
        }
      }
    }
  }
}
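The metadata for miracl/fi/dev is enough to reproduce the percentages in its relevance-levels table. As a small worked example, the snippet below uses a trimmed copy of the JSON (only the keys it needs) to check that the per-level counts add up and to derive the share of each level and the average number of judgments per query.

```python
import json

# Trimmed copy of the miracl/fi/dev metadata, keeping only the keys used here.
metadata = json.loads("""
{
  "docs": {"count": 1883509},
  "queries": {"count": 1271},
  "qrels": {
    "count": 12008,
    "fields": {"relevance": {"counts_by_value": {"1": 2447, "0": 9561}}}
  }
}
""")

qrels = metadata["qrels"]
by_value = qrels["fields"]["relevance"]["counts_by_value"]

# The per-level counts should add up to the total number of judgments.
assert sum(by_value.values()) == qrels["count"]

# Share of each relevance level, in percent.
shares = {rel: round(100 * n / qrels["count"], 1) for rel, n in by_value.items()}
# Average number of judged passages per query.
judged_per_query = qrels["count"] / metadata["queries"]["count"]

print(shares)
print(round(judged_per_query, 2))
```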
The held-out test set (version a) for Finnish.
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fi/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/fi/test-a queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fi.test-a.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/fi
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fi/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fi/test-a docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fi.test-a')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{
  "docs": {
    "count": 1883509,
    "fields": {
      "doc_id": {
        "max_len": 11,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 1060
  }
}
The held-out test set (version b) for Finnish.
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fi/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/fi/test-b queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fi.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/fi
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fi/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fi/test-b docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fi.test-b')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{
  "docs": {
    "count": 1883509,
    "fields": {
      "doc_id": {
        "max_len": 11,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 711
  }
}
The train set for Finnish.
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fi/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/fi/train queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fi.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/fi
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fi/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fi/train docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fi.train')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 15K | 75.8% | 
| 1 | Relevant | 4.9K | 24.2% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fi/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/fi/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.fi.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the relevance assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
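Training qrels contain both relevant (1) and non-relevant (0) judgments, so they can be grouped into positives and negatives per query. This is one possible sketch rather than a prescribed training recipe: `split_judgments` is a hypothetical helper, and the sample records are made up.

```python
from collections import defaultdict, namedtuple


def split_judgments(qrels):
    """Group judged passages per query into positives and negatives."""
    grouped = defaultdict(lambda: {"positive": [], "negative": []})
    for qrel in qrels:
        key = "positive" if qrel.relevance > 0 else "negative"
        grouped[qrel.query_id][key].append(qrel.doc_id)
    return dict(grouped)


# Stand-in records with the same fields as the qrels namedtuple.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])
sample = [
    Qrel("q1", "10#0", 1, "Q0"),
    Qrel("q1", "11#2", 0, "Q0"),
    Qrel("q2", "12#1", 0, "Q0"),
]
for query_id, judged in split_judgments(sample).items():
    print(query_id, judged)
```

A query such as `q2` in the sample, which has no positive passage, would typically need to be filtered out before building training pairs.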
{
  "docs": {
    "count": 1883509,
    "fields": {
      "doc_id": {
        "max_len": 11,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 2897
  },
  "qrels": {
    "count": 20350,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 4928,
          "0": 15422
        }
      }
    }
  }
}
The French corpus.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fr")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fr docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fr')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{
  "docs": {
    "count": 14636953,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  }
}
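With more than 14 million passages, the French corpus is large enough that a full pass over `docs_iter()` takes a while. To inspect a few records first, an iterator can be truncated lazily. This is a minimal sketch in which `peek` is a hypothetical helper and the generator only imitates `docs_iter()` with made-up records.

```python
from collections import namedtuple
from itertools import islice


def peek(records, n=3):
    """Return the first n items, reading no more than n from the iterator."""
    return list(islice(records, n))


Doc = namedtuple("Doc", ["doc_id", "title", "text"])


def fake_docs_iter():
    """Stand-in for dataset.docs_iter(); yields made-up passages."""
    for i in range(1_000_000):
        yield Doc(f"{i}#0", f"Title {i}", f"Text of passage {i}")


for doc in peek(fake_docs_iter(), n=2):
    print(doc.doc_id, doc.title)
```

With ir_datasets installed, `peek(dataset.docs_iter())` would be used the same way. The corpus files may still need to be downloaded before the first record is available.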
The dev set for French.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fr/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/fr/dev queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fr.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/fr
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fr/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fr/dev docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fr.dev')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 2.7K | 78.7% | 
| 1 | Relevant | 731 | 21.3% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fr/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/fr/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.fr.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the relevance assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
  "docs": {
    "count": 14636953,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 343
  },
  "qrels": {
    "count": 3429,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 731,
          "0": 2698
        }
      }
    }
  }
}
The held-out test set (version b) for French.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fr/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/fr/test-b queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fr.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/fr
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fr/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fr/test-b docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fr.test-b')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{
  "docs": {
    "count": 14636953,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 801
  }
}
The train set for French.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fr/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/fr/train queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fr.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/fr
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fr/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fr/train docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fr.train')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 9.1K | 79.7% | 
| 1 | Relevant | 2.3K | 20.3% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fr/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/fr/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.fr.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the relevance assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
  "docs": {
    "count": 14636953,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 1143
  },
  "qrels": {
    "count": 11426,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 2321,
          "0": 9105
        }
      }
    }
  }
}
The Hindi corpus.
Language: hi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/hi")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/hi docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.hi')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{
  "docs": {
    "count": 506264,
    "fields": {
      "doc_id": {
        "max_len": 11,
        "common_prefix": ""
      }
    }
  }
}
The dev set for Hindi.
Language: hi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/hi/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/hi/dev queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.hi.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/hi
Language: hi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/hi/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/hi/dev docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.hi.dev')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 2.7K | 78.5% | 
| 1 | Relevant | 752 | 21.5% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/hi/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/hi/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.hi.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the relevance assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
  "docs": {
    "count": 506264,
    "fields": {
      "doc_id": {
        "max_len": 11,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 350
  },
  "qrels": {
    "count": 3494,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 752,
          "0": 2742
        }
      }
    }
  }
}
The held-out test set (version b) for Hindi.
Language: hi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/hi/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/hi/test-b queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.hi.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/hi
Language: hi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/hi/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/hi/test-b docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.hi.test-b')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{
  "docs": {
    "count": 506264,
    "fields": {
      "doc_id": {
        "max_len": 11,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 819
  }
}
The train set for Hindi.
Language: hi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/hi/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/hi/train queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.hi.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/hi
Language: hi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/hi/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/hi/train docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.hi.train')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 9.2K | 78.8% | 
| 1 | Relevant | 2.5K | 21.2% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/hi/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/hi/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.hi.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the relevance assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
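Output of the `ir_datasets export ... qrels --format tsv` command can be read back with the standard `csv` module. The sketch below assumes the export has no header row and uses the four columns in the order shown for the CLI (`query_id`, `doc_id`, `relevance`, `iteration`). `read_qrels_tsv` and `QrelRow` are hypothetical names, and the sample rows are made up.

```python
import csv
import io
from typing import Iterator, NamedTuple


class QrelRow(NamedTuple):
    query_id: str
    doc_id: str
    relevance: int
    iteration: str


def read_qrels_tsv(handle) -> Iterator[QrelRow]:
    """Parse tab-separated qrels with four columns per line."""
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue  # skip blank lines
        query_id, doc_id, relevance, iteration = row
        yield QrelRow(query_id, doc_id, int(relevance), iteration)


# Illustrative rows; the IDs are made up, not taken from the dataset.
sample = io.StringIO("q1\t10#0\t1\tQ0\nq1\t11#2\t0\tQ0\n")
for row in read_qrels_tsv(sample):
    print(row)
```

For a file saved from the CLI, the same function could be applied to a handle opened with `open(path, newline="", encoding="utf-8")`.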
{
  "docs": {
    "count": 506264,
    "fields": {
      "doc_id": {
        "max_len": 11,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 1169
  },
  "qrels": {
    "count": 11668,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 2469,
          "0": 9199
        }
      }
    }
  }
}
The Indonesian corpus.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/id")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/id docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.id')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{
  "docs": {
    "count": 1446315,
    "fields": {
      "doc_id": {
        "max_len": 11,
        "common_prefix": ""
      }
    }
  }
}
The dev set for Indonesian.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/id/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/id/dev queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.id.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/id
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/id/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/id/dev docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.id.dev')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 6.6K | 68.1% | 
| 1 | Relevant | 3.1K | 31.9% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/id/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/id/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.id.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 1446315,
    "fields": {
      "doc_id": {
        "max_len": 11,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 960
  },
  "qrels": {
    "count": 9668,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 3088,
          "0": 6580
        }
      }
    }
  }
}
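The percentages in the relevance table for miracl/id/dev follow directly from the `counts_by_value` entries in the metadata above. The following is a minimal sketch, assuming the relevant part of the metadata has been copied into a Python string; the names `METADATA` and `relevance_shares` are illustrative and not part of ir_datasets.

```python
import json

# Excerpt copied from the miracl/id/dev metadata listed above.
METADATA = """
{
  "qrels": {
    "count": 9668,
    "fields": {"relevance": {"counts_by_value": {"1": 3088, "0": 6580}}}
  }
}
"""


def relevance_shares(metadata: dict) -> dict:
    """Return {relevance label: share of all judgments in percent, one decimal}."""
    qrels = metadata["qrels"]
    total = qrels["count"]
    counts = qrels["fields"]["relevance"]["counts_by_value"]
    return {int(label): round(100 * n / total, 1) for label, n in counts.items()}


shares = relevance_shares(json.loads(METADATA))
for label in sorted(shares):
    print(f"relevance={label}: {shares[label]}%")
```

The same function should apply to the other splits that list qrels, since their metadata blocks have the same layout.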
The held-out test set (version a) for Indonesian.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/id/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/id/test-a queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.id.test-a.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/id
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/id/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/id/test-a docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.id.test-a')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 1446315,
    "fields": {
      "doc_id": {
        "max_len": 11,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 731
  }
}
The held-out test set (version b) for Indonesian.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/id/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/id/test-b queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.id.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/id
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/id/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/id/test-b docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.id.test-b')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 1446315,
    "fields": {
      "doc_id": {
        "max_len": 11,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 611
  }
}
The train set for Indonesian.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/id/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/id/train queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.id.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/id
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/id/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/id/train docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.id.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 29K | 69.8% | 
| 1 | Relevant | 13K | 30.2% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/id/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/id/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.id.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
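Relevance judgments are often easier to work with once they are grouped per query. The following is a minimal sketch of that grouping, assuming records with the `query_id`, `doc_id`, `relevance`, and `iteration` fields shown for `qrels_iter()` above. The `Qrel` class is a stand-in so the snippet runs without downloading the data, and the IDs in `sample` are made up for illustration.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Qrel(NamedTuple):
    """Stand-in with the same fields as the qrel namedtuple shown above."""
    query_id: str
    doc_id: str
    relevance: int
    iteration: str


def positives_by_query(qrels: Iterable) -> dict:
    """Map each query_id to the doc_ids judged relevant (relevance >= 1)."""
    grouped = defaultdict(list)
    for qrel in qrels:
        if qrel.relevance >= 1:
            grouped[qrel.query_id].append(qrel.doc_id)
    return dict(grouped)


# Hypothetical records; real ones would come from dataset.qrels_iter().
sample = [
    Qrel("q1", "100#0", 1, "0"),
    Qrel("q1", "200#3", 0, "0"),
    Qrel("q2", "300#1", 1, "0"),
    Qrel("q1", "400#2", 1, "0"),
]
print(positives_by_query(sample))
```

Queries whose judgments are all non-relevant do not appear in the result. With the real data, `positives_by_query(dataset.qrels_iter())` should work unchanged, because the function only reads the `query_id`, `doc_id`, and `relevance` attributes.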
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 1446315,
    "fields": {
      "doc_id": {
        "max_len": 11,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 4071
  },
  "qrels": {
    "count": 41358,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 12505,
          "0": 28853
        }
      }
    }
  }
}
The Japanese corpus.
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ja")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ja docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ja')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 6953614,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  }
}
The dev set for Japanese.
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ja/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ja/dev queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ja.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ja
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ja/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ja/dev docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ja.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 6.6K | 78.6% | 
| 1 | Relevant | 1.8K | 21.4% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ja/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/ja/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.ja.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
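The TSV produced by the qrels export command above can be read back with Python's standard `csv` module. The following is a minimal sketch, assuming the export writes one tab-separated row per judgment in the column order shown (`query_id`, `doc_id`, `relevance`, `iteration`) and no header row. `QrelRow`, `parse_qrels_tsv`, and the contents of `SAMPLE` are illustrative and not taken from the dataset.

```python
import csv
import io
from typing import NamedTuple


class QrelRow(NamedTuple):
    query_id: str
    doc_id: str
    relevance: int
    iteration: str


def parse_qrels_tsv(stream) -> list:
    """Parse tab-separated qrels rows into typed records."""
    # QUOTE_NONE treats quote characters as ordinary text.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [QrelRow(q, d, int(rel), it) for q, d, rel, it in reader]


# Hypothetical rows standing in for the output of the export command.
SAMPLE = "q1\t100#0\t1\t0\nq1\t200#3\t0\t0\n"
rows = parse_qrels_tsv(io.StringIO(SAMPLE))
for row in rows:
    print(row)
```

In practice the stream would be a file opened with `newline=""` and UTF-8 encoding, for example one created by redirecting the export command's output.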
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 6953614,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 860
  },
  "qrels": {
    "count": 8354,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 1790,
          "0": 6564
        }
      }
    }
  }
}
The held-out test set (version a) for Japanese.
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ja/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ja/test-a queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ja.test-a.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ja
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ja/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ja/test-a docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ja.test-a')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 6953614,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 650
  }
}
The held-out test set (version b) for Japanese.
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ja/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ja/test-b queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ja.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ja
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ja/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ja/test-b docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ja.test-b')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 6953614,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 1141
  }
}
The train set for Japanese.
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ja/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ja/train queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ja.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ja
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ja/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ja/train docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ja.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 27K | 79.7% | 
| 1 | Relevant | 7.0K | 20.3% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ja/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/ja/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.ja.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 6953614,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 3477
  },
  "qrels": {
    "count": 34387,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 6984,
          "0": 27403
        }
      }
    }
  }
}
The Korean corpus.
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ko")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ko docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ko')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 1486752,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  }
}
The dev set for Korean.
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ko/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ko/dev queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ko.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ko
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ko/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ko/dev docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ko.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 2.5K | 82.1% | 
| 1 | Relevant | 547 | 17.9% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ko/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/ko/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.ko.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 1486752,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 213
  },
  "qrels": {
    "count": 3057,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 547,
          "0": 2510
        }
      }
    }
  }
}
The held-out test set (version a) for Korean.
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ko/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ko/test-a queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ko.test-a.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ko
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ko/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ko/test-a docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ko.test-a')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 1486752,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 263
  }
}
The held-out test set (version b) for Korean.
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ko/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ko/test-b queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ko.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ko
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ko/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ko/test-b docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ko.test-b')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 1486752,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 1417
  }
}
The train set for Korean.
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ko/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ko/train queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ko.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ko
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ko/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ko/train docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ko.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 11K | 84.5% | 
| 1 | Relevant | 2.0K | 15.5% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ko/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/ko/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.ko.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 1486752,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 868
  },
  "qrels": {
    "count": 12767,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 1973,
          "0": 10794
        }
      }
    }
  }
}
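The Korean metadata blocks above list qrels only for the dev and train splits; the held-out test-a and test-b splits list only docs and queries. The following is a minimal sketch that summarizes this, with the counts copied by hand from those blocks. `KO_SPLITS` and `judgments_per_query` are illustrative names, and `None` is used here as a marker for a split without listed qrels.

```python
# Counts copied from the Korean metadata blocks above.
# None marks a split whose metadata lists no qrels.
KO_SPLITS = {
    "miracl/ko/dev": {"queries": 213, "qrels": 3057},
    "miracl/ko/test-a": {"queries": 263, "qrels": None},
    "miracl/ko/test-b": {"queries": 1417, "qrels": None},
    "miracl/ko/train": {"queries": 868, "qrels": 12767},
}


def judgments_per_query(splits: dict) -> dict:
    """Average number of judged passages per query, or None without qrels."""
    result = {}
    for name, counts in splits.items():
        if counts["qrels"] is None:
            result[name] = None
        else:
            result[name] = round(counts["qrels"] / counts["queries"], 2)
    return result


summary = judgments_per_query(KO_SPLITS)
for name, avg in summary.items():
    print(f"{name}: {'no qrels listed' if avg is None else avg}")
```

Under these counts, the dev and train splits both come out at roughly 14 to 15 judged passages per query.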
The Russian corpus.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ru")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ru docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ru')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 9543918,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  }
}
The dev set for Russian.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ru/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ru/dev queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ru.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ru
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ru/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ru/dev docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ru.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 9.5K | 72.8% | 
| 1 | Relevant | 3.6K | 27.2% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ru/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/ru/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.ru.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the relevance assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
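The qrels iterated above are flat `query_id`, `doc_id`, `relevance`, `iteration` records. For evaluation it is often more convenient to look judgments up per query. A minimal sketch of that regrouping follows; the `group_qrels` helper and the sample records are hypothetical, and only the field names are taken from the namedtuple shown above:

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in with the same fields as the records from qrels_iter().
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def group_qrels(qrels):
    """Map query_id -> {doc_id: relevance} from an iterable of qrel records."""
    grouped = defaultdict(dict)
    for qrel in qrels:
        grouped[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(grouped)


# Made-up IDs for illustration; not real MIRACL judgments.
sample_qrels = [
    Qrel("q1", "d1", 1, "0"),
    Qrel("q1", "d2", 0, "0"),
    Qrel("q2", "d3", 1, "0"),
]
grouped = group_qrels(sample_qrels)
print(grouped["q1"])
```

With ir_datasets installed, passing `ir_datasets.load("miracl/ru/dev").qrels_iter()` to `group_qrels` should yield the same structure for the real judgments.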
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 9543918,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 1252
  },
  "qrels": {
    "count": 13100,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 3560,
          "0": 9540
        }
      }
    }
  }
}
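The metadata block above carries enough information to rederive the figures in the relevance table for this dataset. The sketch below works on an abridged copy of that JSON (the `doc_id` field statistics are dropped); the `summarize` helper and its output keys are hypothetical names chosen for illustration:

```python
import json

# Abridged metadata for miracl/ru/dev, copied from the JSON block above.
METADATA = json.loads("""
{
  "docs": {"count": 9543918},
  "queries": {"count": 1252},
  "qrels": {
    "count": 13100,
    "fields": {"relevance": {"counts_by_value": {"1": 3560, "0": 9540}}}
  }
}
""")


def summarize(metadata):
    """Derive per-query and per-label statistics from a metadata dict."""
    qrels = metadata["qrels"]
    counts = qrels["fields"]["relevance"]["counts_by_value"]
    total = qrels["count"]
    return {
        "qrels_per_query": round(total / metadata["queries"]["count"], 1),
        "relevant_pct": round(100 * counts["1"] / total, 1),
        "not_relevant_pct": round(100 * counts["0"] / total, 1),
    }


summary = summarize(METADATA)
print(summary)
```

The two percentages reproduce the 72.8% / 27.2% split listed under "Relevance levels" for this dataset. The same function can be applied to the metadata of any other split that includes qrels.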
The held-out test set (version a) for Russian.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ru/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ru/test-a queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ru.test-a.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ru
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ru/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ru/test-a docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ru.test-a')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 9543918,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 911
  }
}
The held-out test set (version b) for Russian.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ru/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ru/test-b queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ru.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ru
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ru/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ru/test-b docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ru.test-b')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 9543918,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 718
  }
}
The train set for Russian.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ru/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ru/train queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ru.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ru
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ru/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ru/train docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ru.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 24K | 70.5% | 
| 1 | Relevant | 10K | 29.5% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ru/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/ru/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.ru.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the relevance assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
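A train split provides both queries and binary judgments, so the two iterators can be joined into per-query training examples. The following is only a sketch of one possible join: the `build_training_examples` helper, the choice to skip queries without a positive judgment, and the sample records are all assumptions made for illustration, with only the field names taken from the namedtuples shown above:

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-ins with the same fields as the ir_datasets records.
Query = namedtuple("Query", ["query_id", "text"])
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def build_training_examples(queries, qrels):
    """Return a list of (query_text, positive_doc_ids, negative_doc_ids)."""
    positives = defaultdict(list)
    negatives = defaultdict(list)
    for qrel in qrels:
        # Relevance 1 is "Relevant", 0 is "Not Relevant" (see the table above).
        target = positives if qrel.relevance > 0 else negatives
        target[qrel.query_id].append(qrel.doc_id)

    examples = []
    for query in queries:
        pos = positives.get(query.query_id, [])
        if not pos:
            continue  # assumption: a query with no positive is not useful here
        examples.append((query.text, pos, negatives.get(query.query_id, [])))
    return examples


# Made-up records for illustration; not real MIRACL data.
sample_queries = [Query("q1", "first"), Query("q2", "second"), Query("q3", "third")]
sample_qrels = [
    Qrel("q1", "d1", 1, "0"),
    Qrel("q1", "d2", 0, "0"),
    Qrel("q2", "d3", 0, "0"),
    Qrel("q3", "d4", 1, "0"),
]
examples = build_training_examples(sample_queries, sample_qrels)
print(examples)
```

With ir_datasets installed, the same function should accept `dataset.queries_iter()` and `dataset.qrels_iter()` from `ir_datasets.load("miracl/ru/train")`. The passage text for each `doc_id` would still need to be looked up in the corpus separately.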
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 9543918,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 4683
  },
  "qrels": {
    "count": 33921,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 10000,
          "0": 23921
        }
      }
    }
  }
}
The Swahili corpus.
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/sw")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/sw docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.sw')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 131924,
    "fields": {
      "doc_id": {
        "max_len": 9,
        "common_prefix": ""
      }
    }
  }
}
The dev set for Swahili.
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/sw/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/sw/dev queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.sw.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/sw
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/sw/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/sw/dev docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.sw.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 4.2K | 82.1% | 
| 1 | Relevant | 910 | 17.9% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/sw/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/sw/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.sw.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the relevance assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 131924,
    "fields": {
      "doc_id": {
        "max_len": 9,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 482
  },
  "qrels": {
    "count": 5092,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 910,
          "0": 4182
        }
      }
    }
  }
}
The held-out test set (version a) for Swahili.
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/sw/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/sw/test-a queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.sw.test-a.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/sw
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/sw/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/sw/test-a docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.sw.test-a')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 131924,
    "fields": {
      "doc_id": {
        "max_len": 9,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 638
  }
}
The held-out test set (version b) for Swahili.
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/sw/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/sw/test-b queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.sw.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/sw
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/sw/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/sw/test-b docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.sw.test-b')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 131924,
    "fields": {
      "doc_id": {
        "max_len": 9,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 465
  }
}
The train set for Swahili.
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/sw/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/sw/train queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.sw.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/sw
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/sw/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/sw/train docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.sw.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 6.7K | 71.3% | 
| 1 | Relevant | 2.7K | 28.7% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/sw/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/sw/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.sw.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the relevance assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 131924,
    "fields": {
      "doc_id": {
        "max_len": 9,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 1901
  },
  "qrels": {
    "count": 9359,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 2687,
          "0": 6672
        }
      }
    }
  }
}
The Telugu corpus.
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/te")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/te docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.te')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 518079,
    "fields": {
      "doc_id": {
        "max_len": 10,
        "common_prefix": ""
      }
    }
  }
}
The dev set for Telugu.
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/te/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/te/dev queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.te.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/te
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/te/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/te/dev docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.te.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 752 | 46.8% | 
| 1 | Relevant | 854 | 53.2% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/te/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/te/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.te.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the relevance assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 518079,
    "fields": {
      "doc_id": {
        "max_len": 10,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 828
  },
  "qrels": {
    "count": 1606,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 854,
          "0": 752
        }
      }
    }
  }
}
The held-out test set (version a) for Telugu.
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/te/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/te/test-a queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.te.test-a.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/te
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/te/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/te/test-a docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.te.test-a')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 518079,
    "fields": {
      "doc_id": {
        "max_len": 10,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 594
  }
}
The held-out test set (version b) for Telugu.
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/te/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/te/test-b queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.te.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/te
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/te/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/te/test-b docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.te.test-b')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 518079,
    "fields": {
      "doc_id": {
        "max_len": 10,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 793
  }
}
The train set for Telugu.
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/te/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/te/train queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.te.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/te
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/te/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/te/train docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.te.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 14K | 77.9% | 
| 1 | Relevant | 4.1K | 22.1% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/te/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/te/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.te.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the relevance assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 518079,
    "fields": {
      "doc_id": {
        "max_len": 10,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 3452
  },
  "qrels": {
    "count": 18608,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 4119,
          "0": 14489
        }
      }
    }
  }
}
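The metadata block is plain JSON, so its counts can be cross-checked with a few lines of code. The sketch below copies the miracl/te/train numbers from the listing above (abridged to the fields it uses); `summarize` is a hypothetical helper, not an ir_datasets function.

```python
import json

# Metadata for miracl/te/train, copied from the listing above and abridged.
METADATA = json.loads("""
{
  "docs": {"count": 518079},
  "queries": {"count": 3452},
  "qrels": {
    "count": 18608,
    "fields": {"relevance": {"counts_by_value": {"1": 4119, "0": 14489}}}
  }
}
""")


def summarize(metadata):
    """Check that per-level qrel counts add up, then derive two simple ratios."""
    qrels = metadata["qrels"]
    by_value = qrels["fields"]["relevance"]["counts_by_value"]
    if sum(by_value.values()) != qrels["count"]:
        raise ValueError("per-level counts do not sum to the qrels count")
    return {
        "judgments_per_query": round(qrels["count"] / metadata["queries"]["count"], 2),
        "percent_relevant": round(100.0 * by_value["1"] / qrels["count"], 1),
    }


print(summarize(METADATA))  # {'judgments_per_query': 5.39, 'percent_relevant': 22.1}
```

The 22.1% figure agrees with the "Relevant" row of the relevance table for this subset.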
The Thai corpus.
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/th")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/th docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.th')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 542166,
    "fields": {
      "doc_id": {
        "max_len": 10,
        "common_prefix": ""
      }
    }
  }
}
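Because each corpus consists of Wikipedia articles split into passages, it can be handy to group passages back by article. The sketch below ASSUMES that a `doc_id` has the form `<article>#<passage>` (for example `12#0`); this page does not state the format, so inspect a few real documents before relying on it. `group_passages_by_article` and the sample documents are hypothetical.

```python
from collections import defaultdict, namedtuple

# Stand-in for the doc namedtuple shown above (hypothetical sample data only).
Doc = namedtuple("Doc", ["doc_id", "title", "text"])


def group_passages_by_article(docs):
    """Group passage doc_ids by article, ASSUMING ids look like '<article>#<passage>'."""
    articles = defaultdict(list)
    for doc in docs:
        article_id, sep, _passage = doc.doc_id.partition("#")
        if not sep:
            # Fail loudly if the assumed id format does not hold.
            raise ValueError(f"doc_id {doc.doc_id!r} does not match the assumed format")
        articles[article_id].append(doc.doc_id)
    return dict(articles)


sample_docs = [
    Doc("12#0", "Article A", "first passage"),
    Doc("12#1", "Article A", "second passage"),
    Doc("34#0", "Article B", "only passage"),
]
print(group_passages_by_article(sample_docs))
# {'12': ['12#0', '12#1'], '34': ['34#0']}
```

The helper raises a `ValueError` on the first id without a `#`, so a wrong assumption about the format surfaces immediately rather than producing a silent mis-grouping.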
The dev set for Thai.
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/th/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/th/dev queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.th.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/th
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/th/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/th/dev docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.th.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 6.2K | 82.3% | 
| 1 | Relevant | 1.3K | 17.7% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/th/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/th/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
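The TSV produced by the export command can be read back with the standard library. This is a minimal sketch that assumes the column order shown above (`query_id`, `doc_id`, `relevance`, `iteration`) and no header row; `read_qrels_tsv` is a hypothetical helper and the sample lines are made up.

```python
import csv
import io
from collections import namedtuple

Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def read_qrels_tsv(handle):
    """Parse tab-separated qrels lines: query_id, doc_id, relevance, iteration."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        query_id, doc_id, relevance, iteration = row
        yield Qrel(query_id, doc_id, int(relevance), iteration)


# Made-up lines in the same four-column layout as the export output.
sample_tsv = io.StringIO("q1\td1\t1\t0\nq1\td2\t0\t0\n")
for qrel in read_qrels_tsv(sample_tsv):
    print(qrel)
```

For a real file, redirect the export output to disk (for example `> qrels.tsv`) and pass `open("qrels.tsv", newline="", encoding="utf-8")` to the helper.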
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.th.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 542166,
    "fields": {
      "doc_id": {
        "max_len": 10,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 733
  },
  "qrels": {
    "count": 7573,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 1343,
          "0": 6230
        }
      }
    }
  }
}
The held-out test set (version a) for Thai.
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/th/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/th/test-a queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.th.test-a.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/th
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/th/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/th/test-a docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.th.test-a')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 542166,
    "fields": {
      "doc_id": {
        "max_len": 10,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 992
  }
}
The held-out test set (version b) for Thai.
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/th/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/th/test-b queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.th.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/th
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/th/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/th/test-b docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.th.test-b')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 542166,
    "fields": {
      "doc_id": {
        "max_len": 10,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 650
  }
}
The train set for Thai.
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/th/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/th/train queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.th.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/th
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/th/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/th/train docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.th.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 17K | 77.6% | 
| 1 | Relevant | 4.8K | 22.4% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/th/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/th/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.th.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
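Since the train qrels carry both "Relevant" and "Not Relevant" judgments, one possible use is to pair each relevant passage with a judged non-relevant passage for the same query. The sketch below is only an illustration of that idea, not a procedure prescribed by the dataset; `build_training_triples` is a hypothetical helper and the sample qrels are made up.

```python
import random
from collections import defaultdict, namedtuple

# Stand-in for the qrel namedtuple shown above (hypothetical sample data only).
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def build_training_triples(qrels, seed=0):
    """Yield (query_id, positive_doc_id, negative_doc_id) from binary qrels."""
    positives = defaultdict(list)
    negatives = defaultdict(list)
    for qrel in qrels:
        bucket = positives if qrel.relevance > 0 else negatives
        bucket[qrel.query_id].append(qrel.doc_id)
    rng = random.Random(seed)  # seeded so the sampling is repeatable
    for query_id, pos_ids in positives.items():
        neg_ids = negatives.get(query_id)
        if not neg_ids:
            continue  # no judged negative to pair with for this query
        for pos_id in pos_ids:
            yield query_id, pos_id, rng.choice(neg_ids)


sample = [
    Qrel("q1", "d1", 1, "0"),
    Qrel("q1", "d2", 0, "0"),
    Qrel("q2", "d3", 1, "0"),  # q2 has no judged negative, so it is skipped
]
print(list(build_training_triples(sample)))  # [('q1', 'd1', 'd2')]
```

Queries without any judged non-relevant passage are skipped here; sampling negatives from the full corpus instead is a separate design choice that this sketch does not cover.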
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 542166,
    "fields": {
      "doc_id": {
        "max_len": 10,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 2972
  },
  "qrels": {
    "count": 21293,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 4778,
          "0": 16515
        }
      }
    }
  }
}
The Yoruba corpus.
Language: yo
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/yo")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/yo docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.yo')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 49043,
    "fields": {
      "doc_id": {
        "max_len": 9,
        "common_prefix": ""
      }
    }
  }
}
The dev set for Yoruba.
Language: yo
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/yo/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/yo/dev queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.yo.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/yo
Language: yo
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/yo/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/yo/dev docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.yo.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 1.0K | 87.9% | 
| 1 | Relevant | 144 | 12.1% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/yo/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/yo/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.yo.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
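Queries and qrels share the `query_id` field, so the two iterators can be joined to see which passages were judged relevant for each query. A minimal sketch with made-up stand-in data follows; `relevant_by_query` is a hypothetical helper, not an ir_datasets function.

```python
from collections import namedtuple

# Stand-ins for the query and qrel namedtuples shown above (sample data only).
Query = namedtuple("Query", ["query_id", "text"])
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def relevant_by_query(queries, qrels):
    """Map each query_id to its text and the doc_ids judged relevant (relevance > 0)."""
    result = {q.query_id: {"text": q.text, "relevant": []} for q in queries}
    for qrel in qrels:
        if qrel.relevance > 0 and qrel.query_id in result:
            result[qrel.query_id]["relevant"].append(qrel.doc_id)
    return result


sample_queries = [Query("q1", "first question"), Query("q2", "second question")]
sample_qrels = [
    Qrel("q1", "d1", 1, "0"),
    Qrel("q1", "d2", 0, "0"),
    Qrel("q2", "d3", 0, "0"),
]
for query_id, entry in relevant_by_query(sample_queries, sample_qrels).items():
    print(query_id, entry["text"], entry["relevant"])
```

With a loaded dataset, the same call would take `dataset.queries_iter()` and `dataset.qrels_iter()` as its two arguments.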
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 49043,
    "fields": {
      "doc_id": {
        "max_len": 9,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 119
  },
  "qrels": {
    "count": 1188,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 144,
          "0": 1044
        }
      }
    }
  }
}
The held-out test set (version b) for Yoruba.
Language: yo
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/yo/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/yo/test-b queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.yo.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/yo
Language: yo
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/yo/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/yo/test-b docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.yo.test-b')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 49043,
    "fields": {
      "doc_id": {
        "max_len": 9,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 288
  }
}
The Chinese corpus.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/zh")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/zh docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.zh')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
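The Chinese corpus holds several million passages, so it is usually worth inspecting only the first few documents rather than looping over everything. The sketch below uses `itertools.islice` for that; `peek` is a hypothetical helper, and `fake_docs_iter` is a made-up generator standing in for `dataset.docs_iter()` so the block runs without the data.

```python
from collections import namedtuple
from itertools import islice

# Stand-in for the doc namedtuple shown above (hypothetical sample data only).
Doc = namedtuple("Doc", ["doc_id", "title", "text"])


def peek(iterable, n=3):
    """Return the first n items of an iterable, pulling no more than n from it."""
    return list(islice(iterable, n))


def fake_docs_iter(total=4_934_368):
    """Hypothetical stand-in for dataset.docs_iter(); yields placeholder docs lazily."""
    for i in range(total):
        yield Doc(str(i), f"title {i}", f"text {i}")


first_two = peek(fake_docs_iter(), 2)
print([doc.doc_id for doc in first_two])  # ['0', '1']
```

Because the generator is lazy, only two placeholder documents are ever created here, regardless of `total`.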
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 4934368,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  }
}
The dev set for Chinese.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/zh/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/zh/dev queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.zh.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/zh
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/zh/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/zh/dev docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.zh.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 2.9K | 74.7% | 
| 1 | Relevant | 994 | 25.3% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/zh/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/zh/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.zh.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
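With binary relevance labels, a ranked list for a dev query can be scored directly against the doc_ids judged relevant. The following sketch shows two generic measures, recall at a cutoff and reciprocal rank; they are illustrative choices and not necessarily the metrics used in the MIRACL paper. Both function names and the sample ranking are hypothetical.

```python
def recall_at_k(ranked_doc_ids, relevant_doc_ids, k=100):
    """Fraction of the relevant doc_ids that appear in the top k of the ranking."""
    relevant = set(relevant_doc_ids)
    if not relevant:
        return 0.0
    return len(relevant & set(ranked_doc_ids[:k])) / len(relevant)


def reciprocal_rank(ranked_doc_ids, relevant_doc_ids):
    """1 / rank of the first relevant doc_id, or 0.0 if none is retrieved."""
    relevant = set(relevant_doc_ids)
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


# Made-up ranking for one query; d1 and d9 are the judged-relevant passages.
ranking = ["d3", "d1", "d2"]
relevant = {"d1", "d9"}
print(recall_at_k(ranking, relevant, k=2))  # 0.5
print(reciprocal_rank(ranking, relevant))   # 0.5
```

The set of relevant doc_ids per query can be collected from `dataset.qrels_iter()` by keeping the entries whose `relevance` is greater than 0.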
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 4934368,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 393
  },
  "qrels": {
    "count": 3928,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 994,
          "0": 2934
        }
      }
    }
  }
}
The held-out test set (version b) for Chinese.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/zh/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/zh/test-b queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.zh.test-b.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/zh
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/zh/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/zh/test-b docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.zh.test-b')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 4934368,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 920
  }
}
The train set for Chinese.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/zh/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/zh/train queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.zh.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/zh
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/zh/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/zh/train docs
[doc_id]    [title]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.zh.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 9.9K | 75.7% | 
| 1 | Relevant | 3.2K | 24.3% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/zh/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/zh/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.zh.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
  "docs": {
    "count": 4934368,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 1312
  },
  "qrels": {
    "count": 13113,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 3187,
          "0": 9926
        }
      }
    }
  }
}