ir_datasets: MIRACL
MIRACL is a multilingual adhoc retrieval dataset covering 18 languages. The document corpora are based on Wikipedia dumps, which are split into passages.
Bibtex:
@article{Zhang2022Miracl,
  title={Making a MIRACL: Multilingual information retrieval across a continuum of languages},
  author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy},
  journal={arXiv preprint arXiv:2210.09984},
  year={2022}
}
The Arabic corpus.
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ar")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
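The Arabic corpus holds about two million passages (2,061,414 according to the metadata on this page), so it can help to preview a few records before looping over everything. The following is a minimal sketch using `itertools.islice`. The `Doc` tuple and the `fake_docs` generator are hypothetical stand-ins so the snippet runs without downloading anything; in practice you would pass `dataset.docs_iter()` instead.

```python
from itertools import islice
from typing import Iterable, Iterator, List, NamedTuple


class Doc(NamedTuple):
    """Hypothetical stand-in mirroring namedtuple<doc_id, title, text>."""
    doc_id: str
    title: str
    text: str


def preview(docs: Iterable[Doc], n: int = 3) -> List[Doc]:
    """Return the first n records without reading the rest of the iterator."""
    return list(islice(docs, n))


def fake_docs() -> Iterator[Doc]:
    # Made-up records for illustration only; not real MIRACL passages.
    for i in range(1000):
        yield Doc(f"{i}#0", f"Title {i}", f"Passage text {i}")


head = preview(fake_docs(), n=3)
for doc in head:
    print(doc.doc_id, doc.title)
```

Because `islice` stops after `n` items, only the requested records are pulled from the iterator.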
ir_datasets export miracl/ar docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
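The exported records can be read back into Python. This is a sketch only, assuming the export is tab-separated with one record per line and the three columns shown above; the `ExportedDoc` type, the `parse_exported_docs` helper, and the sample lines are all hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class ExportedDoc(NamedTuple):
    doc_id: str
    title: str
    text: str


def parse_exported_docs(lines: Iterable[str]) -> Iterator[ExportedDoc]:
    """Parse lines of the assumed form '<doc_id>\\t<title>\\t<text>'."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the first two tabs only, so tabs inside the text survive.
        doc_id, title, text = line.split("\t", 2)
        yield ExportedDoc(doc_id, title, text)


# Made-up sample lines; real output would come from the export command.
sample_lines = [
    "7#0\tExample title\tFirst passage of the article.\n",
    "7#1\tExample title\tSecond passage.\n",
]
parsed = list(parse_exported_docs(sample_lines))
print(parsed[0].doc_id, parsed[0].title)
```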
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ar')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{
"docs": {
"count": 2061414,
"fields": {
"doc_id": {
"max_len": 11,
"common_prefix": ""
}
}
}
}
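The metadata above reports that `doc_id` values are at most 11 characters long. Since the corpus is split into passages, it is sometimes useful to recover which article a passage belongs to. The sketch below assumes ids take the form `<article>#<passage index>`; that format is an assumption not stated on this page, and the helper names and example ids are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class PassageRef(NamedTuple):
    article_id: str
    passage_index: int


def split_doc_id(doc_id: str) -> PassageRef:
    """Split an id of the assumed form '<article>#<passage>' into its parts."""
    article_id, sep, index = doc_id.rpartition("#")
    if not sep or not article_id or not index.isdigit():
        raise ValueError(f"unexpected doc_id format: {doc_id!r}")
    return PassageRef(article_id, int(index))


def passages_by_article(doc_ids: Iterable[str]) -> Dict[str, List[int]]:
    """Collect the passage indexes seen for each article id."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for doc_id in doc_ids:
        ref = split_doc_id(doc_id)
        grouped[ref.article_id].append(ref.passage_index)
    return dict(grouped)


# Made-up ids for illustration; each stays within the 11-character maximum.
example_ids = ["7#0", "7#1", "56672809#4"]
grouped = passages_by_article(example_ids)
print(grouped)
```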
The dev set for Arabic.
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ar/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ar/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ar.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
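The datamaestro identifiers used on this page appear to follow the ir_datasets identifiers closely: slashes become dots, an `irds.` prefix is added, and `.queries` or `.qrels` is appended for those components. The helper below is a sketch of that pattern as inferred from the examples here, not a documented rule; `datamaestro_id` is a hypothetical name.

```python
def datamaestro_id(irds_id: str, component: str = "") -> str:
    """Build a datamaestro-style id from an ir_datasets id.

    component is "" for the documents, or e.g. "queries" / "qrels".
    The mapping is inferred from the examples on this page.
    """
    base = "irds." + irds_id.strip("/").replace("/", ".")
    return f"{base}.{component}" if component else base


print(datamaestro_id("miracl/ar"))
print(datamaestro_id("miracl/ar/dev", "queries"))
print(datamaestro_id("miracl/ar/dev", "qrels"))
```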
Inherits docs from miracl/ar
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ar/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ar/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ar.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 24K | 80.6% |
| 1 | Relevant | 5.7K | 19.4% |
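The percentages in this table can be reproduced from the `counts_by_value` figures listed in the metadata for miracl/ar/dev (5658 relevant and 23539 not relevant judgments). A small sketch; the function name `relevance_shares` is hypothetical.

```python
from typing import Dict


def relevance_shares(counts_by_value: Dict[str, int]) -> Dict[str, float]:
    """Return each relevance level's share of all judgments, in percent."""
    total = sum(counts_by_value.values())
    return {
        level: round(100 * count / total, 1)
        for level, count in counts_by_value.items()
    }


# Counts copied from the miracl/ar/dev metadata on this page.
shares = relevance_shares({"1": 5658, "0": 23539})
print(shares)
```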
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ar/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
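Evaluation code often wants judgments keyed by query rather than as a flat stream. The sketch below groups records with the fields listed above into a nested dictionary. The `Qrel` tuple and the sample records are hypothetical stand-ins; in practice the input would be `dataset.qrels_iter()`.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Qrel(NamedTuple):
    """Hypothetical stand-in mirroring the qrel fields listed above."""
    query_id: str
    doc_id: str
    relevance: int
    iteration: str


def qrels_to_dict(qrels: Iterable[Qrel]) -> Dict[str, Dict[str, int]]:
    """Group judgments as {query_id: {doc_id: relevance}}."""
    by_query: Dict[str, Dict[str, int]] = defaultdict(dict)
    for qrel in qrels:
        by_query[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(by_query)


# Made-up judgments for illustration only.
sample_qrels = [
    Qrel("q1", "7#0", 1, "Q0"),
    Qrel("q1", "7#1", 0, "Q0"),
    Qrel("q2", "9#3", 1, "Q0"),
]
judgments = qrels_to_dict(sample_qrels)
print(judgments["q1"])
```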
ir_datasets export miracl/ar/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.ar.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 2061414,
"fields": {
"doc_id": {
"max_len": 11,
"common_prefix": ""
}
}
},
"queries": {
"count": 2896
},
"qrels": {
"count": 29197,
"fields": {
"relevance": {
"counts_by_value": {
"1": 5658,
"0": 23539
}
}
}
}
}
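The metadata is plain JSON, so simple sanity checks are easy to script. The sketch below loads an abridged copy of the miracl/ar/dev metadata (the `doc_id` field details are omitted for brevity), confirms that the per-level counts add up to the reported qrels count, and derives the average number of judged passages per query.

```python
import json

# Abridged copy of the miracl/ar/dev metadata shown above.
metadata_json = """
{
  "docs": {"count": 2061414},
  "queries": {"count": 2896},
  "qrels": {
    "count": 29197,
    "fields": {"relevance": {"counts_by_value": {"1": 5658, "0": 23539}}}
  }
}
"""

meta = json.loads(metadata_json)
counts = meta["qrels"]["fields"]["relevance"]["counts_by_value"]

# The per-level counts should sum to the total number of judgments.
judged_total = sum(counts.values())
print(judged_total == meta["qrels"]["count"])

# Average number of judged passages per query.
judgments_per_query = meta["qrels"]["count"] / meta["queries"]["count"]
print(round(judgments_per_query, 2))
```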
The held-out test set (version a) for Arabic.
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ar/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ar/test-a queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ar.test-a.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ar
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ar/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ar/test-a docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ar.test-a')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{
"docs": {
"count": 2061414,
"fields": {
"doc_id": {
"max_len": 11,
"common_prefix": ""
}
}
},
"queries": {
"count": 936
}
}
The held-out test set (version b) for Arabic.
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ar/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ar/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ar.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ar
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ar/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ar/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ar.test-b')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{
"docs": {
"count": 2061414,
"fields": {
"doc_id": {
"max_len": 11,
"common_prefix": ""
}
}
},
"queries": {
"count": 1405
}
}
The train set for Arabic.
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ar/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ar/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ar.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ar
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ar/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ar/train docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ar.train')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 19K | 75.5% |
| 1 | Relevant | 6.2K | 24.5% |
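Because the training judgments include both relevant and non-relevant passages, one common way to use them is to pair each relevant passage with the non-relevant ones judged for the same query. This is a sketch of that idea, not something this page prescribes; the `Qrel` tuple, the `training_triples` helper, and the sample records are hypothetical.

```python
from itertools import product
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Qrel(NamedTuple):
    """Hypothetical stand-in for namedtuple<query_id, doc_id, relevance, iteration>."""
    query_id: str
    doc_id: str
    relevance: int
    iteration: str


def training_triples(qrels: Iterable[Qrel]) -> List[Tuple[str, str, str]]:
    """Return (query_id, positive_doc_id, negative_doc_id) triples."""
    positives: Dict[str, List[str]] = {}
    negatives: Dict[str, List[str]] = {}
    for qrel in qrels:
        bucket = positives if qrel.relevance > 0 else negatives
        bucket.setdefault(qrel.query_id, []).append(qrel.doc_id)

    triples: List[Tuple[str, str, str]] = []
    for query_id, pos_ids in positives.items():
        # Queries without any judged negative simply produce no triples.
        for pos_id, neg_id in product(pos_ids, negatives.get(query_id, [])):
            triples.append((query_id, pos_id, neg_id))
    return triples


# Made-up judgments for illustration only.
sample_qrels = [
    Qrel("q1", "1#0", 1, "Q0"),
    Qrel("q1", "2#0", 0, "Q0"),
    Qrel("q1", "3#0", 0, "Q0"),
    Qrel("q2", "4#0", 1, "Q0"),
]
triples = training_triples(sample_qrels)
print(triples)
```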
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ar/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/ar/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.ar.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 2061414,
"fields": {
"doc_id": {
"max_len": 11,
"common_prefix": ""
}
}
},
"queries": {
"count": 3495
},
"qrels": {
"count": 25382,
"fields": {
"relevance": {
"counts_by_value": {
"1": 6217,
"0": 19165
}
}
}
}
}
The Bengali corpus.
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/bn")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/bn docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.bn')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{
"docs": {
"count": 297265,
"fields": {
"doc_id": {
"max_len": 10,
"common_prefix": ""
}
}
}
}
The dev set for Bengali.
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/bn/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/bn/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.bn.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/bn
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/bn/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/bn/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.bn.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 3.3K | 79.5% |
| 1 | Relevant | 863 | 20.5% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/bn/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/bn/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.bn.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 297265,
"fields": {
"doc_id": {
"max_len": 10,
"common_prefix": ""
}
}
},
"queries": {
"count": 411
},
"qrels": {
"count": 4206,
"fields": {
"relevance": {
"counts_by_value": {
"1": 863,
"0": 3343
}
}
}
}
}
The held-out test set (version a) for Bengali.
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/bn/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/bn/test-a queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.bn.test-a.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/bn
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/bn/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/bn/test-a docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.bn.test-a')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{
"docs": {
"count": 297265,
"fields": {
"doc_id": {
"max_len": 10,
"common_prefix": ""
}
}
},
"queries": {
"count": 102
}
}
The held-out test set (version b) for Bengali.
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/bn/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/bn/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.bn.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/bn
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/bn/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/bn/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.bn.test-b')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{
"docs": {
"count": 297265,
"fields": {
"doc_id": {
"max_len": 10,
"common_prefix": ""
}
}
},
"queries": {
"count": 1130
}
}
The train set for Bengali.
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/bn/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/bn/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.bn.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/bn
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/bn/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/bn/train docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.bn.train')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 13K | 77.0% |
| 1 | Relevant | 3.9K | 23.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/bn/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/bn/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.bn.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 297265,
"fields": {
"doc_id": {
"max_len": 10,
"common_prefix": ""
}
}
},
"queries": {
"count": 1631
},
"qrels": {
"count": 16754,
"fields": {
"relevance": {
"counts_by_value": {
"1": 3859,
"0": 12895
}
}
}
}
}
The German corpus.
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/de")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/de docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.de')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{
"docs": {
"count": 15866222,
"fields": {
"doc_id": {
"max_len": 12,
"common_prefix": ""
}
}
}
}
The dev set for German.
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/de/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/de/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.de.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/de
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/de/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/de/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.de.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 2.3K | 74.2% |
| 1 | Relevant | 811 | 25.8% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/de/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/de/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.de.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 15866222,
"fields": {
"doc_id": {
"max_len": 12,
"common_prefix": ""
}
}
},
"queries": {
"count": 305
},
"qrels": {
"count": 3144,
"fields": {
"relevance": {
"counts_by_value": {
"1": 811,
"0": 2333
}
}
}
}
}
The held-out test set (version b) for German.
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/de/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/de/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.de.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/de
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/de/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/de/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.de.test-b')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{
"docs": {
"count": 15866222,
"fields": {
"doc_id": {
"max_len": 12,
"common_prefix": ""
}
}
},
"queries": {
"count": 712
}
}
The English corpus.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/en")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/en docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en')
# Index miracl/en
indexer = pt.IterDictIndexer('./indices/miracl_en')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.en')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{
"docs": {
"count": 32893221,
"fields": {
"doc_id": {
"max_len": 13,
"common_prefix": ""
}
}
}
}
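The corpus sizes reported so far differ by roughly two orders of magnitude, which matters when planning disk space and indexing time. The sketch below collects the `docs.count` values for the four corpora listed up to this point on the page and orders them from largest to smallest; the variable names are hypothetical.

```python
# docs.count values copied from the metadata blocks on this page.
corpus_sizes = {
    "ar": 2061414,
    "bn": 297265,
    "de": 15866222,
    "en": 32893221,
}

# Order the corpora from largest to smallest.
largest_first = sorted(corpus_sizes.items(), key=lambda item: item[1], reverse=True)
for lang, count in largest_first:
    print(f"{lang}: {count:>12,} passages")
```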
The dev set for English.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/en/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/en/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en/dev')
index_ref = pt.IndexRef.of('./indices/miracl_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.en.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/en
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/en/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/en/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en/dev')
# Index miracl/en
indexer = pt.IterDictIndexer('./indices/miracl_en')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.en.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 6.0K | 72.1% |
| 1 | Relevant | 2.3K | 27.9% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/en/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/en/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:miracl/en/dev')
index_ref = pt.IndexRef.of('./indices/miracl_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
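As a rough illustration of what the nDCG@20 measure in the experiment above computes, here is a hand-rolled version for a single query. It is a sketch, not PyTerrier's implementation: it assumes the common log2 rank discount with the relevance label used directly as the gain, which for the binary labels of this dataset gives the same result as the exponential-gain variant. The function names, ranking and judgments are hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: the gain at rank i (1-based) is divided by log2(i + 1)."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))


def ndcg_at_k(ranked_doc_ids, relevance, k):
    """nDCG@k for one query; `relevance` maps doc_id -> label, unjudged docs count as 0."""
    gains = [relevance.get(doc_id, 0) for doc_id in ranked_doc_ids[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical ranking and judgments for a single query.
ranking = ["d1", "d2", "d3"]
judgments = {"d1": 1, "d2": 0, "d3": 1}
score = ndcg_at_k(ranking, judgments, k=20)
print(round(score, 4))
```

The ranking places a non-relevant passage between the two relevant ones, so the score falls below 1.0; swapping `d2` and `d3` would make it ideal.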
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.en.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # The assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 32893221,
"fields": {
"doc_id": {
"max_len": 13,
"common_prefix": ""
}
}
},
"queries": {
"count": 799
},
"qrels": {
"count": 8350,
"fields": {
"relevance": {
"counts_by_value": {
"1": 2326,
"0": 6024
}
}
}
}
}
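The percentages in the relevance table for miracl/en/dev follow directly from this metadata. A small sketch with the standard `json` module, using only the counts copied from the block above:

```python
import json

# Counts copied from the metadata block above for miracl/en/dev.
metadata = json.loads("""
{
  "queries": {"count": 799},
  "qrels": {
    "count": 8350,
    "fields": {"relevance": {"counts_by_value": {"1": 2326, "0": 6024}}}
  }
}
""")

qrels = metadata["qrels"]
by_value = qrels["fields"]["relevance"]["counts_by_value"]

# Share of each relevance label among all judgments, in percent.
shares = {label: round(100 * n / qrels["count"], 1) for label, n in by_value.items()}
print(shares)
```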
The held-out test set (version a) for English.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/en/test-a")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/en/test-a queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en/test-a')
index_ref = pt.IndexRef.of('./indices/miracl_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
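No qrels are listed for this held-out subset, so a ranking produced for these queries would usually be saved for later scoring. One widely used layout is the six-column TREC run format (`query_id Q0 doc_id rank score tag`). The sketch below formats plain (query_id, doc_id, score) triples; the `to_trec_run` helper, the run tag and the sample scores are hypothetical, and it does not depend on PyTerrier.

```python
from typing import Iterable, List, Tuple


def to_trec_run(results: Iterable[Tuple[str, str, float]], tag: str = "my-run") -> List[str]:
    """Format (query_id, doc_id, score) triples as TREC run lines, ranked per query."""
    by_query = {}
    for query_id, doc_id, score in results:
        by_query.setdefault(query_id, []).append((doc_id, score))

    lines = []
    for query_id, scored in by_query.items():
        # Highest score first; ranks start at 1.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        for rank, (doc_id, score) in enumerate(scored, start=1):
            lines.append(f"{query_id} Q0 {doc_id} {rank} {score:.4f} {tag}")
    return lines


# Hypothetical scores for one query.
run_lines = to_trec_run([("q1", "d2", 1.5), ("q1", "d1", 3.25)])
print("\n".join(run_lines))
```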
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.en.test-a.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/en
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/en/test-a")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/en/test-a docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en/test-a')
# Index miracl/en
indexer = pt.IterDictIndexer('./indices/miracl_en')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.en.test-a')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 32893221,
"fields": {
"doc_id": {
"max_len": 13,
"common_prefix": ""
}
}
},
"queries": {
"count": 734
}
}
The held-out test set (version b) for English.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/en/test-b")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/en/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en/test-b')
index_ref = pt.IndexRef.of('./indices/miracl_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.en.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/en
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/en/test-b")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/en/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en/test-b')
# Index miracl/en
indexer = pt.IterDictIndexer('./indices/miracl_en')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.en.test-b')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 32893221,
"fields": {
"doc_id": {
"max_len": 13,
"common_prefix": ""
}
}
},
"queries": {
"count": 1790
}
}
The train set for English.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/en/train")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/en/train queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en/train')
index_ref = pt.IndexRef.of('./indices/miracl_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.en.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/en
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/en/train")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/en/train docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en/train')
# Index miracl/en
indexer = pt.IterDictIndexer('./indices/miracl_en')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.en.train')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 22K | 73.1% |
| 1 | Relevant | 7.9K | 26.9% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/en/train")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
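For training, the judgments are often regrouped into positive and negative passages per query. A minimal sketch of that regrouping, assuming records with the fields shown above and treating any relevance above 0 as positive; `split_by_label` is a hypothetical helper and the sample records are invented stand-ins.

```python
from collections import defaultdict, namedtuple


def split_by_label(qrels):
    """Map query_id -> {"pos": [doc_ids], "neg": [doc_ids]}; relevance > 0 counts as positive."""
    grouped = defaultdict(lambda: {"pos": [], "neg": []})
    for q in qrels:
        key = "pos" if q.relevance > 0 else "neg"
        grouped[q.query_id][key].append(q.doc_id)
    return dict(grouped)


# Invented stand-in records with the same field names as the namedtuple above.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])
sample = [
    Qrel("q1", "d1", 1, "0"),
    Qrel("q1", "d2", 0, "0"),
    Qrel("q2", "d3", 0, "0"),
]
pairs = split_by_label(sample)
print(pairs["q1"])

# With the real data (requires ir_datasets and a download):
# split_by_label(ir_datasets.load("miracl/en/train").qrels_iter())
```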
ir_datasets export miracl/en/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:miracl/en/train')
index_ref = pt.IndexRef.of('./indices/miracl_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.en.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # The assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 32893221,
"fields": {
"doc_id": {
"max_len": 13,
"common_prefix": ""
}
}
},
"queries": {
"count": 2863
},
"qrels": {
"count": 29416,
"fields": {
"relevance": {
"counts_by_value": {
"1": 7899,
"0": 21517
}
}
}
}
}
The Spanish corpus.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/es")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
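The Spanish corpus contains 10,373,953 passages (the docs count in its metadata), so a full pass over the iterator is likely to be slow. To inspect only a few records, the iterator can be wrapped with `itertools.islice`. This is a small sketch; `peek` is a hypothetical helper and the generator below stands in for `dataset.docs_iter()`.

```python
from itertools import islice


def peek(records, n=3):
    """Return the first n records as a list; later records stay unread in the iterator."""
    return list(islice(records, n))


# Stand-in for dataset.docs_iter(); yields invented (doc_id, title, text) tuples.
stand_in = (("d%d" % i, "Title %d" % i, "Text %d" % i) for i in range(1000))
first = peek(stand_in, n=2)
print(first)

# With the real data (requires ir_datasets and a download):
# peek(ir_datasets.load("miracl/es").docs_iter())
```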
ir_datasets export miracl/es docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.es')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 10373953,
"fields": {
"doc_id": {
"max_len": 12,
"common_prefix": ""
}
}
}
}
The dev set for Spanish.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/es/dev")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/es/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.es.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/es
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/es/dev")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/es/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.es.dev')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 3.5K | 53.6% |
| 1 | Relevant | 3.0K | 46.4% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/es/dev")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/es/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.es.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # The assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 10373953,
"fields": {
"doc_id": {
"max_len": 12,
"common_prefix": ""
}
}
},
"queries": {
"count": 648
},
"qrels": {
"count": 6443,
"fields": {
"relevance": {
"counts_by_value": {
"1": 2987,
"0": 3456
}
}
}
}
}
The held-out test set (version b) for Spanish.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/es/test-b")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/es/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.es.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/es
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/es/test-b")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/es/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.es.test-b')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 10373953,
"fields": {
"doc_id": {
"max_len": 12,
"common_prefix": ""
}
}
},
"queries": {
"count": 1515
}
}
The train set for Spanish.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/es/train")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/es/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.es.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/es
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/es/train")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/es/train docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.es.train')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 12K | 53.4% |
| 1 | Relevant | 10K | 46.6% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/es/train")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/es/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.es.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # The assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 10373953,
"fields": {
"doc_id": {
"max_len": 12,
"common_prefix": ""
}
}
},
"queries": {
"count": 2162
},
"qrels": {
"count": 21531,
"fields": {
"relevance": {
"counts_by_value": {
"1": 10025,
"0": 11506
}
}
}
}
}
The Persian corpus.
Language: fa
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fa")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fa docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fa')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 2207172,
"fields": {
"doc_id": {
"max_len": 11,
"common_prefix": ""
}
}
}
}
The dev set for Persian.
Language: fa
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fa/dev")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/fa/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fa.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/fa
Language: fa
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fa/dev")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fa/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fa.dev')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 5.3K | 80.0% |
| 1 | Relevant | 1.3K | 20.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fa/dev")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/fa/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.fa.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # The assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 2207172,
"fields": {
"doc_id": {
"max_len": 11,
"common_prefix": ""
}
}
},
"queries": {
"count": 632
},
"qrels": {
"count": 6571,
"fields": {
"relevance": {
"counts_by_value": {
"1": 1314,
"0": 5257
}
}
}
}
}
The held-out test set (version b) for Persian.
Language: fa
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fa/test-b")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/fa/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fa.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/fa
Language: fa
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fa/test-b")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fa/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fa.test-b')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 2207172,
"fields": {
"doc_id": {
"max_len": 11,
"common_prefix": ""
}
}
},
"queries": {
"count": 1476
}
}
The train set for Persian.
Language: fa
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fa/train")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/fa/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fa.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/fa
Language: fa
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fa/train")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fa/train docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fa.train')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 18K | 80.4% |
| 1 | Relevant | 4.3K | 19.6% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fa/train")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/fa/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.fa.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # The assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 2207172,
"fields": {
"doc_id": {
"max_len": 11,
"common_prefix": ""
}
}
},
"queries": {
"count": 2107
},
"qrels": {
"count": 21844,
"fields": {
"relevance": {
"counts_by_value": {
"1": 4277,
"0": 17567
}
}
}
}
}
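The dataset identifiers in the English, Spanish and Persian sections above all follow the pattern `miracl/<lang>/<split>`, and only the dev and train splits are shown with qrels. The sketch below builds the identifiers from that pattern; the `SPLITS` table lists only the splits that appear in those sections, and `dataset_ids` is a hypothetical helper, not part of ir_datasets.

```python
# Splits as listed in the sections above; languages not shown there are left out.
SPLITS = {
    "en": ["dev", "test-a", "test-b", "train"],
    "es": ["dev", "test-b", "train"],
    "fa": ["dev", "test-b", "train"],
}
# Only these splits come with qrels in the sections above.
JUDGED = {"dev", "train"}


def dataset_ids(lang, judged_only=False):
    """Build ir_datasets identifiers of the form miracl/<lang>/<split>."""
    splits = SPLITS[lang]
    if judged_only:
        splits = [s for s in splits if s in JUDGED]
    return [f"miracl/{lang}/{s}" for s in splits]


ids = dataset_ids("es")
judged = dataset_ids("en", judged_only=True)
print(ids, judged)
```

Each resulting string can be passed to `ir_datasets.load(...)` as in the examples above.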
The Finnish corpus.
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fi")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fi docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fi')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 1883509,
"fields": {
"doc_id": {
"max_len": 11,
"common_prefix": ""
}
}
}
}
The dev set for Finnish.
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fi/dev")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/fi/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fi.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/fi
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fi/dev")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fi/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fi.dev')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 9.6K | 79.6% |
| 1 | Relevant | 2.4K | 20.4% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fi/dev")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/fi/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.fi.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # The assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 1883509,
"fields": {
"doc_id": {
"max_len": 11,
"common_prefix": ""
}
}
},
"queries": {
"count": 1271
},
"qrels": {
"count": 12008,
"fields": {
"relevance": {
"counts_by_value": {
"1": 2447,
"0": 9561
}
}
}
}
}
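The dev-set metadata blocks above make it easy to compare annotation density across languages. The sketch below uses only the query, qrel and relevant-label counts copied from the English, Spanish, Persian and Finnish dev blocks and derives the average number of judged and relevant passages per query.

```python
# (queries, qrels, relevant) for each dev set, copied from the metadata blocks above.
DEV_STATS = {
    "en": (799, 8350, 2326),
    "es": (648, 6443, 2987),
    "fa": (632, 6571, 1314),
    "fi": (1271, 12008, 2447),
}

# Average judged and relevant passages per query, rounded to one decimal place.
per_query = {
    lang: (round(qrels / queries, 1), round(relevant / queries, 1))
    for lang, (queries, qrels, relevant) in DEV_STATS.items()
}
for lang, (judged, relevant) in per_query.items():
    print(f"{lang}: {judged} judged and {relevant} relevant passages per query")
```

The number of judgments per query is similar across the four languages, while the number of relevant passages per query is noticeably higher for Spanish, in line with its relevance table.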
The held-out test set (version a) for Finnish.
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fi/test-a")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/fi/test-a queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fi.test-a.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/fi
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fi/test-a")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fi/test-a docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fi.test-a')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 1883509,
"fields": {
"doc_id": {
"max_len": 11,
"common_prefix": ""
}
}
},
"queries": {
"count": 1060
}
}
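The metadata for miracl/fi/test-a lists docs and queries but no qrels, which is consistent with a held-out test set. A small sketch of how code might check this before trying to iterate judgments, assuming the `has_docs()`, `has_queries()` and `has_qrels()` methods of ir_datasets dataset objects; `available_parts` and `FakeTestSplit` are hypothetical names used only for illustration.

```python
def available_parts(dataset):
    """Report which of docs/queries/qrels a dataset object provides."""
    return {
        "docs": dataset.has_docs(),
        "queries": dataset.has_queries(),
        "qrels": dataset.has_qrels(),
    }


class FakeTestSplit:
    """Stand-in mimicking a held-out split: docs and queries, but no qrels."""

    def has_docs(self):
        return True

    def has_queries(self):
        return True

    def has_qrels(self):
        return False


print(available_parts(FakeTestSplit()))
```

With ir_datasets installed, `available_parts(ir_datasets.load("miracl/fi/test-a"))` would be expected to report no qrels, given the metadata shown here.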
The held-out test set (version b) for Finnish.
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fi/test-b")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/fi/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fi.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/fi
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fi/test-b")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fi/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fi.test-b')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 1883509,
"fields": {
"doc_id": {
"max_len": 11,
"common_prefix": ""
}
}
},
"queries": {
"count": 711
}
}
The train set for Finnish.
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fi/train")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/fi/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fi.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/fi
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fi/train")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fi/train docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fi.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 15K | 75.8% |
| 1 | Relevant | 4.9K | 24.2% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fi/train")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/fi/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.fi.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 1883509,
"fields": {
"doc_id": {
"max_len": 11,
"common_prefix": ""
}
}
},
"queries": {
"count": 2897
},
"qrels": {
"count": 20350,
"fields": {
"relevance": {
"counts_by_value": {
"1": 4928,
"0": 15422
}
}
}
}
}
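A train split provides both queries and binary qrels, so a common first step is to group the judged passages of each query into positives and negatives. The sketch below shows one way this might be done; it assumes only the field names shown in the examples above, and `build_training_examples` together with the `Query` and `Qrel` stand-ins are illustrative, not part of ir_datasets.

```python
from collections import defaultdict, namedtuple

# Stand-ins with the same fields as the namedtuples shown above (illustrative only).
Query = namedtuple("Query", ["query_id", "text"])
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def build_training_examples(queries, qrels):
    """Group judged passages per query into positive and negative doc_id lists."""
    judged = defaultdict(lambda: {"positive": [], "negative": []})
    for qrel in qrels:
        key = "positive" if qrel.relevance > 0 else "negative"
        judged[qrel.query_id][key].append(qrel.doc_id)

    examples = []
    for query in queries:
        if query.query_id in judged:  # skip queries without any judgment
            entry = judged[query.query_id]
            examples.append(
                {
                    "query": query.text,
                    "positive": entry["positive"],
                    "negative": entry["negative"],
                }
            )
    return examples


# Synthetic data with made-up identifiers.
queries = [Query("q1", "what"), Query("q2", "who"), Query("q3", "unjudged")]
qrels = [
    Qrel("q1", "d1#0", 1, "Q0"),
    Qrel("q1", "d2#0", 0, "Q0"),
    Qrel("q2", "d3#1", 1, "Q0"),
]
examples = build_training_examples(queries, qrels)
print(examples)
```

With ir_datasets installed, the same function could be fed `dataset.queries_iter()` and `dataset.qrels_iter()` from `ir_datasets.load("miracl/fi/train")`.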
The French corpus.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fr")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fr docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fr')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 14636953,
"fields": {
"doc_id": {
"max_len": 12,
"common_prefix": ""
}
}
}
}
The dev set for French.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fr/dev")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/fr/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fr.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/fr
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fr/dev")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fr/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fr.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 2.7K | 78.7% |
| 1 | Relevant | 731 | 21.3% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fr/dev")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/fr/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
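The TSV produced by the export command above can be read back with the standard library. This is a minimal sketch that assumes the four tab-separated columns appear in the order shown (`query_id`, `doc_id`, `relevance`, `iteration`) and that the file has no header row; `read_qrels_tsv` is a hypothetical helper, and the sample lines are synthetic.

```python
import csv
import io


def read_qrels_tsv(stream):
    """Parse tab-separated qrels lines into dictionaries with an integer relevance."""
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for record in reader:
        if not record:  # tolerate blank lines
            continue
        query_id, doc_id, relevance, iteration = record
        rows.append(
            {
                "query_id": query_id,
                "doc_id": doc_id,
                "relevance": int(relevance),
                "iteration": iteration,
            }
        )
    return rows


# Synthetic export output with made-up identifiers.
sample_tsv = "q1\td1#0\t1\tQ0\nq1\td2#3\t0\tQ0\n"
rows = read_qrels_tsv(io.StringIO(sample_tsv))
print(rows)
```

For a real export, the command output could be redirected to a file (for example `qrels.tsv`) and opened with `open("qrels.tsv", newline="", encoding="utf-8")` before being passed to the helper.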
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.fr.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 14636953,
"fields": {
"doc_id": {
"max_len": 12,
"common_prefix": ""
}
}
},
"queries": {
"count": 343
},
"qrels": {
"count": 3429,
"fields": {
"relevance": {
"counts_by_value": {
"1": 731,
"0": 2698
}
}
}
}
}
The held-out test set (version b) for French.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fr/test-b")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/fr/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fr.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/fr
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fr/test-b")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fr/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fr.test-b')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 14636953,
"fields": {
"doc_id": {
"max_len": 12,
"common_prefix": ""
}
}
},
"queries": {
"count": 801
}
}
The train set for French.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fr/train")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/fr/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fr.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/fr
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fr/train")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fr/train docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fr.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 9.1K | 79.7% |
| 1 | Relevant | 2.3K | 20.3% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fr/train")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/fr/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.fr.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 14636953,
"fields": {
"doc_id": {
"max_len": 12,
"common_prefix": ""
}
}
},
"queries": {
"count": 1143
},
"qrels": {
"count": 11426,
"fields": {
"relevance": {
"counts_by_value": {
"1": 2321,
"0": 9105
}
}
}
}
}
The Hindi corpus.
Language: hi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/hi")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/hi docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.hi')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 506264,
"fields": {
"doc_id": {
"max_len": 11,
"common_prefix": ""
}
}
}
}
The dev set for Hindi.
Language: hi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/hi/dev")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/hi/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.hi.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/hi
Language: hi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/hi/dev")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/hi/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.hi.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
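Because miracl/hi/dev inherits its documents from miracl/hi, the passages referenced by its qrels can be fetched by `doc_id` rather than by scanning the whole corpus. The sketch below assumes a store object with a `get(doc_id)` method, as returned by `dataset.docs_store()` in ir_datasets; `judged_passages` and `DictStore` are hypothetical names, and the sample records are synthetic.

```python
from collections import namedtuple

# Stand-ins with the same fields as the namedtuples shown above (illustrative only).
Doc = namedtuple("Doc", ["doc_id", "title", "text"])
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def judged_passages(qrels, docs_store, min_relevance=1):
    """Yield (query_id, doc) for each qrel at or above min_relevance."""
    for qrel in qrels:
        if qrel.relevance >= min_relevance:
            yield qrel.query_id, docs_store.get(qrel.doc_id)


class DictStore:
    """In-memory stand-in exposing the same get(doc_id) lookup."""

    def __init__(self, docs):
        self._docs = {doc.doc_id: doc for doc in docs}

    def get(self, doc_id):
        return self._docs[doc_id]


store = DictStore([Doc("d1#0", "Title A", "text a"), Doc("d2#0", "Title B", "text b")])
qrels = [Qrel("q1", "d1#0", 1, "Q0"), Qrel("q1", "d2#0", 0, "Q0")]
relevant = list(judged_passages(qrels, store))
print(relevant)
```

With ir_datasets installed, `judged_passages(dataset.qrels_iter(), dataset.docs_store())` for `dataset = ir_datasets.load("miracl/hi/dev")` would follow the same pattern.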
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 2.7K | 78.5% |
| 1 | Relevant | 752 | 21.5% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/hi/dev")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/hi/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.hi.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 506264,
"fields": {
"doc_id": {
"max_len": 11,
"common_prefix": ""
}
}
},
"queries": {
"count": 350
},
"qrels": {
"count": 3494,
"fields": {
"relevance": {
"counts_by_value": {
"1": 752,
"0": 2742
}
}
}
}
}
The held-out test set (version b) for Hindi.
Language: hi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/hi/test-b")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/hi/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.hi.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/hi
Language: hi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/hi/test-b")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/hi/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.hi.test-b')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 506264,
"fields": {
"doc_id": {
"max_len": 11,
"common_prefix": ""
}
}
},
"queries": {
"count": 819
}
}
The train set for Hindi.
Language: hi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/hi/train")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/hi/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.hi.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/hi
Language: hi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/hi/train")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/hi/train docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.hi.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 9.2K | 78.8% |
| 1 | Relevant | 2.5K | 21.2% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/hi/train")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/hi/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.hi.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 506264,
"fields": {
"doc_id": {
"max_len": 11,
"common_prefix": ""
}
}
},
"queries": {
"count": 1169
},
"qrels": {
"count": 11668,
"fields": {
"relevance": {
"counts_by_value": {
"1": 2469,
"0": 9199
}
}
}
}
}
The Indonesian corpus.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/id")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/id docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.id')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 1446315,
"fields": {
"doc_id": {
"max_len": 11,
"common_prefix": ""
}
}
}
}
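With 1,446,315 passages in the Indonesian corpus, a full pass over the documents is slow, so it can help to inspect a bounded sample first. This sketch uses `itertools.islice` to stop after a fixed number of records; `mean_text_length` and the `Doc` stand-in are illustrative names, and the sample passages are synthetic.

```python
from collections import namedtuple
from itertools import islice

# Stand-in with the same fields as the doc namedtuple shown above (illustrative only).
Doc = namedtuple("Doc", ["doc_id", "title", "text"])


def mean_text_length(docs, limit=1000):
    """Average passage length in characters over at most `limit` documents."""
    lengths = [len(doc.text) for doc in islice(docs, limit)]
    return sum(lengths) / len(lengths) if lengths else 0.0


# Synthetic passages; only the first two are read because limit=2.
sample_docs = iter(
    [
        Doc("d1#0", "A", "abcd"),
        Doc("d1#1", "A", "abcdef"),
        Doc("d2#0", "B", "abcdefgh"),
    ]
)
average = mean_text_length(sample_docs, limit=2)
print(average)  # 5.0
```

With ir_datasets installed, `mean_text_length(ir_datasets.load("miracl/id").docs_iter(), limit=1000)` would read only the first 1000 passages.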
The dev set for Indonesian.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/id/dev")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/id/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.id.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/id
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/id/dev")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/id/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.id.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 6.6K | 68.1% |
| 1 | Relevant | 3.1K | 31.9% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/id/dev")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/id/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.id.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 1446315,
"fields": {
"doc_id": {
"max_len": 11,
"common_prefix": ""
}
}
},
"queries": {
"count": 960
},
"qrels": {
"count": 9668,
"fields": {
"relevance": {
"counts_by_value": {
"1": 3088,
"0": 6580
}
}
}
}
}
The held-out test set (version a) for Indonesian.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/id/test-a")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/id/test-a queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.id.test-a.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/id
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/id/test-a")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/id/test-a docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.id.test-a')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 1446315,
"fields": {
"doc_id": {
"max_len": 11,
"common_prefix": ""
}
}
},
"queries": {
"count": 731
}
}
The held-out test set (version b) for Indonesian.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/id/test-b")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/id/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.id.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/id
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/id/test-b")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/id/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.id.test-b')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 1446315,
"fields": {
"doc_id": {
"max_len": 11,
"common_prefix": ""
}
}
},
"queries": {
"count": 611
}
}
The train set for Indonesian.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/id/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/id/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.id.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/id
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/id/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/id/train docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.id.train')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 29K | 69.8% |
| 1 | Relevant | 13K | 30.2% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/id/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/id/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.id.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 1446315,
"fields": {
"doc_id": {
"max_len": 11,
"common_prefix": ""
}
}
},
"queries": {
"count": 4071
},
"qrels": {
"count": 41358,
"fields": {
"relevance": {
"counts_by_value": {
"1": 12505,
"0": 28853
}
}
}
}
}
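The percentages in the relevance table for this split can be reproduced from the `counts_by_value` entries in the metadata. A short sketch of that arithmetic, with the counts copied from the block above; the variable names are illustrative only.

```python
# Counts copied from the "counts_by_value" metadata of miracl/id/train above.
counts_by_value = {"1": 12505, "0": 28853}

# The total should match the "qrels" count reported in the same metadata.
total = sum(counts_by_value.values())

# Share of each relevance level, in percent, rounded to one decimal place.
shares = {
    level: round(100 * count / total, 1)
    for level, count in counts_by_value.items()
}

print(total)   # 41358
print(shares)  # {'1': 30.2, '0': 69.8}
```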
The Japanese corpus.
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ja")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ja docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ja')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 6953614,
"fields": {
"doc_id": {
"max_len": 12,
"common_prefix": ""
}
}
}
}
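With close to seven million passages, iterating the whole corpus just to inspect a few records is wasteful. A minimal sketch of peeking at the first few documents with `itertools.islice`. To keep it runnable without downloading anything, `fake_docs_iter` is a hypothetical stand-in for `dataset.docs_iter()` that yields made-up records with the same field names.

```python
from collections import namedtuple
from itertools import islice

# Same field names as the namedtuple<doc_id, title, text> shown above.
Doc = namedtuple("Doc", ["doc_id", "title", "text"])


def fake_docs_iter():
    """Hypothetical stand-in for dataset.docs_iter(); yields made-up passages."""
    for i in range(1_000_000):
        yield Doc(f"doc{i}", f"Title {i}", f"Passage text {i}")


# islice stops after three items, so the rest of the iterator is never consumed.
first_three = list(islice(fake_docs_iter(), 3))
for doc in first_three:
    print(doc.doc_id, doc.title)
```

With ir_datasets installed, the same `islice` call should work on `dataset.docs_iter()` in place of the stand-in.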
The dev set for Japanese.
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ja/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ja/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ja.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ja
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ja/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ja/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ja.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 6.6K | 78.6% |
| 1 | Relevant | 1.8K | 21.4% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ja/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/ja/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.ja.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
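Evaluation code usually needs the judgments grouped by query rather than as a flat stream. A minimal sketch of that grouping, assuming records with the `query_id`, `doc_id`, `relevance`, and `iteration` fields shown above and an integer `relevance`. The `Qrel` namedtuple, the sample rows, and `relevant_docs_by_query` are made up for illustration.

```python
from collections import defaultdict, namedtuple

# Stand-in for the records yielded by dataset.qrels_iter(); the field names
# follow the namedtuple shown above, the values are invented.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])

sample_qrels = [
    Qrel("q1", "d10", 1, "0"),
    Qrel("q1", "d11", 0, "0"),
    Qrel("q2", "d20", 1, "0"),
    Qrel("q2", "d21", 1, "0"),
]


def relevant_docs_by_query(qrels, min_relevance=1):
    """Map each query_id to the set of doc_ids judged at least min_relevance."""
    relevant = defaultdict(set)
    for qrel in qrels:
        if qrel.relevance >= min_relevance:
            relevant[qrel.query_id].add(qrel.doc_id)
    return dict(relevant)


relevant = relevant_docs_by_query(sample_qrels)
print(relevant)
```

With ir_datasets installed, passing `dataset.qrels_iter()` instead of `sample_qrels` should produce the same structure for the real judgments.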
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 6953614,
"fields": {
"doc_id": {
"max_len": 12,
"common_prefix": ""
}
}
},
"queries": {
"count": 860
},
"qrels": {
"count": 8354,
"fields": {
"relevance": {
"counts_by_value": {
"1": 1790,
"0": 6564
}
}
}
}
}
The held-out test set (version a) for Japanese.
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ja/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ja/test-a queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ja.test-a.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ja
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ja/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ja/test-a docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ja.test-a')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 6953614,
"fields": {
"doc_id": {
"max_len": 12,
"common_prefix": ""
}
}
},
"queries": {
"count": 650
}
}
The held-out test set (version b) for Japanese.
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ja/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ja/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ja.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ja
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ja/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ja/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ja.test-b')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 6953614,
"fields": {
"doc_id": {
"max_len": 12,
"common_prefix": ""
}
}
},
"queries": {
"count": 1141
}
}
The train set for Japanese.
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ja/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ja/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ja.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ja
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ja/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ja/train docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ja.train')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 27K | 79.7% |
| 1 | Relevant | 7.0K | 20.3% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ja/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/ja/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.ja.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 6953614,
"fields": {
"doc_id": {
"max_len": 12,
"common_prefix": ""
}
}
},
"queries": {
"count": 3477
},
"qrels": {
"count": 34387,
"fields": {
"relevance": {
"counts_by_value": {
"1": 6984,
"0": 27403
}
}
}
}
}
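The metadata also gives a rough sense of annotation depth. A sketch that derives the average number of judged and relevant passages per query from the counts above. These are plain ratios of the reported totals, so they say nothing about how the judgments are distributed across individual queries.

```python
# Counts copied from the miracl/ja/train metadata above.
metadata = {
    "queries": {"count": 3477},
    "qrels": {
        "count": 34387,
        "fields": {"relevance": {"counts_by_value": {"1": 6984, "0": 27403}}},
    },
}

num_queries = metadata["queries"]["count"]
num_qrels = metadata["qrels"]["count"]
num_relevant = metadata["qrels"]["fields"]["relevance"]["counts_by_value"]["1"]

# Average judged and relevant passages per query, rounded to two decimals.
judged_per_query = round(num_qrels / num_queries, 2)
relevant_per_query = round(num_relevant / num_queries, 2)

print(judged_per_query, relevant_per_query)  # 9.89 2.01
```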
The Korean corpus.
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ko")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ko docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ko')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 1486752,
"fields": {
"doc_id": {
"max_len": 12,
"common_prefix": ""
}
}
}
}
The dev set for Korean.
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ko/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ko/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ko.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ko
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ko/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ko/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ko.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 2.5K | 82.1% |
| 1 | Relevant | 547 | 17.9% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ko/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/ko/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.ko.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
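The `--format tsv` export shown above can be read back with the standard `csv` module. A minimal sketch, assuming one judgment per line in the column order printed by the CLI example (`query_id`, `doc_id`, `relevance`, `iteration`) and no header row. The sample rows and `read_qrels_tsv` are invented for illustration.

```python
import csv
import io

# Made-up rows in the column order printed by the CLI example above:
# query_id, doc_id, relevance, iteration (tab-separated, no header assumed).
sample_export = "q1\td10\t1\t0\nq1\td11\t0\t0\nq2\td20\t1\t0\n"


def read_qrels_tsv(handle):
    """Parse tab-separated qrels into {query_id: {doc_id: relevance}}."""
    qrels = {}
    for query_id, doc_id, relevance, _iteration in csv.reader(handle, delimiter="\t"):
        qrels.setdefault(query_id, {})[doc_id] = int(relevance)
    return qrels


qrels = read_qrels_tsv(io.StringIO(sample_export))
print(qrels)
```

For a real export, pass an open file handle instead of the `io.StringIO` wrapper.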
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 1486752,
"fields": {
"doc_id": {
"max_len": 12,
"common_prefix": ""
}
}
},
"queries": {
"count": 213
},
"qrels": {
"count": 3057,
"fields": {
"relevance": {
"counts_by_value": {
"1": 547,
"0": 2510
}
}
}
}
}
The held-out test set (version a) for Korean.
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ko/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ko/test-a queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ko.test-a.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ko
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ko/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ko/test-a docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ko.test-a')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 1486752,
"fields": {
"doc_id": {
"max_len": 12,
"common_prefix": ""
}
}
},
"queries": {
"count": 263
}
}
The held-out test set (version b) for Korean.
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ko/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ko/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ko.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ko
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ko/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ko/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ko.test-b')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 1486752,
"fields": {
"doc_id": {
"max_len": 12,
"common_prefix": ""
}
}
},
"queries": {
"count": 1417
}
}
The train set for Korean.
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ko/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ko/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ko.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ko
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ko/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ko/train docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ko.train')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 11K | 84.5% |
| 1 | Relevant | 2.0K | 15.5% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ko/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/ko/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.ko.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 1486752,
"fields": {
"doc_id": {
"max_len": 12,
"common_prefix": ""
}
}
},
"queries": {
"count": 868
},
"qrels": {
"count": 12767,
"fields": {
"relevance": {
"counts_by_value": {
"1": 1973,
"0": 10794
}
}
}
}
}
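The Count column of the relevance tables appears to abbreviate the exact counts given in the metadata (for example, 10794 is shown as 11K and 1973 as 2.0K). The sketch below reproduces that abbreviation under an inferred rule: values below 1,000 are printed as-is, values below 10,000 get one decimal, and larger values are rounded to whole thousands. The rule and the `abbreviate_count` name are assumptions drawn from the values on this page, not from the ir_datasets source.

```python
def abbreviate_count(count):
    """Abbreviate a count the way the relevance tables appear to (inferred rule)."""
    if count < 1000:
        return str(count)
    thousands = count / 1000
    if thousands < 10:
        # One decimal place for counts between 1,000 and 9,999.
        return f"{thousands:.1f}K"
    # Whole thousands for larger counts.
    return f"{thousands:.0f}K"


# Exact counts copied from the Korean dev and train metadata on this page.
for exact in (10794, 1973, 2510, 547):
    print(exact, "->", abbreviate_count(exact))
```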
The Russian corpus.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ru")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ru docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ru')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 9543918,
"fields": {
"doc_id": {
"max_len": 12,
"common_prefix": ""
}
}
}
}
The dev set for Russian.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ru/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ru/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ru.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
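The datamaestro identifiers on this page look like the ir_datasets ID with slashes replaced by dots, an `irds.` prefix, and an optional component suffix such as `queries` or `qrels`. A small sketch of that mapping; the pattern is inferred from the examples shown here, and `irds_to_datamaestro` is a hypothetical helper that neither library provides.

```python
def irds_to_datamaestro(dataset_id, component=""):
    """Convert e.g. 'miracl/ru/dev' to 'irds.miracl.ru.dev[.component]'."""
    name = "irds." + dataset_id.replace("/", ".")
    return f"{name}.{component}" if component else name


# Split names as they appear for Russian on this page.
splits = ["dev", "test-a", "test-b", "train"]
ids = [f"miracl/ru/{split}" for split in splits]

print(ids)
print(irds_to_datamaestro("miracl/ru/dev", "queries"))
print(irds_to_datamaestro("miracl/ru"))
```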
Inherits docs from miracl/ru
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ru/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ru/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ru.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 9.5K | 72.8% |
| 1 | Relevant | 3.6K | 27.2% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ru/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/ru/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.ru.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 9543918,
"fields": {
"doc_id": {
"max_len": 12,
"common_prefix": ""
}
}
},
"queries": {
"count": 1252
},
"qrels": {
"count": 13100,
"fields": {
"relevance": {
"counts_by_value": {
"1": 3560,
"0": 9540
}
}
}
}
}
The held-out test set (version a) for Russian.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ru/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ru/test-a queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ru.test-a.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ru
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ru/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ru/test-a docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ru.test-a')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 9543918,
"fields": {
"doc_id": {
"max_len": 12,
"common_prefix": ""
}
}
},
"queries": {
"count": 911
}
}
The held-out test set (version b) for Russian.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ru/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ru/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ru.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ru
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ru/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ru/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ru.test-b')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 9543918,
"fields": {
"doc_id": {
"max_len": 12,
"common_prefix": ""
}
}
},
"queries": {
"count": 718
}
}
The train set for Russian.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ru/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ru/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ru.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ru
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ru/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ru/train docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ru.train')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 24K | 70.5% |
| 1 | Relevant | 10K | 29.5% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ru/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/ru/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.ru.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 9543918,
"fields": {
"doc_id": {
"max_len": 12,
"common_prefix": ""
}
}
},
"queries": {
"count": 4683
},
"qrels": {
"count": 33921,
"fields": {
"relevance": {
"counts_by_value": {
"1": 10000,
"0": 23921
}
}
}
}
}
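The percentages in the relevance table for miracl/ru/train can be recomputed from the `counts_by_value` metadata. The following is a minimal sketch using only the standard library; the helper name `relevance_shares` is illustrative and not part of ir_datasets, and the JSON is copied from the qrels part of the metadata above.

```python
import json

# qrels portion of the miracl/ru/train metadata shown above.
METADATA = json.loads("""
{
  "qrels": {
    "count": 33921,
    "fields": {"relevance": {"counts_by_value": {"1": 10000, "0": 23921}}}
  }
}
""")


def relevance_shares(metadata):
    """Return {relevance_label: percentage of all judgments}, rounded to 0.1."""
    qrels = metadata["qrels"]
    total = qrels["count"]
    counts = qrels["fields"]["relevance"]["counts_by_value"]
    return {int(label): round(100 * n / total, 1) for label, n in counts.items()}


shares = relevance_shares(METADATA)
print(shares)
```

The same helper should work on the metadata of any split that carries a `qrels` entry.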
The Swahili corpus.
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/sw")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/sw docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.sw')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 131924,
"fields": {
"doc_id": {
"max_len": 9,
"common_prefix": ""
}
}
}
}
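Since the corpus consists of Wikipedia articles split into passages, it can be handy to regroup passages by article. The sketch below assumes that passage IDs have the form `<article_id>#<passage_index>`; that format is an assumption not stated on this page, so verify it against real `doc_id` values first. The `Doc` tuple and sample records are made up for illustration, and `split_doc_id` and `group_passages` are hypothetical helper names.

```python
from collections import defaultdict, namedtuple

# Stand-in for the records yielded by dataset.docs_iter() (same field names).
Doc = namedtuple("Doc", ["doc_id", "title", "text"])


def split_doc_id(doc_id):
    """Split '<article_id>#<passage_index>' (assumed format) into its parts."""
    article_id, sep, passage_index = doc_id.partition("#")
    if not sep:
        raise ValueError(f"unexpected doc_id format: {doc_id!r}")
    return article_id, int(passage_index)


def group_passages(docs):
    """Map article_id -> list of passages sorted by passage index."""
    grouped = defaultdict(list)
    for doc in docs:
        article_id, passage_index = split_doc_id(doc.doc_id)
        grouped[article_id].append((passage_index, doc))
    return {
        article_id: [doc for _, doc in sorted(items, key=lambda item: item[0])]
        for article_id, items in grouped.items()
    }


# Made-up sample records, for illustration only.
sample_docs = [
    Doc("12#1", "Example", "second passage"),
    Doc("12#0", "Example", "first passage"),
    Doc("40#0", "Other", "only passage"),
]
articles = group_passages(sample_docs)
print([doc.text for doc in articles["12"]])
```

With real data, `dataset.docs_iter()` would take the place of `sample_docs`. Holding every passage in memory is plausible for a corpus of this size, but may not be for the larger ones.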
The dev set for Swahili.
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/sw/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/sw/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.sw.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/sw
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/sw/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/sw/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.sw.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 4.2K | 82.1% |
| 1 | Relevant | 910 | 17.9% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/sw/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/sw/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.sw.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 131924,
"fields": {
"doc_id": {
"max_len": 9,
"common_prefix": ""
}
}
},
"queries": {
"count": 482
},
"qrels": {
"count": 5092,
"fields": {
"relevance": {
"counts_by_value": {
"1": 910,
"0": 4182
}
}
}
}
}
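Evaluation code usually needs the judgments indexed by query rather than as a flat stream. As a rough sketch, the records from `qrels_iter()` could be folded into a nested dictionary as shown below. The `Qrel` tuple only mirrors the documented field names, the sample values are made up, and `build_qrels_lookup` and `relevant_doc_ids` are hypothetical helper names rather than ir_datasets functions.

```python
from collections import defaultdict, namedtuple

# Stand-in for the records yielded by dataset.qrels_iter() (same field names).
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def build_qrels_lookup(qrels):
    """Return {query_id: {doc_id: relevance}} from an iterable of qrels."""
    lookup = defaultdict(dict)
    for qrel in qrels:
        lookup[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(lookup)


def relevant_doc_ids(lookup, query_id, min_relevance=1):
    """Return the sorted doc IDs judged at or above min_relevance for a query."""
    judged = lookup.get(query_id, {})
    return sorted(doc_id for doc_id, rel in judged.items() if rel >= min_relevance)


# Made-up sample judgments, for illustration only.
sample_qrels = [
    Qrel("q1", "10#0", 1, "Q0"),
    Qrel("q1", "10#1", 0, "Q0"),
    Qrel("q2", "7#3", 1, "Q0"),
]
lookup = build_qrels_lookup(sample_qrels)
print(relevant_doc_ids(lookup, "q1"))
```

Passing `dataset.qrels_iter()` instead of `sample_qrels` would build the same structure from the real judgments.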
The held-out test set (version a) for Swahili.
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/sw/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/sw/test-a queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.sw.test-a.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/sw
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/sw/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/sw/test-a docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.sw.test-a')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 131924,
"fields": {
"doc_id": {
"max_len": 9,
"common_prefix": ""
}
}
},
"queries": {
"count": 638
}
}
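The metadata for the held-out test sets lists `docs` and `queries` but no `qrels`, whereas the dev and train metadata include all three. A small sketch of checking this from the metadata follows; the counts are copied from the miracl/sw/test-a and miracl/sw/dev entries on this page, and `available_components` is a hypothetical helper name.

```python
import json

# Top-level counts copied from the metadata blocks on this page.
TEST_A = json.loads('{"docs": {"count": 131924}, "queries": {"count": 638}}')
DEV = json.loads(
    '{"docs": {"count": 131924}, "queries": {"count": 482}, "qrels": {"count": 5092}}'
)


def available_components(metadata):
    """List which of docs/queries/qrels appear in a metadata dictionary."""
    return [name for name in ("docs", "queries", "qrels") if name in metadata]


print(available_components(TEST_A))
print(available_components(DEV))
```

A check like this can guard evaluation code against being pointed at a split that has no relevance judgments.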
The held-out test set (version b) for Swahili.
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/sw/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/sw/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.sw.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/sw
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/sw/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/sw/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.sw.test-b')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 131924,
"fields": {
"doc_id": {
"max_len": 9,
"common_prefix": ""
}
}
},
"queries": {
"count": 465
}
}
The train set for Swahili.
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/sw/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/sw/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.sw.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/sw
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/sw/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/sw/train docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.sw.train')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 6.7K | 71.3% |
| 1 | Relevant | 2.7K | 28.7% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/sw/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/sw/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.sw.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 131924,
"fields": {
"doc_id": {
"max_len": 9,
"common_prefix": ""
}
}
},
"queries": {
"count": 1901
},
"qrels": {
"count": 9359,
"fields": {
"relevance": {
"counts_by_value": {
"1": 2687,
"0": 6672
}
}
}
}
}
The Telugu corpus.
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/te")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/te docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.te')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 518079,
"fields": {
"doc_id": {
"max_len": 10,
"common_prefix": ""
}
}
}
}
The dev set for Telugu.
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/te/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/te/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.te.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/te
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/te/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/te/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.te.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 752 | 46.8% |
| 1 | Relevant | 854 | 53.2% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/te/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/te/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.te.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 518079,
"fields": {
"doc_id": {
"max_len": 10,
"common_prefix": ""
}
}
},
"queries": {
"count": 828
},
"qrels": {
"count": 1606,
"fields": {
"relevance": {
"counts_by_value": {
"1": 854,
"0": 752
}
}
}
}
}
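The `qrels --format tsv` export shown above writes one judgment per line with the columns `[query_id] [doc_id] [relevance] [iteration]`. A minimal sketch of reading such output back in Python follows. It assumes tab-separated values with no header row, which is worth confirming against actual CLI output; the sample lines are made up, and `read_qrels_tsv` is a hypothetical helper name.

```python
import csv
import io

# Column order as documented for the qrels export above.
FIELDS = ["query_id", "doc_id", "relevance", "iteration"]


def read_qrels_tsv(handle):
    """Parse tab-separated qrels lines into dictionaries with an int relevance."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        record = dict(zip(FIELDS, row))
        record["relevance"] = int(record["relevance"])
        rows.append(record)
    return rows


# Made-up sample lines standing in for real export output.
sample = io.StringIO("q1\t10#0\t1\tQ0\nq1\t10#1\t0\tQ0\n")
rows = read_qrels_tsv(sample)
print(rows[0])
```

For a file on disk, an open file handle can be passed in place of the `io.StringIO` object.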
The held-out test set (version a) for Telugu.
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/te/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/te/test-a queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.te.test-a.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/te
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/te/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/te/test-a docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.te.test-a')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 518079,
"fields": {
"doc_id": {
"max_len": 10,
"common_prefix": ""
}
}
},
"queries": {
"count": 594
}
}
The held-out test set (version b) for Telugu.
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/te/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/te/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.te.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/te
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/te/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/te/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.te.test-b')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 518079,
"fields": {
"doc_id": {
"max_len": 10,
"common_prefix": ""
}
}
},
"queries": {
"count": 793
}
}
The train set for Telugu.
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/te/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/te/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.te.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/te
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/te/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/te/train docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.te.train')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 14K | 77.9% |
| 1 | Relevant | 4.1K | 22.1% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/te/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/te/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.te.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 518079,
"fields": {
"doc_id": {
"max_len": 10,
"common_prefix": ""
}
}
},
"queries": {
"count": 3452
},
"qrels": {
"count": 18608,
"fields": {
"relevance": {
"counts_by_value": {
"1": 4119,
"0": 14489
}
}
}
}
}
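Because the train qrels contain explicit non-relevant judgments (relevance 0) alongside relevant ones (relevance 1), the two can be paired per query, for example as (query, positive, negative) training triples. The sketch below is one possible way to do this and is not something MIRACL or ir_datasets prescribes. The `Qrel` tuple mirrors the documented field names, the sample values are made up, and `make_triples` is a hypothetical helper name.

```python
from collections import defaultdict, namedtuple
from itertools import product

# Stand-in for the records yielded by dataset.qrels_iter() (same field names).
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def make_triples(qrels):
    """Pair every relevant doc with every judged non-relevant doc of the same query."""
    positives = defaultdict(list)
    negatives = defaultdict(list)
    for qrel in qrels:
        bucket = positives if qrel.relevance > 0 else negatives
        bucket[qrel.query_id].append(qrel.doc_id)
    triples = []
    for query_id, pos_ids in positives.items():
        # Queries without any judged negative contribute no triples.
        for pos, neg in product(pos_ids, negatives.get(query_id, [])):
            triples.append((query_id, pos, neg))
    return triples


# Made-up sample judgments, for illustration only.
sample_qrels = [
    Qrel("q1", "3#0", 1, "Q0"),
    Qrel("q1", "3#1", 0, "Q0"),
    Qrel("q1", "8#2", 0, "Q0"),
    Qrel("q2", "5#0", 1, "Q0"),
]
triples = make_triples(sample_qrels)
print(triples)
```

The exhaustive pairing grows with the product of positives and negatives per query, so sampling a subset may be preferable on larger splits.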
The Thai corpus.
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/th")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/th docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.th')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 542166,
"fields": {
"doc_id": {
"max_len": 10,
"common_prefix": ""
}
}
}
}
The dev set for Thai.
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/th/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/th/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.th.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/th
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/th/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/th/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.th.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 6.2K | 82.3% |
| 1 | Relevant | 1.3K | 17.7% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/th/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/th/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.th.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 542166,
"fields": {
"doc_id": {
"max_len": 10,
"common_prefix": ""
}
}
},
"queries": {
"count": 733
},
"qrels": {
"count": 7573,
"fields": {
"relevance": {
"counts_by_value": {
"1": 1343,
"0": 6230
}
}
}
}
}
The held-out test set (version a) for Thai.
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/th/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/th/test-a queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.th.test-a.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/th
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/th/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/th/test-a docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.th.test-a')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 542166,
"fields": {
"doc_id": {
"max_len": 10,
"common_prefix": ""
}
}
},
"queries": {
"count": 992
}
}
The held-out test set (version b) for Thai.
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/th/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/th/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.th.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/th
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/th/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/th/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.th.test-b')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 542166,
"fields": {
"doc_id": {
"max_len": 10,
"common_prefix": ""
}
}
},
"queries": {
"count": 650
}
}
The train set for Thai.
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/th/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/th/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.th.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/th
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/th/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/th/train docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.th.train')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 17K | 77.6% |
| 1 | Relevant | 4.8K | 22.4% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/th/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/th/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
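The exported qrels can be read back with a few lines of Python. The sketch below assumes the TSV columns appear in the order shown above (`query_id`, `doc_id`, `relevance`, `iteration`); `parse_qrels_tsv` is a hypothetical helper and the sample rows are made up.

```python
import io
from collections import namedtuple

Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def parse_qrels_tsv(stream):
    """Yield Qrel records from tab-separated lines (hypothetical helper)."""
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        query_id, doc_id, relevance, iteration = line.split("\t")
        yield Qrel(query_id, doc_id, int(relevance), iteration)


# Made-up rows standing in for the output of the export command.
sample = "q1\td1\t1\t0\nq1\td2\t0\t0\n"
qrels = list(parse_qrels_tsv(io.StringIO(sample)))
print(qrels[0])
```

In practice the stream would be an open file holding the redirected output of the export command.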
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.th.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 542166,
"fields": {
"doc_id": {
"max_len": 10,
"common_prefix": ""
}
}
},
"queries": {
"count": 2972
},
"qrels": {
"count": 21293,
"fields": {
"relevance": {
"counts_by_value": {
"1": 4778,
"0": 16515
}
}
}
}
}
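The percentages in the relevance table can be recomputed from the `counts_by_value` entry of the metadata. A minimal sketch, with the counts copied from the metadata above; `relevance_shares` is a hypothetical helper, not part of ir_datasets.

```python
import json

# Only the qrels part of the metadata above is needed here.
metadata = json.loads("""
{
  "qrels": {
    "count": 21293,
    "fields": {"relevance": {"counts_by_value": {"1": 4778, "0": 16515}}}
  }
}
""")


def relevance_shares(meta):
    """Return {relevance_label: percentage}, rounded to one decimal place."""
    total = meta["qrels"]["count"]
    counts = meta["qrels"]["fields"]["relevance"]["counts_by_value"]
    return {label: round(100 * n / total, 1) for label, n in sorted(counts.items())}


shares = relevance_shares(metadata)
print(shares)  # {'0': 77.6, '1': 22.4}
```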
The Yoruba corpus.
Language: yo
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/yo")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/yo docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.yo')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 49043,
"fields": {
"doc_id": {
"max_len": 9,
"common_prefix": ""
}
}
}
}
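The `doc_id` fields in the metadata can be reproduced for any sample of documents. The sketch below assumes that `max_len` is the length of the longest `doc_id` and that `common_prefix` is the longest prefix shared by all of them; that reading of the field names is an assumption, and `doc_id_stats` is a hypothetical helper run on made-up records.

```python
import os
from collections import namedtuple

# Stand-in for the namedtuple<doc_id, title, text> records from docs_iter().
Doc = namedtuple("Doc", ["doc_id", "title", "text"])


def doc_id_stats(docs):
    """Compute count, max_len and common_prefix over the doc_id values."""
    ids = [d.doc_id for d in docs]
    return {
        "count": len(ids),
        "max_len": max((len(i) for i in ids), default=0),
        # os.path.commonprefix compares character by character,
        # so it works on arbitrary strings, not only on paths.
        "common_prefix": os.path.commonprefix(ids),
    }


# Made-up documents, not taken from the corpus.
sample = [Doc("a1", "T1", "x"), Doc("a22", "T2", "y"), Doc("b3", "T3", "z")]
stats = doc_id_stats(sample)
print(stats)
```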
The dev set for Yoruba.
Language: yo
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/yo/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/yo/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.yo.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/yo
Language: yo
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/yo/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/yo/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.yo.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 1.0K | 87.9% |
| 1 | Relevant | 144 | 12.1% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/yo/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/yo/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.yo.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
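For evaluation it is often convenient to group the flat qrels stream by query. A minimal sketch, assuming records shaped like the `namedtuple<query_id, doc_id, relevance, iteration>` shown above; `group_qrels` is a hypothetical helper and the sample records are made up.

```python
from collections import defaultdict, namedtuple

Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def group_qrels(qrels):
    """Return {query_id: {doc_id: relevance}} for any iterable of qrel records."""
    grouped = defaultdict(dict)
    for q in qrels:
        grouped[q.query_id][q.doc_id] = q.relevance
    return dict(grouped)


# Made-up judgments. With ir_datasets installed, you would pass
# ir_datasets.load("miracl/yo/dev").qrels_iter() instead.
sample = [
    Qrel("q1", "d1", 1, "0"),
    Qrel("q1", "d2", 0, "0"),
    Qrel("q2", "d3", 1, "0"),
]
grouped = group_qrels(sample)

# Number of relevant documents per query.
relevant = {qid: sum(1 for r in docs.values() if r > 0) for qid, docs in grouped.items()}
print(relevant)
```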
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 49043,
"fields": {
"doc_id": {
"max_len": 9,
"common_prefix": ""
}
}
},
"queries": {
"count": 119
},
"qrels": {
"count": 1188,
"fields": {
"relevance": {
"counts_by_value": {
"1": 144,
"0": 1044
}
}
}
}
}
The held-out test set (version b) for Yoruba.
Language: yo
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/yo/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/yo/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.yo.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/yo
Language: yo
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/yo/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/yo/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.yo.test-b')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 49043,
"fields": {
"doc_id": {
"max_len": 9,
"common_prefix": ""
}
}
},
"queries": {
"count": 288
}
}
The Chinese corpus.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/zh")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/zh docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.zh')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 4934368,
"fields": {
"doc_id": {
"max_len": 12,
"common_prefix": ""
}
}
}
}
The dev set for Chinese.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/zh/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/zh/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.zh.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/zh
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/zh/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/zh/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.zh.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 2.9K | 74.7% |
| 1 | Relevant | 994 | 25.3% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/zh/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/zh/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.zh.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 4934368,
"fields": {
"doc_id": {
"max_len": 12,
"common_prefix": ""
}
}
},
"queries": {
"count": 393
},
"qrels": {
"count": 3928,
"fields": {
"relevance": {
"counts_by_value": {
"1": 994,
"0": 2934
}
}
}
}
}
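A quick consistency check on the metadata above confirms that the per-label counts add up to the qrels count, and gives the average number of judgments per query. This is only a sketch; `check_metadata` is a hypothetical helper and the numbers are copied from the metadata block.

```python
# Counts copied from the miracl/zh/dev metadata above.
queries_count = 393
qrels_count = 3928
counts_by_value = {"1": 994, "0": 2934}


def check_metadata(n_queries, n_qrels, by_value):
    """Return the average judgments per query after checking the counts add up."""
    if sum(by_value.values()) != n_qrels:
        raise ValueError("per-label counts do not sum to the qrels count")
    return n_qrels / n_queries


avg = check_metadata(queries_count, qrels_count, counts_by_value)
print(f"{avg:.2f} judgments per query on average")
```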
The held-out test set (version b) for Chinese.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/zh/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/zh/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.zh.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/zh
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/zh/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/zh/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.zh.test-b')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 4934368,
"fields": {
"doc_id": {
"max_len": 12,
"common_prefix": ""
}
}
},
"queries": {
"count": 920
}
}
The train set for Chinese.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/zh/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/zh/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.zh.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/zh
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/zh/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/zh/train docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.zh.train')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 0 | Not Relevant | 9.9K | 75.7% |
| 1 | Relevant | 3.2K | 24.3% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/zh/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/zh/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.zh.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
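Because the train qrels contain both relevant (1) and non-relevant (0) judgments, they can be turned into training triples. The sketch below shows one possible pairing scheme, not something prescribed by MIRACL or ir_datasets: every relevant document of a query is paired with every judged non-relevant document of the same query. `make_triples` is a hypothetical helper and the sample judgments are made up.

```python
from collections import defaultdict, namedtuple
from itertools import product

Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def make_triples(qrels):
    """Yield (query_id, positive_doc_id, negative_doc_id) from judged qrels."""
    positives = defaultdict(list)
    negatives = defaultdict(list)
    for q in qrels:
        target = positives if q.relevance > 0 else negatives
        target[q.query_id].append(q.doc_id)
    for query_id, pos_docs in positives.items():
        # Queries without any judged negative produce no triples.
        for pos, neg in product(pos_docs, negatives.get(query_id, [])):
            yield (query_id, pos, neg)


# Made-up judgments. With ir_datasets installed, you would pass
# ir_datasets.load("miracl/zh/train").qrels_iter() instead.
sample = [
    Qrel("q1", "d1", 1, "0"),
    Qrel("q1", "d2", 0, "0"),
    Qrel("q1", "d3", 0, "0"),
    Qrel("q2", "d4", 1, "0"),
]
triples = list(make_triples(sample))
print(triples)
```

The full cross product can grow quickly for queries with many judgments, so sampling a fixed number of negatives per positive may be preferable in practice.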
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{
"docs": {
"count": 4934368,
"fields": {
"doc_id": {
"max_len": 12,
"common_prefix": ""
}
}
},
"queries": {
"count": 1312
},
"qrels": {
"count": 13113,
"fields": {
"relevance": {
"counts_by_value": {
"1": 3187,
"0": 9926
}
}
}
}
}