ir_datasets: MIRACL
MIRACL is a multilingual ad hoc retrieval dataset covering 18 languages. The document corpora are based on Wikipedia dumps, which are split into passages.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
The Arabic corpus.
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ar")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ar docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
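The exported rows can be read back with Python's standard `csv` module. The sketch below assumes the export is tab-separated with the fields in the order shown above (`doc_id`, `title`, `text`); `ExportedDoc` and `read_exported_docs` are hypothetical names, and the two sample rows are made-up placeholders rather than real MIRACL passages.

```python
import csv
import io
from typing import Iterator, NamedTuple


class ExportedDoc(NamedTuple):
    # Hypothetical record type mirroring the exported columns.
    doc_id: str
    title: str
    text: str


def read_exported_docs(handle) -> Iterator[ExportedDoc]:
    """Yield one ExportedDoc per tab-separated line."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 3:
            continue  # skip malformed lines rather than guessing
        yield ExportedDoc(*row)


# Made-up sample standing in for the saved output of the export command.
sample = io.StringIO("d1\tTitle one\tFirst passage.\nd2\tTitle two\tSecond passage.\n")
docs = list(read_exported_docs(sample))
print(docs[0].doc_id, docs[1].title)
```

With a real export, an open file handle can be passed in place of the `io.StringIO` sample.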
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ar')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{ "docs": { "count": 2061414, "fields": { "doc_id": { "max_len": 11, "common_prefix": "" } } } }
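The JSON object above records corpus-level metadata. A minimal sketch of reading it with the standard `json` module, using the Arabic corpus values; the variable names are illustrative only.

```python
import json

# Metadata copied from the Arabic corpus entry above.
raw = '{ "docs": { "count": 2061414, "fields": { "doc_id": { "max_len": 11, "common_prefix": "" } } } }'

metadata = json.loads(raw)
doc_count = metadata["docs"]["count"]
max_id_len = metadata["docs"]["fields"]["doc_id"]["max_len"]

print(f"{doc_count:,} passages; doc_id is at most {max_id_len} characters")
```

Values such as `max_len` can be handy, for instance, when sizing a fixed-width ID column in a downstream store.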
The dev set for Arabic.
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ar/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ar/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ar.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ar
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ar/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ar/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ar.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
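Because a split such as `miracl/ar/dev` inherits its documents from `miracl/ar`, the corpus ID can be derived from the split ID. The helper names below are hypothetical, and the datamaestro mapping only mirrors the naming pattern visible in the examples on this page; it is a sketch, not an official API.

```python
def corpus_id(dataset_id: str) -> str:
    """Return the language-level corpus ID for a MIRACL dataset ID."""
    parts = dataset_id.split("/")
    if len(parts) < 2 or parts[0] != "miracl":
        raise ValueError(f"not a MIRACL dataset ID: {dataset_id!r}")
    return "/".join(parts[:2])


def datamaestro_id(dataset_id: str, component: str = "") -> str:
    """Build an 'irds.'-prefixed ID following the pattern used in the examples."""
    base = "irds." + dataset_id.replace("/", ".")
    return f"{base}.{component}" if component else base


print(corpus_id("miracl/ar/dev"))
print(datamaestro_id("miracl/ar/dev", "queries"))
```

One practical consequence is that a corpus only needs to be indexed once per language and can then be reused across that language's splits.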
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
0 | Not Relevant | 24K | 80.6% |
1 | Relevant | 5.7K | 19.4% |
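The percentages in the table follow directly from the raw judgment counts (23,539 non-relevant and 5,658 relevant for this split, per its metadata). A small sketch of the arithmetic; `relevance_shares` is an illustrative helper, not part of ir_datasets.

```python
def relevance_shares(counts_by_value: dict) -> dict:
    """Map each relevance level to its percentage of all judgments."""
    total = sum(counts_by_value.values())
    return {
        level: round(100 * count / total, 1)
        for level, count in sorted(counts_by_value.items())
    }


# counts_by_value for miracl/ar/dev, taken from its metadata.
shares = relevance_shares({0: 23539, 1: 5658})
print(shares)
```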
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ar/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/ar/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.ar.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
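For evaluation it is often convenient to group judgments by query. The sketch below does this for any iterable of records with the `query_id`, `doc_id`, and `relevance` fields shown above; the `Qrel` tuple and the sample records are stand-ins so that the snippet runs without downloading the dataset.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Qrel(NamedTuple):
    # Stand-in with the same field names as the qrels namedtuple above.
    query_id: str
    doc_id: str
    relevance: int
    iteration: str


def group_by_query(qrels: Iterable) -> Dict[str, Dict[str, int]]:
    """Return {query_id: {doc_id: relevance}}."""
    grouped: Dict[str, Dict[str, int]] = defaultdict(dict)
    for qrel in qrels:
        grouped[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(grouped)


# Made-up records; with ir_datasets installed, pass dataset.qrels_iter() instead.
sample = [
    Qrel("q1", "d1", 1, "Q0"),
    Qrel("q1", "d2", 0, "Q0"),
    Qrel("q2", "d3", 1, "Q0"),
]
judgments = group_by_query(sample)
relevant_q1 = [d for d, rel in judgments["q1"].items() if rel > 0]
print(relevant_q1)
```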
{ "docs": { "count": 2061414, "fields": { "doc_id": { "max_len": 11, "common_prefix": "" } } }, "queries": { "count": 2896 }, "qrels": { "count": 29197, "fields": { "relevance": { "counts_by_value": { "1": 5658, "0": 23539 } } } } }
The held-out test set (version a) for Arabic.
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ar/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ar/test-a queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ar.test-a.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ar
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ar/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ar/test-a docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ar.test-a')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{ "docs": { "count": 2061414, "fields": { "doc_id": { "max_len": 11, "common_prefix": "" } } }, "queries": { "count": 936 } }
The held-out test set (version b) for Arabic.
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ar/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ar/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ar.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ar
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ar/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ar/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ar.test-b')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{ "docs": { "count": 2061414, "fields": { "doc_id": { "max_len": 11, "common_prefix": "" } } }, "queries": { "count": 1405 } }
The train set for Arabic.
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ar/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ar/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ar.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ar
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ar/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ar/train docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ar.train')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
0 | Not Relevant | 19K | 75.5% |
1 | Relevant | 6.2K | 24.5% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ar/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/ar/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.ar.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{ "docs": { "count": 2061414, "fields": { "doc_id": { "max_len": 11, "common_prefix": "" } } }, "queries": { "count": 3495 }, "qrels": { "count": 25382, "fields": { "relevance": { "counts_by_value": { "1": 6217, "0": 19165 } } } } }
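The metadata also makes it easy to compare annotation density across splits. As a worked example, the snippet below divides the qrel count by the query count listed for the Arabic dev and train sets; it is plain arithmetic on the numbers above and assumes nothing beyond them.

```python
# (queries, qrels) counts copied from the metadata of each Arabic split.
splits = {
    "miracl/ar/dev": (2896, 29197),
    "miracl/ar/train": (3495, 25382),
}

judged_per_query = {
    name: round(qrels / queries, 2) for name, (queries, qrels) in splits.items()
}

for name, value in judged_per_query.items():
    print(f"{name}: {value} judged passages per query on average")
```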
The Bengali corpus.
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/bn")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/bn docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.bn')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{ "docs": { "count": 297265, "fields": { "doc_id": { "max_len": 10, "common_prefix": "" } } } }
The dev set for Bengali.
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/bn/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/bn/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.bn.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/bn
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/bn/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/bn/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.bn.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
0 | Not Relevant | 3.3K | 79.5% |
1 | Relevant | 863 | 20.5% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/bn/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/bn/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.bn.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{ "docs": { "count": 297265, "fields": { "doc_id": { "max_len": 10, "common_prefix": "" } } }, "queries": { "count": 411 }, "qrels": { "count": 4206, "fields": { "relevance": { "counts_by_value": { "1": 863, "0": 3343 } } } } }
The held-out test set (version a) for Bengali.
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/bn/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/bn/test-a queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.bn.test-a.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/bn
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/bn/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/bn/test-a docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.bn.test-a')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{ "docs": { "count": 297265, "fields": { "doc_id": { "max_len": 10, "common_prefix": "" } } }, "queries": { "count": 102 } }
The held-out test set (version b) for Bengali.
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/bn/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/bn/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.bn.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/bn
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/bn/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/bn/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.bn.test-b')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{ "docs": { "count": 297265, "fields": { "doc_id": { "max_len": 10, "common_prefix": "" } } }, "queries": { "count": 1130 } }
The train set for Bengali.
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/bn/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/bn/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.bn.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/bn
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/bn/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/bn/train docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.bn.train')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
0 | Not Relevant | 13K | 77.0% |
1 | Relevant | 3.9K | 23.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/bn/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/bn/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.bn.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{ "docs": { "count": 297265, "fields": { "doc_id": { "max_len": 10, "common_prefix": "" } } }, "queries": { "count": 1631 }, "qrels": { "count": 16754, "fields": { "relevance": { "counts_by_value": { "1": 3859, "0": 12895 } } } } }
The German corpus.
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/de")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/de docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.de')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{ "docs": { "count": 15866222, "fields": { "doc_id": { "max_len": 12, "common_prefix": "" } } } }
The dev set for German.
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/de/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/de/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.de.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/de
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/de/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/de/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.de.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
0 | Not Relevant | 2.3K | 74.2% |
1 | Relevant | 811 | 25.8% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/de/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/de/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.de.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{ "docs": { "count": 15866222, "fields": { "doc_id": { "max_len": 12, "common_prefix": "" } } }, "queries": { "count": 305 }, "qrels": { "count": 3144, "fields": { "relevance": { "counts_by_value": { "1": 811, "0": 2333 } } } } }
The held-out test set (version b) for German.
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/de/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/de/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.de.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/de
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/de/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/de/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.de.test-b')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{ "docs": { "count": 15866222, "fields": { "doc_id": { "max_len": 12, "common_prefix": "" } } }, "queries": { "count": 712 } }
The English corpus.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/en")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/en docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en')
# Index miracl/en
indexer = pt.IterDictIndexer('./indices/miracl_en')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.en')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{ "docs": { "count": 32893221, "fields": { "doc_id": { "max_len": 13, "common_prefix": "" } } } }
The dev set for English.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/en/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/en/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en/dev')
index_ref = pt.IndexRef.of('./indices/miracl_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.en.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/en
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/en/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/en/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en/dev')
# Index miracl/en
indexer = pt.IterDictIndexer('./indices/miracl_en')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.en.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
0 | Not Relevant | 6.0K | 72.1% |
1 | Relevant | 2.3K | 27.9% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/en/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/en/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:miracl/en/dev')
index_ref = pt.IndexRef.of('./indices/miracl_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.en.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # relevance assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
Metadata:
{ "docs": { "count": 32893221, "fields": { "doc_id": { "max_len": 13, "common_prefix": "" } } }, "queries": { "count": 799 }, "qrels": { "count": 8350, "fields": { "relevance": { "counts_by_value": { "1": 2326, "0": 6024 } } } } }
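The metadata JSON for miracl/en/dev carries the same numbers as the relevance table, so the percentages can be derived from it directly. This is a sketch under the assumption that the JSON is available as a string; `relevance_shares` is a hypothetical helper name, not an ir_datasets function.

```python
import json

# Metadata copied from the miracl/en/dev entry.
metadata = json.loads("""
{ "docs": { "count": 32893221, "fields": { "doc_id": { "max_len": 13, "common_prefix": "" } } },
  "queries": { "count": 799 },
  "qrels": { "count": 8350, "fields": { "relevance": { "counts_by_value": { "1": 2326, "0": 6024 } } } } }
""")

def relevance_shares(meta):
    """Map each relevance label to its share of all qrels, in percent."""
    qrels = meta["qrels"]
    by_value = qrels["fields"]["relevance"]["counts_by_value"]
    return {label: round(100.0 * n / qrels["count"], 1) for label, n in by_value.items()}

shares = relevance_shares(metadata)
print(shares)  # {'1': 27.9, '0': 72.1}

# Average number of judgments per query in this split.
print(round(metadata["qrels"]["count"] / metadata["queries"]["count"], 1))
```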
The held-out test set (version a) for English.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/en/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/en/test-a queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en/test-a')
index_ref = pt.IndexRef.of('./indices/miracl_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.en.test-a.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/en
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/en/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/en/test-a docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en/test-a')
# Index miracl/en
indexer = pt.IterDictIndexer('./indices/miracl_en')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.en.test-a')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
Metadata:
{ "docs": { "count": 32893221, "fields": { "doc_id": { "max_len": 13, "common_prefix": "" } } }, "queries": { "count": 734 } }
The held-out test set (version b) for English.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/en/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/en/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en/test-b')
index_ref = pt.IndexRef.of('./indices/miracl_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.en.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/en
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/en/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/en/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en/test-b')
# Index miracl/en
indexer = pt.IterDictIndexer('./indices/miracl_en')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.en.test-b')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
Metadata:
{ "docs": { "count": 32893221, "fields": { "doc_id": { "max_len": 13, "common_prefix": "" } } }, "queries": { "count": 1790 } }
The train set for English.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/en/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/en/train queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en/train')
index_ref = pt.IndexRef.of('./indices/miracl_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.en.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/en
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/en/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/en/train docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:miracl/en/train')
# Index miracl/en
indexer = pt.IterDictIndexer('./indices/miracl_en')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.en.train')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
0 | Not Relevant | 22K | 73.1% |
1 | Relevant | 7.9K | 26.9% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/en/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/en/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
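Once the qrels have been exported as TSV, they can be read back with the standard library. The sketch below assumes one judgment per line, no header row, and the column order shown in the export example (query_id, doc_id, relevance, iteration); `read_qrels_tsv` is a hypothetical helper, and the rows are invented stand-ins for a real export file.

```python
import csv
import io
from collections import namedtuple

Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])

def read_qrels_tsv(handle):
    """Parse tab-separated rows into Qrel tuples, casting relevance to int."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for query_id, doc_id, relevance, iteration in reader:
        yield Qrel(query_id, doc_id, int(relevance), iteration)

# Invented rows standing in for a file written by the export command.
exported = io.StringIO("q1\tdoc-a\t1\t0\nq1\tdoc-b\t0\t0\n")
rows = list(read_qrels_tsv(exported))
print(rows[0].relevance + rows[1].relevance)  # 1
```

For a real file, `open(path, newline="", encoding="utf-8")` could replace the `io.StringIO` object.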
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:miracl/en/train')
index_ref = pt.IndexRef.of('./indices/miracl_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.en.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # relevance assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
Metadata:
{ "docs": { "count": 32893221, "fields": { "doc_id": { "max_len": 13, "common_prefix": "" } } }, "queries": { "count": 2863 }, "qrels": { "count": 29416, "fields": { "relevance": { "counts_by_value": { "1": 7899, "0": 21517 } } } } }
The Spanish corpus.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/es")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/es docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
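Because the corpus consists of Wikipedia articles split into passages, it can be useful to regroup passages by article. The sketch below assumes that each doc_id has the form `<article id>#<passage index>`; that format is an assumption to verify against the actual data, and `group_passages` and the sample documents are illustrative only.

```python
from collections import defaultdict, namedtuple

# Stand-in for the document namedtuple (illustrative only).
Doc = namedtuple("Doc", ["doc_id", "title", "text"])

def group_passages(docs, sep="#"):
    """Group passage texts by the part of doc_id before `sep` (assumed article id)."""
    articles = defaultdict(list)
    for doc in docs:
        article_id, _, _passage_index = doc.doc_id.partition(sep)
        articles[article_id].append(doc.text)
    return dict(articles)

# Invented passages; real IDs and texts come from the corpus.
sample = [
    Doc("10#0", "Madrid", "Primer pasaje."),
    Doc("10#1", "Madrid", "Segundo pasaje."),
    Doc("22#0", "Sevilla", "Otro pasaje."),
]
print(group_passages(sample))
```

With more than ten million Spanish passages, grouping the full corpus in memory may be impractical; the helper is meant for samples or filtered subsets.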
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.es')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
Metadata:
{ "docs": { "count": 10373953, "fields": { "doc_id": { "max_len": 12, "common_prefix": "" } } } }
The dev set for Spanish.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/es/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/es/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.es.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/es
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/es/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/es/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.es.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
0 | Not Relevant | 3.5K | 53.6% |
1 | Relevant | 3.0K | 46.4% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/es/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/es/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.es.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # relevance assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
Metadata:
{ "docs": { "count": 10373953, "fields": { "doc_id": { "max_len": 12, "common_prefix": "" } } }, "queries": { "count": 648 }, "qrels": { "count": 6443, "fields": { "relevance": { "counts_by_value": { "1": 2987, "0": 3456 } } } } }
The held-out test set (version b) for Spanish.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/es/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/es/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.es.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/es
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/es/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/es/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.es.test-b')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
Metadata:
{ "docs": { "count": 10373953, "fields": { "doc_id": { "max_len": 12, "common_prefix": "" } } }, "queries": { "count": 1515 } }
The train set for Spanish.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/es/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/es/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.es.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/es
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/es/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/es/train docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.es.train')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
0 | Not Relevant | 12K | 53.4% |
1 | Relevant | 10K | 46.6% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/es/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/es/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
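Since the train qrels contain both relevant (1) and not relevant (0) judgments, a common first step is to split them into positives and negatives per query. The following is a minimal sketch of that grouping; `split_judgments` is a hypothetical helper and the sample judgments are invented.

```python
from collections import defaultdict, namedtuple

# Stand-in for the qrel namedtuple (illustrative only).
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])

def split_judgments(qrels):
    """Return {query_id: {"pos": [doc_ids], "neg": [doc_ids]}} from binary qrels."""
    by_query = defaultdict(lambda: {"pos": [], "neg": []})
    for qrel in qrels:
        bucket = "pos" if qrel.relevance > 0 else "neg"
        by_query[qrel.query_id][bucket].append(qrel.doc_id)
    return dict(by_query)

sample = [
    Qrel("q1", "doc-a", 1, "0"),
    Qrel("q1", "doc-b", 0, "0"),
    Qrel("q2", "doc-c", 1, "0"),
]
judgments = split_judgments(sample)
print(judgments["q1"])  # {'pos': ['doc-a'], 'neg': ['doc-b']}
```

The judged negatives here are documents an assessor marked as not relevant, which is different from unjudged documents sampled at random.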
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.es.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # relevance assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
Metadata:
{ "docs": { "count": 10373953, "fields": { "doc_id": { "max_len": 12, "common_prefix": "" } } }, "queries": { "count": 2162 }, "qrels": { "count": 21531, "fields": { "relevance": { "counts_by_value": { "1": 10025, "0": 11506 } } } } }
The Persian corpus.
Language: fa
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fa")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fa docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fa')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
Metadata:
{ "docs": { "count": 2207172, "fields": { "doc_id": { "max_len": 11, "common_prefix": "" } } } }
The dev set for Persian.
Language: fa
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fa/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/fa/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fa.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/fa
Language: fa
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fa/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fa/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fa.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
0 | Not Relevant | 5.3K | 80.0% |
1 | Relevant | 1.3K | 20.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fa/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/fa/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
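Qrels only carry document IDs, so inspecting judged passages means joining them with the documents. The sketch below shows that join against a tiny in-memory store; `DictDocStore` and `relevant_titles` are hypothetical names written for this example. With ir_datasets installed, `dataset.docs_store()` should offer a comparable `get(doc_id)` lookup, but that substitution is left as an assumption to check against the Python API documentation.

```python
from collections import namedtuple

# Stand-ins for the document and qrel namedtuples (illustrative only).
Doc = namedtuple("Doc", ["doc_id", "title", "text"])
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])

class DictDocStore:
    """Minimal in-memory document lookup exposing get(doc_id)."""

    def __init__(self, docs):
        self._docs = {doc.doc_id: doc for doc in docs}

    def get(self, doc_id):
        return self._docs[doc_id]

def relevant_titles(qrels, store):
    """Return (query_id, title) pairs for every qrel judged relevant."""
    return [(q.query_id, store.get(q.doc_id).title) for q in qrels if q.relevance > 0]

# Invented documents and judgments.
store = DictDocStore([Doc("doc-a", "Title A", "..."), Doc("doc-b", "Title B", "...")])
qrels = [Qrel("q1", "doc-a", 1, "0"), Qrel("q1", "doc-b", 0, "0")]
print(relevant_titles(qrels, store))  # [('q1', 'Title A')]
```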
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.fa.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # relevance assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
Metadata:
{ "docs": { "count": 2207172, "fields": { "doc_id": { "max_len": 11, "common_prefix": "" } } }, "queries": { "count": 632 }, "qrels": { "count": 6571, "fields": { "relevance": { "counts_by_value": { "1": 1314, "0": 5257 } } } } }
The held-out test set (version b) for Persian.
Language: fa
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fa/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/fa/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fa.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/fa
Language: fa
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fa/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fa/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fa.test-b')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
Metadata:
{ "docs": { "count": 2207172, "fields": { "doc_id": { "max_len": 11, "common_prefix": "" } } }, "queries": { "count": 1476 } }
The train set for Persian.
Language: fa
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fa/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/fa/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fa.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/fa
Language: fa
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fa/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fa/train docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fa.train')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
0 | Not Relevant | 18K | 80.4% |
1 | Relevant | 4.3K | 19.6% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fa/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/fa/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.fa.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # relevance assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
Metadata:
{ "docs": { "count": 2207172, "fields": { "doc_id": { "max_len": 11, "common_prefix": "" } } }, "queries": { "count": 2107 }, "qrels": { "count": 21844, "fields": { "relevance": { "counts_by_value": { "1": 4277, "0": 17567 } } } } }
The Finnish corpus.
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fi")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fi docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fi')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
Metadata:
{ "docs": { "count": 1883509, "fields": { "doc_id": { "max_len": 11, "common_prefix": "" } } } }
The dev set for Finnish.
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fi/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/fi/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fi.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/fi
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fi/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fi/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fi.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
0 | Not Relevant | 9.6K | 79.6% |
1 | Relevant | 2.4K | 20.4% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fi/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/fi/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.fi.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # relevance assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
Metadata:
{ "docs": { "count": 1883509, "fields": { "doc_id": { "max_len": 11, "common_prefix": "" } } }, "queries": { "count": 1271 }, "qrels": { "count": 12008, "fields": { "relevance": { "counts_by_value": { "1": 2447, "0": 9561 } } } } }
The held-out test set (version a) for Finnish.
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fi/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/fi/test-a queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fi.test-a.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/fi
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fi/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fi/test-a docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fi.test-a')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 1883509, "fields": { "doc_id": { "max_len": 11, "common_prefix": "" } } }, "queries": { "count": 1060 } }
The held-out test set (version b) for Finnish.
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fi/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/fi/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fi.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/fi
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fi/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fi/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fi.test-b')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 1883509, "fields": { "doc_id": { "max_len": 11, "common_prefix": "" } } }, "queries": { "count": 711 } }
The train set for Finnish.
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fi/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/fi/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fi.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/fi
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fi/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fi/train docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fi.train')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
0 | Not Relevant | 15K | 75.8% |
1 | Relevant | 4.9K | 24.2% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fi/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
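One plausible use of a train split is to pair each query with its relevant passages. This is a sketch under stated assumptions: `Query` and `Qrel` are stand-ins that mirror the namedtuple fields documented above, and `positive_pairs`, its `min_relevance` parameter, and the demo records are hypothetical.

```python
from collections import namedtuple

# Stand-ins mirroring the documented namedtuple fields.
Query = namedtuple("Query", ["query_id", "text"])
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def positive_pairs(queries, qrels, min_relevance=1):
    """Yield (query_text, doc_id) for each judgment at or above min_relevance."""
    text_by_id = {q.query_id: q.text for q in queries}
    for qrel in qrels:
        # Skip non-relevant judgments and judgments for unknown queries.
        if qrel.relevance >= min_relevance and qrel.query_id in text_by_id:
            yield text_by_id[qrel.query_id], qrel.doc_id


# Hypothetical records, not real MIRACL data.
demo_queries = [Query("q1", "example question")]
demo_qrels = [
    Qrel("q1", "d1", 1, "0"),
    Qrel("q1", "d2", 0, "0"),
    Qrel("q9", "d3", 1, "0"),
]
pairs = list(positive_pairs(demo_queries, demo_qrels))
print(pairs)
```

Because relevance here is binary (0 or 1), the default threshold of 1 keeps exactly the passages marked Relevant.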
ir_datasets export miracl/fi/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.fi.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 1883509, "fields": { "doc_id": { "max_len": 11, "common_prefix": "" } } }, "queries": { "count": 2897 }, "qrels": { "count": 20350, "fields": { "relevance": { "counts_by_value": { "1": 4928, "0": 15422 } } } } }
The French corpus.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fr")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
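The French corpus holds over 14 million passages, so it can help to inspect only the first few records instead of looping over everything. The sketch below shows one way to do that with `itertools.islice`; the `Doc` namedtuple, `head`, and `fake_docs` are stand-ins introduced for illustration so the snippet runs without the dataset.

```python
from collections import namedtuple
from itertools import islice

# Stand-in mirroring the documented fields (doc_id, title, text).
Doc = namedtuple("Doc", ["doc_id", "title", "text"])


def head(docs, n=3):
    """Return the first n records of an iterator, reading no more than n."""
    return list(islice(docs, n))


def fake_docs():
    """Hypothetical endless generator standing in for dataset.docs_iter()."""
    i = 0
    while True:
        yield Doc(f"doc{i}", f"Title {i}", f"Passage text {i}")
        i += 1


first = head(fake_docs(), n=2)
for doc in first:
    print(doc.doc_id, doc.title)
```

Applied to the real corpus, the call would be `head(dataset.docs_iter(), n=2)`.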
ir_datasets export miracl/fr docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fr')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 14636953, "fields": { "doc_id": { "max_len": 12, "common_prefix": "" } } } }
The dev set for French.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fr/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/fr/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fr.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/fr
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fr/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fr/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fr.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
0 | Not Relevant | 2.7K | 78.7% |
1 | Relevant | 731 | 21.3% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fr/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/fr/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.fr.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 14636953, "fields": { "doc_id": { "max_len": 12, "common_prefix": "" } } }, "queries": { "count": 343 }, "qrels": { "count": 3429, "fields": { "relevance": { "counts_by_value": { "1": 731, "0": 2698 } } } } }
The held-out test set (version b) for French.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fr/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/fr/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fr.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/fr
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fr/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fr/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fr.test-b')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 14636953, "fields": { "doc_id": { "max_len": 12, "common_prefix": "" } } }, "queries": { "count": 801 } }
The train set for French.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fr/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/fr/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.fr.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/fr
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fr/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/fr/train docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.fr.train')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
0 | Not Relevant | 9.1K | 79.7% |
1 | Relevant | 2.3K | 20.3% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/fr/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/fr/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.fr.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 14636953, "fields": { "doc_id": { "max_len": 12, "common_prefix": "" } } }, "queries": { "count": 1143 }, "qrels": { "count": 11426, "fields": { "relevance": { "counts_by_value": { "1": 2321, "0": 9105 } } } } }
The Hindi corpus.
Language: hi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/hi")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/hi docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.hi')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 506264, "fields": { "doc_id": { "max_len": 11, "common_prefix": "" } } } }
The dev set for Hindi.
Language: hi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/hi/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/hi/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.hi.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/hi
Language: hi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/hi/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/hi/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.hi.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
0 | Not Relevant | 2.7K | 78.5% |
1 | Relevant | 752 | 21.5% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/hi/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/hi/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.hi.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 506264, "fields": { "doc_id": { "max_len": 11, "common_prefix": "" } } }, "queries": { "count": 350 }, "qrels": { "count": 3494, "fields": { "relevance": { "counts_by_value": { "1": 752, "0": 2742 } } } } }
The held-out test set (version b) for Hindi.
Language: hi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/hi/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/hi/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.hi.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/hi
Language: hi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/hi/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/hi/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.hi.test-b')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 506264, "fields": { "doc_id": { "max_len": 11, "common_prefix": "" } } }, "queries": { "count": 819 } }
The train set for Hindi.
Language: hi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/hi/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/hi/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.hi.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/hi
Language: hi
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/hi/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/hi/train docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.hi.train')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
0 | Not Relevant | 9.2K | 78.8% |
1 | Relevant | 2.5K | 21.2% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/hi/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/hi/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.hi.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 506264, "fields": { "doc_id": { "max_len": 11, "common_prefix": "" } } }, "queries": { "count": 1169 }, "qrels": { "count": 11668, "fields": { "relevance": { "counts_by_value": { "1": 2469, "0": 9199 } } } } }
The Indonesian corpus.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/id")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/id docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.id')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 1446315, "fields": { "doc_id": { "max_len": 11, "common_prefix": "" } } } }
The dev set for Indonesian.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/id/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/id/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.id.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/id
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/id/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/id/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.id.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
0 | Not Relevant | 6.6K | 68.1% |
1 | Relevant | 3.1K | 31.9% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/id/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/id/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.id.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 1446315, "fields": { "doc_id": { "max_len": 11, "common_prefix": "" } } }, "queries": { "count": 960 }, "qrels": { "count": 9668, "fields": { "relevance": { "counts_by_value": { "1": 3088, "0": 6580 } } } } }
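The dev-set metadata for Finnish, French, Hindi, and Indonesian reports both query and qrels counts, which is enough to estimate how many passages were judged per query. A small sketch of that arithmetic follows, with the counts copied from the four dev entries; `judgments_per_query` is a hypothetical helper name.

```python
# (queries, qrels) counts copied from the dev-set metadata entries
# for miracl/fi/dev, miracl/fr/dev, miracl/hi/dev, and miracl/id/dev.
dev_counts = {
    "fi": (1271, 12008),
    "fr": (343, 3429),
    "hi": (350, 3494),
    "id": (960, 9668),
}


def judgments_per_query(counts):
    """Return {language: mean number of judged passages per query}."""
    return {lang: qrels / queries for lang, (queries, qrels) in counts.items()}


per_query = judgments_per_query(dev_counts)
for lang, value in sorted(per_query.items()):
    print(f"{lang}: {value:.2f}")
```

All four ratios fall between 9 and 11, which suggests roughly ten judged passages per dev query in these languages.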
The held-out test set (version a) for Indonesian.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/id/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/id/test-a queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.id.test-a.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/id
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/id/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/id/test-a docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.id.test-a')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 1446315, "fields": { "doc_id": { "max_len": 11, "common_prefix": "" } } }, "queries": { "count": 731 } }
The held-out test set (version b) for Indonesian.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/id/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/id/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.id.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/id
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/id/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/id/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.id.test-b')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 1446315, "fields": { "doc_id": { "max_len": 11, "common_prefix": "" } } }, "queries": { "count": 611 } }
The train set for Indonesian.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/id/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/id/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.id.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/id
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/id/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/id/train docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.id.train')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
0 | Not Relevant | 29K | 69.8% |
1 | Relevant | 13K | 30.2% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/id/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/id/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.id.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
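The counts and percentages in the relevance table for this split can be recomputed from the qrels themselves. The following is a minimal sketch, not part of the original listing: the helper name relevance_distribution and the toy judgments are hypothetical, and the Qrel namedtuple only mimics the fields that qrels_iter() is documented to yield.

```python
from collections import Counter, namedtuple

# Stand-in with the same fields as the qrels yielded by dataset.qrels_iter().
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def relevance_distribution(qrels):
    """Return {relevance: (count, percent)} for an iterable of qrels."""
    counts = Counter(qrel.relevance for qrel in qrels)
    total = sum(counts.values())
    return {
        rel: (n, round(100.0 * n / total, 1))
        for rel, n in sorted(counts.items())
    }


# Hypothetical toy judgments; the IDs are invented for illustration.
toy_qrels = [
    Qrel("q1", "d1", 1, "0"),
    Qrel("q1", "d2", 0, "0"),
    Qrel("q2", "d3", 0, "0"),
    Qrel("q2", "d4", 0, "0"),
]
distribution = relevance_distribution(toy_qrels)
```

With ir_datasets installed and the data downloaded, passing ir_datasets.load("miracl/id/train").qrels_iter() in place of toy_qrels should produce the per-level counts for the real split.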
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
Metadata:
{ "docs": { "count": 1446315, "fields": { "doc_id": { "max_len": 11, "common_prefix": "" } } }, "queries": { "count": 4071 }, "qrels": { "count": 41358, "fields": { "relevance": { "counts_by_value": { "1": 12505, "0": 28853 } } } } }
The Japanese corpus.
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ja")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ja docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ja')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
Metadata:
{ "docs": { "count": 6953614, "fields": { "doc_id": { "max_len": 12, "common_prefix": "" } } } }
The dev set for Japanese.
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ja/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ja/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ja.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ja
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ja/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ja/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ja.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
0 | Not Relevant | 6.6K | 78.6% |
1 | Relevant | 1.8K | 21.4% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ja/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/ja/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.ja.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
Metadata:
{ "docs": { "count": 6953614, "fields": { "doc_id": { "max_len": 12, "common_prefix": "" } } }, "queries": { "count": 860 }, "qrels": { "count": 8354, "fields": { "relevance": { "counts_by_value": { "1": 1790, "0": 6564 } } } } }
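The percentages in the miracl/ja/dev relevance table follow directly from the counts_by_value entry of the metadata. As a rough check, the sketch below parses the qrels part of that metadata and recomputes them; the function name relevance_percentages is hypothetical, and the JSON string is abbreviated to the fields it needs.

```python
import json

# The qrels portion of the miracl/ja/dev metadata, copied from this listing.
metadata = json.loads(
    '{"qrels": {"count": 8354, "fields": {"relevance": '
    '{"counts_by_value": {"1": 1790, "0": 6564}}}}}'
)


def relevance_percentages(meta):
    """Return the share of each relevance level, in percent, to one decimal."""
    qrels = meta["qrels"]
    counts = qrels["fields"]["relevance"]["counts_by_value"]
    total = qrels["count"]
    return {level: round(100.0 * n / total, 1) for level, n in counts.items()}


percentages = relevance_percentages(metadata)
```

The resulting values, 78.6 for level "0" and 21.4 for level "1", match the table. Note that JSON object keys are strings, so the levels come back as "0" and "1" rather than integers.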
The held-out test set (version a) for Japanese.
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ja/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ja/test-a queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ja.test-a.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ja
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ja/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ja/test-a docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ja.test-a')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
Metadata:
{ "docs": { "count": 6953614, "fields": { "doc_id": { "max_len": 12, "common_prefix": "" } } }, "queries": { "count": 650 } }
The held-out test set (version b) for Japanese.
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ja/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ja/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ja.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ja
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ja/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ja/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ja.test-b')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
Metadata:
{ "docs": { "count": 6953614, "fields": { "doc_id": { "max_len": 12, "common_prefix": "" } } }, "queries": { "count": 1141 } }
The train set for Japanese.
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ja/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ja/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ja.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ja
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ja/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ja/train docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ja.train')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
0 | Not Relevant | 27K | 79.7% |
1 | Relevant | 7.0K | 20.3% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ja/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/ja/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.ja.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
Metadata:
{ "docs": { "count": 6953614, "fields": { "doc_id": { "max_len": 12, "common_prefix": "" } } }, "queries": { "count": 3477 }, "qrels": { "count": 34387, "fields": { "relevance": { "counts_by_value": { "1": 6984, "0": 27403 } } } } }
The Korean corpus.
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ko")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ko docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ko')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
Metadata:
{ "docs": { "count": 1486752, "fields": { "doc_id": { "max_len": 12, "common_prefix": "" } } } }
The dev set for Korean.
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ko/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ko/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ko.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ko
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ko/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ko/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ko.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
0 | Not Relevant | 2.5K | 82.1% |
1 | Relevant | 547 | 17.9% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ko/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/ko/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.ko.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
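Because miracl/ko/dev inherits its documents from miracl/ko, the doc_id values in its qrels can be resolved against the same corpus. The sketch below shows one way to do that by random-access lookup, assuming the docs_store() accessor of ir_datasets and its get(doc_id) method; the helper name relevant_titles and the toy records are hypothetical, and a plain dict stands in for the store so the snippet runs without any download.

```python
from collections import namedtuple

# Stand-ins with the same fields as the documented namedtuples.
Doc = namedtuple("Doc", ["doc_id", "title", "text"])
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def relevant_titles(qrels, docs_store):
    """Map each query_id to the titles of passages judged relevant for it."""
    titles = {}
    for qrel in qrels:
        if qrel.relevance > 0:
            doc = docs_store.get(qrel.doc_id)
            titles.setdefault(qrel.query_id, []).append(doc.title)
    return titles


# Hypothetical toy data; a dict offers the same .get(doc_id) lookup used above.
toy_store = {
    "d1": Doc("d1", "Title A", "..."),
    "d2": Doc("d2", "Title B", "..."),
}
toy_qrels = [Qrel("q1", "d1", 1, "0"), Qrel("q1", "d2", 0, "0")]
result = relevant_titles(toy_qrels, toy_store)

# With the real data (requires ir_datasets and a large download):
# import ir_datasets
# dataset = ir_datasets.load("miracl/ko/dev")
# result = relevant_titles(dataset.qrels_iter(), dataset.docs_store())
```

Looking passages up by ID avoids iterating over all 1,486,752 Korean passages just to read the few thousand that were judged.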
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
Metadata:
{ "docs": { "count": 1486752, "fields": { "doc_id": { "max_len": 12, "common_prefix": "" } } }, "queries": { "count": 213 }, "qrels": { "count": 3057, "fields": { "relevance": { "counts_by_value": { "1": 547, "0": 2510 } } } } }
The held-out test set (version a) for Korean.
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ko/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ko/test-a queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ko.test-a.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ko
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ko/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ko/test-a docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ko.test-a')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
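The metadata listed for the held-out test sets reports docs and queries but no qrels, so code that handles several splits may want to check what a split provides before iterating. A minimal sketch, assuming the has_docs(), has_queries() and has_qrels() methods of ir_datasets dataset objects; describe_split and FakeHeldOutSplit are hypothetical names, and the fake class only mimics a split with docs and queries so the snippet runs without downloading anything.

```python
def describe_split(dataset):
    """Report which components a dataset object says it provides."""
    return {
        "docs": dataset.has_docs(),
        "queries": dataset.has_queries(),
        "qrels": dataset.has_qrels(),
    }


class FakeHeldOutSplit:
    """Hypothetical stand-in that mimics a split with docs and queries only."""

    def has_docs(self):
        return True

    def has_queries(self):
        return True

    def has_qrels(self):
        return False


summary = describe_split(FakeHeldOutSplit())

# With the real data (requires ir_datasets):
# import ir_datasets
# summary = describe_split(ir_datasets.load("miracl/ko/test-a"))
```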
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
Metadata:
{ "docs": { "count": 1486752, "fields": { "doc_id": { "max_len": 12, "common_prefix": "" } } }, "queries": { "count": 263 } }
The held-out test set (version b) for Korean.
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ko/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ko/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ko.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ko
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ko/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ko/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ko.test-b')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
Metadata:
{ "docs": { "count": 1486752, "fields": { "doc_id": { "max_len": 12, "common_prefix": "" } } }, "queries": { "count": 1417 } }
The train set for Korean.
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ko/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ko/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ko.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ko
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ko/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ko/train docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ko.train')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
0 | Not Relevant | 11K | 84.5% |
1 | Relevant | 2.0K | 15.5% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ko/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/ko/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.ko.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
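Since the train qrels carry both relevant (1) and non-relevant (0) judgments, one common use is to group them per query into positive and negative passage IDs. The sketch below illustrates that grouping under stated assumptions: group_judgments is a hypothetical helper, the toy judgments are invented, and the Qrel namedtuple only mirrors the documented qrel fields.

```python
from collections import defaultdict, namedtuple

# Stand-in with the same fields as the qrels yielded by dataset.qrels_iter().
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def group_judgments(qrels):
    """Split judged doc_ids per query into (positives, negatives) dicts."""
    positives = defaultdict(list)
    negatives = defaultdict(list)
    for qrel in qrels:
        target = positives if qrel.relevance > 0 else negatives
        target[qrel.query_id].append(qrel.doc_id)
    return dict(positives), dict(negatives)


# Hypothetical toy judgments for illustration only.
toy_qrels = [
    Qrel("q1", "d1", 1, "0"),
    Qrel("q1", "d2", 0, "0"),
    Qrel("q2", "d3", 0, "0"),
]
positives, negatives = group_judgments(toy_qrels)
```

A query with no relevant judgment, such as q2 in the toy data, simply has no entry in positives, which callers may need to handle explicitly.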
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
Metadata:
{ "docs": { "count": 1486752, "fields": { "doc_id": { "max_len": 12, "common_prefix": "" } } }, "queries": { "count": 868 }, "qrels": { "count": 12767, "fields": { "relevance": { "counts_by_value": { "1": 1973, "0": 10794 } } } } }
The Russian corpus.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ru")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ru docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ru')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
Metadata:
{ "docs": { "count": 9543918, "fields": { "doc_id": { "max_len": 12, "common_prefix": "" } } } }
The dev set for Russian.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ru/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ru/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ru.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ru
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ru/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ru/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ru.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
0 | Not Relevant | 9.5K | 72.8% |
1 | Relevant | 3.6K | 27.2% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ru/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/ru/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.ru.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
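The qrels exported with --format tsv can be read back with the standard csv module. The sketch below assumes the export is tab-separated, has no header row, and uses the column order shown in the CLI example for this split (query_id, doc_id, relevance, iteration); read_qrels_tsv and the sample rows are hypothetical.

```python
import csv
import io
from collections import namedtuple

# Mirrors the column order shown for the CLI export.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def read_qrels_tsv(handle):
    """Parse tab-separated qrels lines from an open text handle."""
    # QUOTE_NONE keeps any quote characters in the fields as literal text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        Qrel(row[0], row[1], int(row[2]), row[3])
        for row in reader
        if row  # skip blank lines
    ]


# Hypothetical sample in the assumed layout; real files come from the export command.
sample = "q1\td1\t1\t0\nq1\td2\t0\t0\n"
qrels = read_qrels_tsv(io.StringIO(sample))
```

For a file written by the export command, open it with open(path, newline="", encoding="utf-8") and pass the handle to read_qrels_tsv.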
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
Metadata:
{ "docs": { "count": 9543918, "fields": { "doc_id": { "max_len": 12, "common_prefix": "" } } }, "queries": { "count": 1252 }, "qrels": { "count": 13100, "fields": { "relevance": { "counts_by_value": { "1": 3560, "0": 9540 } } } } }
The held-out test set (version a) for Russian.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ru/test-a")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ru/test-a queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ru.test-a.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ru
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ru/test-a")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ru/test-a docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ru.test-a')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
Metadata:
{ "docs": { "count": 9543918, "fields": { "doc_id": { "max_len": 12, "common_prefix": "" } } }, "queries": { "count": 911 } }
The held-out test set (version b) for Russian.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ru/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ru/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ru.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ru
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ru/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ru/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ru.test-b')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
Metadata:
{ "docs": { "count": 9543918, "fields": { "doc_id": { "max_len": 12, "common_prefix": "" } } }, "queries": { "count": 718 } }
The train set for Russian.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ru/train")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/ru/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.ru.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/ru
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ru/train")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/ru/train docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.ru.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
0 | Not Relevant | 24K | 70.5% |
1 | Relevant | 10K | 29.5% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/ru/train")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/ru/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.ru.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 9543918, "fields": { "doc_id": { "max_len": 12, "common_prefix": "" } } }, "queries": { "count": 4683 }, "qrels": { "count": 33921, "fields": { "relevance": { "counts_by_value": { "1": 10000, "0": 23921 } } } } }
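The percentages in the relevance table can be recomputed from the `counts_by_value` entry of the metadata. This is a small sketch: `relevance_shares` is a hypothetical helper name, and the JSON string is the qrels part of the miracl/ru/train metadata copied from this page.

```python
import json

# The qrels part of the miracl/ru/train metadata shown on this page.
metadata = json.loads(
    '{"qrels": {"count": 33921, "fields": {"relevance": '
    '{"counts_by_value": {"1": 10000, "0": 23921}}}}}'
)

def relevance_shares(meta):
    """Return {relevance label: percentage of all qrels}, rounded to one decimal."""
    counts = meta["qrels"]["fields"]["relevance"]["counts_by_value"]
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}

shares = relevance_shares(metadata)
print(shares)
```

The same function should apply to the metadata of any other split that lists qrels.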
The Swahili corpus.
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/sw")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/sw docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.sw')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 131924, "fields": { "doc_id": { "max_len": 9, "common_prefix": "" } } } }
The dev set for Swahili.
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/sw/dev")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/sw/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.sw.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/sw
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/sw/dev")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/sw/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.sw.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
0 | Not Relevant | 4.2K | 82.1% |
1 | Relevant | 910 | 17.9% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/sw/dev")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/sw/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.sw.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
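Since relevance is binary here (0 = Not Relevant, 1 = Relevant), a common first step is to collect the relevant passages for each query. The sketch below assumes only the qrel fields listed above; `positives_by_query` is a hypothetical helper, `Qrel` is a local stand-in for the namedtuple that `qrels_iter()` yields, and the IDs are invented.

```python
from collections import defaultdict, namedtuple

# Local stand-in with the same fields as the qrel namedtuple shown above.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])

def positives_by_query(qrels, min_relevance=1):
    """Map each query_id to the set of doc_ids judged at least min_relevance."""
    positives = defaultdict(set)
    for qrel in qrels:
        if qrel.relevance >= min_relevance:
            positives[qrel.query_id].add(qrel.doc_id)
    return dict(positives)

# Invented sample judgments, not taken from the dataset.
sample = [
    Qrel("q1", "d1", 1, "0"),
    Qrel("q1", "d2", 0, "0"),
    Qrel("q2", "d9", 1, "0"),
]
result = positives_by_query(sample)
print(result)
```

With ir_datasets installed, `positives_by_query(ir_datasets.load("miracl/sw/dev").qrels_iter())` should produce the mapping for the Swahili dev set. Queries with no relevant passage do not appear in the result.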
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 131924, "fields": { "doc_id": { "max_len": 9, "common_prefix": "" } } }, "queries": { "count": 482 }, "qrels": { "count": 5092, "fields": { "relevance": { "counts_by_value": { "1": 910, "0": 4182 } } } } }
The held-out test set (version a) for Swahili.
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/sw/test-a")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/sw/test-a queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.sw.test-a.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/sw
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/sw/test-a")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/sw/test-a docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.sw.test-a')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 131924, "fields": { "doc_id": { "max_len": 9, "common_prefix": "" } } }, "queries": { "count": 638 } }
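The metadata for the held-out test sets lists docs and queries but no qrels, so code that runs over several splits may want to check for judgments first. This is a rough sketch, assuming the loaded dataset object exposes `has_qrels()` and `qrels_iter()` as described in the ir_datasets Python API documentation. `count_qrels` and `FakeSplit` are hypothetical names, and `FakeSplit` only mimics those two methods so the example runs without downloading anything.

```python
def count_qrels(dataset):
    """Count relevance judgments, returning 0 when the split provides none."""
    if not dataset.has_qrels():
        return 0
    return sum(1 for _ in dataset.qrels_iter())

class FakeSplit:
    """Made-up stand-in for a loaded dataset; mimics only the two methods used above."""

    def __init__(self, qrels=None):
        self._qrels = qrels

    def has_qrels(self):
        return self._qrels is not None

    def qrels_iter(self):
        return iter(self._qrels)

held_out = FakeSplit()  # behaves like a test split: no judgments
judged = FakeSplit(qrels=[("q1", "d1", 1, "0"), ("q1", "d2", 0, "0")])
print(count_qrels(held_out), count_qrels(judged))
```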
The held-out test set (version b) for Swahili.
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/sw/test-b")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/sw/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.sw.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/sw
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/sw/test-b")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/sw/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.sw.test-b')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 131924, "fields": { "doc_id": { "max_len": 9, "common_prefix": "" } } }, "queries": { "count": 465 } }
The train set for Swahili.
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/sw/train")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/sw/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.sw.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/sw
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/sw/train")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/sw/train docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.sw.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
0 | Not Relevant | 6.7K | 71.3% |
1 | Relevant | 2.7K | 28.7% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/sw/train")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/sw/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.sw.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 131924, "fields": { "doc_id": { "max_len": 9, "common_prefix": "" } } }, "queries": { "count": 1901 }, "qrels": { "count": 9359, "fields": { "relevance": { "counts_by_value": { "1": 2687, "0": 6672 } } } } }
The Telugu corpus.
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/te")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/te docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.te')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
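In the examples on this page, the datamaestro identifier looks like the ir_datasets ID with `/` replaced by `.`, an `irds.` prefix, and an optional `.queries` or `.qrels` suffix. The sketch below encodes that observed pattern. `to_datamaestro_id` is a hypothetical helper, and the mapping is inferred from these examples rather than from a documented guarantee.

```python
def to_datamaestro_id(irds_id, part=None):
    """Turn an ir_datasets ID such as 'miracl/te/dev' into the 'irds.' form used above.

    part is an optional suffix such as 'queries' or 'qrels'.
    """
    dm_id = "irds." + irds_id.replace("/", ".")
    return f"{dm_id}.{part}" if part else dm_id

print(to_datamaestro_id("miracl/te"))
print(to_datamaestro_id("miracl/te/dev", "queries"))
```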
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 518079, "fields": { "doc_id": { "max_len": 10, "common_prefix": "" } } } }
The dev set for Telugu.
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/te/dev")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/te/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.te.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/te
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/te/dev")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/te/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.te.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
0 | Not Relevant | 752 | 46.8% |
1 | Relevant | 854 | 53.2% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/te/dev")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/te/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.te.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 518079, "fields": { "doc_id": { "max_len": 10, "common_prefix": "" } } }, "queries": { "count": 828 }, "qrels": { "count": 1606, "fields": { "relevance": { "counts_by_value": { "1": 854, "0": 752 } } } } }
The held-out test set (version a) for Telugu.
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/te/test-a")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/te/test-a queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.te.test-a.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/te
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/te/test-a")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/te/test-a docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.te.test-a')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 518079, "fields": { "doc_id": { "max_len": 10, "common_prefix": "" } } }, "queries": { "count": 594 } }
The held-out test set (version b) for Telugu.
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/te/test-b")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/te/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.te.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/te
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/te/test-b")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/te/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.te.test-b')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 518079, "fields": { "doc_id": { "max_len": 10, "common_prefix": "" } } }, "queries": { "count": 793 } }
The train set for Telugu.
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/te/train")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/te/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.te.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/te
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/te/train")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/te/train docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.te.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
0 | Not Relevant | 14K | 77.9% |
1 | Relevant | 4.1K | 22.1% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/te/train")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/te/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.te.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
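For training, the query text usually has to be joined with the judged passages. The following sketch assumes only the query and qrel fields listed above; `labelled_pairs` is a hypothetical helper, the two namedtuples are local stand-ins for what `queries_iter()` and `qrels_iter()` yield, and the sample data is invented.

```python
from collections import namedtuple

# Local stand-ins with the same fields as the namedtuples shown above.
Query = namedtuple("Query", ["query_id", "text"])
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])

def labelled_pairs(queries, qrels):
    """Yield (query_text, doc_id, relevance) for every qrel whose query is known."""
    text_by_id = {q.query_id: q.text for q in queries}
    for qrel in qrels:
        text = text_by_id.get(qrel.query_id)
        if text is not None:
            yield text, qrel.doc_id, qrel.relevance

# Invented sample data, not taken from the dataset.
queries = [Query("q1", "first question"), Query("q2", "second question")]
qrels = [
    Qrel("q1", "d1", 1, "0"),
    Qrel("q1", "d2", 0, "0"),
    Qrel("q3", "d3", 1, "0"),  # query not in the list above, so it is skipped
]
pairs = list(labelled_pairs(queries, qrels))
print(pairs)
```

The passage text itself is not joined here; it would have to be looked up by `doc_id` in the corpus.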
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 518079, "fields": { "doc_id": { "max_len": 10, "common_prefix": "" } } }, "queries": { "count": 3452 }, "qrels": { "count": 18608, "fields": { "relevance": { "counts_by_value": { "1": 4119, "0": 14489 } } } } }
The Thai corpus.
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/th")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/th docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.th')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 542166, "fields": { "doc_id": { "max_len": 10, "common_prefix": "" } } } }
The dev set for Thai.
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/th/dev")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/th/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.th.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/th
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/th/dev")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/th/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.th.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
0 | Not Relevant | 6.2K | 82.3% |
1 | Relevant | 1.3K | 17.7% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/th/dev")
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/th/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.th.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
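Output of the CLI export can be read back into Python when the data is processed outside ir_datasets. This is a minimal sketch, assuming the `--format tsv` export writes one judgment per line with the four tab-separated columns shown above and no header row. `parse_qrels_tsv` and `QrelRow` are hypothetical names, and the sample lines are invented.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO

class QrelRow(NamedTuple):
    query_id: str
    doc_id: str
    relevance: int
    iteration: str

def parse_qrels_tsv(stream: TextIO) -> Iterator[QrelRow]:
    """Yield one QrelRow per non-empty line of tab-separated qrels."""
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        query_id, doc_id, relevance, iteration = row
        yield QrelRow(query_id, doc_id, int(relevance), iteration)

# Invented sample in the assumed column order: query_id, doc_id, relevance, iteration.
sample = io.StringIO("q1\td1\t1\t0\nq1\td2\t0\t0\n")
rows = list(parse_qrels_tsv(sample))
print(rows[0])
```

To read a real export, redirect the CLI output to a file and pass the opened file object instead of the `StringIO` sample.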
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 542166, "fields": { "doc_id": { "max_len": 10, "common_prefix": "" } } }, "queries": { "count": 733 }, "qrels": { "count": 7573, "fields": { "relevance": { "counts_by_value": { "1": 1343, "0": 6230 } } } } }
The held-out test set (version a) for Thai.
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/th/test-a")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/th/test-a queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.th.test-a.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/th
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/th/test-a")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/th/test-a docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.th.test-a')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 542166, "fields": { "doc_id": { "max_len": 10, "common_prefix": "" } } }, "queries": { "count": 992 } }
The held-out test set (version b) for Thai.
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/th/test-b")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/th/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.th.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/th
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/th/test-b")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/th/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.th.test-b')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 542166, "fields": { "doc_id": { "max_len": 10, "common_prefix": "" } } }, "queries": { "count": 650 } }
The train set for Thai.
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/th/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/th/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.th.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/th
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/th/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/th/train docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.th.train')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
0 | Not Relevant | 17K | 77.6% |
1 | Relevant | 4.8K | 22.4% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/th/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/th/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.th.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
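The qrels come as flat (query_id, doc_id, relevance, iteration) records, so a common first step is to group them by query. The sketch below shows one way to do that; `relevant_docs_by_query` is a hypothetical helper, the `Qrel` namedtuple is a stand-in for what `dataset.qrels_iter()` yields, and the sample judgments are invented for illustration.

```python
from collections import defaultdict, namedtuple

# Stand-in for the namedtuple yielded by dataset.qrels_iter().
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])

def relevant_docs_by_query(qrels, min_relevance=1):
    """Map each query_id to the set of doc_ids judged at or above min_relevance."""
    relevant = defaultdict(set)
    for qrel in qrels:
        if qrel.relevance >= min_relevance:
            relevant[qrel.query_id].add(qrel.doc_id)
    return dict(relevant)

# Invented judgments, not real MIRACL data.
sample = [
    Qrel("q1", "d1#0", 1, "0"),
    Qrel("q1", "d2#3", 0, "0"),
    Qrel("q2", "d7#1", 1, "0"),
]
grouped = relevant_docs_by_query(sample)
print(grouped)
```

With the real data, the same function can be given `dataset.qrels_iter()` in place of `sample`. Queries with no relevant passage are simply absent from the result.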
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 542166, "fields": { "doc_id": { "max_len": 10, "common_prefix": "" } } }, "queries": { "count": 2972 }, "qrels": { "count": 21293, "fields": { "relevance": { "counts_by_value": { "1": 4778, "0": 16515 } } } } }
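The percentages in the relevance table for miracl/th/train can be recomputed from the `counts_by_value` field of the metadata record. The sketch below does that with the standard library only; `relevance_shares` is a hypothetical helper, and the JSON fragment is the qrels portion of the record above.

```python
import json

# The "qrels" portion of the miracl/th/train metadata record.
metadata_json = (
    '{ "qrels": { "count": 21293, "fields": { "relevance": '
    '{ "counts_by_value": { "1": 4778, "0": 16515 } } } } }'
)

def relevance_shares(metadata):
    """Return {relevance level: percentage of all qrels}, rounded to one decimal."""
    qrels = metadata["qrels"]
    total = qrels["count"]
    counts = qrels["fields"]["relevance"]["counts_by_value"]
    return {int(level): round(100 * n / total, 1) for level, n in counts.items()}

shares = relevance_shares(json.loads(metadata_json))
print(shares)
```

The result agrees with the table: 77.6% of the judgments are level 0 and 22.4% are level 1.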
The Yoruba corpus.
Language: yo
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/yo")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/yo docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.yo')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 49043, "fields": { "doc_id": { "max_len": 9, "common_prefix": "" } } } }
The dev set for Yoruba.
Language: yo
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/yo/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/yo/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.yo.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/yo
Language: yo
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/yo/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/yo/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.yo.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
0 | Not Relevant | 1.0K | 87.9% |
1 | Relevant | 144 | 12.1% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/yo/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/yo/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.yo.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
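The CLI export with `--format tsv` writes one judgment per line. A minimal sketch of reading such output back into Python follows, assuming the columns are tab-separated and appear in the order shown above (query_id, doc_id, relevance, iteration); `read_qrels_tsv` is a hypothetical helper and the sample rows are invented.

```python
import csv
import io

def read_qrels_tsv(handle):
    """Yield (query_id, doc_id, relevance, iteration) from tab-separated lines.

    Assumes four columns per row; rows with any other width are skipped.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 4:
            continue
        query_id, doc_id, relevance, iteration = row
        yield query_id, doc_id, int(relevance), iteration

# Invented rows standing in for the exported file.
exported = io.StringIO("q1\td1#0\t1\t0\nq1\td2#3\t0\t0\n")
rows = list(read_qrels_tsv(exported))
print(rows)
```

In practice the handle would be a file opened on the redirected output of the export command, for example `open("qrels.tsv", encoding="utf-8", newline="")` for a file name of your choosing.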
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 49043, "fields": { "doc_id": { "max_len": 9, "common_prefix": "" } } }, "queries": { "count": 119 }, "qrels": { "count": 1188, "fields": { "relevance": { "counts_by_value": { "1": 144, "0": 1044 } } } } }
The held-out test set (version b) for Yoruba.
Language: yo
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/yo/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/yo/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.yo.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/yo
Language: yo
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/yo/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/yo/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.yo.test-b')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 49043, "fields": { "doc_id": { "max_len": 9, "common_prefix": "" } } }, "queries": { "count": 288 } }
The Chinese corpus.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/zh")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/zh docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.zh')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 4934368, "fields": { "doc_id": { "max_len": 12, "common_prefix": "" } } } }
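With 4,934,368 passages in the Chinese corpus, it may be more practical to consume `docs_iter()` in fixed-size batches than to collect everything in a list. The sketch below is one way to do that with `itertools.islice`; `batched` is a hypothetical helper, and a small range stands in for the document iterator.

```python
from itertools import islice

def batched(iterable, batch_size):
    """Yield lists of up to batch_size items until the iterable is exhausted."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

# A small range stands in for dataset.docs_iter().
batches = list(batched(range(5), 2))
print(batches)
```

With the real corpus, `batched(dataset.docs_iter(), 1000)` would hand over 1000 documents at a time; the batch size here is an arbitrary choice.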
The dev set for Chinese.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/zh/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/zh/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.zh.dev.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/zh
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/zh/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/zh/dev docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.zh.dev')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
0 | Not Relevant | 2.9K | 74.7% |
1 | Relevant | 994 | 25.3% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/zh/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/zh/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.zh.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 4934368, "fields": { "doc_id": { "max_len": 12, "common_prefix": "" } } }, "queries": { "count": 393 }, "qrels": { "count": 3928, "fields": { "relevance": { "counts_by_value": { "1": 994, "0": 2934 } } } } }
The held-out test set (version b) for Chinese.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/zh/test-b")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/zh/test-b queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.zh.test-b.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/zh
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/zh/test-b")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/zh/test-b docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.zh.test-b')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 4934368, "fields": { "doc_id": { "max_len": 12, "common_prefix": "" } } }, "queries": { "count": 920 } }
The train set for Chinese.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/zh/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export miracl/zh/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.miracl.zh.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from miracl/zh
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/zh/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export miracl/zh/train docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.miracl.zh.train')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
0 | Not Relevant | 9.9K | 75.7% |
1 | Relevant | 3.2K | 24.3% |
Examples:
import ir_datasets
dataset = ir_datasets.load("miracl/zh/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export miracl/zh/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.miracl.zh.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
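A train split offers both queries and qrels, so the two can be joined into labelled query-passage pairs. The sketch below shows one possible join; `labelled_pairs` is a hypothetical helper, the namedtuples are stand-ins for what `queries_iter()` and `qrels_iter()` yield, and the sample records are invented.

```python
from collections import namedtuple

# Stand-ins for the namedtuples yielded by queries_iter() and qrels_iter().
Query = namedtuple("Query", ["query_id", "text"])
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])

def labelled_pairs(queries, qrels):
    """Yield (query_text, doc_id, label) for every judged query-passage pair.

    label is 1 when relevance is above 0 and 0 otherwise; judgments whose
    query_id has no matching query are skipped.
    """
    text_by_id = {q.query_id: q.text for q in queries}
    for qrel in qrels:
        text = text_by_id.get(qrel.query_id)
        if text is not None:
            yield text, qrel.doc_id, int(qrel.relevance > 0)

# Invented records, not real MIRACL data.
queries = [Query("q1", "example question")]
qrels = [
    Qrel("q1", "d1#0", 1, "0"),
    Qrel("q1", "d2#3", 0, "0"),
    Qrel("q9", "d5#2", 1, "0"),  # no matching query, so it is skipped
]
pairs = list(labelled_pairs(queries, qrels))
print(pairs)
```

The passage text itself is not attached here; it would have to be looked up by doc_id from the corpus in a separate step.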
Bibtex:
@article{Zhang2022Miracl, title={Making a MIRACL: Multilingual information retrieval across a continuum of languages}, author={Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy}, journal={arXiv preprint arXiv:2210.09984}, year={2022} }
{ "docs": { "count": 4934368, "fields": { "doc_id": { "max_len": 12, "common_prefix": "" } } }, "queries": { "count": 1312 }, "qrels": { "count": 13113, "fields": { "relevance": { "counts_by_value": { "1": 3187, "0": 9926 } } } } }