← home
Github: datasets/mr_tydi.py

ir_datasets: Mr. TyDi

Index
  1. mr-tydi
  2. mr-tydi/ar
  3. mr-tydi/ar/dev
  4. mr-tydi/ar/test
  5. mr-tydi/ar/train
  6. mr-tydi/bn
  7. mr-tydi/bn/dev
  8. mr-tydi/bn/test
  9. mr-tydi/bn/train
  10. mr-tydi/en
  11. mr-tydi/en/dev
  12. mr-tydi/en/test
  13. mr-tydi/en/train
  14. mr-tydi/fi
  15. mr-tydi/fi/dev
  16. mr-tydi/fi/test
  17. mr-tydi/fi/train
  18. mr-tydi/id
  19. mr-tydi/id/dev
  20. mr-tydi/id/test
  21. mr-tydi/id/train
  22. mr-tydi/ja
  23. mr-tydi/ja/dev
  24. mr-tydi/ja/test
  25. mr-tydi/ja/train
  26. mr-tydi/ko
  27. mr-tydi/ko/dev
  28. mr-tydi/ko/test
  29. mr-tydi/ko/train
  30. mr-tydi/ru
  31. mr-tydi/ru/dev
  32. mr-tydi/ru/test
  33. mr-tydi/ru/train
  34. mr-tydi/sw
  35. mr-tydi/sw/dev
  36. mr-tydi/sw/test
  37. mr-tydi/sw/train
  38. mr-tydi/te
  39. mr-tydi/te/dev
  40. mr-tydi/te/test
  41. mr-tydi/te/train
  42. mr-tydi/th
  43. mr-tydi/th/dev
  44. mr-tydi/th/test
  45. mr-tydi/th/train

"mr-tydi"

A multi-lingual benchmark benchmark suite constructed from the TyDi QA Benchmark. Relevance labels are sparsely assigned based on shallow human annotation.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }

"mr-tydi/ar"

Complete Arabic dataset, including all train, dev, and test queries and qrels.

queries
17K queries

Language: ar

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ar")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ar queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.ar.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
2.1M docs

Language: ar

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ar")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ar docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.ar')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
17K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results17K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ar")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ar qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.ar.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/ar/dev"

Development set for Arabic

queries
3.1K queries

Language: ar

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ar/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ar/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.ar.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
2.1M docs

Inherits docs from mr-tydi/ar

Language: ar

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ar/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ar/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.ar.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
3.1K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results3.1K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ar/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ar/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.ar.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/ar/test"

Test set for Arabic

queries
1.1K queries

Language: ar

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ar/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ar/test queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.ar.test.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
2.1M docs

Inherits docs from mr-tydi/ar

Language: ar

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ar/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ar/test docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.ar.test')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
1.3K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results1.3K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ar/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ar/test qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.ar.test.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/ar/train"

Train set for Arabic

queries
12K queries

Language: ar

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ar/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ar/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.ar.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
2.1M docs

Inherits docs from mr-tydi/ar

Language: ar

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ar/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ar/train docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.ar.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
12K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results12K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ar/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ar/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.ar.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/bn"

Complete Bengali dataset, including all train, dev, and test queries and qrels.

queries
2.3K queries

Language: bn

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/bn")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/bn queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.bn.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
304K docs

Language: bn

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/bn")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/bn docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.bn')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
2.3K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results2.3K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/bn")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/bn qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.bn.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/bn/dev"

Development set for Bengali

queries
440 queries

Language: bn

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/bn/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/bn/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.bn.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
304K docs

Inherits docs from mr-tydi/bn

Language: bn

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/bn/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/bn/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.bn.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
443 qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results443 100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/bn/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/bn/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.bn.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/bn/test"

Test set for Bengali

queries
111 queries

Language: bn

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/bn/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/bn/test queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.bn.test.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
304K docs

Inherits docs from mr-tydi/bn

Language: bn

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/bn/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/bn/test docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.bn.test')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
130 qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results130 100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/bn/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/bn/test qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.bn.test.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/bn/train"

Train set for Bengali

queries
1.7K queries

Language: bn

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/bn/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/bn/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.bn.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
304K docs

Inherits docs from mr-tydi/bn

Language: bn

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/bn/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/bn/train docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.bn.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
1.7K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results1.7K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/bn/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/bn/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.bn.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/en"

Complete English dataset, including all train, dev, and test queries and qrels.

queries
5.2K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/en")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/en queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:mr-tydi/en')
index_ref = pt.IndexRef.of('./indices/mr-tydi_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.en.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
33M docs

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/en")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/en docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:mr-tydi/en')
# Index mr-tydi/en
indexer = pt.IterDictIndexer('./indices/mr-tydi_en')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.en')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
5.4K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results5.4K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/en")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/en qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:mr-tydi/en')
index_ref = pt.IndexRef.of('./indices/mr-tydi_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.en.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/en/dev"

Development set for English

queries
878 queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/en/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/en/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:mr-tydi/en/dev')
index_ref = pt.IndexRef.of('./indices/mr-tydi_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.en.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
33M docs

Inherits docs from mr-tydi/en

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/en/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/en/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:mr-tydi/en/dev')
# Index mr-tydi/en
indexer = pt.IterDictIndexer('./indices/mr-tydi_en')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.en.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
878 qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results878 100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/en/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/en/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:mr-tydi/en/dev')
index_ref = pt.IndexRef.of('./indices/mr-tydi_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.en.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/en/test"

Test set for English

queries
744 queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/en/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/en/test queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:mr-tydi/en/test')
index_ref = pt.IndexRef.of('./indices/mr-tydi_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.en.test.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
33M docs

Inherits docs from mr-tydi/en

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/en/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/en/test docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:mr-tydi/en/test')
# Index mr-tydi/en
indexer = pt.IterDictIndexer('./indices/mr-tydi_en')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.en.test')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
935 qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results935 100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/en/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/en/test qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:mr-tydi/en/test')
index_ref = pt.IndexRef.of('./indices/mr-tydi_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.en.test.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/en/train"

Train set for English

queries
3.5K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/en/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/en/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:mr-tydi/en/train')
index_ref = pt.IndexRef.of('./indices/mr-tydi_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.en.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
33M docs

Inherits docs from mr-tydi/en

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/en/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/en/train docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:mr-tydi/en/train')
# Index mr-tydi/en
indexer = pt.IterDictIndexer('./indices/mr-tydi_en')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.en.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
3.5K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results3.5K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/en/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/en/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:mr-tydi/en/train')
index_ref = pt.IndexRef.of('./indices/mr-tydi_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.en.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/fi"

Complete Finnish dataset, including all train, dev, and test queries and qrels.

queries
9.6K queries

Language: fi

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/fi")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/fi queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.fi.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
1.9M docs

Language: fi

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/fi")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/fi docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.fi')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
9.8K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results9.8K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/fi")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/fi qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.fi.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/fi/dev"

Development set for Finnish

queries
1.7K queries

Language: fi

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/fi/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/fi/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.fi.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
1.9M docs

Inherits docs from mr-tydi/fi

Language: fi

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/fi/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/fi/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.fi.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
1.7K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results1.7K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/fi/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/fi/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.fi.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/fi/test"

Test set for Finnish

queries
1.3K queries

Language: fi

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/fi/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/fi/test queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.fi.test.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
1.9M docs

Inherits docs from mr-tydi/fi

Language: fi

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/fi/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/fi/test docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.fi.test')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
1.5K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results1.5K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/fi/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/fi/test qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.fi.test.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/fi/train"

Train set for Finnish

queries
6.6K queries

Language: fi

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/fi/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/fi/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.fi.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
1.9M docs

Inherits docs from mr-tydi/fi

Language: fi

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/fi/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/fi/train docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.fi.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
6.6K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results6.6K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/fi/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/fi/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.fi.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/id"

Complete Indonesian dataset, including all train, dev, and test queries and qrels.

queries
7.0K queries

Language: id

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/id")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/id queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.id.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
1.5M docs

Language: id

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/id")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/id docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.id')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
7.1K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results7.1K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/id")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/id qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.id.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/id/dev"

Development set for Indonesian

queries
1.2K queries

Language: id

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/id/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/id/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.id.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
1.5M docs

Inherits docs from mr-tydi/id

Language: id

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/id/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/id/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.id.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
1.2K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results1.2K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/id/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/id/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.id.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/id/test"

Test set for Indonesian

queries
829 queries

Language: id

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/id/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/id/test queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.id.test.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
1.5M docs

Inherits docs from mr-tydi/id

Language: id

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/id/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/id/test docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.id.test')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
961 qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results961 100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/id/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/id/test qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.id.test.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/id/train"

Train set for Indonesian

queries
4.9K queries

Language: id

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/id/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/id/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.id.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
1.5M docs

Inherits docs from mr-tydi/id

Language: id

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/id/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/id/train docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.id.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
4.9K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results4.9K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/id/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/id/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.id.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/ja"

Complete Japanese dataset, including all train, dev, and test queries and qrels.

queries
5.4K queries

Language: ja

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ja")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ja queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.ja.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
7.0M docs

Language: ja

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ja")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ja docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.ja')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
5.5K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results5.5K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ja")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ja qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.ja.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/ja/dev"

Development set for Japanese

queries
928 queries

Language: ja

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ja/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ja/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.ja.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
7.0M docs

Inherits docs from mr-tydi/ja

Language: ja

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ja/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ja/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.ja.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
928 qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results928 100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ja/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ja/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.ja.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/ja/test"

Test set for Japanese

queries
720 queries

Language: ja

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ja/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ja/test queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.ja.test.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
7.0M docs

Inherits docs from mr-tydi/ja

Language: ja

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ja/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ja/test docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.ja.test')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
923 qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results923 100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ja/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ja/test qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.ja.test.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/ja/train"

Train set for Japanese

queries
3.7K queries

Language: ja

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ja/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ja/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.ja.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
7.0M docs

Inherits docs from mr-tydi/ja

Language: ja

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ja/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ja/train docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.ja.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
3.7K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results3.7K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ja/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ja/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.ja.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/ko"

Complete Korean dataset, including all train, dev, and test queries and qrels.

queries
2.0K queries

Language: ko

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ko")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ko queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.ko.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
1.5M docs

Language: ko

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ko")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ko docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.ko')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
2.1K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results2.1K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ko")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ko qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.ko.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/ko/dev"

Development set for Korean

queries
303 queries

Language: ko

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ko/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ko/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.ko.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
1.5M docs

Inherits docs from mr-tydi/ko

Language: ko

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ko/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ko/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.ko.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
307 qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results307 100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ko/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ko/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.ko.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/ko/test"

Test set for Korean

queries
421 queries

Language: ko

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ko/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ko/test queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.ko.test.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
1.5M docs

Inherits docs from mr-tydi/ko

Language: ko

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ko/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ko/test docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.ko.test')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
492 qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results492 100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ko/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ko/test qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.ko.test.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/ko/train"

Train set for Korean

queries
1.3K queries

Language: ko

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ko/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ko/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.ko.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
1.5M docs

Inherits docs from mr-tydi/ko

Language: ko

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ko/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ko/train docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.ko.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
1.3K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results1.3K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ko/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ko/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.ko.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/ru"

Complete Russian dataset, including all train, dev, and test queries and qrels.

queries
7.8K queries

Language: ru

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ru")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ru queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.ru.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
9.6M docs

Language: ru

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ru")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ru docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.ru')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
7.9K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results7.9K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ru")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ru qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.ru.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/ru/dev"

Development set for Russian

queries
1.4K queries

Language: ru

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ru/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ru/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.ru.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
9.6M docs

Inherits docs from mr-tydi/ru

Language: ru

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ru/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ru/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.ru.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
1.4K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results1.4K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ru/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ru/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.ru.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/ru/test"

Test set for Russian

queries
995 queries

Language: ru

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ru/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ru/test queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.ru.test.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
9.6M docs

Inherits docs from mr-tydi/ru

Language: ru

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ru/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ru/test docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.ru.test')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
1.2K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results1.2K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ru/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ru/test qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.ru.test.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/ru/train"

Train set for Russian

queries
5.4K queries

Language: ru

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ru/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ru/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.ru.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
9.6M docs

Inherits docs from mr-tydi/ru

Language: ru

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ru/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ru/train docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.ru.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
5.4K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results5.4K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ru/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/ru/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.ru.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/sw"

Complete Swahili dataset, including all train, dev, and test queries and qrels.

queries
3.3K queries

Language: sw

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/sw")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/sw queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.sw.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
137K docs

Language: sw

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/sw")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/sw docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.sw')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
3.8K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results3.8K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/sw")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/sw qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.sw.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/sw/dev"

Development set for Swahili

queries
526 queries

Language: sw

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/sw/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/sw/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.sw.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
137K docs

Inherits docs from mr-tydi/sw

Language: sw

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/sw/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/sw/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.sw.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
623 qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results623 100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/sw/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/sw/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.sw.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/sw/test"

Test set for Swahili

queries
670 queries

Language: sw

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/sw/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/sw/test queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.sw.test.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
137K docs

Inherits docs from mr-tydi/sw

Language: sw

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/sw/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/sw/test docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.sw.test')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
743 qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results743 100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/sw/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/sw/test qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.sw.test.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/sw/train"

Train set for Swahili

queries
2.1K queries

Language: sw

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/sw/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/sw/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.sw.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
137K docs

Inherits docs from mr-tydi/sw

Language: sw

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/sw/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/sw/train docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.sw.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
2.4K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results2.4K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/sw/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/sw/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.sw.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/te"

Complete Telugu dataset, including all train, dev, and test queries and qrels.

queries
5.5K queries

Language: te

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/te")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/te queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.te.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
548K docs

Language: te

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/te")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/te docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.te')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
5.5K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results5.5K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/te")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/te qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.te.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/te/dev"

Development set for Telugu

queries
983 queries

Language: te

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/te/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/te/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.te.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
548K docs

Inherits docs from mr-tydi/te

Language: te

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/te/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/te/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.te.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
983 qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results983 100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/te/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/te/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.te.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/te/test"

Test set for Telugu

queries
646 queries

Language: te

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/te/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/te/test queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.te.test.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
548K docs

Inherits docs from mr-tydi/te

Language: te

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/te/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/te/test docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.te.test')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
677 qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results677 100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/te/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/te/test qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.te.test.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/te/train"

Train set for Telugu

queries
3.9K queries

Language: te

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/te/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/te/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.te.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
548K docs

Inherits docs from mr-tydi/te

Language: te

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/te/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/te/train docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.te.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
3.9K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results3.9K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/te/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/te/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.te.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/th"

Complete Thai dataset, including all train, dev, and test queries and qrels.

queries
5.3K queries

Language: th

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/th")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/th queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.th.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
569K docs

Language: th

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/th")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/th docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.th')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
5.5K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results5.5K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/th")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/th qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.th.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/th/dev"

Development set for Thai

queries
807 queries

Language: th

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/th/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/th/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.th.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
569K docs

Inherits docs from mr-tydi/th

Language: th

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/th/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/th/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.th.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
817 qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results817 100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/th/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/th/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.th.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/th/test"

Test set for Thai

queries
1.2K queries

Language: th

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/th/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/th/test queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.th.test.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
569K docs

Inherits docs from mr-tydi/th

Language: th

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/th/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/th/test docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.th.test')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
1.4K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results1.4K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/th/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/th/test qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.th.test.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata

"mr-tydi/th/train"

Train set for Thai

queries
3.3K queries

Language: th

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/th/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/th/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.th.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs
569K docs

Inherits docs from mr-tydi/th

Language: th

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/th/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/th/train docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.th.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore

qrels
3.4K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.DefinitionCount%
1Passage identified within Wikipedia article from top Google search results3.4K100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("mr-tydi/th/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export mr-tydi/th/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

XPM-IR
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.th.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # An AdhocTopic

This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, } @article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
Metadata