Github: datasets/mr_tydi.py

ir_datasets: Mr. TyDi

Index
  1. mr-tydi
  2. mr-tydi/ar
  3. mr-tydi/ar/dev
  4. mr-tydi/ar/test
  5. mr-tydi/ar/train
  6. mr-tydi/bn
  7. mr-tydi/bn/dev
  8. mr-tydi/bn/test
  9. mr-tydi/bn/train
  10. mr-tydi/en
  11. mr-tydi/en/dev
  12. mr-tydi/en/test
  13. mr-tydi/en/train
  14. mr-tydi/fi
  15. mr-tydi/fi/dev
  16. mr-tydi/fi/test
  17. mr-tydi/fi/train
  18. mr-tydi/id
  19. mr-tydi/id/dev
  20. mr-tydi/id/test
  21. mr-tydi/id/train
  22. mr-tydi/ja
  23. mr-tydi/ja/dev
  24. mr-tydi/ja/test
  25. mr-tydi/ja/train
  26. mr-tydi/ko
  27. mr-tydi/ko/dev
  28. mr-tydi/ko/test
  29. mr-tydi/ko/train
  30. mr-tydi/ru
  31. mr-tydi/ru/dev
  32. mr-tydi/ru/test
  33. mr-tydi/ru/train
  34. mr-tydi/sw
  35. mr-tydi/sw/dev
  36. mr-tydi/sw/test
  37. mr-tydi/sw/train
  38. mr-tydi/te
  39. mr-tydi/te/dev
  40. mr-tydi/te/test
  41. mr-tydi/te/train
  42. mr-tydi/th
  43. mr-tydi/th/dev
  44. mr-tydi/th/test
  45. mr-tydi/th/train
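
The identifiers in the index follow a regular pattern: a language code, optionally followed by a split name. As a minimal sketch, assuming only the eleven language codes and three split names listed above, the full list can be generated rather than typed out. The names `LANGUAGES`, `SPLITS`, and `mr_tydi_ids` are illustrative and not part of ir_datasets.

```python
# Language codes and split names as they appear in the index above.
LANGUAGES = ["ar", "bn", "en", "fi", "id", "ja", "ko", "ru", "sw", "te", "th"]
SPLITS = ["dev", "test", "train"]


def mr_tydi_ids():
    """Return every dataset ID from the index, in index order (hypothetical helper)."""
    ids = ["mr-tydi"]
    for lang in LANGUAGES:
        # Per-language dataset first, then its three splits.
        ids.append(f"mr-tydi/{lang}")
        ids.extend(f"mr-tydi/{lang}/{split}" for split in SPLITS)
    return ids


if __name__ == "__main__":
    for dataset_id in mr_tydi_ids():
        print(dataset_id)
```

Each generated ID can then be passed to `ir_datasets.load(...)` in the same way as the per-dataset examples further down the page.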

"mr-tydi"

A multi-lingual benchmark suite constructed from the TyDi QA Benchmark. Relevance labels are sparsely assigned based on shallow human annotation.

Citation

ir_datasets.bib:

\cite{Zhang2021MrTyDi,Clark2020TyDiQa}

Bibtex:

@article{Zhang2021MrTyDi,
  title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval},
  author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin},
  year={2021},
  journal={arXiv:2108.08787},
}
@article{Clark2020TyDiQa,
  title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages},
  author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki},
  year={2020},
  journal={Transactions of the Association for Computational Linguistics}
}

"mr-tydi/ar"

Complete Arabic dataset, including all train, dev, and test queries and qrels.

17K queries

Language: ar

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/ar")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.
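
The loop above yields one GenericQuery per query. As a small sketch of how the two fields might be used, the following collects queries into a dict keyed by `query_id`. To stay runnable without downloading any data, it uses a stand-in namedtuple with the same two fields and made-up sample values; `StandInQuery`, `queries_by_id`, and the sample queries are assumptions for illustration, not part of ir_datasets or of the dataset itself.

```python
from collections import namedtuple

# Stand-in with the same fields as the GenericQuery type described above.
StandInQuery = namedtuple("StandInQuery", ["query_id", "text"])


def queries_by_id(queries):
    """Map query_id -> text for any iterable of query namedtuples."""
    return {query.query_id: query.text for query in queries}


# Made-up sample values; real queries would come from dataset.queries_iter().
sample = [
    StandInQuery(query_id="0", text="example question one"),
    StandInQuery(query_id="1", text="example question two"),
]
lookup = queries_by_id(sample)
print(lookup["1"])
```

With the real library, passing `dataset.queries_iter()` to `queries_by_id` should behave the same way, since each yielded item exposes `query_id` and `text`.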


"mr-tydi/ar/dev"

Development set for Arabic

3.1K queries

Language: ar

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/ar/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/ar/test"

Test set for Arabic

1.1K queries

Language: ar

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/ar/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/ar/train"

Train set for Arabic

12K queries

Language: ar

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/ar/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/bn"

Complete Bengali dataset, including all train, dev, and test queries and qrels.

2.3K queries

Language: bn

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/bn")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/bn/dev"

Development set for Bengali

440 queries

Language: bn

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/bn/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/bn/test"

Test set for Bengali

111 queries

Language: bn

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/bn/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/bn/train"

Train set for Bengali

1.7K queries

Language: bn

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/bn/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/en"

Complete English dataset, including all train, dev, and test queries and qrels.

5.2K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/en")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/en/dev"

Development set for English

878 queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/en/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/en/test"

Test set for English

744 queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/en/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/en/train"

Train set for English

3.5K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/en/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/fi"

Complete Finnish dataset, including all train, dev, and test queries and qrels.

9.6K queries

Language: fi

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/fi")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/fi/dev"

Development set for Finnish

1.7K queries

Language: fi

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/fi/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/fi/test"

Test set for Finnish

1.3K queries

Language: fi

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/fi/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/fi/train"

Train set for Finnish

6.6K queries

Language: fi

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/fi/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/id"

Complete Indonesian dataset, including all train, dev, and test queries and qrels.

7.0K queries

Language: id

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/id")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/id/dev"

Development set for Indonesian

1.2K queries

Language: id

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/id/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/id/test"

Test set for Indonesian

829 queries

Language: id

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/id/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/id/train"

Train set for Indonesian

4.9K queries

Language: id

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/id/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/ja"

Complete Japanese dataset, including all train, dev, and test queries and qrels.

5.4K queries

Language: ja

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/ja")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/ja/dev"

Development set for Japanese

928 queries

Language: ja

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/ja/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/ja/test"

Test set for Japanese

720 queries

Language: ja

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/ja/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/ja/train"

Train set for Japanese

3.7K queries

Language: ja

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/ja/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/ko"

Complete Korean dataset, including all train, dev, and test queries and qrels.

2.0K queries

Language: ko

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/ko")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/ko/dev"

Development set for Korean

303 queries

Language: ko

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/ko/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/ko/test"

Test set for Korean

421 queries

Language: ko

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/ko/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/ko/train"

Train set for Korean

1.3K queries

Language: ko

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/ko/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/ru"

Complete Russian dataset, including all train, dev, and test queries and qrels.

7.8K queries

Language: ru

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/ru")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/ru/dev"

Development set for Russian

1.4K queries

Language: ru

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/ru/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/ru/test"

Test set for Russian

995 queries

Language: ru

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/ru/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/ru/train"

Train set for Russian

5.4K queries

Language: ru

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/ru/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/sw"

Complete Swahili dataset, including all train, dev, and test queries and qrels.

3.3K queries

Language: sw

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/sw")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/sw/dev"

Development set for Swahili

526 queries

Language: sw

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/sw/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/sw/test"

Test set for Swahili

670 queries

Language: sw

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/sw/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/sw/train"

Train set for Swahili

2.1K queries

Language: sw

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/sw/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/te"

Complete Telugu dataset, including all train, dev, and test queries and qrels.

5.5K queries

Language: te

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/te")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/te/dev"

Development set for Telugu

983 queries

Language: te

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/te/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/te/test"

Test set for Telugu

646 queries

Language: te

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/te/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/te/train"

Train set for Telugu

3.9K queries

Language: te

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/te/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/th"

Complete Thai dataset, including all train, dev, and test queries and qrels.

5.3K queries

Language: th

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/th")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/th/dev"

Development set for Thai

807 queries

Language: th

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/th/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/th/test"

Test set for Thai

1.2K queries

Language: th

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/th/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.


"mr-tydi/th/train"

Train set for Thai

3.3K queries

Language: th

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("mr-tydi/th/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.