ir_datasets: Mr. TyDi
A multi-lingual benchmark suite constructed from the TyDi QA Benchmark. Relevance labels are sparsely assigned based on shallow human annotation.
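Sparse labels mean that most passages a system retrieves carry no judgment at all. The sketch below shows one way to score a ranking in that setting, using reciprocal rank. It assumes that unjudged passages count as non-relevant, which is a common convention but is not stated in the description above. The function names and the default cutoff `k=100` are illustrative choices, not part of ir_datasets.

```python
def reciprocal_rank(ranked_doc_ids, relevant_doc_ids, k=100):
    """Return 1/rank of the first relevant doc within the top k, else 0.0.

    Assumption: any doc_id missing from relevant_doc_ids (including
    unjudged documents) is treated as non-relevant.
    """
    for rank, doc_id in enumerate(ranked_doc_ids[:k], start=1):
        if doc_id in relevant_doc_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(run, qrels, k=100):
    """Average reciprocal rank over every query that appears in qrels.

    run:   {query_id: [doc_id, ...]} ranked best-first
    qrels: {query_id: {doc_id, ...}} containing relevant docs only
    """
    scores = [
        reciprocal_rank(run.get(query_id, []), relevant, k)
        for query_id, relevant in qrels.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

A query that is missing from `run` contributes a score of 0.0, so a system is penalized for returning nothing.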
Bibtex:
@article{Zhang2021MrTyDi,
  title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval},
  author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin},
  year={2021},
  journal={arXiv:2108.08787},
}
@article{Clark2020TyDiQa,
  title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages},
  author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki},
  year={2020},
  journal={Transactions of the Association for Computational Linguistics}
}
Complete Arabic dataset, including all train, dev, and test queries and qrels.
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ar")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
You can find more details about the Python API here.
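The dataset also provides qrels alongside its queries. As a minimal sketch, the helper below groups relevance judgments by query. It assumes each qrel record exposes `query_id`, `doc_id`, and `relevance` attributes, as ir_datasets qrel namedtuples generally do. The helper name, the `min_relevance` threshold, and the sample records are made up for illustration, so the block runs without downloading anything.

```python
from collections import defaultdict, namedtuple


def group_relevant_docs(qrels, min_relevance=1):
    """Map each query_id to the set of doc_ids judged relevant."""
    relevant = defaultdict(set)
    for qrel in qrels:
        if qrel.relevance >= min_relevance:
            relevant[qrel.query_id].add(qrel.doc_id)
    return dict(relevant)


# Stand-in records (hypothetical IDs), shaped like qrel namedtuples.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance"])
sample = [
    Qrel("q0", "docA", 1),
    Qrel("q0", "docB", 0),
    Qrel("q1", "docC", 1),
]
print(group_relevant_docs(sample))
```

With the library installed, the same helper should accept `dataset.qrels_iter()` in place of `sample`.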
Development set for Arabic
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ar/dev")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Test set for Arabic
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ar/test")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Train set for Arabic
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ar/train")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
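Each language comes as train, dev, and test subsets plus a combined dataset, so it can be useful to confirm that the subsets do not share query IDs. Whether they are disjoint is not stated here. The sketch below is only a check one could run, and the helper name and sample records are hypothetical.

```python
from collections import namedtuple
from itertools import combinations


def shared_query_ids(splits):
    """Return {(split_a, split_b): shared_ids} for every overlapping pair.

    splits: {split_name: iterable of records with a query_id attribute}
    """
    ids = {name: {q.query_id for q in queries} for name, queries in splits.items()}
    return {
        (a, b): ids[a] & ids[b]
        for a, b in combinations(sorted(ids), 2)
        if ids[a] & ids[b]
    }


# Stand-in records (made-up IDs and text), shaped like query namedtuples.
Query = namedtuple("Query", ["query_id", "text"])
example_splits = {
    "train": [Query("1", "first question"), Query("2", "second question")],
    "dev": [Query("2", "second question")],
    "test": [Query("3", "third question")],
}
print(shared_query_ids(example_splits))
```

An empty result means no pair of splits overlaps. With the library installed, each value could instead be the `queries_iter()` of the corresponding `mr-tydi/ar/...` subset.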
Complete Bengali dataset, including all train, dev, and test queries and qrels.
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/bn")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Development set for Bengali
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/bn/dev")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Test set for Bengali
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/bn/test")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Train set for Bengali
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/bn/train")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Complete English dataset, including all train, dev, and test queries and qrels.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/en")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Development set for English
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/en/dev")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Test set for English
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/en/test")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Train set for English
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/en/train")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Complete Finnish dataset, including all train, dev, and test queries and qrels.
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/fi")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Development set for Finnish
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/fi/dev")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Test set for Finnish
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/fi/test")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Train set for Finnish
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/fi/train")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Complete Indonesian dataset, including all train, dev, and test queries and qrels.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/id")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Development set for Indonesian
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/id/dev")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Test set for Indonesian
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/id/test")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Train set for Indonesian
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/id/train")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Complete Japanese dataset, including all train, dev, and test queries and qrels.
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ja")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Development set for Japanese
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ja/dev")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Test set for Japanese
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ja/test")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Train set for Japanese
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ja/train")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Complete Korean dataset, including all train, dev, and test queries and qrels.
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ko")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Development set for Korean
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ko/dev")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Test set for Korean
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ko/test")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Train set for Korean
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ko/train")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Complete Russian dataset, including all train, dev, and test queries and qrels.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ru")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Development set for Russian
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ru/dev")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Test set for Russian
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ru/test")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Train set for Russian
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ru/train")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Complete Swahili dataset, including all train, dev, and test queries and qrels.
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/sw")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Development set for Swahili
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/sw/dev")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Test set for Swahili
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/sw/test")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Train set for Swahili
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/sw/train")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Complete Telugu dataset, including all train, dev, and test queries and qrels.
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/te")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Development set for Telugu
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/te/dev")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Test set for Telugu
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/te/test")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Train set for Telugu
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/te/train")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Complete Thai dataset, including all train, dev, and test queries and qrels.
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/th")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Development set for Thai
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/th/dev")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Test set for Thai
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/th/test")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Train set for Thai
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/th/train")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
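The identifiers on this page follow a regular `mr-tydi/<language>` and `mr-tydi/<language>/<split>` pattern. As a small sketch, the helper below rebuilds the full list from the language codes and splits listed above, which can be handy for looping over every subset. The helper and constant names are illustrative and not part of ir_datasets.

```python
# Language codes and splits as they appear in the dataset IDs on this page.
LANGUAGES = ["ar", "bn", "en", "fi", "id", "ja", "ko", "ru", "sw", "te", "th"]
SPLITS = ["train", "dev", "test"]


def mr_tydi_ids(languages=LANGUAGES, splits=SPLITS, include_complete=True):
    """Build dataset IDs such as "mr-tydi/ar" and "mr-tydi/ar/dev"."""
    ids = []
    for lang in languages:
        if include_complete:
            ids.append(f"mr-tydi/{lang}")  # combined train + dev + test
        ids.extend(f"mr-tydi/{lang}/{split}" for split in splits)
    return ids


for dataset_id in mr_tydi_ids(languages=["ar"]):
    print(dataset_id)
```

Each resulting string can be passed to `ir_datasets.load(...)` in the same way as the examples above.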