ir_datasets: Mr. TyDi
A multi-lingual benchmark suite constructed from the TyDi QA Benchmark. Relevance labels are sparsely assigned based on shallow human annotation.
Bibtex:
@article{Zhang2021MrTyDi,
  title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval},
  author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin},
  year={2021},
  journal={arXiv:2108.08787},
}
@article{Clark2020TyDiQa,
  title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages},
  author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki},
  year={2020},
  journal={Transactions of the Association for Computational Linguistics}
}
Complete Arabic dataset, including all train, dev, and test queries and qrels.
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ar")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ar queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.ar.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ar")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
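The Arabic collection is large (the metadata for this dataset reports 2,106,586 documents), so it can be useful to look at a handful of records before committing to a full pass. The sketch below does that with `itertools.islice`. `Doc`, `preview` and `sample_docs` are hypothetical stand-ins invented for illustration, so the snippet runs without downloading anything; with ir_datasets installed you would pass `dataset.docs_iter()` instead.

```python
from collections import namedtuple
from itertools import islice

# Hypothetical stand-in for the namedtuple<doc_id, text> records
# that dataset.docs_iter() yields.
Doc = namedtuple("Doc", ["doc_id", "text"])


def preview(docs, n=3):
    """Return the first n records of an iterator without reading the rest."""
    return list(islice(docs, n))


# Made-up sample data: a lazy generator, so nothing is materialised up front.
sample_docs = (Doc(f"{i}#0", f"passage {i}") for i in range(1_000_000))

first = preview(sample_docs, n=2)
for doc in first:
    print(doc.doc_id, doc.text)
```

Because `islice` stops as soon as it has `n` items, the remaining records are never generated.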
ir_datasets export mr-tydi/ar docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.ar')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 17K | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ar")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
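Evaluation code usually needs the qrels grouped by query rather than as a flat stream. The following is a minimal sketch of that grouping. `Qrel`, `relevant_docs_by_query` and the sample records are hypothetical and only mirror the `namedtuple<query_id, doc_id, relevance, iteration>` shape shown above; in practice the input would be `dataset.qrels_iter()`.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the records yielded by dataset.qrels_iter().
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def relevant_docs_by_query(qrels, min_relevance=1):
    """Map each query_id to the set of doc_ids judged at least min_relevance."""
    by_query = defaultdict(set)
    for qrel in qrels:
        if qrel.relevance >= min_relevance:
            by_query[qrel.query_id].add(qrel.doc_id)
    return dict(by_query)


# Made-up sample judgments, not taken from the dataset.
sample_qrels = [
    Qrel("0", "7#1", 1, "0"),
    Qrel("0", "9#3", 1, "0"),
    Qrel("1", "12#0", 1, "0"),
]

relevant = relevant_docs_by_query(sample_qrels)
print(relevant)
```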
ir_datasets export mr-tydi/ar qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
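If the exported TSV is saved to a file, it can be read back with the standard `csv` module. This is a sketch under two assumptions that are not confirmed by the text above: that the export has no header row, and that the columns appear in the order shown (`query_id`, `doc_id`, `relevance`, `iteration`). The function name and sample string are made up for illustration.

```python
import csv
import io


def read_qrels_tsv(handle):
    """Parse tab-separated qrels rows into (query_id, doc_id, relevance, iteration)."""
    rows = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:  # skip blank lines
            continue
        query_id, doc_id, relevance, iteration = row
        rows.append((query_id, doc_id, int(relevance), iteration))
    return rows


# Made-up sample export; a real file would be opened with open(path, newline="").
sample_export = "0\t7#1\t1\t0\n1\t12#0\t1\t0\n"
rows = read_qrels_tsv(io.StringIO(sample_export))
print(rows)
```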
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.ar.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # relevance assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
  "docs": {
    "count": 2106586,
    "fields": {
      "doc_id": {
        "max_len": 11,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 16595
  },
  "qrels": {
    "count": 16749,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 16749
        }
      }
    }
  }
}
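The metadata above gives a quick way to see how sparse the relevance labels are. The sketch below parses a trimmed copy of that JSON (only the two counts it needs) and computes the average number of judged passages per query; the function name is made up for illustration.

```python
import json

# Trimmed copy of the metadata block for the complete Arabic dataset.
metadata = json.loads("""
{
  "queries": {"count": 16595},
  "qrels": {"count": 16749}
}
""")


def judgments_per_query(meta):
    """Average number of qrels per query, from the metadata counts."""
    return meta["qrels"]["count"] / meta["queries"]["count"]


ratio = judgments_per_query(metadata)
print(f"{ratio:.3f} judged passages per query")
```

A ratio this close to one means most queries have a single labeled passage.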
Development set for Arabic
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ar/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ar/dev queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.ar.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mr-tydi/ar
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ar/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ar/dev docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.ar.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 3.1K | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ar/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ar/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.ar.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # relevance assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
  "docs": {
    "count": 2106586,
    "fields": {
      "doc_id": {
        "max_len": 11,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 3115
  },
  "qrels": {
    "count": 3115,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 3115
        }
      }
    }
  }
}
Test set for Arabic
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ar/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ar/test queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.ar.test.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mr-tydi/ar
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ar/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ar/test docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.ar.test')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 1.3K | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ar/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ar/test qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.ar.test.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # relevance assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
  "docs": {
    "count": 2106586,
    "fields": {
      "doc_id": {
        "max_len": 11,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 1081
  },
  "qrels": {
    "count": 1257,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 1257
        }
      }
    }
  }
}
Train set for Arabic
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ar/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ar/train queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.ar.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mr-tydi/ar
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ar/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ar/train docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.ar.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 12K | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ar/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ar/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.ar.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # relevance assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
  "docs": {
    "count": 2106586,
    "fields": {
      "doc_id": {
        "max_len": 11,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 12377
  },
  "qrels": {
    "count": 12377,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 12377
        }
      }
    }
  }
}
Complete Bengali dataset, including all train, dev, and test queries and qrels.
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/bn")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/bn queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.bn.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/bn")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/bn docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.bn')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 2.3K | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/bn")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/bn qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.bn.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # relevance assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
  "docs": {
    "count": 304059,
    "fields": {
      "doc_id": {
        "max_len": 10,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 2264
  },
  "qrels": {
    "count": 2292,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 2292
        }
      }
    }
  }
}
Development set for Bengali
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/bn/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/bn/dev queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.bn.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mr-tydi/bn
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/bn/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/bn/dev docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.bn.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 443 | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/bn/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/bn/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.bn.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # relevance assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
  "docs": {
    "count": 304059,
    "fields": {
      "doc_id": {
        "max_len": 10,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 440
  },
  "qrels": {
    "count": 443,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 443
        }
      }
    }
  }
}
Test set for Bengali
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/bn/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/bn/test queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.bn.test.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mr-tydi/bn
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/bn/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/bn/test docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.bn.test')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 130 | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/bn/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/bn/test qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.bn.test.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # relevance assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
  "docs": {
    "count": 304059,
    "fields": {
      "doc_id": {
        "max_len": 10,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 111
  },
  "qrels": {
    "count": 130,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 130
        }
      }
    }
  }
}
Train set for Bengali
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/bn/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/bn/train queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.bn.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mr-tydi/bn
Language: bn
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/bn/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/bn/train docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.bn.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 1.7K | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/bn/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/bn/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.bn.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # relevance assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
  "docs": {
    "count": 304059,
    "fields": {
      "doc_id": {
        "max_len": 10,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 1713
  },
  "qrels": {
    "count": 1719,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 1719
        }
      }
    }
  }
}
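The Bengali counts reported in the metadata blocks are consistent with the complete dataset being the union of its train, dev, and test splits. The sketch below checks that arithmetic; the numbers are copied from the metadata on this page, while the dictionary layout and function name are made up for illustration.

```python
# Query and qrel counts copied from the Bengali metadata blocks.
splits = {
    "train": {"queries": 1713, "qrels": 1719},
    "dev": {"queries": 440, "qrels": 443},
    "test": {"queries": 111, "qrels": 130},
}
complete = {"queries": 2264, "qrels": 2292}


def split_totals(split_counts):
    """Sum each count (queries, qrels) across all splits."""
    totals = {}
    for counts in split_counts.values():
        for key, value in counts.items():
            totals[key] = totals.get(key, 0) + value
    return totals


totals = split_totals(splits)
print(totals == complete)
```

The same check can be run for other languages, though the totals are not guaranteed to match in every case.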
Complete English dataset, including all train, dev, and test queries and qrels.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/en")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/en queries
[query_id]    [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:mr-tydi/en')
index_ref = pt.IndexRef.of('./indices/mr-tydi_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.en.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/en")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/en docs
[doc_id]    [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:mr-tydi/en')
# Index mr-tydi/en
indexer = pt.IterDictIndexer('./indices/mr-tydi_en')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.en')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 5.4K | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/en")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/en qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:mr-tydi/en')
index_ref = pt.IndexRef.of('./indices/mr-tydi_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
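To make the `MAP` measure in the experiment above more concrete, here is a minimal sketch of average precision for a single query with binary relevance, which is the only relevance level this dataset uses. It follows the textbook definition and is not PyTerrier's implementation; the function name, ranking and document ids are made up for illustration. MAP is the mean of this value over all queries.

```python
def average_precision(ranked_doc_ids, relevant_doc_ids):
    """Average precision of one ranking against a set of relevant doc_ids."""
    if not relevant_doc_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant_doc_ids:
            hits += 1
            # Precision at this rank, counted only where a relevant doc appears.
            precision_sum += hits / rank
    return precision_sum / len(relevant_doc_ids)


# Made-up ranking: relevant documents appear at ranks 2 and 4.
ranking = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
ap = average_precision(ranking, relevant)
print(ap)
```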
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.en.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # relevance assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
  "docs": {
    "count": 32907100,
    "fields": {
      "doc_id": {
        "max_len": 13,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 5194
  },
  "qrels": {
    "count": 5360,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 5360
        }
      }
    }
  }
}
Development set for English
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/en/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/en/dev queries
[query_id]    [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:mr-tydi/en/dev')
index_ref = pt.IndexRef.of('./indices/mr-tydi_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.en.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mr-tydi/en
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/en/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/en/dev docs
[doc_id]    [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:mr-tydi/en/dev')
# Index mr-tydi/en
indexer = pt.IterDictIndexer('./indices/mr-tydi_en')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.en.dev')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the store
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 878 | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/en/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
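The qrels loop above yields one record per judged query/passage pair. As a minimal sketch, assuming only the four fields named in the comment (`query_id`, `doc_id`, `relevance`, `iteration`), the records can be grouped into a per-query lookup table. The `Qrel` namedtuple, the `build_qrels_lookup` helper, and the sample records are hypothetical stand-ins so that the snippet runs without downloading the dataset.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the records yielded by dataset.qrels_iter();
# the field names follow the namedtuple comment in the example above.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def build_qrels_lookup(qrels):
    """Group judgments as {query_id: {doc_id: relevance}}."""
    lookup = defaultdict(dict)
    for qrel in qrels:
        lookup[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(lookup)


# Made-up sample records, not taken from the dataset.
sample_qrels = [
    Qrel("q1", "100#2", 1, "Q0"),
    Qrel("q1", "250#0", 1, "Q0"),
    Qrel("q2", "731#5", 1, "Q0"),
]

lookup = build_qrels_lookup(sample_qrels)
print(lookup["q1"])
```

With the real data, passing `dataset.qrels_iter()` in place of `sample_qrels` should work the same way, since the helper only reads the three attributes shown.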
ir_datasets export mr-tydi/en/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:mr-tydi/en/dev')
index_ref = pt.IndexRef.of('./indices/mr-tydi_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.en.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # judgments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
  "docs": {
    "count": 32907100,
    "fields": {
      "doc_id": {
        "max_len": 13,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 878
  },
  "qrels": {
    "count": 878,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 878
        }
      }
    }
  }
}
Test set for English
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/en/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/en/test queries
[query_id]    [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:mr-tydi/en/test')
index_ref = pt.IndexRef.of('./indices/mr-tydi_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.en.test.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mr-tydi/en
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/en/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/en/test docs
[doc_id]    [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:mr-tydi/en/test')
# Index mr-tydi/en
indexer = pt.IterDictIndexer('./indices/mr-tydi_en')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.en.test')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the store
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 935 | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/en/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/en/test qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:mr-tydi/en/test')
index_ref = pt.IndexRef.of('./indices/mr-tydi_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.en.test.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # judgments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
  "docs": {
    "count": 32907100,
    "fields": {
      "doc_id": {
        "max_len": 13,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 744
  },
  "qrels": {
    "count": 935,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 935
        }
      }
    }
  }
}
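The metadata above lists more qrels (935) than queries (744), which suggests that some test queries have more than one relevant passage. A minimal sketch of reading that ratio from the metadata follows; the JSON string repeats only the counts shown above, and the `judgments_per_query` helper is a hypothetical name.

```python
import json

# Counts copied from the metadata block above (doc_id field details omitted).
metadata_json = """
{
  "docs": {"count": 32907100},
  "queries": {"count": 744},
  "qrels": {"count": 935,
            "fields": {"relevance": {"counts_by_value": {"1": 935}}}}
}
"""


def judgments_per_query(metadata):
    """Average number of qrels per query in one metadata record."""
    return metadata["qrels"]["count"] / metadata["queries"]["count"]


metadata = json.loads(metadata_json)
avg = judgments_per_query(metadata)
print(f"{avg:.2f} judgments per query on average")
```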
Train set for English
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/en/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/en/train queries
[query_id]    [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:mr-tydi/en/train')
index_ref = pt.IndexRef.of('./indices/mr-tydi_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.en.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mr-tydi/en
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/en/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/en/train docs
[doc_id]    [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:mr-tydi/en/train')
# Index mr-tydi/en
indexer = pt.IterDictIndexer('./indices/mr-tydi_en')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.en.train')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the store
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 3.5K | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/en/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/en/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:mr-tydi/en/train')
index_ref = pt.IndexRef.of('./indices/mr-tydi_en') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.en.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # judgments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
  "docs": {
    "count": 32907100,
    "fields": {
      "doc_id": {
        "max_len": 13,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 3547
  },
  "qrels": {
    "count": 3547,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 3547
        }
      }
    }
  }
}
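The English metadata blocks can be cross-checked against each other. The sketch below assumes that the block with 5194 queries and 5360 qrels, which precedes the dev split, describes the complete English set; the `split_totals` helper is a hypothetical name. It sums the dev, test, and train counts and compares them with that block.

```python
# Counts copied from the English metadata blocks in this section.
splits = {
    "dev": {"queries": 878, "qrels": 878},
    "test": {"queries": 744, "qrels": 935},
    "train": {"queries": 3547, "qrels": 3547},
}
# Assumed to be the complete English set (mr-tydi/en).
complete = {"queries": 5194, "qrels": 5360}


def split_totals(splits):
    """Sum each count across all splits."""
    totals = {}
    for counts in splits.values():
        for key, value in counts.items():
            totals[key] = totals.get(key, 0) + value
    return totals


totals = split_totals(splits)
for key in complete:
    print(key, totals[key], complete[key], totals[key] == complete[key])
```

Under these numbers the qrels totals agree (5360), while the split query counts sum to 5169 rather than 5194. The metadata does not say why the complete set lists 25 more queries.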
Complete Finnish dataset, including all train, dev, and test queries and qrels.
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/fi")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/fi queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.fi.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/fi")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/fi docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.fi')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the store
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 9.8K | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/fi")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/fi qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.fi.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # judgments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
  "docs": {
    "count": 1908757,
    "fields": {
      "doc_id": {
        "max_len": 11,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 9572
  },
  "qrels": {
    "count": 9750,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 9750
        }
      }
    }
  }
}
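The percentage column of the relevance table can be derived from `counts_by_value` in the metadata. The following is a minimal sketch using the Finnish qrels counts shown above; the `relevance_shares` helper is a hypothetical name, and the way the documentation itself computes the column is not stated in the source.

```python
import json

# Qrels counts copied from the Finnish metadata block above.
metadata_json = """
{
  "qrels": {"count": 9750,
            "fields": {"relevance": {"counts_by_value": {"1": 9750}}}}
}
"""


def relevance_shares(metadata):
    """Percentage of qrels at each relevance level."""
    qrels = metadata["qrels"]
    counts = qrels["fields"]["relevance"]["counts_by_value"]
    total = qrels["count"]
    return {level: 100.0 * n / total for level, n in counts.items()}


shares = relevance_shares(json.loads(metadata_json))
print(shares)
```

Because every judgment in this dataset carries relevance 1, the single level accounts for all qrels.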
Development set for Finnish
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/fi/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/fi/dev queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.fi.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mr-tydi/fi
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/fi/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/fi/dev docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.fi.dev')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the store
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 1.7K | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/fi/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/fi/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.fi.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # judgments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
  "docs": {
    "count": 1908757,
    "fields": {
      "doc_id": {
        "max_len": 11,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 1738
  },
  "qrels": {
    "count": 1738,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 1738
        }
      }
    }
  }
}
Test set for Finnish
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/fi/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/fi/test queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.fi.test.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mr-tydi/fi
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/fi/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/fi/test docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.fi.test')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the store
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 1.5K | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/fi/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/fi/test qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.fi.test.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # judgments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
  "docs": {
    "count": 1908757,
    "fields": {
      "doc_id": {
        "max_len": 11,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 1254
  },
  "qrels": {
    "count": 1451,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 1451
        }
      }
    }
  }
}
Train set for Finnish
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/fi/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/fi/train queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.fi.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mr-tydi/fi
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/fi/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/fi/train docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.fi.train')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the store
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 6.6K | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/fi/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/fi/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.fi.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # judgments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
  "docs": {
    "count": 1908757,
    "fields": {
      "doc_id": {
        "max_len": 11,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 6561
  },
  "qrels": {
    "count": 6561,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 6561
        }
      }
    }
  }
}
Complete Indonesian dataset, including all train, dev, and test queries and qrels.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/id")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/id queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.id.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/id")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/id docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.id')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the store
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 7.1K | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/id")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/id qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.id.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # judgments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
  "docs": {
    "count": 1469399,
    "fields": {
      "doc_id": {
        "max_len": 11,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 6977
  },
  "qrels": {
    "count": 7087,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 7087
        }
      }
    }
  }
}
Development set for Indonesian
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/id/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/id/dev queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.id.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mr-tydi/id
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/id/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/id/dev docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.id.dev')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the store
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 1.2K | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/id/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/id/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.id.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # judgments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
  "docs": {
    "count": 1469399,
    "fields": {
      "doc_id": {
        "max_len": 11,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 1224
  },
  "qrels": {
    "count": 1224,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 1224
        }
      }
    }
  }
}
Test set for Indonesian
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/id/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/id/test queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.id.test.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mr-tydi/id
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/id/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/id/test docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.id.test')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 961 | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/id/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/id/test qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.id.test.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # assessments (qrels) for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
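The relevance table above can be reproduced by tallying the `relevance` field of each qrel. A minimal sketch, assuming qrels expose the four fields listed in the namedtuple comment; the `Qrel` type and the sample records below are hypothetical stand-ins so the block runs without downloading the dataset:

```python
from collections import Counter, namedtuple

# Stand-in for the namedtuple yielded by dataset.qrels_iter().
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def relevance_counts(qrels):
    """Count how many qrels carry each relevance value."""
    return Counter(qrel.relevance for qrel in qrels)


# Hypothetical records; real ones would come from dataset.qrels_iter().
sample_qrels = [
    Qrel("q1", "d1", 1, "0"),
    Qrel("q1", "d7", 1, "0"),
    Qrel("q2", "d3", 1, "0"),
]
counts = relevance_counts(sample_qrels)
```

Passing `dataset.qrels_iter()` instead of `sample_qrels` should give the per-level counts for the loaded split.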
Bibtex:
@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787} }
@article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
{
  "docs": {
    "count": 1469399,
    "fields": {
      "doc_id": {
        "max_len": 11,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 829
  },
  "qrels": {
    "count": 961,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 961
        }
      }
    }
  }
}
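The metadata block above is plain JSON, so summary figures can be derived from it directly. A minimal sketch that computes the average number of relevance judgments per query for this split; the variable names are illustrative, and the JSON text is abridged from the block above:

```python
import json

# Abridged copy of the metadata shown above for mr-tydi/id/test.
metadata_text = """
{
  "docs": {"count": 1469399},
  "queries": {"count": 829},
  "qrels": {"count": 961, "fields": {"relevance": {"counts_by_value": {"1": 961}}}}
}
"""

metadata = json.loads(metadata_text)
qrels_per_query = metadata["qrels"]["count"] / metadata["queries"]["count"]
print(f"{qrels_per_query:.3f} qrels per query")
```

A ratio close to 1 is consistent with the sparse labeling noted for this benchmark: most queries have a single judged passage.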
Train set for Indonesian
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/id/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/id/train queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.id.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mr-tydi/id
Language: id
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/id/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/id/train docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.id.train')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 4.9K | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/id/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/id/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.id.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # assessments (qrels) for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
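For a train split, queries and qrels are typically joined into (query text, positive passage id) pairs. A minimal sketch of that join, assuming the field names shown in the namedtuple comments above; `positive_pairs`, the stand-in types, and the sample records are hypothetical:

```python
from collections import namedtuple

# Stand-ins for the namedtuples yielded by queries_iter() and qrels_iter().
Query = namedtuple("Query", ["query_id", "text"])
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def positive_pairs(queries, qrels, min_relevance=1):
    """Yield (query_text, doc_id) for every qrel at or above min_relevance."""
    text_by_id = {query.query_id: query.text for query in queries}
    for qrel in qrels:
        if qrel.relevance >= min_relevance and qrel.query_id in text_by_id:
            yield text_by_id[qrel.query_id], qrel.doc_id


# Hypothetical records for illustration only.
sample_queries = [Query("q1", "pertanyaan satu"), Query("q2", "pertanyaan dua")]
sample_qrels = [Qrel("q1", "d4", 1, "0"), Qrel("q2", "d9", 1, "0")]
pairs = list(positive_pairs(sample_queries, sample_qrels))
```

With the real split, `dataset.queries_iter()` and `dataset.qrels_iter()` would replace the sample lists.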
Bibtex:
@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787} }
@article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
{
  "docs": {
    "count": 1469399,
    "fields": {
      "doc_id": {
        "max_len": 11,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 4902
  },
  "qrels": {
    "count": 4902,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 4902
        }
      }
    }
  }
}
Complete Japanese dataset, including all train, dev, and test queries and qrels.
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ja")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ja queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.ja.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ja")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ja docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.ja')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
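The Japanese corpus is large (its metadata lists 7,000,027 documents), so it is often enough to inspect the first few records rather than iterate over everything. A minimal sketch using `itertools.islice`; the helper `peek` and the generator standing in for `dataset.docs_iter()` are hypothetical:

```python
from collections import namedtuple
from itertools import islice

# Stand-in for the namedtuple yielded by dataset.docs_iter().
Doc = namedtuple("Doc", ["doc_id", "text"])


def peek(iterable, n=3):
    """Return the first n items without consuming the rest eagerly."""
    return list(islice(iterable, n))


def fake_docs_iter():
    """Hypothetical lazy document stream, used so the sketch runs offline."""
    for i in range(1_000_000):
        yield Doc(f"d{i}", f"passage {i}")


first_docs = peek(fake_docs_iter(), n=3)
```

Replacing `fake_docs_iter()` with `dataset.docs_iter()` would apply the same idea to the real corpus.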
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 5.5K | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ja")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ja qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.ja.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # assessments (qrels) for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787} }
@article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
{
  "docs": {
    "count": 7000027,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 5353
  },
  "qrels": {
    "count": 5548,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 5548
        }
      }
    }
  }
}
Development set for Japanese
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ja/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ja/dev queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.ja.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mr-tydi/ja
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ja/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ja/dev docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.ja.dev')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 928 | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ja/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ja/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.ja.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # assessments (qrels) for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
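Because each split inherits its documents from the language-level corpus, judged passages are usually fetched by id rather than by scanning the whole collection. A minimal sketch, assuming a document store object with a `get_many(doc_ids)` method that returns a dict keyed by id (as the store returned by `dataset.docs_store()` in ir_datasets is understood to do); `InMemoryDocStore`, `judged_docs`, and the sample records are hypothetical:

```python
from collections import namedtuple

Doc = namedtuple("Doc", ["doc_id", "text"])
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


class InMemoryDocStore:
    """Hypothetical stand-in that mimics a get_many lookup by document id."""

    def __init__(self, docs):
        self._docs = {doc.doc_id: doc for doc in docs}

    def get_many(self, doc_ids):
        return {i: self._docs[i] for i in doc_ids if i in self._docs}


def judged_docs(docs_store, qrels):
    """Fetch every document referenced by the given qrels, keyed by doc_id."""
    doc_ids = sorted({qrel.doc_id for qrel in qrels})
    return docs_store.get_many(doc_ids)


# Hypothetical data so the sketch runs without downloading the corpus.
store = InMemoryDocStore([Doc("d1", "a"), Doc("d2", "b"), Doc("d3", "c")])
found = judged_docs(store, [Qrel("q1", "d1", 1, "0"), Qrel("q2", "d3", 1, "0")])
```

With the real dataset, `store` would be replaced by the object returned from `dataset.docs_store()`.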
Bibtex:
@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787} }
@article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
{
  "docs": {
    "count": 7000027,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 928
  },
  "qrels": {
    "count": 928,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 928
        }
      }
    }
  }
}
Test set for Japanese
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ja/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ja/test queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.ja.test.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mr-tydi/ja
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ja/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ja/test docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.ja.test')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 923 | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ja/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ja/test qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.ja.test.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # assessments (qrels) for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787} }
@article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
{
  "docs": {
    "count": 7000027,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 720
  },
  "qrels": {
    "count": 923,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 923
        }
      }
    }
  }
}
Train set for Japanese
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ja/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ja/train queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.ja.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mr-tydi/ja
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ja/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ja/train docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.ja.train')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 3.7K | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ja/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ja/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.ja.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # assessments (qrels) for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787} }
@article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
{
  "docs": {
    "count": 7000027,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 3697
  },
  "qrels": {
    "count": 3697,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 3697
        }
      }
    }
  }
}
Complete Korean dataset, including all train, dev, and test queries and qrels.
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ko")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ko queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.ko.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ko")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ko docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.ko')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 2.1K | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ko")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ko qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.ko.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # assessments (qrels) for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787} }
@article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
{
  "docs": {
    "count": 1496126,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 2019
  },
  "qrels": {
    "count": 2116,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 2116
        }
      }
    }
  }
}
Development set for Korean
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ko/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ko/dev queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.ko.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mr-tydi/ko
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ko/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ko/dev docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.ko.dev')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 307 | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ko/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ko/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.ko.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # assessments (qrels) for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787} }
@article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
{
  "docs": {
    "count": 1496126,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 303
  },
  "qrels": {
    "count": 307,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 307
        }
      }
    }
  }
}
Test set for Korean
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ko/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ko/test queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.ko.test.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mr-tydi/ko
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ko/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ko/test docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.ko.test')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 492 | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ko/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ko/test qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.ko.test.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # assessments (qrels) for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787} }
@article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
{
  "docs": {
    "count": 1496126,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 421
  },
  "qrels": {
    "count": 492,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 492
        }
      }
    }
  }
}
Train set for Korean
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ko/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ko/train queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.ko.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mr-tydi/ko
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ko/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ko/train docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.ko.train')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 1.3K | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ko/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ko/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.ko.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # assessments (qrels) for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787} }
@article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
{
  "docs": {
    "count": 1496126,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 1295
  },
  "qrels": {
    "count": 1317,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 1317
        }
      }
    }
  }
}
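The Korean metadata blocks above allow a quick consistency check: the dev, test, and train counts should add up to the counts reported for the complete `mr-tydi/ko` dataset. A minimal sketch with the numbers copied from those blocks; the dictionary layout is illustrative only:

```python
# Query and qrel counts copied from the Korean metadata blocks above.
splits = {
    "dev": {"queries": 303, "qrels": 307},
    "test": {"queries": 421, "qrels": 492},
    "train": {"queries": 1295, "qrels": 1317},
}
complete = {"queries": 2019, "qrels": 2116}

# Sum each field across the three splits.
totals = {
    field: sum(counts[field] for counts in splits.values())
    for field in complete
}

for field, expected in complete.items():
    status = "matches" if totals[field] == expected else "differs from"
    print(f"{field}: split total {totals[field]} {status} complete count {expected}")
```

The same check can be repeated for other languages by substituting their metadata counts.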
Complete Russian dataset, including all train, dev, and test queries and qrels.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ru")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ru queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.ru.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ru")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ru docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.ru')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the store
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 7.9K | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ru")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ru qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
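Once the qrels have been exported, the file can be read back with the standard library. This is a rough sketch that assumes the export contains one tab-separated row per judgment in the column order shown above and no header line; the `Qrel` class, the `read_qrels_tsv` helper, and the sample IDs are made up for illustration.

```python
import csv
import io
from typing import Iterator, NamedTuple


class Qrel(NamedTuple):
    query_id: str
    doc_id: str
    relevance: int
    iteration: str


def read_qrels_tsv(handle) -> Iterator[Qrel]:
    """Yield one Qrel per non-empty tab-separated row."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue
        query_id, doc_id, relevance, iteration = row
        yield Qrel(query_id, doc_id, int(relevance), iteration)


# Stand-in for a file written by the export command; the IDs are invented.
sample = io.StringIO("q1\td10\t1\t0\nq2\td22\t1\t0\n")
qrels = list(read_qrels_tsv(sample))
print(qrels[0])
```

For a real export, replace the `io.StringIO` object with `open(path, encoding="utf-8", newline="")`.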
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.ru.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787} }
@article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
{
  "docs": {
    "count": 9597504,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 7763
  },
  "qrels": {
    "count": 7909,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 7909
        }
      }
    }
  }
}
Development set for Russian
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ru/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ru/dev queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.ru.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mr-tydi/ru
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ru/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ru/dev docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.ru.dev')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the store
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 1.4K | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ru/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
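The iterator above yields one record per (query, passage) judgment, so evaluation code usually groups the records by query first. Below is a minimal sketch of that grouping. It runs on a small made-up list with the same four fields as the qrel namedtuple, and `relevant_by_query` is a hypothetical helper name.

```python
from collections import defaultdict, namedtuple

# Stand-in with the same fields as the qrel namedtuple; values are invented.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def relevant_by_query(qrels):
    """Map each query_id to the set of doc_ids judged relevant (relevance > 0)."""
    grouped = defaultdict(set)
    for qrel in qrels:
        if qrel.relevance > 0:
            grouped[qrel.query_id].add(qrel.doc_id)
    return dict(grouped)


toy_qrels = [
    Qrel("q1", "p7", 1, "0"),
    Qrel("q1", "p9", 1, "0"),
    Qrel("q2", "p3", 1, "0"),
]
lookup = relevant_by_query(toy_qrels)
print(sorted(lookup["q1"]))
```

With the real data, the same function should accept `dataset.qrels_iter()` directly, since it only reads the `query_id`, `doc_id`, and `relevance` attributes.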
ir_datasets export mr-tydi/ru/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.ru.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787} }
@article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
{
  "docs": {
    "count": 9597504,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 1375
  },
  "qrels": {
    "count": 1375,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 1375
        }
      }
    }
  }
}
Test set for Russian
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ru/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ru/test queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.ru.test.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mr-tydi/ru
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ru/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ru/test docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.ru.test')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the store
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 1.2K | 100.0% | 
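Because every judgment in this table has relevance 1, a ranked list can be scored with a simple binary metric. The sketch below computes the reciprocal rank of the first relevant passage. The choice of metric, the cutoff of 100, and all names and IDs are assumptions for illustration; this page does not prescribe an evaluation procedure.

```python
def reciprocal_rank(ranked_doc_ids, relevant_doc_ids, cutoff=100):
    """Return 1/rank of the first relevant document within the cutoff, else 0.0."""
    for rank, doc_id in enumerate(ranked_doc_ids[:cutoff], start=1):
        if doc_id in relevant_doc_ids:
            return 1.0 / rank
    return 0.0


# Invented ranking and judgments for a single query.
ranking = ["p4", "p9", "p1"]
relevant = {"p9"}
score = reciprocal_rank(ranking, relevant)
print(score)
```

Averaging this value over all queries of a split gives a mean reciprocal rank for that split.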
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ru/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ru/test qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.ru.test.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787} }
@article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
{
  "docs": {
    "count": 9597504,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 995
  },
  "qrels": {
    "count": 1168,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 1168
        }
      }
    }
  }
}
Train set for Russian
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ru/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ru/train queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.ru.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mr-tydi/ru
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ru/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ru/train docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.ru.train')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the store
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 5.4K | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/ru/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/ru/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.ru.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787} }
@article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
{
  "docs": {
    "count": 9597504,
    "fields": {
      "doc_id": {
        "max_len": 12,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 5366
  },
  "qrels": {
    "count": 5366,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 5366
        }
      }
    }
  }
}
Complete Swahili dataset, including all train, dev, and test queries and qrels.
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/sw")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/sw queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.sw.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/sw")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/sw docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.sw')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the store
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 3.8K | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/sw")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/sw qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.sw.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787} }
@article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
{
  "docs": {
    "count": 136689,
    "fields": {
      "doc_id": {
        "max_len": 9,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 3271
  },
  "qrels": {
    "count": 3767,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 3767
        }
      }
    }
  }
}
Development set for Swahili
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/sw/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/sw/dev queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.sw.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mr-tydi/sw
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/sw/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
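Since the judgments refer to passages only by `doc_id`, it is often handy to resolve an ID back to its text. The sketch below builds an in-memory lookup from any iterable of objects with `doc_id` and `text` attributes. It runs on invented stand-in documents, and `build_doc_lookup` is a hypothetical helper rather than part of ir_datasets.

```python
from collections import namedtuple

# Stand-in with the same fields as the doc namedtuple; values are invented.
Doc = namedtuple("Doc", ["doc_id", "text"])


def build_doc_lookup(docs):
    """Return a dict mapping doc_id to passage text."""
    return {doc.doc_id: doc.text for doc in docs}


toy_docs = [
    Doc("d1", "first passage"),
    Doc("d2", "second passage"),
]
doc_text = build_doc_lookup(toy_docs)
print(doc_text["d2"])
```

A plain dict is only reasonable for a small corpus such as this one. For larger collections, a disk-backed lookup such as the one ir_datasets exposes through `dataset.docs_store()` is likely the better fit.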
ir_datasets export mr-tydi/sw/dev docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.sw.dev')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the store
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 623 | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/sw/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/sw/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.sw.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787} }
@article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
{
  "docs": {
    "count": 136689,
    "fields": {
      "doc_id": {
        "max_len": 9,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 526
  },
  "qrels": {
    "count": 623,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 623
        }
      }
    }
  }
}
Test set for Swahili
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/sw/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/sw/test queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.sw.test.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mr-tydi/sw
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/sw/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/sw/test docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.sw.test')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the store
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 743 | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/sw/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/sw/test qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.sw.test.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787} }
@article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
{
  "docs": {
    "count": 136689,
    "fields": {
      "doc_id": {
        "max_len": 9,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 670
  },
  "qrels": {
    "count": 743,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 743
        }
      }
    }
  }
}
Train set for Swahili
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/sw/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/sw/train queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.sw.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mr-tydi/sw
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/sw/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/sw/train docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.sw.train')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the store
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 2.4K | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/sw/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/sw/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.sw.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787} }
@article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
{
  "docs": {
    "count": 136689,
    "fields": {
      "doc_id": {
        "max_len": 9,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 2072
  },
  "qrels": {
    "count": 2401,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 2401
        }
      }
    }
  }
}
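The counts in the four Swahili metadata blocks can be cross-checked against each other. The sketch below copies the query and qrel counts from this page and sums the three splits; `split_total` is a hypothetical helper. On these numbers the qrel counts of the splits add up to the 3767 listed for the complete set, while the query counts add up to 3268, three fewer than the 3271 listed for `mr-tydi/sw`. The page does not explain that difference.

```python
# Counts copied from the metadata blocks on this page for the Swahili subsets.
COUNTS = {
    "mr-tydi/sw": {"queries": 3271, "qrels": 3767},
    "mr-tydi/sw/dev": {"queries": 526, "qrels": 623},
    "mr-tydi/sw/test": {"queries": 670, "qrels": 743},
    "mr-tydi/sw/train": {"queries": 2072, "qrels": 2401},
}


def split_total(counts, base, field):
    """Sum one field over every subset whose ID starts with base + '/'."""
    prefix = base + "/"
    return sum(v[field] for k, v in counts.items() if k.startswith(prefix))


qrels_total = split_total(COUNTS, "mr-tydi/sw", "qrels")
queries_total = split_total(COUNTS, "mr-tydi/sw", "queries")
print(qrels_total, COUNTS["mr-tydi/sw"]["qrels"])
print(queries_total, COUNTS["mr-tydi/sw"]["queries"])
```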
Complete Telugu dataset, including all train, dev, and test queries and qrels.
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/te")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/te queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.te.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/te")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/te docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.te')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the store
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 5.5K | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/te")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/te qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.te.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787} }
@article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
{
  "docs": {
    "count": 548224,
    "fields": {
      "doc_id": {
        "max_len": 10,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 5517
  },
  "qrels": {
    "count": 5540,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 5540
        }
      }
    }
  }
}
Development set for Telugu
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/te/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/te/dev queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.te.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mr-tydi/te
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/te/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/te/dev docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.te.dev')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the store
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 983 | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/te/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/te/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.te.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787} }
@article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
{
  "docs": {
    "count": 548224,
    "fields": {
      "doc_id": {
        "max_len": 10,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 983
  },
  "qrels": {
    "count": 983,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 983
        }
      }
    }
  }
}
Test set for Telugu
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/te/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/te/test queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.te.test.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mr-tydi/te
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/te/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/te/test docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.te.test')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 677 | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/te/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/te/test qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.te.test.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, }
@article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
{
  "docs": {
    "count": 548224,
    "fields": {
      "doc_id": {
        "max_len": 10,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 646
  },
  "qrels": {
    "count": 677,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 677
        }
      }
    }
  }
}
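The metadata above lists 677 qrels for 646 queries, which suggests that some test queries have more than one relevant passage. A minimal sketch of how one might group qrels by query to find them follows. The `Qrel` namedtuple and the sample rows are hypothetical stand-ins for the objects yielded by `dataset.qrels_iter()`; the IDs are made up.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the namedtuple yielded by qrels_iter().
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def group_relevant_docs(qrels):
    """Map each query_id to the list of doc_ids with relevance > 0."""
    relevant = defaultdict(list)
    for qrel in qrels:
        if qrel.relevance > 0:
            relevant[qrel.query_id].append(qrel.doc_id)
    return dict(relevant)


# Made-up sample rows; real IDs come from the dataset itself.
sample_qrels = [
    Qrel("q1", "d10#0", 1, "0"),
    Qrel("q1", "d11#3", 1, "0"),
    Qrel("q2", "d20#1", 1, "0"),
]

relevant_by_query = group_relevant_docs(sample_qrels)
# Queries that have more than one relevant passage.
multi = [qid for qid, docs in relevant_by_query.items() if len(docs) > 1]
print(multi)
```

With real data, the same function could be given `dataset.qrels_iter()` directly in place of `sample_qrels`.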
Train set for Telugu
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/te/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/te/train queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.te.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mr-tydi/te
Language: te
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/te/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/te/train docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.te.train')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 3.9K | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/te/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/te/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.te.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, }
@article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
{
  "docs": {
    "count": 548224,
    "fields": {
      "doc_id": {
        "max_len": 10,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 3880
  },
  "qrels": {
    "count": 3880,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 3880
        }
      }
    }
  }
}
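Each of the 3880 training qrels above links a query to a passage with relevance 1. One possible use, sketched here under assumptions, is to join queries, passages, and qrels into (query text, positive passage text) pairs. The namedtuples mirror the fields shown in the examples above, but the helper `build_positive_pairs` and all sample values are hypothetical.

```python
from collections import namedtuple

# Hypothetical stand-ins mirroring the namedtuple fields shown above.
Query = namedtuple("Query", ["query_id", "text"])
Doc = namedtuple("Doc", ["doc_id", "text"])
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def build_positive_pairs(queries, docs, qrels):
    """Return (query_text, passage_text) for every qrel with relevance >= 1."""
    query_text = {q.query_id: q.text for q in queries}
    doc_text = {d.doc_id: d.text for d in docs}
    pairs = []
    for qrel in qrels:
        if qrel.relevance < 1:
            continue  # keep only positive labels
        if qrel.query_id in query_text and qrel.doc_id in doc_text:
            pairs.append((query_text[qrel.query_id], doc_text[qrel.doc_id]))
    return pairs


# Made-up sample data for illustration only.
queries = [Query("q1", "example question?")]
docs = [Doc("d1#0", "a relevant passage"), Doc("d2#0", "an unrelated passage")]
qrels = [Qrel("q1", "d1#0", 1, "0")]

pairs = build_positive_pairs(queries, docs, qrels)
print(pairs)
```

The sketch holds every passage in a dictionary, which is fine for a toy sample; for the full 548224-passage corpus, looking passages up by ID on demand is likely preferable.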
Complete Thai dataset, including all train, dev, and test queries and qrels.
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/th")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/th queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.th.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/th")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/th docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.th')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 5.5K | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/th")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/th qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.th.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, }
@article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
{
  "docs": {
    "count": 568855,
    "fields": {
      "doc_id": {
        "max_len": 10,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 5322
  },
  "qrels": {
    "count": 5545,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 5545
        }
      }
    }
  }
}
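The metadata block above can be read programmatically. A minimal sketch, assuming the block has been saved as a JSON string (here the counts are copied by hand from above, and `summarize` is a hypothetical helper), computes the average number of qrels per query and checks that the per-label counts add up to the qrels total.

```python
import json

# Counts copied from the metadata block above for mr-tydi/th.
metadata_json = """
{
  "docs": {"count": 568855},
  "queries": {"count": 5322},
  "qrels": {
    "count": 5545,
    "fields": {"relevance": {"counts_by_value": {"1": 5545}}}
  }
}
"""


def summarize(metadata):
    """Derive simple statistics from a metadata dictionary."""
    n_queries = metadata["queries"]["count"]
    n_qrels = metadata["qrels"]["count"]
    by_value = metadata["qrels"]["fields"]["relevance"]["counts_by_value"]
    return {
        "qrels_per_query": n_qrels / n_queries,
        # True when the per-label counts add up to the qrels total.
        "labels_sum_matches": sum(by_value.values()) == n_qrels,
    }


summary = summarize(json.loads(metadata_json))
print(summary)
```

For these counts the ratio is slightly above one qrel per query, consistent with the sparse labeling described for the benchmark.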
Development set for Thai
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/th/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/th/dev queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.th.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mr-tydi/th
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/th/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/th/dev docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.th.dev')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 817 | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/th/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/th/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
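The exported TSV can be read back with the standard library. The sketch below assumes the export has no header row and contains one tab-separated record per line in the column order shown above; `parse_qrels_tsv` is a hypothetical helper and the sample rows are made up.

```python
import csv
import io


def parse_qrels_tsv(handle):
    """Yield (query_id, doc_id, relevance, iteration) from tab-separated lines."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        query_id, doc_id, relevance, iteration = row
        yield query_id, doc_id, int(relevance), iteration


# Made-up sample standing in for the CLI output.
sample = io.StringIO("q1\td10#0\t1\t0\nq2\td20#1\t1\t0\n")
rows = list(parse_qrels_tsv(sample))
print(rows)
```

In practice the CLI output would be redirected to a file and that file opened in place of the `io.StringIO` sample.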
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.th.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, }
@article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
{
  "docs": {
    "count": 568855,
    "fields": {
      "doc_id": {
        "max_len": 10,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 807
  },
  "qrels": {
    "count": 817,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 817
        }
      }
    }
  }
}
Test set for Thai
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/th/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/th/test queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.th.test.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mr-tydi/th
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/th/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/th/test docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.th.test')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 1.4K | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/th/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/th/test qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.th.test.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, }
@article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
{
  "docs": {
    "count": 568855,
    "fields": {
      "doc_id": {
        "max_len": 10,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 1190
  },
  "qrels": {
    "count": 1368,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 1368
        }
      }
    }
  }
}
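Because every qrel in this test set has relevance 1, a ranking can be scored with a binary measure. The sketch below computes mean reciprocal rank as one possible choice; the cutoff of 100, the function names, and the sample run are all assumptions for illustration and are not prescribed by this page.

```python
def reciprocal_rank(ranked_doc_ids, relevant_doc_ids, cutoff=100):
    """Return 1/rank of the first relevant document within the cutoff, else 0."""
    for rank, doc_id in enumerate(ranked_doc_ids[:cutoff], start=1):
        if doc_id in relevant_doc_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(run, relevant_by_query, cutoff=100):
    """Average reciprocal rank over all queries that have qrels."""
    scores = [
        reciprocal_rank(run.get(query_id, []), relevant, cutoff)
        for query_id, relevant in relevant_by_query.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Made-up qrels and system output (query_id -> ranked doc_ids).
relevant_by_query = {"q1": {"d1"}, "q2": {"d5"}}
run = {"q1": ["d3", "d1"], "q2": ["d9", "d8"]}

mrr = mean_reciprocal_rank(run, relevant_by_query)
print(mrr)
```

Queries missing from the run score 0 in this sketch, so the average is taken over every query that has at least one qrel.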
Train set for Thai
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/th/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/th/train queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.mr-tydi.th.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from mr-tydi/th
Language: th
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/th/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export mr-tydi/th/train docs
[doc_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.mr-tydi.th.train')
for doc in dataset.iter_documents():
    print(doc)  # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 1 | Passage identified within Wikipedia article from top Google search results | 3.4K | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("mr-tydi/th/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export mr-tydi/th/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.mr-tydi.th.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@article{Zhang2021MrTyDi, title={{Mr. TyDi}: A Multi-lingual Benchmark for Dense Retrieval}, author={Xinyu Zhang and Xueguang Ma and Peng Shi and Jimmy Lin}, year={2021}, journal={arXiv:2108.08787}, }
@article{Clark2020TyDiQa, title={{TyDi QA}: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author={Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki}, year={2020}, journal={Transactions of the Association for Computational Linguistics} }
{
  "docs": {
    "count": 568855,
    "fields": {
      "doc_id": {
        "max_len": 10,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 3319
  },
  "qrels": {
    "count": 3360,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 3360
        }
      }
    }
  }
}