ir_datasets: WikiCLIR
A Cross-Language IR (CLIR) collection that pairs English queries with documents in other languages, built from Wikipedia.
Bibtex:
@inproceedings{sasaki-etal-2018-cross,
  title = "Cross-Lingual Learning-to-Rank with Shared Representations",
  author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro",
  booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)",
  month = jun,
  year = "2018",
  address = "New Orleans, Louisiana",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/N18-2073",
  doi = "10.18653/v1/N18-2073",
  pages = "458--463"
}
WikiCLIR with Arabic documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ar")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
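The loop above yields one namedtuple per query, so a small helper can peek at the first few records without walking the whole iterator. The following is a minimal sketch, not part of ir_datasets: `peek` and `load_wikiclir_queries` are hypothetical names, and the `DemoQuery` records are invented stand-ins shaped like the `namedtuple<query_id, text>` noted above, so the example runs without downloading any data.

```python
from collections import namedtuple
from itertools import islice


def peek(records, n=3):
    """Return the first n records of an iterator as a list."""
    return list(islice(records, n))


def load_wikiclir_queries(lang="ar"):
    """Hypothetical wrapper around the load/queries_iter calls shown above."""
    import ir_datasets  # imported lazily so peek() works without the package
    return ir_datasets.load(f"wikiclir/{lang}").queries_iter()


# Stand-in records with the same fields as a query namedtuple.
# The values are invented for illustration, not taken from WikiCLIR.
DemoQuery = namedtuple("DemoQuery", ["query_id", "text"])
demo = iter([
    DemoQuery("1", "first query"),
    DemoQuery("2", "second query"),
    DemoQuery("3", "third query"),
    DemoQuery("4", "fourth query"),
])

first_two = peek(demo, 2)
for q in first_two:
    print(q.query_id, q.text)
```

With ir_datasets installed, `peek(load_wikiclir_queries("ar"))` would apply the same helper to the real query iterator.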
WikiCLIR with Catalan documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ca")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
WikiCLIR with Czech documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/cs")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
WikiCLIR with German documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/de")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
WikiCLIR with Simple English documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/en-simple")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
WikiCLIR with Spanish documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/es")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
WikiCLIR with Finnish documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/fi")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
WikiCLIR with French documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/fr")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
WikiCLIR with Italian documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/it")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
WikiCLIR with Japanese documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ja")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
WikiCLIR with Korean documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ko")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
WikiCLIR with Dutch documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/nl")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
WikiCLIR with Norwegian (Nynorsk) documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/nn")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
WikiCLIR with Norwegian (Bokmål) documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/no")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
WikiCLIR with Polish documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/pl")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
WikiCLIR with Portuguese documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/pt")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
WikiCLIR with Romanian documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ro")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
WikiCLIR with Russian documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ru")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
WikiCLIR with Swedish documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/sv")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
WikiCLIR with Swahili documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/sw")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
WikiCLIR with Tagalog documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/tl")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
WikiCLIR with Turkish documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/tr")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
WikiCLIR with Ukrainian documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/uk")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
WikiCLIR with Vietnamese documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/vi")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
WikiCLIR with Chinese documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/zh")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
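Every dataset ID on this page follows the pattern `wikiclir/<code>`, so the IDs can be generated rather than typed out. The sketch below collects the codes listed on this page and builds the matching IDs; `WIKICLIR_CODES`, `wikiclir_id`, and `wikiclir_ids` are hypothetical helper names, not part of ir_datasets, and the tuple reflects only the subsets shown here.

```python
# Language codes of the WikiCLIR subsets listed on this page.
WIKICLIR_CODES = (
    "ar", "ca", "cs", "de", "en-simple", "es", "fi", "fr", "it",
    "ja", "ko", "nl", "nn", "no", "pl", "pt", "ro", "ru", "sv",
    "sw", "tl", "tr", "uk", "vi", "zh",
)


def wikiclir_id(code):
    """Build the dataset ID for one code, rejecting codes not listed above."""
    if code not in WIKICLIR_CODES:
        raise ValueError(f"unknown WikiCLIR language code: {code!r}")
    return f"wikiclir/{code}"


def wikiclir_ids():
    """Return the dataset IDs for every listed code, in page order."""
    return [wikiclir_id(code) for code in WIKICLIR_CODES]


print(wikiclir_id("ar"))
print(len(wikiclir_ids()))
```

Each returned string can be passed to `ir_datasets.load(...)` exactly as in the examples on this page.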