GitHub: datasets/wikiclir.py

ir_datasets: WikiCLIR

Index
  1. wikiclir
  2. wikiclir/ar
  3. wikiclir/ca
  4. wikiclir/cs
  5. wikiclir/de
  6. wikiclir/en-simple
  7. wikiclir/es
  8. wikiclir/fi
  9. wikiclir/fr
  10. wikiclir/it
  11. wikiclir/ja
  12. wikiclir/ko
  13. wikiclir/nl
  14. wikiclir/nn
  15. wikiclir/no
  16. wikiclir/pl
  17. wikiclir/pt
  18. wikiclir/ro
  19. wikiclir/ru
  20. wikiclir/sv
  21. wikiclir/sw
  22. wikiclir/tl
  23. wikiclir/tr
  24. wikiclir/uk
  25. wikiclir/vi
  26. wikiclir/zh

"wikiclir"

A cross-language IR (CLIR) collection built from Wikipedia, pairing English queries with documents in other languages.

Citation

ir_datasets.bib:

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

"wikiclir/ar"

WikiCLIR with Arabic documents.

queries
324K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/ar")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/ar queries
[query_id]    [text]
...

You can find more details about the CLI here.
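If you pipe the CLI export into another program, you can read the tab-separated lines back into `query_id -> text` pairs. Below is a minimal sketch. It assumes the default tab-separated layout shown above (`[query_id]`, then `[text]`), and `read_queries_tsv` is a hypothetical helper, not part of ir_datasets.

```python
import io
from typing import Dict, Iterable


def read_queries_tsv(lines: Iterable[str]) -> Dict[str, str]:
    """Map query_id -> text from `ir_datasets export ... queries` output.

    Splits on the first tab only, so any later tabs stay part of the text.
    """
    queries: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # tolerate blank lines
        query_id, text = line.split("\t", 1)
        queries[query_id] = text
    return queries


# In practice, pipe the CLI export into a script and pass sys.stdin instead.
sample = io.StringIO("q1\tfirst query\nq2\tsecond query\n")
print(read_queries_tsv(sample))
```

A line with no tab raises `ValueError` from the unpacking. This makes malformed input fail loudly instead of being silently skipped.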

PyTerrier

No example available for PyTerrier

docs
535K docs

Language: ar

Document type:
WikiClirDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/ar")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/ar docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.
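The docs export follows the same pattern with three columns. The sketch below streams it into records with the documented WikiClirDoc fields. It assumes the `[doc_id]`, `[title]`, `[text]` layout shown above and that titles contain no tabs. `WikiClirDocRow` and `read_docs_tsv` are hypothetical local names, not the library's own type.

```python
import io
from typing import Iterable, Iterator, NamedTuple


class WikiClirDocRow(NamedTuple):
    """Local stand-in mirroring the documented WikiClirDoc fields."""
    doc_id: str
    title: str
    text: str


def read_docs_tsv(lines: Iterable[str]) -> Iterator[WikiClirDocRow]:
    """Lazily yield documents from `ir_datasets export ... docs` output."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        doc_id, title, text = line.split("\t", 2)  # later tabs stay in text
        yield WikiClirDocRow(doc_id, title, text)


sample = io.StringIO("d1\tTitle one\tBody one\nd2\tTitle two\tBody two\n")
for doc in read_docs_tsv(sample):
    print(doc.doc_id, doc.title)
```

A generator is used rather than a list because the larger subsets (for example, 2.1M German documents) may not fit comfortably in memory all at once.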

PyTerrier

No example available for PyTerrier

qrels
519K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel. | Definition | Count | %
1 | All other articles that link to the mate, and are linked by the mate | 195K | 37.5%
2 | Document assigned to the (English) cross-lingual mate | 324K | 62.5%
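The two relevance levels can be read as a labeling rule over Wikipedia's link graph. The sketch below illustrates that reading; it is not the original construction code. It assumes an in-memory `links` mapping from each target-language article ID to the set of article IDs it links to. `label_query` is a hypothetical helper.

```python
from typing import Dict, List, Set, Tuple


def label_query(query_id: str, mate_id: str,
                links: Dict[str, Set[str]]) -> List[Tuple[str, str, int]]:
    """Return (query_id, doc_id, relevance) triples for one English query.

    Level 2: the mate (the target-language article matched to the English one).
    Level 1: every other article that links to the mate AND is linked by it.
    """
    qrels = [(query_id, mate_id, 2)]
    for doc_id in sorted(links.get(mate_id, set())):  # articles the mate links to
        if doc_id != mate_id and mate_id in links.get(doc_id, set()):
            qrels.append((query_id, doc_id, 1))  # mutual link with the mate
    return qrels


links = {
    "m": {"a", "b"},  # the mate links to a and b
    "a": {"m"},       # a links back to the mate, so it gets relevance 1
    "b": set(),       # b does not link back, so it is not judged
    "c": {"m"},       # c links to the mate, but the mate does not link to c
}
print(label_query("q1", "m", links))
```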

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/ar")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
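A common first step with these qrels is to find each query's cross-lingual mate, which is the document judged at relevance 2. In the table above, the level-2 count matches the query count, which suggests one mate per query. The sketch assumes this and keeps the last level-2 judgment if a query has several. `TrecQrel` here is a local stand-in with the documented fields, and `mates_by_query` is a hypothetical helper.

```python
from collections import namedtuple
from typing import Dict, Iterable

# Local stand-in with the same fields as the documented TrecQrel type.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def mates_by_query(qrels: Iterable[TrecQrel]) -> Dict[str, str]:
    """Map each query_id to the doc_id judged at relevance 2 (its mate)."""
    return {q.query_id: q.doc_id for q in qrels if q.relevance == 2}


sample = [
    TrecQrel("q1", "d1", 2, "0"),
    TrecQrel("q1", "d7", 1, "0"),
    TrecQrel("q2", "d3", 2, "0"),
]
print(mates_by_query(sample))
```

With ir_datasets installed, passing `dataset.qrels_iter()` instead of `sample` would apply the same logic to the real qrels, since it yields records with the same field names.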

CLI
ir_datasets export wikiclir/ar qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.
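The `--format tsv` qrels output can be loaded into a nested `{query_id: {doc_id: relevance}}` dict. This is the layout that pytrec_eval's `RelevanceEvaluator` accepts. The same dict also lets you recompute the per-level percentages in the relevance table. This is a minimal sketch assuming the column order in the CLI header above. `read_qrels_tsv` and `level_distribution` are hypothetical helpers, and the iteration value `0` in the sample is made up.

```python
import io
from collections import Counter
from typing import Dict, Iterable


def read_qrels_tsv(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Build {query_id: {doc_id: relevance}} from `--format tsv` qrels output.

    Column order follows the CLI header: query_id, doc_id, relevance, iteration.
    """
    qrels: Dict[str, Dict[str, int]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        query_id, doc_id, relevance, _iteration = line.split("\t")
        qrels.setdefault(query_id, {})[doc_id] = int(relevance)
    return qrels


def level_distribution(qrels: Dict[str, Dict[str, int]]) -> Dict[int, float]:
    """Percentage of judgments at each relevance level (cf. the table above)."""
    counts = Counter(rel for docs in qrels.values() for rel in docs.values())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {level: 100.0 * n / total for level, n in sorted(counts.items())}


sample = io.StringIO("q1\td1\t2\t0\nq1\td2\t1\t0\nq1\td3\t1\t0\nq2\td4\t2\t0\n")
qrels = read_qrels_tsv(sample)
print(level_distribution(qrels))
```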

PyTerrier

No example available for PyTerrier

Citation

ir_datasets.bib:

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

"wikiclir/ca"

WikiCLIR with Catalan documents.

queries
340K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/ca")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/ca queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
549K docs

Language: ca

Document type:
WikiClirDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/ca")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/ca docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
965K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel. | Definition | Count | %
1 | All other articles that link to the mate, and are linked by the mate | 626K | 64.8%
2 | Document assigned to the (English) cross-lingual mate | 340K | 35.2%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/ca")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/ca qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

Citation

ir_datasets.bib:

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

"wikiclir/cs"

WikiCLIR with Czech documents.

queries
234K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/cs")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/cs queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
387K docs

Language: cs

Document type:
WikiClirDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/cs")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/cs docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
954K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel. | Definition | Count | %
1 | All other articles that link to the mate, and are linked by the mate | 721K | 75.5%
2 | Document assigned to the (English) cross-lingual mate | 234K | 24.5%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/cs")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/cs qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

Citation

ir_datasets.bib:

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

"wikiclir/de"

WikiCLIR with German documents.

queries
938K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/de")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/de queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
2.1M docs

Language: de

Document type:
WikiClirDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/de")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/de docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
5.6M qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel. | Definition | Count | %
1 | All other articles that link to the mate, and are linked by the mate | 4.6M | 83.1%
2 | Document assigned to the (English) cross-lingual mate | 938K | 16.9%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/de")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/de qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

Citation

ir_datasets.bib:

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

"wikiclir/en-simple"

WikiCLIR with Simple English documents.

queries
115K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/en-simple")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/en-simple queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikiclir/en-simple')
index_ref = pt.IndexRef.of('./indices/wikiclir_en-simple') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs
127K docs

Language: en

Document type:
WikiClirDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/en-simple")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/en-simple docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikiclir/en-simple')
# Index wikiclir/en-simple
indexer = pt.IterDictIndexer('./indices/wikiclir_en-simple')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'text'])

You can find more details about PyTerrier indexing here.
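`dataset.get_corpus_iter()` already yields dicts in the shape PyTerrier's `IterDictIndexer` expects. If you feed documents from ir_datasets' own `docs_iter()`, which yields namedtuples, you need to convert them to dicts keyed by `docno` first. This is a minimal sketch of that conversion, assuming the indexer's `docno` convention. `to_indexer_dicts` is a hypothetical helper, and a local namedtuple stands in for WikiClirDoc so the snippet runs without downloading data.

```python
from collections import namedtuple
from typing import Dict, Iterable, Iterator

# Local stand-in with the same fields as the documented WikiClirDoc type.
WikiClirDoc = namedtuple("WikiClirDoc", ["doc_id", "title", "text"])


def to_indexer_dicts(docs: Iterable[WikiClirDoc]) -> Iterator[Dict[str, str]]:
    """Rename doc_id to docno (the ID key IterDictIndexer reads); keep title/text."""
    for doc in docs:
        yield {"docno": doc.doc_id, "title": doc.title, "text": doc.text}


docs = [WikiClirDoc("12", "Example", "Example body text.")]
print(list(to_indexer_dicts(docs)))
```

The same generator could wrap `ir_datasets.load("wikiclir/en-simple").docs_iter()` and be passed to `indexer.index(...)` in place of `dataset.get_corpus_iter()`.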

qrels
250K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel. | Definition | Count | %
1 | All other articles that link to the mate, and are linked by the mate | 136K | 54.2%
2 | Document assigned to the (English) cross-lingual mate | 115K | 45.8%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/en-simple")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/en-simple qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:wikiclir/en-simple')
index_ref = pt.IndexRef.of('./indices/wikiclir_en-simple') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.
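WikiCLIR qrels are graded: 2 for the mate and 1 for mutually linked articles. As a result, nDCG rewards ranking the mate above its neighbours. The sketch below computes nDCG@k for one query with linear gains and a log2 discount. This is one common formulation and may differ in details (for example, tie handling) from the implementation behind PyTerrier's `nDCG@20`. `dcg` and `ndcg_at_k` are hypothetical helpers.

```python
import math
from typing import Dict, List


def dcg(gains: List[int]) -> float:
    """Discounted cumulative gain: linear gains, log2(rank + 1) discount (1-based rank)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg_at_k(ranking: List[str], judgments: Dict[str, int], k: int = 20) -> float:
    """nDCG@k for one query; unjudged documents count as relevance 0."""
    gains = [judgments.get(doc_id, 0) for doc_id in ranking[:k]]
    ideal = sorted(judgments.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


judgments = {"mate": 2, "neighbour": 1}
print(ndcg_at_k(["mate", "neighbour", "other"], judgments))  # mate ranked first
print(ndcg_at_k(["neighbour", "mate", "other"], judgments))  # mate ranked second
```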

Citation

ir_datasets.bib:

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

"wikiclir/es"

WikiCLIR with Spanish documents.

queries
782K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/es")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/es queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
1.3M docs

Language: es

Document type:
WikiClirDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/es")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/es docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
2.9M qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel. | Definition | Count | %
1 | All other articles that link to the mate, and are linked by the mate | 2.1M | 73.0%
2 | Document assigned to the (English) cross-lingual mate | 781K | 27.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/es")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/es qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

Citation

ir_datasets.bib:

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

"wikiclir/fi"

WikiCLIR with Finnish documents.

queries
274K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/fi")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/fi queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
419K docs

Language: fi

Document type:
WikiClirDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/fi")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/fi docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
940K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel. | Definition | Count | %
1 | All other articles that link to the mate, and are linked by the mate | 666K | 70.9%
2 | Document assigned to the (English) cross-lingual mate | 274K | 29.1%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/fi")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/fi qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

Citation

ir_datasets.bib:

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

"wikiclir/fr"

WikiCLIR with French documents.

queries
1.1M queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/fr")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/fr queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
1.9M docs

Language: fr

Document type:
WikiClirDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/fr")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/fr docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
5.1M qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel. | Definition | Count | %
1 | All other articles that link to the mate, and are linked by the mate | 4.0M | 78.8%
2 | Document assigned to the (English) cross-lingual mate | 1.1M | 21.2%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/fr")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/fr qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

Citation

ir_datasets.bib:

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

"wikiclir/it"

WikiCLIR with Italian documents.

queries
809K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/it")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/it queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
1.3M docs

Language: it

Document type:
WikiClirDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/it")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/it docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
3.4M qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel. | Definition | Count | %
1 | All other articles that link to the mate, and are linked by the mate | 2.6M | 76.5%
2 | Document assigned to the (English) cross-lingual mate | 808K | 23.5%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/it")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/it qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

Citation

ir_datasets.bib:

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

"wikiclir/ja"

WikiCLIR with Japanese documents.

queries
426K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/ja")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/ja queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
1.1M docs

Language: ja

Document type:
WikiClirDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/ja")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/ja docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
3.3M qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel. | Definition | Count | %
1 | All other articles that link to the mate, and are linked by the mate | 2.9M | 87.2%
2 | Document assigned to the (English) cross-lingual mate | 426K | 12.8%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/ja")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/ja qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

Citation

ir_datasets.bib:

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

"wikiclir/ko"

WikiCLIR with Korean documents.

queries
225K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/ko")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/ko queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
394K docs

Language: ko

Document type:
WikiClirDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/ko")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/ko docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
568K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel. | Definition | Count | %
1 | All other articles that link to the mate, and are linked by the mate | 343K | 60.4%
2 | Document assigned to the (English) cross-lingual mate | 225K | 39.6%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/ko")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/ko qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

Citation

ir_datasets.bib:

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

"wikiclir/nl"

WikiCLIR with Dutch documents.

queries
688K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/nl")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/nl queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
1.9M docs

Language: nl

Document type:
WikiClirDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/nl")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/nl docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
2.3M qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel. | Definition | Count | %
1 | All other articles that link to the mate, and are linked by the mate | 1.6M | 70.5%
2 | Document assigned to the (English) cross-lingual mate | 688K | 29.5%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/nl")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/nl qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

Citation

ir_datasets.bib:

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

"wikiclir/nn"

WikiCLIR with Norwegian (Nynorsk) documents.

queries
99K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/nn")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/nn queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
133K docs

Language: nn

Document type:
WikiClirDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/nn")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/nn docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
250K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                                             Count  %
1     All other articles that link to the mate, and are linked by the mate   151K   60.2%
2     Document assigned to the (English) cross-lingual mate                  99K    39.8%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/nn")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/nn qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier
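The CLI export above writes one qrel per line. The sketch below reads that output back into tuples. It assumes the columns are tab-separated, appear in the order shown ([query_id], [doc_id], [relevance], [iteration]), and have no header row. The file name in the comment and the sample rows are hypothetical.

```python
import io
from collections import namedtuple

# Local stand-in for the TrecQrel namedtuple described above.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def parse_qrels_tsv(lines):
    """Yield TrecQrel tuples from tab-separated lines: query_id, doc_id, relevance, iteration."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        query_id, doc_id, relevance, iteration = line.split("\t")
        yield TrecQrel(query_id, doc_id, int(relevance), iteration)


# Hypothetical rows; in practice you might iterate over a file produced by
#   ir_datasets export wikiclir/nn qrels --format tsv > nn-qrels.tsv
exported = io.StringIO("q1\td1\t2\t0\nq1\td7\t1\t0\n")
rows = list(parse_qrels_tsv(exported))
print(rows[0])  # TrecQrel(query_id='q1', doc_id='d1', relevance=2, iteration='0')
```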

Citation

ir_datasets.bib:

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

"wikiclir/no"

WikiCLIR with Norwegian (Bokmål) documents.

queries
300K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/no")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/no queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
471K docs

Language: no

Document type:
WikiClirDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/no")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/no docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
964K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                                             Count  %
1     All other articles that link to the mate, and are linked by the mate   664K   68.9%
2     Document assigned to the (English) cross-lingual mate                  300K   31.1%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/no")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/no qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier
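In this subset's Relevance levels table, the relevance-2 count (300K) equals the number of queries (300K). That is consistent with every query having exactly one cross-lingual mate. The following sketch extracts each query's mate and fails loudly if that assumption is violated. It uses a local stand-in for `TrecQrel` and hypothetical sample rows.

```python
from collections import namedtuple

# Local stand-in for the TrecQrel namedtuple described above.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def cross_lingual_mates(qrels):
    """Map each query_id to the doc_id judged at relevance 2 (its cross-lingual mate).

    Raises ValueError if a query has two different relevance-2 documents.
    """
    mates = {}
    for qrel in qrels:
        if qrel.relevance != 2:
            continue
        previous = mates.setdefault(qrel.query_id, qrel.doc_id)
        if previous != qrel.doc_id:
            raise ValueError(f"query {qrel.query_id!r} has more than one mate")
    return mates


# Hypothetical sample rows, not taken from the dataset.
sample = [
    TrecQrel("q1", "d1", 2, "0"),
    TrecQrel("q1", "d2", 1, "0"),
    TrecQrel("q2", "d4", 2, "0"),
]
print(cross_lingual_mates(sample))  # {'q1': 'd1', 'q2': 'd4'}
```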

Citation

ir_datasets.bib:

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

"wikiclir/pl"

WikiCLIR with Polish documents.

queries
694K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/pl")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/pl queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
1.2M docs

Language: pl

Document type:
WikiClirDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/pl")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/pl docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
2.5M qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                                             Count  %
1     All other articles that link to the mate, and are linked by the mate   1.8M   71.9%
2     Document assigned to the (English) cross-lingual mate                  694K   28.1%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/pl")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/pl qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier
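Many evaluation tools expect judgments as a nested mapping of the form query, then document, then grade. Below is a minimal sketch that builds that shape from the qrels, with an optional threshold. Setting `min_relevance=2` keeps only the cross-lingual mate, which turns the graded judgments into a binary, mate-only set. The function name, the stand-in `TrecQrel`, and the sample rows are illustrative.

```python
from collections import defaultdict, namedtuple

# Local stand-in for the TrecQrel namedtuple described above.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def qrels_to_dict(qrels, min_relevance=1):
    """Group qrels into {query_id: {doc_id: relevance}}, dropping rows below min_relevance."""
    grouped = defaultdict(dict)
    for qrel in qrels:
        if qrel.relevance >= min_relevance:
            grouped[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(grouped)


# Hypothetical sample rows; real rows would come from
# ir_datasets.load("wikiclir/pl").qrels_iter().
sample = [
    TrecQrel("q1", "d1", 2, "0"),
    TrecQrel("q1", "d2", 1, "0"),
    TrecQrel("q2", "d4", 2, "0"),
]
graded = qrels_to_dict(sample)                      # both relevance levels
mate_only = qrels_to_dict(sample, min_relevance=2)  # cross-lingual mate only
```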

Citation

ir_datasets.bib:

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

"wikiclir/pt"

WikiCLIR with Portuguese documents.

queries
612K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/pt")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/pt queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
973K docs

Language: pt

Document type:
WikiClirDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/pt")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/pt docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
1.7M qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                                             Count  %
1     All other articles that link to the mate, and are linked by the mate   1.1M   64.9%
2     Document assigned to the (English) cross-lingual mate                  612K   35.1%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/pt")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/pt qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier
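The qrels are graded: 2 marks the mate, and 1 marks articles that link to the mate and are linked by it. A graded metric such as nDCG can therefore give credit at both levels. Below is a sketch of nDCG@k that uses the grade itself as the gain and applies a log2 rank discount. This is one common variant, not necessarily the metric or gain function used in the WikiCLIR paper, and the run and judgments are hypothetical.

```python
import math


def dcg(gains, k):
    """Discounted cumulative gain of the first k gains (rank 1 has discount log2(2) = 1)."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains[:k]))


def ndcg_at_k(ranked_doc_ids, judgments, k=10):
    """nDCG@k for one query; judgments maps doc_id -> relevance grade."""
    gains = [judgments.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal = sorted(judgments.values(), reverse=True)
    ideal_dcg = dcg(ideal, k)
    return dcg(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical judgments for one query and two candidate rankings.
judgments = {"d1": 2, "d2": 1, "d3": 1}
print(ndcg_at_k(["d1", "d2", "d3"], judgments))        # 1.0
print(round(ndcg_at_k(["d9", "d1"], judgments), 3))    # 0.403
```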

Citation

ir_datasets.bib:

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

"wikiclir/ro"

WikiCLIR with Romanian documents.

queries
199K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/ro")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/ro queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
377K docs

Language: ro

Document type:
WikiClirDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/ro")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/ro docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
451K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                                             Count  %
1     All other articles that link to the mate, and are linked by the mate   252K   55.8%
2     Document assigned to the (English) cross-lingual mate                  199K   44.2%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/ro")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/ro qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

Citation

ir_datasets.bib:

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

"wikiclir/ru"

WikiCLIR with Russian documents.

queries
665K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/ru")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/ru queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
1.4M docs

Language: ru

Document type:
WikiClirDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/ru")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/ru docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
2.3M qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                                             Count  %
1     All other articles that link to the mate, and are linked by the mate   1.7M   71.4%
2     Document assigned to the (English) cross-lingual mate                  665K   28.6%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/ru")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/ru qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier
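The three parts of each subset are connected through IDs. Each qrel links a query_id to a doc_id, and the relevance-2 document is the query's cross-lingual mate. The sketch below pairs each English query text with the title of its mate. It uses local stand-ins for the namedtuples described above and hypothetical sample records. With real data you would pass `queries_iter()`, `qrels_iter()` and `docs_iter()`. It keeps only the mates' titles in memory rather than all 1.4M documents.

```python
from collections import namedtuple

# Local stand-ins for the namedtuples described above.
GenericQuery = namedtuple("GenericQuery", ["query_id", "text"])
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])
WikiClirDoc = namedtuple("WikiClirDoc", ["doc_id", "title", "text"])


def query_mate_titles(queries, qrels, docs):
    """Yield (query_text, mate_title) for queries whose relevance-2 document is found."""
    mates = {q.query_id: q.doc_id for q in qrels if q.relevance == 2}
    wanted = set(mates.values())
    # Single pass over the documents, keeping titles of mates only.
    titles = {d.doc_id: d.title for d in docs if d.doc_id in wanted}
    for query in queries:
        doc_id = mates.get(query.query_id)
        if doc_id in titles:
            yield query.text, titles[doc_id]


# Hypothetical records, not taken from the dataset.
queries = [GenericQuery("q1", "example english query"), GenericQuery("q2", "another query")]
qrels = [TrecQrel("q1", "d1", 2, "0"), TrecQrel("q1", "d2", 1, "0"), TrecQrel("q2", "d9", 2, "0")]
docs = [WikiClirDoc("d1", "Пример", "text"), WikiClirDoc("d2", "Другое", "text")]
pairs = list(query_mate_titles(queries, qrels, docs))  # q2's mate d9 is absent, so it is skipped
```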

Citation

ir_datasets.bib:

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

"wikiclir/sv"

WikiCLIR with Swedish documents.

queries
639K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/sv")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/sv queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
3.8M docs

Language: sv

Document type:
WikiClirDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/sv")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/sv docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier
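With 3.8M documents, it can help to inspect the collection as a stream rather than materializing `docs_iter()` as a list. The sketch below makes a single pass and, optionally, stops after `limit` documents via `itertools.islice`. Length is measured in characters of the `text` field. The function name, the stand-in `WikiClirDoc`, and the sample documents are illustrative.

```python
from collections import namedtuple
from itertools import islice

# Local stand-in for the WikiClirDoc namedtuple described above.
WikiClirDoc = namedtuple("WikiClirDoc", ["doc_id", "title", "text"])


def doc_length_stats(docs, limit=None):
    """Return (number of docs seen, mean text length in characters), reading at most `limit` docs."""
    count = 0
    total_chars = 0
    for doc in islice(docs, limit):  # limit=None means read everything
        count += 1
        total_chars += len(doc.text)
    return count, (total_chars / count if count else 0.0)


# Hypothetical documents, not taken from the dataset.
sample = [
    WikiClirDoc("d1", "Exempel", "abcd"),
    WikiClirDoc("d2", "Annat", "ab"),
]
print(doc_length_stats(sample))           # (2, 3.0)
print(doc_length_stats(sample, limit=1))  # (1, 4.0)
```

On the real subset, a call such as `doc_length_stats(ir_datasets.load("wikiclir/sv").docs_iter(), limit=10_000)` would look at only the first 10,000 documents.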

qrels
2.1M qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                                             Count  %
1     All other articles that link to the mate, and are linked by the mate   1.4M   69.1%
2     Document assigned to the (English) cross-lingual mate                  639K   30.9%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/sv")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/sv qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

Citation

ir_datasets.bib:

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

"wikiclir/sw"

WikiCLIR with Swahili documents.

queries
23K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/sw")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/sw queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
37K docs

Language: sw

Document type:
WikiClirDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/sw")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/sw docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
58K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                                             Count  %
1     All other articles that link to the mate, and are linked by the mate   35K    60.5%
2     Document assigned to the (English) cross-lingual mate                  23K    39.5%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/sw")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/sw qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

Citation

ir_datasets.bib:

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

"wikiclir/tl"

WikiCLIR with Tagalog documents.

queries
49K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/tl")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/tl queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
79K docs

Language: tl

Document type:
WikiClirDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/tl")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/tl docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
72K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                                             Count  %
1     All other articles that link to the mate, and are linked by the mate   23K    32.4%
2     Document assigned to the (English) cross-lingual mate                  49K    67.6%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/tl")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/tl qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

Citation

ir_datasets.bib:

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

"wikiclir/tr"

WikiCLIR with Turkish documents.

queries
185K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/tr")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/tr queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
296K docs

Language: tr

Document type:
WikiClirDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/tr")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/tr docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
381K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                                             Count  %
1     All other articles that link to the mate, and are linked by the mate   195K   51.3%
2     Document assigned to the (English) cross-lingual mate                  185K   48.7%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/tr")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/tr qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier
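The relevance-1 definition above ("all other articles that link to the mate, and are linked by the mate") amounts to a two-way link test. The sketch below applies that test to a hypothetical link graph, represented as {article: set of articles it links to}. It illustrates the definition only and is not the code that was used to build WikiCLIR.

```python
def mutually_linked(mate, links):
    """Articles other than `mate` that link to `mate` and are linked from `mate`."""
    linked_from_mate = links.get(mate, set())
    return {
        article
        for article in linked_from_mate
        if article != mate and mate in links.get(article, set())
    }


# Hypothetical link graph: article -> set of articles it links to.
links = {
    "Mate": {"A", "B", "C"},
    "A": {"Mate"},  # two-way link: would be relevance 1
    "B": {"X"},     # linked from Mate only: not included
    "D": {"Mate"},  # links to Mate only: not included
}
print(mutually_linked("Mate", links))  # {'A'}
```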

Citation

ir_datasets.bib:

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

"wikiclir/uk"

WikiCLIR with Ukrainian documents.

queries
348K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/uk")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/uk queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
705K docs

Language: uk

Document type:
WikiClirDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/uk")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/uk docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
913K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                                             Count  %
1     All other articles that link to the mate, and are linked by the mate   565K   61.9%
2     Document assigned to the (English) cross-lingual mate                  348K   38.1%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/uk")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/uk qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

Citation

ir_datasets.bib:

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

"wikiclir/vi"

WikiCLIR with Vietnamese documents.

queries
354K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/vi")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/vi queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
1.4M docs

Language: vi

Document type:
WikiClirDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/vi")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/vi docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
611K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                                             Count  %
1     All other articles that link to the mate, and are linked by the mate   257K   42.1%
2     Document assigned to the (English) cross-lingual mate                  354K   57.9%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/vi")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/vi qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

Citation

ir_datasets.bib:

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

"wikiclir/zh"

WikiCLIR with Chinese documents.

queries
463K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/zh")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/zh queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
951K docs

Language: zh

Document type:
WikiClirDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/zh")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/zh docs
[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
926K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                                             Count  %
1     All other articles that link to the mate, and are linked by the mate   463K   50.0%
2     Document assigned to the (English) cross-lingual mate                  463K   50.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("wikiclir/zh")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export wikiclir/zh qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

Citation

ir_datasets.bib:

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }