ir_datasets: WikiCLIR
A Cross-Language IR (CLIR) collection pairing English queries with documents in other languages, built from Wikipedia.
Bibtex:
@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }
WikiCLIR with Arabic documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ar")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/ar queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.ar.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ar")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/ar docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.ar')
for doc in dataset.iter_documents():
    print(doc) # one document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | All other articles that link to the mate, and are linked by the mate | 195K | 37.5% |
| 2 | Document assigned to the (English) cross-lingual mate | 324K | 62.5% |
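As a rough illustration of how the two relevance levels might be used, the sketch below groups qrel records by query and keeps only those at or above a chosen level. The `Qrel` tuple simply mirrors the qrel fields documented for this dataset, the helper `relevant_docs` is hypothetical, and the sample IDs are invented rather than taken from the collection.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set


class Qrel(NamedTuple):
    # Mirrors the qrel fields documented for this dataset.
    query_id: str
    doc_id: str
    relevance: int
    iteration: str


def relevant_docs(qrels: Iterable[Qrel], min_relevance: int = 1) -> Dict[str, Set[str]]:
    """Map each query_id to the doc_ids judged at or above min_relevance."""
    by_query = defaultdict(set)
    for qrel in qrels:
        if qrel.relevance >= min_relevance:
            by_query[qrel.query_id].add(qrel.doc_id)
    return dict(by_query)


# Hypothetical records (IDs are made up for illustration only).
sample = [
    Qrel("q1", "d10", 2, "0"),  # level 2: the cross-lingual mate
    Qrel("q1", "d11", 1, "0"),  # level 1: links to, and is linked by, the mate
    Qrel("q2", "d20", 2, "0"),
]

mates_only = relevant_docs(sample, min_relevance=2)
all_relevant = relevant_docs(sample, min_relevance=1)
```

Passing `dataset.qrels_iter()` in place of `sample` should work the same way, since the records expose the same field names.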
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ar")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/ar qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.ar.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 535118,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
},
"queries": {
"count": 324489
},
"qrels": {
"count": 519269,
"fields": {
"relevance": {
"counts_by_value": {
"2": 324475,
"1": 194794
}
}
}
}
}
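The statistics above can be cross-checked against the relevance table. The following minimal sketch copies the numbers from the metadata block for wikiclir/ar, confirms that the per-level counts add up to the total number of qrels, and recomputes the percentage for each level.

```python
import json

# Statistics copied from the metadata block above (wikiclir/ar).
metadata = json.loads("""
{
  "queries": {"count": 324489},
  "qrels": {
    "count": 519269,
    "fields": {"relevance": {"counts_by_value": {"2": 324475, "1": 194794}}}
  }
}
""")

counts = metadata["qrels"]["fields"]["relevance"]["counts_by_value"]
total = sum(counts.values())

# The per-level counts should add up to the overall qrels count.
consistent = total == metadata["qrels"]["count"]

# Share of each relevance level, as a percentage rounded to one decimal place.
shares = {level: round(100 * n / total, 1) for level, n in counts.items()}
```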
WikiCLIR with Catalan documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ca")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/ca queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.ca.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: ca
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ca")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/ca docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.ca')
for doc in dataset.iter_documents():
    print(doc) # one document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | All other articles that link to the mate, and are linked by the mate | 626K | 64.8% |
| 2 | Document assigned to the (English) cross-lingual mate | 340K | 35.2% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ca")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/ca qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
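Once exported, the TSV output can be read back with the standard library. The sketch below assumes one tab-separated record per line with the four columns in the order shown above; `QrelRow` and `read_qrels_tsv` are hypothetical helper names, and the sample lines are invented stand-ins for real exported output.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class QrelRow(NamedTuple):
    query_id: str
    doc_id: str
    relevance: int
    iteration: str


def read_qrels_tsv(stream: TextIO) -> Iterator[QrelRow]:
    """Yield one QrelRow per tab-separated line of exported qrels."""
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for query_id, doc_id, relevance, iteration in reader:
        # Relevance is the only numeric column; IDs stay as strings.
        yield QrelRow(query_id, doc_id, int(relevance), iteration)


# Made-up lines standing in for the exported output.
exported = io.StringIO("100\t2001\t2\t0\n100\t2002\t1\t0\n")
rows = list(read_qrels_tsv(exported))
```

In practice the `io.StringIO` object would be replaced by an open file handle on the saved export.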
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.ca.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 548722,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
},
"queries": {
"count": 339586
},
"qrels": {
"count": 965233,
"fields": {
"relevance": {
"counts_by_value": {
"2": 339562,
"1": 625671
}
}
}
}
}
WikiCLIR with Czech documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/cs")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/cs queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.cs.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: cs
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/cs")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/cs docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.cs')
for doc in dataset.iter_documents():
    print(doc) # one document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | All other articles that link to the mate, and are linked by the mate | 721K | 75.5% |
| 2 | Document assigned to the (English) cross-lingual mate | 234K | 24.5% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/cs")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/cs qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.cs.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 386906,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
},
"queries": {
"count": 233553
},
"qrels": {
"count": 954370,
"fields": {
"relevance": {
"counts_by_value": {
"2": 233535,
"1": 720835
}
}
}
}
}
WikiCLIR with German documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/de")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/de queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.de.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/de")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/de docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.de')
for doc in dataset.iter_documents():
    print(doc) # one document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | All other articles that link to the mate, and are linked by the mate | 4.6M | 83.1% |
| 2 | Document assigned to the (English) cross-lingual mate | 938K | 16.9% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/de")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/de qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.de.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 2091278,
"fields": {
"doc_id": {
"max_len": 8,
"common_prefix": ""
}
}
},
"queries": {
"count": 938217
},
"qrels": {
"count": 5550454,
"fields": {
"relevance": {
"counts_by_value": {
"2": 938194,
"1": 4612260
}
}
}
}
}
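The counts above also give a feel for how densely each query is judged. The short sketch below uses the wikiclir/de numbers to estimate the average number of judged documents per query. The lower bound on queries without a level-2 judgment assumes that every qrel refers to a query in the query set, which is not verified here.

```python
# Counts copied from the wikiclir/de metadata above.
num_queries = 938217
num_qrels = 5550454
level2_qrels = 938194
level1_qrels = 4612260

# Average number of judged documents per query, overall and at level 1.
qrels_per_query = num_qrels / num_queries
level1_per_query = level1_qrels / num_queries

# Each level-2 judgment belongs to exactly one query, so this difference is a
# lower bound on the number of queries that have no level-2 judgment.
min_queries_without_mate = num_queries - level2_qrels

print(f"{qrels_per_query:.2f} judged documents per query")
print(f"{level1_per_query:.2f} level-1 documents per query")
print(f"at least {min_queries_without_mate} queries without a level-2 judgment")
```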
WikiCLIR with Simple English documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/en-simple")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/en-simple queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikiclir/en-simple')
index_ref = pt.IndexRef.of('./indices/wikiclir_en-simple') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.en-simple.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/en-simple")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/en-simple docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikiclir/en-simple')
# Index wikiclir/en-simple
indexer = pt.IterDictIndexer('./indices/wikiclir_en-simple')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.en-simple')
for doc in dataset.iter_documents():
    print(doc) # one document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | All other articles that link to the mate, and are linked by the mate | 136K | 54.2% |
| 2 | Document assigned to the (English) cross-lingual mate | 115K | 45.8% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/en-simple")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/en-simple qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:wikiclir/en-simple')
index_ref = pt.IndexRef.of('./indices/wikiclir_en-simple') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
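To make the role of the graded labels in nDCG@20 more concrete, here is a minimal pure-Python sketch of one common formulation (linear gain, log2 discount). It is an illustration only: the functions `dcg` and `ndcg_at_k` are hypothetical helpers, the judgments are invented, and the measure PyTerrier computes may differ in details such as the gain function or tie handling.

```python
import math
from typing import Dict, List


def dcg(gains: List[int]) -> float:
    """Discounted cumulative gain with linear gain and a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranking: List[str], judgments: Dict[str, int], k: int = 20) -> float:
    """nDCG@k for one query; unjudged documents count as relevance 0."""
    gains = [judgments.get(doc_id, 0) for doc_id in ranking[:k]]
    ideal = sorted(judgments.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical judgments for one query: one mate (2) and two linked articles (1).
judgments = {"d1": 2, "d2": 1, "d3": 1}

perfect = ndcg_at_k(["d1", "d2", "d3"], judgments)    # mate ranked first
mate_last = ndcg_at_k(["d2", "d3", "d1"], judgments)  # mate ranked third
```

Ranking the level-2 mate below the level-1 articles lowers the score, which is the behaviour that distinguishes graded measures such as nDCG from binary ones.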
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.en-simple.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 127089,
"fields": {
"doc_id": {
"max_len": 6,
"common_prefix": ""
}
}
},
"queries": {
"count": 114572
},
"qrels": {
"count": 250380,
"fields": {
"relevance": {
"counts_by_value": {
"2": 114564,
"1": 135816
}
}
}
}
}
WikiCLIR with Spanish documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/es")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/es queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.es.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/es")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/es docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.es')
for doc in dataset.iter_documents():
    print(doc) # one document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | All other articles that link to the mate, and are linked by the mate | 2.1M | 73.0% |
| 2 | Document assigned to the (English) cross-lingual mate | 781K | 27.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/es")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/es qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.es.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 1302958,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
},
"queries": {
"count": 781642
},
"qrels": {
"count": 2894807,
"fields": {
"relevance": {
"counts_by_value": {
"2": 781376,
"1": 2113431
}
}
}
}
}
WikiCLIR with Finnish documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/fi")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/fi queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.fi.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/fi")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/fi docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.fi')
for doc in dataset.iter_documents():
    print(doc) # one document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | All other articles that link to the mate, and are linked by the mate | 666K | 70.9% |
| 2 | Document assigned to the (English) cross-lingual mate | 274K | 29.1% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/fi")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/fi qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.fi.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 418677,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
},
"queries": {
"count": 273819
},
"qrels": {
"count": 939613,
"fields": {
"relevance": {
"counts_by_value": {
"2": 273796,
"1": 665817
}
}
}
}
}
WikiCLIR with French documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/fr")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/fr queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.fr.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/fr")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/fr docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.fr')
for doc in dataset.iter_documents():
    print(doc) # one document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | All other articles that link to the mate, and are linked by the mate | 4.0M | 78.8% |
| 2 | Document assigned to the (English) cross-lingual mate | 1.1M | 21.2% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/fr")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/fr qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.fr.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 1894397,
"fields": {
"doc_id": {
"max_len": 8,
"common_prefix": ""
}
}
},
"queries": {
"count": 1089179
},
"qrels": {
"count": 5137366,
"fields": {
"relevance": {
"counts_by_value": {
"2": 1089052,
"1": 4048314
}
}
}
}
}
WikiCLIR with Italian documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/it")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/it queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.it.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/it")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/it docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.it')
for doc in dataset.iter_documents():
    print(doc) # one document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | All other articles that link to the mate, and are linked by the mate | 2.6M | 76.5% |
| 2 | Document assigned to the (English) cross-lingual mate | 808K | 23.5% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/it")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/it qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.it.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 1347011,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
},
"queries": {
"count": 808605
},
"qrels": {
"count": 3443633,
"fields": {
"relevance": {
"counts_by_value": {
"2": 808345,
"1": 2635288
}
}
}
}
}
WikiCLIR with Japanese documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ja")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/ja queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.ja.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ja")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/ja docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.ja')
for doc in dataset.iter_documents():
    print(doc) # one document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | All other articles that link to the mate, and are linked by the mate | 2.9M | 87.2% |
| 2 | Document assigned to the (English) cross-lingual mate | 426K | 12.8% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ja")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/ja qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.ja.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
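The two relevance levels can be separated per query once the qrels are loaded. The sketch below uses a stand-in `Qrel` namedtuple with the fields listed above and made-up IDs, and it assumes `relevance` is an integer; `split_by_level` is a hypothetical helper, not an ir_datasets function.

```python
from collections import defaultdict, namedtuple

# Stand-in for the qrel records shown above; the IDs below are made up.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def split_by_level(qrels):
    """Map each query_id to its level-2 (mate) and level-1 (linked) doc_ids."""
    mates = defaultdict(list)
    linked = defaultdict(list)
    for qrel in qrels:
        if qrel.relevance == 2:
            mates[qrel.query_id].append(qrel.doc_id)
        elif qrel.relevance == 1:
            linked[qrel.query_id].append(qrel.doc_id)
    return dict(mates), dict(linked)


sample = [
    Qrel("q1", "d10", 2, "0"),
    Qrel("q1", "d11", 1, "0"),
    Qrel("q1", "d12", 1, "0"),
    Qrel("q2", "d20", 2, "0"),
]
mates, linked = split_by_level(sample)
print(mates)
print(linked)
```

With the real collection, the same function could be fed `dataset.qrels_iter()` in place of `sample`.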
Bibtex:
@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }
{
"docs": {
"count": 1071292,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
},
"queries": {
"count": 426431
},
"qrels": {
"count": 3338667,
"fields": {
"relevance": {
"counts_by_value": {
"2": 426383,
"1": 2912284
}
}
}
}
}
WikiCLIR with Korean documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ko")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/ko queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.ko.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ko")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/ko docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.ko')
for doc in dataset.iter_documents():
    print(doc) # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | All other articles that link to the mate, and are linked by the mate | 343K | 60.4% |
| 2 | Document assigned to the (English) cross-lingual mate | 225K | 39.6% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ko")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/ko qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
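Output exported this way can be read back with the standard library. This is a minimal sketch which assumes the export is tab-separated, has no header row, and uses the column order shown above; the sample rows and the `read_qrels_tsv` helper are hypothetical.

```python
import csv
import io

# Hypothetical export output: query_id, doc_id, relevance, iteration.
exported = "q1\td10\t2\t0\nq1\td11\t1\t0\n"


def read_qrels_tsv(handle):
    """Yield (query_id, doc_id, relevance) tuples from a TSV qrels export."""
    reader = csv.reader(handle, delimiter="\t")
    for query_id, doc_id, relevance, _iteration in reader:
        # Relevance is parsed as an integer so it can be compared numerically.
        yield query_id, doc_id, int(relevance)


rows = list(read_qrels_tsv(io.StringIO(exported)))
print(rows)
```

For a real export, an open file handle would take the place of the `io.StringIO` object.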
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.ko.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }
{
"docs": {
"count": 394177,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
},
"queries": {
"count": 224855
},
"qrels": {
"count": 568205,
"fields": {
"relevance": {
"counts_by_value": {
"2": 224843,
"1": 343362
}
}
}
}
}
WikiCLIR with Dutch documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/nl")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/nl queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.nl.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: nl
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/nl")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/nl docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.nl')
for doc in dataset.iter_documents():
    print(doc) # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
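Because the collection is large, it can help to inspect only the first few documents rather than the whole iterator. The sketch below uses a stand-in `Doc` namedtuple with the fields listed above and made-up contents; `title_index` is a hypothetical helper.

```python
from collections import namedtuple
from itertools import islice

# Stand-in for the doc records shown above; contents are made up.
Doc = namedtuple("Doc", ["doc_id", "title", "text"])


def title_index(docs, limit):
    """Map doc_id to title for the first `limit` documents of an iterator."""
    return {doc.doc_id: doc.title for doc in islice(docs, limit)}


sample_docs = iter([
    Doc("1", "Amsterdam", "..."),
    Doc("2", "Rotterdam", "..."),
    Doc("3", "Utrecht", "..."),
])
index = title_index(sample_docs, limit=2)
print(index)
```

`islice` stops after `limit` records, so the rest of the iterator is never read.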
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | All other articles that link to the mate, and are linked by the mate | 1.6M | 70.5% |
| 2 | Document assigned to the (English) cross-lingual mate | 688K | 29.5% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/nl")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/nl qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.nl.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }
{
"docs": {
"count": 1908260,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
},
"queries": {
"count": 687718
},
"qrels": {
"count": 2334644,
"fields": {
"relevance": {
"counts_by_value": {
"2": 687672,
"1": 1646972
}
}
}
}
}
WikiCLIR with Norwegian (Nynorsk) documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/nn")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/nn queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.nn.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: nn
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/nn")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/nn docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.nn')
for doc in dataset.iter_documents():
    print(doc) # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | All other articles that link to the mate, and are linked by the mate | 151K | 60.2% |
| 2 | Document assigned to the (English) cross-lingual mate | 99K | 39.8% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/nn")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/nn qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.nn.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }
{
"docs": {
"count": 133290,
"fields": {
"doc_id": {
"max_len": 6,
"common_prefix": ""
}
}
},
"queries": {
"count": 99493
},
"qrels": {
"count": 250141,
"fields": {
"relevance": {
"counts_by_value": {
"2": 99465,
"1": 150676
}
}
}
}
}
WikiCLIR with Norwegian (Bokmål) documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/no")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/no queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.no.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: no
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/no")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/no docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.no')
for doc in dataset.iter_documents():
    print(doc) # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | All other articles that link to the mate, and are linked by the mate | 664K | 68.9% |
| 2 | Document assigned to the (English) cross-lingual mate | 300K | 31.1% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/no")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/no qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.no.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }
{
"docs": {
"count": 471420,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
},
"queries": {
"count": 299897
},
"qrels": {
"count": 963514,
"fields": {
"relevance": {
"counts_by_value": {
"2": 299831,
"1": 663683
}
}
}
}
}
WikiCLIR with Polish documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/pl")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/pl queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.pl.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: pl
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/pl")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/pl docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.pl')
for doc in dataset.iter_documents():
    print(doc) # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | All other articles that link to the mate, and are linked by the mate | 1.8M | 71.9% |
| 2 | Document assigned to the (English) cross-lingual mate | 694K | 28.1% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/pl")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/pl qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.pl.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
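Queries and qrels can be joined to pair each English query with its cross-lingual mate (relevance 2). The sketch below uses stand-in namedtuples with the fields listed above and made-up records, and it assumes `relevance` is an integer; `mate_pairs` is a hypothetical helper, not an ir_datasets function.

```python
from collections import namedtuple

# Stand-ins for the query and qrel records shown above; values are made up.
Query = namedtuple("Query", ["query_id", "text"])
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def mate_pairs(queries, qrels):
    """Yield (query text, doc_id) for every relevance-2 judgement."""
    # All query texts are held in memory for the lookup.
    text_by_id = {query.query_id: query.text for query in queries}
    for qrel in qrels:
        if qrel.relevance == 2 and qrel.query_id in text_by_id:
            yield text_by_id[qrel.query_id], qrel.doc_id


sample_queries = [
    Query("q1", "Warsaw is the capital of Poland"),
    Query("q2", "Vistula river"),
]
sample_qrels = [
    Qrel("q1", "d5", 2, "0"),
    Qrel("q1", "d6", 1, "0"),
    Qrel("q2", "d9", 2, "0"),
]
pairs = list(mate_pairs(sample_queries, sample_qrels))
print(pairs)
```

With the real collection, `dataset.queries_iter()` and `dataset.qrels_iter()` could replace the two sample lists.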
Bibtex:
@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }
{
"docs": {
"count": 1234316,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
},
"queries": {
"count": 693656
},
"qrels": {
"count": 2471360,
"fields": {
"relevance": {
"counts_by_value": {
"2": 693604,
"1": 1777756
}
}
}
}
}
WikiCLIR with Portuguese documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/pt")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/pt queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.pt.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/pt")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/pt docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.pt')
for doc in dataset.iter_documents():
    print(doc) # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | All other articles that link to the mate, and are linked by the mate | 1.1M | 64.9% |
| 2 | Document assigned to the (English) cross-lingual mate | 612K | 35.1% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/pt")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/pt qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.pt.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }
{
"docs": {
"count": 973057,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
},
"queries": {
"count": 611732
},
"qrels": {
"count": 1741889,
"fields": {
"relevance": {
"counts_by_value": {
"2": 611643,
"1": 1130246
}
}
}
}
}
WikiCLIR with Romanian documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ro")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/ro queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.ro.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: ro
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ro")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/ro docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.ro')
for doc in dataset.iter_documents():
    print(doc) # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | All other articles that link to the mate, and are linked by the mate | 252K | 55.8% |
| 2 | Document assigned to the (English) cross-lingual mate | 199K | 44.2% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ro")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/ro qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.ro.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }
{
"docs": {
"count": 376655,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
},
"queries": {
"count": 199264
},
"qrels": {
"count": 451180,
"fields": {
"relevance": {
"counts_by_value": {
"2": 199253,
"1": 251927
}
}
}
}
}
WikiCLIR with Russian documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ru")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/ru queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.ru.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ru")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/ru docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.ru')
for doc in dataset.iter_documents():
    print(doc) # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | All other articles that link to the mate, and are linked by the mate | 1.7M | 71.4% |
| 2 | Document assigned to the (English) cross-lingual mate | 665K | 28.6% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ru")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/ru qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.ru.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }
{
"docs": {
"count": 1413945,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
},
"queries": {
"count": 664924
},
"qrels": {
"count": 2321384,
"fields": {
"relevance": {
"counts_by_value": {
"2": 664780,
"1": 1656604
}
}
}
}
}
WikiCLIR with Swedish documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/sv")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/sv queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.sv.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: sv
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/sv")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/sv docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.sv')
for doc in dataset.iter_documents():
    print(doc) # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | All other articles that link to the mate, and are linked by the mate | 1.4M | 69.1% |
| 2 | Document assigned to the (English) cross-lingual mate | 639K | 30.9% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/sv")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/sv qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.sv.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }
{
"docs": {
"count": 3785412,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
},
"queries": {
"count": 639073
},
"qrels": {
"count": 2069453,
"fields": {
"relevance": {
"counts_by_value": {
"2": 638829,
"1": 1430624
}
}
}
}
}
WikiCLIR with Swahili documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/sw")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/sw queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.sw.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/sw")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/sw docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.sw')
for doc in dataset.iter_documents():
    print(doc) # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | All other articles that link to the mate, and are linked by the mate | 35K | 60.5% |
| 2 | Document assigned to the (English) cross-lingual mate | 23K | 39.5% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/sw")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/sw qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.sw.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
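Qrels are often easier to work with once grouped by query. The sketch below shows one way to do that and to pick out the level-2 documents (the cross-lingual mates) per query. It does not download anything: `Qrel` is a stand-in for the namedtuple yielded by `qrels_iter()`, the sample rows are made up, and `group_by_query` and `mates` are illustrative helper names.

```python
from collections import defaultdict, namedtuple

# Stand-in for the namedtuple yielded by dataset.qrels_iter(); rows below are made up.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def group_by_query(qrels):
    """Map query_id -> {doc_id: relevance}."""
    grouped = defaultdict(dict)
    for qrel in qrels:
        grouped[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(grouped)


def mates(grouped, mate_level=2):
    """Map query_id -> list of doc_ids judged at the mate level."""
    return {
        qid: [doc_id for doc_id, rel in docs.items() if rel == mate_level]
        for qid, docs in grouped.items()
    }


sample = [
    Qrel("q1", "d10", 2, "0"),
    Qrel("q1", "d11", 1, "0"),
    Qrel("q2", "d20", 2, "0"),
]
grouped = group_by_query(sample)
print(mates(grouped))
```

With the real dataset, `sample` would be replaced by `dataset.qrels_iter()`, assuming the yielded tuples expose the fields listed above.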
Bibtex:
@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }
{
"docs": {
"count": 37079,
"fields": {
"doc_id": {
"max_len": 5,
"common_prefix": ""
}
}
},
"queries": {
"count": 22860
},
"qrels": {
"count": 57924,
"fields": {
"relevance": {
"counts_by_value": {
"2": 22859,
"1": 35065
}
}
}
}
}
WikiCLIR with Tagalog documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/tl")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/tl queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.tl.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: tl
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/tl")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/tl docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.tl')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | All other articles that link to the mate, and are linked by the mate | 23K | 32.4% |
| 2 | Document assigned to the (English) cross-lingual mate | 49K | 67.6% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/tl")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/tl qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.tl.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }
{
"docs": {
"count": 79008,
"fields": {
"doc_id": {
"max_len": 6,
"common_prefix": ""
}
}
},
"queries": {
"count": 48930
},
"qrels": {
"count": 72359,
"fields": {
"relevance": {
"counts_by_value": {
"2": 48928,
"1": 23431
}
}
}
}
}
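The metadata block above can be sanity-checked with a few lines of standard-library Python. The sketch below copies the `wikiclir/tl` counts (the `docs.fields` entry is omitted for brevity), confirms that the per-level counts add up to the total number of qrels, and derives the average number of judgments per query. The variable names are illustrative, and the source does not say why the number of level-2 judgments differs from the number of queries.

```python
import json

# Counts copied from the wikiclir/tl metadata above (docs.fields omitted).
metadata = json.loads("""
{
  "docs": {"count": 79008},
  "queries": {"count": 48930},
  "qrels": {
    "count": 72359,
    "fields": {"relevance": {"counts_by_value": {"2": 48928, "1": 23431}}}
  }
}
""")

by_value = metadata["qrels"]["fields"]["relevance"]["counts_by_value"]

# The per-level counts should sum to the total qrel count.
levels_sum = sum(by_value.values())

# Average number of judged documents per query.
qrels_per_query = metadata["qrels"]["count"] / metadata["queries"]["count"]

# Difference between the number of queries and the number of level-2 judgments.
gap = metadata["queries"]["count"] - by_value["2"]

print(levels_sum == metadata["qrels"]["count"])
print(round(qrels_per_query, 2))
print(gap)
```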
WikiCLIR with Turkish documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/tr")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/tr queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.tr.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: tr
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/tr")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/tr docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.tr')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | All other articles that link to the mate, and are linked by the mate | 195K | 51.3% |
| 2 | Document assigned to the (English) cross-lingual mate | 185K | 48.7% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/tr")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/tr qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
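Output exported with `--format tsv` can be read back with Python's `csv` module. The sketch below assumes one judgment per line with tab-separated columns in the order shown above (`query_id`, `doc_id`, `relevance`, `iteration`) and no header row; the sample text is made up and stands in for a real export, and `read_qrels_tsv` is an illustrative helper name.

```python
import csv
import io

# Made-up rows standing in for the output of the export command.
exported = "q1\td10\t2\t0\nq1\td11\t1\t0\nq2\td20\t2\t0\n"


def read_qrels_tsv(stream):
    """Yield (query_id, doc_id, relevance, iteration), with relevance as an int."""
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for query_id, doc_id, relevance, iteration in reader:
        yield query_id, doc_id, int(relevance), iteration


rows = list(read_qrels_tsv(io.StringIO(exported)))
print(rows)
```

For a file on disk, `io.StringIO(exported)` would be replaced by an open file handle, for example one opened with `newline=""` as the `csv` documentation recommends.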
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.tr.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }
{
"docs": {
"count": 295593,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
},
"queries": {
"count": 185388
},
"qrels": {
"count": 380651,
"fields": {
"relevance": {
"counts_by_value": {
"2": 185360,
"1": 195291
}
}
}
}
}
WikiCLIR with Ukrainian documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/uk")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/uk queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.uk.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: uk
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/uk")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/uk docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.uk')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | All other articles that link to the mate, and are linked by the mate | 565K | 61.9% |
| 2 | Document assigned to the (English) cross-lingual mate | 348K | 38.1% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/uk")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/uk qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.uk.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }
{
"docs": {
"count": 704903,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
},
"queries": {
"count": 348222
},
"qrels": {
"count": 913358,
"fields": {
"relevance": {
"counts_by_value": {
"2": 348168,
"1": 565190
}
}
}
}
}
WikiCLIR with Vietnamese documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/vi")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/vi queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.vi.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: vi
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/vi")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/vi docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.vi')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | All other articles that link to the mate, and are linked by the mate | 257K | 42.1% |
| 2 | Document assigned to the (English) cross-lingual mate | 354K | 57.9% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/vi")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/vi qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.vi.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }
{
"docs": {
"count": 1392152,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
},
"queries": {
"count": 354312
},
"qrels": {
"count": 611355,
"fields": {
"relevance": {
"counts_by_value": {
"2": 354279,
"1": 257076
}
}
}
}
}
WikiCLIR with Chinese documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/zh")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/zh queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.zh.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/zh")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/zh docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.zh')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | All other articles that link to the mate, and are linked by the mate | 463K | 50.0% |
| 2 | Document assigned to the (English) cross-lingual mate | 463K | 50.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/zh")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/zh qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.zh.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }
{
"docs": {
"count": 951480,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
},
"queries": {
"count": 463273
},
"qrels": {
"count": 926130,
"fields": {
"relevance": {
"counts_by_value": {
"2": 463209,
"1": 462921
}
}
}
}
}