ir_datasets: WikiCLIR
A Cross-Language IR (CLIR) collection with English queries and documents in other languages, built from Wikipedia.
Bibtex:
@inproceedings{sasaki-etal-2018-cross,
  title = "Cross-Lingual Learning-to-Rank with Shared Representations",
  author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro",
  booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)",
  month = jun,
  year = "2018",
  address = "New Orleans, Louisiana",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/N18-2073",
  doi = "10.18653/v1/N18-2073",
  pages = "458--463"
}
WikiCLIR with Arabic documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ar")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, first_sent>
You can find more details about the Python API here.
ir_datasets export wikiclir/ar queries
[query_id] [title] [first_sent]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.ar.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
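Each query record carries a `title` and a `first_sent` field, as the namedtuple above shows. The following is a minimal sketch of turning such a record into a single query string. The `Query` stand-in, the `build_query_text` helper, and the sample values are hypothetical and not part of ir_datasets; concatenating the two fields is only one possible choice.

```python
from collections import namedtuple

# Stand-in with the same field names as the records yielded by queries_iter();
# the real class is provided by ir_datasets.
Query = namedtuple("Query", ["query_id", "title", "first_sent"])

def build_query_text(query, use_first_sent=True):
    """Return the title, optionally followed by the first sentence."""
    parts = [query.title.strip()]
    if use_first_sent and query.first_sent:
        parts.append(query.first_sent.strip())
    # Drop empty pieces so that no stray spaces are produced.
    return " ".join(p for p in parts if p)

# Made-up record, for illustration only.
example = Query("q1", "Example title", "Example first sentence.")
print(build_query_text(example))
print(build_query_text(example, use_first_sent=False))
```

Title-only queries are shorter and closer to keyword search, while adding the first sentence gives a retrieval model more context to match against.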
Language: ar
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ar")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/ar docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.ar')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
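The document collections here hold hundreds of thousands to millions of records, so it can help to inspect only the first few before iterating over everything. The sketch below uses `itertools.islice` on a generic iterator. The `Doc` stand-in, the `peek` helper, and the generated records are hypothetical; with ir_datasets the iterator would presumably be `dataset.docs_iter()`.

```python
from collections import namedtuple
from itertools import islice

# Stand-in with the same field names as the records yielded by docs_iter().
Doc = namedtuple("Doc", ["doc_id", "title", "text"])

def peek(iterable, n=3):
    """Return the first n items of an iterable as a list."""
    return list(islice(iterable, n))

# A lazy generator of made-up documents; only the peeked items are created.
fake_docs = (Doc(str(i), f"title {i}", f"text {i}") for i in range(1_000_000))
first = peek(fake_docs, 2)
print([d.doc_id for d in first])
```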
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | All other articles that link to the mate, and are linked by the mate | 195K | 37.5% |
2 | Document assigned to the (English) cross-lingual mate | 324K | 62.5% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ar")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/ar qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.ar.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
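Relevance judgments arrive as a flat stream of `(query_id, doc_id, relevance, iteration)` records, while evaluation code usually wants them grouped per query. A minimal sketch of that grouping follows. The `Qrel` stand-in, the `group_qrels` helper, and the toy records are hypothetical and only mirror the field names shown above.

```python
from collections import defaultdict, namedtuple

# Stand-in with the same field names as the records yielded by qrels_iter().
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])

def group_qrels(qrels):
    """Map each query_id to a {doc_id: relevance} dictionary."""
    grouped = defaultdict(dict)
    for qrel in qrels:
        grouped[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(grouped)

# Made-up judgments: level 2 for the mate, level 1 for a linked article.
toy = [
    Qrel("q1", "d1", 2, "0"),
    Qrel("q1", "d2", 1, "0"),
    Qrel("q2", "d3", 2, "0"),
]
by_query = group_qrels(toy)
print(by_query["q1"])
```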
Metadata:
{ "docs": { "count": 535118, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 324489 }, "qrels": { "count": 519269, "fields": { "relevance": { "counts_by_value": { "2": 324475, "1": 194794 } } } } }
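The percentages in the relevance table can be recomputed from the `counts_by_value` entry of the metadata. The sketch below does this for the Arabic subset, using the qrels numbers from the JSON above; the `relevance_shares` helper is a hypothetical name used for illustration.

```python
import json

# The qrels part of the wikiclir/ar metadata, copied from the JSON above.
metadata = json.loads(
    '{"qrels": {"count": 519269, "fields": {"relevance": '
    '{"counts_by_value": {"2": 324475, "1": 194794}}}}}'
)

def relevance_shares(meta):
    """Return the percentage of judgments at each relevance level."""
    counts = meta["qrels"]["fields"]["relevance"]["counts_by_value"]
    total = sum(counts.values())
    return {level: round(100 * n / total, 1) for level, n in counts.items()}

shares = relevance_shares(metadata)
print(shares)
```

The two counts add up to the reported total of 519269 judgments, and the rounded shares match the 62.5% and 37.5% listed in the table.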
WikiCLIR with Catalan documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ca")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, first_sent>
You can find more details about the Python API here.
ir_datasets export wikiclir/ca queries
[query_id] [title] [first_sent]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.ca.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: ca
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ca")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/ca docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.ca')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | All other articles that link to the mate, and are linked by the mate | 626K | 64.8% |
2 | Document assigned to the (English) cross-lingual mate | 340K | 35.2% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ca")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/ca qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
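The exported qrels can be read back with the standard `csv` module. The sketch below assumes that the TSV export writes one judgment per line with tab-separated columns in the order shown above (`query_id`, `doc_id`, `relevance`, `iteration`) and no header row. The `read_qrels_tsv` helper and the sample lines are hypothetical.

```python
import csv
import io

def read_qrels_tsv(handle):
    """Yield (query_id, doc_id, relevance, iteration) tuples from tab-separated text."""
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        query_id, doc_id, relevance, iteration = row
        yield query_id, doc_id, int(relevance), iteration

# Made-up sample in the assumed column order; a real file would be opened instead.
sample = io.StringIO("q1\td1\t2\t0\nq1\td2\t1\t0\n")
rows = list(read_qrels_tsv(sample))
print(rows)
```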
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.ca.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Metadata:
{ "docs": { "count": 548722, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 339586 }, "qrels": { "count": 965233, "fields": { "relevance": { "counts_by_value": { "2": 339562, "1": 625671 } } } } }
WikiCLIR with Czech documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/cs")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, first_sent>
You can find more details about the Python API here.
ir_datasets export wikiclir/cs queries
[query_id] [title] [first_sent]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.cs.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: cs
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/cs")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/cs docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.cs')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | All other articles that link to the mate, and are linked by the mate | 721K | 75.5% |
2 | Document assigned to the (English) cross-lingual mate | 234K | 24.5% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/cs")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/cs qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.cs.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Metadata:
{ "docs": { "count": 386906, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 233553 }, "qrels": { "count": 954370, "fields": { "relevance": { "counts_by_value": { "2": 233535, "1": 720835 } } } } }
WikiCLIR with German documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/de")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, first_sent>
You can find more details about the Python API here.
ir_datasets export wikiclir/de queries
[query_id] [title] [first_sent]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.de.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/de")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/de docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.de')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | All other articles that link to the mate, and are linked by the mate | 4.6M | 83.1% |
2 | Document assigned to the (English) cross-lingual mate | 938K | 16.9% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/de")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/de qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.de.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Metadata:
{ "docs": { "count": 2091278, "fields": { "doc_id": { "max_len": 8, "common_prefix": "" } } }, "queries": { "count": 938217 }, "qrels": { "count": 5550454, "fields": { "relevance": { "counts_by_value": { "2": 938194, "1": 4612260 } } } } }
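The query and qrels counts in the metadata also show how densely each subset is judged. The sketch below divides the number of judgments by the number of queries for the Arabic and German subsets, using the counts reported in their metadata. The `judgments_per_query` helper and the `counts` dictionary layout are hypothetical.

```python
# (queries, qrels) counts copied from the wikiclir/ar and wikiclir/de metadata.
counts = {
    "ar": {"queries": 324489, "qrels": 519269},
    "de": {"queries": 938217, "qrels": 5550454},
}

def judgments_per_query(stats):
    """Return the average number of judged documents per query."""
    return stats["qrels"] / stats["queries"]

per_query = {lang: judgments_per_query(stats) for lang, stats in counts.items()}
for lang, value in per_query.items():
    print(f"{lang}: {value:.2f} judgments per query")
```

A higher ratio means that more level-1 (linked) articles accompany each level-2 mate, which fits the larger level-1 share in the German relevance table.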
WikiCLIR with Simple English documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/en-simple")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, first_sent>
You can find more details about the Python API here.
ir_datasets export wikiclir/en-simple queries
[query_id] [title] [first_sent]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikiclir/en-simple')
index_ref = pt.IndexRef.of('./indices/wikiclir_en-simple') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('title'))
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.en-simple.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/en-simple")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/en-simple docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikiclir/en-simple')
# Index wikiclir/en-simple
indexer = pt.IterDictIndexer('./indices/wikiclir_en-simple')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'text'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.en-simple')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | All other articles that link to the mate, and are linked by the mate | 136K | 54.2% |
2 | Document assigned to the (English) cross-lingual mate | 115K | 45.8% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/en-simple")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/en-simple qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:wikiclir/en-simple')
index_ref = pt.IndexRef.of('./indices/wikiclir_en-simple') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('title'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
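The experiment above reports nDCG@20, which rewards rankings that place level-2 documents (the cross-lingual mate) above level-1 documents (linked articles). The following is a plain-Python sketch of the measure with linear gains, meant only to show how the graded labels enter the score. The `dcg` and `ndcg_at_k` helpers and the toy judgments are hypothetical, and the implementation used by PyTerrier may differ in details.

```python
import math

def dcg(gains):
    """Discounted cumulative gain with linear gains and a log2 rank discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg_at_k(ranked_doc_ids, judged, k=20):
    """nDCG@k for one query; unjudged documents count as relevance 0."""
    gains = [judged.get(doc_id, 0) for doc_id in ranked_doc_ids[:k]]
    ideal = sorted(judged.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

# Made-up judgments for one query: one mate (2) and two linked articles (1).
judged = {"d1": 2, "d2": 1, "d3": 1}
perfect = ndcg_at_k(["d1", "d2", "d3"], judged)
worse = ndcg_at_k(["d9", "d3", "d1"], judged)
print(perfect, worse)
```

In the second ranking an unjudged document sits at rank 1 and the mate drops to rank 3, so the score falls below that of the ideal ordering.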
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.en-simple.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Metadata:
{ "docs": { "count": 127089, "fields": { "doc_id": { "max_len": 6, "common_prefix": "" } } }, "queries": { "count": 114572 }, "qrels": { "count": 250380, "fields": { "relevance": { "counts_by_value": { "2": 114564, "1": 135816 } } } } }
WikiCLIR with Spanish documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/es")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, first_sent>
You can find more details about the Python API here.
ir_datasets export wikiclir/es queries
[query_id] [title] [first_sent]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.es.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/es")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/es docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.es')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | All other articles that link to the mate, and are linked by the mate | 2.1M | 73.0% |
2 | Document assigned to the (English) cross-lingual mate | 781K | 27.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/es")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/es qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.es.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Metadata:
{ "docs": { "count": 1302958, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 781642 }, "qrels": { "count": 2894807, "fields": { "relevance": { "counts_by_value": { "2": 781376, "1": 2113431 } } } } }
WikiCLIR with Finnish documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/fi")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, first_sent>
You can find more details about the Python API here.
ir_datasets export wikiclir/fi queries
[query_id] [title] [first_sent]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.fi.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: fi
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/fi")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/fi docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.fi')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | All other articles that link to the mate, and are linked by the mate | 666K | 70.9% |
2 | Document assigned to the (English) cross-lingual mate | 274K | 29.1% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/fi")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/fi qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.fi.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Metadata:
{ "docs": { "count": 418677, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 273819 }, "qrels": { "count": 939613, "fields": { "relevance": { "counts_by_value": { "2": 273796, "1": 665817 } } } } }
WikiCLIR with French documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/fr")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, first_sent>
You can find more details about the Python API here.
ir_datasets export wikiclir/fr queries
[query_id] [title] [first_sent]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.fr.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/fr")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/fr docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.fr')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | All other articles that link to the mate, and are linked by the mate | 4.0M | 78.8% |
2 | Document assigned to the (English) cross-lingual mate | 1.1M | 21.2% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/fr")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/fr qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.fr.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Metadata:
{ "docs": { "count": 1894397, "fields": { "doc_id": { "max_len": 8, "common_prefix": "" } } }, "queries": { "count": 1089179 }, "qrels": { "count": 5137366, "fields": { "relevance": { "counts_by_value": { "2": 1089052, "1": 4048314 } } } } }
WikiCLIR with Italian documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/it")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, first_sent>
You can find more details about the Python API here.
ir_datasets export wikiclir/it queries
[query_id] [title] [first_sent]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.it.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/it")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/it docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.it')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | All other articles that link to the mate, and are linked by the mate | 2.6M | 76.5% |
2 | Document assigned to the (English) cross-lingual mate | 808K | 23.5% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/it")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/it qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.it.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Metadata:
{ "docs": { "count": 1347011, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 808605 }, "qrels": { "count": 3443633, "fields": { "relevance": { "counts_by_value": { "2": 808345, "1": 2635288 } } } } }
WikiCLIR with Japanese documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ja")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, first_sent>
You can find more details about the Python API here.
ir_datasets export wikiclir/ja queries
[query_id] [title] [first_sent]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.ja.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: ja
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ja")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/ja docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.ja')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | All other articles that link to the mate, and are linked by the mate | 2.9M | 87.2% |
2 | Document assigned to the (English) cross-lingual mate | 426K | 12.8% |
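The percentages in the table can be reproduced from the raw relevance counts. The following is a minimal sketch, assuming the `counts_by_value` figures listed in the metadata for `wikiclir/ja` on this page; the helper name `relevance_shares` is hypothetical and not part of ir_datasets.

```python
# Relevance counts for wikiclir/ja, copied from the metadata on this page.
counts_by_value = {2: 426383, 1: 2912284}


def relevance_shares(counts):
    """Return {relevance level: percentage of all qrels}, rounded to one decimal."""
    total = sum(counts.values())
    return {rel: round(100 * n / total, 1) for rel, n in counts.items()}


shares = relevance_shares(counts_by_value)
print(shares)
```

The same helper should work for the other language pairs by swapping in their counts.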
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ja")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/ja qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.ja.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
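A common next step is to separate the cross-lingual mate (relevance 2) from the linked articles (relevance 1) for each query. The sketch below assumes only the qrel fields shown above; the `Qrel` stand-in, the `split_by_level` helper, and the sample rows are made up for illustration and do not come from the dataset.

```python
from collections import namedtuple

# Stand-in with the same fields as the qrel namedtuple shown above (hypothetical).
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def split_by_level(qrels):
    """Map each query_id to its relevance-2 ("mate") and relevance-1 ("linked") doc_ids."""
    out = {}
    for q in qrels:
        entry = out.setdefault(q.query_id, {"mate": [], "linked": []})
        key = "mate" if q.relevance == 2 else "linked"
        entry[key].append(q.doc_id)
    return out


# Made-up rows, not real WikiCLIR judgments.
sample = [
    Qrel("q1", "d10", 2, "0"),
    Qrel("q1", "d11", 1, "0"),
    Qrel("q2", "d20", 2, "0"),
]
grouped = split_by_level(sample)
print(grouped)
```

With the real data, `split_by_level(dataset.qrels_iter())` would hold every judgment in memory, so it may be worth restricting it to a subset of queries.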
Bibtex:
@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }
Metadata:
{ "docs": { "count": 1071292, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 426431 }, "qrels": { "count": 3338667, "fields": { "relevance": { "counts_by_value": { "2": 426383, "1": 2912284 } } } } }
WikiCLIR with Korean documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ko")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, first_sent>
You can find more details about the Python API here.
ir_datasets export wikiclir/ko queries
[query_id] [title] [first_sent]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.ko.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: ko
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ko")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/ko docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.ko')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | All other articles that link to the mate, and are linked by the mate | 343K | 60.4% |
2 | Document assigned to the (English) cross-lingual mate | 225K | 39.6% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ko")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/ko qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
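Output exported with `--format tsv` can be read back with Python's standard `csv` module. This is a rough sketch, assuming tab-separated rows in the column order shown above (`query_id`, `doc_id`, `relevance`, `iteration`); the `read_qrels_tsv` helper and the sample rows are hypothetical.

```python
import csv
import io


def read_qrels_tsv(handle):
    """Yield (query_id, doc_id, relevance, iteration) tuples, with relevance as an int."""
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        query_id, doc_id, relevance, iteration = row
        yield query_id, doc_id, int(relevance), iteration


# Made-up rows standing in for real export output.
fake_export = io.StringIO("100\t200\t2\t0\n100\t201\t1\t0\n")
rows = list(read_qrels_tsv(fake_export))
print(rows)
```

In practice the handle would be an open file containing the redirected CLI output rather than an in-memory string.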
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.ko.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }
Metadata:
{ "docs": { "count": 394177, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 224855 }, "qrels": { "count": 568205, "fields": { "relevance": { "counts_by_value": { "2": 224843, "1": 343362 } } } } }
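The metadata for each dataset can be sanity-checked with a few lines of Python. The sketch below uses a trimmed copy of the `wikiclir/ko` metadata from this page (the `doc_id` field details are dropped) and checks that the per-level counts add up to the qrels total. The variable names are illustrative only.

```python
import json

# Trimmed copy of the wikiclir/ko metadata listed on this page.
metadata = json.loads("""
{ "docs": { "count": 394177 },
  "queries": { "count": 224855 },
  "qrels": { "count": 568205,
             "fields": { "relevance": { "counts_by_value": { "2": 224843, "1": 343362 } } } } }
""")

qrels = metadata["qrels"]
by_value = qrels["fields"]["relevance"]["counts_by_value"]

# The per-level counts should add up to the total number of qrels.
total_matches = sum(by_value.values()) == qrels["count"]

# Difference between the number of queries and the number of relevance-2 judgments.
queries_minus_level2 = metadata["queries"]["count"] - by_value["2"]
print(total_matches, queries_minus_level2)
```

If each query has at most one relevance-2 document (an assumption, not something this page states), the second figure would be the number of queries without a judged mate.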
WikiCLIR with Dutch documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/nl")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, first_sent>
You can find more details about the Python API here.
ir_datasets export wikiclir/nl queries
[query_id] [title] [first_sent]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.nl.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
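Each query carries two text fields, `title` and `first_sent`, so a retrieval pipeline has to decide which text to search with. The following is one possible sketch, not a method prescribed by the dataset: the `Query` stand-in, the `query_text` helper, and the sample values are hypothetical.

```python
from collections import namedtuple

# Stand-in with the same fields as the query namedtuple shown above (hypothetical).
Query = namedtuple("Query", ["query_id", "title", "first_sent"])


def query_text(query, use_first_sent=True):
    """Build a search string from the title, optionally followed by the first sentence."""
    parts = [query.title]
    if use_first_sent and query.first_sent:
        parts.append(query.first_sent)
    return " ".join(parts)


# Made-up query, not taken from the dataset.
q = Query("1", "Tokyo", "Tokyo is the capital of Japan.")
print(query_text(q))
print(query_text(q, use_first_sent=False))
```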
Language: nl
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/nl")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/nl docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.nl')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
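With corpora this large, it can help to look at only the first few records instead of iterating over everything. The sketch below is a generic helper built on `itertools.islice`; the name `peek` is hypothetical, and the sample records are placeholders rather than real documents.

```python
from itertools import islice


def peek(records, n=3):
    """Return the first n items of any iterator without consuming the rest."""
    return list(islice(records, n))


# Placeholder records standing in for documents.
first_two = peek(({"doc_id": str(i)} for i in range(5)), n=2)
print(first_two)
```

The same call should work on the iterator from the Python API example above, for instance `peek(dataset.docs_iter())`.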
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | All other articles that link to the mate, and are linked by the mate | 1.6M | 70.5% |
2 | Document assigned to the (English) cross-lingual mate | 688K | 29.5% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/nl")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/nl qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.nl.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }
Metadata:
{ "docs": { "count": 1908260, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 687718 }, "qrels": { "count": 2334644, "fields": { "relevance": { "counts_by_value": { "2": 687672, "1": 1646972 } } } } }
WikiCLIR with Norwegian (Nynorsk) documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/nn")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, first_sent>
You can find more details about the Python API here.
ir_datasets export wikiclir/nn queries
[query_id] [title] [first_sent]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.nn.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: nn
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/nn")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/nn docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.nn')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | All other articles that link to the mate, and are linked by the mate | 151K | 60.2% |
2 | Document assigned to the (English) cross-lingual mate | 99K | 39.8% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/nn")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/nn qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.nn.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }
Metadata:
{ "docs": { "count": 133290, "fields": { "doc_id": { "max_len": 6, "common_prefix": "" } } }, "queries": { "count": 99493 }, "qrels": { "count": 250141, "fields": { "relevance": { "counts_by_value": { "2": 99465, "1": 150676 } } } } }
WikiCLIR with Norwegian (Bokmål) documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/no")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, first_sent>
You can find more details about the Python API here.
ir_datasets export wikiclir/no queries
[query_id] [title] [first_sent]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.no.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: no
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/no")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/no docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.no')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | All other articles that link to the mate, and are linked by the mate | 664K | 68.9% |
2 | Document assigned to the (English) cross-lingual mate | 300K | 31.1% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/no")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/no qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.no.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }
Metadata:
{ "docs": { "count": 471420, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 299897 }, "qrels": { "count": 963514, "fields": { "relevance": { "counts_by_value": { "2": 299831, "1": 663683 } } } } }
WikiCLIR with Polish documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/pl")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, first_sent>
You can find more details about the Python API here.
ir_datasets export wikiclir/pl queries
[query_id] [title] [first_sent]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.pl.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: pl
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/pl")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/pl docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.pl')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | All other articles that link to the mate, and are linked by the mate | 1.8M | 71.9% |
2 | Document assigned to the (English) cross-lingual mate | 694K | 28.1% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/pl")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/pl qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.pl.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }
Metadata:
{ "docs": { "count": 1234316, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 693656 }, "qrels": { "count": 2471360, "fields": { "relevance": { "counts_by_value": { "2": 693604, "1": 1777756 } } } } }
WikiCLIR with Portuguese documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/pt")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, first_sent>
You can find more details about the Python API here.
ir_datasets export wikiclir/pt queries
[query_id] [title] [first_sent]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.pt.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: pt
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/pt")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/pt docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.pt')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | All other articles that link to the mate, and are linked by the mate | 1.1M | 64.9% |
2 | Document assigned to the (English) cross-lingual mate | 612K | 35.1% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/pt")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/pt qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.pt.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }
Metadata:
{ "docs": { "count": 973057, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 611732 }, "qrels": { "count": 1741889, "fields": { "relevance": { "counts_by_value": { "2": 611643, "1": 1130246 } } } } }
WikiCLIR with Romanian documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ro")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, first_sent>
You can find more details about the Python API here.
ir_datasets export wikiclir/ro queries
[query_id] [title] [first_sent]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.ro.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: ro
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ro")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/ro docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.ro')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | All other articles that link to the mate, and are linked by the mate | 252K | 55.8% |
2 | Document assigned to the (English) cross-lingual mate | 199K | 44.2% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ro")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/ro qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.ro.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }
Metadata:
{ "docs": { "count": 376655, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 199264 }, "qrels": { "count": 451180, "fields": { "relevance": { "counts_by_value": { "2": 199253, "1": 251927 } } } } }
WikiCLIR with Russian documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ru")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, first_sent>
You can find more details about the Python API here.
ir_datasets export wikiclir/ru queries
[query_id] [title] [first_sent]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.ru.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ru")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/ru docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.ru')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | All other articles that link to the mate, and are linked by the mate | 1.7M | 71.4% |
2 | Document assigned to the (English) cross-lingual mate | 665K | 28.6% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/ru")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/ru qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.ru.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }
Metadata:
{ "docs": { "count": 1413945, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 664924 }, "qrels": { "count": 2321384, "fields": { "relevance": { "counts_by_value": { "2": 664780, "1": 1656604 } } } } }
WikiCLIR with Swedish documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/sv")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, first_sent>
You can find more details about the Python API here.
ir_datasets export wikiclir/sv queries
[query_id] [title] [first_sent]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.sv.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: sv
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/sv")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/sv docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.sv')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | All other articles that link to the mate, and are linked by the mate | 1.4M | 69.1% |
2 | Document assigned to the (English) cross-lingual mate | 639K | 30.9% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/sv")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/sv qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.sv.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }
Metadata:
{ "docs": { "count": 3785412, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 639073 }, "qrels": { "count": 2069453, "fields": { "relevance": { "counts_by_value": { "2": 638829, "1": 1430624 } } } } }
WikiCLIR with Swahili documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/sw")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, first_sent>
You can find more details about the Python API here.
ir_datasets export wikiclir/sw queries
[query_id] [title] [first_sent]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.sw.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: sw
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/sw")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/sw docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.sw')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | All other articles that link to the mate, and are linked by the mate | 35K | 60.5% |
2 | Document assigned to the (English) cross-lingual mate | 23K | 39.5% |
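The percentage column can be recomputed from the raw per-level judgment counts. The sketch below assumes the counts are available as a mapping from relevance level to number of judgments, using the `counts_by_value` figures recorded for wikiclir/sw (22,859 at level 2 and 35,065 at level 1). `relevance_percentages` is a hypothetical helper written for illustration, not part of ir_datasets.

```python
def relevance_percentages(counts_by_value):
    """Return {level: share of all judgments in percent}, rounded to one decimal."""
    total = sum(counts_by_value.values())
    return {
        level: round(100 * count / total, 1)
        for level, count in sorted(counts_by_value.items())
    }

# Counts recorded for wikiclir/sw: level 2 = 22859, level 1 = 35065.
sw_percentages = relevance_percentages({"2": 22859, "1": 35065})
print(sw_percentages)
```

The result agrees with the 60.5% and 39.5% shown in the table.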
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/sw")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/sw qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.sw.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }
{ "docs": { "count": 37079, "fields": { "doc_id": { "max_len": 5, "common_prefix": "" } } }, "queries": { "count": 22860 }, "qrels": { "count": 57924, "fields": { "relevance": { "counts_by_value": { "2": 22859, "1": 35065 } } } } }
WikiCLIR with Tagalog documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/tl")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, first_sent>
You can find more details about the Python API here.
ir_datasets export wikiclir/tl queries
[query_id] [title] [first_sent]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.tl.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: tl
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/tl")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/tl docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.tl')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | All other articles that link to the mate, and are linked by the mate | 23K | 32.4% |
2 | Document assigned to the (English) cross-lingual mate | 49K | 67.6% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/tl")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/tl qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.tl.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
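Evaluation code often wants the judgments grouped by query rather than as a flat stream. The following is a minimal sketch of that grouping, assuming only that each record exposes the `query_id`, `doc_id` and `relevance` fields of the namedtuple shown above. `group_qrels` and the sample IDs are made up for illustration.

```python
from collections import defaultdict, namedtuple

def group_qrels(qrels):
    """Build {query_id: {doc_id: relevance}} from an iterable of qrel records."""
    grouped = defaultdict(dict)
    for qrel in qrels:
        grouped[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(grouped)

# Stand-in records with the same fields as the qrels namedtuple;
# the IDs are invented for this example.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])
sample = [
    Qrel("q1", "d1", 2, "0"),
    Qrel("q1", "d7", 1, "0"),
    Qrel("q2", "d3", 2, "0"),
]
by_query = group_qrels(sample)
print(by_query)
```

With the real collection, the same function could be called as `group_qrels(dataset.qrels_iter())`.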
Bibtex:
@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }
{ "docs": { "count": 79008, "fields": { "doc_id": { "max_len": 6, "common_prefix": "" } } }, "queries": { "count": 48930 }, "qrels": { "count": 72359, "fields": { "relevance": { "counts_by_value": { "2": 48928, "1": 23431 } } } } }
WikiCLIR with Turkish documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/tr")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, first_sent>
You can find more details about the Python API here.
ir_datasets export wikiclir/tr queries
[query_id] [title] [first_sent]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.tr.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: tr
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/tr")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/tr docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.tr')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
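The Turkish collection holds 295,593 documents, so it can be handy to look at only the first few records before running a full pass. The sketch below uses `itertools.islice` on any iterator. `peek` is a hypothetical helper, and the generator of fake documents merely stands in for `dataset.docs_iter()`.

```python
from collections import namedtuple
from itertools import islice

def peek(records, n=3):
    """Return the first n records of an iterable as a list."""
    return list(islice(records, n))

# Stand-in for dataset.docs_iter(): a lazy stream of records with the
# same fields as the docs namedtuple. The contents are invented.
Doc = namedtuple("Doc", ["doc_id", "title", "text"])
fake_docs = (Doc(str(i), f"title {i}", f"text {i}") for i in range(1000))

first_docs = peek(fake_docs, n=2)
for doc in first_docs:
    print(doc.doc_id, doc.title)
```

Because `islice` stops after `n` items, the remaining records are never produced.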
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | All other articles that link to the mate, and are linked by the mate | 195K | 51.3% |
2 | Document assigned to the (English) cross-lingual mate | 185K | 48.7% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/tr")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/tr qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.tr.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }
{ "docs": { "count": 295593, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 185388 }, "qrels": { "count": 380651, "fields": { "relevance": { "counts_by_value": { "2": 185360, "1": 195291 } } } } }
WikiCLIR with Ukrainian documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/uk")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, first_sent>
You can find more details about the Python API here.
ir_datasets export wikiclir/uk queries
[query_id] [title] [first_sent]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.uk.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: uk
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/uk")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/uk docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.uk')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | All other articles that link to the mate, and are linked by the mate | 565K | 61.9% |
2 | Document assigned to the (English) cross-lingual mate | 348K | 38.1% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/uk")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/uk qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.uk.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
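Relevance level 2 marks the document assigned to the query's cross-lingual mate, so filtering on that level gives a query-to-mate lookup. The sketch below assumes at most one level-2 document per query, which the source does not state explicitly. `mate_docs` and the sample IDs are hypothetical.

```python
from collections import namedtuple

MATE_LEVEL = 2  # "Document assigned to the (English) cross-lingual mate"

def mate_docs(qrels):
    """Map each query_id to the doc_id judged at the mate level."""
    return {q.query_id: q.doc_id for q in qrels if q.relevance == MATE_LEVEL}

# Invented records with the same fields as the qrels namedtuple.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])
sample = [
    Qrel("q1", "d1", 2, "0"),
    Qrel("q1", "d7", 1, "0"),
    Qrel("q2", "d3", 1, "0"),  # q2 has no level-2 judgment in this sample
]
mates = mate_docs(sample)
print(mates.get("q1"), mates.get("q2"))
```

For wikiclir/uk the recorded counts are 348,222 queries against 348,168 level-2 judgments, so a lookup on real data should tolerate queries without a mate, as `dict.get` does here.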
Bibtex:
@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }
{ "docs": { "count": 704903, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 348222 }, "qrels": { "count": 913358, "fields": { "relevance": { "counts_by_value": { "2": 348168, "1": 565190 } } } } }
WikiCLIR with Vietnamese documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/vi")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, first_sent>
You can find more details about the Python API here.
ir_datasets export wikiclir/vi queries
[query_id] [title] [first_sent]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.vi.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
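Each query record carries two text fields, `title` and `first_sent`, while most retrieval pipelines expect a single query string. The sketch below shows one possible way to combine them. Which fields to use is a modelling choice that the source does not prescribe, and `query_text` and the sample record are invented for illustration.

```python
from collections import namedtuple

def query_text(query, fields=("title", "first_sent")):
    """Join the chosen, non-empty query fields into one space-separated string."""
    parts = (getattr(query, name, "") for name in fields)
    return " ".join(p.strip() for p in parts if p and p.strip())

# Invented record with the same fields as the queries namedtuple.
Query = namedtuple("Query", ["query_id", "title", "first_sent"])
q = Query("1", "Hanoi", "Hanoi is the capital of Vietnam.")

both = query_text(q)
sentence_only = query_text(q, fields=("first_sent",))
print(both)
print(sentence_only)
```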
Language: vi
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/vi")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/vi docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.vi')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | All other articles that link to the mate, and are linked by the mate | 257K | 42.1% |
2 | Document assigned to the (English) cross-lingual mate | 354K | 57.9% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/vi")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/vi qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.vi.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }
{ "docs": { "count": 1392152, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 354312 }, "qrels": { "count": 611355, "fields": { "relevance": { "counts_by_value": { "2": 354279, "1": 257076 } } } } }
WikiCLIR with Chinese documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/zh")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, first_sent>
You can find more details about the Python API here.
ir_datasets export wikiclir/zh queries
[query_id] [title] [first_sent]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.zh.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/zh")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>
You can find more details about the Python API here.
ir_datasets export wikiclir/zh docs
[doc_id] [title] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.zh')
for doc in dataset.iter_documents():
    print(doc) # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | All other articles that link to the mate, and are linked by the mate | 463K | 50.0% |
2 | Document assigned to the (English) cross-lingual mate | 463K | 50.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikiclir/zh")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikiclir/zh qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.zh.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
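The qrels namedtuple lists its fields as `query_id, doc_id, relevance, iteration`, whereas the conventional TREC qrels file layout orders the columns as query, iteration, document, relevance. The sketch below reorders a record into that layout. `to_trec_qrels_line` is a hypothetical helper and the sample values are invented.

```python
from collections import namedtuple

def to_trec_qrels_line(qrel):
    """Format one record as 'query_id iteration doc_id relevance'."""
    return f"{qrel.query_id} {qrel.iteration} {qrel.doc_id} {qrel.relevance}"

# Invented record with the same fields as the qrels namedtuple.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])
line = to_trec_qrels_line(Qrel("12", "34", 2, "0"))
print(line)
```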
Bibtex:
@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }
{ "docs": { "count": 951480, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 463273 }, "qrels": { "count": 926130, "fields": { "relevance": { "counts_by_value": { "2": 463209, "1": 462921 } } } } }