ir_datasets: WikiCLIR

324K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/ar")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.
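For in-memory lookups it is often convenient to turn the query iterator into a dictionary. The following is a minimal sketch that assumes only that each query exposes the `query_id` and `text` fields listed above. The helper name `queries_to_dict` is hypothetical. A locally defined `GenericQuery` stands in for the real namedtuple, and the toy queries are invented, so the snippet runs without downloading the dataset.

```python
from collections import namedtuple
from typing import Dict, Iterable

# Hypothetical stand-in with the same fields as ir_datasets' GenericQuery
GenericQuery = namedtuple("GenericQuery", ["query_id", "text"])


def queries_to_dict(queries: Iterable) -> Dict[str, str]:
    """Map each query_id to its (English) query text."""
    return {q.query_id: q.text for q in queries}


# With the real dataset (requires a download):
#   import ir_datasets
#   lookup = queries_to_dict(ir_datasets.load("wikiclir/ar").queries_iter())
sample = [GenericQuery("q1", "first example query"),
          GenericQuery("q2", "second example query")]
lookup = queries_to_dict(sample)
print(lookup["q2"])  # second example query
```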

CLI

ir_datasets export wikiclir/ar queries



[query_id]    [text]
...

You can find more details about the CLI here.
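The export writes one query per line. The sketch below reads that output back into Python. It assumes the tab-separated layout shown above (`query_id`, then `text`), no header row, and no tab characters inside the query text. `parse_query_tsv` is a hypothetical helper.

```python
import csv
import io
from typing import Dict, Iterable


def parse_query_tsv(lines: Iterable[str]) -> Dict[str, str]:
    """Parse `query_id<TAB>text` lines into a dict, skipping blank lines."""
    # QUOTE_NONE keeps any quote characters in the text as literal characters
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row[0]: row[1] for row in reader if row}


# e.g. after: ir_datasets export wikiclir/ar queries > queries.tsv
#   with open("queries.tsv", encoding="utf-8") as f:
#       queries = parse_query_tsv(f)
sample = io.StringIO("q1\tfirst example query\nq2\tsecond example query\n")
queries = parse_query_tsv(sample)
```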

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.ar.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

535K docs

Language: ar

Document type:

WikiClirDoc: (namedtuple)

doc_id: str
title: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/ar")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.
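The document count and the `doc_id` `max_len` value reported in this dataset's metadata block can be recomputed in a single streaming pass over the corpus. This is a sketch under the assumption that documents expose a `doc_id` string as listed above. `doc_stats` is a hypothetical helper, and the local `WikiClirDoc` and its toy documents are stand-ins for the real data.

```python
from collections import namedtuple
from typing import Iterable, Tuple

# Hypothetical stand-in with the same fields as ir_datasets' WikiClirDoc
WikiClirDoc = namedtuple("WikiClirDoc", ["doc_id", "title", "text"])


def doc_stats(docs: Iterable) -> Tuple[int, int]:
    """Return (number of documents, length of the longest doc_id)."""
    count, max_len = 0, 0
    for doc in docs:
        count += 1
        max_len = max(max_len, len(doc.doc_id))
    return count, max_len


# With the real corpus this streams every document once:
#   import ir_datasets
#   doc_stats(ir_datasets.load("wikiclir/ar").docs_iter())
sample = [WikiClirDoc("12", "t1", "x"), WikiClirDoc("1234567", "t2", "y")]
print(doc_stats(sample))  # (2, 7)
```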

CLI

ir_datasets export wikiclir/ar docs



[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.ar')
for doc in dataset.iter_documents():
    print(doc)  # one document; `dataset` itself is the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

519K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	All other articles that link to the mate, and are linked by the mate	`195K`	37.5%
2	Document assigned to the (English) cross-lingual mate	`324K`	62.5%
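The counts and percentages in the table above are a simple tally over the relevance field. A sketch that reproduces such a table from any iterable of qrels, assuming each qrel exposes `relevance` as an int. `relevance_distribution` is a hypothetical helper, and the local `TrecQrel` and toy judgments are stand-ins.

```python
from collections import Counter, namedtuple
from typing import Dict, Iterable, Tuple

# Hypothetical stand-in with the same fields as ir_datasets' TrecQrel
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def relevance_distribution(qrels: Iterable) -> Dict[int, Tuple[int, float]]:
    """Map each relevance level to (count, percentage of all qrels)."""
    counts = Counter(q.relevance for q in qrels)
    total = sum(counts.values())
    return {level: (n, 100.0 * n / total) for level, n in sorted(counts.items())}


sample = [
    TrecQrel("q1", "d1", 2, "0"),
    TrecQrel("q1", "d2", 1, "0"),
    TrecQrel("q1", "d3", 1, "0"),
    TrecQrel("q2", "d4", 2, "0"),
]
for level, (n, pct) in relevance_distribution(sample).items():
    print(f"{level}  {n}  {pct:.1f}%")
# 1  2  50.0%
# 2  2  50.0%
```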

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/ar")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
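Evaluation tools such as pytrec_eval take judgments as a nested `{query_id: {doc_id: relevance}}` mapping. A sketch of building that mapping from the qrel namedtuples described above. `qrels_to_dict` is a hypothetical helper, and the local `TrecQrel` with its toy judgments stands in for the real data.

```python
from collections import defaultdict, namedtuple
from typing import Dict, Iterable

# Hypothetical stand-in with the same fields as ir_datasets' TrecQrel
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def qrels_to_dict(qrels: Iterable) -> Dict[str, Dict[str, int]]:
    """Group qrels into {query_id: {doc_id: relevance}}."""
    grouped: Dict[str, Dict[str, int]] = defaultdict(dict)
    for q in qrels:
        grouped[q.query_id][q.doc_id] = q.relevance
    return dict(grouped)


# With the real dataset:
#   import ir_datasets
#   judgments = qrels_to_dict(ir_datasets.load("wikiclir/ar").qrels_iter())
sample = [TrecQrel("q1", "d1", 2, "0"), TrecQrel("q1", "d2", 1, "0"),
          TrecQrel("q2", "d3", 2, "0")]
judgments = qrels_to_dict(sample)
```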

CLI

ir_datasets export wikiclir/ar qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.ar.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the relevance assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

{
  "docs": {
    "count": 535118,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 324489
  },
  "qrels": {
    "count": 519269,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "2": 324475,
          "1": 194794
        }
      }
    }
  }
}
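As a quick consistency check, the per-level counts in the metadata above should add up to the qrels total and yield the percentages in the relevance-levels table. The figures below are copied from the `wikiclir/ar` metadata.

```python
# Figures copied from the wikiclir/ar metadata block
qrels_total = 519269
counts_by_value = {2: 324475, 1: 194794}

# The per-level counts add up to the total number of qrels
assert sum(counts_by_value.values()) == qrels_total

# Percentages as shown in the relevance-levels table (one decimal place)
for level in sorted(counts_by_value):
    share = 100 * counts_by_value[level] / qrels_total
    print(f"level {level}: {share:.1f}%")
# level 1: 37.5%
# level 2: 62.5%
```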

`"wikiclir/ca"`

WikiCLIR with Catalan documents.

340K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/ca")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/ca queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.ca.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

549K docs

Language: ca

Document type:

WikiClirDoc: (namedtuple)

doc_id: str
title: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/ca")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/ca docs



[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.ca')
for doc in dataset.iter_documents():
    print(doc)  # one document; `dataset` itself is the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

965K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	All other articles that link to the mate, and are linked by the mate	`626K`	64.8%
2	Document assigned to the (English) cross-lingual mate	`340K`	35.2%
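Relevance level 2 marks the cross-lingual mate. The metadata for this split lists fewer level-2 qrels (339,562) than queries (339,586), which means at least some queries have no level-2 judgment. A sketch for finding them, assuming queries and qrels expose the fields listed on this page. `queries_without_mate` is a hypothetical helper, and the local namedtuples and toy data are stand-ins.

```python
from collections import namedtuple
from typing import Iterable, Set

# Hypothetical stand-ins with the same fields as GenericQuery and TrecQrel
GenericQuery = namedtuple("GenericQuery", ["query_id", "text"])
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def queries_without_mate(queries: Iterable, qrels: Iterable,
                         mate_level: int = 2) -> Set[str]:
    """Return IDs of queries that have no qrel at `mate_level`."""
    with_mate = {q.query_id for q in qrels if q.relevance == mate_level}
    return {q.query_id for q in queries if q.query_id not in with_mate}


sample_queries = [GenericQuery("q1", "a"), GenericQuery("q2", "b"),
                  GenericQuery("q3", "c")]
sample_qrels = [TrecQrel("q1", "d1", 2, "0"), TrecQrel("q2", "d2", 1, "0"),
                TrecQrel("q3", "d3", 2, "0")]
missing = queries_without_mate(sample_queries, sample_qrels)
print(missing)  # {'q2'}
```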

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/ca")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/ca qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.ca.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the relevance assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

{
  "docs": {
    "count": 548722,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 339586
  },
  "qrels": {
    "count": 965233,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "2": 339562,
          "1": 625671
        }
      }
    }
  }
}

`"wikiclir/cs"`

WikiCLIR with Czech documents.

234K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/cs")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/cs queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.cs.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

387K docs

Language: cs

Document type:

WikiClirDoc: (namedtuple)

doc_id: str
title: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/cs")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/cs docs



[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.cs')
for doc in dataset.iter_documents():
    print(doc)  # one document; `dataset` itself is the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

954K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	All other articles that link to the mate, and are linked by the mate	`721K`	75.5%
2	Document assigned to the (English) cross-lingual mate	`234K`	24.5%

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/cs")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/cs qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.
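Tools such as trec_eval read qrels in the classic whitespace-separated TREC layout (`query_id iteration doc_id relevance`) rather than the TSV column order shown above. A sketch that converts qrel namedtuples into that layout, assuming IDs contain no whitespace. `to_trec_qrels_lines` is a hypothetical helper, and the local `TrecQrel` is a stand-in.

```python
from collections import namedtuple
from typing import Iterable, Iterator

# Hypothetical stand-in with the same fields as ir_datasets' TrecQrel
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def to_trec_qrels_lines(qrels: Iterable) -> Iterator[str]:
    """Yield lines in TREC qrels order: query_id iteration doc_id relevance."""
    for q in qrels:
        yield f"{q.query_id} {q.iteration} {q.doc_id} {q.relevance}"


# Possible use with the real dataset:
#   import ir_datasets
#   with open("wikiclir-cs.qrels", "w", encoding="utf-8") as f:
#       for line in to_trec_qrels_lines(ir_datasets.load("wikiclir/cs").qrels_iter()):
#           f.write(line + "\n")
lines = list(to_trec_qrels_lines([TrecQrel("q1", "d1", 2, "0"),
                                  TrecQrel("q1", "d2", 1, "0")]))
```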

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.cs.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the relevance assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

{
  "docs": {
    "count": 386906,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 233553
  },
  "qrels": {
    "count": 954370,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "2": 233535,
          "1": 720835
        }
      }
    }
  }
}

`"wikiclir/de"`

WikiCLIR with German documents.

938K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/de")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/de queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.de.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

2.1M docs

Language: de

Document type:

WikiClirDoc: (namedtuple)

doc_id: str
title: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/de")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/de docs



[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.de')
for doc in dataset.iter_documents():
    print(doc)  # one document; `dataset` itself is the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

5.6M qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	All other articles that link to the mate, and are linked by the mate	`4.6M`	83.1%
2	Document assigned to the (English) cross-lingual mate	`938K`	16.9%
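The German split has far more level-1 judgments than mates. From the metadata, 5,550,454 qrels over 938,217 queries is about 5.9 judged documents per query. A sketch for computing the mean and maximum number of judgments per query directly from the qrels. Note that it averages only over queries that appear in the qrels. `judgments_per_query` is a hypothetical helper, and the local `TrecQrel` and toy data are stand-ins.

```python
from collections import Counter, namedtuple
from typing import Iterable, Tuple

# Hypothetical stand-in with the same fields as ir_datasets' TrecQrel
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def judgments_per_query(qrels: Iterable) -> Tuple[float, int]:
    """Return (mean, max) judged documents per query that has any qrels."""
    per_query = Counter(q.query_id for q in qrels)
    if not per_query:
        return 0.0, 0
    return sum(per_query.values()) / len(per_query), max(per_query.values())


sample = [TrecQrel("q1", "d1", 2, "0"), TrecQrel("q1", "d2", 1, "0"),
          TrecQrel("q1", "d3", 1, "0"), TrecQrel("q2", "d4", 2, "0")]
print(judgments_per_query(sample))  # (2.0, 3)
```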

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/de")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/de qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.de.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the relevance assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

{
  "docs": {
    "count": 2091278,
    "fields": {
      "doc_id": {
        "max_len": 8,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 938217
  },
  "qrels": {
    "count": 5550454,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "2": 938194,
          "1": 4612260
        }
      }
    }
  }
}

`"wikiclir/en-simple"`

WikiCLIR with Simple English documents.

115K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/en-simple")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/en-simple queries



[query_id]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikiclir/en-simple')
index_ref = pt.IndexRef.of('./indices/wikiclir_en-simple') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.en-simple.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

127K docs

Language: en

Document type:

WikiClirDoc: (namedtuple)

doc_id: str
title: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/en-simple")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/en-simple docs



[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikiclir/en-simple')
# Index wikiclir/en-simple
indexer = pt.IterDictIndexer('./indices/wikiclir_en-simple')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'text'])

You can find more details about PyTerrier indexing here.
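`get_corpus_iter()` already yields dictionaries. If you start from the ir_datasets namedtuples instead, a conversion along these lines should produce the shape `IterDictIndexer` expects. This assumes PyTerrier's convention of using `docno` as the document identifier key. `docs_to_dicts` is a hypothetical helper, and the local `WikiClirDoc` is a stand-in.

```python
from collections import namedtuple
from typing import Dict, Iterable, Iterator

# Hypothetical stand-in with the same fields as ir_datasets' WikiClirDoc
WikiClirDoc = namedtuple("WikiClirDoc", ["doc_id", "title", "text"])


def docs_to_dicts(docs: Iterable) -> Iterator[Dict[str, str]]:
    """Rename doc_id to docno, PyTerrier's document identifier key."""
    for doc in docs:
        yield {"docno": doc.doc_id, "title": doc.title, "text": doc.text}


# Possible use (needs pyterrier and ir_datasets installed):
#   indexer = pt.IterDictIndexer('./indices/wikiclir_en-simple')
#   indexer.index(docs_to_dicts(ir_datasets.load("wikiclir/en-simple").docs_iter()),
#                 fields=['title', 'text'])
converted = list(docs_to_dicts([WikiClirDoc("d1", "Title", "Body text")]))
```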

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.en-simple')
for doc in dataset.iter_documents():
    print(doc)  # one document; `dataset` itself is the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

250K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	All other articles that link to the mate, and are linked by the mate	`136K`	54.2%
2	Document assigned to the (English) cross-lingual mate	`115K`	45.8%

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/en-simple")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/en-simple qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:wikiclir/en-simple')
index_ref = pt.IndexRef.of('./indices/wikiclir_en-simple') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.
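To make the graded levels concrete: with linear gains, a retrieved mate (level 2) contributes twice the gain of a level-1 link neighbour in nDCG. The following from-scratch sketch computes nDCG@k for a single query. It assumes linear gains and a log2(rank + 1) discount, the convention commonly used by trec_eval. `ndcg_at_k` is a hypothetical helper for illustration only; PyTerrier computes `nDCG@20` through its own evaluation backend.

```python
import math
from typing import Dict, List


def ndcg_at_k(ranking: List[str], qrels: Dict[str, int], k: int = 20) -> float:
    """nDCG@k with linear gains and a log2(rank + 1) discount."""
    def dcg(gains: List[int]) -> float:
        # enumerate is 0-based, so rank = i + 1 and the discount is log2(i + 2)
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

    actual = dcg([qrels.get(doc_id, 0) for doc_id in ranking])
    ideal = dcg(sorted(qrels.values(), reverse=True))
    return actual / ideal if ideal > 0 else 0.0


# Toy judgments for one query: d1 is the mate (2), d2 and d3 are link neighbours (1)
judged = {"d1": 2, "d2": 1, "d3": 1}
print(round(ndcg_at_k(["d2", "d1", "d9"], judged), 4))  # 0.7224
```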

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.en-simple.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the relevance assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

{
  "docs": {
    "count": 127089,
    "fields": {
      "doc_id": {
        "max_len": 6,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 114572
  },
  "qrels": {
    "count": 250380,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "2": 114564,
          "1": 135816
        }
      }
    }
  }
}

`"wikiclir/es"`

WikiCLIR with Spanish documents.

782K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/es")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/es queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.es.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

1.3M docs

Language: es

Document type:

WikiClirDoc: (namedtuple)

doc_id: str
title: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/es")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/es docs



[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.es')
for doc in dataset.iter_documents():
    print(doc)  # one document; `dataset` itself is the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

2.9M qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	All other articles that link to the mate, and are linked by the mate	`2.1M`	73.0%
2	Document assigned to the (English) cross-lingual mate	`781K`	27.0%
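Because the two levels mean different things, an experiment may want to treat only the cross-lingual mate as relevant. The sketch below keeps qrels at or above a threshold. `filter_qrels` and the default `min_relevance=2` are illustrative choices, not something the dataset prescribes. The local `TrecQrel` is a stand-in.

```python
from collections import namedtuple
from typing import Iterable, List

# Hypothetical stand-in with the same fields as ir_datasets' TrecQrel
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def filter_qrels(qrels: Iterable, min_relevance: int = 2) -> List:
    """Keep only judgments whose relevance is at least `min_relevance`."""
    return [q for q in qrels if q.relevance >= min_relevance]


# With the real dataset:
#   import ir_datasets
#   mates_only = filter_qrels(ir_datasets.load("wikiclir/es").qrels_iter())
sample = [TrecQrel("q1", "d1", 2, "0"), TrecQrel("q1", "d2", 1, "0"),
          TrecQrel("q2", "d3", 2, "0")]
mates_only = filter_qrels(sample)
```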

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/es")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/es qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.es.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the relevance assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

{
  "docs": {
    "count": 1302958,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 781642
  },
  "qrels": {
    "count": 2894807,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "2": 781376,
          "1": 2113431
        }
      }
    }
  }
}

`"wikiclir/fi"`

WikiCLIR with Finnish documents.

274K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/fi")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/fi queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.fi.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

419K docs

Language: fi

Document type:

WikiClirDoc: (namedtuple)

doc_id: str
title: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/fi")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/fi docs



[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.fi')
for doc in dataset.iter_documents():
    print(doc)  # one document; `dataset` itself is the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

940K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	All other articles that link to the mate, and are linked by the mate	`666K`	70.9%
2	Document assigned to the (English) cross-lingual mate	`274K`	29.1%

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/fi")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/fi qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.fi.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the relevance assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

{
  "docs": {
    "count": 418677,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 273819
  },
  "qrels": {
    "count": 939613,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "2": 273796,
          "1": 665817
        }
      }
    }
  }
}

`"wikiclir/fr"`

WikiCLIR with French documents.

1.1M queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/fr")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/fr queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.fr.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

1.9M docs

Language: fr

Document type:

WikiClirDoc: (namedtuple)

doc_id: str
title: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/fr")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/fr docs



[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.fr')
for doc in dataset.iter_documents():
    print(doc)  # one document; `dataset` itself is the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

5.1M qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	All other articles that link to the mate, and are linked by the mate	`4.0M`	78.8%
2	Document assigned to the (English) cross-lingual mate	`1.1M`	21.2%

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/fr")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/fr qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.fr.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the relevance assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

{
  "docs": {
    "count": 1894397,
    "fields": {
      "doc_id": {
        "max_len": 8,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 1089179
  },
  "qrels": {
    "count": 5137366,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "2": 1089052,
          "1": 4048314
        }
      }
    }
  }
}

`"wikiclir/it"`

WikiCLIR with Italian documents.

809K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/it")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/it queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.it.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

1.3M docs

Language: it

Document type:

WikiClirDoc: (namedtuple)

doc_id: str
title: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/it")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/it docs



[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.it')
for doc in dataset.iter_documents():
    print(doc)  # one document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

3.4M qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	All other articles that link to the mate, and are linked by the mate	`2.6M`	76.5%
2	Document assigned to the (English) cross-lingual mate	`808K`	23.5%

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/it")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
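In WikiCLIR, grade 2 marks the document assigned to the English query article (its cross-lingual mate), and grade 1 marks the other articles that link to, and are linked by, the mate. Below is a minimal sketch of separating the two grades per query. `split_by_grade` is a hypothetical helper and is not part of ir_datasets. The local `TrecQrel` class only mirrors the field names listed above, so the example runs without downloading anything.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class TrecQrel(NamedTuple):
    # Local stand-in with the same fields as the TrecQrel type listed above.
    query_id: str
    doc_id: str
    relevance: int
    iteration: str


def split_by_grade(qrels: Iterable[TrecQrel]) -> dict[str, dict[str, list[str]]]:
    """Map query_id -> {"mate": grade-2 doc ids, "linked": grade-1 doc ids}."""
    grouped: dict[str, dict[str, list[str]]] = defaultdict(
        lambda: {"mate": [], "linked": []}
    )
    for q in qrels:
        if q.relevance == 2:
            grouped[q.query_id]["mate"].append(q.doc_id)
        elif q.relevance == 1:
            grouped[q.query_id]["linked"].append(q.doc_id)
        # Any other grade is ignored; WikiCLIR only uses 1 and 2.
    return dict(grouped)


# Made-up sample records (the iteration value "0" is arbitrary).
sample = [
    TrecQrel("q1", "d10", 2, "0"),
    TrecQrel("q1", "d11", 1, "0"),
    TrecQrel("q1", "d12", 1, "0"),
]
print(split_by_grade(sample))
```

With the real data, `split_by_grade(ir_datasets.load("wikiclir/it").qrels_iter())` should work the same way. Note that it keeps all 3.4M judgments in memory.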

CLI

ir_datasets export wikiclir/it qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.it.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

{
  "docs": {
    "count": 1347011,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 808605
  },
  "qrels": {
    "count": 3443633,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "2": 808345,
          "1": 2635288
        }
      }
    }
  }
}
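The percentages in the relevance-level table can be recomputed from `counts_by_value` and the total qrels `count` in this metadata block. The sketch below uses only the standard library. `relevance_shares` is an illustrative name, and only the qrels part of the block is copied in.

```python
import json

# Only the qrels part of the wikiclir/it metadata block above.
METADATA = """
{"qrels": {"count": 3443633,
           "fields": {"relevance": {"counts_by_value": {"2": 808345, "1": 2635288}}}}}
"""


def relevance_shares(metadata_json: str) -> dict[int, float]:
    """Return {relevance level: percent of all qrels}, rounded to one decimal."""
    qrels = json.loads(metadata_json)["qrels"]
    counts = qrels["fields"]["relevance"]["counts_by_value"]
    total = qrels["count"]
    return {int(level): round(100 * n / total, 1) for level, n in counts.items()}


print(relevance_shares(METADATA))  # {2: 23.5, 1: 76.5}
```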

`"wikiclir/ja"`

WikiCLIR with Japanese documents.

426K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/ja")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/ja queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.ja.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

1.1M docs

Language: ja

Document type:

WikiClirDoc: (namedtuple)

doc_id: str
title: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/ja")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.
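If you only need a few titles, there is no reason to hold all 1.1M Japanese documents in memory. The sketch below streams `docs_iter()` once, keeps only the titles of the requested ids, and stops early once all of them are found. `titles_for` is a hypothetical helper, and the local `WikiClirDoc` class only mirrors the fields listed above.

```python
from typing import Iterable, NamedTuple


class WikiClirDoc(NamedTuple):
    # Local stand-in with the same fields as the WikiClirDoc type listed above.
    doc_id: str
    title: str
    text: str


def titles_for(docs: Iterable[WikiClirDoc], wanted: set[str]) -> dict[str, str]:
    """Stream docs once, keeping doc_id -> title only for ids in `wanted`."""
    found: dict[str, str] = {}
    if not wanted:
        return found
    for doc in docs:
        if doc.doc_id in wanted:
            found[doc.doc_id] = doc.title
            if len(found) == len(wanted):
                break  # every requested id has been seen; stop reading
    return found


# Made-up sample documents.
docs = [
    WikiClirDoc("1", "東京", "..."),
    WikiClirDoc("2", "大阪", "..."),
    WikiClirDoc("3", "京都", "..."),
]
print(titles_for(docs, {"1", "3"}))
```

For random access, ir_datasets also provides a docstore (`dataset.docs_store().get(doc_id)`), which avoids the linear scan.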

CLI

ir_datasets export wikiclir/ja docs



[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.ja')
for doc in dataset.iter_documents():
    print(doc)  # one document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

3.3M qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	All other articles that link to the mate, and are linked by the mate	`2.9M`	87.2%
2	Document assigned to the (English) cross-lingual mate	`426K`	12.8%

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/ja")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/ja qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.
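The qrels export is tab-separated, with the four columns shown above. The sketch below reads it back in Python. It assumes that the output was redirected to a file (for example `ir_datasets export wikiclir/ja qrels --format tsv > ja-qrels.tsv`, where the filename is hypothetical), that the output has no header line, and that no field contains a tab.

```python
import csv
import io
from typing import Iterator, TextIO


def read_qrels_tsv(handle: TextIO) -> Iterator[tuple[str, str, int, str]]:
    """Yield (query_id, doc_id, relevance, iteration) from the CLI's TSV output."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for query_id, doc_id, relevance, iteration in reader:
        yield query_id, doc_id, int(relevance), iteration


# Two made-up lines in the same column order as the CLI output.
demo = io.StringIO("q1\td10\t2\t0\nq1\td11\t1\t0\n")
print(list(read_qrels_tsv(demo)))
```

For a saved file, iterate over `read_qrels_tsv(f)` inside a `with open("ja-qrels.tsv", encoding="utf-8", newline="") as f:` block, because the generator reads lazily from the open handle.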

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.ja.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

{
  "docs": {
    "count": 1071292,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 426431
  },
  "qrels": {
    "count": 3338667,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "2": 426383,
          "1": 2912284
        }
      }
    }
  }
}

`"wikiclir/ko"`

WikiCLIR with Korean documents.

225K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/ko")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/ko queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.ko.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

394K docs

Language: ko

Document type:

WikiClirDoc: (namedtuple)

doc_id: str
title: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/ko")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/ko docs



[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.ko')
for doc in dataset.iter_documents():
    print(doc)  # one document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

568K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	All other articles that link to the mate, and are linked by the mate	`343K`	60.4%
2	Document assigned to the (English) cross-lingual mate	`225K`	39.6%

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/ko")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/ko qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.ko.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

{
  "docs": {
    "count": 394177,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 224855
  },
  "qrels": {
    "count": 568205,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "2": 224843,
          "1": 343362
        }
      }
    }
  }
}
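The metadata lists 224,855 queries but only 224,843 grade-2 judgments. Assuming every judgment refers to a listed query, at least 12 Korean queries therefore have no judged cross-lingual mate. The sketch below finds them. `queries_without_mate` is a hypothetical helper that takes plain `(query_id, doc_id, relevance)` tuples.

```python
from typing import Iterable


def queries_without_mate(
    query_ids: Iterable[str], qrels: Iterable[tuple[str, str, int]]
) -> set[str]:
    """Return the query ids that have no grade-2 (cross-lingual mate) judgment."""
    with_mate = {query_id for query_id, _doc_id, relevance in qrels if relevance == 2}
    return set(query_ids) - with_mate


missing = queries_without_mate(
    ["q1", "q2", "q3"],
    [("q1", "d1", 2), ("q2", "d5", 1), ("q3", "d9", 2)],
)
print(missing)  # {'q2'}
```

Against the real dataset, with `ds = ir_datasets.load("wikiclir/ko")`, the inputs could be `(q.query_id for q in ds.queries_iter())` and `((r.query_id, r.doc_id, r.relevance) for r in ds.qrels_iter())`.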

`"wikiclir/nl"`

WikiCLIR with Dutch documents.

688K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/nl")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/nl queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.nl.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

1.9M docs

Language: nl

Document type:

WikiClirDoc: (namedtuple)

doc_id: str
title: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/nl")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/nl docs



[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.nl')
for doc in dataset.iter_documents():
    print(doc)  # one document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

2.3M qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	All other articles that link to the mate, and are linked by the mate	`1.6M`	70.5%
2	Document assigned to the (English) cross-lingual mate	`688K`	29.5%

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/nl")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
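Tools such as trec_eval read qrels in the whitespace-separated TREC layout `query_id iteration doc_id relevance`. That column order differs from the `TrecQrel` field order. The sketch below writes the TREC layout. `write_trec_qrels` is hypothetical, and the local `TrecQrel` class only mirrors the fields listed above.

```python
import io
from typing import Iterable, NamedTuple, TextIO


class TrecQrel(NamedTuple):
    # Local stand-in with the same fields as the TrecQrel type listed above.
    query_id: str
    doc_id: str
    relevance: int
    iteration: str


def write_trec_qrels(qrels: Iterable[TrecQrel], out: TextIO) -> int:
    """Write 'query_id iteration doc_id relevance' lines; return how many were written."""
    written = 0
    for q in qrels:
        out.write(f"{q.query_id} {q.iteration} {q.doc_id} {q.relevance}\n")
        written += 1
    return written


buf = io.StringIO()
write_trec_qrels([TrecQrel("q1", "d10", 2, "0")], buf)
print(buf.getvalue(), end="")  # q1 0 d10 2
```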

CLI

ir_datasets export wikiclir/nl qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.nl.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

{
  "docs": {
    "count": 1908260,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 687718
  },
  "qrels": {
    "count": 2334644,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "2": 687672,
          "1": 1646972
        }
      }
    }
  }
}

`"wikiclir/nn"`

WikiCLIR with Norwegian (Nynorsk) documents.

99K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/nn")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/nn queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.nn.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

133K docs

Language: nn

Document type:

WikiClirDoc: (namedtuple)

doc_id: str
title: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/nn")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/nn docs



[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.nn')
for doc in dataset.iter_documents():
    print(doc)  # one document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

250K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	All other articles that link to the mate, and are linked by the mate	`151K`	60.2%
2	Document assigned to the (English) cross-lingual mate	`99K`	39.8%
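The metadata shows close to one grade-2 judgment per query (99,465 for 99,493 queries), so the reciprocal rank of the mate is a quick diagnostic for a run. This is an illustrative check, not the evaluation protocol of Sasaki et al. (2018). The function names and the cutoff `k=10` are assumptions.

```python
from typing import Sequence


def mate_reciprocal_rank(ranking: Sequence[str], mate_doc_id: str, k: int = 10) -> float:
    """1/rank of the grade-2 mate within the top k of `ranking`, else 0.0."""
    for rank, doc_id in enumerate(ranking[:k], start=1):
        if doc_id == mate_doc_id:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(
    runs: dict[str, Sequence[str]], mates: dict[str, str], k: int = 10
) -> float:
    """Average over queries that have a mate; queries missing from `runs` score 0."""
    scores = [mate_reciprocal_rank(runs.get(qid, []), mate, k) for qid, mate in mates.items()]
    return sum(scores) / len(scores) if scores else 0.0


runs = {"q1": ["d3", "d7", "d1"], "q2": ["d9", "d2"]}
mates = {"q1": "d7", "q2": "d4"}
print(mean_reciprocal_rank(runs, mates))  # (1/2 + 0) / 2 = 0.25
```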

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/nn")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/nn qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.nn.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

{
  "docs": {
    "count": 133290,
    "fields": {
      "doc_id": {
        "max_len": 6,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 99493
  },
  "qrels": {
    "count": 250141,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "2": 99465,
          "1": 150676
        }
      }
    }
  }
}

`"wikiclir/no"`

WikiCLIR with Norwegian (Bokmål) documents.

300K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/no")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/no queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.no.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

471K docs

Language: no

Document type:

WikiClirDoc: (namedtuple)

doc_id: str
title: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/no")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/no docs



[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.no')
for doc in dataset.iter_documents():
    print(doc)  # one document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

964K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	All other articles that link to the mate, and are linked by the mate	`664K`	68.9%
2	Document assigned to the (English) cross-lingual mate	`300K`	31.1%
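With two relevance grades, a graded metric such as nDCG can credit both the mate and its linked articles. The sketch below assumes linear gain (gain = grade) and a log2(rank + 1) discount. Other conventions, such as gain 2^grade - 1, give different scores, so use whatever convention your evaluation tool uses.

```python
import math
from typing import Mapping, Sequence


def ndcg_at_k(ranking: Sequence[str], grades: Mapping[str, int], k: int = 10) -> float:
    """nDCG@k with linear gain (the grade itself) and a log2(rank + 1) discount."""

    def dcg(gains: Sequence[int]) -> float:
        # Position i (0-based) is rank i + 1, so its discount is log2(i + 2).
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

    actual = dcg([grades.get(doc_id, 0) for doc_id in ranking[:k]])
    ideal = dcg(sorted(grades.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0


# Made-up judgments for one query: one mate (grade 2) and two linked articles (grade 1).
grades = {"mate": 2, "link1": 1, "link2": 1}
print(ndcg_at_k(["link1", "mate", "other"], grades))
```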

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/no")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/no qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.no.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

{
  "docs": {
    "count": 471420,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 299897
  },
  "qrels": {
    "count": 963514,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "2": 299831,
          "1": 663683
        }
      }
    }
  }
}

`"wikiclir/pl"`

WikiCLIR with Polish documents.

694K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/pl")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.
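With about 694K queries, quick experiments often run on a subset. The sketch below draws a reproducible sample with `random.Random(seed).sample`. Sorting the ids first makes the result independent of iteration order. The seed and sample size are arbitrary, and `sample_query_ids` is a hypothetical helper.

```python
import random
from typing import Iterable


def sample_query_ids(query_ids: Iterable[str], n: int, seed: int = 42) -> list[str]:
    """Reproducibly pick n distinct query ids, or all of them if there are fewer."""
    pool = sorted(set(query_ids))  # canonical order, so the seed alone fixes the result
    if n >= len(pool):
        return pool
    return sorted(random.Random(seed).sample(pool, n))


ids = [f"q{i}" for i in range(1000)]
subset = sample_query_ids(ids, 5)
print(subset)
```

On the real data, `sample_query_ids((q.query_id for q in ir_datasets.load("wikiclir/pl").queries_iter()), 1000)` would give the same subset on every run.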

CLI

ir_datasets export wikiclir/pl queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.pl.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

1.2M docs

Language: pl

Document type:

WikiClirDoc: (namedtuple)

doc_id: str
title: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/pl")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/pl docs



[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.pl')
for doc in dataset.iter_documents():
    print(doc)  # one document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

2.5M qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	All other articles that link to the mate, and are linked by the mate	`1.8M`	71.9%
2	Document assigned to the (English) cross-lingual mate	`694K`	28.1%

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/pl")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/pl qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.pl.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

{
  "docs": {
    "count": 1234316,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 693656
  },
  "qrels": {
    "count": 2471360,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "2": 693604,
          "1": 1777756
        }
      }
    }
  }
}

`"wikiclir/pt"`

WikiCLIR with Portuguese documents.

612K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/pt")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/pt queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.pt.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

973K docs

Language: pt

Document type:

WikiClirDoc: (namedtuple)

doc_id: str
title: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/pt")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/pt docs



[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.pt')
for doc in dataset.iter_documents():
    print(doc)  # one document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

1.7M qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	All other articles that link to the mate, and are linked by the mate	`1.1M`	64.9%
2	Document assigned to the (English) cross-lingual mate	`612K`	35.1%

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/pt")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
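To confirm that a local copy matches the relevance-level table (64.9% grade 1, 35.1% grade 2), you can count the grades while streaming the qrels. The sketch below uses `collections.Counter`. `grade_distribution` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable


def grade_distribution(relevances: Iterable[int]) -> dict[int, tuple[int, float]]:
    """Map each relevance grade to (count, percent of all judgments, one decimal)."""
    counts = Counter(relevances)
    total = sum(counts.values())
    return {grade: (n, round(100 * n / total, 1)) for grade, n in sorted(counts.items())}


print(grade_distribution([2, 1, 1, 1]))  # {1: (3, 75.0), 2: (1, 25.0)}
```

`grade_distribution(q.relevance for q in ir_datasets.load("wikiclir/pt").qrels_iter())` streams the judgments without storing them.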

CLI

ir_datasets export wikiclir/pt qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.pt.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

{
  "docs": {
    "count": 973057,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 611732
  },
  "qrels": {
    "count": 1741889,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "2": 611643,
          "1": 1130246
        }
      }
    }
  }
}

`"wikiclir/ro"`

WikiCLIR with Romanian documents.

199K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/ro")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/ro queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.ro.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

377K docs

Language: ro

Document type:

WikiClirDoc: (namedtuple)

doc_id: str
title: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/ro")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/ro docs



[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.ro')
for doc in dataset.iter_documents():
    print(doc)  # one document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

451K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	All other articles that link to the mate, and are linked by the mate	`252K`	55.8%
2	Document assigned to the (English) cross-lingual mate	`199K`	44.2%

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/ro")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
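Some evaluation libraries, pytrec_eval among them, take qrels as a nested dictionary of the form `{query_id: {doc_id: relevance}}`. The sketch below builds that dictionary from the qrels stream. `qrels_to_nested_dict` is hypothetical, and the local `TrecQrel` class only mirrors the fields listed above.

```python
from typing import Iterable, NamedTuple


class TrecQrel(NamedTuple):
    # Local stand-in with the same fields as the TrecQrel type listed above.
    query_id: str
    doc_id: str
    relevance: int
    iteration: str


def qrels_to_nested_dict(qrels: Iterable[TrecQrel]) -> dict[str, dict[str, int]]:
    """Build {query_id: {doc_id: relevance}}; a repeated pair keeps the last grade."""
    nested: dict[str, dict[str, int]] = {}
    for q in qrels:
        nested.setdefault(q.query_id, {})[q.doc_id] = q.relevance
    return nested


# Made-up sample records.
nested = qrels_to_nested_dict([
    TrecQrel("q1", "d10", 2, "0"),
    TrecQrel("q1", "d11", 1, "0"),
    TrecQrel("q2", "d3", 2, "0"),
])
print(nested)  # {'q1': {'d10': 2, 'd11': 1}, 'q2': {'d3': 2}}
```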

CLI

ir_datasets export wikiclir/ro qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.ro.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

{
  "docs": {
    "count": 376655,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 199264
  },
  "qrels": {
    "count": 451180,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "2": 199253,
          "1": 251927
        }
      }
    }
  }
}
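The metadata above can be sanity-checked programmatically: the per-level counts should add up to the qrels total. A sketch using an abridged copy of the wikiclir/ro metadata; it also computes the gap between the query count and the level-2 count (if each query has at most one mate, this gap would be the number of queries without one, which is an assumption the metadata alone does not confirm):

```python
import json

# Abridged copy of the wikiclir/ro metadata shown above.
metadata = json.loads("""
{
  "docs": {"count": 376655},
  "queries": {"count": 199264},
  "qrels": {"count": 451180,
            "fields": {"relevance": {"counts_by_value": {"2": 199253, "1": 251927}}}}
}
""")

by_value = metadata["qrels"]["fields"]["relevance"]["counts_by_value"]
# JSON object keys are strings, so convert them back to integer relevance levels.
by_level = {int(level): count for level, count in by_value.items()}

total = sum(by_level.values())
assert total == metadata["qrels"]["count"]  # per-level counts add up to the qrels total

# Gap between the number of queries and the number of level-2 judgments.
mate_gap = metadata["queries"]["count"] - by_level[2]
print(total, mate_gap)  # 451180 11
```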

`"wikiclir/ru"`

WikiCLIR with Russian documents.

665K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/ru")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/ru queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.ru.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

1.4M docs

Language: ru

Document type:

WikiClirDoc: (namedtuple)

doc_id: str
title: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/ru")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.
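With 1.4M Russian documents, holding the full corpus in memory is rarely necessary; ir_datasets also offers `dataset.docs_store()` for lookups by ID. When only a known subset is needed (e.g. the documents in your qrels), a single pass over the iterator can keep just those. A sketch with a local `WikiClirDoc` stand-in and hypothetical IDs:

```python
from typing import Iterable, NamedTuple


class WikiClirDoc(NamedTuple):
    # Local stand-in with the same fields as ir_datasets' WikiClirDoc (for illustration only).
    doc_id: str
    title: str
    text: str


def index_titles(docs: Iterable[WikiClirDoc], wanted: set[str]) -> dict[str, str]:
    """Map doc_id -> title, keeping only the IDs in `wanted` to bound memory use."""
    return {doc.doc_id: doc.title for doc in docs if doc.doc_id in wanted}


# Toy documents (hypothetical IDs); with real data, pass dataset.docs_iter() instead.
docs = [
    WikiClirDoc("101", "Москва", "..."),
    WikiClirDoc("102", "Санкт-Петербург", "..."),
    WikiClirDoc("103", "Казань", "..."),
]
titles = index_titles(docs, wanted={"101", "103"})
print(titles)  # {'101': 'Москва', '103': 'Казань'}
```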

CLI

ir_datasets export wikiclir/ru docs



[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.ru')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

2.3M qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	All other articles that link to the mate, and are linked by the mate	`1.7M`	71.4%
2	Document assigned to the (English) cross-lingual mate	`665K`	28.6%

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/ru")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
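To evaluate against the cross-lingual mate alone (level 2) and ignore linked articles (level 1), filter the judgments by relevance. A sketch with a local `TrecQrel` stand-in and hypothetical IDs; it returns a list per query because the metadata does not rule out more than one level-2 judgment for a query:

```python
from typing import Iterable, NamedTuple


class TrecQrel(NamedTuple):
    # Local stand-in with the same fields as ir_datasets' TrecQrel (for illustration only).
    query_id: str
    doc_id: str
    relevance: int
    iteration: str


MATE = 2  # "Document assigned to the (English) cross-lingual mate"


def mates(qrels: Iterable[TrecQrel]) -> dict[str, list[str]]:
    """Collect the level-2 (cross-lingual mate) documents for each query."""
    result: dict[str, list[str]] = {}
    for qrel in qrels:
        if qrel.relevance == MATE:
            result.setdefault(qrel.query_id, []).append(qrel.doc_id)
    return result


# Toy judgments (hypothetical IDs); with real data, pass dataset.qrels_iter() instead.
toy_qrels = [
    TrecQrel("q1", "d7", 2, "0"),
    TrecQrel("q1", "d3", 1, "0"),
    TrecQrel("q2", "d9", 2, "0"),
]
print(mates(toy_qrels))  # {'q1': ['d7'], 'q2': ['d9']}
```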

CLI

ir_datasets export wikiclir/ru qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.ru.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

{
  "docs": {
    "count": 1413945,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 664924
  },
  "qrels": {
    "count": 2321384,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "2": 664780,
          "1": 1656604
        }
      }
    }
  }
}
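The metadata above also gives a rough sense of judgment density for wikiclir/ru: divide the qrels count by the query count. These are means over all queries; the metadata alone does not show the per-query distribution.

```python
# Counts for wikiclir/ru, copied from the metadata above.
ru_queries = 664924
ru_qrels = 2321384
ru_level1 = 1656604  # "All other articles that link to the mate, and are linked by the mate"

mean_judgments = ru_qrels / ru_queries  # all judgments per query
mean_linked = ru_level1 / ru_queries    # level-1 judgments per query
print(round(mean_judgments, 2), round(mean_linked, 2))  # 3.49 2.49
```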

`"wikiclir/sv"`

WikiCLIR with Swedish documents.

639K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/sv")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/sv queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.sv.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

3.8M docs

Language: sv

Document type:

WikiClirDoc: (namedtuple)

doc_id: str
title: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/sv")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.
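Many indexers take one text field per document, so a common (but not WikiCLIR-prescribed) preprocessing step is to prepend the title to the body. A sketch with a local `WikiClirDoc` stand-in; the output keys `docno` and `text` follow a convention used by some dict-based indexers (e.g. PyTerrier's) and are an assumption here:

```python
from typing import NamedTuple


class WikiClirDoc(NamedTuple):
    # Local stand-in with the same fields as ir_datasets' WikiClirDoc (for illustration only).
    doc_id: str
    title: str
    text: str


def to_indexable(doc: WikiClirDoc) -> dict[str, str]:
    """Flatten a document into {"docno", "text"}, prepending the title to the body."""
    body = f"{doc.title}\n{doc.text}" if doc.title else doc.text
    return {"docno": doc.doc_id, "text": body}


# Toy document (hypothetical ID); with real data, map over dataset.docs_iter().
doc = WikiClirDoc("42", "Stockholm", "Stockholm är Sveriges huvudstad.")
print(to_indexable(doc))
```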

CLI

ir_datasets export wikiclir/sv docs



[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.sv')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

2.1M qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	All other articles that link to the mate, and are linked by the mate	`1.4M`	69.1%
2	Document assigned to the (English) cross-lingual mate	`639K`	30.9%

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/sv")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/sv qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.sv.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

{
  "docs": {
    "count": 3785412,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 639073
  },
  "qrels": {
    "count": 2069453,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "2": 638829,
          "1": 1430624
        }
      }
    }
  }
}

`"wikiclir/sw"`

WikiCLIR with Swahili documents.

23K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/sw")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/sw queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.sw.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

37K docs

Language: sw

Document type:

WikiClirDoc: (namedtuple)

doc_id: str
title: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/sw")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/sw docs



[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.sw')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

58K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	All other articles that link to the mate, and are linked by the mate	`35K`	60.5%
2	Document assigned to the (English) cross-lingual mate	`23K`	39.5%

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/sw")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
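Since each query has a designated cross-lingual mate (level 2), one simple per-query measure is the reciprocal rank of that mate in a system's ranking. A sketch over one query's judgments in `{doc_id: relevance}` form; the ranking and IDs are hypothetical, and this is one possible measure rather than the evaluation protocol of the WikiCLIR paper:

```python
def mate_reciprocal_rank(ranking: list[str], judgments: dict[str, int], mate_level: int = 2) -> float:
    """Return 1/rank of the first document judged at `mate_level`, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranking, start=1):
        if judgments.get(doc_id) == mate_level:
            return 1.0 / rank
    return 0.0


judgments = {"d7": 2, "d3": 1}      # toy judgments for one query
ranking = ["d5", "d3", "d7", "d1"]  # toy system output, best first
print(mate_reciprocal_rank(ranking, judgments))  # 0.3333333333333333
```

Averaging this value over all queries gives the mean reciprocal rank (MRR) of the mate.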

CLI

ir_datasets export wikiclir/sw qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.sw.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

{
  "docs": {
    "count": 37079,
    "fields": {
      "doc_id": {
        "max_len": 5,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 22860
  },
  "qrels": {
    "count": 57924,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "2": 22859,
          "1": 35065
        }
      }
    }
  }
}

`"wikiclir/tl"`

WikiCLIR with Tagalog documents.

49K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/tl")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/tl queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.tl.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

79K docs

Language: tl

Document type:

WikiClirDoc: (namedtuple)

doc_id: str
title: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/tl")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/tl docs



[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.tl')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

72K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	All other articles that link to the mate, and are linked by the mate	`23K`	32.4%
2	Document assigned to the (English) cross-lingual mate	`49K`	67.6%

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/tl")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
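The two relevance levels are graded, so a graded metric such as nDCG@k can reward ranking the mate (2) above linked articles (1). A sketch using linear gains (gain = relevance level) and a log2 rank discount; the gain function is an assumption (2^rel - 1 is another common choice), and the toy data is hypothetical:

```python
import math


def ndcg_at_k(ranking: list[str], judgments: dict[str, int], k: int) -> float:
    """nDCG@k with linear gains (gain = relevance level) and a log2(rank + 1) discount."""
    def dcg(gains: list[int]) -> float:
        return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

    gains = [judgments.get(doc_id, 0) for doc_id in ranking[:k]]  # unjudged docs get 0
    ideal = sorted(judgments.values(), reverse=True)[:k]          # best possible ordering
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


judgments = {"d7": 2, "d3": 1}  # toy judgments: d7 is the mate, d3 a linked article
ranking = ["d3", "d7", "d5"]    # toy system output that puts the linked article first
print(round(ndcg_at_k(ranking, judgments, k=3), 4))
```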

CLI

ir_datasets export wikiclir/tl qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.tl.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

{
  "docs": {
    "count": 79008,
    "fields": {
      "doc_id": {
        "max_len": 6,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 48930
  },
  "qrels": {
    "count": 72359,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "2": 48928,
          "1": 23431
        }
      }
    }
  }
}

`"wikiclir/tr"`

WikiCLIR with Turkish documents.

185K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/tr")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/tr queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.tr.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

296K docs

Language: tr

Document type:

WikiClirDoc: (namedtuple)

doc_id: str
title: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/tr")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/tr docs



[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.tr')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

381K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	All other articles that link to the mate, and are linked by the mate	`195K`	51.3%
2	Document assigned to the (English) cross-lingual mate	`185K`	48.7%

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/tr")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/tr qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.tr.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

{
  "docs": {
    "count": 295593,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 185388
  },
  "qrels": {
    "count": 380651,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "2": 185360,
          "1": 195291
        }
      }
    }
  }
}

`"wikiclir/uk"`

WikiCLIR with Ukrainian documents.

348K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/uk")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/uk queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.uk.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

705K docs

Language: uk

Document type:

WikiClirDoc: (namedtuple)

doc_id: str
title: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/uk")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/uk docs



[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.uk')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

913K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	All other articles that link to the mate, and are linked by the mate	`565K`	61.9%
2	Document assigned to the (English) cross-lingual mate	`348K`	38.1%

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/uk")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
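The metadata gives only totals per level; to see how judgments are spread across queries, count them per query and then histogram those counts. A sketch over a stream of query IDs (toy values here; with real data, something like `(q.query_id for q in dataset.qrels_iter())`):

```python
from collections import Counter
from typing import Iterable


def judgments_per_query(query_ids: Iterable[str]) -> dict[int, int]:
    """Return a histogram {number_of_judgments: number_of_queries}, sorted by key."""
    per_query = Counter(query_ids)  # query_id -> number of judgments
    return dict(sorted(Counter(per_query.values()).items()))


toy_query_ids = ["q1", "q1", "q1", "q2", "q3", "q3"]  # hypothetical IDs
print(judgments_per_query(toy_query_ids))  # {1: 1, 2: 1, 3: 1}
```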

CLI

ir_datasets export wikiclir/uk qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.uk.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

{
  "docs": {
    "count": 704903,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 348222
  },
  "qrels": {
    "count": 913358,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "2": 348168,
          "1": 565190
        }
      }
    }
  }
}

`"wikiclir/vi"`

WikiCLIR with Vietnamese documents.

354K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/vi")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/vi queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.vi.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

1.4M docs

Language: vi

Document type:

WikiClirDoc: (namedtuple)

doc_id: str
title: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/vi")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/vi docs



[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.vi')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

611K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	All other articles that link to the mate, and are linked by the mate	`257K`	42.1%
2	Document assigned to the (English) cross-lingual mate	`354K`	57.9%

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/vi")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
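Tools such as trec_eval read qrels in the classic TREC layout `query_id iteration doc_id relevance`, which orders the columns differently from the `TrecQrel` namedtuple. A sketch that writes that layout, with a local `TrecQrel` stand-in and hypothetical IDs:

```python
import io
from typing import Iterable, NamedTuple, TextIO


class TrecQrel(NamedTuple):
    # Local stand-in with the same fields as ir_datasets' TrecQrel (for illustration only).
    query_id: str
    doc_id: str
    relevance: int
    iteration: str


def write_trec_qrels(qrels: Iterable[TrecQrel], out: TextIO) -> None:
    """Write judgments as space-separated lines: query_id iteration doc_id relevance."""
    for q in qrels:
        out.write(f"{q.query_id} {q.iteration} {q.doc_id} {q.relevance}\n")


buf = io.StringIO()  # with real data, use open("vi.qrels", "w") and dataset.qrels_iter()
write_trec_qrels([TrecQrel("q1", "d7", 2, "0"), TrecQrel("q1", "d3", 1, "0")], buf)
print(buf.getvalue(), end="")
# q1 0 d7 2
# q1 0 d3 1
```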

CLI

ir_datasets export wikiclir/vi qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.vi.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }

{
  "docs": {
    "count": 1392152,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 354312
  },
  "qrels": {
    "count": 611355,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "2": 354279,
          "1": 257076
        }
      }
    }
  }
}

`"wikiclir/zh"`

WikiCLIR with Chinese documents.

463K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/zh")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/zh queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.wikiclir.zh.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

951K docs

Language: zh

Document type:

WikiClirDoc: (namedtuple)

doc_id: str
title: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/zh")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/zh docs



[doc_id]    [title]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.wikiclir.zh')
for doc in dataset.iter_documents():
    print(doc)  # a single document from the AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

926K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	All other articles that link to the mate, and are linked by the mate	`463K`	50.0%
2	Document assigned to the (English) cross-lingual mate	`463K`	50.0%

Examples:

import ir_datasets
dataset = ir_datasets.load("wikiclir/zh")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export wikiclir/zh qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.wikiclir.zh.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

\cite{sasaki-etal-2018-cross}

Bibtex:

@inproceedings{sasaki-etal-2018-cross, title = "Cross-Lingual Learning-to-Rank with Shared Representations", author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro", booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/N18-2073", doi = "10.18653/v1/N18-2073", pages = "458--463" }