Github: datasets/wikir.py

ir_datasets: WikIR

Index
  1. wikir
  2. wikir/en1k
  3. wikir/en1k/test
  4. wikir/en1k/training
  5. wikir/en1k/validation
  6. wikir/en59k
  7. wikir/en59k/test
  8. wikir/en59k/training
  9. wikir/en59k/validation
  10. wikir/es13k
  11. wikir/es13k/test
  12. wikir/es13k/training
  13. wikir/es13k/validation
  14. wikir/fr14k
  15. wikir/fr14k/test
  16. wikir/fr14k/training
  17. wikir/fr14k/validation
  18. wikir/it16k
  19. wikir/it16k/test
  20. wikir/it16k/training
  21. wikir/it16k/validation
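
The identifiers in the index follow a regular pattern: a corpus name, optionally followed by a split. As a minimal sketch, the full list can be generated instead of typed by hand. The names `CORPORA`, `SPLITS`, and `wikir_dataset_ids` are hypothetical helpers for illustration and are not part of ir_datasets.

```python
# Corpus and split names copied from the index above.
CORPORA = ["en1k", "en59k", "es13k", "fr14k", "it16k"]
SPLITS = ["test", "training", "validation"]


def wikir_dataset_ids():
    """Return every WikIR dataset ID in the same order as the index."""
    ids = ["wikir"]
    for corpus in CORPORA:
        ids.append(f"wikir/{corpus}")
        ids.extend(f"wikir/{corpus}/{split}" for split in SPLITS)
    return ids


if __name__ == "__main__":
    for dataset_id in wikir_dataset_ids():
        print(dataset_id)
```

Each generated string can be passed to `ir_datasets.load(...)` as in the examples further down.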

"wikir"

A suite of IR benchmarks in multiple languages built from Wikipedia.

Citation
bibtex:
@inproceedings{Frej2020WIKIRAP,
  title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={LREC},
  year={2020}
}
@inproceedings{Frej2020MLWIKIRAP,
  title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={CIRCLE},
  year={2020}
}

"wikir/en1k"

A small version of WikIR for English.

docs

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('wikir/en1k')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
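
The loop above streams documents one at a time. For small experiments it can be convenient to hold a slice of the corpus in memory, keyed by `doc_id`. The following is a minimal sketch under the assumption that each document exposes `doc_id` and `text`, as GenericDoc does. `Doc`, `load_docs_by_id`, and `limit` are hypothetical names used for illustration, not part of ir_datasets.

```python
from collections import namedtuple
from itertools import islice

# Stand-in with the same fields as GenericDoc, so the sketch runs without a download.
Doc = namedtuple("Doc", ["doc_id", "text"])


def load_docs_by_id(docs, limit=None):
    """Map doc_id -> text for the first `limit` documents (all of them if limit is None)."""
    return {doc.doc_id: doc.text for doc in islice(docs, limit)}


sample = [
    Doc("1", "first article"),
    Doc("2", "second article"),
    Doc("3", "third article"),
]
by_id = load_docs_by_id(sample, limit=2)
print(sorted(by_id))
```

With the real corpus, `dataset.docs_iter()` would take the place of `sample`. Keeping `limit` small avoids loading the whole collection into memory.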

"wikir/en1k/test"

Test set of wikir/en1k. Scoreddocs are the provided BM25 run.

queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('wikir/en1k/test')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
docs

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('wikir/en1k/test')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
0     Otherwise
1     There is a link to the article with the query as its title in the first sentence
2     Query is the article title

Example

import ir_datasets
dataset = ir_datasets.load('wikir/en1k/test')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
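
Because the judgments are graded (0, 1, 2), it is often useful to group them per query and to filter by a minimum level. The sketch below assumes records with the TrecQrel fields listed above. `Qrel`, `group_qrels`, and `min_relevance` are hypothetical names for illustration, and the sample rows are made up.

```python
from collections import defaultdict, namedtuple

# Stand-in with the same fields as TrecQrel.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def group_qrels(qrels, min_relevance=1):
    """Return {query_id: {doc_id: relevance}}, keeping judgments >= min_relevance."""
    grouped = defaultdict(dict)
    for qrel in qrels:
        if qrel.relevance >= min_relevance:
            grouped[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(grouped)


sample = [
    Qrel("q1", "d1", 2, "0"),  # level 2: the query is the article title
    Qrel("q1", "d2", 1, "0"),  # level 1: first sentence links to that article
    Qrel("q2", "d3", 0, "0"),  # level 0: otherwise
]
print(group_qrels(sample))
print(group_qrels(sample, min_relevance=2))
```

Passing `dataset.qrels_iter()` instead of `sample` would apply the same grouping to the real judgments.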
scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Example

import ir_datasets
dataset = ir_datasets.load('wikir/en1k/test')
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
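
Since the scored documents form a BM25 run, they can be compared against the qrels to get a quick baseline number. The following is a rough sketch of mean reciprocal rank (MRR), assuming higher scores mean better ranks and treating any judgment at or above `min_relevance` as relevant. `ScoredDoc`, `Qrel`, and `mean_reciprocal_rank` are hypothetical names, and the sample data is invented. For reported results, an established evaluation tool is the safer choice.

```python
from collections import defaultdict, namedtuple

# Stand-ins with the same fields as GenericScoredDoc and TrecQrel.
ScoredDoc = namedtuple("ScoredDoc", ["query_id", "doc_id", "score"])
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def mean_reciprocal_rank(scoreddocs, qrels, min_relevance=1):
    """MRR over all queries that have at least one scored document."""
    relevant = defaultdict(set)
    for qrel in qrels:
        if qrel.relevance >= min_relevance:
            relevant[qrel.query_id].add(qrel.doc_id)

    runs = defaultdict(list)
    for sd in scoreddocs:
        runs[sd.query_id].append(sd)

    total = 0.0
    for query_id, docs in runs.items():
        # Rank by descending score; the first relevant hit sets the reciprocal rank.
        ranked = sorted(docs, key=lambda sd: sd.score, reverse=True)
        for rank, sd in enumerate(ranked, start=1):
            if sd.doc_id in relevant[query_id]:
                total += 1.0 / rank
                break
    return total / len(runs) if runs else 0.0


run = [ScoredDoc("q1", "d2", 9.0), ScoredDoc("q1", "d1", 7.5), ScoredDoc("q2", "d3", 4.0)]
judgments = [Qrel("q1", "d1", 2, "0"), Qrel("q2", "d3", 1, "0")]
mrr = mean_reciprocal_rank(run, judgments)
print(mrr)
```

With the real data, `dataset.scoreddocs_iter()` and `dataset.qrels_iter()` would replace `run` and `judgments`.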

"wikir/en1k/training"

Training set of wikir/en1k. Scoreddocs are the provided BM25 run.

queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('wikir/en1k/training')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
docs

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('wikir/en1k/training')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
0     Otherwise
1     There is a link to the article with the query as its title in the first sentence
2     Query is the article title

Example

import ir_datasets
dataset = ir_datasets.load('wikir/en1k/training')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Example

import ir_datasets
dataset = ir_datasets.load('wikir/en1k/training')
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
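
One common use of a training split with both qrels and a BM25 run is to build (query, positive, negative) triples for a neural ranker. The sketch below is only one possible recipe and is not described in the source: it assumes that any BM25 candidate without a positive judgment can serve as a negative. `make_triples`, `ScoredDoc`, `Qrel`, and `seed` are hypothetical names, and the sample rows are invented.

```python
import random
from collections import defaultdict, namedtuple

# Stand-ins with the same fields as GenericScoredDoc and TrecQrel.
ScoredDoc = namedtuple("ScoredDoc", ["query_id", "doc_id", "score"])
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def make_triples(qrels, scoreddocs, seed=0):
    """Yield (query_id, positive_doc_id, negative_doc_id) tuples."""
    rng = random.Random(seed)

    positives = defaultdict(list)
    for qrel in qrels:
        if qrel.relevance > 0:
            positives[qrel.query_id].append(qrel.doc_id)

    # Assumption: BM25 candidates that are not judged relevant act as negatives.
    negatives = defaultdict(list)
    for sd in scoreddocs:
        if sd.doc_id not in positives.get(sd.query_id, ()):
            negatives[sd.query_id].append(sd.doc_id)

    for query_id, pos_ids in positives.items():
        neg_ids = negatives.get(query_id)
        if not neg_ids:
            continue  # no candidate available to sample a negative from
        for pos_id in pos_ids:
            yield query_id, pos_id, rng.choice(neg_ids)


judgments = [Qrel("q1", "d1", 2, "0"), Qrel("q1", "d2", 1, "0")]
run = [ScoredDoc("q1", "d1", 9.0), ScoredDoc("q1", "d9", 8.0), ScoredDoc("q1", "d2", 7.0)]
triples = list(make_triples(judgments, run))
print(triples)
```

Sampling negatives from the BM25 run rather than from the whole corpus tends to give harder negatives, at the cost of occasionally picking an unjudged document that is in fact relevant.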

"wikir/en1k/validation"

Validation set of wikir/en1k. Scoreddocs are the provided BM25 run.

queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('wikir/en1k/validation')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
docs

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('wikir/en1k/validation')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
0     Otherwise
1     There is a link to the article with the query as its title in the first sentence
2     Query is the article title

Example

import ir_datasets
dataset = ir_datasets.load('wikir/en1k/validation')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Example

import ir_datasets
dataset = ir_datasets.load('wikir/en1k/validation')
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
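
A validation split is typically used to compare rankers, and with three relevance levels a graded metric such as nDCG is a natural fit. The following is a sketch for a single query, assuming the exponential gain `2**rel - 1` and a log2 rank discount. That gain choice is an assumption, not something the source specifies, and `dcg` and `ndcg_at_k` are hypothetical helper names.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with gain 2**rel - 1 and a log2 rank discount."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg_at_k(ranked_doc_ids, judgments, k=10):
    """nDCG@k for one query; `judgments` maps doc_id -> relevance (missing means 0)."""
    gains = [judgments.get(doc_id, 0) for doc_id in ranked_doc_ids[:k]]
    ideal = sorted(judgments.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


judgments = {"d1": 2, "d2": 1}  # made-up judgments for one query
print(ndcg_at_k(["d1", "d2", "d3"], judgments))
print(ndcg_at_k(["d2", "d3", "d1"], judgments))
```

Here `ranked_doc_ids` would come from sorting one query's scored documents by descending score, and `judgments` from that query's qrels.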

"wikir/en59k"

WikIR for English.

docs

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('wikir/en59k')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

"wikir/en59k/test"

Test set of wikir/en59k. Scoreddocs are the provided BM25 run.

queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('wikir/en59k/test')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
docs

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('wikir/en59k/test')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
0     Otherwise
1     There is a link to the article with the query as its title in the first sentence
2     Query is the article title

Example

import ir_datasets
dataset = ir_datasets.load('wikir/en59k/test')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Example

import ir_datasets
dataset = ir_datasets.load('wikir/en59k/test')
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

"wikir/en59k/training"

Training set of wikir/en59k. Scoreddocs are the provided BM25 run.

queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('wikir/en59k/training')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
docs

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('wikir/en59k/training')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
0     Otherwise
1     There is a link to the article with the query as its title in the first sentence
2     Query is the article title

Example

import ir_datasets
dataset = ir_datasets.load('wikir/en59k/training')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Example

import ir_datasets
dataset = ir_datasets.load('wikir/en59k/training')
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

"wikir/en59k/validation"

Validation set of wikir/en59k. Scoreddocs are the provided BM25 run.

queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('wikir/en59k/validation')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
docs

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('wikir/en59k/validation')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
0     Otherwise
1     There is a link to the article with the query as its title in the first sentence
2     Query is the article title

Example

import ir_datasets
dataset = ir_datasets.load('wikir/en59k/validation')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Example

import ir_datasets
dataset = ir_datasets.load('wikir/en59k/validation')
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

"wikir/es13k"

WikIR for Spanish.

docs

Language: es

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('wikir/es13k')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

"wikir/es13k/test"

Test set of wikir/es13k. Scoreddocs are the provided BM25 run.

queries

Language: es

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('wikir/es13k/test')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
docs

Language: es

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('wikir/es13k/test')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
0     Otherwise
1     There is a link to the article with the query as its title in the first sentence
2     Query is the article title

Example

import ir_datasets
dataset = ir_datasets.load('wikir/es13k/test')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Example

import ir_datasets
dataset = ir_datasets.load('wikir/es13k/test')
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>
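
When the BM25 run is used as a first stage for reranking, only the best few candidates per query are usually needed. The sketch below groups scored documents by query and keeps the `k` highest-scoring ones. It assumes records with the GenericScoredDoc fields. `top_k_per_query` and `ScoredDoc` are hypothetical names, and the sample rows are invented.

```python
import heapq
from collections import defaultdict, namedtuple

# Stand-in with the same fields as GenericScoredDoc.
ScoredDoc = namedtuple("ScoredDoc", ["query_id", "doc_id", "score"])


def top_k_per_query(scoreddocs, k=100):
    """Return {query_id: [doc_id, ...]} with at most k doc IDs, best score first."""
    by_query = defaultdict(list)
    for sd in scoreddocs:
        by_query[sd.query_id].append(sd)
    return {
        query_id: [sd.doc_id for sd in heapq.nlargest(k, docs, key=lambda sd: sd.score)]
        for query_id, docs in by_query.items()
    }


run = [
    ScoredDoc("q1", "d1", 3.0),
    ScoredDoc("q1", "d2", 5.0),
    ScoredDoc("q1", "d3", 1.0),
    ScoredDoc("q2", "d4", 2.0),
]
candidates = top_k_per_query(run, k=2)
print(candidates)
```

This version holds the whole run in memory, which is fine for a sketch but may be worth revisiting for the larger collections.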

"wikir/es13k/training"

Training set of wikir/es13k. Scoreddocs are the provided BM25 run.

queries

Language: es

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('wikir/es13k/training')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
docs

Language: es

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('wikir/es13k/training')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
0     Otherwise
1     There is a link to the article with the query as its title in the first sentence
2     Query is the article title

Example

import ir_datasets
dataset = ir_datasets.load('wikir/es13k/training')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Example

import ir_datasets
dataset = ir_datasets.load('wikir/es13k/training')
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

"wikir/es13k/validation"

Validation set of wikir/es13k. Scoreddocs are the provided BM25 run.

queries

Language: es

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('wikir/es13k/validation')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
docs

Language: es

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('wikir/es13k/validation')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
0     Otherwise
1     There is a link to the article with the query as its title in the first sentence
2     Query is the article title

Example

import ir_datasets
dataset = ir_datasets.load('wikir/es13k/validation')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Example

import ir_datasets
dataset = ir_datasets.load('wikir/es13k/validation')
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

"wikir/fr14k"

WikIR for French.

docs

Language: fr

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('wikir/fr14k')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

"wikir/fr14k/test"

Test set of wikir/fr14k. Scoreddocs are the provided BM25 run.

queries

Language: fr

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('wikir/fr14k/test')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
docs

Language: fr

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('wikir/fr14k/test')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
0     Otherwise
1     There is a link to the article with the query as its title in the first sentence
2     Query is the article title

Example

import ir_datasets
dataset = ir_datasets.load('wikir/fr14k/test')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Example

import ir_datasets
dataset = ir_datasets.load('wikir/fr14k/test')
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

"wikir/fr14k/training"

Training set of wikir/fr14k. Scoreddocs are the provided BM25 run.

queries

Language: fr

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('wikir/fr14k/training')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
docs

Language: fr

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('wikir/fr14k/training')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
0     Otherwise
1     There is a link to the article with the query as its title in the first sentence
2     Query is the article title

Example

import ir_datasets
dataset = ir_datasets.load('wikir/fr14k/training')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Example

import ir_datasets
dataset = ir_datasets.load('wikir/fr14k/training')
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

"wikir/fr14k/validation"

Validation set of wikir/fr14k. Scoreddocs are the provided BM25 run.

queries

Language: fr

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('wikir/fr14k/validation')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
docs

Language: fr

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('wikir/fr14k/validation')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
0     Otherwise
1     There is a link to the article with the query as its title in the first sentence
2     Query is the article title

Example

import ir_datasets
dataset = ir_datasets.load('wikir/fr14k/validation')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Example

import ir_datasets
dataset = ir_datasets.load('wikir/fr14k/validation')
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

"wikir/it16k"

WikIR for Italian.

docs

Language: it

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('wikir/it16k')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

"wikir/it16k/test"

Test set of wikir/it16k. Scoreddocs are the provided BM25 run.

queries

Language: it

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('wikir/it16k/test')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
docs

Language: it

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('wikir/it16k/test')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
0     Otherwise
1     There is a link to the article with the query as its title in the first sentence
2     Query is the article title

Example

import ir_datasets
dataset = ir_datasets.load('wikir/it16k/test')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Example

import ir_datasets
dataset = ir_datasets.load('wikir/it16k/test')
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

"wikir/it16k/training"

Training set of wikir/it16k. Scoreddocs are the provided BM25 run.

queries

Language: it

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('wikir/it16k/training')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
docs

Language: it

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('wikir/it16k/training')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
0     Otherwise
1     There is a link to the article with the query as its title in the first sentence
2     Query is the article title

Example

import ir_datasets
dataset = ir_datasets.load('wikir/it16k/training')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Example

import ir_datasets
dataset = ir_datasets.load('wikir/it16k/training')
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

"wikir/it16k/validation"

Validation set of wikir/it16k. Scoreddocs are the provided BM25 run.

queries

Language: it

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('wikir/it16k/validation')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
docs

Language: it

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('wikir/it16k/validation')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
0     Otherwise
1     There is a link to the article with the query as its title in the first sentence
2     Query is the article title

Example

import ir_datasets
dataset = ir_datasets.load('wikir/it16k/validation')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Example

import ir_datasets
dataset = ir_datasets.load('wikir/it16k/validation')
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>