GitHub: datasets/wikir.py

ir_datasets: WikIR

Index
  1. wikir
  2. wikir/en1k
  3. wikir/en1k/test
  4. wikir/en1k/training
  5. wikir/en1k/validation
  6. wikir/en59k
  7. wikir/en59k/test
  8. wikir/en59k/training
  9. wikir/en59k/validation
  10. wikir/en78k
  11. wikir/en78k/test
  12. wikir/en78k/training
  13. wikir/en78k/validation
  14. wikir/ens78k
  15. wikir/ens78k/test
  16. wikir/ens78k/training
  17. wikir/ens78k/validation
  18. wikir/es13k
  19. wikir/es13k/test
  20. wikir/es13k/training
  21. wikir/es13k/validation
  22. wikir/fr14k
  23. wikir/fr14k/test
  24. wikir/fr14k/training
  25. wikir/fr14k/validation
  26. wikir/it16k
  27. wikir/it16k/test
  28. wikir/it16k/training
  29. wikir/it16k/validation
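
The IDs in the index follow a regular `wikir/<variant>/<split>` pattern. As a minimal sketch, the list can be rebuilt from the variant and split names; the helper name `wikir_dataset_ids` is hypothetical and not part of ir_datasets.

```python
# Variant and split names are copied from the index above.
VARIANTS = ["en1k", "en59k", "en78k", "ens78k", "es13k", "fr14k", "it16k"]
SPLITS = ["test", "training", "validation"]


def wikir_dataset_ids():
    """Return every dataset ID from the index, in index order (hypothetical helper)."""
    ids = ["wikir"]
    for variant in VARIANTS:
        ids.append(f"wikir/{variant}")
        ids.extend(f"wikir/{variant}/{split}" for split in SPLITS)
    return ids


if __name__ == "__main__":
    for dataset_id in wikir_dataset_ids():
        print(dataset_id)
```

Each returned string is in the form passed to `ir_datasets.load(...)` in the examples on this page.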

"wikir"

A suite of IR benchmarks in multiple languages built from Wikipedia.

Citation

ir_datasets.bib:

\cite{Frej2020Wikir,Frej2020MlWikir}

Bibtex:

@inproceedings{Frej2020Wikir,
  title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={LREC},
  year={2020}
}

@inproceedings{Frej2020MlWikir,
  title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={CIRCLE},
  year={2020}
}

"wikir/en1k"

A small version of WikIR for English.

docs | Citation

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("wikir/en1k")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.
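
Since each document is a `(doc_id, text)` record, a common next step is a lookup from ID to text. The sketch below uses a local stand-in namedtuple with the same two fields so that it runs without downloading anything; `Doc` and `build_doc_lookup` are hypothetical names, not part of ir_datasets.

```python
from collections import namedtuple

# Stand-in with the same fields as the GenericDoc type listed above.
Doc = namedtuple("Doc", ["doc_id", "text"])


def build_doc_lookup(docs):
    """Map doc_id -> text for any iterable of records with doc_id and text fields."""
    return {doc.doc_id: doc.text for doc in docs}


# Made-up sample records, for illustration only.
sample_docs = [Doc("0", "first article text"), Doc("1", "second article text")]
lookup = build_doc_lookup(sample_docs)
print(lookup["1"])
```

With the real collection, `dataset.docs_iter()` could be passed in place of `sample_docs`. Holding all text in memory is an assumption that may only suit the smaller variants.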


"wikir/en1k/test"

Test set of wikir/en1k. Scoreddocs are the provided BM25 run.

queries | docs | qrels | scoreddocs | Citation

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("wikir/en1k/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

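
The scoreddocs of this subset are a BM25 run, so a typical use is to keep the top-ranked documents per query for re-ranking. The following is a minimal sketch under the assumption that each scored record carries `query_id`, `doc_id` and `score` fields; `ScoredDoc` and `top_k_per_query` are hypothetical names defined here, and the sample values are made up.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for a scored-document record; the field names
# are an assumption about what the scoreddocs iterator yields.
ScoredDoc = namedtuple("ScoredDoc", ["query_id", "doc_id", "score"])


def top_k_per_query(scoreddocs, k=10):
    """Group scored records by query and keep the k highest-scoring doc_ids."""
    by_query = defaultdict(list)
    for record in scoreddocs:
        by_query[record.query_id].append((record.doc_id, record.score))
    return {
        query_id: [
            doc_id
            for doc_id, _ in sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
        ]
        for query_id, pairs in by_query.items()
    }


# Made-up run, for illustration only.
sample_run = [
    ScoredDoc("q1", "d1", 1.0),
    ScoredDoc("q1", "d2", 3.0),
    ScoredDoc("q2", "d3", 0.5),
]
top1 = top_k_per_query(sample_run, k=1)
```

With the real data, the records would presumably come from `dataset.scoreddocs_iter()` instead of `sample_run`.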


"wikir/en1k/training"

Training set of wikir/en1k. Scoreddocs are the provided BM25 run.

queries | docs | qrels | scoreddocs | Citation

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("wikir/en1k/training")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

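
For training a ranker, queries are usually joined with their relevance judgments to form positive (query text, doc_id) pairs. The sketch below assumes qrels records with `query_id`, `doc_id` and `relevance` fields; `Query`, `Qrel` and `positive_pairs` are hypothetical names defined here, and the sample data is made up.

```python
from collections import namedtuple

# Stand-ins: Query mirrors the GenericQuery fields listed above;
# the Qrel field names are an assumption about the qrels records.
Query = namedtuple("Query", ["query_id", "text"])
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance"])


def positive_pairs(queries, qrels, min_relevance=1):
    """Return (query text, doc_id) pairs judged at or above min_relevance."""
    text_by_id = {q.query_id: q.text for q in queries}
    return [
        (text_by_id[r.query_id], r.doc_id)
        for r in qrels
        if r.relevance >= min_relevance and r.query_id in text_by_id
    ]


# Made-up sample data, for illustration only.
sample_queries = [Query("q1", "solar eclipse"), Query("q2", "alpine lake")]
sample_qrels = [
    Qrel("q1", "d7", 2),
    Qrel("q1", "d9", 0),
    Qrel("q2", "d3", 1),
    Qrel("q3", "d4", 1),  # no matching query, so it is skipped
]
pairs = positive_pairs(sample_queries, sample_qrels)
```

With the real data, `dataset.queries_iter()` and `dataset.qrels_iter()` would presumably replace the two sample lists; the `min_relevance` threshold of 1 is an illustrative choice.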


"wikir/en1k/validation"

Validation set of wikir/en1k. Scoreddocs are the provided BM25 run.

queries | docs | qrels | scoreddocs | Citation

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("wikir/en1k/validation")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

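
A validation split is typically used to score a ranking against the relevance judgments. As an illustration only, the sketch below computes mean reciprocal rank from plain dictionaries; the metric choice and the name `mean_reciprocal_rank` are assumptions made here, not something this page prescribes.

```python
def mean_reciprocal_rank(ranking_by_query, relevant_by_query):
    """Average 1/rank of the first relevant document over all ranked queries.

    ranking_by_query: query_id -> list of doc_ids, best first
    relevant_by_query: query_id -> set of relevant doc_ids
    """
    if not ranking_by_query:
        return 0.0
    total = 0.0
    for query_id, ranking in ranking_by_query.items():
        relevant = relevant_by_query.get(query_id, set())
        for rank, doc_id in enumerate(ranking, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranking_by_query)


# Made-up rankings and judgments, for illustration only.
rankings = {"q1": ["d2", "d1"], "q2": ["d3", "d4"]}
relevant = {"q1": {"d1"}, "q2": {"d3"}}
mrr = mean_reciprocal_rank(rankings, relevant)  # (1/2 + 1/1) / 2 = 0.75
```

The rankings could be built from the provided BM25 run and the relevant sets from the qrels of this split; queries without any relevant document in the ranking contribute zero.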


"wikir/en59k"

WikIR for English.

docs | Citation

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("wikir/en59k")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>


"wikir/en59k/test"

Test set of wikir/en59k. Scoreddocs are the provided BM25 run.

queries | docs | qrels | scoreddocs | Citation

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("wikir/en59k/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>


"wikir/en59k/training"

Training set of wikir/en59k. Scoreddocs are the provided BM25 run.

queries | docs | qrels | scoreddocs | Citation

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("wikir/en59k/training")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>


"wikir/en59k/validation"

Validation set of wikir/en59k. Scoreddocs are the provided BM25 run.

queries | docs | qrels | scoreddocs | Citation

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("wikir/en59k/validation")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>


"wikir/en78k"

WikIR for English. This is one of the two versions used in Frej2020Wikir.

docs | Citation

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("wikir/en78k")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>


"wikir/en78k/test"

Test set of wikir/en78k. Scoreddocs are the provided BM25 run.

queries | docs | qrels | scoreddocs | Citation

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("wikir/en78k/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>


"wikir/en78k/training"

Training set of wikir/en78k. Scoreddocs are the provided BM25 run.

queries | docs | qrels | scoreddocs | Citation

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("wikir/en78k/training")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>


"wikir/en78k/validation"

Validation set of wikir/en78k. Scoreddocs are the provided BM25 run.

queries | docs | qrels | scoreddocs | Citation

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("wikir/en78k/validation")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>


"wikir/ens78k"

WikIR for English, using the first sentences of articles as queries. This is one of the two versions used in Frej2020Wikir.

docs | Citation

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("wikir/ens78k")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>


"wikir/ens78k/test"

Test set of wikir/ens78k. Scoreddocs are the provided BM25 run.

queries | docs | qrels | scoreddocs | Citation

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("wikir/ens78k/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>


"wikir/ens78k/training"

Training set of wikir/ens78k. Scoreddocs are the provided BM25 run.

queries | docs | qrels | scoreddocs | Citation

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("wikir/ens78k/training")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>


"wikir/ens78k/validation"

Validation set of wikir/ens78k. Scoreddocs are the provided BM25 run.

queries | docs | qrels | scoreddocs | Citation

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("wikir/ens78k/validation")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>


"wikir/es13k"

WikIR for Spanish.

docs | Citation

Language: es

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("wikir/es13k")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>


"wikir/es13k/test"

Test set of wikir/es13k. Scoreddocs are the provided BM25 run.

queries | docs | qrels | scoreddocs | Citation

Language: es

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("wikir/es13k/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>


"wikir/es13k/training"

Training set of wikir/es13k. Scoreddocs are the provided BM25 run.

queries | docs | qrels | scoreddocs | Citation

Language: es

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("wikir/es13k/training")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>


"wikir/es13k/validation"

Validation set of wikir/es13k. Scoreddocs are the provided BM25 run.

queries | docs | qrels | scoreddocs | Citation

Language: es

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("wikir/es13k/validation")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>


"wikir/fr14k"

WikIR for French.

docs | Citation

Language: fr

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("wikir/fr14k")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>


"wikir/fr14k/test"

Test set of wikir/fr14k. Scoreddocs are the provided BM25 run.

queries | docs | qrels | scoreddocs | Citation

Language: fr

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("wikir/fr14k/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>


"wikir/fr14k/training"

Training set of wikir/fr14k. Scoreddocs are the provided BM25 run.

queries | docs | qrels | scoreddocs | Citation

Language: fr

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("wikir/fr14k/training")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>


"wikir/fr14k/validation"

Validation set of wikir/fr14k. Scoreddocs are the provided BM25 run.

queries | docs | qrels | scoreddocs | Citation

Language: fr

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("wikir/fr14k/validation")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>


"wikir/it16k"

WikIR for Italian.

docs | Citation

Language: it

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("wikir/it16k")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>


"wikir/it16k/test"

Test set of wikir/it16k. Scoreddocs are the provided BM25 run.

queries | docs | qrels | scoreddocs | Citation

Language: it

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("wikir/it16k/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>


"wikir/it16k/training"

Training set of wikir/it16k. Scoreddocs are the provided BM25 run.

queries | docs | qrels | scoreddocs | Citation

Language: it

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("wikir/it16k/training")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>


"wikir/it16k/validation"

Validation set of wikir/it16k. Scoreddocs are the provided BM25 run.

queries | docs | qrels | scoreddocs | Citation

Language: it

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("wikir/it16k/validation")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
