ir_datasets: WikIR
A suite of IR benchmarks in multiple languages built from Wikipedia.
Bibtex:
@inproceedings{Frej2020Wikir,
  title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={LREC},
  year={2020}
}
@inproceedings{Frej2020MlWikir,
  title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={CIRCLE},
  year={2020}
}
A small version of WikIR for English.
Test set of wikir/en1k. Scoreddocs are the provided BM25 run.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en1k/test")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
More details about the Python API are available in the ir_datasets documentation.
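Since the scoreddocs are described as the provided BM25 run, it can help to group them into one ranked list per query. The following is a minimal sketch, not part of the ir_datasets API: `build_run` and the `ScoredDoc` stand-in are hypothetical names, and the sample records are made up. It assumes each scored-doc record exposes `query_id`, `doc_id`, and `score` fields.

```python
from collections import defaultdict, namedtuple

# Stand-in for the records yielded by scoreddocs_iter(); the field names
# (query_id, doc_id, score) are an assumption about the record layout.
ScoredDoc = namedtuple("ScoredDoc", ["query_id", "doc_id", "score"])


def build_run(scoreddocs):
    """Group scored documents by query, highest score first."""
    run = defaultdict(list)
    for sd in scoreddocs:
        run[sd.query_id].append((sd.doc_id, sd.score))
    for ranking in run.values():
        ranking.sort(key=lambda pair: pair[1], reverse=True)
    return dict(run)


# Made-up sample records, for illustration only.
sample = [
    ScoredDoc("q1", "d3", 7.2),
    ScoredDoc("q1", "d1", 9.5),
    ScoredDoc("q2", "d2", 4.0),
]
run = build_run(sample)
print(run["q1"])
```

With a downloaded dataset, passing `dataset.scoreddocs_iter()` to `build_run` in place of `sample` should behave the same way, provided the records carry the assumed fields.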
Training set of wikir/en1k. Scoreddocs are the provided BM25 run.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en1k/training")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
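One common use of a training split with a BM25 run is to mine negatives for a ranking model. The sketch below is only an illustration under stated assumptions: it assumes the split also exposes relevance judgments with `query_id`, `doc_id`, and `relevance` fields, and it treats BM25-retrieved documents without a positive judgment as negatives, which is a heuristic rather than anything WikIR prescribes. `make_triples`, `Qrel`, and `ScoredDoc` are hypothetical names, and the sample data is made up.

```python
from collections import defaultdict, namedtuple

# Stand-ins for the judgment and scored-doc records; field names are assumed.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance"])
ScoredDoc = namedtuple("ScoredDoc", ["query_id", "doc_id", "score"])


def make_triples(qrels, scoreddocs, min_relevance=1):
    """Build (query_id, positive_doc_id, negative_doc_id) triples.

    A document is positive if its relevance is at least min_relevance.
    A BM25-retrieved document that is not positive is used as a negative.
    """
    positives = defaultdict(set)
    for qrel in qrels:
        if qrel.relevance >= min_relevance:
            positives[qrel.query_id].add(qrel.doc_id)

    triples = []
    for sd in scoreddocs:
        pos = positives.get(sd.query_id, set())
        if sd.doc_id in pos:
            continue  # retrieved doc is relevant, so it is not a negative
        for pos_id in sorted(pos):
            triples.append((sd.query_id, pos_id, sd.doc_id))
    return triples


# Made-up sample data, for illustration only.
qrels = [Qrel("q1", "d1", 2), Qrel("q1", "d2", 1), Qrel("q1", "d9", 0)]
scoreddocs = [
    ScoredDoc("q1", "d1", 9.5),
    ScoredDoc("q1", "d5", 6.1),
    ScoredDoc("q2", "d7", 3.0),
]
triples = make_triples(qrels, scoreddocs)
print(triples)
```

Queries with no positive judgment contribute no triples, so `q2` is dropped in this sample.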
Validation set of wikir/en1k. Scoreddocs are the provided BM25 run.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en1k/validation")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
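A validation split is a natural place to sanity-check the provided BM25 run before tuning anything on top of it. As a rough sketch, precision at a cutoff k can be computed from a ranked list of document IDs and a set of relevant IDs. `precision_at_k` is a hypothetical helper, not an ir_datasets function, and the inputs below are made up; how the ranked list and relevant set are obtained from the dataset is left as an assumption.

```python
def precision_at_k(ranked_doc_ids, relevant_doc_ids, k):
    """Fraction of the top-k ranked documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = ranked_doc_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_doc_ids)
    # Dividing by k (not len(top_k)) penalizes rankings shorter than k.
    return hits / k


# Made-up ranking and judgments, for illustration only.
ranking = ["d1", "d5", "d2", "d8"]
relevant = {"d1", "d2"}
print(precision_at_k(ranking, relevant, 2))
print(precision_at_k(ranking, relevant, 4))
```

Dividing by k rather than by the number of retrieved documents is one common convention; other evaluation tools may differ.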
WikIR for English.
Test set of wikir/en59k. Scoreddocs are the provided BM25 run.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en59k/test")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Training set of wikir/en59k. Scoreddocs are the provided BM25 run.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en59k/training")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Validation set of wikir/en59k. Scoreddocs are the provided BM25 run.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en59k/validation")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
WikIR for English. This is one of the two versions used in Frej2020Wikir.
Test set of wikir/en78k. Scoreddocs are the provided BM25 run.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en78k/test")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Training set of wikir/en78k. Scoreddocs are the provided BM25 run.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en78k/training")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Validation set of wikir/en78k. Scoreddocs are the provided BM25 run.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en78k/validation")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
WikIR for English, using the first sentences of articles as queries. This is one of the two versions used in Frej2020Wikir.
Test set of wikir/ens78k. Scoreddocs are the provided BM25 run.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/ens78k/test")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Training set of wikir/ens78k. Scoreddocs are the provided BM25 run.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/ens78k/training")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Validation set of wikir/ens78k. Scoreddocs are the provided BM25 run.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/ens78k/validation")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
WikIR for Spanish.
Test set of wikir/es13k. Scoreddocs are the provided BM25 run.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/es13k/test")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Training set of wikir/es13k. Scoreddocs are the provided BM25 run.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/es13k/training")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Validation set of wikir/es13k. Scoreddocs are the provided BM25 run.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/es13k/validation")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
WikIR for French.
Test set of wikir/fr14k. Scoreddocs are the provided BM25 run.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/fr14k/test")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Training set of wikir/fr14k. Scoreddocs are the provided BM25 run.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/fr14k/training")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Validation set of wikir/fr14k. Scoreddocs are the provided BM25 run.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/fr14k/validation")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
WikIR for Italian.
Test set of wikir/it16k. Scoreddocs are the provided BM25 run.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/it16k/test")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Training set of wikir/it16k. Scoreddocs are the provided BM25 run.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/it16k/training")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Validation set of wikir/it16k. Scoreddocs are the provided BM25 run.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/it16k/validation")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
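All of the dataset IDs on this page follow the same `wikir/<collection>/<split>` pattern, so they can be generated rather than typed out. The sketch below collects the collection names and language codes listed above into a table; `COLLECTIONS`, `SPLITS`, and `dataset_ids` are hypothetical names introduced for illustration.

```python
from itertools import product

# Collection name -> language code, as listed on this page.
COLLECTIONS = {
    "en1k": "en",
    "en59k": "en",
    "en78k": "en",
    "ens78k": "en",
    "es13k": "es",
    "fr14k": "fr",
    "it16k": "it",
}
SPLITS = ("training", "validation", "test")


def dataset_ids(language=None):
    """Return dataset IDs, optionally restricted to one language code."""
    return [
        f"wikir/{name}/{split}"
        for name, split in product(COLLECTIONS, SPLITS)
        if language is None or COLLECTIONS[name] == language
    ]


ids = dataset_ids()
print(len(ids))
print(dataset_ids("fr"))
```

Each resulting string can be passed to `ir_datasets.load(...)` in the same way as the examples above.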