ir_datasets: WikIR
A suite of IR benchmarks in multiple languages built from Wikipedia.
Bibtex:
@inproceedings{Frej2020Wikir,
  title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={LREC},
  year={2020}
}
@inproceedings{Frej2020MlWikir,
  title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={CIRCLE},
  year={2020}
}
A small version of WikIR for English.
Test set of wikir/en1k. Scoreddocs are the provided BM25 run.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en1k/test")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
More details about the Python API are available in the ir_datasets documentation.
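The query tuples above can be collected into a lookup table when queries need to be fetched by ID. The following is a minimal sketch that assumes only the `query_id` and `text` fields shown above. The helper `build_query_lookup` and the stand-in `Query` tuple are hypothetical names used for illustration and are not part of ir_datasets.

```python
from collections import namedtuple
from typing import Dict, Iterable

# Hypothetical stand-in mirroring namedtuple<query_id, text>, so the
# sketch runs without downloading the dataset.
Query = namedtuple("Query", ["query_id", "text"])


def build_query_lookup(queries: Iterable) -> Dict[str, str]:
    """Map each query_id to its query text."""
    return {q.query_id: q.text for q in queries}


# Made-up sample rows, for illustration only.
sample_queries = [
    Query("1", "first example query"),
    Query("2", "second example query"),
]
lookup = build_query_lookup(sample_queries)
print(lookup["2"])
```

With ir_datasets installed, passing `dataset.queries_iter()` to `build_query_lookup` should build the same kind of mapping for the real queries.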
Training set of wikir/en1k. Scoreddocs are the provided BM25 run.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en1k/training")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
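Since the scoreddocs are the provided BM25 run, a common first step is to group them by query and keep the best-scoring documents. The sketch below assumes that `dataset.scoreddocs_iter()` yields tuples with `query_id`, `doc_id`, and `score` fields. The helpers `top_k_per_query` and `load_bm25_run` and the stand-in `ScoredDoc` tuple are hypothetical names introduced here.

```python
from collections import defaultdict, namedtuple
from typing import Dict, Iterable, List, Tuple

# Hypothetical stand-in for the scored-document tuples (assumed fields).
ScoredDoc = namedtuple("ScoredDoc", ["query_id", "doc_id", "score"])


def top_k_per_query(scoreddocs: Iterable, k: int = 10) -> Dict[str, List[Tuple[str, float]]]:
    """Group scored documents by query and keep the k highest scores per query."""
    by_query = defaultdict(list)
    for sd in scoreddocs:
        by_query[sd.query_id].append((sd.doc_id, sd.score))
    return {
        qid: sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
        for qid, pairs in by_query.items()
    }


def load_bm25_run(dataset_id: str = "wikir/en1k/training", k: int = 10):
    """Load the provided run for a split (needs ir_datasets and a download)."""
    # Imported lazily so top_k_per_query works without ir_datasets installed.
    import ir_datasets

    dataset = ir_datasets.load(dataset_id)
    return top_k_per_query(dataset.scoreddocs_iter(), k)


# Made-up scores, for illustration only.
sample_run = [
    ScoredDoc("q1", "d1", 1.0),
    ScoredDoc("q1", "d2", 3.0),
    ScoredDoc("q2", "d3", 0.5),
    ScoredDoc("q1", "d4", 2.0),
]
top2 = top_k_per_query(sample_run, k=2)
print(top2)
```

Keeping the grouping logic separate from the loading step means it can be checked on small in-memory data before running it over a full split.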
Validation set of wikir/en1k. Scoreddocs are the provided BM25 run.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en1k/validation")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
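A validation split is typically used to check ranking quality, so one possible use of the provided BM25 run is to score it against relevance judgments. The sketch below computes precision@k per query. It assumes the split also exposes relevance judgments with `query_id`, `doc_id`, and `relevance` fields (for example through `dataset.qrels_iter()`), and it treats any `relevance > 0` as relevant. Both are assumptions, and `precision_at_k`, `ScoredDoc`, and `Qrel` are hypothetical names.

```python
from collections import defaultdict, namedtuple
from typing import Dict, Iterable

# Hypothetical stand-ins for the run and judgment tuples (assumed fields).
ScoredDoc = namedtuple("ScoredDoc", ["query_id", "doc_id", "score"])
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance"])


def precision_at_k(run: Iterable, qrels: Iterable, k: int = 10) -> Dict[str, float]:
    """Return precision@k for every query that appears in the run."""
    # Collect the relevant doc_ids per query (relevance > 0 is an assumption).
    relevant = defaultdict(set)
    for qrel in qrels:
        if qrel.relevance > 0:
            relevant[qrel.query_id].add(qrel.doc_id)

    # Collect (score, doc_id) pairs per query.
    ranked = defaultdict(list)
    for sd in run:
        ranked[sd.query_id].append((sd.score, sd.doc_id))

    per_query = {}
    for qid, pairs in ranked.items():
        top = sorted(pairs, key=lambda p: p[0], reverse=True)[:k]
        hits = sum(1 for _, doc_id in top if doc_id in relevant[qid])
        per_query[qid] = hits / k
    return per_query


# Made-up data, for illustration only.
sample_run = [
    ScoredDoc("q1", "d1", 3.0),
    ScoredDoc("q1", "d2", 2.0),
    ScoredDoc("q1", "d3", 1.0),
    ScoredDoc("q2", "d4", 1.0),
]
sample_qrels = [Qrel("q1", "d1", 1), Qrel("q1", "d2", 0), Qrel("q1", "d3", 1)]
p_at_2 = precision_at_k(sample_run, sample_qrels, k=2)
print(p_at_2)
```

Dividing by `k` rather than by the number of retrieved documents penalizes queries with fewer than `k` results, which is one common convention among several.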
WikIR for English.
Test set of wikir/en59k. Scoreddocs are the provided BM25 run.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en59k/test")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Training set of wikir/en59k. Scoreddocs are the provided BM25 run.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en59k/training")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Validation set of wikir/en59k. Scoreddocs are the provided BM25 run.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en59k/validation")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
WikIR for Spanish.
Test set of wikir/es13k. Scoreddocs are the provided BM25 run.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/es13k/test")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Training set of wikir/es13k. Scoreddocs are the provided BM25 run.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/es13k/training")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Validation set of wikir/es13k. Scoreddocs are the provided BM25 run.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/es13k/validation")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
WikIR for French.
Test set of wikir/fr14k. Scoreddocs are the provided BM25 run.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/fr14k/test")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Training set of wikir/fr14k. Scoreddocs are the provided BM25 run.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/fr14k/training")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Validation set of wikir/fr14k. Scoreddocs are the provided BM25 run.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/fr14k/validation")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
WikIR for Italian.
Test set of wikir/it16k. Scoreddocs are the provided BM25 run.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/it16k/test")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Training set of wikir/it16k. Scoreddocs are the provided BM25 run.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/it16k/training")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Validation set of wikir/it16k. Scoreddocs are the provided BM25 run.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/it16k/validation")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
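All of the dataset IDs on this page follow the same `wikir/<collection>/<split>` pattern, so they can be generated rather than typed out. The sketch below lists only the collections and splits that appear above. The function name `wikir_dataset_ids` is a hypothetical helper, not part of ir_datasets.

```python
from itertools import product
from typing import List, Sequence

# Collections and splits exactly as they appear in the IDs on this page.
COLLECTIONS = ("en1k", "en59k", "es13k", "fr14k", "it16k")
SPLITS = ("training", "validation", "test")


def wikir_dataset_ids(
    collections: Sequence[str] = COLLECTIONS,
    splits: Sequence[str] = SPLITS,
) -> List[str]:
    """Build every wikir/<collection>/<split> identifier."""
    return [f"wikir/{c}/{s}" for c, s in product(collections, splits)]


all_ids = wikir_dataset_ids()
for dataset_id in all_ids:
    print(dataset_id)
```

Each generated ID can be passed to `ir_datasets.load(...)` in the same way as the examples above, for instance to run the same processing over every language.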