ir_datasets
: WikIRA suite of IR benchmarks in multiple languages built from Wikipeida.
Bibtex:
@inproceedings{Frej2020Wikir, title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={LREC}, year={2020} } @inproceedings{Frej2020MlWikir, title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={CIRCLE}, year={2020} }A small version of WikIR for English.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en1k")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/en1k docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en1k')
# Index wikir/en1k
indexer = pt.IterDictIndexer('./indices/wikir_en1k')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Bibtex:
@inproceedings{Frej2020Wikir, title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={LREC}, year={2020} } @inproceedings{Frej2020MlWikir, title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={CIRCLE}, year={2020} }Test set of wikir/en1k. Scoreddocs are the provided BM25 run.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en1k/test")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/en1k/test queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en1k/test')
index_ref = pt.IndexRef.of('./indices/wikir_en1k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from wikir/en1k
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en1k/test")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/en1k/test docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en1k/test')
# Index wikir/en1k
indexer = pt.IterDictIndexer('./indices/wikir_en1k')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition |
---|---|
0 | Otherwise |
1 | There is a link to the article with the query as its title in the first sentence |
2 | Query is the article title |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en1k/test")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikir/en1k/test qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:wikir/en1k/test')
index_ref = pt.IndexRef.of('./indices/wikir_en1k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics(),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en1k/test")
for scoreddoc in dataset.scoreddocs_iter():
scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export wikir/en1k/test scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@inproceedings{Frej2020Wikir, title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={LREC}, year={2020} } @inproceedings{Frej2020MlWikir, title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={CIRCLE}, year={2020} }Training set of wikir/en1k. Scoreddocs are the provided BM25 run.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en1k/training")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/en1k/training queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en1k/training')
index_ref = pt.IndexRef.of('./indices/wikir_en1k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from wikir/en1k
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en1k/training")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/en1k/training docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en1k/training')
# Index wikir/en1k
indexer = pt.IterDictIndexer('./indices/wikir_en1k')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition |
---|---|
0 | Otherwise |
1 | There is a link to the article with the query as its title in the first sentence |
2 | Query is the article title |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en1k/training")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikir/en1k/training qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:wikir/en1k/training')
index_ref = pt.IndexRef.of('./indices/wikir_en1k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics(),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en1k/training")
for scoreddoc in dataset.scoreddocs_iter():
scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export wikir/en1k/training scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@inproceedings{Frej2020Wikir, title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={LREC}, year={2020} } @inproceedings{Frej2020MlWikir, title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={CIRCLE}, year={2020} }Validation set of wikir/en1k. Scoreddocs are the provided BM25 run.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en1k/validation")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/en1k/validation queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en1k/validation')
index_ref = pt.IndexRef.of('./indices/wikir_en1k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from wikir/en1k
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en1k/validation")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/en1k/validation docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en1k/validation')
# Index wikir/en1k
indexer = pt.IterDictIndexer('./indices/wikir_en1k')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition |
---|---|
0 | Otherwise |
1 | There is a link to the article with the query as its title in the first sentence |
2 | Query is the article title |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en1k/validation")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikir/en1k/validation qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:wikir/en1k/validation')
index_ref = pt.IndexRef.of('./indices/wikir_en1k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics(),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en1k/validation")
for scoreddoc in dataset.scoreddocs_iter():
scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export wikir/en1k/validation scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@inproceedings{Frej2020Wikir, title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={LREC}, year={2020} } @inproceedings{Frej2020MlWikir, title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={CIRCLE}, year={2020} }WikIR for English.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en59k")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/en59k docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en59k')
# Index wikir/en59k
indexer = pt.IterDictIndexer('./indices/wikir_en59k')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Bibtex:
@inproceedings{Frej2020Wikir, title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={LREC}, year={2020} } @inproceedings{Frej2020MlWikir, title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={CIRCLE}, year={2020} }Test set of wikir/en59k. Scoreddocs are the provided BM25 run.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en59k/test")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/en59k/test queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en59k/test')
index_ref = pt.IndexRef.of('./indices/wikir_en59k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from wikir/en59k
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en59k/test")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/en59k/test docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en59k/test')
# Index wikir/en59k
indexer = pt.IterDictIndexer('./indices/wikir_en59k')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition |
---|---|
0 | Otherwise |
1 | There is a link to the article with the query as its title in the first sentence |
2 | Query is the article title |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en59k/test")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikir/en59k/test qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:wikir/en59k/test')
index_ref = pt.IndexRef.of('./indices/wikir_en59k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics(),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en59k/test")
for scoreddoc in dataset.scoreddocs_iter():
scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export wikir/en59k/test scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@inproceedings{Frej2020Wikir, title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={LREC}, year={2020} } @inproceedings{Frej2020MlWikir, title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={CIRCLE}, year={2020} }Training set of wikir/en59k. Scoreddocs are the provided BM25 run.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en59k/training")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/en59k/training queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en59k/training')
index_ref = pt.IndexRef.of('./indices/wikir_en59k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from wikir/en59k
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en59k/training")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/en59k/training docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en59k/training')
# Index wikir/en59k
indexer = pt.IterDictIndexer('./indices/wikir_en59k')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition |
---|---|
0 | Otherwise |
1 | There is a link to the article with the query as its title in the first sentence |
2 | Query is the article title |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en59k/training")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikir/en59k/training qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:wikir/en59k/training')
index_ref = pt.IndexRef.of('./indices/wikir_en59k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics(),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en59k/training")
for scoreddoc in dataset.scoreddocs_iter():
scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export wikir/en59k/training scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@inproceedings{Frej2020Wikir, title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={LREC}, year={2020} } @inproceedings{Frej2020MlWikir, title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={CIRCLE}, year={2020} }Validation set of wikir/en59k. Scoreddocs are the provided BM25 run.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en59k/validation")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/en59k/validation queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en59k/validation')
index_ref = pt.IndexRef.of('./indices/wikir_en59k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from wikir/en59k
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en59k/validation")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/en59k/validation docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en59k/validation')
# Index wikir/en59k
indexer = pt.IterDictIndexer('./indices/wikir_en59k')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition |
---|---|
0 | Otherwise |
1 | There is a link to the article with the query as its title in the first sentence |
2 | Query is the article title |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en59k/validation")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikir/en59k/validation qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:wikir/en59k/validation')
index_ref = pt.IndexRef.of('./indices/wikir_en59k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics(),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en59k/validation")
for scoreddoc in dataset.scoreddocs_iter():
scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export wikir/en59k/validation scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@inproceedings{Frej2020Wikir, title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={LREC}, year={2020} } @inproceedings{Frej2020MlWikir, title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={CIRCLE}, year={2020} }WikIR for English. This is one of the two versions used in Frej2020Wikir.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en78k")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/en78k docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en78k')
# Index wikir/en78k
indexer = pt.IterDictIndexer('./indices/wikir_en78k')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Bibtex:
@inproceedings{Frej2020Wikir, title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={LREC}, year={2020} } @inproceedings{Frej2020MlWikir, title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={CIRCLE}, year={2020} }Test set of wikir/en78k. Scoreddocs are the provided BM25 run.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en78k/test")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/en78k/test queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en78k/test')
index_ref = pt.IndexRef.of('./indices/wikir_en78k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from wikir/en78k
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en78k/test")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/en78k/test docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en78k/test')
# Index wikir/en78k
indexer = pt.IterDictIndexer('./indices/wikir_en78k')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition |
---|---|
0 | Otherwise |
1 | There is a link to the article with the query as its title in the first sentence |
2 | Query is the article title |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en78k/test")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikir/en78k/test qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:wikir/en78k/test')
index_ref = pt.IndexRef.of('./indices/wikir_en78k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics(),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en78k/test")
for scoreddoc in dataset.scoreddocs_iter():
scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export wikir/en78k/test scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@inproceedings{Frej2020Wikir, title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={LREC}, year={2020} } @inproceedings{Frej2020MlWikir, title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={CIRCLE}, year={2020} }Training set of wikir/en78k. Scoreddocs are the provided BM25 run.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en78k/training")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/en78k/training queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en78k/training')
index_ref = pt.IndexRef.of('./indices/wikir_en78k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from wikir/en78k
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en78k/training")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/en78k/training docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en78k/training')
# Index wikir/en78k
indexer = pt.IterDictIndexer('./indices/wikir_en78k')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition |
---|---|
0 | Otherwise |
1 | There is a link to the article with the query as its title in the first sentence |
2 | Query is the article title |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en78k/training")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikir/en78k/training qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:wikir/en78k/training')
index_ref = pt.IndexRef.of('./indices/wikir_en78k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics(),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en78k/training")
for scoreddoc in dataset.scoreddocs_iter():
scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export wikir/en78k/training scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@inproceedings{Frej2020Wikir, title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={LREC}, year={2020} } @inproceedings{Frej2020MlWikir, title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={CIRCLE}, year={2020} }Validation set of wikir/en78k. Scoreddocs are the provided BM25 run.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en78k/validation")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/en78k/validation queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en78k/validation')
index_ref = pt.IndexRef.of('./indices/wikir_en78k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from wikir/en78k
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en78k/validation")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/en78k/validation docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en78k/validation')
# Index wikir/en78k
indexer = pt.IterDictIndexer('./indices/wikir_en78k')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition |
---|---|
0 | Otherwise |
1 | There is a link to the article with the query as its title in the first sentence |
2 | Query is the article title |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en78k/validation")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikir/en78k/validation qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:wikir/en78k/validation')
index_ref = pt.IndexRef.of('./indices/wikir_en78k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics(),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/en78k/validation")
for scoreddoc in dataset.scoreddocs_iter():
scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export wikir/en78k/validation scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@inproceedings{Frej2020Wikir, title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={LREC}, year={2020} } @inproceedings{Frej2020MlWikir, title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={CIRCLE}, year={2020} }WikIR for English, using the first sentences of articles as queries. This is one of the two versions used in Frej2020Wikir.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/ens78k")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/ens78k docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/ens78k')
# Index wikir/ens78k
indexer = pt.IterDictIndexer('./indices/wikir_ens78k')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Bibtex:
@inproceedings{Frej2020Wikir, title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={LREC}, year={2020} } @inproceedings{Frej2020MlWikir, title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={CIRCLE}, year={2020} }Test set of wikir/ens78k. Scoreddocs are the provided BM25 run.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/ens78k/test")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/ens78k/test queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/ens78k/test')
index_ref = pt.IndexRef.of('./indices/wikir_ens78k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from wikir/ens78k
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/ens78k/test")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/ens78k/test docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/ens78k/test')
# Index wikir/ens78k
indexer = pt.IterDictIndexer('./indices/wikir_ens78k')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition |
---|---|
0 | Otherwise |
1 | There is a link to the article with the query as its title in the first sentence |
2 | Query is the article title |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/ens78k/test")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikir/ens78k/test qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:wikir/ens78k/test')
index_ref = pt.IndexRef.of('./indices/wikir_ens78k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics(),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/ens78k/test")
for scoreddoc in dataset.scoreddocs_iter():
scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export wikir/ens78k/test scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@inproceedings{Frej2020Wikir, title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={LREC}, year={2020} } @inproceedings{Frej2020MlWikir, title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={CIRCLE}, year={2020} }Training set of wikir/ens78k. Scoreddocs are the provided BM25 run.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/ens78k/training")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/ens78k/training queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/ens78k/training')
index_ref = pt.IndexRef.of('./indices/wikir_ens78k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from wikir/ens78k
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/ens78k/training")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/ens78k/training docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/ens78k/training')
# Index wikir/ens78k
indexer = pt.IterDictIndexer('./indices/wikir_ens78k')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition |
---|---|
0 | Otherwise |
1 | There is a link to the article with the query as its title in the first sentence |
2 | Query is the article title |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/ens78k/training")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikir/ens78k/training qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:wikir/ens78k/training')
index_ref = pt.IndexRef.of('./indices/wikir_ens78k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics(),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/ens78k/training")
for scoreddoc in dataset.scoreddocs_iter():
scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export wikir/ens78k/training scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@inproceedings{Frej2020Wikir, title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={LREC}, year={2020} } @inproceedings{Frej2020MlWikir, title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={CIRCLE}, year={2020} }Validation set of wikir/ens78k. Scoreddocs are the provided BM25 run.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/ens78k/validation")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/ens78k/validation queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/ens78k/validation')
index_ref = pt.IndexRef.of('./indices/wikir_ens78k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from wikir/ens78k
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/ens78k/validation")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/ens78k/validation docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/ens78k/validation')
# Index wikir/ens78k
indexer = pt.IterDictIndexer('./indices/wikir_ens78k')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition |
---|---|
0 | Otherwise |
1 | There is a link to the article with the query as its title in the first sentence |
2 | Query is the article title |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/ens78k/validation")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikir/ens78k/validation qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:wikir/ens78k/validation')
index_ref = pt.IndexRef.of('./indices/wikir_ens78k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics(),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/ens78k/validation")
for scoreddoc in dataset.scoreddocs_iter():
scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export wikir/ens78k/validation scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@inproceedings{Frej2020Wikir, title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={LREC}, year={2020} } @inproceedings{Frej2020MlWikir, title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={CIRCLE}, year={2020} }WikIR for Spanish.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/es13k")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/es13k docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@inproceedings{Frej2020Wikir, title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={LREC}, year={2020} } @inproceedings{Frej2020MlWikir, title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={CIRCLE}, year={2020} }Test set of wikir/es13k. Scoreddocs are the provided BM25 run.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/es13k/test")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/es13k/test queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from wikir/es13k
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/es13k/test")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/es13k/test docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition |
---|---|
0 | Otherwise |
1 | There is a link to the article with the query as its title in the first sentence |
2 | Query is the article title |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/es13k/test")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikir/es13k/test qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/es13k/test")
for scoreddoc in dataset.scoreddocs_iter():
scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export wikir/es13k/test scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@inproceedings{Frej2020Wikir, title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={LREC}, year={2020} } @inproceedings{Frej2020MlWikir, title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={CIRCLE}, year={2020} }Training set of wikir/es13k. Scoreddocs are the provided BM25 run.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/es13k/training")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/es13k/training queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from wikir/es13k
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/es13k/training")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/es13k/training docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition |
---|---|
0 | Otherwise |
1 | There is a link to the article with the query as its title in the first sentence |
2 | Query is the article title |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/es13k/training")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikir/es13k/training qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/es13k/training")
for scoreddoc in dataset.scoreddocs_iter():
scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export wikir/es13k/training scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@inproceedings{Frej2020Wikir, title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={LREC}, year={2020} } @inproceedings{Frej2020MlWikir, title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={CIRCLE}, year={2020} }Validation set of wikir/es13k. Scoreddocs are the provided BM25 run.
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/es13k/validation")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/es13k/validation queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from wikir/es13k
Language: es
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/es13k/validation")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/es13k/validation docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition |
---|---|
0 | Otherwise |
1 | There is a link to the article with the query as its title in the first sentence |
2 | Query is the article title |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/es13k/validation")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikir/es13k/validation qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/es13k/validation")
for scoreddoc in dataset.scoreddocs_iter():
scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export wikir/es13k/validation scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@inproceedings{Frej2020Wikir, title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={LREC}, year={2020} } @inproceedings{Frej2020MlWikir, title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={CIRCLE}, year={2020} }WikIR for French.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/fr14k")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/fr14k docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@inproceedings{Frej2020Wikir, title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={LREC}, year={2020} } @inproceedings{Frej2020MlWikir, title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={CIRCLE}, year={2020} }Test set of wikir/fr14k. Scoreddocs are the provided BM25 run.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/fr14k/test")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/fr14k/test queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from wikir/fr14k
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/fr14k/test")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/fr14k/test docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition |
---|---|
0 | Otherwise |
1 | There is a link to the article with the query as its title in the first sentence |
2 | Query is the article title |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/fr14k/test")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikir/fr14k/test qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/fr14k/test")
for scoreddoc in dataset.scoreddocs_iter():
scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export wikir/fr14k/test scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@inproceedings{Frej2020Wikir, title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={LREC}, year={2020} } @inproceedings{Frej2020MlWikir, title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={CIRCLE}, year={2020} }Training set of wikir/fr14k. Scoreddocs are the provided BM25 run.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/fr14k/training")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/fr14k/training queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from wikir/fr14k
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/fr14k/training")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/fr14k/training docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition |
---|---|
0 | Otherwise |
1 | There is a link to the article with the query as its title in the first sentence |
2 | Query is the article title |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/fr14k/training")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikir/fr14k/training qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/fr14k/training")
for scoreddoc in dataset.scoreddocs_iter():
scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export wikir/fr14k/training scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@inproceedings{Frej2020Wikir, title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={LREC}, year={2020} } @inproceedings{Frej2020MlWikir, title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={CIRCLE}, year={2020} }Validation set of wikir/fr14k. Scoreddocs are the provided BM25 run.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/fr14k/validation")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/fr14k/validation queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from wikir/fr14k
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/fr14k/validation")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/fr14k/validation docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition |
---|---|
0 | Otherwise |
1 | There is a link to the article with the query as its title in the first sentence |
2 | Query is the article title |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/fr14k/validation")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikir/fr14k/validation qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/fr14k/validation")
for scoreddoc in dataset.scoreddocs_iter():
scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export wikir/fr14k/validation scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@inproceedings{Frej2020Wikir, title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={LREC}, year={2020} } @inproceedings{Frej2020MlWikir, title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={CIRCLE}, year={2020} }WikIR for Italian.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/it16k")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/it16k docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@inproceedings{Frej2020Wikir, title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={LREC}, year={2020} } @inproceedings{Frej2020MlWikir, title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={CIRCLE}, year={2020} }Test set of wikir/it16k. Scoreddocs are the provided BM25 run.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/it16k/test")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/it16k/test queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from wikir/it16k
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/it16k/test")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/it16k/test docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition |
---|---|
0 | Otherwise |
1 | There is a link to the article with the query as its title in the first sentence |
2 | Query is the article title |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/it16k/test")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikir/it16k/test qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/it16k/test")
for scoreddoc in dataset.scoreddocs_iter():
scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export wikir/it16k/test scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@inproceedings{Frej2020Wikir, title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={LREC}, year={2020} } @inproceedings{Frej2020MlWikir, title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={CIRCLE}, year={2020} }Training set of wikir/it16k. Scoreddocs are the provided BM25 run.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/it16k/training")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/it16k/training queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from wikir/it16k
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/it16k/training")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/it16k/training docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition |
---|---|
0 | Otherwise |
1 | There is a link to the article with the query as its title in the first sentence |
2 | Query is the article title |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/it16k/training")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikir/it16k/training qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/it16k/training")
for scoreddoc in dataset.scoreddocs_iter():
scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export wikir/it16k/training scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@inproceedings{Frej2020Wikir, title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={LREC}, year={2020} } @inproceedings{Frej2020MlWikir, title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={CIRCLE}, year={2020} }Validation set of wikir/it16k. Scoreddocs are the provided BM25 run.
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/it16k/validation")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/it16k/validation queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Inherits docs from wikir/it16k
Language: it
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/it16k/validation")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export wikir/it16k/validation docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition |
---|---|
0 | Otherwise |
1 | There is a link to the article with the query as its title in the first sentence |
2 | Query is the article title |
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/it16k/validation")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export wikir/it16k/validation qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
Examples:
import ir_datasets
dataset = ir_datasets.load("wikir/it16k/validation")
for scoreddoc in dataset.scoreddocs_iter():
scoreddoc # namedtuple<query_id, doc_id, score>
You can find more details about the Python API here.
ir_datasets export wikir/it16k/validation scoreddocs --format tsv
[query_id] [doc_id] [score]
...
You can find more details about the CLI here.
No example available for PyTerrier
Bibtex:
@inproceedings{Frej2020Wikir, title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={LREC}, year={2020} } @inproceedings{Frej2020MlWikir, title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More}, author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet}, booktitle={CIRCLE}, year={2020} }