ir_datasets : WikIR

import ir_datasets
dataset = ir_datasets.load("wikir/en1k")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.
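When exploring the corpus, it can help to look at only the first few documents instead of iterating over all of them. The following is a minimal sketch using `itertools.islice`; the `Doc` namedtuple and the sample rows are hypothetical stand-ins for what `dataset.docs_iter()` yields, so the snippet runs without downloading anything.

```python
from collections import namedtuple
from itertools import islice

# Hypothetical stand-in mirroring the documented fields (doc_id, text).
Doc = namedtuple("Doc", ["doc_id", "text"])

def peek_docs(docs, n=3, max_chars=40):
    """Return (doc_id, text prefix) pairs for the first n documents of an iterable."""
    return [(doc.doc_id, doc.text[:max_chars]) for doc in islice(docs, n)]

# Made-up sample rows; with the real data, pass dataset.docs_iter() instead.
sample_docs = iter([
    Doc("0", "first example article text"),
    Doc("1", "second example article text"),
    Doc("2", "third example article text"),
])
preview = peek_docs(sample_docs, n=2, max_chars=5)
print(preview)
```

Because `islice` stops after `n` items, the remaining documents are left unread in the iterator.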

CLI

ir_datasets export wikir/en1k docs



[doc_id]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en1k')
# Index wikir/en1k
indexer = pt.IterDictIndexer('./indices/wikir_en1k')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

\cite{Frej2020Wikir,Frej2020MlWikir}

Bibtex:

@inproceedings{Frej2020Wikir,
  title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={LREC},
  year={2020}
}

@inproceedings{Frej2020MlWikir,
  title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={CIRCLE},
  year={2020}
}

`"wikir/en1k/test"`

Test set of wikir/en1k. Scoreddocs are the provided BM25 run.

queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en1k/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.
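Since each query is a namedtuple with `query_id` and `text`, the iterator can be materialized into a lookup table. A minimal sketch follows; `Query` and the sample rows are hypothetical stand-ins so that it runs without the dataset. With the real data you would pass `dataset.queries_iter()`.

```python
from collections import namedtuple

# Hypothetical stand-in mirroring the documented fields (query_id, text).
Query = namedtuple("Query", ["query_id", "text"])

def build_query_lookup(queries):
    """Map query_id -> query text for any iterable of query namedtuples."""
    return {query.query_id: query.text for query in queries}

# Made-up sample rows for illustration only.
sample_queries = [Query("1", "example query one"), Query("2", "example query two")]
query_lookup = build_query_lookup(sample_queries)
print(query_lookup["2"])
```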

CLI

ir_datasets export wikir/en1k/test queries



[query_id]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en1k/test')
index_ref = pt.IndexRef.of('./indices/wikir_en1k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

Inherits docs from wikir/en1k

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en1k/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/en1k/test docs



[doc_id]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en1k/test')
# Index wikir/en1k
indexer = pt.IterDictIndexer('./indices/wikir_en1k')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition
0	Otherwise
1	There is a link to the article with the query as its title in the first sentence
2	Query is the article title
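To see how judgments are distributed over the relevance levels above, the qrels can be tallied by their `relevance` field. This is a minimal sketch; `Qrel` and the sample rows are hypothetical stand-ins for the records yielded by `dataset.qrels_iter()`.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in mirroring the documented TrecQrel fields.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])

def relevance_histogram(qrels):
    """Count how many judgments fall into each relevance level."""
    return Counter(qrel.relevance for qrel in qrels)

# Made-up sample judgments for illustration only.
sample_qrels = [
    Qrel("1", "10", 2, "0"),
    Qrel("1", "11", 1, "0"),
    Qrel("1", "12", 1, "0"),
    Qrel("2", "20", 2, "0"),
]
histogram = relevance_histogram(sample_qrels)
print(dict(histogram))
```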

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en1k/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/en1k/test qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.
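The TSV export can be read back into Python with the standard `csv` module. The sketch below assumes the four-column order shown in the template above and that the export was saved to a file or stream; the sample rows are made up, and `Qrel` is a hypothetical stand-in for the library's record type.

```python
import csv
import io
from collections import namedtuple

# Hypothetical stand-in mirroring the documented TrecQrel fields.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])

def parse_qrels_tsv(stream):
    """Parse tab-separated qrels lines: query_id, doc_id, relevance, iteration."""
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        Qrel(row[0], row[1], int(row[2]), row[3])
        for row in reader
        if row  # skip blank lines
    ]

# Made-up sample standing in for the exported file contents.
sample_export = io.StringIO("1\t10\t2\t0\n1\t11\t1\t0\n")
parsed = parse_qrels_tsv(sample_export)
print(parsed[0])
```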

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:wikir/en1k/test')
index_ref = pt.IndexRef.of('./indices/wikir_en1k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.
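To make the `nDCG@20` measure more concrete with the graded labels (0, 1, 2), here is a small pure-Python sketch. It assumes linear gain and a log2 rank discount, which is one common formulation; PyTerrier's implementation may differ in details, so treat the values as illustrative rather than as a reproduction of its output.

```python
import math

def dcg(gains):
    """Discounted cumulative gain with linear gain and log2(rank + 1) discount."""
    return sum(gain / math.log2(rank + 1) for rank, gain in enumerate(gains, start=1))

def ndcg_at_k(ranked_doc_ids, relevance_by_doc, k=20):
    """nDCG@k for one query; unjudged documents count as relevance 0."""
    gains = [relevance_by_doc.get(doc_id, 0) for doc_id in ranked_doc_ids[:k]]
    ideal_gains = sorted(relevance_by_doc.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal_gains)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

# Made-up judgments and rankings for illustration only.
relevance_by_doc = {"a": 2, "b": 1}
perfect = ndcg_at_k(["a", "b", "c"], relevance_by_doc)
swapped = ndcg_at_k(["c", "a", "b"], relevance_by_doc)
print(perfect, swapped)
```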

scoreddocs

Scored Document type:

GenericScoredDoc: (namedtuple)

query_id: str
doc_id: str
score: float

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en1k/test")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.
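The scoreddocs records arrive as a flat stream of (query, document, score) rows, so a common first step is to group them into a ranked list per query. A minimal sketch, with `ScoredDoc` and the sample rows as hypothetical stand-ins for the output of `dataset.scoreddocs_iter()`:

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in mirroring the documented GenericScoredDoc fields.
ScoredDoc = namedtuple("ScoredDoc", ["query_id", "doc_id", "score"])

def top_k_per_query(scoreddocs, k=10):
    """Return {query_id: [doc_id, ...]} holding the k highest-scoring docs per query."""
    by_query = defaultdict(list)
    for scored in scoreddocs:
        by_query[scored.query_id].append(scored)
    return {
        query_id: [s.doc_id for s in sorted(rows, key=lambda s: s.score, reverse=True)[:k]]
        for query_id, rows in by_query.items()
    }

# Made-up sample scores for illustration only.
sample_run = [
    ScoredDoc("1", "a", 1.0),
    ScoredDoc("1", "b", 3.0),
    ScoredDoc("1", "c", 2.0),
    ScoredDoc("2", "d", 0.5),
]
top_docs = top_k_per_query(sample_run, k=2)
print(top_docs)
```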

CLI

ir_datasets export wikir/en1k/test scoreddocs --format tsv



[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.
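For use with external evaluation tools, the scoreddocs can also be written in the six-column TREC run format (`query_id Q0 doc_id rank score tag`). The sketch below is one way to do this; the run tag `bm25` and the 1-based ranks are choices made here, and `ScoredDoc` with its sample rows is a hypothetical stand-in for the real records.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in mirroring the documented GenericScoredDoc fields.
ScoredDoc = namedtuple("ScoredDoc", ["query_id", "doc_id", "score"])

def to_trec_run_lines(scoreddocs, run_tag="bm25"):
    """Format scored documents as TREC run lines, ranked by descending score per query."""
    by_query = defaultdict(list)
    for scored in scoreddocs:
        by_query[scored.query_id].append(scored)
    lines = []
    for query_id, rows in by_query.items():
        rows.sort(key=lambda s: s.score, reverse=True)
        for rank, scored in enumerate(rows, start=1):
            lines.append(f"{query_id} Q0 {scored.doc_id} {rank} {scored.score} {run_tag}")
    return lines

# Made-up sample scores for illustration only.
sample_run = [ScoredDoc("1", "a", 1.5), ScoredDoc("1", "b", 3.0), ScoredDoc("2", "c", 0.5)]
run_lines = to_trec_run_lines(sample_run)
print("\n".join(run_lines))
```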

No example available for PyTerrier

\cite{Frej2020Wikir,Frej2020MlWikir}

Bibtex:

@inproceedings{Frej2020Wikir,
  title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={LREC},
  year={2020}
}

@inproceedings{Frej2020MlWikir,
  title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={CIRCLE},
  year={2020}
}

`"wikir/en1k/training"`

Training set of wikir/en1k. Scoreddocs are the provided BM25 run.

queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en1k/training")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/en1k/training queries



[query_id]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en1k/training')
index_ref = pt.IndexRef.of('./indices/wikir_en1k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

Inherits docs from wikir/en1k

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en1k/training")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/en1k/training docs



[doc_id]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en1k/training')
# Index wikir/en1k
indexer = pt.IterDictIndexer('./indices/wikir_en1k')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition
0	Otherwise
1	There is a link to the article with the query as its title in the first sentence
2	Query is the article title

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en1k/training")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/en1k/training qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:wikir/en1k/training')
index_ref = pt.IndexRef.of('./indices/wikir_en1k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

scoreddocs

Scored Document type:

GenericScoredDoc: (namedtuple)

query_id: str
doc_id: str
score: float

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en1k/training")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.
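One common way to use a training split like this (not something the dataset prescribes) is to build (query, positive, negative) triples: positives come from qrels with relevance above 0, and negatives are documents from the BM25 run that are not judged relevant. A minimal sketch with hypothetical stand-in record types and made-up rows:

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-ins mirroring the documented record fields.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])
ScoredDoc = namedtuple("ScoredDoc", ["query_id", "doc_id", "score"])

def build_triples(qrels, scoreddocs):
    """Pair each relevant doc with each BM25-retrieved, non-relevant doc of the same query."""
    positives = defaultdict(set)
    for qrel in qrels:
        if qrel.relevance > 0:
            positives[qrel.query_id].add(qrel.doc_id)
    triples = []
    for scored in scoreddocs:
        query_positives = positives.get(scored.query_id, set())
        if scored.doc_id in query_positives:
            continue  # retrieved doc is itself relevant, so not a negative
        for positive_id in sorted(query_positives):
            triples.append((scored.query_id, positive_id, scored.doc_id))
    return triples

# Made-up sample rows for illustration only.
sample_qrels = [Qrel("1", "a", 2, "0"), Qrel("1", "b", 1, "0"), Qrel("1", "c", 0, "0")]
sample_run = [ScoredDoc("1", "a", 9.0), ScoredDoc("1", "c", 8.0), ScoredDoc("2", "x", 5.0)]
triples = build_triples(sample_qrels, sample_run)
print(triples)
```

Queries without any positive judgment produce no triples, which is why query `2` is absent from the result.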

CLI

ir_datasets export wikir/en1k/training scoreddocs --format tsv



[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

No example available for PyTerrier

\cite{Frej2020Wikir,Frej2020MlWikir}

Bibtex:

@inproceedings{Frej2020Wikir,
  title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={LREC},
  year={2020}
}

@inproceedings{Frej2020MlWikir,
  title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={CIRCLE},
  year={2020}
}

`"wikir/en1k/validation"`

Validation set of wikir/en1k. Scoreddocs are the provided BM25 run.

queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en1k/validation")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/en1k/validation queries



[query_id]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en1k/validation')
index_ref = pt.IndexRef.of('./indices/wikir_en1k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

Inherits docs from wikir/en1k

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en1k/validation")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/en1k/validation docs



[doc_id]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en1k/validation')
# Index wikir/en1k
indexer = pt.IterDictIndexer('./indices/wikir_en1k')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition
0	Otherwise
1	There is a link to the article with the query as its title in the first sentence
2	Query is the article title

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en1k/validation")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/en1k/validation qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:wikir/en1k/validation')
index_ref = pt.IndexRef.of('./indices/wikir_en1k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.
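The `MAP` measure in the experiment is the mean of per-query average precision. The pure-Python sketch below illustrates the idea, assuming binary relevance (any judged level above 0 counts as relevant) and rankings without duplicate document IDs; it is not PyTerrier's implementation, and the sample data is made up.

```python
def average_precision(ranked_doc_ids, relevant_doc_ids):
    """AP for one query: mean of precision values at the ranks of relevant docs."""
    if not relevant_doc_ids:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant_doc_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_doc_ids)

def mean_average_precision(rankings, relevant):
    """Average AP over the queries that have at least one relevant document."""
    query_ids = [query_id for query_id, docs in relevant.items() if docs]
    if not query_ids:
        return 0.0
    total = sum(average_precision(rankings.get(q, []), relevant[q]) for q in query_ids)
    return total / len(query_ids)

# Made-up rankings and relevant sets for illustration only.
rankings = {"1": ["a", "x", "b"], "2": ["y", "c"]}
relevant = {"1": {"a", "b"}, "2": {"c"}}
map_score = mean_average_precision(rankings, relevant)
print(map_score)
```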

scoreddocs

Scored Document type:

GenericScoredDoc: (namedtuple)

query_id: str
doc_id: str
score: float

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en1k/validation")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.
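Because the scoreddocs are a ready-made BM25 run, they can serve as a baseline on the validation queries. As one illustrative option, the sketch below computes mean reciprocal rank from per-query rankings and sets of relevant documents; both inputs are assumed to have been built from the scoreddocs and qrels beforehand, and the sample data is made up.

```python
def reciprocal_rank(ranked_doc_ids, relevant_doc_ids):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant_doc_ids:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(rankings, relevant):
    """Average reciprocal rank over all queries present in the rankings."""
    if not rankings:
        return 0.0
    total = sum(
        reciprocal_rank(ranked, relevant.get(query_id, set()))
        for query_id, ranked in rankings.items()
    )
    return total / len(rankings)

# Made-up rankings and relevant sets for illustration only.
rankings = {"1": ["x", "a"], "2": ["b"]}
relevant = {"1": {"a"}, "2": {"b"}}
mrr = mean_reciprocal_rank(rankings, relevant)
print(mrr)
```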

CLI

ir_datasets export wikir/en1k/validation scoreddocs --format tsv



[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

No example available for PyTerrier

\cite{Frej2020Wikir,Frej2020MlWikir}

Bibtex:

@inproceedings{Frej2020Wikir,
  title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={LREC},
  year={2020}
}

@inproceedings{Frej2020MlWikir,
  title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={CIRCLE},
  year={2020}
}

`"wikir/en59k"`

WikIR for English.

docs

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en59k")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/en59k docs



[doc_id]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en59k')
# Index wikir/en59k
indexer = pt.IterDictIndexer('./indices/wikir_en59k')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

\cite{Frej2020Wikir,Frej2020MlWikir}

Bibtex:

@inproceedings{Frej2020Wikir,
  title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={LREC},
  year={2020}
}

@inproceedings{Frej2020MlWikir,
  title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={CIRCLE},
  year={2020}
}

`"wikir/en59k/test"`

Test set of wikir/en59k. Scoreddocs are the provided BM25 run.

queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en59k/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/en59k/test queries



[query_id]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en59k/test')
index_ref = pt.IndexRef.of('./indices/wikir_en59k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

Inherits docs from wikir/en59k

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en59k/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/en59k/test docs



[doc_id]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en59k/test')
# Index wikir/en59k
indexer = pt.IterDictIndexer('./indices/wikir_en59k')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition
0	Otherwise
1	There is a link to the article with the query as its title in the first sentence
2	Query is the article title
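Many evaluation routines expect judgments as a nested mapping and sometimes only at or above a given relevance level. A minimal sketch of that conversion follows; the `min_relevance` parameter is an illustrative name chosen here, and `Qrel` with its sample rows is a hypothetical stand-in for the records from `dataset.qrels_iter()`.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in mirroring the documented TrecQrel fields.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])

def qrels_to_dict(qrels, min_relevance=1):
    """Return {query_id: {doc_id: relevance}} keeping judgments >= min_relevance."""
    nested = defaultdict(dict)
    for qrel in qrels:
        if qrel.relevance >= min_relevance:
            nested[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(nested)

# Made-up sample judgments for illustration only.
sample_qrels = [
    Qrel("1", "a", 2, "0"),
    Qrel("1", "b", 1, "0"),
    Qrel("1", "c", 0, "0"),
    Qrel("2", "d", 1, "0"),
]
any_relevant = qrels_to_dict(sample_qrels)
title_only = qrels_to_dict(sample_qrels, min_relevance=2)
print(any_relevant, title_only)
```

With `min_relevance=2`, only the documents whose title is the query remain.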

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en59k/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/en59k/test qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:wikir/en59k/test')
index_ref = pt.IndexRef.of('./indices/wikir_en59k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

scoreddocs

Scored Document type:

GenericScoredDoc: (namedtuple)

query_id: str
doc_id: str
score: float

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en59k/test")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/en59k/test scoreddocs --format tsv



[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

No example available for PyTerrier

\cite{Frej2020Wikir,Frej2020MlWikir}

Bibtex:

@inproceedings{Frej2020Wikir,
  title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={LREC},
  year={2020}
}

@inproceedings{Frej2020MlWikir,
  title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={CIRCLE},
  year={2020}
}

`"wikir/en59k/training"`

Training set of wikir/en59k. Scoreddocs are the provided BM25 run.

queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en59k/training")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/en59k/training queries



[query_id]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en59k/training')
index_ref = pt.IndexRef.of('./indices/wikir_en59k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

Inherits docs from wikir/en59k

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en59k/training")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/en59k/training docs



[doc_id]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en59k/training')
# Index wikir/en59k
indexer = pt.IterDictIndexer('./indices/wikir_en59k')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition
0	Otherwise
1	There is a link to the article with the query as its title in the first sentence
2	Query is the article title

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en59k/training")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/en59k/training qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:wikir/en59k/training')
index_ref = pt.IndexRef.of('./indices/wikir_en59k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

scoreddocs

Scored Document type:

GenericScoredDoc: (namedtuple)

query_id: str
doc_id: str
score: float

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en59k/training")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.
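The BM25 scoreddocs give a candidate list per query that another model can re-order. The sketch below shows the general shape of such a re-ranking step, assuming the candidates have already been resolved to `(doc_id, text)` pairs; `overlap_score` is a deliberately naive, hypothetical scorer standing in for a trained model.

```python
def overlap_score(query_text, doc_text):
    """Toy scorer: number of distinct query terms that also occur in the document."""
    query_terms = set(query_text.lower().split())
    doc_terms = set(doc_text.lower().split())
    return len(query_terms & doc_terms)

def rerank(query_text, candidates, score_fn=overlap_score):
    """Re-order (doc_id, text) candidates by descending score; ties keep input order."""
    return sorted(candidates, key=lambda c: score_fn(query_text, c[1]), reverse=True)

# Made-up candidates for illustration only.
candidates = [
    ("d1", "red apple pie"),
    ("d2", "green apple tree house"),
    ("d3", "blue sky"),
]
reranked = rerank("green apple tree", candidates)
print([doc_id for doc_id, _ in reranked])
```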

CLI

ir_datasets export wikir/en59k/training scoreddocs --format tsv



[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

No example available for PyTerrier

\cite{Frej2020Wikir,Frej2020MlWikir}

Bibtex:

@inproceedings{Frej2020Wikir,
  title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={LREC},
  year={2020}
}

@inproceedings{Frej2020MlWikir,
  title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={CIRCLE},
  year={2020}
}

`"wikir/en59k/validation"`

Validation set of wikir/en59k. Scoreddocs are the provided BM25 run.

queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en59k/validation")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/en59k/validation queries



[query_id]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en59k/validation')
index_ref = pt.IndexRef.of('./indices/wikir_en59k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

Inherits docs from wikir/en59k

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en59k/validation")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/en59k/validation docs



[doc_id]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en59k/validation')
# Index wikir/en59k
indexer = pt.IterDictIndexer('./indices/wikir_en59k')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition
0	Otherwise
1	There is a link to the article with the query as its title in the first sentence
2	Query is the article title

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en59k/validation")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/en59k/validation qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:wikir/en59k/validation')
index_ref = pt.IndexRef.of('./indices/wikir_en59k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

scoreddocs

Scored Document type:

GenericScoredDoc: (namedtuple)

query_id: str
doc_id: str
score: float

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en59k/validation")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/en59k/validation scoreddocs --format tsv



[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

No example available for PyTerrier

\cite{Frej2020Wikir,Frej2020MlWikir}

Bibtex:

@inproceedings{Frej2020Wikir,
  title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={LREC},
  year={2020}
}

@inproceedings{Frej2020MlWikir,
  title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={CIRCLE},
  year={2020}
}

`"wikir/en78k"`

WikIR for English. This is one of the two versions used in Frej2020Wikir.

docs

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en78k")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/en78k docs



[doc_id]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en78k')
# Index wikir/en78k
indexer = pt.IterDictIndexer('./indices/wikir_en78k')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

\cite{Frej2020Wikir,Frej2020MlWikir}

Bibtex:

@inproceedings{Frej2020Wikir,
  title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={LREC},
  year={2020}
}

@inproceedings{Frej2020MlWikir,
  title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={CIRCLE},
  year={2020}
}

`"wikir/en78k/test"`

Test set of wikir/en78k. Scoreddocs are the provided BM25 run.

queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en78k/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/en78k/test queries



[query_id]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en78k/test')
index_ref = pt.IndexRef.of('./indices/wikir_en78k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

Inherits docs from wikir/en78k

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en78k/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/en78k/test docs



[doc_id]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en78k/test')
# Index wikir/en78k
indexer = pt.IterDictIndexer('./indices/wikir_en78k')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition
0	Otherwise
1	There is a link to the article with the query as its title in the first sentence
2	Query is the article title

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en78k/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/en78k/test qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:wikir/en78k/test')
index_ref = pt.IndexRef.of('./indices/wikir_en78k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

scoreddocs

Scored Document type:

GenericScoredDoc: (namedtuple)

query_id: str
doc_id: str
score: float

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en78k/test")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.
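When the BM25 scoreddocs are used as a first-stage candidate list, recall at a cutoff indicates how many relevant documents a re-ranker could still reach. A minimal sketch for a single query, assuming the ranking and the set of relevant document IDs were built beforehand from the scoreddocs and qrels; the sample data is made up.

```python
def recall_at_k(ranked_doc_ids, relevant_doc_ids, k=100):
    """Fraction of the relevant documents that appear in the top k of the ranking."""
    if not relevant_doc_ids:
        return 0.0
    retrieved = set(ranked_doc_ids[:k])
    return len(retrieved & set(relevant_doc_ids)) / len(relevant_doc_ids)

# Made-up ranking and relevant set for illustration only.
ranking = ["a", "x", "b"]
relevant_ids = {"a", "b", "c"}
recall_top2 = recall_at_k(ranking, relevant_ids, k=2)
recall_top3 = recall_at_k(ranking, relevant_ids, k=3)
print(recall_top2, recall_top3)
```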

CLI

ir_datasets export wikir/en78k/test scoreddocs --format tsv



[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

No example available for PyTerrier

\cite{Frej2020Wikir,Frej2020MlWikir}

Bibtex:

@inproceedings{Frej2020Wikir,
  title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={LREC},
  year={2020}
}

@inproceedings{Frej2020MlWikir,
  title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={CIRCLE},
  year={2020}
}

`"wikir/en78k/training"`

Training set of wikir/en78k. Scoreddocs are the provided BM25 run.

queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en78k/training")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/en78k/training queries



[query_id]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en78k/training')
index_ref = pt.IndexRef.of('./indices/wikir_en78k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

Inherits docs from wikir/en78k

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en78k/training")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/en78k/training docs



[doc_id]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en78k/training')
# Index wikir/en78k
indexer = pt.IterDictIndexer('./indices/wikir_en78k')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition
0	Otherwise
1	There is a link to the article with the query as its title in the first sentence
2	Query is the article title
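The relevance levels above can be tallied to see how many judgments fall into each grade. The following is a minimal sketch, not part of ir_datasets: the `TrecQrel` namedtuple is a local stand-in with the same fields, and `count_relevance_levels` and the sample judgments are hypothetical.

```python
from collections import Counter, namedtuple

# Local stand-in mirroring the TrecQrel fields, so the sketch runs without downloading data.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])

def count_relevance_levels(qrels):
    """Return a Counter mapping relevance level -> number of judgments."""
    return Counter(qrel.relevance for qrel in qrels)

# Hypothetical judgments; with the real data, pass dataset.qrels_iter() instead.
sample_qrels = [
    TrecQrel("q1", "d1", 2, "0"),
    TrecQrel("q1", "d7", 1, "0"),
    TrecQrel("q2", "d2", 2, "0"),
    TrecQrel("q2", "d9", 1, "0"),
    TrecQrel("q2", "d4", 1, "0"),
]
level_counts = count_relevance_levels(sample_qrels)
```

A Counter returns 0 for levels that never occur, so `level_counts[0]` is safe to read even when no explicit 0 judgments are present.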

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en78k/training")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/en78k/training qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

import pyterrier as pt
from pyterrier.measures import MAP, nDCG
pt.init()
dataset = pt.get_dataset('irds:wikir/en78k/training')
index_ref = pt.IndexRef.of('./indices/wikir_en78k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

Scored Document type:

GenericScoredDoc: (namedtuple)

query_id: str
doc_id: str
score: float

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en78k/training")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/en78k/training scoreddocs --format tsv



[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

No example available for PyTerrier
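One way to use the provided BM25 run without PyTerrier is to keep the top-k candidates per query, for example as input to a re-ranker. This is a sketch under stated assumptions: `GenericScoredDoc` is a local stand-in with the same fields, and `top_k_per_query` and the sample scores are hypothetical.

```python
import heapq
from collections import defaultdict, namedtuple

# Local stand-in mirroring the GenericScoredDoc fields.
GenericScoredDoc = namedtuple("GenericScoredDoc", ["query_id", "doc_id", "score"])

def top_k_per_query(scoreddocs, k=10):
    """Group scored documents by query and keep the k highest-scoring doc_ids."""
    by_query = defaultdict(list)
    for sd in scoreddocs:
        by_query[sd.query_id].append((sd.score, sd.doc_id))
    return {
        qid: [doc_id for _, doc_id in heapq.nlargest(k, pairs)]
        for qid, pairs in by_query.items()
    }

# Hypothetical scores; with the real data, pass dataset.scoreddocs_iter() instead.
sample_run = [
    GenericScoredDoc("q1", "d1", 12.5),
    GenericScoredDoc("q1", "d2", 9.1),
    GenericScoredDoc("q1", "d3", 10.0),
    GenericScoredDoc("q2", "d4", 3.0),
]
candidates = top_k_per_query(sample_run, k=2)
```

The grouping holds the whole run in memory, which may matter for the larger training sets.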

\cite{Frej2020Wikir,Frej2020MlWikir}

Bibtex:

@inproceedings{Frej2020Wikir,
  title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={LREC},
  year={2020}
}

@inproceedings{Frej2020MlWikir,
  title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={CIRCLE},
  year={2020}
}

`"wikir/en78k/validation"`

Validation set of wikir/en78k. Scoreddocs are the provided BM25 run.

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en78k/validation")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/en78k/validation queries



[query_id]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en78k/validation')
index_ref = pt.IndexRef.of('./indices/wikir_en78k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

Inherits docs from wikir/en78k

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en78k/validation")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/en78k/validation docs



[doc_id]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/en78k/validation')
# Index wikir/en78k
indexer = pt.IterDictIndexer('./indices/wikir_en78k')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition
0	Otherwise
1	There is a link to the article with the query as its title in the first sentence
2	Query is the article title
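Because the judgments are graded, a metric can be computed either against any relevant document (level 1 or higher) or only against the article whose title is the query (level 2). The sketch below illustrates this with reciprocal rank. `qrels_to_dict`, `reciprocal_rank`, the `min_rel` parameter, and the sample data are hypothetical names introduced for illustration.

```python
from collections import namedtuple

# Local stand-in mirroring the TrecQrel fields.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])

def qrels_to_dict(qrels):
    """Build {query_id: {doc_id: relevance}} for constant-time lookups."""
    lookup = {}
    for qrel in qrels:
        lookup.setdefault(qrel.query_id, {})[qrel.doc_id] = qrel.relevance
    return lookup

def reciprocal_rank(ranked_doc_ids, judged, min_rel=1):
    """Return 1/rank of the first document with relevance >= min_rel, else 0.0."""
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if judged.get(doc_id, 0) >= min_rel:
            return 1.0 / rank
    return 0.0

# Hypothetical judgments and ranking for a single query.
sample_qrels = [TrecQrel("q1", "d1", 2, "0"), TrecQrel("q1", "d5", 1, "0")]
judged = qrels_to_dict(sample_qrels)["q1"]
ranking = ["d9", "d5", "d1"]
rr_any = reciprocal_rank(ranking, judged, min_rel=1)
rr_title = reciprocal_rank(ranking, judged, min_rel=2)
```

Unjudged documents are treated as relevance 0, matching the "Otherwise" level.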

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en78k/validation")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/en78k/validation qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

import pyterrier as pt
from pyterrier.measures import MAP, nDCG
pt.init()
dataset = pt.get_dataset('irds:wikir/en78k/validation')
index_ref = pt.IndexRef.of('./indices/wikir_en78k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

Scored Document type:

GenericScoredDoc: (namedtuple)

query_id: str
doc_id: str
score: float

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/en78k/validation")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/en78k/validation scoreddocs --format tsv



[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

No example available for PyTerrier

\cite{Frej2020Wikir,Frej2020MlWikir}

Bibtex:

@inproceedings{Frej2020Wikir,
  title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={LREC},
  year={2020}
}

@inproceedings{Frej2020MlWikir,
  title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={CIRCLE},
  year={2020}
}

`"wikir/ens78k"`

WikIR for English, using the first sentences of articles as queries. This is one of the two versions used in Frej2020Wikir.

docs

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/ens78k")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/ens78k docs



[doc_id]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/ens78k')
# Index wikir/ens78k
indexer = pt.IterDictIndexer('./indices/wikir_ens78k')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

\cite{Frej2020Wikir,Frej2020MlWikir}

Bibtex:

@inproceedings{Frej2020Wikir,
  title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={LREC},
  year={2020}
}

@inproceedings{Frej2020MlWikir,
  title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={CIRCLE},
  year={2020}
}

`"wikir/ens78k/test"`

Test set of wikir/ens78k. Scoreddocs are the provided BM25 run.

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/ens78k/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/ens78k/test queries



[query_id]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/ens78k/test')
index_ref = pt.IndexRef.of('./indices/wikir_ens78k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

Inherits docs from wikir/ens78k

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/ens78k/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/ens78k/test docs



[doc_id]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/ens78k/test')
# Index wikir/ens78k
indexer = pt.IterDictIndexer('./indices/wikir_ens78k')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition
0	Otherwise
1	There is a link to the article with the query as its title in the first sentence
2	Query is the article title
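The graded levels above feed directly into gain-based metrics such as nDCG. The following sketch uses one common formulation (linear gain, log2 rank discount). Evaluation tools may use a different variant, so treat the numbers as illustrative. `dcg`, `ndcg_at_k`, and the sample data are hypothetical.

```python
import math

def dcg(gains):
    """Discounted cumulative gain with linear gain and a log2(rank + 1) discount."""
    return sum(gain / math.log2(position + 2) for position, gain in enumerate(gains))

def ndcg_at_k(ranked_doc_ids, judged, k=20):
    """nDCG@k for one query; `judged` maps doc_id -> relevance (unjudged counts as 0)."""
    gains = [judged.get(doc_id, 0) for doc_id in ranked_doc_ids[:k]]
    ideal_gains = sorted(judged.values(), reverse=True)[:k]
    ideal = dcg(ideal_gains)
    return dcg(gains) / ideal if ideal > 0 else 0.0

# Hypothetical judgments for one query: d1 is the titled article, d2 links to it.
judged = {"d1": 2, "d2": 1}
perfect = ndcg_at_k(["d1", "d2", "d3"], judged)
reversed_order = ndcg_at_k(["d3", "d2", "d1"], judged)
```

A query with no positive judgments yields 0.0 here rather than raising a division error.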

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/ens78k/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/ens78k/test qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

import pyterrier as pt
from pyterrier.measures import MAP, nDCG
pt.init()
dataset = pt.get_dataset('irds:wikir/ens78k/test')
index_ref = pt.IndexRef.of('./indices/wikir_ens78k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

Scored Document type:

GenericScoredDoc: (namedtuple)

query_id: str
doc_id: str
score: float

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/ens78k/test")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/ens78k/test scoreddocs --format tsv



[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

No example available for PyTerrier

\cite{Frej2020Wikir,Frej2020MlWikir}

Bibtex:

@inproceedings{Frej2020Wikir,
  title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={LREC},
  year={2020}
}

@inproceedings{Frej2020MlWikir,
  title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={CIRCLE},
  year={2020}
}

`"wikir/ens78k/training"`

Training set of wikir/ens78k. Scoreddocs are the provided BM25 run.

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/ens78k/training")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/ens78k/training queries



[query_id]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/ens78k/training')
index_ref = pt.IndexRef.of('./indices/wikir_ens78k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

Inherits docs from wikir/ens78k

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/ens78k/training")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/ens78k/training docs



[doc_id]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/ens78k/training')
# Index wikir/ens78k
indexer = pt.IterDictIndexer('./indices/wikir_ens78k')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition
0	Otherwise
1	There is a link to the article with the query as its title in the first sentence
2	Query is the article title

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/ens78k/training")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/ens78k/training qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

import pyterrier as pt
from pyterrier.measures import MAP, nDCG
pt.init()
dataset = pt.get_dataset('irds:wikir/ens78k/training')
index_ref = pt.IndexRef.of('./indices/wikir_ens78k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

Scored Document type:

GenericScoredDoc: (namedtuple)

query_id: str
doc_id: str
score: float

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/ens78k/training")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/ens78k/training scoreddocs --format tsv



[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

No example available for PyTerrier
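For training a neural ranker, the qrels and the provided BM25 run are often combined into (query, positive, negative) triples, taking negatives from retrieved documents that are not judged relevant. This is a sketch of that general idea, not a procedure prescribed by the dataset authors. `build_triples`, the `seed` parameter, the stand-in namedtuples, and the sample data are hypothetical.

```python
import random
from collections import defaultdict, namedtuple

# Local stand-ins mirroring the ir_datasets record fields.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])
GenericScoredDoc = namedtuple("GenericScoredDoc", ["query_id", "doc_id", "score"])

def build_triples(qrels, scoreddocs, seed=0):
    """Pair each relevant doc with one randomly chosen non-relevant BM25 candidate."""
    rng = random.Random(seed)
    positives = defaultdict(set)
    for qrel in qrels:
        if qrel.relevance > 0:
            positives[qrel.query_id].add(qrel.doc_id)
    candidates = defaultdict(list)
    for sd in scoreddocs:
        candidates[sd.query_id].append(sd.doc_id)
    triples = []
    for qid in sorted(positives):
        negatives = [d for d in candidates[qid] if d not in positives[qid]]
        if not negatives:
            continue  # no usable negative for this query
        for pos in sorted(positives[qid]):
            triples.append((qid, pos, rng.choice(negatives)))
    return triples

# Hypothetical data: q2 has no non-relevant candidate, so it is skipped.
sample_qrels = [TrecQrel("q1", "d1", 2, "0"), TrecQrel("q2", "d5", 2, "0")]
sample_run = [
    GenericScoredDoc("q1", "d1", 9.0),
    GenericScoredDoc("q1", "d2", 7.5),
    GenericScoredDoc("q2", "d5", 4.0),
]
triples = build_triples(sample_qrels, sample_run)
```

The triples hold only identifiers. Document and query text would still need to be looked up separately.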

\cite{Frej2020Wikir,Frej2020MlWikir}

Bibtex:

@inproceedings{Frej2020Wikir,
  title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={LREC},
  year={2020}
}

@inproceedings{Frej2020MlWikir,
  title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={CIRCLE},
  year={2020}
}

`"wikir/ens78k/validation"`

Validation set of wikir/ens78k. Scoreddocs are the provided BM25 run.

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/ens78k/validation")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/ens78k/validation queries



[query_id]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/ens78k/validation')
index_ref = pt.IndexRef.of('./indices/wikir_ens78k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

Inherits docs from wikir/ens78k

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/ens78k/validation")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/ens78k/validation docs



[doc_id]    [text]
...

You can find more details about the CLI here.

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:wikir/ens78k/validation')
# Index wikir/ens78k
indexer = pt.IterDictIndexer('./indices/wikir_ens78k')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition
0	Otherwise
1	There is a link to the article with the query as its title in the first sentence
2	Query is the article title

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/ens78k/validation")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/ens78k/validation qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

import pyterrier as pt
from pyterrier.measures import MAP, nDCG
pt.init()
dataset = pt.get_dataset('irds:wikir/ens78k/validation')
index_ref = pt.IndexRef.of('./indices/wikir_ens78k') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

Scored Document type:

GenericScoredDoc: (namedtuple)

query_id: str
doc_id: str
score: float

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/ens78k/validation")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/ens78k/validation scoreddocs --format tsv



[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

No example available for PyTerrier

\cite{Frej2020Wikir,Frej2020MlWikir}

Bibtex:

@inproceedings{Frej2020Wikir,
  title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={LREC},
  year={2020}
}

@inproceedings{Frej2020MlWikir,
  title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={CIRCLE},
  year={2020}
}

`"wikir/es13k"`

WikIR for Spanish.

docs

Language: es

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/es13k")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/es13k docs



[doc_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier
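Although no PyTerrier example is listed for this dataset, the documents can still be reshaped by hand into the dict records that PyTerrier's IterDictIndexer consumes, keyed by `docno`. This is a minimal sketch: `GenericDoc` is a local stand-in, `docs_to_index_dicts` and the sample texts are hypothetical, and whether the indexer's default tokenization and stemming suit Spanish is an open assumption not addressed here.

```python
from collections import namedtuple

# Local stand-in mirroring the GenericDoc fields.
GenericDoc = namedtuple("GenericDoc", ["doc_id", "text"])

def docs_to_index_dicts(docs):
    """Yield one {'docno', 'text'} dict per document, lazily."""
    for doc in docs:
        yield {"docno": doc.doc_id, "text": doc.text}

# Hypothetical documents; with the real data, pass dataset.docs_iter() instead.
sample_docs = [
    GenericDoc("10", "Madrid es la capital de España."),
    GenericDoc("11", "El Ebro es un río de la península ibérica."),
]
records = list(docs_to_index_dicts(sample_docs))
```

The generator form avoids materializing the whole corpus, so it can be handed to an indexer directly.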

\cite{Frej2020Wikir,Frej2020MlWikir}

Bibtex:

@inproceedings{Frej2020Wikir,
  title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={LREC},
  year={2020}
}

@inproceedings{Frej2020MlWikir,
  title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={CIRCLE},
  year={2020}
}

`"wikir/es13k/test"`

Test set of wikir/es13k. Scoreddocs are the provided BM25 run.

Language: es

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/es13k/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/es13k/test queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier
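Queries can likewise be reshaped manually into topic records with `qid` and `query` keys, the column names PyTerrier uses for topics. This is a sketch only: `GenericQuery` is a local stand-in, and `queries_to_topic_records` and the sample queries are hypothetical. Turning the records into a DataFrame (for example with pandas) is left as a further step.

```python
from collections import namedtuple

# Local stand-in mirroring the GenericQuery fields.
GenericQuery = namedtuple("GenericQuery", ["query_id", "text"])

def queries_to_topic_records(queries):
    """Return a list of {'qid', 'query'} dicts, one per query."""
    return [{"qid": query.query_id, "query": query.text} for query in queries]

# Hypothetical queries; with the real data, pass dataset.queries_iter() instead.
sample_queries = [
    GenericQuery("101", "sistema solar"),
    GenericQuery("102", "guerra civil"),
]
topics = queries_to_topic_records(sample_queries)
```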

docs

Inherits docs from wikir/es13k

Language: es

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/es13k/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/es13k/test docs



[doc_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition
0	Otherwise
1	There is a link to the article with the query as its title in the first sentence
2	Query is the article title

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/es13k/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/es13k/test qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier
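To evaluate with external tools instead, the judgments can be written in the whitespace-separated TREC qrels layout (query_id, iteration, doc_id, relevance). A minimal sketch follows. `format_trec_qrels` and the sample judgments are hypothetical, and `TrecQrel` is a local stand-in.

```python
from collections import namedtuple

# Local stand-in mirroring the TrecQrel fields.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])

def format_trec_qrels(qrels):
    """Return one 'query_id iteration doc_id relevance' line per judgment."""
    return [
        f"{qrel.query_id} {qrel.iteration} {qrel.doc_id} {qrel.relevance}"
        for qrel in qrels
    ]

# Hypothetical judgments; with the real data, pass dataset.qrels_iter() instead.
sample_qrels = [TrecQrel("101", "10", 2, "0"), TrecQrel("101", "11", 1, "0")]
qrel_lines = format_trec_qrels(sample_qrels)
```

Note that the TREC column order differs from the namedtuple field order, which places `iteration` last.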

Scored Document type:

GenericScoredDoc: (namedtuple)

query_id: str
doc_id: str
score: float

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/es13k/test")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/es13k/test scoreddocs --format tsv



[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

No example available for PyTerrier
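The provided BM25 run can be serialized in the six-column TREC run layout (query_id, Q0, doc_id, rank, score, run tag) for use with external evaluation tools. This is a sketch under assumptions: `format_trec_run`, the `run_tag` default, and the sample scores are hypothetical, and ranks are numbered from 1 by choice.

```python
from collections import defaultdict, namedtuple

# Local stand-in mirroring the GenericScoredDoc fields.
GenericScoredDoc = namedtuple("GenericScoredDoc", ["query_id", "doc_id", "score"])

def format_trec_run(scoreddocs, run_tag="bm25"):
    """Return TREC run lines, ranking each query's documents by descending score."""
    by_query = defaultdict(list)
    for sd in scoreddocs:
        by_query[sd.query_id].append(sd)
    lines = []
    for qid in sorted(by_query):
        ranked = sorted(by_query[qid], key=lambda sd: sd.score, reverse=True)
        for rank, sd in enumerate(ranked, start=1):
            lines.append(f"{qid} Q0 {sd.doc_id} {rank} {sd.score} {run_tag}")
    return lines

# Hypothetical scores; with the real data, pass dataset.scoreddocs_iter() instead.
sample_run = [
    GenericScoredDoc("101", "10", 3.5),
    GenericScoredDoc("101", "11", 7.25),
]
run_lines = format_trec_run(sample_run)
```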

\cite{Frej2020Wikir,Frej2020MlWikir}

Bibtex:

@inproceedings{Frej2020Wikir,
  title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={LREC},
  year={2020}
}

@inproceedings{Frej2020MlWikir,
  title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={CIRCLE},
  year={2020}
}

`"wikir/es13k/training"`

Training set of wikir/es13k. Scoreddocs are the provided BM25 run.

Language: es

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/es13k/training")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/es13k/training queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

docs

Inherits docs from wikir/es13k

Language: es

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/es13k/training")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/es13k/training docs



[doc_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition
0	Otherwise
1	There is a link to the article with the query as its title in the first sentence
2	Query is the article title

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/es13k/training")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/es13k/training qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

Scored Document type:

GenericScoredDoc: (namedtuple)

query_id: str
doc_id: str
score: float

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/es13k/training")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/es13k/training scoreddocs --format tsv



[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

No example available for PyTerrier

\cite{Frej2020Wikir,Frej2020MlWikir}

Bibtex:

@inproceedings{Frej2020Wikir,
  title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={LREC},
  year={2020}
}

@inproceedings{Frej2020MlWikir,
  title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={CIRCLE},
  year={2020}
}

`"wikir/es13k/validation"`

Validation set of wikir/es13k. Scoreddocs are the provided BM25 run.

Language: es

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/es13k/validation")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/es13k/validation queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

docs

Inherits docs from wikir/es13k

Language: es

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/es13k/validation")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/es13k/validation docs



[doc_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition
0	Otherwise
1	There is a link to the article with the query as its title in the first sentence
2	Query is the article title

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/es13k/validation")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/es13k/validation qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

Scored Document type:

GenericScoredDoc: (namedtuple)

query_id: str
doc_id: str
score: float

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/es13k/validation")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/es13k/validation scoreddocs --format tsv



[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

No example available for PyTerrier

\cite{Frej2020Wikir,Frej2020MlWikir}

Bibtex:

@inproceedings{Frej2020Wikir,
  title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={LREC},
  year={2020}
}

@inproceedings{Frej2020MlWikir,
  title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={CIRCLE},
  year={2020}
}

`"wikir/fr14k"`

WikIR for French.

docs

Language: fr

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/fr14k")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/fr14k docs



[doc_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

\cite{Frej2020Wikir,Frej2020MlWikir}

Bibtex:

@inproceedings{Frej2020Wikir,
  title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={LREC},
  year={2020}
}

@inproceedings{Frej2020MlWikir,
  title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={CIRCLE},
  year={2020}
}

`"wikir/fr14k/test"`

Test set of wikir/fr14k. Scoreddocs are the provided BM25 run.

Language: fr

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/fr14k/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/fr14k/test queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

docs

Inherits docs from wikir/fr14k

Language: fr

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/fr14k/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/fr14k/test docs



[doc_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition
0	Otherwise
1	There is a link to the article with the query as its title in the first sentence
2	Query is the article title

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/fr14k/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/fr14k/test qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.
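An exported qrels file can be read back into typed records. This sketch assumes the export has no header row and exactly the four tab-separated columns shown above; `Qrel`, `read_qrels_tsv` and the sample content are hypothetical.

```python
import csv
import io
from collections import namedtuple

# Hypothetical stand-in mirroring the TrecQrel fields.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def read_qrels_tsv(handle):
    """Yield Qrel records from a tab-separated text stream."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        query_id, doc_id, relevance, iteration = row
        yield Qrel(query_id, doc_id, int(relevance), iteration)


# Made-up content standing in for an exported file.
sample = io.StringIO("q1\td1\t2\t0\nq1\td2\t1\t0\n")
parsed = list(read_qrels_tsv(sample))
```

To read a real export, pass an open file handle in place of the `StringIO` object.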

No example available for PyTerrier

scoreddocs

Scored Document type:

GenericScoredDoc: (namedtuple)

query_id: str
doc_id: str
score: float

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/fr14k/test")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.
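Scoreddocs arrive as flat (query_id, doc_id, score) records, so a ranked list per query has to be assembled by the caller. A minimal sketch, assuming that a higher score means a better match (as is usual for BM25); `ScoredDoc`, `top_k_per_query` and the sample run are hypothetical.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in mirroring the GenericScoredDoc fields.
ScoredDoc = namedtuple("ScoredDoc", ["query_id", "doc_id", "score"])


def top_k_per_query(scoreddocs, k=10):
    """Return {query_id: [doc_id, ...]} with the k highest-scoring docs first."""
    by_query = defaultdict(list)
    for sd in scoreddocs:
        by_query[sd.query_id].append((sd.doc_id, sd.score))
    return {
        query_id: [
            doc_id
            for doc_id, _ in sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
        ]
        for query_id, pairs in by_query.items()
    }


# Made-up run; in practice pass dataset.scoreddocs_iter() instead.
sample_run = [
    ScoredDoc("q1", "d3", 4.2),
    ScoredDoc("q1", "d1", 9.7),
    ScoredDoc("q1", "d2", 6.1),
    ScoredDoc("q2", "d4", 3.3),
]
ranking = top_k_per_query(sample_run, k=2)
```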

CLI

ir_datasets export wikir/fr14k/test scoreddocs --format tsv



[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

No example available for PyTerrier

\cite{Frej2020Wikir,Frej2020MlWikir}

Bibtex:

@inproceedings{Frej2020Wikir,
  title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={LREC},
  year={2020}
}

@inproceedings{Frej2020MlWikir,
  title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={CIRCLE},
  year={2020}
}

`"wikir/fr14k/training"`

Training set of wikir/fr14k. Scoreddocs are the provided BM25 run.

Language: fr

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/fr14k/training")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/fr14k/training queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

docs

Inherits docs from wikir/fr14k

Language: fr

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/fr14k/training")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/fr14k/training docs



[doc_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition
0	Otherwise
1	There is a link to the article with the query as its title in the first sentence
2	Query is the article title

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/fr14k/training")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/fr14k/training qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

scoreddocs

Scored Document type:

GenericScoredDoc: (namedtuple)

query_id: str
doc_id: str
score: float

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/fr14k/training")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.
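One common way to use a training split that ships with a BM25 run is to pair each relevant document with a BM25-retrieved document that is not judged relevant. The dataset does not prescribe this; the sketch below is one possible approach under that assumption. `Qrel`, `ScoredDoc`, `make_triples` and the sample records are hypothetical.

```python
import random
from collections import defaultdict, namedtuple

# Hypothetical stand-ins mirroring the TrecQrel and GenericScoredDoc fields.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])
ScoredDoc = namedtuple("ScoredDoc", ["query_id", "doc_id", "score"])


def make_triples(qrels, scoreddocs, seed=0):
    """Build (query_id, positive_doc_id, negative_doc_id) training triples."""
    rng = random.Random(seed)
    positives = defaultdict(set)
    for qrel in qrels:
        if qrel.relevance > 0:
            positives[qrel.query_id].add(qrel.doc_id)
    # Negatives: retrieved by BM25 but not judged relevant for that query.
    negatives = defaultdict(list)
    for sd in scoreddocs:
        if sd.doc_id not in positives.get(sd.query_id, ()):
            negatives[sd.query_id].append(sd.doc_id)
    triples = []
    for query_id in sorted(positives):
        pool = negatives.get(query_id)
        if not pool:
            continue  # no negative candidate for this query
        for pos_id in sorted(positives[query_id]):
            triples.append((query_id, pos_id, rng.choice(pool)))
    return triples


# Made-up records; in practice pass qrels_iter() and scoreddocs_iter().
sample_qrels = [
    Qrel("q1", "d1", 2, "0"),
    Qrel("q1", "d2", 1, "0"),
    Qrel("q1", "d3", 0, "0"),
]
sample_run = [
    ScoredDoc("q1", "d1", 9.7),
    ScoredDoc("q1", "d2", 6.1),
    ScoredDoc("q1", "d3", 5.0),
    ScoredDoc("q1", "d5", 4.2),
]
triples = make_triples(sample_qrels, sample_run)
```

The doc_ids in each triple can then be resolved to text before being fed to a model.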

CLI

ir_datasets export wikir/fr14k/training scoreddocs --format tsv



[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

No example available for PyTerrier

\cite{Frej2020Wikir,Frej2020MlWikir}

Bibtex:

@inproceedings{Frej2020Wikir,
  title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={LREC},
  year={2020}
}

@inproceedings{Frej2020MlWikir,
  title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={CIRCLE},
  year={2020}
}

`"wikir/fr14k/validation"`

Validation set of wikir/fr14k. Scoreddocs are the provided BM25 run.

Language: fr

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/fr14k/validation")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/fr14k/validation queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

docs

Inherits docs from wikir/fr14k

Language: fr

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/fr14k/validation")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/fr14k/validation docs



[doc_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition
0	Otherwise
1	There is a link to the article with the query as its title in the first sentence
2	Query is the article title

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/fr14k/validation")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/fr14k/validation qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

scoreddocs

Scored Document type:

GenericScoredDoc: (namedtuple)

query_id: str
doc_id: str
score: float

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/fr14k/validation")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.
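Because the qrels are graded (0, 1, 2), a graded metric such as nDCG is a natural way to score a ranking on the validation split. The sketch below assumes the common exponential gain `2**rel - 1` and a log2 rank discount, which is one convention among several. `dcg`, `ndcg_at_k` and the sample judgments are hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain of a list of gains in rank order."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains))


def ndcg_at_k(ranked_doc_ids, relevance_by_doc, k=10):
    """nDCG@k for one query, given its ranking and {doc_id: relevance}."""
    gains = [2 ** relevance_by_doc.get(d, 0) - 1 for d in ranked_doc_ids[:k]]
    ideal = sorted((2 ** r - 1 for r in relevance_by_doc.values()), reverse=True)
    ideal_dcg = dcg(ideal[:k])
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Made-up judgments for a single query.
rels = {"d1": 2, "d2": 1, "d3": 0}
perfect = ndcg_at_k(["d1", "d2", "d3"], rels)
inverted = ndcg_at_k(["d3", "d2", "d1"], rels)
```

Averaging `ndcg_at_k` over all validation queries gives a single figure for comparing runs.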

CLI

ir_datasets export wikir/fr14k/validation scoreddocs --format tsv



[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

No example available for PyTerrier

\cite{Frej2020Wikir,Frej2020MlWikir}

Bibtex:

@inproceedings{Frej2020Wikir,
  title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={LREC},
  year={2020}
}

@inproceedings{Frej2020MlWikir,
  title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={CIRCLE},
  year={2020}
}

`"wikir/it16k"`

WikIR for Italian.

docs

Language: it

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/it16k")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.
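A single pass over the documents is enough to gather basic corpus statistics before indexing. This is a minimal sketch that counts documents and averages whitespace-separated tokens; whitespace splitting is a simplifying assumption, and `Doc`, `corpus_stats` and the sample records are hypothetical.

```python
from collections import namedtuple

# Hypothetical stand-in mirroring the GenericDoc fields (doc_id, text).
Doc = namedtuple("Doc", ["doc_id", "text"])


def corpus_stats(docs):
    """Return (document count, mean whitespace-token length)."""
    count = 0
    total_tokens = 0
    for doc in docs:
        count += 1
        total_tokens += len(doc.text.split())
    mean_length = total_tokens / count if count else 0.0
    return count, mean_length


# Made-up sample records; in practice pass dataset.docs_iter() instead.
sample_docs = [
    Doc("1", "roma capitale della repubblica italiana"),
    Doc("2", "torino si trova in piemonte nel nord"),
]
doc_count, mean_length = corpus_stats(sample_docs)
```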

CLI

ir_datasets export wikir/it16k docs



[doc_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

\cite{Frej2020Wikir,Frej2020MlWikir}

Bibtex:

@inproceedings{Frej2020Wikir,
  title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={LREC},
  year={2020}
}

@inproceedings{Frej2020MlWikir,
  title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={CIRCLE},
  year={2020}
}

`"wikir/it16k/test"`

Test set of wikir/it16k. Scoreddocs are the provided BM25 run.

Language: it

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/it16k/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/it16k/test queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

docs

Inherits docs from wikir/it16k

Language: it

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/it16k/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/it16k/test docs



[doc_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition
0	Otherwise
1	There is a link to the article with the query as its title in the first sentence
2	Query is the article title

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/it16k/test")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
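Counting how many judgments fall on each relevance level is a quick sanity check on a split. A minimal sketch, assuming records with the TrecQrel fields; `Qrel`, `relevance_histogram` and the sample judgments are hypothetical.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in mirroring the TrecQrel fields.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def relevance_histogram(qrels):
    """Count judgments per relevance level."""
    return Counter(qrel.relevance for qrel in qrels)


# Made-up judgments; in practice pass dataset.qrels_iter() instead.
sample_qrels = [
    Qrel("q1", "d1", 2, "0"),
    Qrel("q1", "d2", 1, "0"),
    Qrel("q2", "d3", 1, "0"),
    Qrel("q2", "d4", 0, "0"),
]
histogram = relevance_histogram(sample_qrels)
for level in sorted(histogram):
    print(level, histogram[level])
```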

CLI

ir_datasets export wikir/it16k/test qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

scoreddocs

Scored Document type:

GenericScoredDoc: (namedtuple)

query_id: str
doc_id: str
score: float

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/it16k/test")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/it16k/test scoreddocs --format tsv



[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.
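Evaluation tools in the TREC tradition generally expect a run as lines of `query_id Q0 doc_id rank score tag` rather than three-column TSV. The sketch below converts scoreddoc records into such lines, assuming higher scores rank first; `ScoredDoc`, `to_trec_run` and the `bm25` tag are hypothetical choices.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in mirroring the GenericScoredDoc fields.
ScoredDoc = namedtuple("ScoredDoc", ["query_id", "doc_id", "score"])


def to_trec_run(scoreddocs, run_tag="bm25"):
    """Format scoreddocs as TREC run lines, ranked by descending score."""
    by_query = defaultdict(list)
    for sd in scoreddocs:
        by_query[sd.query_id].append(sd)
    lines = []
    for query_id in sorted(by_query):
        ranked = sorted(by_query[query_id], key=lambda sd: sd.score, reverse=True)
        for rank, sd in enumerate(ranked, start=1):
            lines.append(f"{query_id} Q0 {sd.doc_id} {rank} {sd.score} {run_tag}")
    return lines


# Made-up run; in practice pass dataset.scoreddocs_iter() instead.
sample_run = [ScoredDoc("q1", "d2", 1.5), ScoredDoc("q1", "d1", 3.0)]
run_lines = to_trec_run(sample_run)
print("\n".join(run_lines))
```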

No example available for PyTerrier

\cite{Frej2020Wikir,Frej2020MlWikir}

Bibtex:

@inproceedings{Frej2020Wikir,
  title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={LREC},
  year={2020}
}

@inproceedings{Frej2020MlWikir,
  title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={CIRCLE},
  year={2020}
}

`"wikir/it16k/training"`

Training set of wikir/it16k. Scoreddocs are the provided BM25 run.

Language: it

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/it16k/training")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/it16k/training queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

docs

Inherits docs from wikir/it16k

Language: it

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/it16k/training")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/it16k/training docs



[doc_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition
0	Otherwise
1	There is a link to the article with the query as its title in the first sentence
2	Query is the article title

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/it16k/training")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/it16k/training qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

scoreddocs

Scored Document type:

GenericScoredDoc: (namedtuple)

query_id: str
doc_id: str
score: float

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/it16k/training")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/it16k/training scoreddocs --format tsv



[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

No example available for PyTerrier

\cite{Frej2020Wikir,Frej2020MlWikir}

Bibtex:

@inproceedings{Frej2020Wikir,
  title={WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={LREC},
  year={2020}
}

@inproceedings{Frej2020MlWikir,
  title={MLWIKIR: A Python Toolkit for Building Large-scale Wikipedia-based Information Retrieval Datasets in Chinese, English, French, Italian, Japanese, Spanish and More},
  author={Jibril Frej and Didier Schwab and Jean-Pierre Chevallet},
  booktitle={CIRCLE},
  year={2020}
}

`"wikir/it16k/validation"`

Validation set of wikir/it16k. Scoreddocs are the provided BM25 run.

Language: it

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/it16k/validation")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/it16k/validation queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

docs

Inherits docs from wikir/it16k

Language: it

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/it16k/validation")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/it16k/validation docs



[doc_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition
0	Otherwise
1	There is a link to the article with the query as its title in the first sentence
2	Query is the article title

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/it16k/validation")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export wikir/it16k/validation qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

scoreddocs

Scored Document type:

GenericScoredDoc: (namedtuple)

query_id: str
doc_id: str
score: float

Examples:

import ir_datasets
dataset = ir_datasets.load("wikir/it16k/validation")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.
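A simple binary check of the provided run is mean reciprocal rank: for each query, take the reciprocal of the rank of the first relevant document. The sketch below assumes rankings and relevant sets have already been grouped per query; `mean_reciprocal_rank` and the sample data are hypothetical.

```python
def mean_reciprocal_rank(rankings, relevant):
    """MRR over {query_id: [doc_id, ...]} given {query_id: {relevant doc_ids}}."""
    total = 0.0
    for query_id, ranked in rankings.items():
        judged = relevant.get(query_id, ())
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in judged:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(rankings) if rankings else 0.0


# Made-up rankings and relevant sets for three queries.
sample_rankings = {"q1": ["d3", "d1"], "q2": ["d4"], "q3": ["d9"]}
sample_relevant = {"q1": {"d1"}, "q2": {"d4"}}
mrr = mean_reciprocal_rank(sample_rankings, sample_relevant)
```

Queries with no relevant document in their ranking contribute zero, which is why `q3` lowers the mean here.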

CLI

ir_datasets export wikir/it16k/validation scoreddocs --format tsv



[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

No example available for PyTerrier