Github: datasets/tripclick.py

ir_datasets: TripClick

Index
  1. tripclick
  2. tripclick/test
  3. tripclick/test/head
  4. tripclick/test/tail
  5. tripclick/test/torso
  6. tripclick/train
  7. tripclick/train/head
  8. tripclick/train/head/dctr
  9. tripclick/train/tail
  10. tripclick/train/torso
  11. tripclick/val
  12. tripclick/val/head
  13. tripclick/val/head/dctr
  14. tripclick/val/tail
  15. tripclick/val/torso

Data Access Information

To use this dataset, you need a copy of the source files, provided by the Trip Database.

A copy of the source files can be requested through the procedure detailed here. Documents, queries, and qrels require the "TripClick IR Benchmark"; for scoreddocs and docpairs, you will also need to request the "TripClick Training Package for Deep Learning Models".

The source files you will need are:

ir_datasets expects these files to be copied/linked in ~/.ir_datasets/tripclick/.
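
As a rough sketch of the copy/link step, the helper below symlinks every regular file from a download folder into the directory that ir_datasets expects. The function name `link_source_files` and the idea of a single download folder are assumptions made for illustration; the required file names are not listed here, so the helper deliberately links whatever files it finds.

```python
from pathlib import Path


def link_source_files(source_dir, target_dir=Path.home() / ".ir_datasets" / "tripclick"):
    """Symlink every regular file in source_dir into target_dir.

    Hypothetical helper: returns the links it created and leaves
    anything that already exists in target_dir untouched.
    """
    source_dir = Path(source_dir).resolve()
    target_dir = Path(target_dir).expanduser()
    target_dir.mkdir(parents=True, exist_ok=True)

    created = []
    for src in sorted(source_dir.iterdir()):
        if not src.is_file():
            continue  # skip sub-directories
        link = target_dir / src.name
        if link.exists() or link.is_symlink():
            continue  # never overwrite a file or link that is already in place
        link.symlink_to(src)
        created.append(link)
    return created
```

Calling `link_source_files("/path/to/downloads")` would populate ~/.ir_datasets/tripclick/ by default; copying the files by hand works just as well.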


"tripclick"

TripClick is a large collection of documents and click logs from the Trip Database, a health web search engine. Relevance is inferred from click signals.

A copy of this dataset can be obtained from the Trip Database through the process described here. Documents, queries, and qrels require the "TripClick IR Benchmark"; for scoreddocs and docpairs, you will also need to request the "TripClick Training Package for Deep Learning Models".

docs

Language: en

Document type:
TitleUrlTextDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. url: str
  4. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick docs
[doc_id]    [title]    [url]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:tripclick')
# Index tripclick
indexer = pt.IterDictIndexer('./indices/tripclick')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'url', 'text'])

You can find more details about PyTerrier indexing here.

Citation

ir_datasets.bib:

\cite{Rekabsaz2021TripClick}

Bibtex:

@inproceedings{Rekabsaz2021TripClick, title={TripClick: The Log Files of a Large Health Web Search Engine}, author={Navid Rekabsaz and Oleg Lesota and Markus Schedl and Jon Brassey and Carsten Eickhoff}, year={2021}, booktitle={SIGIR} }

"tripclick/test"

Test subset of tripclick, including all queries from tripclick/test/head, tripclick/test/torso, and tripclick/test/tail.

The scoreddocs are the official BM25 results from Anserini.

queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/test queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:tripclick/test')
index_ref = pt.IndexRef.of('./indices/tripclick') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

Language: en

Note: Uses docs from tripclick

Document type:
TitleUrlTextDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. url: str
  4. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/test docs
[doc_id]    [title]    [url]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:tripclick/test')
# Index tripclick
indexer = pt.IterDictIndexer('./indices/tripclick')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'url', 'text'])

You can find more details about PyTerrier indexing here.

scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/test")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/test scoreddocs --format tsv
[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier
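
The scoreddocs can serve as first-stage candidates for re-ranking. The sketch below groups records by query and keeps the k highest-scoring documents for each. The `ScoredDoc` namedtuple is a local stand-in that mirrors the fields of GenericScoredDoc, and `top_k_per_query` is a hypothetical helper, not part of ir_datasets.

```python
from collections import defaultdict, namedtuple

# Local stand-in with the same fields as GenericScoredDoc (assumption for illustration).
ScoredDoc = namedtuple("ScoredDoc", ["query_id", "doc_id", "score"])


def top_k_per_query(scoreddocs, k=10):
    """Return {query_id: [(doc_id, score), ...]} with at most k entries per query,
    sorted by descending score."""
    by_query = defaultdict(list)
    for sd in scoreddocs:
        by_query[sd.query_id].append((sd.doc_id, sd.score))
    return {
        qid: sorted(docs, key=lambda pair: pair[1], reverse=True)[:k]
        for qid, docs in by_query.items()
    }
```

Any iterable of records with `query_id`, `doc_id`, and `score` attributes should work, including the output of `dataset.scoreddocs_iter()`.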

Citation

ir_datasets.bib:

\cite{Rekabsaz2021TripClick}

Bibtex:

@inproceedings{Rekabsaz2021TripClick, title={TripClick: The Log Files of a Large Health Web Search Engine}, author={Navid Rekabsaz and Oleg Lesota and Markus Schedl and Jon Brassey and Carsten Eickhoff}, year={2021}, booktitle={SIGIR} }

"tripclick/test/head"

The most frequent queries in the test set. This represents 20% of the search engine traffic.

queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/test/head")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/test/head queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:tripclick/test/head')
index_ref = pt.IndexRef.of('./indices/tripclick') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

Language: en

Note: Uses docs from tripclick

Document type:
TitleUrlTextDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. url: str
  4. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/test/head")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/test/head docs
[doc_id]    [title]    [url]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:tripclick/test/head')
# Index tripclick
indexer = pt.IterDictIndexer('./indices/tripclick')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'url', 'text'])

You can find more details about PyTerrier indexing here.

scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/test/head")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/test/head scoreddocs --format tsv
[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

Citation

ir_datasets.bib:

\cite{Rekabsaz2021TripClick}

Bibtex:

@inproceedings{Rekabsaz2021TripClick, title={TripClick: The Log Files of a Large Health Web Search Engine}, author={Navid Rekabsaz and Oleg Lesota and Markus Schedl and Jon Brassey and Carsten Eickhoff}, year={2021}, booktitle={SIGIR} }

"tripclick/test/tail"

The least frequent queries in the test set. This represents 50% of the search engine traffic.

queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/test/tail")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/test/tail queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:tripclick/test/tail')
index_ref = pt.IndexRef.of('./indices/tripclick') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

Language: en

Note: Uses docs from tripclick

Document type:
TitleUrlTextDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. url: str
  4. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/test/tail")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/test/tail docs
[doc_id]    [title]    [url]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:tripclick/test/tail')
# Index tripclick
indexer = pt.IterDictIndexer('./indices/tripclick')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'url', 'text'])

You can find more details about PyTerrier indexing here.

scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/test/tail")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/test/tail scoreddocs --format tsv
[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

Citation

ir_datasets.bib:

\cite{Rekabsaz2021TripClick}

Bibtex:

@inproceedings{Rekabsaz2021TripClick, title={TripClick: The Log Files of a Large Health Web Search Engine}, author={Navid Rekabsaz and Oleg Lesota and Markus Schedl and Jon Brassey and Carsten Eickhoff}, year={2021}, booktitle={SIGIR} }

"tripclick/test/torso"

The moderately frequent queries in the test set. This represents 30% of the search engine traffic.

queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/test/torso")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/test/torso queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:tripclick/test/torso')
index_ref = pt.IndexRef.of('./indices/tripclick') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

Language: en

Note: Uses docs from tripclick

Document type:
TitleUrlTextDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. url: str
  4. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/test/torso")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/test/torso docs
[doc_id]    [title]    [url]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:tripclick/test/torso')
# Index tripclick
indexer = pt.IterDictIndexer('./indices/tripclick')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'url', 'text'])

You can find more details about PyTerrier indexing here.

scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/test/torso")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/test/torso scoreddocs --format tsv
[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

Citation

ir_datasets.bib:

\cite{Rekabsaz2021TripClick}

Bibtex:

@inproceedings{Rekabsaz2021TripClick, title={TripClick: The Log Files of a Large Health Web Search Engine}, author={Navid Rekabsaz and Oleg Lesota and Markus Schedl and Jon Brassey and Carsten Eickhoff}, year={2021}, booktitle={SIGIR} }

"tripclick/train"

Training subset of tripclick, including all queries from tripclick/train/head, tripclick/train/torso, and tripclick/train/tail.

The dataset provides docpairs in a full text format; we map this text back to the query and doc IDs. A small number of docpairs could not be mapped back, so they are skipped.
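
To illustrate the idea of mapping full-text docpairs back to IDs, here is a minimal sketch that uses exact-match dictionary lookups and skips any triple that cannot be resolved. This is an assumption about one way to do it, not the matching logic ir_datasets actually uses; `map_text_pairs_to_ids` and both lookup tables are hypothetical.

```python
def map_text_pairs_to_ids(text_triples, query_id_by_text, doc_id_by_text):
    """Map (query_text, doc_a_text, doc_b_text) triples to ID triples.

    Returns (mapped, skipped): the list of (query_id, doc_id_a, doc_id_b)
    tuples and the number of triples that could not be mapped.
    """
    mapped = []
    skipped = 0
    for query_text, doc_a_text, doc_b_text in text_triples:
        query_id = query_id_by_text.get(query_text)
        doc_id_a = doc_id_by_text.get(doc_a_text)
        doc_id_b = doc_id_by_text.get(doc_b_text)
        if query_id is None or doc_id_a is None or doc_id_b is None:
            skipped += 1  # at least one text has no known ID, so drop the triple
            continue
        mapped.append((query_id, doc_id_a, doc_id_b))
    return mapped, skipped
```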

queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:tripclick/train')
index_ref = pt.IndexRef.of('./indices/tripclick') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

Language: en

Note: Uses docs from tripclick

Document type:
TitleUrlTextDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. url: str
  4. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/train docs
[doc_id]    [title]    [url]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:tripclick/train')
# Index tripclick
indexer = pt.IterDictIndexer('./indices/tripclick')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'url', 'text'])

You can find more details about PyTerrier indexing here.

qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
0     not clicked and appeared higher in search results
1     clicked
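
For evaluation code it is often convenient to hold these binary labels in a nested lookup. The sketch below builds one and lists the clicked documents for a query. The `Qrel` namedtuple is a local stand-in mirroring the TrecQrel fields, and both helper names are hypothetical.

```python
from collections import defaultdict, namedtuple

# Local stand-in with the same fields as TrecQrel (assumption for illustration).
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def qrels_to_lookup(qrels):
    """Return {query_id: {doc_id: relevance}} from an iterable of qrel records."""
    lookup = defaultdict(dict)
    for qrel in qrels:
        lookup[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(lookup)


def clicked_docs(lookup, query_id):
    """Return the sorted doc_ids with relevance >= 1 (clicked) for a query."""
    return sorted(
        doc_id for doc_id, rel in lookup.get(query_id, {}).items() if rel >= 1
    )
```

Documents that never appear in the lookup for a query are unjudged, which is distinct from the explicit 0 label.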

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:tripclick/train')
index_ref = pt.IndexRef.of('./indices/tripclick') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

docpairs
Document Pair type:
GenericDocPair: (namedtuple)
  1. query_id: str
  2. doc_id_a: str
  3. doc_id_b: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/train docpairs
[query_id]    [doc_id_a]    [doc_id_b]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier
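
Pairwise training usually needs the text behind each ID. The sketch below joins docpairs with query and document text through plain dictionaries. `DocPair` is a local stand-in mirroring GenericDocPair, `iter_text_triples` is a hypothetical helper, and treating doc_id_a as the preferred document is an assumption, since this page does not state the ordering.

```python
from collections import namedtuple

# Local stand-in with the same fields as GenericDocPair (assumption for illustration).
DocPair = namedtuple("DocPair", ["query_id", "doc_id_a", "doc_id_b"])


def iter_text_triples(docpairs, query_text_by_id, doc_text_by_id):
    """Yield (query_text, doc_a_text, doc_b_text) for each docpair.

    Pairs that reference an unknown query or document ID are skipped.
    """
    for pair in docpairs:
        try:
            yield (
                query_text_by_id[pair.query_id],
                doc_text_by_id[pair.doc_id_a],
                doc_text_by_id[pair.doc_id_b],
            )
        except KeyError:
            continue  # missing query or document text
```

The two lookup tables could be filled from `dataset.queries_iter()` and `dataset.docs_iter()`. For a corpus of this size, an on-disk document store may be more practical than an in-memory dictionary.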

Citation

ir_datasets.bib:

\cite{Rekabsaz2021TripClick}

Bibtex:

@inproceedings{Rekabsaz2021TripClick, title={TripClick: The Log Files of a Large Health Web Search Engine}, author={Navid Rekabsaz and Oleg Lesota and Markus Schedl and Jon Brassey and Carsten Eickhoff}, year={2021}, booktitle={SIGIR} }

"tripclick/train/head"

The most frequent queries in the train set. This represents 20% of the search engine traffic.

queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/train/head")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/train/head queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:tripclick/train/head')
index_ref = pt.IndexRef.of('./indices/tripclick') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

Language: en

Note: Uses docs from tripclick

Document type:
TitleUrlTextDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. url: str
  4. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/train/head")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/train/head docs
[doc_id]    [title]    [url]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:tripclick/train/head')
# Index tripclick
indexer = pt.IterDictIndexer('./indices/tripclick')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'url', 'text'])

You can find more details about PyTerrier indexing here.

qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
0     not clicked and appeared higher in search results
1     clicked

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/train/head")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/train/head qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:tripclick/train/head')
index_ref = pt.IndexRef.of('./indices/tripclick') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Rekabsaz2021TripClick}

Bibtex:

@inproceedings{Rekabsaz2021TripClick, title={TripClick: The Log Files of a Large Health Web Search Engine}, author={Navid Rekabsaz and Oleg Lesota and Markus Schedl and Jon Brassey and Carsten Eickhoff}, year={2021}, booktitle={SIGIR} }

"tripclick/train/head/dctr"

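The most frequent queries in the train set, with graded relevance judgments derived from document click-through rates (DCTR) rather than the binary click labels of tripclick/train/head. This represents 20% of the search engine traffic.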

queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/train/head/dctr")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/train/head/dctr queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:tripclick/train/head/dctr')
index_ref = pt.IndexRef.of('./indices/tripclick') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

Language: en

Note: Uses docs from tripclick

Document type:
TitleUrlTextDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. url: str
  4. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/train/head/dctr")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/train/head/dctr docs
[doc_id]    [title]    [url]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:tripclick/train/head/dctr')
# Index tripclick
indexer = pt.IterDictIndexer('./indices/tripclick')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'url', 'text'])

You can find more details about PyTerrier indexing here.

qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
0     not relevant; never clicked
1     partially relevant; clicked less than 4% of times it was shown (but at least once)
2     relevant; clicked more than 4% but less than 30% of times it was shown
3     highly relevant; clicked more than 30% of the times it was shown
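
The thresholds above can be read as a small grading function over click and impression counts. This is only a sketch of that reading: `dctr_grade` is a hypothetical helper, and because the definitions do not say how rates of exactly 4% or exactly 30% are treated, the code assigns them to the lower grade as an assumption.

```python
def dctr_grade(clicks, impressions):
    """Map click/impression counts to a 0-3 relevance grade.

    Assumption: a rate of exactly 4% or exactly 30% falls into the lower grade.
    """
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    if clicks == 0:
        return 0  # never clicked
    rate = clicks / impressions
    if rate > 0.30:
        return 3  # highly relevant
    if rate > 0.04:
        return 2  # relevant
    return 1  # partially relevant: clicked at least once, low rate
```

For example, a document shown 100 times and clicked 10 times has a rate of 10% and would receive grade 2 under this reading.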

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/train/head/dctr")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/train/head/dctr qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:tripclick/train/head/dctr')
index_ref = pt.IndexRef.of('./indices/tripclick') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Rekabsaz2021TripClick}

Bibtex:

@inproceedings{Rekabsaz2021TripClick, title={TripClick: The Log Files of a Large Health Web Search Engine}, author={Navid Rekabsaz and Oleg Lesota and Markus Schedl and Jon Brassey and Carsten Eickhoff}, year={2021}, booktitle={SIGIR} }

"tripclick/train/tail"

The least frequent queries in the train set. This represents 50% of the search engine traffic.

queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/train/tail")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/train/tail queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:tripclick/train/tail')
index_ref = pt.IndexRef.of('./indices/tripclick') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

Language: en

Note: Uses docs from tripclick

Document type:
TitleUrlTextDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. url: str
  4. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/train/tail")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/train/tail docs
[doc_id]    [title]    [url]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:tripclick/train/tail')
# Index tripclick
indexer = pt.IterDictIndexer('./indices/tripclick')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'url', 'text'])

You can find more details about PyTerrier indexing here.

qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
0     not clicked and appeared higher in search results
1     clicked

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/train/tail")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/train/tail qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:tripclick/train/tail')
index_ref = pt.IndexRef.of('./indices/tripclick') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Rekabsaz2021TripClick}

Bibtex:

@inproceedings{Rekabsaz2021TripClick, title={TripClick: The Log Files of a Large Health Web Search Engine}, author={Navid Rekabsaz and Oleg Lesota and Markus Schedl and Jon Brassey and Carsten Eickhoff}, year={2021}, booktitle={SIGIR} }

"tripclick/train/torso"

The moderately frequent queries in the train set. This represents 30% of the search engine traffic.

queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/train/torso")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/train/torso queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:tripclick/train/torso')
index_ref = pt.IndexRef.of('./indices/tripclick') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

Language: en

Note: Uses docs from tripclick

Document type:
TitleUrlTextDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. url: str
  4. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/train/torso")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/train/torso docs
[doc_id]    [title]    [url]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:tripclick/train/torso')
# Index tripclick
indexer = pt.IterDictIndexer('./indices/tripclick')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'url', 'text'])

You can find more details about PyTerrier indexing here.

qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
0     not clicked and appeared higher in search results
1     clicked

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/train/torso")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/train/torso qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:tripclick/train/torso')
index_ref = pt.IndexRef.of('./indices/tripclick') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Rekabsaz2021TripClick}

Bibtex:

@inproceedings{Rekabsaz2021TripClick, title={TripClick: The Log Files of a Large Health Web Search Engine}, author={Navid Rekabsaz and Oleg Lesota and Markus Schedl and Jon Brassey and Carsten Eickhoff}, year={2021}, booktitle={SIGIR} }

"tripclick/val"

Validation subset of tripclick, including all queries from tripclick/val/head, tripclick/val/torso, and tripclick/val/tail.

The scoreddocs are the official BM25 results from Anserini.

queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/val")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/val queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:tripclick/val')
index_ref = pt.IndexRef.of('./indices/tripclick') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

Language: en

Note: Uses docs from tripclick

Document type:
TitleUrlTextDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. url: str
  4. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/val")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/val docs
[doc_id]    [title]    [url]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:tripclick/val')
# Index tripclick
indexer = pt.IterDictIndexer('./indices/tripclick')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'url', 'text'])

You can find more details about PyTerrier indexing here.

qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.    Definition
0       not clicked and appeared higher in search results
1       clicked
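
With binary judgments like these, a common first step is to collect the clicked documents per query and score a ranking against them. The sketch below computes a reciprocal rank; `TrecQrel` is a stand-in for the namedtuple documented above, the helpers `clicked_docs` and `reciprocal_rank` are hypothetical names, and the sample judgments are invented for illustration.

```python
from collections import namedtuple

# Stand-in for the TrecQrel namedtuple documented above.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def clicked_docs(qrels):
    """Map each query_id to the set of doc_ids judged 1 (clicked)."""
    clicked = {}
    for qrel in qrels:
        if qrel.relevance >= 1:
            clicked.setdefault(qrel.query_id, set()).add(qrel.doc_id)
    return clicked


def reciprocal_rank(ranking, relevant):
    """Return 1/rank of the first relevant doc_id in `ranking`, or 0.0 if none."""
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


# Invented judgments: d1 was skipped, d2 and d3 were clicked.
qrels = [
    TrecQrel("q1", "d1", 0, "0"),
    TrecQrel("q1", "d2", 1, "0"),
    TrecQrel("q1", "d3", 1, "0"),
]
clicked = clicked_docs(qrels)
print(reciprocal_rank(["d1", "d3", "d2"], clicked["q1"]))  # 0.5
```

The first clicked document in the example ranking sits at position 2, hence the value 0.5.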

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/val")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/val qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:tripclick/val')
index_ref = pt.IndexRef.of('./indices/tripclick') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/val")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/val scoreddocs --format tsv
[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

Citation

ir_datasets.bib:

\cite{Rekabsaz2021TripClick}

Bibtex:

@inproceedings{Rekabsaz2021TripClick, title={TripClick: The Log Files of a Large Health Web Search Engine}, author={Navid Rekabsaz and Oleg Lesota and Markus Schedl and Jon Brassey and Carsten Eickhoff}, year={2021}, booktitle={SIGIR} }

"tripclick/val/head"

The most frequent queries in the validation set. Together, these queries account for 20% of the search engine traffic.

queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/val/head")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/val/head queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:tripclick/val/head')
index_ref = pt.IndexRef.of('./indices/tripclick') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

Language: en

Note: Uses docs from tripclick

Document type:
TitleUrlTextDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. url: str
  4. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/val/head")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/val/head docs
[doc_id]    [title]    [url]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:tripclick/val/head')
# Index tripclick
indexer = pt.IterDictIndexer('./indices/tripclick')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'url', 'text'])

You can find more details about PyTerrier indexing here.

qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.    Definition
0       not clicked and appeared higher in search results
1       clicked

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/val/head")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/val/head qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:tripclick/val/head')
index_ref = pt.IndexRef.of('./indices/tripclick') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/val/head")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/val/head scoreddocs --format tsv
[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

Citation

ir_datasets.bib:

\cite{Rekabsaz2021TripClick}

Bibtex:

@inproceedings{Rekabsaz2021TripClick, title={TripClick: The Log Files of a Large Health Web Search Engine}, author={Navid Rekabsaz and Oleg Lesota and Markus Schedl and Jon Brassey and Carsten Eickhoff}, year={2021}, booktitle={SIGIR} }

"tripclick/val/head/dctr"

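The most frequent queries in the validation set. Together, these queries account for 20% of the search engine traffic. The qrels in this subset are graded by how often a document was clicked when shown, rather than binary (see the relevance levels below).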

queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/val/head/dctr")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/val/head/dctr queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:tripclick/val/head/dctr')
index_ref = pt.IndexRef.of('./indices/tripclick') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

Language: en

Note: Uses docs from tripclick

Document type:
TitleUrlTextDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. url: str
  4. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/val/head/dctr")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/val/head/dctr docs
[doc_id]    [title]    [url]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:tripclick/val/head/dctr')
# Index tripclick
indexer = pt.IterDictIndexer('./indices/tripclick')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'url', 'text'])

You can find more details about PyTerrier indexing here.

qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.    Definition
0       not relevant; never clicked
1       partially relevant; clicked less than 4% of the times it was shown (but at least once)
2       relevant; clicked more than 4% but less than 30% of the times it was shown
3       highly relevant; clicked more than 30% of the times it was shown
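
The thresholds above can be read as a simple mapping from a click-through rate to a grade. The following is a minimal sketch of that mapping, not the procedure used to build the dataset: the function name `dctr_grade` and its `clicks` and `impressions` parameters are hypothetical, and because the table does not say how rates of exactly 4% or 30% are graded, the sketch assumes they fall into the higher grade.

```python
def dctr_grade(clicks, impressions):
    """Map click counts to a 0-3 relevance grade using the thresholds above.

    Assumption: a rate of exactly 4% is graded 2 and exactly 30% is graded 3;
    the table leaves both boundaries unspecified.
    """
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    if clicks == 0:
        return 0  # never clicked
    rate = clicks / impressions
    if rate < 0.04:
        return 1  # clicked at least once, but under 4% of the time
    if rate < 0.30:
        return 2  # between 4% and 30%
    return 3  # 30% or more (boundary handling is an assumption)


# Hypothetical counts, for illustration only.
for clicks, shown in [(0, 100), (1, 100), (10, 100), (50, 100)]:
    print(clicks, shown, dctr_grade(clicks, shown))
```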

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/val/head/dctr")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/val/head/dctr qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:tripclick/val/head/dctr')
index_ref = pt.IndexRef.of('./indices/tripclick') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/val/head/dctr")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/val/head/dctr scoreddocs --format tsv
[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

Citation

ir_datasets.bib:

\cite{Rekabsaz2021TripClick}

Bibtex:

@inproceedings{Rekabsaz2021TripClick, title={TripClick: The Log Files of a Large Health Web Search Engine}, author={Navid Rekabsaz and Oleg Lesota and Markus Schedl and Jon Brassey and Carsten Eickhoff}, year={2021}, booktitle={SIGIR} }

"tripclick/val/tail"

The least frequent queries in the validation set. Together, these queries account for 50% of the search engine traffic.

queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/val/tail")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/val/tail queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:tripclick/val/tail')
index_ref = pt.IndexRef.of('./indices/tripclick') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

Language: en

Note: Uses docs from tripclick

Document type:
TitleUrlTextDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. url: str
  4. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/val/tail")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/val/tail docs
[doc_id]    [title]    [url]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:tripclick/val/tail')
# Index tripclick
indexer = pt.IterDictIndexer('./indices/tripclick')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'url', 'text'])

You can find more details about PyTerrier indexing here.

qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.    Definition
0       not clicked and appeared higher in search results
1       clicked

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/val/tail")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/val/tail qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.
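
If the exported qrels are saved to a file, they can be read back into the same tuple shape. The sketch below assumes the export is tab-separated, has no header row, and uses the column order shown above; `TrecQrel` is a stand-in for the documented namedtuple, `read_qrels_tsv` is a hypothetical helper, and the sample rows are invented.

```python
import csv
import io
from collections import namedtuple

# Stand-in for the TrecQrel namedtuple documented above.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def read_qrels_tsv(lines):
    """Yield TrecQrel tuples from tab-separated lines.

    Assumes four columns per row (query_id, doc_id, relevance, iteration)
    and no header row.
    """
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        query_id, doc_id, relevance, iteration = row
        yield TrecQrel(query_id, doc_id, int(relevance), iteration)


# Invented sample standing in for an exported file.
sample = io.StringIO("q1\td1\t1\t0\nq1\td2\t0\t0\n")
qrels = list(read_qrels_tsv(sample))
print(qrels[0])
```

For a real export, an open file handle can be passed in place of the `io.StringIO` sample.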

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:tripclick/val/tail')
index_ref = pt.IndexRef.of('./indices/tripclick') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/val/tail")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/val/tail scoreddocs --format tsv
[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

Citation

ir_datasets.bib:

\cite{Rekabsaz2021TripClick}

Bibtex:

@inproceedings{Rekabsaz2021TripClick, title={TripClick: The Log Files of a Large Health Web Search Engine}, author={Navid Rekabsaz and Oleg Lesota and Markus Schedl and Jon Brassey and Carsten Eickhoff}, year={2021}, booktitle={SIGIR} }

"tripclick/val/torso"

The moderately frequent queries in the validation set. Together, these queries account for 30% of the search engine traffic.

queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/val/torso")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/val/torso queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:tripclick/val/torso')
index_ref = pt.IndexRef.of('./indices/tripclick') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

Language: en

Note: Uses docs from tripclick

Document type:
TitleUrlTextDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. url: str
  4. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/val/torso")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, url, text>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/val/torso docs
[doc_id]    [title]    [url]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:tripclick/val/torso')
# Index tripclick
indexer = pt.IterDictIndexer('./indices/tripclick')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'url', 'text'])

You can find more details about PyTerrier indexing here.

qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.    Definition
0       not clicked and appeared higher in search results
1       clicked

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/val/torso")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/val/torso qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:tripclick/val/torso')
index_ref = pt.IndexRef.of('./indices/tripclick') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)

You can find more details about PyTerrier experiments here.

scoreddocs
Scored Document type:
GenericScoredDoc: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. score: float

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("tripclick/val/torso")
for scoreddoc in dataset.scoreddocs_iter():
    scoreddoc # namedtuple<query_id, doc_id, score>

You can find more details about the Python API here.

CLI
ir_datasets export tripclick/val/torso scoreddocs --format tsv
[query_id]    [doc_id]    [score]
...

You can find more details about the CLI here.
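
Evaluation tools often expect rankings in the six-column TREC run format (`query_id Q0 doc_id rank score tag`) rather than as bare score triples. A minimal sketch of that conversion follows; `GenericScoredDoc` is a stand-in for the documented namedtuple, `to_trec_run` and the `bm25` tag are hypothetical choices, and the scores are invented.

```python
from collections import defaultdict, namedtuple

# Stand-in for the GenericScoredDoc namedtuple documented above.
GenericScoredDoc = namedtuple("GenericScoredDoc", ["query_id", "doc_id", "score"])


def to_trec_run(scoreddocs, tag="bm25"):
    """Format scored documents as TREC run lines, ranked by descending score."""
    by_query = defaultdict(list)
    for sd in scoreddocs:
        by_query[sd.query_id].append(sd)

    lines = []
    for query_id, docs in by_query.items():
        docs.sort(key=lambda sd: sd.score, reverse=True)
        for rank, sd in enumerate(docs, start=1):
            lines.append(f"{query_id} Q0 {sd.doc_id} {rank} {sd.score} {tag}")
    return lines


# Invented scores, for illustration only.
scoreddocs = [
    GenericScoredDoc("q1", "d2", 7.5),
    GenericScoredDoc("q1", "d1", 9.25),
    GenericScoredDoc("q2", "d3", 3.0),
]
for line in to_trec_run(scoreddocs):
    print(line)
```

Ranks are assigned per query and start at 1; this sketch holds all scored documents in memory, which may not suit a very large run.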

PyTerrier

No example available for PyTerrier

Citation

ir_datasets.bib:

\cite{Rekabsaz2021TripClick}

Bibtex:

@inproceedings{Rekabsaz2021TripClick, title={TripClick: The Log Files of a Large Health Web Search Engine}, author={Navid Rekabsaz and Oleg Lesota and Markus Schedl and Jon Brassey and Carsten Eickhoff}, year={2021}, booktitle={SIGIR} }