LoTTE - ir_datasets

`"lotte"`

LoTTE (Long-Tail Topic-stratified Evaluation) is a set of test collections focused on out-of-domain evaluation. It consists of data from several StackExchanges, with relevance assumed from either upvotes (at least 1) or selection as the accepted answer by the question's author.

Note that the dev and test corpora are disjoint to avoid leakage.
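The relevance rule described above can be sketched in a few lines. This is only an illustration: the `Answer` record and its field names are hypothetical and are not part of ir_datasets or the LoTTE release.

```python
from typing import NamedTuple


class Answer(NamedTuple):
    """Hypothetical record for one StackExchange answer (not an ir_datasets type)."""
    answer_id: str
    upvotes: int
    accepted: bool


def is_relevant(answer: Answer) -> bool:
    # Assumed relevant if the answer has at least one upvote
    # or was accepted by the question's author.
    return answer.upvotes >= 1 or answer.accepted


# Made-up answers for illustration only.
answers = [
    Answer("a1", upvotes=3, accepted=False),
    Answer("a2", upvotes=0, accepted=True),
    Answer("a3", upvotes=0, accepted=False),
]
relevant_ids = [a.answer_id for a in answers if is_relevant(a)]
```

Every judged answer ends up with the same relevance level, which is why the qrels in each subset contain only the value 1.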

Documents: Answers to StackExchange questions
Queries: Natural language questions
Dataset Paper

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2,
  title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction",
  author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia",
  journal = "arXiv preprint arXiv:2112.01488",
  year = "2021",
  url = "https://arxiv.org/abs/2112.01488"
}

`"lotte/lifestyle/dev"`

Answers from lifestyle-focused forums, including bicycles, coffee, crafts, diy, gardening, lifehacks, mechanics, music, outdoors, parenting, pets, sports, and travel.

docs

269K docs

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.
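Besides iterating, documents can also be fetched by ID through the dataset's document store. The helper below is a minimal sketch: `fetch_texts` is a hypothetical name, and the doc IDs in the usage comment are placeholders rather than known LoTTE IDs.

```python
def fetch_texts(dataset, doc_ids):
    """Return {doc_id: text} for the requested IDs using the dataset's document store."""
    store = dataset.docs_store()
    return {doc_id: store.get(doc_id).text for doc_id in doc_ids}


# Typical use (the corpus is downloaded on first access):
#   import ir_datasets
#   dataset = ir_datasets.load("lotte/lifestyle/dev")
#   fetch_texts(dataset, ["0", "1"])  # placeholder IDs
```

Passing the dataset in as an argument keeps the helper usable with any ir_datasets collection whose documents have a `text` field.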

CLI

ir_datasets export lotte/lifestyle/dev docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/dev')
# Index lotte/lifestyle/dev
indexer = pt.IterDictIndexer('./indices/lotte_lifestyle_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

Metadata

{
  "docs": {
    "count": 268893,
    "fields": {
      "doc_id": {
        "max_len": 6,
        "common_prefix": ""
      }
    }
  }
}
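The two metadata values above (document count and longest `doc_id`) could in principle be recomputed with one pass over `docs_iter()`. The sketch below uses a stand-in namedtuple and made-up documents so it runs without downloading the corpus; `summarize_docs` is a hypothetical helper, not an ir_datasets function.

```python
from collections import namedtuple

# Stand-in with the same fields as GenericDoc, so the sketch runs offline.
GenericDoc = namedtuple("GenericDoc", ["doc_id", "text"])


def summarize_docs(docs):
    """Compute the document count and the longest doc_id length in one pass."""
    count = 0
    max_len = 0
    for doc in docs:
        count += 1
        max_len = max(max_len, len(doc.doc_id))
    return {"count": count, "doc_id_max_len": max_len}


# Made-up documents for illustration only.
sample = [
    GenericDoc("7", "first answer"),
    GenericDoc("42", "second answer"),
    GenericDoc("123456", "third answer"),
]
summary = summarize_docs(sample)
```

With a real dataset, `summarize_docs(dataset.docs_iter())` would be the equivalent call.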

`"lotte/lifestyle/dev/forum"`

Forum queries for lotte/lifestyle/dev.

Official evaluation measures: Success@5
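Success@5 is commonly defined as the fraction of queries for which at least one relevant document appears in the top 5 results. The function below is a minimal sketch of that definition, not the implementation used by ir_datasets or PyTerrier; the input shapes and the choice to average over all judged queries are assumptions.

```python
def success_at_k(run, qrels, k=5):
    """Fraction of judged queries with at least one relevant document in the top k.

    run:   {query_id: [doc_id, ...]} ranked best-first
    qrels: {query_id: {doc_id, ...}} relevant documents per query
    """
    if not qrels:
        return 0.0
    hits = 0
    for query_id, relevant in qrels.items():
        # Queries missing from the run count as failures.
        top_k = run.get(query_id, [])[:k]
        if any(doc_id in relevant for doc_id in top_k):
            hits += 1
    return hits / len(qrels)


# Toy data: q1 has a relevant document at rank 3, q2 only at rank 6.
run = {
    "q1": ["d9", "d3", "d1"],
    "q2": ["d4", "d5", "d6", "d7", "d8", "d2"],
}
qrels = {"q1": {"d1"}, "q2": {"d2"}}
score = success_at_k(run, qrels)
```

In the toy data only q1 succeeds within the cutoff, so half of the queries count as successes.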

queries

2.1K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/dev/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/lifestyle/dev/forum queries



[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_lifestyle_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

269K docs

Inherits docs from lotte/lifestyle/dev

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/dev/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/lifestyle/dev/forum docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/dev/forum')
# Index lotte/lifestyle/dev
indexer = pt.IterDictIndexer('./indices/lotte_lifestyle_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels

13K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Answer upvoted or accepted on stack exchange	`13K`	100.0%
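For evaluation it is often convenient to group the judgments by query. The sketch below does this with a stand-in namedtuple that mirrors the TrecQrel fields listed above; the sample judgments are made up and `relevant_docs_by_query` is a hypothetical helper.

```python
from collections import defaultdict, namedtuple

# Stand-in with the same fields as TrecQrel, so the sketch runs offline.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def relevant_docs_by_query(qrels, min_relevance=1):
    """Group relevant doc_ids by query_id."""
    grouped = defaultdict(set)
    for qrel in qrels:
        if qrel.relevance >= min_relevance:
            grouped[qrel.query_id].add(qrel.doc_id)
    return dict(grouped)


# Made-up judgments for illustration only.
sample_qrels = [
    TrecQrel("0", "12", 1, "0"),
    TrecQrel("0", "57", 1, "0"),
    TrecQrel("1", "3", 1, "0"),
]
grouped = relevant_docs_by_query(sample_qrels)
```

With a real dataset, `relevant_docs_by_query(dataset.qrels_iter())` would be the equivalent call.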

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/dev/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/lifestyle/dev/forum qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.
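An exported qrels file can be read back with the standard library. This sketch assumes the export is tab-separated with the four columns shown above and no header row; the sample rows are made up.

```python
import csv
import io


def read_qrels_tsv(handle):
    """Yield (query_id, doc_id, relevance, iteration) tuples from tab-separated lines."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        query_id, doc_id, relevance, iteration = row
        yield query_id, doc_id, int(relevance), iteration


# Made-up export content; with a real file, pass open(path, newline="") instead.
exported = "0\t12\t1\t0\n0\t57\t1\t0\n1\t3\t1\t0\n"
rows = list(read_qrels_tsv(io.StringIO(exported)))
```

IDs are kept as strings, matching the `str` types of `query_id` and `doc_id`; only `relevance` is converted to an integer.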

PyTerrier

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_lifestyle_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Metadata

{
  "docs": {
    "count": 268893,
    "fields": {
      "doc_id": {
        "max_len": 6,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 2076
  },
  "qrels": {
    "count": 12823,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 12823
        }
      }
    }
  }
}
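The counts in the metadata make it easy to derive simple statistics, such as the average number of judged answers per query. The sketch below copies the three counts from the metadata above into a trimmed JSON string.

```python
import json

# Counts copied from the metadata for lotte/lifestyle/dev/forum.
metadata = json.loads("""
{
  "docs": {"count": 268893},
  "queries": {"count": 2076},
  "qrels": {"count": 12823}
}
""")

qrels_per_query = metadata["qrels"]["count"] / metadata["queries"]["count"]
print(f"{qrels_per_query:.2f} judged answers per query")
```

For this subset the ratio comes out at roughly 6.18 judged answers per query.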

`"lotte/lifestyle/dev/search"`

Search queries for lotte/lifestyle/dev.

Official evaluation measures: Success@5

queries

417 queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/dev/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/lifestyle/dev/search queries



[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_lifestyle_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

269K docs

Inherits docs from lotte/lifestyle/dev

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/dev/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/lifestyle/dev/search docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/dev/search')
# Index lotte/lifestyle/dev
indexer = pt.IterDictIndexer('./indices/lotte_lifestyle_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels

1.4K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Answer upvoted or accepted on stack exchange	`1.4K`	100.0%

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/dev/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/lifestyle/dev/search qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_lifestyle_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Metadata

{
  "docs": {
    "count": 268893,
    "fields": {
      "doc_id": {
        "max_len": 6,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 417
  },
  "qrels": {
    "count": 1376,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 1376
        }
      }
    }
  }
}

`"lotte/lifestyle/test"`

Queries and answers from lifestyle-focused forums, including bicycles, coffee, crafts, diy, gardening, lifehacks, mechanics, music, outdoors, parenting, pets, sports, and travel.

docs

119K docs

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/lifestyle/test docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/test')
# Index lotte/lifestyle/test
indexer = pt.IterDictIndexer('./indices/lotte_lifestyle_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

Metadata

{
  "docs": {
    "count": 119461,
    "fields": {
      "doc_id": {
        "max_len": 6,
        "common_prefix": ""
      }
    }
  }
}

`"lotte/lifestyle/test/forum"`

Forum queries for lotte/lifestyle/test.

Official evaluation measures: Success@5

queries

2.0K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/test/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/lifestyle/test/forum queries



[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_lifestyle_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

119K docs

Inherits docs from lotte/lifestyle/test

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/test/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/lifestyle/test/forum docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/test/forum')
# Index lotte/lifestyle/test
indexer = pt.IterDictIndexer('./indices/lotte_lifestyle_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels

10K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Answer upvoted or accepted on stack exchange	`10K`	100.0%

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/test/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/lifestyle/test/forum qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_lifestyle_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Metadata

{
  "docs": {
    "count": 119461,
    "fields": {
      "doc_id": {
        "max_len": 6,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 2002
  },
  "qrels": {
    "count": 10278,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 10278
        }
      }
    }
  }
}

`"lotte/lifestyle/test/search"`

Search queries for lotte/lifestyle/test.

Official evaluation measures: Success@5

queries

661 queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/test/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/lifestyle/test/search queries



[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_lifestyle_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

119K docs

Inherits docs from lotte/lifestyle/test

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/test/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/lifestyle/test/search docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/test/search')
# Index lotte/lifestyle/test
indexer = pt.IterDictIndexer('./indices/lotte_lifestyle_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels

1.8K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Answer upvoted or accepted on stack exchange	`1.8K`	100.0%

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/test/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/lifestyle/test/search qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_lifestyle_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Metadata

{
  "docs": {
    "count": 119461,
    "fields": {
      "doc_id": {
        "max_len": 6,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 661
  },
  "qrels": {
    "count": 1804,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 1804
        }
      }
    }
  }
}

`"lotte/pooled/dev"`

Combined version of lotte/lifestyle/dev, lotte/recreation/dev, lotte/science/dev, lotte/technology/dev, and lotte/writing/dev.
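The dataset IDs follow a `lotte/<domain>/<split>/<query type>` pattern. The sketch below enumerates the query subsets under that pattern; it assumes the recreation, science, technology, and writing domains expose the same `forum` and `search` subsets as lifestyle and pooled, which is not confirmed here.

```python
from itertools import product

# Domains named in the description above, plus the pooled combination.
DOMAINS = ["lifestyle", "recreation", "science", "technology", "writing", "pooled"]
SPLITS = ["dev", "test"]
QUERY_TYPES = ["forum", "search"]


def lotte_dataset_ids():
    """Build the IDs of all query subsets under the assumed naming pattern."""
    return [
        f"lotte/{domain}/{split}/{query_type}"
        for domain, split, query_type in product(DOMAINS, SPLITS, QUERY_TYPES)
    ]


ids = lotte_dataset_ids()
```

Each ID can then be passed to `ir_datasets.load(...)` to loop over every subset in a single experiment script.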

docs

2.4M docs

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/pooled/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/pooled/dev docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/dev')
# Index lotte/pooled/dev
indexer = pt.IterDictIndexer('./indices/lotte_pooled_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

Metadata

{
  "docs": {
    "count": 2428854,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  }
}

`"lotte/pooled/dev/forum"`

Forum queries for lotte/pooled/dev.

Official evaluation measures: Success@5

queries

10K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/pooled/dev/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/pooled/dev/forum queries



[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_pooled_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

2.4M docs

Inherits docs from lotte/pooled/dev

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/pooled/dev/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/pooled/dev/forum docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/dev/forum')
# Index lotte/pooled/dev
indexer = pt.IterDictIndexer('./indices/lotte_pooled_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels

69K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Answer upvoted or accepted on stack exchange	`69K`	100.0%

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/pooled/dev/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/pooled/dev/forum qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_pooled_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Metadata

{
  "docs": {
    "count": 2428854,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 10097
  },
  "qrels": {
    "count": 68685,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 68685
        }
      }
    }
  }
}

`"lotte/pooled/dev/search"`

Search queries for lotte/pooled/dev.

Official evaluation measures: Success@5

queries

2.9K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/pooled/dev/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/pooled/dev/search queries



[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_pooled_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

2.4M docs

Inherits docs from lotte/pooled/dev

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/pooled/dev/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/pooled/dev/search docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/dev/search')
# Index lotte/pooled/dev
indexer = pt.IterDictIndexer('./indices/lotte_pooled_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels

8.6K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Answer upvoted or accepted on stack exchange	`8.6K`	100.0%

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/pooled/dev/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/pooled/dev/search qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_pooled_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Metadata

{
  "docs": {
    "count": 2428854,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 2931
  },
  "qrels": {
    "count": 8573,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 8573
        }
      }
    }
  }
}

`"lotte/pooled/test"`

Combined version of lotte/lifestyle/test, lotte/recreation/test, lotte/science/test, lotte/technology/test, and lotte/writing/test.

docs

2.8M docs

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/pooled/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/pooled/test docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/test')
# Index lotte/pooled/test
indexer = pt.IterDictIndexer('./indices/lotte_pooled_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

Metadata

{
  "docs": {
    "count": 2819103,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  }
}

`"lotte/pooled/test/forum"`

Forum queries for lotte/pooled/test.

Official evaluation measures: Success@5

queries

10K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/pooled/test/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/pooled/test/forum queries



[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_pooled_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

2.8M docs

Inherits docs from lotte/pooled/test

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/pooled/test/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/pooled/test/forum docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/test/forum')
# Index lotte/pooled/test
indexer = pt.IterDictIndexer('./indices/lotte_pooled_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels

62K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Answer upvoted or accepted on stack exchange	`62K`	100.0%

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/pooled/test/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/pooled/test/forum qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_pooled_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Metadata

{
  "docs": {
    "count": 2819103,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 10025
  },
  "qrels": {
    "count": 61536,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 61536
        }
      }
    }
  }
}

`"lotte/pooled/test/search"`

Search queries for lotte/pooled/test.

Official evaluation measures: Success@5
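Success@5 scores a query as 1 when at least one relevant answer appears among the top five results and 0 otherwise, then averages over queries. The sketch below is a plain-Python illustration of that definition, not the ir_measures implementation. The `run` and `relevant` dictionaries and their IDs are hypothetical toy data.

```python
def success_at_k(ranking, relevant, k=5):
    """Return 1.0 if any of the top-k ranked doc_ids is relevant, else 0.0."""
    return 1.0 if any(doc_id in relevant for doc_id in ranking[:k]) else 0.0

def mean_success_at_k(run, relevant_by_query, k=5):
    """Average Success@k over all queries that have judgments."""
    scores = [
        success_at_k(run.get(query_id, []), relevant, k)
        for query_id, relevant in relevant_by_query.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0

# Toy ranked lists (best first) and relevant sets; IDs are made up.
run = {
    "q1": ["d9", "d3", "d1"],                    # relevant doc at rank 3
    "q2": ["d5", "d6", "d7", "d8", "d9", "d2"],  # relevant doc only at rank 6
}
relevant = {"q1": {"d1"}, "q2": {"d2"}}
score = mean_success_at_k(run, relevant, k=5)
```

Queries without any retrieved results count as failures here, which is one possible convention.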

queries

3.9K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/pooled/test/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/pooled/test/search queries



[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_pooled_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

2.8M docs

Inherits docs from lotte/pooled/test

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/pooled/test/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/pooled/test/search docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/test/search')
# Index lotte/pooled/test
indexer = pt.IterDictIndexer('./indices/lotte_pooled_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels

11K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Answer upvoted or accepted on Stack Exchange	`11K`	100.0%

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/pooled/test/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/pooled/test/search qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_pooled_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }

Metadata

{
  "docs": {
    "count": 2819103,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 3869
  },
  "qrels": {
    "count": 11124,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 11124
        }
      }
    }
  }
}

`"lotte/recreation/dev"`

Answers from recreation-focused forums, including anime, boardgames, gaming, movies, photo, rpg, and scifi.

docs

263K docs

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/recreation/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/recreation/dev docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/dev')
# Index lotte/recreation/dev
indexer = pt.IterDictIndexer('./indices/lotte_recreation_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }

Metadata

{
  "docs": {
    "count": 263025,
    "fields": {
      "doc_id": {
        "max_len": 6,
        "common_prefix": ""
      }
    }
  }
}

`"lotte/recreation/dev/forum"`

Forum queries for lotte/recreation/dev.

Official evaluation measures: Success@5

queries

2.0K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/recreation/dev/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/recreation/dev/forum queries



[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_recreation_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

263K docs

Inherits docs from lotte/recreation/dev

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/recreation/dev/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/recreation/dev/forum docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/dev/forum')
# Index lotte/recreation/dev
indexer = pt.IterDictIndexer('./indices/lotte_recreation_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels

13K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Answer upvoted or accepted on Stack Exchange	`13K`	100.0%

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/recreation/dev/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/recreation/dev/forum qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.
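If the TSV export is redirected to a file, it can be read back with the standard library. This is a minimal sketch that assumes the column order shown above (query_id, doc_id, relevance, iteration); the sample rows are invented and stand in for a real export.

```python
import csv
import io
from collections import defaultdict

def read_qrels_tsv(handle):
    """Parse tab-separated qrels rows into {query_id: {doc_id: relevance}}."""
    judgments = defaultdict(dict)
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or malformed lines
        query_id, doc_id, relevance = row[0], row[1], int(row[2])
        judgments[query_id][doc_id] = relevance
    return dict(judgments)

# Hypothetical export contents; a real file would be opened with open(path, newline="").
sample = io.StringIO("0\t11\t1\t0\n0\t42\t1\t0\n1\t7\t1\t0\n")
parsed = read_qrels_tsv(sample)
```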

PyTerrier

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_recreation_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }

Metadata

{
  "docs": {
    "count": 263025,
    "fields": {
      "doc_id": {
        "max_len": 6,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 2002
  },
  "qrels": {
    "count": 12752,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 12752
        }
      }
    }
  }
}
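The metadata block can be sanity-checked with a few lines of Python. The sketch below copies the counts shown above for lotte/recreation/dev/forum (trimmed to the fields it uses), verifies that the per-value relevance counts add up to the qrels total, and reports the average number of judged answers per query. The helper name `summarize` is hypothetical.

```python
import json

# Counts copied from the metadata above, trimmed to the fields used below.
metadata = json.loads("""
{
  "docs": {"count": 263025},
  "queries": {"count": 2002},
  "qrels": {
    "count": 12752,
    "fields": {"relevance": {"counts_by_value": {"1": 12752}}}
  }
}
""")

def summarize(meta):
    """Check that relevance counts add up and compute judged answers per query."""
    qrels = meta["qrels"]
    by_value = qrels["fields"]["relevance"]["counts_by_value"]
    return {
        "counts_consistent": sum(by_value.values()) == qrels["count"],
        "qrels_per_query": round(qrels["count"] / meta["queries"]["count"], 2),
    }

summary = summarize(metadata)
```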

`"lotte/recreation/dev/search"`

Search queries for lotte/recreation/dev.

Official evaluation measures: Success@5

queries

563 queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/recreation/dev/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/recreation/dev/search queries



[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_recreation_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

263K docs

Inherits docs from lotte/recreation/dev

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/recreation/dev/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/recreation/dev/search docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/dev/search')
# Index lotte/recreation/dev
indexer = pt.IterDictIndexer('./indices/lotte_recreation_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.
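The forum and search subsets inherit their documents from the parent corpus, which is why the examples on this page point both at the same index directory. A small sketch of that naming convention follows; the helper names `corpus_id` and `index_path` are hypothetical and only mirror the paths used in the examples.

```python
def corpus_id(dataset_id):
    """Drop a trailing /forum or /search to get the ID of the shared corpus."""
    parts = dataset_id.split("/")
    if parts[-1] in ("forum", "search"):
        parts = parts[:-1]
    return "/".join(parts)

def index_path(dataset_id, root="./indices"):
    """Build the index directory used in the examples: slashes become underscores."""
    return f"{root}/{corpus_id(dataset_id).replace('/', '_')}"

search_path = index_path("lotte/recreation/dev/search")
forum_path = index_path("lotte/recreation/dev/forum")
```

Both calls resolve to the same directory, so the corpus only needs to be indexed once per split.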

qrels

1.8K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Answer upvoted or accepted on Stack Exchange	`1.8K`	100.0%

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/recreation/dev/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/recreation/dev/search qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_recreation_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }

Metadata

{
  "docs": {
    "count": 263025,
    "fields": {
      "doc_id": {
        "max_len": 6,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 563
  },
  "qrels": {
    "count": 1754,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 1754
        }
      }
    }
  }
}

`"lotte/recreation/test"`

Answers from recreation-focused forums, including anime, boardgames, gaming, movies, photo, rpg, and scifi.

docs

167K docs

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/recreation/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/recreation/test docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/test')
# Index lotte/recreation/test
indexer = pt.IterDictIndexer('./indices/lotte_recreation_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }

Metadata

{
  "docs": {
    "count": 166975,
    "fields": {
      "doc_id": {
        "max_len": 6,
        "common_prefix": ""
      }
    }
  }
}
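The `doc_id` entry in the metadata above records the longest ID length and any prefix shared by all IDs. A rough sketch of how such statistics could be computed is shown below. It is an illustration rather than the ir_datasets implementation, and the IDs are invented for the example.

```python
import os

def doc_id_stats(doc_ids):
    """Compute the longest doc_id length and the prefix common to all doc_ids."""
    doc_ids = list(doc_ids)
    return {
        "max_len": max((len(doc_id) for doc_id in doc_ids), default=0),
        # commonprefix compares strings character by character, not path components.
        "common_prefix": os.path.commonprefix(doc_ids),
    }

# Hypothetical IDs; with real data these would come from dataset.docs_iter().
stats = doc_id_stats(["0", "17", "166974"])
prefixed = doc_id_stats(["doc-1", "doc-22"])
```

An empty common prefix, as reported in the metadata, means the IDs share no leading characters.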

`"lotte/recreation/test/forum"`

Forum queries for lotte/recreation/test.

Official evaluation measures: Success@5

queries

2.0K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/recreation/test/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/recreation/test/forum queries



[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_recreation_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

167K docs

Inherits docs from lotte/recreation/test

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/recreation/test/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/recreation/test/forum docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/test/forum')
# Index lotte/recreation/test
indexer = pt.IterDictIndexer('./indices/lotte_recreation_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels

6.9K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Answer upvoted or accepted on Stack Exchange	`6.9K`	100.0%

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/recreation/test/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/recreation/test/forum qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_recreation_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }

Metadata

{
  "docs": {
    "count": 166975,
    "fields": {
      "doc_id": {
        "max_len": 6,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 2002
  },
  "qrels": {
    "count": 6947,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 6947
        }
      }
    }
  }
}

`"lotte/recreation/test/search"`

Search queries for lotte/recreation/test.

Official evaluation measures: Success@5

queries

924 queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/recreation/test/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/recreation/test/search queries



[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_recreation_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

167K docs

Inherits docs from lotte/recreation/test

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/recreation/test/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/recreation/test/search docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/test/search')
# Index lotte/recreation/test
indexer = pt.IterDictIndexer('./indices/lotte_recreation_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels

2.0K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Answer upvoted or accepted on Stack Exchange	`2.0K`	100.0%

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/recreation/test/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/recreation/test/search qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_recreation_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }

Metadata

{
  "docs": {
    "count": 166975,
    "fields": {
      "doc_id": {
        "max_len": 6,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 924
  },
  "qrels": {
    "count": 1991,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 1991
        }
      }
    }
  }
}

`"lotte/science/dev"`

Answers from science-focused forums, including academia, astronomy, biology, chemistry, datascience, earthscience, engineering, math, philosophy, physics, and stats.

docs

344K docs

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/science/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.
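When only a handful of answers are needed, for example those referenced by a few qrels, a single pass over the iterator is enough. The sketch below assumes a `GenericDoc` stand-in and toy documents so it runs offline; with real data the first argument would be `dataset.docs_iter()`. For repeated lookups, the `docs_store()` interface of ir_datasets is likely a better fit.

```python
from collections import namedtuple

# Stand-in with the same fields as the GenericDoc type documented above.
GenericDoc = namedtuple("GenericDoc", ["doc_id", "text"])

def collect_docs(docs, wanted_ids):
    """Return {doc_id: text} for the requested doc_ids, scanning docs once."""
    wanted = set(wanted_ids)
    found = {}
    for doc in docs:
        if doc.doc_id in wanted:
            found[doc.doc_id] = doc.text
            if len(found) == len(wanted):
                break  # every requested document has been seen
    return found

# Invented documents, not real LoTTE answers.
toy_docs = [
    GenericDoc("0", "Use a torque wrench."),
    GenericDoc("1", "Rinse the filter first."),
    GenericDoc("2", "Check the tire pressure."),
]
texts = collect_docs(toy_docs, ["2", "0"])
```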

CLI

ir_datasets export lotte/science/dev docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/science/dev')
# Index lotte/science/dev
indexer = pt.IterDictIndexer('./indices/lotte_science_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }

Metadata

{
  "docs": {
    "count": 343642,
    "fields": {
      "doc_id": {
        "max_len": 6,
        "common_prefix": ""
      }
    }
  }
}

`"lotte/science/dev/forum"`

Forum queries for lotte/science/dev.

Official evaluation measures: Success@5

queries

2.0K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/science/dev/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/science/dev/forum queries



[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/science/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_science_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

344K docs

Inherits docs from lotte/science/dev

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/science/dev/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/science/dev/forum docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/science/dev/forum')
# Index lotte/science/dev
indexer = pt.IterDictIndexer('./indices/lotte_science_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels

12K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Answer upvoted or accepted on Stack Exchange	`12K`	100.0%

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/science/dev/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/science/dev/forum qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/science/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_science_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }

Metadata

{
  "docs": {
    "count": 343642,
    "fields": {
      "doc_id": {
        "max_len": 6,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 2013
  },
  "qrels": {
    "count": 12271,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 12271
        }
      }
    }
  }
}

`"lotte/science/dev/search"`

Search queries for lotte/science/dev.

Official evaluation measures: Success@5

queries

538 queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/science/dev/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/science/dev/search queries



[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/science/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_science_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

344K docs

Inherits docs from lotte/science/dev

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/science/dev/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/science/dev/search docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/science/dev/search')
# Index lotte/science/dev
indexer = pt.IterDictIndexer('./indices/lotte_science_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels

1.5K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Answer upvoted or accepted on Stack Exchange	`1.5K`	100.0%

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/science/dev/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/science/dev/search qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/science/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_science_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }

Metadata

{
  "docs": {
    "count": 343642,
    "fields": {
      "doc_id": {
        "max_len": 6,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 538
  },
  "qrels": {
    "count": 1480,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 1480
        }
      }
    }
  }
}

`"lotte/science/test"`

Answers from science-focused forums, including academia, astronomy, biology, chemistry, datascience, earthscience, engineering, math, philosophy, physics, and stats.

docs

1.7M docs

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/science/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/science/test docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/science/test')
# Index lotte/science/test
indexer = pt.IterDictIndexer('./indices/lotte_science_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }

Metadata

{
  "docs": {
    "count": 1694164,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  }
}

`"lotte/science/test/forum"`

Forum queries for lotte/science/test.

Official evaluation measures: Success@5

queries

2.0K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/science/test/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/science/test/forum queries



[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/science/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_science_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

1.7M docs

Inherits docs from lotte/science/test
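Because the forum and search subsets inherit their documents from the parent corpus, a single index can serve both, which is why the examples on this page point both subsets at `./indices/lotte_science_test`. The sketch below derives the corpus-level ID and an index directory from any dataset ID. `corpus_id` and `index_dir` are hypothetical helper names, and the directory pattern is only inferred from the example paths.

```python
QUERY_SUBSETS = ("forum", "search")


def corpus_id(dataset_id: str) -> str:
    """Strip a trailing /forum or /search to get the corpus-level dataset ID."""
    head, _, tail = dataset_id.rpartition("/")
    return head if tail in QUERY_SUBSETS and head else dataset_id


def index_dir(dataset_id: str, root: str = "./indices") -> str:
    """Directory following the ./indices/<ID with underscores> pattern of the examples."""
    return f"{root}/{corpus_id(dataset_id).replace('/', '_')}"


forum_index = index_dir("lotte/science/test/forum")
search_index = index_dir("lotte/science/test/search")
```

Both calls resolve to the same directory, so the corpus only needs to be indexed once.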

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/science/test/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/science/test/forum docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/science/test/forum')
# Index lotte/science/test
indexer = pt.IterDictIndexer('./indices/lotte_science_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels

16K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Answer upvoted or accepted on Stack Exchange	`16K`	100.0%
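Since every judgment has relevance 1, the qrels reduce to a set of relevant answers per query. The sketch below groups records by `query_id`. The local `TrecQrel` namedtuple is a stand-in with the same four fields listed above, and the toy judgments are hypothetical.

```python
from collections import defaultdict, namedtuple

# Stand-in with the same fields as the TrecQrel type listed above.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def relevant_docs(qrels, min_relevance=1):
    """Group doc_ids by query_id, keeping judgments at or above min_relevance."""
    by_query = defaultdict(set)
    for qrel in qrels:
        if qrel.relevance >= min_relevance:
            by_query[qrel.query_id].add(qrel.doc_id)
    return dict(by_query)


# Hypothetical toy judgments; in practice the records would come from qrels_iter().
toy = [
    TrecQrel("0", "12", 1, "0"),
    TrecQrel("0", "40", 1, "0"),
    TrecQrel("1", "7", 1, "0"),
]
lookup = relevant_docs(toy)
```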

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/science/test/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/science/test/forum qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.
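An exported qrels file can be read back with the standard library. This is a minimal sketch that assumes one judgment per line with tab-separated columns in the order shown above. `read_qrels_tsv` is a hypothetical helper and the sample rows are made up.

```python
import csv
import io


def read_qrels_tsv(handle):
    """Yield (query_id, doc_id, relevance, iteration) from tab-separated lines."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        query_id, doc_id, relevance, iteration = row
        yield query_id, doc_id, int(relevance), iteration


# Hypothetical sample in the column order shown above.
sample = io.StringIO("0\t12\t1\t0\n0\t40\t1\t0\n\n1\t7\t1\t0\n")
rows = list(read_qrels_tsv(sample))
```

IDs are kept as strings to match the `str` field types, and only the relevance column is converted to `int`.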

PyTerrier

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/science/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_science_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }

Metadata

{
  "docs": {
    "count": 1694164,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 2017
  },
  "qrels": {
    "count": 15515,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 15515
        }
      }
    }
  }
}
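The counts in this metadata can be combined into a quick summary: 15,515 judgments over 2,017 queries is roughly 7.7 relevant answers per forum query. The sketch below parses a trimmed copy of the JSON above and checks that every judgment carries relevance 1. The variable names are illustrative only.

```python
import json

# Trimmed copy of the metadata block above.
metadata_json = """
{
  "docs": {"count": 1694164},
  "queries": {"count": 2017},
  "qrels": {
    "count": 15515,
    "fields": {"relevance": {"counts_by_value": {"1": 15515}}}
  }
}
"""

metadata = json.loads(metadata_json)
n_queries = metadata["queries"]["count"]
n_qrels = metadata["qrels"]["count"]

# Average number of relevant answers judged per query.
qrels_per_query = n_qrels / n_queries

# True when relevance 1 is the only level and accounts for every judgment.
by_value = metadata["qrels"]["fields"]["relevance"]["counts_by_value"]
all_binary = set(by_value) == {"1"} and sum(by_value.values()) == n_qrels
```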

`"lotte/science/test/search"`

Search queries for lotte/science/test.

Official evaluation measures: Success@5

queries

617 queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/science/test/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/science/test/search queries



[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/science/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_science_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

1.7M docs

Inherits docs from lotte/science/test

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/science/test/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/science/test/search docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/science/test/search')
# Index lotte/science/test
indexer = pt.IterDictIndexer('./indices/lotte_science_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels

1.7K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Answer upvoted or accepted on Stack Exchange	`1.7K`	100.0%

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/science/test/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/science/test/search qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/science/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_science_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }

Metadata

{
  "docs": {
    "count": 1694164,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 617
  },
  "qrels": {
    "count": 1738,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 1738
        }
      }
    }
  }
}

`"lotte/technology/dev"`

Answers from technology-focused forums, including android, apple, askubuntu, electronics, networkengineering, security, serverfault, softwareengineering, superuser, unix, and webapps.

docs

1.3M docs

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/technology/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.
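With about 1.3M documents, it can be convenient to consume the iterator in fixed-size batches instead of materializing the whole corpus. The sketch below is a generic batching helper built on `itertools.islice`. `batched` and `fake_docs` are hypothetical names, and `fake_docs` merely stands in for a real document iterator.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items, consuming the iterable lazily."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def fake_docs(n):
    """Stand-in generator producing doc-like records with doc_id and text."""
    for i in range(n):
        yield {"doc_id": str(i), "text": f"answer {i}"}


sizes = [len(batch) for batch in batched(fake_docs(7), 3)]
```

Only one batch is held in memory at a time, so the batch size can be tuned to whatever the downstream encoder or indexer handles comfortably.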

CLI

ir_datasets export lotte/technology/dev docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/dev')
# Index lotte/technology/dev
indexer = pt.IterDictIndexer('./indices/lotte_technology_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }

Metadata

{
  "docs": {
    "count": 1276222,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  }
}

`"lotte/technology/dev/forum"`

Forum queries for lotte/technology/dev.

Official evaluation measures: Success@5

queries

2.0K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str
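Retrieval toolkits often expect their own column names for queries. The sketch below converts query records into dictionaries keyed `qid` and `query`, the names PyTerrier conventionally uses for topics; treat that mapping as an assumption to verify against your installed version. The local `GenericQuery` namedtuple is a stand-in with the two fields listed above, and the query texts are invented.

```python
from collections import namedtuple

# Stand-in with the same fields as the GenericQuery type listed above.
GenericQuery = namedtuple("GenericQuery", ["query_id", "text"])


def to_topic_records(queries):
    """Map (query_id, text) records to dicts keyed by qid and query."""
    return [{"qid": q.query_id, "query": q.text} for q in queries]


# Hypothetical toy queries, not taken from LoTTE.
toy_queries = [
    GenericQuery("0", "how do I mount a usb drive"),
    GenericQuery("1", "why is my ssh connection refused"),
]
topics = to_topic_records(toy_queries)
```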

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/technology/dev/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/technology/dev/forum queries



[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_technology_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

1.3M docs

Inherits docs from lotte/technology/dev

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/technology/dev/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/technology/dev/forum docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/dev/forum')
# Index lotte/technology/dev
indexer = pt.IterDictIndexer('./indices/lotte_technology_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels

16K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Answer upvoted or accepted on Stack Exchange	`16K`	100.0%

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/technology/dev/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/technology/dev/forum qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_technology_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }

Metadata

{
  "docs": {
    "count": 1276222,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 2003
  },
  "qrels": {
    "count": 15741,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 15741
        }
      }
    }
  }
}

`"lotte/technology/dev/search"`

Search queries for lotte/technology/dev.

Official evaluation measures: Success@5

queries

916 queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/technology/dev/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/technology/dev/search queries



[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_technology_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

1.3M docs

Inherits docs from lotte/technology/dev

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/technology/dev/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/technology/dev/search docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/dev/search')
# Index lotte/technology/dev
indexer = pt.IterDictIndexer('./indices/lotte_technology_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels

2.7K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Answer upvoted or accepted on Stack Exchange	`2.7K`	100.0%

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/technology/dev/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/technology/dev/search qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_technology_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }

Metadata

{
  "docs": {
    "count": 1276222,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 916
  },
  "qrels": {
    "count": 2676,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 2676
        }
      }
    }
  }
}

`"lotte/technology/test"`

Answers from technology-focused forums, including android, apple, askubuntu, electronics, networkengineering, security, serverfault, softwareengineering, superuser, unix, and webapps.

docs

639K docs

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/technology/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/technology/test docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/test')
# Index lotte/technology/test
indexer = pt.IterDictIndexer('./indices/lotte_technology_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }

Metadata

{
  "docs": {
    "count": 638509,
    "fields": {
      "doc_id": {
        "max_len": 6,
        "common_prefix": ""
      }
    }
  }
}

`"lotte/technology/test/forum"`

Forum queries for lotte/technology/test.

Official evaluation measures: Success@5

queries

2.0K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/technology/test/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/technology/test/forum queries



[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_technology_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

639K docs

Inherits docs from lotte/technology/test

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/technology/test/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/technology/test/forum docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/test/forum')
# Index lotte/technology/test
indexer = pt.IterDictIndexer('./indices/lotte_technology_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels

16K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Answer upvoted or accepted on Stack Exchange	`16K`	100.0%

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/technology/test/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/technology/test/forum qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_technology_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }

Metadata

{
  "docs": {
    "count": 638509,
    "fields": {
      "doc_id": {
        "max_len": 6,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 2004
  },
  "qrels": {
    "count": 15890,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 15890
        }
      }
    }
  }
}

`"lotte/technology/test/search"`

Search queries for lotte/technology/test.

Official evaluation measures: Success@5

queries

596 queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/technology/test/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/technology/test/search queries



[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_technology_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

639K docs

Inherits docs from lotte/technology/test

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/technology/test/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/technology/test/search docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/test/search')
# Index lotte/technology/test
indexer = pt.IterDictIndexer('./indices/lotte_technology_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels

2.0K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Answer upvoted or accepted on Stack Exchange	`2.0K`	100.0%

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/technology/test/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/technology/test/search qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_technology_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }

Metadata

{
  "docs": {
    "count": 638509,
    "fields": {
      "doc_id": {
        "max_len": 6,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 596
  },
  "qrels": {
    "count": 2045,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 2045
        }
      }
    }
  }
}

`"lotte/writing/dev"`

Answers from writing-focused forums, including ell, english, linguistics, literature, worldbuilding, and writing.

docs

277K docs

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/writing/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/writing/dev docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/dev')
# Index lotte/writing/dev
indexer = pt.IterDictIndexer('./indices/lotte_writing_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }

Metadata

{
  "docs": {
    "count": 277072,
    "fields": {
      "doc_id": {
        "max_len": 6,
        "common_prefix": ""
      }
    }
  }
}

`"lotte/writing/dev/forum"`

Forum queries for lotte/writing/dev.

Official evaluation measures: Success@5

queries

2.0K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/writing/dev/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/writing/dev/forum queries



[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_writing_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

277K docs

Inherits docs from lotte/writing/dev

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/writing/dev/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/writing/dev/forum docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/dev/forum')
# Index lotte/writing/dev
indexer = pt.IterDictIndexer('./indices/lotte_writing_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels

15K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Answer upvoted or accepted on Stack Exchange	`15K`	100.0%

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/writing/dev/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/writing/dev/forum qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_writing_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }

Metadata

{
  "docs": {
    "count": 277072,
    "fields": {
      "doc_id": {
        "max_len": 6,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 2003
  },
  "qrels": {
    "count": 15098,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 15098
        }
      }
    }
  }
}

`"lotte/writing/dev/search"`

Search queries for lotte/writing/dev.

Official evaluation measures: Success@5

queries

497 queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/writing/dev/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/writing/dev/search queries



[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_writing_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

277K docs

Inherits docs from lotte/writing/dev

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/writing/dev/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/writing/dev/search docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/dev/search')
# Index lotte/writing/dev
indexer = pt.IterDictIndexer('./indices/lotte_writing_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels

1.3K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Answer upvoted or accepted on StackExchange	`1.3K`	100.0%

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/writing/dev/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/writing/dev/search qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.
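
The exported qrels can be loaded back into a per-query lookup for evaluation. This is a minimal sketch, assuming the export is tab-separated with the four columns in the order shown above; the function name `read_qrels_tsv` and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_qrels_tsv(handle):
    """Map each query_id to the set of doc_ids judged relevant (> 0)."""
    relevant = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        query_id, doc_id, relevance = row[0], row[1], int(row[2])
        if relevance > 0:
            relevant[query_id].add(doc_id)
    return dict(relevant)


# Hypothetical rows standing in for the CLI output.
sample = io.StringIO("0\t1234\t1\t0\n0\t5678\t1\t0\n1\t42\t1\t0\n")
qrels = read_qrels_tsv(sample)
print(sorted(qrels["0"]))  # ['1234', '5678']
```

In practice the handle would be an open file holding the redirected CLI output rather than an in-memory string.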

PyTerrier

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_writing_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }

Metadata

{
  "docs": {
    "count": 277072,
    "fields": {
      "doc_id": {
        "max_len": 6,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 497
  },
  "qrels": {
    "count": 1287,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 1287
        }
      }
    }
  }
}
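
The metadata block can be used for quick sanity checks. The sketch below parses an abbreviated copy of the JSON above (the `doc_id` field details are omitted) and derives the average number of relevant answers per query; the variable names are illustrative only.

```python
import json

# Abbreviated copy of the metadata shown above.
metadata = json.loads("""
{
  "docs": {"count": 277072},
  "queries": {"count": 497},
  "qrels": {
    "count": 1287,
    "fields": {"relevance": {"counts_by_value": {"1": 1287}}}
  }
}
""")

n_queries = metadata["queries"]["count"]
n_qrels = metadata["qrels"]["count"]
by_value = metadata["qrels"]["fields"]["relevance"]["counts_by_value"]

# The per-level counts should add up to the total number of qrels.
counts_consistent = sum(by_value.values()) == n_qrels

# Average number of judged-relevant answers per query.
qrels_per_query = n_qrels / n_queries

print(counts_consistent)          # True
print(round(qrels_per_query, 2))  # 2.59
```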

`"lotte/writing/test"`

Answers from writing-focused forums, including ell, english, linguistics, literature, worldbuilding, and writing.

docs

200K docs

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/writing/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/writing/test docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/test')
# Index lotte/writing/test
indexer = pt.IterDictIndexer('./indices/lotte_writing_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }

Metadata

{
  "docs": {
    "count": 199994,
    "fields": {
      "doc_id": {
        "max_len": 6,
        "common_prefix": ""
      }
    }
  }
}

`"lotte/writing/test/forum"`

Forum queries for lotte/writing/test.

Official evaluation measures: Success@5

queries

2.0K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/writing/test/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/writing/test/forum queries



[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_writing_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

200K docs

Inherits docs from lotte/writing/test

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/writing/test/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/writing/test/forum docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/test/forum')
# Index lotte/writing/test
indexer = pt.IterDictIndexer('./indices/lotte_writing_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels

13K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Answer upvoted or accepted on StackExchange	`13K`	100.0%

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/writing/test/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/writing/test/forum qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_writing_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }

Metadata

{
  "docs": {
    "count": 199994,
    "fields": {
      "doc_id": {
        "max_len": 6,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 2000
  },
  "qrels": {
    "count": 12906,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 12906
        }
      }
    }
  }
}

`"lotte/writing/test/search"`

Search queries for lotte/writing/test.

Official evaluation measures: Success@5

queries

1.1K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/writing/test/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/writing/test/search queries



[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_writing_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs

200K docs

Inherits docs from lotte/writing/test

Language: en

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/writing/test/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/writing/test/search docs



[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/test/search')
# Index lotte/writing/test
indexer = pt.IterDictIndexer('./indices/lotte_writing_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels

3.5K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Answer upvoted or accepted on StackExchange	`3.5K`	100.0%

Examples:

Python API

import ir_datasets
dataset = ir_datasets.load("lotte/writing/test/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export lotte/writing/test/search qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_writing_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }

Metadata

{
  "docs": {
    "count": 199994,
    "fields": {
      "doc_id": {
        "max_len": 6,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 1071
  },
  "qrels": {
    "count": 3546,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 3546
        }
      }
    }
  }
}
