Github: datasets/lotte.py

ir_datasets: LoTTE

Index
  1. lotte
  2. lotte/lifestyle/dev
  3. lotte/lifestyle/dev/forum
  4. lotte/lifestyle/dev/search
  5. lotte/lifestyle/test
  6. lotte/lifestyle/test/forum
  7. lotte/lifestyle/test/search
  8. lotte/pooled/dev
  9. lotte/pooled/dev/forum
  10. lotte/pooled/dev/search
  11. lotte/pooled/test
  12. lotte/pooled/test/forum
  13. lotte/pooled/test/search
  14. lotte/recreation/dev
  15. lotte/recreation/dev/forum
  16. lotte/recreation/dev/search
  17. lotte/recreation/test
  18. lotte/recreation/test/forum
  19. lotte/recreation/test/search
  20. lotte/science/dev
  21. lotte/science/dev/forum
  22. lotte/science/dev/search
  23. lotte/science/test
  24. lotte/science/test/forum
  25. lotte/science/test/search
  26. lotte/technology/dev
  27. lotte/technology/dev/forum
  28. lotte/technology/dev/search
  29. lotte/technology/test
  30. lotte/technology/test/forum
  31. lotte/technology/test/search
  32. lotte/writing/dev
  33. lotte/writing/dev/forum
  34. lotte/writing/dev/search
  35. lotte/writing/test
  36. lotte/writing/test/forum
  37. lotte/writing/test/search

"lotte"

LoTTE (Long-Tail Topic-stratified Evaluation) is a set of test collections focused on out-of-domain evaluation. It consists of data from several StackExchange forums, with relevance assumed either by upvotes (at least 1) or by being selected as the accepted answer by the question's author.

Note that the dev and test corpora are disjoint to avoid leakage.

  • Documents: Answers to StackExchange questions
  • Queries: Natural language questions
  • Dataset Paper
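
Each subset follows the ID pattern lotte/{domain}/{split}/{query-type}: the domain is one of lifestyle, recreation, science, technology, writing, or pooled; the split is dev or test; and the query type is search or forum. As a quick orientation, the minimal sketch below composes such an ID and peeks at the queries and qrels of one subset, using only the ir_datasets calls shown in the per-subset examples further down this page (the chosen subset and loop variable names are illustrative).

import ir_datasets

# Compose a dataset ID from its parts: lotte/{domain}/{split}/{query-type}
domain, split, query_type = "lifestyle", "dev", "search"
dataset = ir_datasets.load(f"lotte/{domain}/{split}/{query_type}")

# Queries and qrels are attached to the /search and /forum subsets;
# the document collection is inherited from the parent (e.g. lotte/lifestyle/dev).
for query in dataset.queries_iter():
    print(query.query_id, query.text)
    break

for qrel in dataset.qrels_iter():
    print(qrel.query_id, qrel.doc_id, qrel.relevance)
    break
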
Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2,
  title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction",
  author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia",
  journal = "arXiv preprint arXiv:2112.01488",
  year = "2021",
  url = "https://arxiv.org/abs/2112.01488"
}

"lotte/lifestyle/dev"

Answers from lifestyle-focused forums, including bicycles, coffee, crafts, diy, gardening, lifehacks, mechanics, music, outdoors, parenting, pets, sports, and travel.

docs
269K docs

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.
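
Beyond sequential iteration, ir_datasets also provides a docs_store for random access to individual answers by doc_id. A minimal sketch, assuming the (illustrative) doc_id "0" exists in the corpus:

import ir_datasets

dataset = ir_datasets.load("lotte/lifestyle/dev")
docs_store = dataset.docs_store()
# Fetch a single answer by its doc_id; "0" is an illustrative placeholder.
doc = docs_store.get("0")
print(doc.doc_id, doc.text[:100])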

CLI
ir_datasets export lotte/lifestyle/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/dev')
# Index lotte/lifestyle/dev
indexer = pt.IterDictIndexer('./indices/lotte_lifestyle_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2,
  title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction",
  author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia",
  journal = "arXiv preprint arXiv:2112.01488",
  year = "2021",
  url = "https://arxiv.org/abs/2112.01488"
}

"lotte/lifestyle/dev/forum"

Forum queries for lotte/lifestyle/dev.

Official evaluation measures: Success@5

queries
2.1K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/dev/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/lifestyle/dev/forum queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_lifestyle_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs
269K docs

Inherits docs from lotte/lifestyle/dev

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/dev/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/lifestyle/dev/forum docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/dev/forum')
# Index lotte/lifestyle/dev
indexer = pt.IterDictIndexer('./indices/lotte_lifestyle_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels
13K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                     Count  %
1     Answer upvoted or accepted on stack exchange   13K    100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/dev/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/lifestyle/dev/forum qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_lifestyle_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.
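
Outside of PyTerrier, the official Success@5 measure can also be computed with the ir_measures package. The sketch below uses a placeholder run built from the qrels themselves just to make the call runnable; in practice you would pass your own system's run as a {query_id: {doc_id: score}} mapping.

import ir_datasets
import ir_measures
from ir_measures import Success

dataset = ir_datasets.load("lotte/lifestyle/dev/forum")

# Placeholder run: give every judged document a score of 1.0 for its query.
run = {}
for qrel in dataset.qrels_iter():
    run.setdefault(qrel.query_id, {})[qrel.doc_id] = 1.0

# Aggregate Success@5 over all queries against this subset's qrels.
print(ir_measures.calc_aggregate([Success@5], dataset.qrels_iter(), run))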

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2,
  title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction",
  author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia",
  journal = "arXiv preprint arXiv:2112.01488",
  year = "2021",
  url = "https://arxiv.org/abs/2112.01488"
}

"lotte/lifestyle/dev/search"

Search queries for lotte/lifestyle/dev.

Official evaluation measures: Success@5

queries
417 queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/dev/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/lifestyle/dev/search queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_lifestyle_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs
269K docs

Inherits docs from lotte/lifestyle/dev

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/dev/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/lifestyle/dev/search docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/dev/search')
# Index lotte/lifestyle/dev
indexer = pt.IterDictIndexer('./indices/lotte_lifestyle_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels
1.4K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                     Count  %
1     Answer upvoted or accepted on stack exchange   1.4K   100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/dev/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/lifestyle/dev/search qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_lifestyle_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2,
  title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction",
  author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia",
  journal = "arXiv preprint arXiv:2112.01488",
  year = "2021",
  url = "https://arxiv.org/abs/2112.01488"
}

"lotte/lifestyle/test"

Queries and answers from lifestyle-focused forums, including bicycles, coffee, crafts, diy, gardening, lifehacks, mechanics, music, outdoors, parenting, pets, sports, and travel.

docs
119K docs

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/lifestyle/test docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/test')
# Index lotte/lifestyle/test
indexer = pt.IterDictIndexer('./indices/lotte_lifestyle_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2,
  title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction",
  author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia",
  journal = "arXiv preprint arXiv:2112.01488",
  year = "2021",
  url = "https://arxiv.org/abs/2112.01488"
}

"lotte/lifestyle/test/forum"

Forum queries for lotte/lifestyle/test.

Official evaluation measures: Success@5

queries
2.0K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/test/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/lifestyle/test/forum queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_lifestyle_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs
119K docs

Inherits docs from lotte/lifestyle/test

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/test/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/lifestyle/test/forum docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/test/forum')
# Index lotte/lifestyle/test
indexer = pt.IterDictIndexer('./indices/lotte_lifestyle_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels
10K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                     Count  %
1     Answer upvoted or accepted on stack exchange   10K    100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/test/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/lifestyle/test/forum qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_lifestyle_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2,
  title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction",
  author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia",
  journal = "arXiv preprint arXiv:2112.01488",
  year = "2021",
  url = "https://arxiv.org/abs/2112.01488"
}

"lotte/lifestyle/test/search"

Search queries for lotte/lifestyle/test.

Official evaluation measures: Success@5

queries
661 queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/test/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/lifestyle/test/search queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_lifestyle_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs
119K docs

Inherits docs from lotte/lifestyle/test

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/test/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/lifestyle/test/search docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/test/search')
# Index lotte/lifestyle/test
indexer = pt.IterDictIndexer('./indices/lotte_lifestyle_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels
1.8K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                     Count  %
1     Answer upvoted or accepted on stack exchange   1.8K   100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/test/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/lifestyle/test/search qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_lifestyle_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2,
  title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction",
  author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia",
  journal = "arXiv preprint arXiv:2112.01488",
  year = "2021",
  url = "https://arxiv.org/abs/2112.01488"
}

"lotte/pooled/dev"

docs
2.4M docs

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/pooled/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/pooled/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/dev')
# Index lotte/pooled/dev
indexer = pt.IterDictIndexer('./indices/lotte_pooled_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2,
  title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction",
  author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia",
  journal = "arXiv preprint arXiv:2112.01488",
  year = "2021",
  url = "https://arxiv.org/abs/2112.01488"
}

"lotte/pooled/dev/forum"

Forum queries for lotte/pooled/dev.

Official evaluation measures: Success@5

queries
10K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/pooled/dev/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/pooled/dev/forum queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_pooled_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs
2.4M docs

Inherits docs from lotte/pooled/dev

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/pooled/dev/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/pooled/dev/forum docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/dev/forum')
# Index lotte/pooled/dev
indexer = pt.IterDictIndexer('./indices/lotte_pooled_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels
69K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                     Count  %
1     Answer upvoted or accepted on stack exchange   69K    100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/pooled/dev/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/pooled/dev/forum qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_pooled_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2,
  title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction",
  author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia",
  journal = "arXiv preprint arXiv:2112.01488",
  year = "2021",
  url = "https://arxiv.org/abs/2112.01488"
}

"lotte/pooled/dev/search"

Search queries for lotte/pooled/dev.

Official evaluation measures: Success@5

queries
2.9K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/pooled/dev/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/pooled/dev/search queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_pooled_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs
2.4M docs

Inherits docs from lotte/pooled/dev

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/pooled/dev/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/pooled/dev/search docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/dev/search')
# Index lotte/pooled/dev
indexer = pt.IterDictIndexer('./indices/lotte_pooled_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels
8.6K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                     Count  %
1     Answer upvoted or accepted on stack exchange   8.6K   100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/pooled/dev/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/pooled/dev/search qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_pooled_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2,
  title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction",
  author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia",
  journal = "arXiv preprint arXiv:2112.01488",
  year = "2021",
  url = "https://arxiv.org/abs/2112.01488"
}

"lotte/pooled/test"

docs
2.8M docs

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/pooled/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/pooled/test docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/test')
# Index lotte/pooled/test
indexer = pt.IterDictIndexer('./indices/lotte_pooled_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2,
  title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction",
  author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia",
  journal = "arXiv preprint arXiv:2112.01488",
  year = "2021",
  url = "https://arxiv.org/abs/2112.01488"
}

"lotte/pooled/test/forum"

Forum queries for lotte/pooled/test.

Official evaluation measures: Success@5

queries
10K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/pooled/test/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/pooled/test/forum queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_pooled_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs
2.8M docs

Inherits docs from lotte/pooled/test

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/pooled/test/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/pooled/test/forum docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/test/forum')
# Index lotte/pooled/test
indexer = pt.IterDictIndexer('./indices/lotte_pooled_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels
62K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                     Count  %
1     Answer upvoted or accepted on stack exchange   62K    100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/pooled/test/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/pooled/test/forum qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_pooled_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2,
  title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction",
  author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia",
  journal = "arXiv preprint arXiv:2112.01488",
  year = "2021",
  url = "https://arxiv.org/abs/2112.01488"
}

"lotte/pooled/test/search"

Search queries for lotte/pooled/test.

Official evaluation measures: Success@5

queries
3.9K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/pooled/test/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/pooled/test/search queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_pooled_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs
2.8M docs

Inherits docs from lotte/pooled/test

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/pooled/test/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/pooled/test/search docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/test/search')
# Index lotte/pooled/test
indexer = pt.IterDictIndexer('./indices/lotte_pooled_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels
11K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                     Count  %
1     Answer upvoted or accepted on stack exchange   11K    100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/pooled/test/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/pooled/test/search qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_pooled_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2,
  title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction",
  author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia",
  journal = "arXiv preprint arXiv:2112.01488",
  year = "2021",
  url = "https://arxiv.org/abs/2112.01488"
}

"lotte/recreation/dev"

Answers from recreation-focused forums, including anime, boardgames, gaming, movies, photo, rpg, and scifi.

docs
263K docs

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/recreation/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/recreation/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/dev')
# Index lotte/recreation/dev
indexer = pt.IterDictIndexer('./indices/lotte_recreation_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2,
  title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction",
  author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia",
  journal = "arXiv preprint arXiv:2112.01488",
  year = "2021",
  url = "https://arxiv.org/abs/2112.01488"
}

"lotte/recreation/dev/forum"

Forum queries for lotte/recreation/dev.

Official evaluation measures: Success@5

queries
2.0K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/recreation/dev/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/recreation/dev/forum queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_recreation_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs
263K docs

Inherits docs from lotte/recreation/dev

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/recreation/dev/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/recreation/dev/forum docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/dev/forum')
# Index lotte/recreation/dev
indexer = pt.IterDictIndexer('./indices/lotte_recreation_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels
13K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                     Count  %
1     Answer upvoted or accepted on stack exchange   13K    100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/recreation/dev/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/recreation/dev/forum qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_recreation_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2,
  title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction",
  author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia",
  journal = "arXiv preprint arXiv:2112.01488",
  year = "2021",
  url = "https://arxiv.org/abs/2112.01488"
}

"lotte/recreation/dev/search"

Search queries for lotte/recreation/dev.

Official evaluation measures: Success@5

queries
563 queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/recreation/dev/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/recreation/dev/search queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_recreation_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs
263K docs

Inherits docs from lotte/recreation/dev

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/recreation/dev/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/recreation/dev/search docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/dev/search')
# Index lotte/recreation/dev
indexer = pt.IterDictIndexer('./indices/lotte_recreation_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels
1.8K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                     Count  %
1     Answer upvoted or accepted on stack exchange   1.8K   100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/recreation/dev/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/recreation/dev/search qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_recreation_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2,
  title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction",
  author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia",
  journal = "arXiv preprint arXiv:2112.01488",
  year = "2021",
  url = "https://arxiv.org/abs/2112.01488"
}

"lotte/recreation/test"

Answers from recreation-focused forums, including anime, boardgames, gaming, movies, photo, rpg, and scifi.

docs
167K docs

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/recreation/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/recreation/test docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/test')
# Index lotte/recreation/test
indexer = pt.IterDictIndexer('./indices/lotte_recreation_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2,
  title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction",
  author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia",
  journal = "arXiv preprint arXiv:2112.01488",
  year = "2021",
  url = "https://arxiv.org/abs/2112.01488"
}

"lotte/recreation/test/forum"

Forum queries for lotte/recreation/test.

Official evaluation measures: Success@5

queries
2.0K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/recreation/test/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/recreation/test/forum queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_recreation_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs
167K docs

Inherits docs from lotte/recreation/test

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/recreation/test/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/recreation/test/forum docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/test/forum')
# Index lotte/recreation/test
indexer = pt.IterDictIndexer('./indices/lotte_recreation_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels
6.9K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                     Count  %
1     Answer upvoted or accepted on stack exchange   6.9K   100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/recreation/test/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/recreation/test/forum qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_recreation_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2,
  title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction",
  author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia",
  journal = "arXiv preprint arXiv:2112.01488",
  year = "2021",
  url = "https://arxiv.org/abs/2112.01488"
}

"lotte/recreation/test/search"

Search queries for lotte/recreation/test.

Official evaluation measures: Success@5

queries
924 queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/recreation/test/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/recreation/test/search queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_recreation_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs
167K docs

Inherits docs from lotte/recreation/test

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/recreation/test/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/recreation/test/search docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/test/search')
# Index lotte/recreation/test
indexer = pt.IterDictIndexer('./indices/lotte_recreation_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels
2.0K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                     Count  %
1     Answer upvoted or accepted on stack exchange   2.0K   100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/recreation/test/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/recreation/test/search qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_recreation_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2,
  title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction",
  author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia",
  journal = "arXiv preprint arXiv:2112.01488",
  year = "2021",
  url = "https://arxiv.org/abs/2112.01488"
}

"lotte/science/dev"

Answers from science-focused forums, including academia, astronomy, biology, chemistry, datascience, earthscience, engineering, math, philosophy, physics, and stats.

docs
344K docs

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/science/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/science/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/science/dev')
# Index lotte/science/dev
indexer = pt.IterDictIndexer('./indices/lotte_science_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2,
  title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction",
  author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia",
  journal = "arXiv preprint arXiv:2112.01488",
  year = "2021",
  url = "https://arxiv.org/abs/2112.01488"
}

"lotte/science/dev/forum"

Forum queries for lotte/science/dev.

Official evaluation measures: Success@5

queries
2.0K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/science/dev/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/science/dev/forum queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/science/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_science_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs
344K docs

Inherits docs from lotte/science/dev

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/science/dev/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/science/dev/forum docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/science/dev/forum')
# Index lotte/science/dev
indexer = pt.IterDictIndexer('./indices/lotte_science_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels
12K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                     Count  %
1     Answer upvoted or accepted on stack exchange   12K    100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/science/dev/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/science/dev/forum qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/science/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_science_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2,
  title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction",
  author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia",
  journal = "arXiv preprint arXiv:2112.01488",
  year = "2021",
  url = "https://arxiv.org/abs/2112.01488"
}

"lotte/science/dev/search"

Search queries for lotte/science/dev.

Official evaluation measures: Success@5

queries
538 queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/science/dev/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/science/dev/search queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/science/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_science_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs
344K docs

Inherits docs from lotte/science/dev

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/science/dev/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/science/dev/search docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/science/dev/search')
# Index lotte/science/dev
indexer = pt.IterDictIndexer('./indices/lotte_science_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels
1.5K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                     Count  %
1     Answer upvoted or accepted on stack exchange   1.5K   100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/science/dev/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/science/dev/search qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/science/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_science_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata

"lotte/science/test"

Answers from science-focused forums, including academia, astronomy, biology, chemistry, datascience, earthscience, engineering, math, philosophy, physics, and stats.

docs
1.7M docs

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/science/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.
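
Beyond sequential iteration, ir_datasets also offers a docs_store() for random access by doc_id, which is convenient when resolving qrels or run files back to answer text. A short sketch (it fetches one real doc_id from the stream rather than guessing an identifier):

import ir_datasets

dataset = ir_datasets.load("lotte/science/test")
docstore = dataset.docs_store()

# Grab one doc_id from the stream, then fetch the same answer via the store.
first = next(dataset.docs_iter())
doc = docstore.get(first.doc_id)
print(doc.doc_id, doc.text[:100])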

CLI
ir_datasets export lotte/science/test docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/science/test')
# Index lotte/science/test
indexer = pt.IterDictIndexer('./indices/lotte_science_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata

"lotte/science/test/forum"

Forum queries for lotte/science/test.

Official evaluation measures: Success@5

queries
2.0K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/science/test/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/science/test/forum queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/science/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_science_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs
1.7M docs

Inherits docs from lotte/science/test

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/science/test/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.
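
Because the docs are inherited from lotte/science/test, loading either identifier exposes the same 1.7M-answer corpus. A quick sanity check using docs_count():

import ir_datasets

forum = ir_datasets.load("lotte/science/test/forum")
parent = ir_datasets.load("lotte/science/test")

# Both should report the same corpus size, since the docs are shared.
print(forum.docs_count(), parent.docs_count())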

CLI
ir_datasets export lotte/science/test/forum docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/science/test/forum')
# Index lotte/science/test
indexer = pt.IterDictIndexer('./indices/lotte_science_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels
16K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                     Count  %
1     Answer upvoted or accepted on stack exchange   16K    100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/science/test/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/science/test/forum qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/science/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_science_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata

"lotte/science/test/search"

Search queries for lotte/science/test.

Official evaluation measures: Success@5

queries
617 queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/science/test/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/science/test/search queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/science/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_science_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs
1.7M docs

Inherits docs from lotte/science/test

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/science/test/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/science/test/search docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/science/test/search')
# Index lotte/science/test
indexer = pt.IterDictIndexer('./indices/lotte_science_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels
1.7K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                     Count  %
1     Answer upvoted or accepted on stack exchange   1.7K   100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/science/test/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/science/test/search qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/science/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_science_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.
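
If you already have a run in TREC format (for example, one written by another retrieval system), it can also be scored against these qrels with the ir_measures package; a sketch, where my_run.trec is a placeholder path for your run file:

import ir_datasets
import ir_measures
from ir_measures import Success

dataset = ir_datasets.load("lotte/science/test/search")

# Placeholder path to a TREC-format run file produced elsewhere.
run = ir_measures.read_trec_run("my_run.trec")

# Aggregate Success@5 over the queries in the run.
print(ir_measures.calc_aggregate([Success@5], dataset.qrels_iter(), run))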

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata

"lotte/technology/dev"

Answers from technology-focused forums, including android, apple, askubuntu, electronics, networkengineering, security, serverfault, softwareengineering, superuser, unix, and webapps.

docs
1.3M docs

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/technology/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.
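
Since the corpus streams as plain (doc_id, text) pairs, simple statistics can be collected in a single pass without building an index. A small sketch that samples the first 5,000 answers (an arbitrary cutoff) to estimate their average length:

import itertools
import ir_datasets

dataset = ir_datasets.load("lotte/technology/dev")

# Sample the first 5,000 answers and measure their length in characters.
sample = list(itertools.islice(dataset.docs_iter(), 5000))
avg_chars = sum(len(doc.text) for doc in sample) / len(sample)
print(f"{len(sample)} docs sampled, avg {avg_chars:.0f} characters")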

CLI
ir_datasets export lotte/technology/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/dev')
# Index lotte/technology/dev
indexer = pt.IterDictIndexer('./indices/lotte_technology_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata

"lotte/technology/dev/forum"

Forum queries for lotte/technology/dev.

Official evaluation measures: Success@5

queries
2.0K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/technology/dev/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/technology/dev/forum queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_technology_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs
1.3M docs

Inherits docs from lotte/technology/dev

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/technology/dev/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/technology/dev/forum docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/dev/forum')
# Index lotte/technology/dev
indexer = pt.IterDictIndexer('./indices/lotte_technology_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels
16K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                     Count  %
1     Answer upvoted or accepted on stack exchange   16K    100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/technology/dev/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.
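
Queries, qrels, and docs can be joined in memory to inspect what a judged pair actually looks like. A sketch that prints the first forum question alongside one of its upvoted or accepted answers:

import ir_datasets

dataset = ir_datasets.load("lotte/technology/dev/forum")
docstore = dataset.docs_store()

# Map query_id -> question text.
queries = {q.query_id: q.text for q in dataset.queries_iter()}

# Take the first qrel and resolve both sides of the judgment.
qrel = next(dataset.qrels_iter())
print("Question:", queries[qrel.query_id])
print("Relevant answer:", docstore.get(qrel.doc_id).text[:200])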

CLI
ir_datasets export lotte/technology/dev/forum qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_technology_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata

"lotte/technology/dev/search"

Search queries for lotte/technology/dev.

Official evaluation measures: Success@5

queries
916 queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/technology/dev/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/technology/dev/search queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_technology_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.
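
The retrieval output above is a pandas DataFrame, so it can be saved as a standard TREC run file for later scoring or sharing. A sketch using pt.io.write_results (the output path is a placeholder):

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_technology_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
res = pipeline(dataset.get_topics())
# Write the results as a TREC-format run file (placeholder path).
pt.io.write_results(res, './lotte_technology_dev_search_bm25.res', format='trec')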

docs
1.3M docs

Inherits docs from lotte/technology/dev

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/technology/dev/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/technology/dev/search docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/dev/search')
# Index lotte/technology/dev
indexer = pt.IterDictIndexer('./indices/lotte_technology_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels
2.7K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                     Count  %
1     Answer upvoted or accepted on stack exchange   2.7K   100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/technology/dev/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/technology/dev/search qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_technology_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata

"lotte/technology/test"

Answers from technology-focused forums, including android, apple, askubuntu, electronics, networkengineering, security, serverfault, softwareengineering, superuser, unix, and webapps.

docs
639K docs

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/technology/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/technology/test docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/test')
# Index lotte/technology/test
indexer = pt.IterDictIndexer('./indices/lotte_technology_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata

"lotte/technology/test/forum"

Forum queries for lotte/technology/test.

Official evaluation measures: Success@5

queries
2.0K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/technology/test/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/technology/test/forum queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_technology_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.
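
Because the official measure only considers the top 5 results, retrieval depth can be reduced to save time. A variant of the pipeline above that caps the number of returned documents via BatchRetrieve's num_results argument (keep the default depth if you also plan to compute deeper measures):

import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_technology_test') # assumes you have already built an index
# Only the top 5 results per query are needed for Success@5.
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25', num_results=5)
pipeline(dataset.get_topics())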

docs
639K docs

Inherits docs from lotte/technology/test

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/technology/test/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/technology/test/forum docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/test/forum')
# Index lotte/technology/test
indexer = pt.IterDictIndexer('./indices/lotte_technology_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels
16K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                     Count  %
1     Answer upvoted or accepted on stack exchange   16K    100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/technology/test/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/technology/test/forum qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_technology_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata

"lotte/technology/test/search"

Search queries for lotte/technology/test.

Official evaluation measures: Success@5

queries
596 queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/technology/test/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/technology/test/search queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_technology_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs
639K docs

Inherits docs from lotte/technology/test

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/technology/test/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/technology/test/search docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/test/search')
# Index lotte/technology/test
indexer = pt.IterDictIndexer('./indices/lotte_technology_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels
2.0K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                     Count  %
1     Answer upvoted or accepted on stack exchange   2.0K   100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/technology/test/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/technology/test/search qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_technology_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata

"lotte/writing/dev"

Answers from writing-focused forums, including ell, english, linguistics, literature, worldbuilding, and writing.

docs
277K docs

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/writing/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/writing/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/dev')
# Index lotte/writing/dev
indexer = pt.IterDictIndexer('./indices/lotte_writing_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata

"lotte/writing/dev/forum"

Forum queries for lotte/writing/dev.

Official evaluation measures: Success@5

queries
2.0K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/writing/dev/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/writing/dev/forum queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_writing_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs
277K docs

Inherits docs from lotte/writing/dev

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/writing/dev/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/writing/dev/forum docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/dev/forum')
# Index lotte/writing/dev
indexer = pt.IterDictIndexer('./indices/lotte_writing_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels
15K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                     Count  %
1     Answer upvoted or accepted on stack exchange   15K    100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/writing/dev/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/writing/dev/forum qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_writing_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.
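
pt.Experiment can also compare several pipelines side by side. A sketch, assuming the same prebuilt index, that contrasts two Terrier weighting models (BM25 and TF_IDF) on the official measure; this is an illustrative comparison, not part of the official LoTTE setup:

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_writing_dev') # assumes you have already built an index
bm25 = pt.BatchRetrieve(index_ref, wmodel='BM25')
tfidf = pt.BatchRetrieve(index_ref, wmodel='TF_IDF')
pt.Experiment(
    [bm25, tfidf],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5],
    names=['BM25', 'TF_IDF']
)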

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata

"lotte/writing/dev/search"

Search queries for lotte/writing/dev.

Official evaluation measures: Success@5

queries
497 queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/writing/dev/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/writing/dev/search queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_writing_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs
277K docs

Inherits docs from lotte/writing/dev

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/writing/dev/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/writing/dev/search docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/dev/search')
# Index lotte/writing/dev
indexer = pt.IterDictIndexer('./indices/lotte_writing_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels
1.3K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                     Count  %
1     Answer upvoted or accepted on stack exchange   1.3K   100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/writing/dev/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/writing/dev/search qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_writing_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.
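
To see which individual questions are answered within the top 5, the same experiment can be run with per-query output via pt.Experiment's perquery flag:

import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_writing_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# Returns one row per (query, measure) pair instead of an aggregate.
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5],
    perquery=True
)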

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata

"lotte/writing/test"

Answers from writing-focused forums, including ell, english, linguistics, literature, worldbuilding, and writing.

docs
200K docs

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/writing/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/writing/test docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/test')
# Index lotte/writing/test
indexer = pt.IterDictIndexer('./indices/lotte_writing_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata

"lotte/writing/test/forum"

Forum queries for lotte/writing/test.

Official evaluation measures: Success@5

queries
2.0K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/writing/test/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/writing/test/forum queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_writing_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs
200K docs

Inherits docs from lotte/writing/test

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/writing/test/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/writing/test/forum docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/test/forum')
# Index lotte/writing/test
indexer = pt.IterDictIndexer('./indices/lotte_writing_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels
13K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                     Count  %
1     Answer upvoted or accepted on stack exchange   13K    100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/writing/test/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/writing/test/forum qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_writing_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata

"lotte/writing/test/search"

Search queries for lotte/writing/test.

Official evaluation measures: Success@5

queries
1.1K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/writing/test/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/writing/test/search queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_writing_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())

You can find more details about PyTerrier retrieval here.

docs
200K docs

Inherits docs from lotte/writing/test

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/writing/test/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/writing/test/search docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/test/search')
# Index lotte/writing/test
indexer = pt.IterDictIndexer('./indices/lotte_writing_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

You can find more details about PyTerrier indexing here.

qrels
3.5K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                                     Count  %
1     Answer upvoted or accepted on stack exchange   3.5K   100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("lotte/writing/test/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export lotte/writing/test/search qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_writing_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)

You can find more details about PyTerrier experiments here.

Citation

ir_datasets.bib:

\cite{Santhanam2021ColBERTv2}

Bibtex:

@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal= "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata