ir_datasets: LoTTE
LoTTE (Long-Tail Topic-stratified Evaluation) is a set of test collections focused on out-of-domain evaluation. It consists of data from several Stack Exchange sites, with relevance assumed from either upvotes (at least 1) or selection as the accepted answer by the question's author.
Note that the dev and test corpora are disjoint to avoid leakage.
Bibtex:
@article{Santhanam2021ColBERTv2,
  title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction",
  author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia",
  journal = "arXiv preprint arXiv:2112.01488",
  year = "2021",
  url = "https://arxiv.org/abs/2112.01488"
}
Answers from lifestyle-focused forums, including bicycles, coffee, crafts, diy, gardening, lifehacks, mechanics, music, outdoors, parenting, pets, sports, and travel.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/lifestyle/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/dev')
# Index lotte/lifestyle/dev
indexer = pt.IterDictIndexer('./indices/lotte_lifestyle_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Metadata:
{ "docs": { "count": 268893, "fields": { "doc_id": { "max_len": 6, "common_prefix": "" } } } }
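The `doc_id` entry in the metadata record reports a `max_len` and a `common_prefix`. A minimal sketch of how such values could be derived from a list of identifiers follows. It assumes that `max_len` is the longest identifier length and `common_prefix` is the leading string shared by all identifiers; the function name `doc_id_stats` and the sample identifiers are hypothetical and are not taken from ir_datasets.

```python
import os
from typing import Dict, Iterable, Union


def doc_id_stats(doc_ids: Iterable[str]) -> Dict[str, Union[int, str]]:
    """Summarise identifiers: longest length and the prefix shared by all of them."""
    ids = list(doc_ids)
    return {
        "max_len": max(len(doc_id) for doc_id in ids),
        # os.path.commonprefix compares character by character, so it works on plain strings.
        "common_prefix": os.path.commonprefix(ids),
    }


# Hypothetical identifiers, chosen only to illustrate the computation.
sample_stats = doc_id_stats(["0", "17", "268892"])
print(sample_stats)
```

With real data, the same function could be fed `doc.doc_id` for each record from `dataset.docs_iter()`.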
Forum queries for lotte/lifestyle/dev.
Official evaluation measures: Success@5
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/dev/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/lifestyle/dev/forum queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_lifestyle_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from lotte/lifestyle/dev
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/dev/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/lifestyle/dev/forum docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/dev/forum')
# Index lotte/lifestyle/dev
indexer = pt.IterDictIndexer('./indices/lotte_lifestyle_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Answer upvoted or accepted on stack exchange | 13K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/dev/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
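Most evaluation code needs the judgments grouped by query rather than as a flat stream. The sketch below shows one way to do that grouping. The `Qrel` namedtuple is a stand-in that mirrors the fields listed above, the function name `relevant_docs_by_query` is hypothetical, and the sample records are invented for illustration.

```python
from collections import defaultdict, namedtuple
from typing import Dict, Iterable, Set

# Stand-in for the records yielded by dataset.qrels_iter(); same field names as above.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def relevant_docs_by_query(qrels: Iterable[Qrel], min_rel: int = 1) -> Dict[str, Set[str]]:
    """Map each query_id to the set of doc_ids judged at or above min_rel."""
    grouped = defaultdict(set)
    for qrel in qrels:
        if qrel.relevance >= min_rel:
            grouped[qrel.query_id].add(qrel.doc_id)
    return dict(grouped)


# Invented sample judgments, not real LoTTE data.
sample_qrels = [
    Qrel("0", "12", 1, "0"),
    Qrel("0", "57", 1, "0"),
    Qrel("1", "3", 1, "0"),
]
grouped_qrels = relevant_docs_by_query(sample_qrels)
print(grouped_qrels)
```

Passing `dataset.qrels_iter()` in place of `sample_qrels` should give the same structure for the real judgments, since only the four listed fields are used.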
ir_datasets export lotte/lifestyle/dev/forum qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_lifestyle_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)
You can find more details about PyTerrier experiments here.
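To make the Success@5 measure concrete, here is a plain-Python sketch of the usual definition: a query scores 1 if at least one relevant document appears in its top 5 results and 0 otherwise, and the scores are averaged over queries. This is an illustration only and not the implementation PyTerrier uses; the function names and toy data are hypothetical.

```python
from typing import Dict, List, Set


def success_at_k(ranking: List[str], relevant: Set[str], k: int = 5) -> float:
    """Return 1.0 if any of the top-k ranked doc_ids is relevant, else 0.0."""
    return 1.0 if any(doc_id in relevant for doc_id in ranking[:k]) else 0.0


def mean_success_at_k(run: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int = 5) -> float:
    """Average success_at_k over every query that has relevance judgments."""
    scores = [success_at_k(run.get(qid, []), rel, k) for qid, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Toy data: q1 has a relevant document at rank 3, q2 only at rank 6.
toy_qrels = {"q1": {"d3"}, "q2": {"d9"}}
toy_run = {
    "q1": ["d1", "d2", "d3", "d4", "d5", "d6"],
    "q2": ["d1", "d2", "d4", "d5", "d6", "d9"],
}
toy_score = mean_success_at_k(toy_run, toy_qrels, k=5)
print(toy_score)
```

Queries with judgments but no retrieved results count as failures in this sketch, which is one possible convention among several.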
Metadata:
{ "docs": { "count": 268893, "fields": { "doc_id": { "max_len": 6, "common_prefix": "" } } }, "queries": { "count": 2076 }, "qrels": { "count": 12823, "fields": { "relevance": { "counts_by_value": { "1": 12823 } } } } }
Search queries for lotte/lifestyle/dev.
Official evaluation measures: Success@5
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/dev/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/lifestyle/dev/search queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_lifestyle_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from lotte/lifestyle/dev
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/dev/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/lifestyle/dev/search docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/dev/search')
# Index lotte/lifestyle/dev
indexer = pt.IterDictIndexer('./indices/lotte_lifestyle_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Answer upvoted or accepted on stack exchange | 1.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/dev/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export lotte/lifestyle/dev/search qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
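A file produced by the export command above can be read back with the standard `csv` module. The sketch below assumes the exported file has no header row and uses the four tab-separated columns in the order shown above. The `QrelRow` type, the `read_qrels_tsv` function, and the sample text are hypothetical.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class QrelRow(NamedTuple):
    query_id: str
    doc_id: str
    relevance: int
    iteration: str


def read_qrels_tsv(handle: TextIO) -> Iterator[QrelRow]:
    """Yield one QrelRow per non-empty tab-separated line."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        query_id, doc_id, relevance, iteration = row
        yield QrelRow(query_id, doc_id, int(relevance), iteration)


# Invented sample text standing in for an exported file.
sample_export = "0\t1234\t1\t0\n0\t5678\t1\t0\n1\t42\t1\t0\n"
parsed_rows = list(read_qrels_tsv(io.StringIO(sample_export)))
print(parsed_rows[0])
```

For a real export, `open("qrels.tsv", newline="")` (a hypothetical file name) can replace the `io.StringIO` object.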
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_lifestyle_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)
You can find more details about PyTerrier experiments here.
Metadata:
{ "docs": { "count": 268893, "fields": { "doc_id": { "max_len": 6, "common_prefix": "" } } }, "queries": { "count": 417 }, "qrels": { "count": 1376, "fields": { "relevance": { "counts_by_value": { "1": 1376 } } } } }
Answers from lifestyle-focused forums, including bicycles, coffee, crafts, diy, gardening, lifehacks, mechanics, music, outdoors, parenting, pets, sports, and travel.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/lifestyle/test docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/test')
# Index lotte/lifestyle/test
indexer = pt.IterDictIndexer('./indices/lotte_lifestyle_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Metadata:
{ "docs": { "count": 119461, "fields": { "doc_id": { "max_len": 6, "common_prefix": "" } } } }
Forum queries for lotte/lifestyle/test.
Official evaluation measures: Success@5
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/test/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/lifestyle/test/forum queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_lifestyle_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from lotte/lifestyle/test
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/test/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/lifestyle/test/forum docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/test/forum')
# Index lotte/lifestyle/test
indexer = pt.IterDictIndexer('./indices/lotte_lifestyle_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Answer upvoted or accepted on stack exchange | 10K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/test/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export lotte/lifestyle/test/forum qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_lifestyle_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)
You can find more details about PyTerrier experiments here.
Metadata:
{ "docs": { "count": 119461, "fields": { "doc_id": { "max_len": 6, "common_prefix": "" } } }, "queries": { "count": 2002 }, "qrels": { "count": 10278, "fields": { "relevance": { "counts_by_value": { "1": 10278 } } } } }
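The counts in the metadata record for lotte/lifestyle/test/forum can be combined into simple summary figures. The sketch below parses that record with the standard `json` module and derives the average number of judgments per query; the variable names are hypothetical and the derived figure is not reported by the source.

```python
import json

# The metadata record for lotte/lifestyle/test/forum, copied from this page.
metadata_json = """
{ "docs": { "count": 119461, "fields": { "doc_id": { "max_len": 6, "common_prefix": "" } } },
  "queries": { "count": 2002 },
  "qrels": { "count": 10278, "fields": { "relevance": { "counts_by_value": { "1": 10278 } } } } }
"""
metadata = json.loads(metadata_json)

# Average number of relevance judgments per query.
qrels_per_query = metadata["qrels"]["count"] / metadata["queries"]["count"]

# The per-level counts should add up to the total number of judgments.
counts_by_value = metadata["qrels"]["fields"]["relevance"]["counts_by_value"]
levels_total = sum(counts_by_value.values())

print(round(qrels_per_query, 2), levels_total)
```

Because every judgment has relevance 1, the per-level total equals the overall qrels count, which matches the 100.0% row in the relevance table.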
Search queries for lotte/lifestyle/test.
Official evaluation measures: Success@5
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/test/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/lifestyle/test/search queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_lifestyle_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from lotte/lifestyle/test
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/test/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/lifestyle/test/search docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/test/search')
# Index lotte/lifestyle/test
indexer = pt.IterDictIndexer('./indices/lotte_lifestyle_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Answer upvoted or accepted on stack exchange | 1.8K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/lifestyle/test/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export lotte/lifestyle/test/search qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/lifestyle/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_lifestyle_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)
You can find more details about PyTerrier experiments here.
Metadata:
{ "docs": { "count": 119461, "fields": { "doc_id": { "max_len": 6, "common_prefix": "" } } }, "queries": { "count": 661 }, "qrels": { "count": 1804, "fields": { "relevance": { "counts_by_value": { "1": 1804 } } } } }
Combined version of lotte/lifestyle/dev, lotte/recreation/dev, lotte/science/dev, lotte/technology/dev, and lotte/writing/dev.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/pooled/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/pooled/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/dev')
# Index lotte/pooled/dev
indexer = pt.IterDictIndexer('./indices/lotte_pooled_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Metadata:
{ "docs": { "count": 2428854, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
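The dataset identifiers on this page appear to follow a `lotte/<domain>/<split>[/<query type>]` pattern. The sketch below enumerates identifiers under that assumption. The domain names come from the pooled description above; the forum and search identifiers for the science, technology, and writing domains are inferred from the pattern and are not shown in this section. The function name `lotte_ids` is hypothetical.

```python
from itertools import product
from typing import List, Tuple

DOMAINS = ["lifestyle", "recreation", "science", "technology", "writing", "pooled"]
SPLITS = ["dev", "test"]
QUERY_TYPES = ["forum", "search"]


def lotte_ids() -> Tuple[List[str], List[str]]:
    """Return (corpus identifiers, query-set identifiers) built from the naming pattern."""
    corpus_ids = [f"lotte/{domain}/{split}" for domain, split in product(DOMAINS, SPLITS)]
    query_ids = [f"{corpus}/{qtype}" for corpus, qtype in product(corpus_ids, QUERY_TYPES)]
    return corpus_ids, query_ids


corpus_ids, query_ids = lotte_ids()
for dataset_id in query_ids[:4]:
    print(dataset_id)
```

Each generated string could then be passed to `ir_datasets.load(...)`, provided the identifier is actually registered.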
Forum queries for lotte/pooled/dev.
Official evaluation measures: Success@5
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/pooled/dev/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/pooled/dev/forum queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_pooled_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from lotte/pooled/dev
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/pooled/dev/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/pooled/dev/forum docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/dev/forum')
# Index lotte/pooled/dev
indexer = pt.IterDictIndexer('./indices/lotte_pooled_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Answer upvoted or accepted on stack exchange | 69K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/pooled/dev/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export lotte/pooled/dev/forum qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_pooled_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)
You can find more details about PyTerrier experiments here.
Metadata:
{ "docs": { "count": 2428854, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 10097 }, "qrels": { "count": 68685, "fields": { "relevance": { "counts_by_value": { "1": 68685 } } } } }
Search queries for lotte/pooled/dev.
Official evaluation measures: Success@5
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/pooled/dev/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/pooled/dev/search queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_pooled_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from lotte/pooled/dev
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/pooled/dev/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/pooled/dev/search docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/dev/search')
# Index lotte/pooled/dev
indexer = pt.IterDictIndexer('./indices/lotte_pooled_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Answer upvoted or accepted on stack exchange | 8.6K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/pooled/dev/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export lotte/pooled/dev/search qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_pooled_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)
You can find more details about PyTerrier experiments here.
Metadata:
{ "docs": { "count": 2428854, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 2931 }, "qrels": { "count": 8573, "fields": { "relevance": { "counts_by_value": { "1": 8573 } } } } }
Combined version of lotte/lifestyle/test, lotte/recreation/test, lotte/science/test, lotte/technology/test, and lotte/writing/test.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/pooled/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/pooled/test docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/test')
# Index lotte/pooled/test
indexer = pt.IterDictIndexer('./indices/lotte_pooled_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Metadata:
{ "docs": { "count": 2819103, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Forum queries for lotte/pooled/test.
Official evaluation measures: Success@5
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/pooled/test/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/pooled/test/forum queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_pooled_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from lotte/pooled/test
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/pooled/test/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/pooled/test/forum docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/test/forum')
# Index lotte/pooled/test
indexer = pt.IterDictIndexer('./indices/lotte_pooled_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Answer upvoted or accepted on stack exchange | 62K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/pooled/test/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export lotte/pooled/test/forum qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_pooled_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)
You can find more details about PyTerrier experiments here.
Metadata:
{ "docs": { "count": 2819103, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 10025 }, "qrels": { "count": 61536, "fields": { "relevance": { "counts_by_value": { "1": 61536 } } } } }
Search queries for lotte/pooled/test.
Official evaluation measures: Success@5
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/pooled/test/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/pooled/test/search queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_pooled_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from lotte/pooled/test
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/pooled/test/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/pooled/test/search docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/test/search')
# Index lotte/pooled/test
indexer = pt.IterDictIndexer('./indices/lotte_pooled_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | Answer upvoted or accepted on stack exchange | 11K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/pooled/test/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export lotte/pooled/test/search qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/pooled/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_pooled_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)
You can find more details about PyTerrier experiments here.
Metadata:
{ "docs": { "count": 2819103, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 3869 }, "qrels": { "count": 11124, "fields": { "relevance": { "counts_by_value": { "1": 11124 } } } } }
Answers from recreation-focused forums, including anime, boardgames, gaming, movies, photo, rpg, and scifi.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/recreation/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/recreation/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/dev')
# Index lotte/recreation/dev
indexer = pt.IterDictIndexer('./indices/lotte_recreation_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Metadata:
{ "docs": { "count": 263025, "fields": { "doc_id": { "max_len": 6, "common_prefix": "" } } } }
Forum queries for lotte/recreation/dev.
Official evaluation measures: Success@5
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/recreation/dev/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/recreation/dev/forum queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_recreation_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from lotte/recreation/dev
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/recreation/dev/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/recreation/dev/forum docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/dev/forum')
# Index lotte/recreation/dev
indexer = pt.IterDictIndexer('./indices/lotte_recreation_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Answer upvoted or accepted on Stack Exchange | 13K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/recreation/dev/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export lotte/recreation/dev/forum qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
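The exported qrels can be loaded back into a per-query lookup. The following is a minimal sketch, assuming the export is tab-separated with the four columns shown above; the sample rows and the helper name `read_qrels` are hypothetical and not taken from LoTTE.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the [query_id] [doc_id] [relevance] [iteration] layout.
sample_tsv = "0\t101\t1\t0\n0\t205\t1\t0\n1\t77\t1\t0\n"

def read_qrels(handle):
    """Map each query_id to the set of doc_ids judged relevant (relevance > 0)."""
    qrels = defaultdict(set)
    for query_id, doc_id, relevance, _iteration in csv.reader(handle, delimiter="\t"):
        if int(relevance) > 0:
            qrels[query_id].add(doc_id)
    return dict(qrels)

# In practice the handle would be an open file containing the exported qrels.
qrels = read_qrels(io.StringIO(sample_tsv))
print(qrels)
```

IDs are kept as strings, matching how they appear in the export.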
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_recreation_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)
You can find more details about PyTerrier experiments here.
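For reference, Success@5 can be read as the fraction of queries that have at least one relevant document among the top 5 results. The sketch below illustrates that definition in plain Python; it is not the implementation used by PyTerrier, and the function name `success_at_k` and the toy IDs are hypothetical. It averages over the queries present in the qrels, which is an assumption about how unjudged queries are handled.

```python
from typing import Dict, List, Set

def success_at_k(run: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int = 5) -> float:
    """Fraction of judged queries with at least one relevant doc in the top k."""
    if not qrels:
        return 0.0
    hits = 0
    for query_id, relevant in qrels.items():
        top_k = run.get(query_id, [])[:k]  # ranked doc_ids, best first
        if any(doc_id in relevant for doc_id in top_k):
            hits += 1
    return hits / len(qrels)

# Toy data (hypothetical IDs, not taken from LoTTE).
qrels = {"q1": {"d3"}, "q2": {"d9"}}
run = {
    "q1": ["d1", "d2", "d3", "d4", "d5", "d6"],  # relevant doc at rank 3
    "q2": ["d1", "d2", "d4", "d5", "d6", "d9"],  # relevant doc at rank 6
}
score = success_at_k(run, qrels, k=5)
print(score)
```

Because every qrel in this subset has relevance 1, no relevance threshold is needed beyond membership in the qrels.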
Bibtex:
@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal = "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata:
{ "docs": { "count": 263025, "fields": { "doc_id": { "max_len": 6, "common_prefix": "" } } }, "queries": { "count": 2002 }, "qrels": { "count": 12752, "fields": { "relevance": { "counts_by_value": { "1": 12752 } } } } }
Search queries for lotte/recreation/dev.
Official evaluation measures: Success@5
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/recreation/dev/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/recreation/dev/search queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_recreation_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from lotte/recreation/dev
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/recreation/dev/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/recreation/dev/search docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/dev/search')
# Index lotte/recreation/dev
indexer = pt.IterDictIndexer('./indices/lotte_recreation_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Answer upvoted or accepted on Stack Exchange | 1.8K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/recreation/dev/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export lotte/recreation/dev/search qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_recreation_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)
You can find more details about PyTerrier experiments here.
Bibtex:
@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal = "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata:
{ "docs": { "count": 263025, "fields": { "doc_id": { "max_len": 6, "common_prefix": "" } } }, "queries": { "count": 563 }, "qrels": { "count": 1754, "fields": { "relevance": { "counts_by_value": { "1": 1754 } } } } }
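The counts listed for the two lotte/recreation/dev subsets (2002 queries and 12752 qrels for forum; 563 queries and 1754 qrels for search) give a rough sense of how many relevant answers each query has. A minimal sketch of that arithmetic follows, with the metadata trimmed to the two fields used; the helper name `qrels_per_query` is hypothetical.

```python
import json

# Counts copied from the metadata of the forum and search subsets.
forum_meta = json.loads('{"queries": {"count": 2002}, "qrels": {"count": 12752}}')
search_meta = json.loads('{"queries": {"count": 563}, "qrels": {"count": 1754}}')

def qrels_per_query(meta):
    """Average number of relevance judgments per query."""
    return meta["qrels"]["count"] / meta["queries"]["count"]

forum_avg = qrels_per_query(forum_meta)
search_avg = qrels_per_query(search_meta)
print(f"forum: {forum_avg:.2f}, search: {search_avg:.2f}")
```

On these counts, forum queries have roughly twice as many relevant answers per query as search queries.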
Answers from recreation-focused forums, including anime, boardgames, gaming, movies, photo, rpg, and scifi.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/recreation/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/recreation/test docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/test')
# Index lotte/recreation/test
indexer = pt.IterDictIndexer('./indices/lotte_recreation_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
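The indexer above consumes an iterator of dictionaries rather than the namedtuples returned by the Python API. The sketch below shows one way to map between the two shapes, assuming the indexer expects a `docno` key alongside the indexed `text` field; the `Doc` type and sample documents are hypothetical stand-ins, not LoTTE data.

```python
from collections import namedtuple
from typing import Dict, Iterable, Iterator

# Hypothetical stand-in for the namedtuple<doc_id, text> yielded by docs_iter().
Doc = namedtuple("Doc", ["doc_id", "text"])

def to_index_records(docs: Iterable[Doc]) -> Iterator[Dict[str, str]]:
    """Convert (doc_id, text) records into dicts keyed by docno and text."""
    for doc in docs:
        yield {"docno": doc.doc_id, "text": doc.text}

sample_docs = [Doc("0", "first answer"), Doc("1", "second answer")]
records = list(to_index_records(sample_docs))
print(records)
```

When loading through `pt.get_dataset`, `get_corpus_iter()` already yields records in this shape, so a conversion like this is only needed when iterating with ir_datasets directly.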
Bibtex:
@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal = "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata:
{ "docs": { "count": 166975, "fields": { "doc_id": { "max_len": 6, "common_prefix": "" } } } }
Forum queries for lotte/recreation/test.
Official evaluation measures: Success@5
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/recreation/test/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/recreation/test/forum queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_recreation_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from lotte/recreation/test
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/recreation/test/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/recreation/test/forum docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/test/forum')
# Index lotte/recreation/test
indexer = pt.IterDictIndexer('./indices/lotte_recreation_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Answer upvoted or accepted on Stack Exchange | 6.9K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/recreation/test/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export lotte/recreation/test/forum qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_recreation_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)
You can find more details about PyTerrier experiments here.
Bibtex:
@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal = "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata:
{ "docs": { "count": 166975, "fields": { "doc_id": { "max_len": 6, "common_prefix": "" } } }, "queries": { "count": 2002 }, "qrels": { "count": 6947, "fields": { "relevance": { "counts_by_value": { "1": 6947 } } } } }
Search queries for lotte/recreation/test.
Official evaluation measures: Success@5
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/recreation/test/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/recreation/test/search queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_recreation_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from lotte/recreation/test
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/recreation/test/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/recreation/test/search docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/test/search')
# Index lotte/recreation/test
indexer = pt.IterDictIndexer('./indices/lotte_recreation_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Answer upvoted or accepted on Stack Exchange | 2.0K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/recreation/test/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export lotte/recreation/test/search qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/recreation/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_recreation_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)
You can find more details about PyTerrier experiments here.
Bibtex:
@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal = "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata:
{ "docs": { "count": 166975, "fields": { "doc_id": { "max_len": 6, "common_prefix": "" } } }, "queries": { "count": 924 }, "qrels": { "count": 1991, "fields": { "relevance": { "counts_by_value": { "1": 1991 } } } } }
Answers from science-focused forums, including academia, astronomy, biology, chemistry, datascience, earthscience, engineering, math, philosophy, physics, and stats.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/science/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/science/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/science/dev')
# Index lotte/science/dev
indexer = pt.IterDictIndexer('./indices/lotte_science_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Bibtex:
@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal = "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata:
{ "docs": { "count": 343642, "fields": { "doc_id": { "max_len": 6, "common_prefix": "" } } } }
Forum queries for lotte/science/dev.
Official evaluation measures: Success@5
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/science/dev/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/science/dev/forum queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/science/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_science_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from lotte/science/dev
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/science/dev/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/science/dev/forum docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/science/dev/forum')
# Index lotte/science/dev
indexer = pt.IterDictIndexer('./indices/lotte_science_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Answer upvoted or accepted on Stack Exchange | 12K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/science/dev/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export lotte/science/dev/forum qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/science/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_science_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)
You can find more details about PyTerrier experiments here.
Bibtex:
@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal = "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata:
{ "docs": { "count": 343642, "fields": { "doc_id": { "max_len": 6, "common_prefix": "" } } }, "queries": { "count": 2013 }, "qrels": { "count": 12271, "fields": { "relevance": { "counts_by_value": { "1": 12271 } } } } }
Search queries for lotte/science/dev.
Official evaluation measures: Success@5
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/science/dev/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/science/dev/search queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/science/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_science_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from lotte/science/dev
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/science/dev/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/science/dev/search docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/science/dev/search')
# Index lotte/science/dev
indexer = pt.IterDictIndexer('./indices/lotte_science_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Answer upvoted or accepted on Stack Exchange | 1.5K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/science/dev/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export lotte/science/dev/search qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/science/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_science_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)
You can find more details about PyTerrier experiments here.
Bibtex:
@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal = "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata:
{ "docs": { "count": 343642, "fields": { "doc_id": { "max_len": 6, "common_prefix": "" } } }, "queries": { "count": 538 }, "qrels": { "count": 1480, "fields": { "relevance": { "counts_by_value": { "1": 1480 } } } } }
Answers from science-focused forums, including academia, astronomy, biology, chemistry, datascience, earthscience, engineering, math, philosophy, physics, and stats.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/science/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/science/test docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/science/test')
# Index lotte/science/test
indexer = pt.IterDictIndexer('./indices/lotte_science_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Bibtex:
@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal = "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata:
{ "docs": { "count": 1694164, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Forum queries for lotte/science/test.
Official evaluation measures: Success@5
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/science/test/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/science/test/forum queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/science/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_science_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from lotte/science/test
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/science/test/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/science/test/forum docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/science/test/forum')
# Index lotte/science/test
indexer = pt.IterDictIndexer('./indices/lotte_science_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Answer upvoted or accepted on Stack Exchange | 16K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/science/test/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export lotte/science/test/forum qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/science/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_science_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)
You can find more details about PyTerrier experiments here.
Bibtex:
@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal = "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata:
{ "docs": { "count": 1694164, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 2017 }, "qrels": { "count": 15515, "fields": { "relevance": { "counts_by_value": { "1": 15515 } } } } }
Search queries for lotte/science/test.
Official evaluation measures: Success@5
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/science/test/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/science/test/search queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/science/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_science_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from lotte/science/test
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/science/test/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/science/test/search docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/science/test/search')
# Index lotte/science/test
indexer = pt.IterDictIndexer('./indices/lotte_science_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Answer upvoted or accepted on Stack Exchange | 1.7K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/science/test/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export lotte/science/test/search qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/science/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_science_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)
You can find more details about PyTerrier experiments here.
Bibtex:
@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal = "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata:
{ "docs": { "count": 1694164, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 617 }, "qrels": { "count": 1738, "fields": { "relevance": { "counts_by_value": { "1": 1738 } } } } }
Answers from technology-focused forums, including android, apple, askubuntu, electronics, networkengineering, security, serverfault, softwareengineering, superuser, unix, and webapps.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/technology/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/technology/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/dev')
# Index lotte/technology/dev
indexer = pt.IterDictIndexer('./indices/lotte_technology_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Bibtex:
@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal = "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata:
{ "docs": { "count": 1276222, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } } }
Forum queries for lotte/technology/dev.
Official evaluation measures: Success@5
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/technology/dev/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/technology/dev/forum queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_technology_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from lotte/technology/dev
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/technology/dev/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/technology/dev/forum docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/dev/forum')
# Index lotte/technology/dev
indexer = pt.IterDictIndexer('./indices/lotte_technology_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Answer upvoted or accepted on Stack Exchange | 16K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/technology/dev/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export lotte/technology/dev/forum qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_technology_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)
You can find more details about PyTerrier experiments here.
Bibtex:
@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal = "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata:
{ "docs": { "count": 1276222, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 2003 }, "qrels": { "count": 15741, "fields": { "relevance": { "counts_by_value": { "1": 15741 } } } } }
Search queries for lotte/technology/dev.
Official evaluation measures: Success@5
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/technology/dev/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/technology/dev/search queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_technology_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from lotte/technology/dev
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/technology/dev/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/technology/dev/search docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/dev/search')
# Index lotte/technology/dev
indexer = pt.IterDictIndexer('./indices/lotte_technology_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Answer upvoted or accepted on Stack Exchange | 2.7K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/technology/dev/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
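The relevance-level table above can be recomputed from the qrels records themselves. The following is a minimal sketch, assuming records shaped like the `namedtuple<query_id, doc_id, relevance, iteration>` shown in the example; the `Qrel` stand-in type, the helper name `relevance_distribution`, and the sample ids are hypothetical and only keep the snippet runnable without downloading the dataset.

```python
from collections import Counter, namedtuple
from typing import Dict, Iterable, Tuple

# Stand-in for the records yielded by dataset.qrels_iter() (hypothetical sample type)
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def relevance_distribution(qrels: Iterable) -> Dict[int, Tuple[int, float]]:
    """Map each relevance level to (count, percentage of all qrels)."""
    counts = Counter(qrel.relevance for qrel in qrels)
    total = sum(counts.values())
    return {level: (n, 100.0 * n / total) for level, n in sorted(counts.items())}


# Made-up sample records, not taken from LoTTE
sample_qrels = [
    Qrel("0", "101", 1, "0"),
    Qrel("0", "102", 1, "0"),
    Qrel("1", "205", 1, "0"),
]
distribution = relevance_distribution(sample_qrels)
print(distribution)
```

With the real data, passing `dataset.qrels_iter()` instead of `sample_qrels` should yield a single level, 1, since every judgment in this collection has that value.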
ir_datasets export lotte/technology/dev/search qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_technology_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)
You can find more details about PyTerrier experiments here.
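Success@5, the official measure for these query sets, scores a query 1 when at least one relevant document appears among the top 5 results and 0 otherwise, then averages over queries. The sketch below spells out that definition in plain Python. The function name `success_at_k`, the dictionary layout, and the toy ids are assumptions made for illustration; they are not part of ir_datasets or PyTerrier, and the measure object used in `pt.Experiment` remains the reference implementation.

```python
from typing import Dict, List, Set


def success_at_k(run: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int = 5) -> float:
    """Fraction of judged queries with at least one relevant doc in the top k.

    run   maps query_id -> doc_ids ranked best first
    qrels maps query_id -> set of relevant doc_ids
    Queries with no relevant documents are skipped (an assumption of this sketch).
    """
    hits = 0
    judged = 0
    for query_id, relevant in qrels.items():
        if not relevant:
            continue
        judged += 1
        top_k = run.get(query_id, [])[:k]
        if any(doc_id in relevant for doc_id in top_k):
            hits += 1
    return hits / judged if judged else 0.0


# Toy data with hypothetical ids, not taken from LoTTE
toy_qrels = {"q1": {"d3"}, "q2": {"d9"}}
toy_run = {
    "q1": ["d1", "d2", "d3", "d4", "d5", "d6"],  # relevant doc at rank 3: success
    "q2": ["d1", "d2", "d4", "d5", "d6", "d9"],  # relevant doc at rank 6: miss at k=5
}
print(success_at_k(toy_run, toy_qrels, k=5))
```

Because the measure only asks whether any relevant answer reaches the top 5, it does not reward retrieving several relevant answers for the same query.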
Bibtex:
@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal = "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata:
{ "docs": { "count": 1276222, "fields": { "doc_id": { "max_len": 7, "common_prefix": "" } } }, "queries": { "count": 916 }, "qrels": { "count": 2676, "fields": { "relevance": { "counts_by_value": { "1": 2676 } } } } }
Answers from technology-focused forums, including android, apple, askubuntu, electronics, networkengineering, security, serverfault, softwareengineering, superuser, unix, and webapps.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/technology/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/technology/test docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/test')
# Index lotte/technology/test
indexer = pt.IterDictIndexer('./indices/lotte_technology_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
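The indexing example feeds `IterDictIndexer` an iterator of dictionaries. As a rough sketch of what such an iterator looks like, the helper below turns `docs_iter()`-style records into dictionaries with a `docno` key and a `text` key. It assumes that key layout, which is what the PyTerrier integration is understood to produce; the helper name `to_corpus_dicts`, the `Doc` stand-in type, and the sample texts are hypothetical. In normal use `dataset.get_corpus_iter()` already does this, so the helper is only useful for filtering or rewriting documents before indexing.

```python
from collections import namedtuple
from typing import Dict, Iterable, Iterator

# Stand-in for the namedtuple<doc_id, text> records from docs_iter() (hypothetical)
Doc = namedtuple("Doc", ["doc_id", "text"])


def to_corpus_dicts(docs: Iterable) -> Iterator[Dict[str, str]]:
    """Yield one {'docno', 'text'} dictionary per document record."""
    for doc in docs:
        yield {"docno": doc.doc_id, "text": doc.text}


# Made-up sample documents, not taken from LoTTE
sample_docs = [
    Doc("0", "Hold the power button for ten seconds to force a restart."),
    Doc("1", "Check the routing table before changing the gateway."),
]
rows = list(to_corpus_dicts(sample_docs))
print(rows[0]["docno"], rows[0]["text"])
```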
Bibtex:
@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal = "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata:
{ "docs": { "count": 638509, "fields": { "doc_id": { "max_len": 6, "common_prefix": "" } } } }
Forum queries for lotte/technology/test.
Official evaluation measures: Success@5
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/technology/test/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/technology/test/forum queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_technology_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from lotte/technology/test
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/technology/test/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/technology/test/forum docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/test/forum')
# Index lotte/technology/test
indexer = pt.IterDictIndexer('./indices/lotte_technology_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Answer upvoted or accepted on Stack Exchange | 16K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/technology/test/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export lotte/technology/test/forum qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
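The exported qrels can be loaded back into Python for custom evaluation. The sketch below assumes the `--format tsv` output is one record per line with tab-separated columns in the order shown above (`query_id`, `doc_id`, `relevance`, `iteration`); the function name `parse_qrels_tsv` and the sample lines are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def parse_qrels_tsv(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Build {query_id: {doc_id: relevance}} from tab-separated qrels lines."""
    qrels: Dict[str, Dict[str, int]] = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        query_id, doc_id, relevance, _iteration = line.split("\t")
        qrels[query_id][doc_id] = int(relevance)
    return dict(qrels)


# Made-up sample lines in the assumed export layout, not taken from LoTTE
sample_lines = ["0\t1234\t1\t0\n", "0\t5678\t1\t0\n", "1\t42\t1\t0\n"]
parsed = parse_qrels_tsv(sample_lines)
print(parsed)
```

To read a saved export, pass an open file handle, which iterates line by line, in place of `sample_lines`.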
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_technology_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)
You can find more details about PyTerrier experiments here.
Bibtex:
@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal = "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata:
{ "docs": { "count": 638509, "fields": { "doc_id": { "max_len": 6, "common_prefix": "" } } }, "queries": { "count": 2004 }, "qrels": { "count": 15890, "fields": { "relevance": { "counts_by_value": { "1": 15890 } } } } }
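The JSON metadata record for lotte/technology/test/forum shown above can be read with any JSON parser. As a small illustration, the sketch below copies that record into a string, derives the average number of relevant answers per query, and checks that the per-level counts add up to the qrels total. The variable names are illustrative only.

```python
import json

# Metadata record copied verbatim from the lotte/technology/test/forum entry
metadata_json = (
    '{ "docs": { "count": 638509, "fields": { "doc_id": { "max_len": 6, "common_prefix": "" } } }, '
    '"queries": { "count": 2004 }, '
    '"qrels": { "count": 15890, "fields": { "relevance": { "counts_by_value": { "1": 15890 } } } } }'
)

metadata = json.loads(metadata_json)
n_queries = metadata["queries"]["count"]
n_qrels = metadata["qrels"]["count"]
by_value = metadata["qrels"]["fields"]["relevance"]["counts_by_value"]

# Average number of relevant answers judged per query
qrels_per_query = n_qrels / n_queries
# The per-relevance-level counts should sum to the overall qrels count
counts_consistent = sum(by_value.values()) == n_qrels

print(f"{qrels_per_query:.2f} relevant answers per query on average")
print("per-level counts add up:", counts_consistent)
```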
Search queries for lotte/technology/test.
Official evaluation measures: Success@5
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/technology/test/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/technology/test/search queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_technology_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from lotte/technology/test
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/technology/test/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/technology/test/search docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/test/search')
# Index lotte/technology/test
indexer = pt.IterDictIndexer('./indices/lotte_technology_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Answer upvoted or accepted on Stack Exchange | 2.0K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/technology/test/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export lotte/technology/test/search qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/technology/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_technology_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)
You can find more details about PyTerrier experiments here.
Bibtex:
@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal = "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata:
{ "docs": { "count": 638509, "fields": { "doc_id": { "max_len": 6, "common_prefix": "" } } }, "queries": { "count": 596 }, "qrels": { "count": 2045, "fields": { "relevance": { "counts_by_value": { "1": 2045 } } } } }
Answers from writing-focused forums, including ell, english, linguistics, literature, worldbuilding, and writing.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/writing/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/writing/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/dev')
# Index lotte/writing/dev
indexer = pt.IterDictIndexer('./indices/lotte_writing_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Bibtex:
@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal = "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata:
{ "docs": { "count": 277072, "fields": { "doc_id": { "max_len": 6, "common_prefix": "" } } } }
Forum queries for lotte/writing/dev.
Official evaluation measures: Success@5
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/writing/dev/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/writing/dev/forum queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_writing_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from lotte/writing/dev
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/writing/dev/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/writing/dev/forum docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/dev/forum')
# Index lotte/writing/dev
indexer = pt.IterDictIndexer('./indices/lotte_writing_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Answer upvoted or accepted on Stack Exchange | 15K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/writing/dev/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export lotte/writing/dev/forum qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/dev/forum')
index_ref = pt.IndexRef.of('./indices/lotte_writing_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)
You can find more details about PyTerrier experiments here.
Bibtex:
@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal = "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata:
{ "docs": { "count": 277072, "fields": { "doc_id": { "max_len": 6, "common_prefix": "" } } }, "queries": { "count": 2003 }, "qrels": { "count": 15098, "fields": { "relevance": { "counts_by_value": { "1": 15098 } } } } }
Search queries for lotte/writing/dev.
Official evaluation measures: Success@5
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/writing/dev/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/writing/dev/search queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_writing_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from lotte/writing/dev
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/writing/dev/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/writing/dev/search docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/dev/search')
# Index lotte/writing/dev
indexer = pt.IterDictIndexer('./indices/lotte_writing_dev')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Answer upvoted or accepted on Stack Exchange | 1.3K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/writing/dev/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export lotte/writing/dev/search qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/dev/search')
index_ref = pt.IndexRef.of('./indices/lotte_writing_dev') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)
You can find more details about PyTerrier experiments here.
Bibtex:
@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal = "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata:
{ "docs": { "count": 277072, "fields": { "doc_id": { "max_len": 6, "common_prefix": "" } } }, "queries": { "count": 497 }, "qrels": { "count": 1287, "fields": { "relevance": { "counts_by_value": { "1": 1287 } } } } }
Answers from writing-focused forums, including ell, english, linguistics, literature, worldbuilding, and writing.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/writing/test")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/writing/test docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/test')
# Index lotte/writing/test
indexer = pt.IterDictIndexer('./indices/lotte_writing_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Bibtex:
@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal = "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata:
{ "docs": { "count": 199994, "fields": { "doc_id": { "max_len": 6, "common_prefix": "" } } } }
Forum queries for lotte/writing/test.
Official evaluation measures: Success@5
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/writing/test/forum")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/writing/test/forum queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_writing_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from lotte/writing/test
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/writing/test/forum")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/writing/test/forum docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/test/forum')
# Index lotte/writing/test
indexer = pt.IterDictIndexer('./indices/lotte_writing_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Answer upvoted or accepted on Stack Exchange | 13K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/writing/test/forum")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export lotte/writing/test/forum qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/test/forum')
index_ref = pt.IndexRef.of('./indices/lotte_writing_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)
You can find more details about PyTerrier experiments here.
Bibtex:
@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal = "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata:
{ "docs": { "count": 199994, "fields": { "doc_id": { "max_len": 6, "common_prefix": "" } } }, "queries": { "count": 2000 }, "qrels": { "count": 12906, "fields": { "relevance": { "counts_by_value": { "1": 12906 } } } } }
Search queries for lotte/writing/test.
Official evaluation measures: Success@5
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/writing/test/search")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/writing/test/search queries
[query_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_writing_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics())
You can find more details about PyTerrier retrieval here.
Inherits docs from lotte/writing/test
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/writing/test/search")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export lotte/writing/test/search docs
[doc_id] [text]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/test/search')
# Index lotte/writing/test
indexer = pt.IterDictIndexer('./indices/lotte_writing_test')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
You can find more details about PyTerrier indexing here.
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Answer upvoted or accepted on Stack Exchange | 3.5K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("lotte/writing/test/search")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export lotte/writing/test/search qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:lotte/writing/test/search')
index_ref = pt.IndexRef.of('./indices/lotte_writing_test') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    [Success@5]
)
You can find more details about PyTerrier experiments here.
Bibtex:
@article{Santhanam2021ColBERTv2, title = "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction", author = "Keshav Santhanam and Omar Khattab and Jon Saad-Falcon and Christopher Potts and Matei Zaharia", journal = "arXiv preprint arXiv:2112.01488", year = "2021", url = "https://arxiv.org/abs/2112.01488" }
Metadata:
{ "docs": { "count": 199994, "fields": { "doc_id": { "max_len": 6, "common_prefix": "" } } }, "queries": { "count": 1071 }, "qrels": { "count": 3546, "fields": { "relevance": { "counts_by_value": { "1": 3546 } } } } }