ir_datasets: neuMARCOA version of msmarco-passage for cross-language information retrieval, provided by JHU HLTCOE with documents translated to other langauges using a Sockeye 2 translation model.
The msmarco-passage corpus, translated to Persian (Farsi).
Language: fa
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/fa")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export neumarco/fa docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.fa')
for doc in dataset.iter_documents():
print(doc) # an AdhocDocumentStore
break
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore
{
"docs": {
"count": 8841823,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
}
}
A version of msmarco-passage/dev, with the corpus translated to Persian (Farsi).
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/fa/dev")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export neumarco/fa/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.neumarco.fa.dev.queries') # AdhocTopics
for topic in topics.iter():
print(topic) # An AdhocTopic
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from neumarco/fa
Language: fa
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/fa/dev")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export neumarco/fa/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.fa.dev')
for doc in dataset.iter_documents():
print(doc) # an AdhocDocumentStore
break
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/fa/dev")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export neumarco/fa/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.neumarco.fa.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
print(topic_qrels) # An AdhocTopic
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 8841823,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
},
"queries": {
"count": 101093
},
"qrels": {
"count": 59273,
"fields": {
"relevance": {
"counts_by_value": {
"1": 59273
}
}
}
}
}
A version of msmarco-passage/dev/judged, with the corpus translated to Persian (Farsi).
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/fa/dev/judged")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export neumarco/fa/dev/judged queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.neumarco.fa.dev.judged.queries') # AdhocTopics
for topic in topics.iter():
print(topic) # An AdhocTopic
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from neumarco/fa
Language: fa
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/fa/dev/judged")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export neumarco/fa/dev/judged docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.fa.dev.judged')
for doc in dataset.iter_documents():
print(doc) # an AdhocDocumentStore
break
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore
Inherits qrels from neumarco/fa/dev
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/fa/dev/judged")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export neumarco/fa/dev/judged qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.neumarco.fa.dev.judged.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
print(topic_qrels) # An AdhocTopic
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 8841823,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
},
"queries": {
"count": 55578
},
"qrels": {
"count": 59273,
"fields": {
"relevance": {
"counts_by_value": {
"1": 59273
}
}
}
}
}
A version of msmarco-passage/dev/small, with the corpus translated to Persian (Farsi).
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/fa/dev/small")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export neumarco/fa/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.neumarco.fa.dev.small.queries') # AdhocTopics
for topic in topics.iter():
print(topic) # An AdhocTopic
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from neumarco/fa
Language: fa
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/fa/dev/small")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export neumarco/fa/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.fa.dev.small')
for doc in dataset.iter_documents():
print(doc) # an AdhocDocumentStore
break
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/fa/dev/small")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export neumarco/fa/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.neumarco.fa.dev.small.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
print(topic_qrels) # An AdhocTopic
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 8841823,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
},
"queries": {
"count": 6980
},
"qrels": {
"count": 7437,
"fields": {
"relevance": {
"counts_by_value": {
"1": 7437
}
}
}
}
}
A version of msmarco-passage/train, with the corpus translated to Persian (Farsi).
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/fa/train")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export neumarco/fa/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.neumarco.fa.train.queries') # AdhocTopics
for topic in topics.iter():
print(topic) # An AdhocTopic
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from neumarco/fa
Language: fa
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/fa/train")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export neumarco/fa/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.fa.train')
for doc in dataset.iter_documents():
print(doc) # an AdhocDocumentStore
break
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/fa/train")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export neumarco/fa/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.neumarco.fa.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
print(topic_qrels) # An AdhocTopic
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/fa/train")
for docpair in dataset.docpairs_iter():
docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export neumarco/fa/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro # Supposes experimaestro-ir be installed
docpairs = datamaestro.prepare_dataset('irds.neumarco.fa.train.docpairs')
next(docpairs.iter()) # Display the first triplet
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets
{
"docs": {
"count": 8841823,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
},
"queries": {
"count": 808731
},
"qrels": {
"count": 532761,
"fields": {
"relevance": {
"counts_by_value": {
"1": 532761
}
}
}
},
"docpairs": {
"count": 269919004
}
}
A version of msmarco-passage/train/judged, with the corpus translated to Persian (Farsi).
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/fa/train/judged")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export neumarco/fa/train/judged queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.neumarco.fa.train.judged.queries') # AdhocTopics
for topic in topics.iter():
print(topic) # An AdhocTopic
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from neumarco/fa
Language: fa
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/fa/train/judged")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export neumarco/fa/train/judged docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.fa.train.judged')
for doc in dataset.iter_documents():
print(doc) # an AdhocDocumentStore
break
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore
Inherits qrels from neumarco/fa/train
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/fa/train/judged")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export neumarco/fa/train/judged qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.neumarco.fa.train.judged.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
print(topic_qrels) # An AdhocTopic
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Inherits docpairs from neumarco/fa/train
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/fa/train/judged")
for docpair in dataset.docpairs_iter():
docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export neumarco/fa/train/judged docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro # Supposes experimaestro-ir be installed
docpairs = datamaestro.prepare_dataset('irds.neumarco.fa.train.judged.docpairs')
next(docpairs.iter()) # Display the first triplet
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets
{
"docs": {
"count": 8841823,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
},
"queries": {
"count": 502939
},
"qrels": {
"count": 532761,
"fields": {
"relevance": {
"counts_by_value": {
"1": 532761
}
}
}
},
"docpairs": {
"count": 269919004
}
}
The msmarco-passage corpus, translated to Russian.
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/ru")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export neumarco/ru docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.ru')
for doc in dataset.iter_documents():
print(doc) # an AdhocDocumentStore
break
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore
{
"docs": {
"count": 8841823,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
}
}
A version of msmarco-passage/dev, with the corpus translated to Russian.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/ru/dev")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export neumarco/ru/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.neumarco.ru.dev.queries') # AdhocTopics
for topic in topics.iter():
print(topic) # An AdhocTopic
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from neumarco/ru
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/ru/dev")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export neumarco/ru/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.ru.dev')
for doc in dataset.iter_documents():
print(doc) # an AdhocDocumentStore
break
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/ru/dev")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export neumarco/ru/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.neumarco.ru.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
print(topic_qrels) # An AdhocTopic
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 8841823,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
},
"queries": {
"count": 101093
},
"qrels": {
"count": 59273,
"fields": {
"relevance": {
"counts_by_value": {
"1": 59273
}
}
}
}
}
A version of msmarco-passage/dev/judged, with the corpus translated to Russian.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/ru/dev/judged")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export neumarco/ru/dev/judged queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.neumarco.ru.dev.judged.queries') # AdhocTopics
for topic in topics.iter():
print(topic) # An AdhocTopic
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from neumarco/ru
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/ru/dev/judged")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export neumarco/ru/dev/judged docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.ru.dev.judged')
for doc in dataset.iter_documents():
print(doc) # an AdhocDocumentStore
break
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore
Inherits qrels from neumarco/ru/dev
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/ru/dev/judged")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export neumarco/ru/dev/judged qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.neumarco.ru.dev.judged.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
print(topic_qrels) # An AdhocTopic
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 8841823,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
},
"queries": {
"count": 55578
},
"qrels": {
"count": 59273,
"fields": {
"relevance": {
"counts_by_value": {
"1": 59273
}
}
}
}
}
A version of msmarco-passage/dev/small, with the corpus translated to Russian.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/ru/dev/small")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export neumarco/ru/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.neumarco.ru.dev.small.queries') # AdhocTopics
for topic in topics.iter():
print(topic) # An AdhocTopic
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from neumarco/ru
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/ru/dev/small")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export neumarco/ru/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.ru.dev.small')
for doc in dataset.iter_documents():
print(doc) # an AdhocDocumentStore
break
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/ru/dev/small")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export neumarco/ru/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.neumarco.ru.dev.small.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
print(topic_qrels) # An AdhocTopic
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 8841823,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
},
"queries": {
"count": 6980
},
"qrels": {
"count": 7437,
"fields": {
"relevance": {
"counts_by_value": {
"1": 7437
}
}
}
}
}
A version of msmarco-passage/train, with the corpus translated to Russian.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/ru/train")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export neumarco/ru/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.neumarco.ru.train.queries') # AdhocTopics
for topic in topics.iter():
print(topic) # An AdhocTopic
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from neumarco/ru
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/ru/train")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export neumarco/ru/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.ru.train')
for doc in dataset.iter_documents():
print(doc) # an AdhocDocumentStore
break
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/ru/train")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export neumarco/ru/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.neumarco.ru.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
print(topic_qrels) # An AdhocTopic
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/ru/train")
for docpair in dataset.docpairs_iter():
docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export neumarco/ru/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro # Supposes experimaestro-ir be installed
docpairs = datamaestro.prepare_dataset('irds.neumarco.ru.train.docpairs')
next(docpairs.iter()) # Display the first triplet
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets
{
"docs": {
"count": 8841823,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
},
"queries": {
"count": 808731
},
"qrels": {
"count": 532761,
"fields": {
"relevance": {
"counts_by_value": {
"1": 532761
}
}
}
},
"docpairs": {
"count": 269919004
}
}
A version of msmarco-passage/train/judged, with the corpus translated to Russian.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/ru/train/judged")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export neumarco/ru/train/judged queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.neumarco.ru.train.judged.queries') # AdhocTopics
for topic in topics.iter():
print(topic) # An AdhocTopic
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from neumarco/ru
Language: ru
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/ru/train/judged")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export neumarco/ru/train/judged docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.ru.train.judged')
for doc in dataset.iter_documents():
print(doc) # an AdhocDocumentStore
break
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore
Inherits qrels from neumarco/ru/train
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/ru/train/judged")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export neumarco/ru/train/judged qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.neumarco.ru.train.judged.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
print(topic_qrels) # An AdhocTopic
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Inherits docpairs from neumarco/ru/train
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/ru/train/judged")
for docpair in dataset.docpairs_iter():
docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export neumarco/ru/train/judged docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro # Supposes experimaestro-ir be installed
docpairs = datamaestro.prepare_dataset('irds.neumarco.ru.train.judged.docpairs')
next(docpairs.iter()) # Display the first triplet
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets
{
"docs": {
"count": 8841823,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
},
"queries": {
"count": 502939
},
"qrels": {
"count": 532761,
"fields": {
"relevance": {
"counts_by_value": {
"1": 532761
}
}
}
},
"docpairs": {
"count": 269919004
}
}
The msmarco-passage corpus, translated to Chinese.
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/zh")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export neumarco/zh docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.zh')
for doc in dataset.iter_documents():
print(doc) # an AdhocDocumentStore
break
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore
{
"docs": {
"count": 8841823,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
}
}
A version of msmarco-passage/dev, with the corpus translated to Chinese.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/zh/dev")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export neumarco/zh/dev queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.neumarco.zh.dev.queries') # AdhocTopics
for topic in topics.iter():
print(topic) # An AdhocTopic
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from neumarco/zh
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/zh/dev")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export neumarco/zh/dev docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.zh.dev')
for doc in dataset.iter_documents():
print(doc) # an AdhocDocumentStore
break
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/zh/dev")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export neumarco/zh/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.neumarco.zh.dev.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
print(topic_qrels) # An AdhocTopic
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 8841823,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
},
"queries": {
"count": 101093
},
"qrels": {
"count": 59273,
"fields": {
"relevance": {
"counts_by_value": {
"1": 59273
}
}
}
}
}
A version of msmarco-passage/dev/judged, with the corpus translated to Chinese.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/zh/dev/judged")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export neumarco/zh/dev/judged queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.neumarco.zh.dev.judged.queries') # AdhocTopics
for topic in topics.iter():
print(topic) # An AdhocTopic
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from neumarco/zh
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/zh/dev/judged")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export neumarco/zh/dev/judged docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.zh.dev.judged')
for doc in dataset.iter_documents():
print(doc) # an AdhocDocumentStore
break
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore
Inherits qrels from neumarco/zh/dev
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Labeled by crowd worker as relevant | 59K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/zh/dev/judged")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export neumarco/zh/dev/judged qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.neumarco.zh.dev.judged.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
print(topic_qrels) # An AdhocTopic
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 8841823,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
},
"queries": {
"count": 55578
},
"qrels": {
"count": 59273,
"fields": {
"relevance": {
"counts_by_value": {
"1": 59273
}
}
}
}
}
A version of msmarco-passage/dev/small, with the corpus translated to Chinese.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/zh/dev/small")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export neumarco/zh/dev/small queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.neumarco.zh.dev.small.queries') # AdhocTopics
for topic in topics.iter():
print(topic) # An AdhocTopic
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from neumarco/zh
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/zh/dev/small")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export neumarco/zh/dev/small docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.zh.dev.small')
for doc in dataset.iter_documents():
print(doc) # an AdhocDocumentStore
break
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Labeled by crowd worker as relevant | 7.4K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/zh/dev/small")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export neumarco/zh/dev/small qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.neumarco.zh.dev.small.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
print(topic_qrels) # An AdhocTopic
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{
"docs": {
"count": 8841823,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
},
"queries": {
"count": 6980
},
"qrels": {
"count": 7437,
"fields": {
"relevance": {
"counts_by_value": {
"1": 7437
}
}
}
}
}
A version of msmarco-passage/train, with the corpus translated to Chinese.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/zh/train")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export neumarco/zh/train queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.neumarco.zh.train.queries') # AdhocTopics
for topic in topics.iter():
print(topic) # An AdhocTopic
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from neumarco/zh
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/zh/train")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export neumarco/zh/train docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.zh.train')
for doc in dataset.iter_documents():
print(doc) # an AdhocDocumentStore
break
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/zh/train")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export neumarco/zh/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.neumarco.zh.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
print(topic_qrels) # An AdhocTopic
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/zh/train")
for docpair in dataset.docpairs_iter():
docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export neumarco/zh/train docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro # Supposes experimaestro-ir be installed
docpairs = datamaestro.prepare_dataset('irds.neumarco.zh.train.docpairs')
next(docpairs.iter()) # Display the first triplet
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets
{
"docs": {
"count": 8841823,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
},
"queries": {
"count": 808731
},
"qrels": {
"count": 532761,
"fields": {
"relevance": {
"counts_by_value": {
"1": 532761
}
}
}
},
"docpairs": {
"count": 269919004
}
}
A version of msmarco-passage/train/judged, with the corpus translated to Chinese.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/zh/train/judged")
for query in dataset.queries_iter():
query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export neumarco/zh/train/judged queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.neumarco.zh.train.judged.queries') # AdhocTopics
for topic in topics.iter():
print(topic) # An AdhocTopic
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from neumarco/zh
Language: zh
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/zh/train/judged")
for doc in dataset.docs_iter():
doc # namedtuple<doc_id, text>
You can find more details about the Python API here.
ir_datasets export neumarco/zh/train/judged docs
[doc_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.zh.train.judged')
for doc in dataset.iter_documents():
print(doc) # an AdhocDocumentStore
break
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore
Inherits qrels from neumarco/zh/train
Relevance levels
| Rel. | Definition | Count | % |
|---|---|---|---|
| 1 | Labeled by crowd worker as relevant | 533K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/zh/train/judged")
for qrel in dataset.qrels_iter():
qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export neumarco/zh/train/judged qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.neumarco.zh.train.judged.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
print(topic_qrels) # An AdhocTopic
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Inherits docpairs from neumarco/zh/train
Examples:
import ir_datasets
dataset = ir_datasets.load("neumarco/zh/train/judged")
for docpair in dataset.docpairs_iter():
docpair # namedtuple<query_id, doc_id_a, doc_id_b>
You can find more details about the Python API here.
ir_datasets export neumarco/zh/train/judged docpairs
[query_id] [doc_id_a] [doc_id_b]
...
You can find more details about the CLI here.
No example available for PyTerrier
import datamaestro # Supposes experimaestro-ir be installed
docpairs = datamaestro.prepare_dataset('irds.neumarco.zh.train.judged.docpairs')
next(docpairs.iter()) # Display the first triplet
This examples requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets
{
"docs": {
"count": 8841823,
"fields": {
"doc_id": {
"max_len": 7,
"common_prefix": ""
}
}
},
"queries": {
"count": 502939
},
"qrels": {
"count": 532761,
"fields": {
"relevance": {
"counts_by_value": {
"1": 532761
}
}
}
},
"docpairs": {
"count": 269919004
}
}