GitHub: datasets/neumarco.py

ir_datasets: neuMARCO

Index
  1. neumarco
  2. neumarco/fa
  3. neumarco/fa/dev
  4. neumarco/fa/dev/judged
  5. neumarco/fa/dev/small
  6. neumarco/fa/train
  7. neumarco/fa/train/judged
  8. neumarco/ru
  9. neumarco/ru/dev
  10. neumarco/ru/dev/judged
  11. neumarco/ru/dev/small
  12. neumarco/ru/train
  13. neumarco/ru/train/judged
  14. neumarco/zh
  15. neumarco/zh/dev
  16. neumarco/zh/dev/judged
  17. neumarco/zh/dev/small
  18. neumarco/zh/train
  19. neumarco/zh/train/judged
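
The identifiers in the index follow a regular pattern: the root, then one corpus per language, then five query subsets per language. The sketch below rebuilds that list, which may be handy when looping over every subset. The helper name `neumarco_ids` is hypothetical and not part of ir_datasets.

```python
LANGS = ["fa", "ru", "zh"]
SUBSETS = ["dev", "dev/judged", "dev/small", "train", "train/judged"]


def neumarco_ids():
    """Rebuild the 19 dataset IDs from the index, in index order."""
    ids = ["neumarco"]
    for lang in LANGS:
        # Each language has a corpus-only entry followed by its subsets.
        ids.append(f"neumarco/{lang}")
        ids.extend(f"neumarco/{lang}/{subset}" for subset in SUBSETS)
    return ids
```

Each returned string could then be passed to `ir_datasets.load(...)` as in the examples on this page.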

"neumarco"

A version of msmarco-passage for cross-language information retrieval, provided by JHU HLTCOE, with documents machine-translated into other languages using a Sockeye 2 translation model.

  • Documents: Web passages, machine-translated from English into Persian, Russian, and Chinese
  • Queries: Natural-language web queries in English
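
Because every language corpus is a translation of the same msmarco-passage collection, passages can plausibly be paired across languages by `doc_id`. The sketch below assumes that the translated corpora reuse the original passage IDs, which this page does not state explicitly. `GenericDoc` here is a stand-in namedtuple mirroring the documented fields, `align_by_doc_id` is a hypothetical helper, and the sample records hold placeholder text rather than real corpus content.

```python
from collections import namedtuple

# Stand-in for the documented GenericDoc record (doc_id, text).
GenericDoc = namedtuple("GenericDoc", ["doc_id", "text"])


def align_by_doc_id(corpora):
    """Map doc_id -> {language: text}, keeping only IDs found in every corpus.

    `corpora` maps a language code to an iterable of records that have
    `doc_id` and `text` attributes.
    """
    aligned = {}
    for lang, docs in corpora.items():
        for doc in docs:
            aligned.setdefault(doc.doc_id, {})[lang] = doc.text
    return {
        doc_id: texts
        for doc_id, texts in aligned.items()
        if len(texts) == len(corpora)
    }


# Toy records with placeholder text, not real corpus content.
sample = {
    "fa": [GenericDoc("0", "<fa text 0>"), GenericDoc("1", "<fa text 1>")],
    "ru": [GenericDoc("0", "<ru text 0>")],
}
parallel = align_by_doc_id(sample)
```

This version holds everything in memory, so it suits a small sample of passages rather than the full 8.8M-document corpora.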

"neumarco/fa"

The msmarco-passage corpus, translated to Persian (Farsi).

docs
8.8M docs

Language: fa

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/fa")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/fa docs
[doc_id]    [text]
...

You can find more details about the CLI here.
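
The export above prints one document per line with the ID and text in separate columns. A minimal sketch for reading that output back into Python follows, assuming the columns are tab-separated and that each record occupies a single line. `parse_docs_tsv` is a hypothetical helper, not part of ir_datasets.

```python
from typing import Iterable, Iterator, Tuple


def parse_docs_tsv(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (doc_id, text) pairs from tab-separated export lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the first tab only, so tabs inside the text survive.
        doc_id, text = line.split("\t", 1)
        yield doc_id, text
```

The function accepts any iterable of strings, so it could read from an open file or from `sys.stdin` when the export command is piped into a script. A line without a tab raises `ValueError`, which makes malformed input visible early.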

PyTerrier

No example available for PyTerrier

"neumarco/fa/dev"

A version of msmarco-passage/dev, with the corpus translated to Persian (Farsi).

queries
101K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/fa/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/fa/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
8.8M docs

Inherits docs from neumarco/fa

Language: fa

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/fa/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/fa/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
59K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                           Count  %
1     Labeled by crowd worker as relevant  59K    100.0%
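
A relevance-level breakdown like the one above can be recomputed from any qrels iterable. The sketch below is one way to do it. `TrecQrel` is a stand-in namedtuple mirroring the documented fields, and `relevance_distribution` is a hypothetical helper name.

```python
from collections import Counter, namedtuple

# Stand-in for the documented TrecQrel record.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def relevance_distribution(qrels):
    """Return {relevance: (count, percent)} over an iterable of qrels."""
    counts = Counter(qrel.relevance for qrel in qrels)
    total = sum(counts.values())
    return {
        rel: (count, 100.0 * count / total)
        for rel, count in sorted(counts.items())
    }
```

With the real data, `dataset.qrels_iter()` from the example below would be passed in place of hand-made records.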

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/fa/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/fa/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

"neumarco/fa/dev/judged"

A version of msmarco-passage/dev/judged, with the corpus translated to Persian (Farsi).

queries
56K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/fa/dev/judged")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/fa/dev/judged queries
[query_id]    [text]
...

You can find more details about the CLI here.
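
The query count drops from 101K in neumarco/fa/dev to 56K here while the qrels are inherited unchanged, which is consistent with keeping only queries that have at least one relevance judgment. A minimal sketch of such a filter follows, under that assumption. The namedtuples are stand-ins for the documented record types, and `judged_queries` is a hypothetical helper.

```python
from collections import namedtuple

# Stand-ins for the documented record types.
GenericQuery = namedtuple("GenericQuery", ["query_id", "text"])
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def judged_queries(queries, qrels):
    """Keep only the queries whose query_id appears in at least one qrel."""
    judged_ids = {qrel.query_id for qrel in qrels}
    return [query for query in queries if query.query_id in judged_ids]
```

Loading the judged subset directly avoids this step. The sketch only illustrates how the two subsets appear to relate.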

PyTerrier

No example available for PyTerrier

docs
8.8M docs

Inherits docs from neumarco/fa

Language: fa

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/fa/dev/judged")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/fa/dev/judged docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
59K qrels

Inherits qrels from neumarco/fa/dev

Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                           Count  %
1     Labeled by crowd worker as relevant  59K    100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/fa/dev/judged")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/fa/dev/judged qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

"neumarco/fa/dev/small"

A version of msmarco-passage/dev/small, with the corpus translated to Persian (Farsi).

queries
7.0K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/fa/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/fa/dev/small queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
8.8M docs

Inherits docs from neumarco/fa

Language: fa

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/fa/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/fa/dev/small docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
7.4K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                           Count  %
1     Labeled by crowd worker as relevant  7.4K   100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/fa/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/fa/dev/small qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.
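
Evaluation code often wants qrels as a nested mapping rather than a flat stream of records. The sketch below groups records by query, assuming only the documented `query_id`, `doc_id`, and `relevance` fields. `TrecQrel` is a stand-in namedtuple and `qrels_to_dict` is a hypothetical helper name.

```python
from collections import defaultdict, namedtuple

# Stand-in for the documented TrecQrel record.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def qrels_to_dict(qrels):
    """Group qrels as {query_id: {doc_id: relevance}}."""
    lookup = defaultdict(dict)
    for qrel in qrels:
        lookup[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(lookup)
```

A subset of this size (7.4K qrels) fits comfortably in memory, so building the full mapping up front is a reasonable choice here.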

PyTerrier

No example available for PyTerrier

"neumarco/fa/train"

A version of msmarco-passage/train, with the corpus translated to Persian (Farsi).

queries
809K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/fa/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/fa/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
8.8M docs

Inherits docs from neumarco/fa

Language: fa

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/fa/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/fa/train docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
533K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                           Count  %
1     Labeled by crowd worker as relevant  533K   100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/fa/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/fa/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docpairs
270M docpairs
Document Pair type:
GenericDocPair: (namedtuple)
  1. query_id: str
  2. doc_id_a: str
  3. doc_id_b: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/fa/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/fa/train docpairs
[query_id]    [doc_id_a]    [doc_id_b]
...

You can find more details about the CLI here.
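
Docpairs carry identifiers only, so building training examples means looking up the query and passage text. The sketch below resolves each pair into a text triple. It assumes that `doc_id_a` is the preferred passage and `doc_id_b` the less preferred one, as is usual for MS MARCO training triples; this page does not state the ordering. `GenericDocPair` is a stand-in namedtuple, and `resolve_docpairs` and its mapping arguments are hypothetical.

```python
from collections import namedtuple

# Stand-in for the documented GenericDocPair record.
GenericDocPair = namedtuple("GenericDocPair", ["query_id", "doc_id_a", "doc_id_b"])


def resolve_docpairs(docpairs, query_text, doc_text):
    """Yield (query, passage_a, passage_b) text triples.

    `query_text` and `doc_text` are mappings from ID to text.
    Assumption: passage_a is the preferred passage of the pair.
    """
    for pair in docpairs:
        yield (
            query_text[pair.query_id],
            doc_text[pair.doc_id_a],
            doc_text[pair.doc_id_b],
        )
```

The generator processes pairs lazily, which matters with 270M docpairs. Holding all 8.8M passages in a plain dict may be too heavy in practice, so a disk-backed lookup is likely a better fit for the full corpus.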

PyTerrier

No example available for PyTerrier

"neumarco/fa/train/judged"

A version of msmarco-passage/train/judged, with the corpus translated to Persian (Farsi).

queries
503K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/fa/train/judged")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/fa/train/judged queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
8.8M docs

Inherits docs from neumarco/fa

Language: fa

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/fa/train/judged")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/fa/train/judged docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
533K qrels

Inherits qrels from neumarco/fa/train

Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                           Count  %
1     Labeled by crowd worker as relevant  533K   100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/fa/train/judged")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/fa/train/judged qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docpairs
270M docpairs

Inherits docpairs from neumarco/fa/train

Document Pair type:
GenericDocPair: (namedtuple)
  1. query_id: str
  2. doc_id_a: str
  3. doc_id_b: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/fa/train/judged")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/fa/train/judged docpairs
[query_id]    [doc_id_a]    [doc_id_b]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

"neumarco/ru"

The msmarco-passage corpus, translated to Russian.

docs
8.8M docs

Language: ru

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/ru")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/ru docs
[doc_id]    [text]
...

You can find more details about the CLI here.
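
With 8.8M documents, downstream steps such as indexing or encoding are usually fed in fixed-size batches rather than one record at a time. A minimal batching sketch over any iterator follows. `batched` is a hypothetical helper defined here, and the batch size is up to the caller.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items until the iterable is exhausted."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

Passing `dataset.docs_iter()` as the iterable would yield lists of document records, with a shorter final batch when the total is not a multiple of `size`.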

PyTerrier

No example available for PyTerrier

"neumarco/ru/dev"

A version of msmarco-passage/dev, with the corpus translated to Russian.

queries
101K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/ru/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/ru/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
8.8M docs

Inherits docs from neumarco/ru

Language: ru

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/ru/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/ru/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
59K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                           Count  %
1     Labeled by crowd worker as relevant  59K    100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/ru/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/ru/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

"neumarco/ru/dev/judged"

A version of msmarco-passage/dev/judged, with the corpus translated to Russian.

queries
56K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/ru/dev/judged")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/ru/dev/judged queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
8.8M docs

Inherits docs from neumarco/ru

Language: ru

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/ru/dev/judged")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/ru/dev/judged docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
59K qrels

Inherits qrels from neumarco/ru/dev

Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                           Count  %
1     Labeled by crowd worker as relevant  59K    100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/ru/dev/judged")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/ru/dev/judged qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

"neumarco/ru/dev/small"

A version of msmarco-passage/dev/small, with the corpus translated to Russian.

queries
7.0K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/ru/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/ru/dev/small queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
8.8M docs

Inherits docs from neumarco/ru

Language: ru

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/ru/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/ru/dev/small docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
7.4K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                           Count  %
1     Labeled by crowd worker as relevant  7.4K   100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/ru/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/ru/dev/small qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

"neumarco/ru/train"

A version of msmarco-passage/train, with the corpus translated to Russian.

queries
809K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/ru/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/ru/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
8.8M docs

Inherits docs from neumarco/ru

Language: ru

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/ru/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/ru/train docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
533K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                           Count  %
1     Labeled by crowd worker as relevant  533K   100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/ru/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/ru/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docpairs
270M docpairs
Document Pair type:
GenericDocPair: (namedtuple)
  1. query_id: str
  2. doc_id_a: str
  3. doc_id_b: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/ru/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/ru/train docpairs
[query_id]    [doc_id_a]    [doc_id_b]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

"neumarco/ru/train/judged"

A version of msmarco-passage/train/judged, with the corpus translated to Russian.

queries
503K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/ru/train/judged")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/ru/train/judged queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
8.8M docs

Inherits docs from neumarco/ru

Language: ru

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/ru/train/judged")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/ru/train/judged docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
533K qrels

Inherits qrels from neumarco/ru/train

Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                           Count  %
1     Labeled by crowd worker as relevant  533K   100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/ru/train/judged")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/ru/train/judged qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docpairs
270M docpairs

Inherits docpairs from neumarco/ru/train

Document Pair type:
GenericDocPair: (namedtuple)
  1. query_id: str
  2. doc_id_a: str
  3. doc_id_b: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/ru/train/judged")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/ru/train/judged docpairs
[query_id]    [doc_id_a]    [doc_id_b]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

"neumarco/zh"

The msmarco-passage corpus, translated to Chinese.

docs
8.8M docs

Language: zh

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/zh")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/zh docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

"neumarco/zh/dev"

A version of msmarco-passage/dev, with the corpus translated to Chinese.

queries
101K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/zh/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/zh/dev queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
8.8M docs

Inherits docs from neumarco/zh

Language: zh

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/zh/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/zh/dev docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
59K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                           Count  %
1     Labeled by crowd worker as relevant  59K    100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/zh/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/zh/dev qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

"neumarco/zh/dev/judged"

A version of msmarco-passage/dev/judged, with the corpus translated to Chinese.

queries
56K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/zh/dev/judged")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/zh/dev/judged queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
8.8M docs

Inherits docs from neumarco/zh

Language: zh

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/zh/dev/judged")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/zh/dev/judged docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
59K qrels

Inherits qrels from neumarco/zh/dev

Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                           Count  %
1     Labeled by crowd worker as relevant  59K    100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/zh/dev/judged")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/zh/dev/judged qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

"neumarco/zh/dev/small"

A version of msmarco-passage/dev/small, with the corpus translated to Chinese.

queries
7.0K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/zh/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/zh/dev/small queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
8.8M docs

Inherits docs from neumarco/zh

Language: zh

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/zh/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/zh/dev/small docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
7.4K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                           Count  %
1     Labeled by crowd worker as relevant  7.4K   100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/zh/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/zh/dev/small qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

"neumarco/zh/train"

A version of msmarco-passage/train, with the corpus translated to Chinese.

queries
809K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/zh/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/zh/train queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
8.8M docs

Inherits docs from neumarco/zh

Language: zh

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/zh/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/zh/train docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
533K qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                           Count  %
1     Labeled by crowd worker as relevant  533K   100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/zh/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/zh/train qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docpairs
270M docpairs
Document Pair type:
GenericDocPair: (namedtuple)
  1. query_id: str
  2. doc_id_a: str
  3. doc_id_b: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/zh/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/zh/train docpairs
[query_id]    [doc_id_a]    [doc_id_b]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

"neumarco/zh/train/judged"

A version of msmarco-passage/train/judged, with the corpus translated to Chinese.

queries
503K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/zh/train/judged")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/zh/train/judged queries
[query_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docs
8.8M docs

Inherits docs from neumarco/zh

Language: zh

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/zh/train/judged")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/zh/train/judged docs
[doc_id]    [text]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

qrels
533K qrels

Inherits qrels from neumarco/zh/train

Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition                           Count  %
1     Labeled by crowd worker as relevant  533K   100.0%

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/zh/train/judged")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/zh/train/judged qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier

docpairs
270M docpairs

Inherits docpairs from neumarco/zh/train

Document Pair type:
GenericDocPair: (namedtuple)
  1. query_id: str
  2. doc_id_a: str
  3. doc_id_b: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("neumarco/zh/train/judged")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>

You can find more details about the Python API here.

CLI
ir_datasets export neumarco/zh/train/judged docpairs
[query_id]    [doc_id_a]    [doc_id_b]
...

You can find more details about the CLI here.

PyTerrier

No example available for PyTerrier
