ir_datasets : neuMARCO

import ir_datasets
dataset = ir_datasets.load("neumarco/fa")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/fa docs



[doc_id]    [text]
...

You can find more details about the CLI here.
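The export command above prints one record per line in the `[doc_id]    [text]` layout. A minimal sketch of reading that output back into Python follows. It assumes the two fields are separated by a single tab and that each record fits on one line; this page does not confirm either point. `parse_docs_tsv`, the local `Doc` type, and the sample rows are hypothetical.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class Doc(NamedTuple):
    # Mirrors the two exported fields: doc_id and text.
    doc_id: str
    text: str


def parse_docs_tsv(stream: TextIO) -> Iterator[Doc]:
    """Yield Doc records from lines shaped like doc_id<TAB>text."""
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the first tab only, so any later tab stays in the text.
        doc_id, _, text = line.partition("\t")
        yield Doc(doc_id, text)


# Made-up rows standing in for real export output.
sample = io.StringIO("0\tfirst passage\n1\tsecond passage\n")
docs = list(parse_docs_tsv(sample))
```

In practice the stream would be a file created by redirecting the export command's output.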

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.fa')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

{
  "docs": {
    "count": 8841823,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  }
}
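With 8,841,823 documents in the corpus, it is often useful to look at a handful before iterating over everything. The sketch below reads only the first few items from any document iterator. `head` and the local `GenericDoc` stand-in are hypothetical names, and the documents are synthetic so the snippet runs without downloading the dataset.

```python
from itertools import islice
from typing import Iterable, List, NamedTuple


class GenericDoc(NamedTuple):
    # Local stand-in with the same fields as the documented doc type.
    doc_id: str
    text: str


def head(docs: Iterable[GenericDoc], n: int = 3) -> List[GenericDoc]:
    """Return the first n documents, reading only n items from the iterator."""
    return list(islice(docs, n))


# Synthetic documents; with ir_datasets installed, dataset.docs_iter()
# could be passed in their place.
fake_docs = (GenericDoc(str(i), f"passage {i}") for i in range(8))
first = head(fake_docs, n=3)
```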

`"neumarco/fa/dev"`

A version of msmarco-passage/dev, with the corpus translated to Persian (Farsi).

101K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/fa/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/fa/dev queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.neumarco.fa.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

8.8M docs

Inherits docs from neumarco/fa

Language: fa

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/fa/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/fa/dev docs



[doc_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.fa.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

59K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Labeled by crowd worker as relevant	`59K`	100.0%

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/fa/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/fa/dev qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.neumarco.fa.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
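Qrels arrive as a flat stream of `TrecQrel` records, while evaluation code usually wants the relevant documents grouped per query. A minimal sketch of that grouping follows. `relevant_docs_by_query` and the local `TrecQrel` stand-in are hypothetical, and the judgments are synthetic so the snippet runs without the dataset.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set


class TrecQrel(NamedTuple):
    # Local stand-in with the same fields as the documented qrel type.
    query_id: str
    doc_id: str
    relevance: int
    iteration: str


def relevant_docs_by_query(
    qrels: Iterable[TrecQrel], min_rel: int = 1
) -> Dict[str, Set[str]]:
    """Map each query_id to the set of doc_ids judged at or above min_rel."""
    by_query: Dict[str, Set[str]] = defaultdict(set)
    for qrel in qrels:
        if qrel.relevance >= min_rel:
            by_query[qrel.query_id].add(qrel.doc_id)
    return dict(by_query)


# Synthetic judgments; with ir_datasets installed, dataset.qrels_iter()
# could be passed in their place.
fake_qrels = [
    TrecQrel("q1", "d7", 1, "0"),
    TrecQrel("q1", "d9", 1, "0"),
    TrecQrel("q2", "d3", 1, "0"),
]
lookup = relevant_docs_by_query(fake_qrels)
```

Because every judgment in this dataset has relevance 1, the `min_rel` threshold only matters if the same helper is reused on a collection with graded labels.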

{
  "docs": {
    "count": 8841823,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 101093
  },
  "qrels": {
    "count": 59273,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 59273
        }
      }
    }
  }
}

`"neumarco/fa/dev/judged"`

A version of msmarco-passage/dev/judged, with the corpus translated to Persian (Farsi).

56K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/fa/dev/judged")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/fa/dev/judged queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.neumarco.fa.dev.judged.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

8.8M docs

Inherits docs from neumarco/fa

Language: fa

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/fa/dev/judged")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/fa/dev/judged docs



[doc_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.fa.dev.judged')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

59K qrels

Inherits qrels from neumarco/fa/dev

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Labeled by crowd worker as relevant	`59K`	100.0%

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/fa/dev/judged")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/fa/dev/judged qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.neumarco.fa.dev.judged.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

{
  "docs": {
    "count": 8841823,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 55578
  },
  "qrels": {
    "count": 59273,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 59273
        }
      }
    }
  }
}
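The counts above show 55,578 queries here against 101,093 in neumarco/fa/dev, with the same 59,273 qrels. That pattern, together with the subset's name, suggests the subset keeps only queries that have at least one judgment. This reading is an assumption, not something the page states. Under that assumption, the filtering could be sketched as follows; `judged_queries` and the local record types are hypothetical, and the data is synthetic.

```python
from typing import Iterable, List, NamedTuple


class GenericQuery(NamedTuple):
    # Local stand-in with the same fields as the documented query type.
    query_id: str
    text: str


class TrecQrel(NamedTuple):
    # Local stand-in with the same fields as the documented qrel type.
    query_id: str
    doc_id: str
    relevance: int
    iteration: str


def judged_queries(
    queries: Iterable[GenericQuery], qrels: Iterable[TrecQrel]
) -> List[GenericQuery]:
    """Keep the queries whose query_id appears in at least one qrel."""
    judged_ids = {qrel.query_id for qrel in qrels}
    return [query for query in queries if query.query_id in judged_ids]


# Synthetic data: q2 has no judgment and is dropped.
all_queries = [
    GenericQuery("q1", "first question"),
    GenericQuery("q2", "second question"),
    GenericQuery("q3", "third question"),
]
some_qrels = [TrecQrel("q1", "d1", 1, "0"), TrecQrel("q3", "d2", 1, "0")]
kept = judged_queries(all_queries, some_qrels)
```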

`"neumarco/fa/dev/small"`

A version of msmarco-passage/dev/small, with the corpus translated to Persian (Farsi).

7.0K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/fa/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/fa/dev/small queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.neumarco.fa.dev.small.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

8.8M docs

Inherits docs from neumarco/fa

Language: fa

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/fa/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/fa/dev/small docs



[doc_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.fa.dev.small')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

7.4K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Labeled by crowd worker as relevant	`7.4K`	100.0%

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/fa/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/fa/dev/small qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.neumarco.fa.dev.small.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

{
  "docs": {
    "count": 8841823,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 6980
  },
  "qrels": {
    "count": 7437,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 7437
        }
      }
    }
  }
}

`"neumarco/fa/train"`

A version of msmarco-passage/train, with the corpus translated to Persian (Farsi).

809K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/fa/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/fa/train queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.neumarco.fa.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

8.8M docs

Inherits docs from neumarco/fa

Language: fa

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/fa/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/fa/train docs



[doc_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.fa.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

533K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Labeled by crowd worker as relevant	`533K`	100.0%

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/fa/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/fa/train qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.neumarco.fa.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

270M docpairs

Document Pair type:

GenericDocPair: (namedtuple)

query_id: str
doc_id_a: str
doc_id_b: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/fa/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/fa/train docpairs



[query_id]    [doc_id_a]    [doc_id_b]
...

You can find more details about the CLI here.

No example available for PyTerrier

import datamaestro  # assumes experimaestro-ir is installed

docpairs = datamaestro.prepare_dataset('irds.neumarco.fa.train.docpairs')
next(docpairs.iter())  # Display the first triplet

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets.
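A docpair carries only identifiers, so training code typically has to join each pair with the query and document text. A minimal sketch of that join follows, using plain dictionaries as the text lookups. `text_triples` and the local `GenericDocPair` stand-in are hypothetical, and the data is synthetic so the snippet runs without the dataset.

```python
from typing import Iterable, Iterator, Mapping, NamedTuple, Tuple


class GenericDocPair(NamedTuple):
    # Local stand-in with the same fields as the documented docpair type.
    query_id: str
    doc_id_a: str
    doc_id_b: str


def text_triples(
    docpairs: Iterable[GenericDocPair],
    query_text: Mapping[str, str],
    doc_text: Mapping[str, str],
) -> Iterator[Tuple[str, str, str]]:
    """Yield (query text, text of doc_id_a, text of doc_id_b) for each pair."""
    for pair in docpairs:
        yield (
            query_text[pair.query_id],
            doc_text[pair.doc_id_a],
            doc_text[pair.doc_id_b],
        )


# Synthetic lookups and pairs standing in for the real collections.
queries = {"q1": "example question"}
passages = {"d1": "passage one", "d2": "passage two", "d3": "passage three"}
pairs = [GenericDocPair("q1", "d1", "d2"), GenericDocPair("q1", "d1", "d3")]
triples = list(text_triples(pairs, queries, passages))
```

The generator is lazy on purpose: with roughly 270M docpairs, materializing every triple at once is unlikely to fit in memory, and a dictionary holding all 8.8M passages may also be too large on small machines.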

{
  "docs": {
    "count": 8841823,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 808731
  },
  "qrels": {
    "count": 532761,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 532761
        }
      }
    }
  },
  "docpairs": {
    "count": 269919004
  }
}
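The metadata block above can be checked mechanically. The sketch below parses it and confirms that the per-value relevance counts add up to the reported qrels count. The JSON literal is copied from this page; `section_counts` is a hypothetical helper.

```python
import json
from typing import Dict

# Metadata for neumarco/fa/train, copied from the block above.
metadata = json.loads("""
{
  "docs": {"count": 8841823,
           "fields": {"doc_id": {"max_len": 7, "common_prefix": ""}}},
  "queries": {"count": 808731},
  "qrels": {"count": 532761,
            "fields": {"relevance": {"counts_by_value": {"1": 532761}}}},
  "docpairs": {"count": 269919004}
}
""")


def section_counts(meta: Dict[str, dict]) -> Dict[str, int]:
    """Collect the top-level count of every section (docs, queries, ...)."""
    return {name: section["count"] for name, section in meta.items()}


counts = section_counts(metadata)

# The relevance histogram should account for every qrel.
by_value = metadata["qrels"]["fields"]["relevance"]["counts_by_value"]
histogram_total = sum(by_value.values())
```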

`"neumarco/fa/train/judged"`

A version of msmarco-passage/train/judged, with the corpus translated to Persian (Farsi).

503K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/fa/train/judged")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/fa/train/judged queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.neumarco.fa.train.judged.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

8.8M docs

Inherits docs from neumarco/fa

Language: fa

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/fa/train/judged")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/fa/train/judged docs



[doc_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.fa.train.judged')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

533K qrels

Inherits qrels from neumarco/fa/train

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Labeled by crowd worker as relevant	`533K`	100.0%

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/fa/train/judged")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/fa/train/judged qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.neumarco.fa.train.judged.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

270M docpairs

Inherits docpairs from neumarco/fa/train

Document Pair type:

GenericDocPair: (namedtuple)

query_id: str
doc_id_a: str
doc_id_b: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/fa/train/judged")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/fa/train/judged docpairs



[query_id]    [doc_id_a]    [doc_id_b]
...

You can find more details about the CLI here.

No example available for PyTerrier

import datamaestro  # assumes experimaestro-ir is installed

docpairs = datamaestro.prepare_dataset('irds.neumarco.fa.train.judged.docpairs')
next(docpairs.iter())  # Display the first triplet

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets.

{
  "docs": {
    "count": 8841823,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 502939
  },
  "qrels": {
    "count": 532761,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 532761
        }
      }
    }
  },
  "docpairs": {
    "count": 269919004
  }
}

`"neumarco/ru"`

The msmarco-passage corpus, translated to Russian.

docs

8.8M docs

Language: ru

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/ru")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/ru docs



[doc_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.ru')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

{
  "docs": {
    "count": 8841823,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  }
}
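The Russian corpus reports the same document count (8,841,823) as neumarco/fa, and both are described as translations of msmarco-passage. If the translations also share `doc_id` values, which this page does not state and is assumed here, parallel passages can be paired by identifier. A rough sketch with synthetic data follows; `align_by_doc_id` and the local `GenericDoc` stand-in are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class GenericDoc(NamedTuple):
    # Local stand-in with the same fields as the documented doc type.
    doc_id: str
    text: str


def align_by_doc_id(
    docs_a: Iterable[GenericDoc], docs_b: Iterable[GenericDoc]
) -> Iterator[Tuple[str, str, str]]:
    """Yield (doc_id, text_a, text_b) for ids present in both collections."""
    # Assumption: both collections use the same doc_id for the same passage.
    texts_b = {doc.doc_id: doc.text for doc in docs_b}
    for doc in docs_a:
        if doc.doc_id in texts_b:
            yield doc.doc_id, doc.text, texts_b[doc.doc_id]


# Placeholder texts standing in for Persian and Russian passages.
fa_docs = [GenericDoc("0", "fa passage 0"), GenericDoc("1", "fa passage 1")]
ru_docs = [GenericDoc("1", "ru passage 1"), GenericDoc("2", "ru passage 2")]
aligned = list(align_by_doc_id(fa_docs, ru_docs))
```

The sketch holds one whole collection in a dictionary, which is fine for a sample but heavy for the full corpus; a disk-backed lookup would be the more realistic choice at that scale.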

`"neumarco/ru/dev"`

A version of msmarco-passage/dev, with the corpus translated to Russian.

101K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/ru/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/ru/dev queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.neumarco.ru.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

8.8M docs

Inherits docs from neumarco/ru

Language: ru

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/ru/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/ru/dev docs



[doc_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.ru.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

59K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Labeled by crowd worker as relevant	`59K`	100.0%

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/ru/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/ru/dev qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.neumarco.ru.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

{
  "docs": {
    "count": 8841823,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 101093
  },
  "qrels": {
    "count": 59273,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 59273
        }
      }
    }
  }
}

`"neumarco/ru/dev/judged"`

A version of msmarco-passage/dev/judged, with the corpus translated to Russian.

56K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/ru/dev/judged")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/ru/dev/judged queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.neumarco.ru.dev.judged.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

8.8M docs

Inherits docs from neumarco/ru

Language: ru

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/ru/dev/judged")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/ru/dev/judged docs



[doc_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.ru.dev.judged')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

59K qrels

Inherits qrels from neumarco/ru/dev

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Labeled by crowd worker as relevant	`59K`	100.0%

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/ru/dev/judged")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/ru/dev/judged qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.neumarco.ru.dev.judged.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

{
  "docs": {
    "count": 8841823,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 55578
  },
  "qrels": {
    "count": 59273,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 59273
        }
      }
    }
  }
}

`"neumarco/ru/dev/small"`

A version of msmarco-passage/dev/small, with the corpus translated to Russian.

7.0K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/ru/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/ru/dev/small queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.neumarco.ru.dev.small.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

8.8M docs

Inherits docs from neumarco/ru

Language: ru

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/ru/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/ru/dev/small docs



[doc_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.ru.dev.small')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

7.4K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Labeled by crowd worker as relevant	`7.4K`	100.0%

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/ru/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/ru/dev/small qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.neumarco.ru.dev.small.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

{
  "docs": {
    "count": 8841823,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 6980
  },
  "qrels": {
    "count": 7437,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 7437
        }
      }
    }
  }
}

`"neumarco/ru/train"`

A version of msmarco-passage/train, with the corpus translated to Russian.

809K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/ru/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/ru/train queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.neumarco.ru.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

8.8M docs

Inherits docs from neumarco/ru

Language: ru

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/ru/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/ru/train docs



[doc_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.ru.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

533K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Labeled by crowd worker as relevant	`533K`	100.0%

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/ru/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/ru/train qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.neumarco.ru.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

270M docpairs

Document Pair type:

GenericDocPair: (namedtuple)

query_id: str
doc_id_a: str
doc_id_b: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/ru/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/ru/train docpairs



[query_id]    [doc_id_a]    [doc_id_b]
...

You can find more details about the CLI here.

No example available for PyTerrier

import datamaestro  # assumes experimaestro-ir is installed

docpairs = datamaestro.prepare_dataset('irds.neumarco.ru.train.docpairs')
next(docpairs.iter())  # Display the first triplet

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets.

{
  "docs": {
    "count": 8841823,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 808731
  },
  "qrels": {
    "count": 532761,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 532761
        }
      }
    }
  },
  "docpairs": {
    "count": 269919004
  }
}
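The metadata block above can be checked for internal consistency. The sketch below parses a trimmed copy of it with the standard `json` module and verifies that the per-level relevance counts add up to the total qrels count. The variable names are illustrative only.

```python
import json

# Trimmed copy of the metadata shown above.
metadata_json = """
{
  "queries": {"count": 808731},
  "qrels": {
    "count": 532761,
    "fields": {"relevance": {"counts_by_value": {"1": 532761}}}
  },
  "docpairs": {"count": 269919004}
}
"""

metadata = json.loads(metadata_json)
qrels_meta = metadata["qrels"]
counts_by_value = qrels_meta["fields"]["relevance"]["counts_by_value"]

# The per-level counts should sum to the overall qrels count.
counts_consistent = sum(counts_by_value.values()) == qrels_meta["count"]

# Share of each relevance level, matching the "%" column of the table.
share = {level: n / qrels_meta["count"] for level, n in counts_by_value.items()}
print(counts_consistent, share)
```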

`"neumarco/ru/train/judged"`

A version of msmarco-passage/train/judged, with the corpus translated to Russian.

503K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/ru/train/judged")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/ru/train/judged queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.neumarco.ru.train.judged.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
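This subset has 503K queries against 809K in neumarco/ru/train, while the qrels count is unchanged. That is consistent with keeping only queries that have at least one relevance judgment. A minimal sketch of such a filter, under that assumption, follows. The namedtuples are local stand-ins, `judged_only` is a hypothetical helper, and the sample records are invented.

```python
from collections import namedtuple

# Local stand-ins mirroring the GenericQuery and TrecQrel fields.
GenericQuery = namedtuple("GenericQuery", ["query_id", "text"])
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def judged_only(queries, qrels):
    """Keep the queries whose query_id appears in at least one qrel."""
    judged_ids = {qrel.query_id for qrel in qrels}
    return [query for query in queries if query.query_id in judged_ids]


# Hypothetical sample data; q2 has no judgment and should be dropped.
queries = [
    GenericQuery("q1", "first query"),
    GenericQuery("q2", "second query"),
    GenericQuery("q3", "third query"),
]
qrels = [TrecQrel("q1", "d1", 1, "0"), TrecQrel("q3", "d9", 1, "0")]

kept = judged_only(queries, qrels)
print([query.query_id for query in kept])
```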

docs

8.8M docs

Inherits docs from neumarco/ru

Language: ru

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/ru/train/judged")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/ru/train/judged docs



[doc_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.ru.train.judged')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

533K qrels

Inherits qrels from neumarco/ru/train

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Labeled by crowd worker as relevant	`533K`	100.0%

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/ru/train/judged")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/ru/train/judged qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.neumarco.ru.train.judged.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

270M docpairs

Inherits docpairs from neumarco/ru/train

Document Pair type:

GenericDocPair: (namedtuple)

query_id: str
doc_id_a: str
doc_id_b: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/ru/train/judged")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/ru/train/judged docpairs



[query_id]    [doc_id_a]    [doc_id_b]
...

You can find more details about the CLI here.

No example available for PyTerrier

import datamaestro  # Assumes experimaestro-ir is installed

docpairs = datamaestro.prepare_dataset('irds.neumarco.ru.train.judged.docpairs')
next(docpairs.iter())  # Display the first triplet

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets.

{
  "docs": {
    "count": 8841823,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 502939
  },
  "qrels": {
    "count": 532761,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 532761
        }
      }
    }
  },
  "docpairs": {
    "count": 269919004
  }
}

`"neumarco/zh"`

The msmarco-passage corpus, translated to Chinese.

docs

8.8M docs

Language: zh

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/zh")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/zh docs



[doc_id]    [text]
...

You can find more details about the CLI here.
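The exported docs can be read back with a few lines of Python. The sketch below assumes the export is tab-separated with one `[doc_id]`, `[text]` record per line, as the output template above suggests. `read_docs_tsv` is a hypothetical helper and the sample lines are invented.

```python
import io


def read_docs_tsv(lines):
    """Yield (doc_id, text) pairs from tab-separated export lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the first tab only, so tabs inside the text survive.
        doc_id, sep, text = line.partition("\t")
        if not sep:
            continue  # skip malformed lines that contain no tab
        yield doc_id, text


# Hypothetical sample standing in for the exported file.
sample_export = io.StringIO("0\t第一段文字\n1\t第二段文字\n")
docs = list(read_docs_tsv(sample_export))
print(docs)
```

The same function accepts an open file handle, since it only needs an iterable of lines.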

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.zh')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

{
  "docs": {
    "count": 8841823,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  }
}
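The `doc_id` field statistics above (`max_len` and `common_prefix`) can be reproduced from any list of identifiers. The sketch below is one way to compute them. `doc_id_stats` is a hypothetical helper and the sample IDs are illustrative, not taken from the corpus.

```python
import os


def doc_id_stats(doc_ids):
    """Return the longest ID length and the prefix shared by all IDs."""
    doc_ids = list(doc_ids)
    return {
        "max_len": max(len(doc_id) for doc_id in doc_ids),
        # commonprefix compares strings character by character.
        "common_prefix": os.path.commonprefix(doc_ids),
    }


# Illustrative IDs only; the longest one has 7 characters.
stats = doc_id_stats(["0", "42", "8841822"])
print(stats)
```

An empty `common_prefix` means the IDs do not all begin with the same characters, so no prefix can be stripped to shorten them.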

`"neumarco/zh/dev"`

A version of msmarco-passage/dev, with the corpus translated to Chinese.

101K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/zh/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/zh/dev queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.neumarco.zh.dev.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

8.8M docs

Inherits docs from neumarco/zh

Language: zh

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/zh/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/zh/dev docs



[doc_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.zh.dev')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

59K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Labeled by crowd worker as relevant	`59K`	100.0%

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/zh/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/zh/dev qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.neumarco.zh.dev.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
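The relevance-level counts in the table above can be recomputed from the qrels themselves. The sketch below tallies the `relevance` field with `collections.Counter`. The `TrecQrel` namedtuple is a local stand-in, `relevance_counts` is a hypothetical helper, and the sample records are invented.

```python
from collections import Counter, namedtuple

# Local stand-in mirroring the TrecQrel fields listed above.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def relevance_counts(qrels):
    """Count qrels per relevance level, keyed by the level as a string."""
    counts = Counter(qrel.relevance for qrel in qrels)
    return {str(level): n for level, n in sorted(counts.items())}


# Hypothetical sample records; every judgment has relevance 1,
# as in this dataset.
sample_qrels = [
    TrecQrel("q1", "d1", 1, "0"),
    TrecQrel("q1", "d2", 1, "0"),
    TrecQrel("q2", "d3", 1, "0"),
]
counts_by_value = relevance_counts(sample_qrels)
print(counts_by_value)
```

The string keys mirror the shape of a `counts_by_value` mapping in JSON, where object keys are always strings.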

{
  "docs": {
    "count": 8841823,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 101093
  },
  "qrels": {
    "count": 59273,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 59273
        }
      }
    }
  }
}

`"neumarco/zh/dev/judged"`

A version of msmarco-passage/dev/judged, with the corpus translated to Chinese.

56K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/zh/dev/judged")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/zh/dev/judged queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.neumarco.zh.dev.judged.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

8.8M docs

Inherits docs from neumarco/zh

Language: zh

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/zh/dev/judged")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/zh/dev/judged docs



[doc_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.zh.dev.judged')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

59K qrels

Inherits qrels from neumarco/zh/dev

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Labeled by crowd worker as relevant	`59K`	100.0%

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/zh/dev/judged")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/zh/dev/judged qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.neumarco.zh.dev.judged.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

{
  "docs": {
    "count": 8841823,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 55578
  },
  "qrels": {
    "count": 59273,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 59273
        }
      }
    }
  }
}

`"neumarco/zh/dev/small"`

A version of msmarco-passage/dev/small, with the corpus translated to Chinese.

7.0K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/zh/dev/small")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/zh/dev/small queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.neumarco.zh.dev.small.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

8.8M docs

Inherits docs from neumarco/zh

Language: zh

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/zh/dev/small")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/zh/dev/small docs



[doc_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.zh.dev.small')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

7.4K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Labeled by crowd worker as relevant	`7.4K`	100.0%

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/zh/dev/small")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/zh/dev/small qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.neumarco.zh.dev.small.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

{
  "docs": {
    "count": 8841823,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 6980
  },
  "qrels": {
    "count": 7437,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 7437
        }
      }
    }
  }
}
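The query and qrels counts reported for the three neumarco/zh/dev variants can be compared as judgments per query. The sketch below does that arithmetic using the counts from the metadata blocks on this page. The dictionary layout is illustrative only.

```python
# (queries count, qrels count) as reported in the metadata blocks.
dev_counts = {
    "neumarco/zh/dev": (101093, 59273),
    "neumarco/zh/dev/judged": (55578, 59273),
    "neumarco/zh/dev/small": (6980, 7437),
}

# Average number of qrels per query for each variant.
qrels_per_query = {
    name: n_qrels / n_queries
    for name, (n_queries, n_qrels) in dev_counts.items()
}

for name, ratio in qrels_per_query.items():
    print(f"{name}: {ratio:.3f} qrels per query")
```

A ratio below 1 for the full dev set reflects that many of its queries have no judgment at all, while the judged and small variants average slightly more than one judgment per query.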

`"neumarco/zh/train"`

A version of msmarco-passage/train, with the corpus translated to Chinese.

809K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/zh/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/zh/train queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.neumarco.zh.train.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

8.8M docs

Inherits docs from neumarco/zh

Language: zh

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/zh/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/zh/train docs



[doc_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.zh.train')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

533K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Labeled by crowd worker as relevant	`533K`	100.0%

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/zh/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/zh/train qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.neumarco.zh.train.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.

270M docpairs

Document Pair type:

GenericDocPair: (namedtuple)

query_id: str
doc_id_a: str
doc_id_b: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/zh/train")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/zh/train docpairs



[query_id]    [doc_id_a]    [doc_id_b]
...

You can find more details about the CLI here.

No example available for PyTerrier

import datamaestro  # Assumes experimaestro-ir is installed

docpairs = datamaestro.prepare_dataset('irds.neumarco.zh.train.docpairs')
next(docpairs.iter())  # Display the first triplet

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets.
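With 270M docpairs, looping over the whole iterator just to inspect a few records is wasteful. The sketch below uses `itertools.islice` to take only the first few items from a lazy iterator. `fake_docpairs` is a hypothetical generator standing in for `dataset.docpairs_iter()`, and the IDs it yields are invented.

```python
from collections import namedtuple
from itertools import islice

# Local stand-in mirroring the GenericDocPair fields listed above.
GenericDocPair = namedtuple("GenericDocPair", ["query_id", "doc_id_a", "doc_id_b"])

# Tracks how many records the generator actually produced.
counter = {"produced": 0}


def fake_docpairs():
    """Hypothetical stand-in for a very long docpairs iterator."""
    for i in range(1_000_000):
        counter["produced"] += 1
        yield GenericDocPair(f"q{i}", f"d{i}a", f"d{i}b")


# Only the first three records are pulled from the iterator.
first_three = list(islice(fake_docpairs(), 3))
print(first_three[0], counter["produced"])
```

Because the iterator is consumed lazily, the remaining records are never generated.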

{
  "docs": {
    "count": 8841823,
    "fields": {
      "doc_id": {
        "max_len": 7,
        "common_prefix": ""
      }
    }
  },
  "queries": {
    "count": 808731
  },
  "qrels": {
    "count": 532761,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 532761
        }
      }
    }
  },
  "docpairs": {
    "count": 269919004
  }
}

`"neumarco/zh/train/judged"`

A version of msmarco-passage/train/judged, with the corpus translated to Chinese.

503K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/zh/train/judged")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/zh/train/judged queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
topics = prepare_dataset('irds.neumarco.zh.train.judged.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.

docs

8.8M docs

Inherits docs from neumarco/zh

Language: zh

Document type:

GenericDoc: (namedtuple)

doc_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/zh/train/judged")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/zh/train/judged docs



[doc_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.neumarco.zh.train.judged')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.

qrels

533K qrels

Inherits qrels from neumarco/zh/train

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
1	Labeled by crowd worker as relevant	`533K`	100.0%

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/zh/train/judged")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/zh/train/judged qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.neumarco.zh.train.judged.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
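The TSV export above lists columns as `[query_id]`, `[doc_id]`, `[relevance]`, `[iteration]`, whereas a standard TREC qrels file orders them as query, iteration, document, relevance. The sketch below reorders one exported line, assuming the four columns are tab-separated in the order shown. `to_trec_qrels_line` is a hypothetical helper and the sample line is invented.

```python
def to_trec_qrels_line(tsv_line):
    """Reorder one exported TSV qrels line into TREC qrels column order."""
    query_id, doc_id, relevance, iteration = tsv_line.rstrip("\n").split("\t")
    # TREC qrels order: query, iteration, document, relevance.
    return f"{query_id} {iteration} {doc_id} {relevance}"


# Hypothetical exported line; the IDs are invented for illustration.
trec_line = to_trec_qrels_line("q1\td7\t1\t0\n")
print(trec_line)
```

A line with a different number of columns raises `ValueError` on unpacking, which surfaces malformed input early.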

270M docpairs

Inherits docpairs from neumarco/zh/train

Document Pair type:

GenericDocPair: (namedtuple)

query_id: str
doc_id_a: str
doc_id_b: str

Examples:

import ir_datasets
dataset = ir_datasets.load("neumarco/zh/train/judged")
for docpair in dataset.docpairs_iter():
    docpair # namedtuple<query_id, doc_id_a, doc_id_b>

You can find more details about the Python API here.

CLI

ir_datasets export neumarco/zh/train/judged docpairs



[query_id]    [doc_id_a]    [doc_id_b]
...

You can find more details about the CLI here.

No example available for PyTerrier

import datamaestro  # Assumes experimaestro-ir is installed

docpairs = datamaestro.prepare_dataset('irds.neumarco.zh.train.judged.docpairs')
next(docpairs.iter())  # Display the first triplet

This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about TrainingTriplets.