Github: datasets/antique.py

ir_datasets: ANTIQUE

Index
  1. antique
  2. antique/test
  3. antique/test/non-offensive
  4. antique/train
  5. antique/train/split200-train
  6. antique/train/split200-valid

"antique"

"ANTIQUE is a non-factoid question answering dataset based on the questions and answers of Yahoo! Webscope L6."

  • Documents: Short answer passages (from Yahoo Answers)
  • Queries: Natural language questions (from Yahoo Answers)
  • Dataset Paper
docs | Citation | Metadata
404K docs

Language: en

Document type:
GenericDoc: (namedtuple)
  1. doc_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("antique")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text>

You can find more details about the Python API here.
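
Because each document is a namedtuple with `doc_id` and `text` fields, records can be read by attribute and collected into a lookup table. The following is a minimal sketch under stated assumptions: `GenericDoc` here is a locally defined stand-in that mirrors the two fields listed above, `build_text_lookup` is a hypothetical helper name, and the sample records are made up so the snippet runs without downloading the corpus. With ir_datasets installed, the same function could presumably be passed `dataset.docs_iter()`.

```python
from typing import Dict, Iterable, NamedTuple


class GenericDoc(NamedTuple):
    """Local stand-in mirroring the (doc_id, text) fields listed above."""
    doc_id: str
    text: str


def build_text_lookup(docs: Iterable[GenericDoc]) -> Dict[str, str]:
    """Map each doc_id to its text. Keeps every document in memory."""
    return {doc.doc_id: doc.text for doc in docs}


# Made-up records for illustration only; these are not real ANTIQUE data.
sample_docs = [
    GenericDoc("d1", "first answer passage"),
    GenericDoc("d2", "second answer passage"),
]

lookup = build_text_lookup(sample_docs)
print(lookup["d2"])
```

An in-memory dictionary is the simplest option for a sketch. For the full collection of roughly 404K documents, the library's own `docs_store()` lookup may be a better fit than building a dictionary by hand.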


"antique/test"

Official test set of the ANTIQUE dataset.

queries | docs | qrels | Citation | Metadata
200 queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("antique/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>



"antique/test/non-offensive"

The antique/test queries, minus those that the authors of ANTIQUE deemed "offensive (and noisy)."
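
The relationship between this subset and antique/test can be pictured as a filter over the test queries. The sketch below is illustrative only: `GenericQuery` is a local stand-in for the query type, `drop_excluded` is a hypothetical helper, and the query IDs and exclusion set are made up. The real exclusion list comes from the ANTIQUE authors and is already applied when this subset is loaded; going by the listed counts (200 queries in antique/test, 176 here), it covers 24 queries.

```python
from typing import Iterable, Iterator, NamedTuple, Set


class GenericQuery(NamedTuple):
    """Local stand-in mirroring the (query_id, text) fields of the query type."""
    query_id: str
    text: str


def drop_excluded(
    queries: Iterable[GenericQuery], excluded_ids: Set[str]
) -> Iterator[GenericQuery]:
    """Yield only the queries whose ID is not in the exclusion set."""
    for query in queries:
        if query.query_id not in excluded_ids:
            yield query


# Made-up queries and exclusion set; not the real ANTIQUE data.
test_queries = [
    GenericQuery("q1", "why is the sky blue?"),
    GenericQuery("q2", "placeholder for an excluded query"),
    GenericQuery("q3", "how do birds navigate?"),
]
hypothetical_excluded = {"q2"}

kept = list(drop_excluded(test_queries, hypothetical_excluded))
print([q.query_id for q in kept])
```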

queries | docs | qrels | Citation | Metadata
176 queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("antique/test/non-offensive")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>



"antique/train"

Official train set of the ANTIQUE dataset.

queries | docs | qrels | Citation | Metadata
2.4K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("antique/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>



"antique/train/split200-train"

antique/train without the 200 queries used by antique/train/split200-valid.

queries | docs | qrels | Citation | Metadata
2.2K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("antique/train/split200-train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>



"antique/train/split200-valid"

A held-out subset of 200 queries from antique/train. Use in conjunction with antique/train/split200-train.
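
Taken together, the two split200 descriptions imply that the subsets share no queries and jointly cover antique/train. The following sketch shows one way such a property could be checked. It is written under assumptions: `GenericQuery` is a local stand-in for the query type, `is_clean_split` is a hypothetical helper, and the query records are made up so the snippet runs without downloading anything.

```python
from typing import Iterable, NamedTuple


class GenericQuery(NamedTuple):
    """Local stand-in mirroring the (query_id, text) fields of the query type."""
    query_id: str
    text: str


def is_clean_split(
    full: Iterable[GenericQuery],
    train_part: Iterable[GenericQuery],
    valid_part: Iterable[GenericQuery],
) -> bool:
    """Return True if the two parts are disjoint and together equal the full set."""
    full_ids = {q.query_id for q in full}
    train_ids = {q.query_id for q in train_part}
    valid_ids = {q.query_id for q in valid_part}
    return train_ids.isdisjoint(valid_ids) and (train_ids | valid_ids) == full_ids


# Made-up queries for illustration only; not real ANTIQUE data.
all_train = [GenericQuery(f"q{i}", f"question {i}") for i in range(5)]
split_train = all_train[:3]
split_valid = all_train[3:]

print(is_clean_split(all_train, split_train, split_valid))
```

With ir_datasets installed, the three arguments could presumably be the `queries_iter()` results of antique/train, antique/train/split200-train, and antique/train/split200-valid respectively.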

queries | docs | qrels | Citation | Metadata
200 queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("antique/train/split200-valid")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
