Github: datasets/trec_spanish.py

ir_datasets: TREC Spanish

Index
  1. trec-spanish
  2. trec-spanish/trec3
  3. trec-spanish/trec4

"trec-spanish"

A collection of Spanish-language news articles, used for multilingual evaluation in TREC 3 and TREC 4.

The document collection is distributed by the Linguistic Data Consortium as LDC2000T51.

docs

Language: es

Document type:
TrecDoc: (namedtuple)
  1. doc_id: str
  2. text: str
  3. marked_up_doc: str

Example

import ir_datasets
dataset = ir_datasets.load('trec-spanish')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, marked_up_doc>
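The loop above yields `TrecDoc` tuples one at a time. If documents need to be looked up by ID (for example, to show the text of retrieved results), one option is to build a dictionary from the iterator. The following is a minimal sketch under stated assumptions: the `TrecDoc` class is a local stand-in with the field names listed above (not the library's own class), `build_doc_index` is a hypothetical helper, and the sample documents are made up. Keeping the whole collection in memory may be costly.

```python
from collections import namedtuple
from typing import Dict, Iterable

# Local stand-in with the same fields as ir_datasets' TrecDoc (assumption: illustration only).
TrecDoc = namedtuple("TrecDoc", ["doc_id", "text", "marked_up_doc"])


def build_doc_index(docs: Iterable[TrecDoc]) -> Dict[str, str]:
    """Hypothetical helper: map each doc_id to its plain text."""
    index: Dict[str, str] = {}
    for doc in docs:
        index[doc.doc_id] = doc.text
    return index


# Toy documents (made up, not taken from LDC2000T51).
sample = [
    TrecDoc("DOC-A", "El presidente visitó Madrid.", "<DOC>...</DOC>"),
    TrecDoc("DOC-B", "La economía creció este año.", "<DOC>...</DOC>"),
]
index = build_doc_index(sample)
print(index["DOC-B"])  # La economía creció este año.
```

With the real collection, `dataset.docs_iter()` would be passed in place of `sample`.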

Citation
bibtex:
@misc{LDC2000T51,
  title={TREC Spanish LDC2000T51},
  author={Rogers, Willie},
  year={2000},
  url={https://catalog.ldc.upenn.edu/LDC2000T51},
  publisher={Linguistic Data Consortium}
}

"trec-spanish/trec3"

Spanish benchmark from TREC 3.

queries

Language: multiple/other/unknown

Query type:
TrecSpanish3Query: (namedtuple)
  1. query_id: str
  2. title_es: str
  3. title_en: str
  4. description_es: str
  5. description_en: str
  6. narrative_es: str
  7. narrative_en: str

Example

import ir_datasets
dataset = ir_datasets.load('trec-spanish/trec3')
for query in dataset.queries_iter():
    query # namedtuple<query_id, title_es, title_en, description_es, description_en, narrative_es, narrative_en>
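Each TREC 3 query carries parallel Spanish (`_es`) and English (`_en`) versions of its title, description, and narrative. A retrieval run typically needs a single query string in one language; the sketch below shows one way to assemble it. It assumes a local stand-in namedtuple mirroring the fields above, and `query_text` is a hypothetical helper; which fields to combine (here title plus description by default) is an experimental choice, not something this dataset prescribes. The query values are toy data.

```python
from collections import namedtuple
from typing import Sequence

# Local stand-in mirroring the TrecSpanish3Query fields listed above (assumption).
TrecSpanish3Query = namedtuple(
    "TrecSpanish3Query",
    ["query_id", "title_es", "title_en", "description_es",
     "description_en", "narrative_es", "narrative_en"],
)


def query_text(query, lang: str = "es",
               fields: Sequence[str] = ("title", "description")) -> str:
    """Hypothetical helper: join the chosen fields in one language ('es' or 'en')."""
    if lang not in ("es", "en"):
        raise ValueError(f"unsupported language: {lang!r}")
    parts = [getattr(query, f"{field}_{lang}") for field in fields]
    # Skip empty fields so the result has no stray spaces.
    return " ".join(p.strip() for p in parts if p and p.strip())


# Toy query (made up, not an actual TREC 3 topic).
q = TrecSpanish3Query("Q1", "Inflación", "Inflation",
                      "Documentos sobre la inflación.", "Documents about inflation.",
                      "...", "...")
print(query_text(q))             # Inflación Documentos sobre la inflación.
print(query_text(q, lang="en"))  # Inflation Documents about inflation.
```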

docs

Language: es

Document type:
TrecDoc: (namedtuple)
  1. doc_id: str
  2. text: str
  3. marked_up_doc: str

Example

import ir_datasets
dataset = ir_datasets.load('trec-spanish/trec3')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, marked_up_doc>

qrels

Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Relevance  Definition
0          not relevant
1          relevant

Example

import ir_datasets
dataset = ir_datasets.load('trec-spanish/trec3')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
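Judgments arrive as a flat stream of `TrecQrel` tuples. Many evaluation workflows instead expect them nested as `{query_id: {doc_id: relevance}}` (the shape accepted by tools such as pytrec_eval). A minimal sketch of that conversion follows; `TrecQrel` here is a local stand-in with the fields listed above, `qrels_to_dict` is a hypothetical helper, and the judgments are toy data.

```python
from collections import defaultdict, namedtuple
from typing import Dict, Iterable

# Local stand-in with the TrecQrel fields listed above (assumption).
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def qrels_to_dict(qrels: Iterable[TrecQrel]) -> Dict[str, Dict[str, int]]:
    """Hypothetical helper: nest judgments as {query_id: {doc_id: relevance}}."""
    nested: Dict[str, Dict[str, int]] = defaultdict(dict)
    for qrel in qrels:
        nested[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(nested)


# Toy judgments (made up, not actual TREC 3 qrels).
sample = [
    TrecQrel("Q1", "DOC-A", 1, "0"),
    TrecQrel("Q1", "DOC-B", 0, "0"),
    TrecQrel("Q2", "DOC-A", 0, "0"),
]
qrels = qrels_to_dict(sample)
# Count documents judged relevant (relevance 1) per query.
num_relevant = {qid: sum(rel > 0 for rel in docs.values()) for qid, docs in qrels.items()}
print(num_relevant)  # {'Q1': 1, 'Q2': 0}
```

With the real data, `dataset.qrels_iter()` would replace `sample`.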

Citation
bibtex:
@inproceedings{harman1994trec3,
  title={Overview of the Third Text REtrieval Conference (TREC-3)},
  author={Donna Harman},
  booktitle={TREC},
  year={1994}
}

"trec-spanish/trec4"

Spanish benchmark from TREC 4.

queries

Language: multiple/other/unknown

Query type:
TrecSpanish4Query: (namedtuple)
  1. query_id: str
  2. description_es1: str
  3. description_en1: str
  4. description_es2: str
  5. description_en2: str

Example

import ir_datasets
dataset = ir_datasets.load('trec-spanish/trec4')
for query in dataset.queries_iter():
    query # namedtuple<query_id, description_es1, description_en1, description_es2, description_en2>
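Unlike TREC 3, each TREC 4 query has no title or narrative; it carries two Spanish descriptions (`description_es1`, `description_es2`) and two English ones (`description_en1`, `description_en2`). How to use the two variants is left to the experimenter. The sketch below, assuming a local stand-in namedtuple with the fields above, lets a caller pick variant `"1"`, variant `"2"`, or both joined together; `description` is a hypothetical helper and the query values are toy data.

```python
from collections import namedtuple

# Local stand-in mirroring the TrecSpanish4Query fields listed above (assumption).
TrecSpanish4Query = namedtuple(
    "TrecSpanish4Query",
    ["query_id", "description_es1", "description_en1",
     "description_es2", "description_en2"],
)


def description(query, lang: str = "es", variant: str = "both") -> str:
    """Hypothetical helper: return description variant '1', '2', or both joined."""
    if lang not in ("es", "en"):
        raise ValueError(f"unsupported language: {lang!r}")
    if variant not in ("1", "2", "both"):
        raise ValueError(f"unsupported variant: {variant!r}")
    first = getattr(query, f"description_{lang}1").strip()
    second = getattr(query, f"description_{lang}2").strip()
    if variant == "1":
        return first
    if variant == "2":
        return second
    # Join both variants, skipping any that are empty.
    return " ".join(p for p in (first, second) if p)


# Toy query (made up, not an actual TREC 4 topic).
q = TrecSpanish4Query("Q1", "Pesca en el Atlántico.", "Fishing in the Atlantic.",
                      "Cuotas de pesca.", "Fishing quotas.")
print(description(q))            # Pesca en el Atlántico. Cuotas de pesca.
print(description(q, "en", "2"))  # Fishing quotas.
```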

docs

Language: es

Document type:
TrecDoc: (namedtuple)
  1. doc_id: str
  2. text: str
  3. marked_up_doc: str

Example

import ir_datasets
dataset = ir_datasets.load('trec-spanish/trec4')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, marked_up_doc>

qrels

Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Relevance  Definition
0          not relevant
1          relevant

Example

import ir_datasets
dataset = ir_datasets.load('trec-spanish/trec4')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
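Because these qrels use only two relevance levels (0 not relevant, 1 relevant), binary metrics apply directly. As an illustration, here is a minimal precision-at-k sketch for a single query. `precision_at_k` is a hypothetical helper (not part of ir_datasets), the judgments and ranking are toy data, and counting unjudged documents as non-relevant is an assumed convention.

```python
from typing import Dict, Sequence


def precision_at_k(ranking: Sequence[str], judged: Dict[str, int], k: int = 10) -> float:
    """Fraction of the top-k ranked doc_ids judged relevant (relevance >= 1).

    Unjudged documents count as non-relevant (assumed convention), and the
    denominator is always k, even if the ranking is shorter than k.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranking[:k] if judged.get(doc_id, 0) >= 1)
    return hits / k


# Toy judgments for one query, as {doc_id: relevance}.
judged = {"DOC-A": 1, "DOC-B": 0, "DOC-C": 1}
ranking = ["DOC-A", "DOC-B", "DOC-D", "DOC-C"]
print(precision_at_k(ranking, judged, k=2))  # 0.5
print(precision_at_k(ranking, judged, k=4))  # 0.5
```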

Citation
bibtex:
@inproceedings{harman1995trec4,
  title={Overview of the Fourth Text REtrieval Conference (TREC-4)},
  author={Donna Harman},
  booktitle={TREC},
  year={1995}
}