GitHub: datasets/trec_arabic.py

ir_datasets: TREC Arabic

Index
  1. trec-arabic
  2. trec-arabic/ar2001
  3. trec-arabic/ar2002

"trec-arabic"

A collection of news articles in Arabic, used for multi-lingual evaluation in TREC 2001 and TREC 2002.

Document collection from LDC2001T55.

Available: docs

Language: ar

Document type:
TrecDoc: (namedtuple)
  1. doc_id: str
  2. text: str
  3. marked_up_doc: str

Example

import ir_datasets
dataset = ir_datasets.load('trec-arabic')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, marked_up_doc>
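
Because docs_iter() walks the entire collection, it can help to look at only the first few records. The following is a minimal sketch that relies on nothing beyond the three fields listed above. `preview_docs` is a hypothetical helper, and the `TrecDoc` defined here is a local stand-in with the same field names, not the class shipped with ir_datasets. The sample records are made up.

```python
from collections import namedtuple
from itertools import islice

# Local stand-in mirroring the documented fields; not the ir_datasets class.
TrecDoc = namedtuple("TrecDoc", ["doc_id", "text", "marked_up_doc"])


def preview_docs(docs, n=3, width=40):
    """Return (doc_id, shortened text) pairs for the first n docs of an iterable."""
    preview = []
    for doc in islice(docs, n):
        # Collapse whitespace runs so multi-line article text fits on one line.
        flat = " ".join(doc.text.split())
        preview.append((doc.doc_id, flat[:width]))
    return preview


# Made-up records for illustration only; real doc IDs and text will differ.
sample = [
    TrecDoc("DOC-1", "first\n article body", "<DOC>first article body</DOC>"),
    TrecDoc("DOC-2", "second article body", "<DOC>second article body</DOC>"),
]
for doc_id, snippet in preview_docs(sample, n=1):
    print(doc_id, snippet)
```

With the real collection available, the same helper should accept dataset.docs_iter() directly, since it only reads the doc_id and text fields.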

"trec-arabic/ar2001"

Arabic benchmark from TREC 2001.

Available: queries, docs, qrels

Language: ar

Query type:
TrecQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. description: str
  4. narrative: str

Example

import ir_datasets
dataset = ir_datasets.load('trec-arabic/ar2001')
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>
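
Topics are often easier to work with as a lookup table than as a stream. The sketch below shows one way to index them by query_id. `index_queries` is a hypothetical helper, the `TrecQuery` here is a local stand-in built from the field list above, and the sample topics are invented.

```python
from collections import namedtuple

# Local stand-in mirroring the documented fields; not the ir_datasets class.
TrecQuery = namedtuple(
    "TrecQuery", ["query_id", "title", "description", "narrative"]
)


def index_queries(queries):
    """Map query_id -> query record, rejecting duplicate IDs."""
    by_id = {}
    for query in queries:
        if query.query_id in by_id:
            raise ValueError(f"duplicate query_id: {query.query_id}")
        by_id[query.query_id] = query
    return by_id


# Invented topics for illustration only.
sample = [
    TrecQuery("Q1", "title one", "description one", "narrative one"),
    TrecQuery("Q2", "title two", "description two", "narrative two"),
]
topics = index_queries(sample)
print(topics["Q1"].title)
```

Passing dataset.queries_iter() in place of `sample` should work the same way, since the helper only reads query_id.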

"trec-arabic/ar2002"

Arabic benchmark from TREC 2002.

Available: queries, docs, qrels

Language: ar

Query type:
TrecQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. description: str
  4. narrative: str

Example

import ir_datasets
dataset = ir_datasets.load('trec-arabic/ar2002')
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>
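
Each topic carries three text fields, so a retrieval experiment has to decide which of them form the search string. As a rough sketch of that choice, the hypothetical `query_text` helper below joins whichever fields are requested. The `TrecQuery` here is a local stand-in with the documented field names, and the sample topic is invented.

```python
from collections import namedtuple

# Local stand-in mirroring the documented fields; not the ir_datasets class.
TrecQuery = namedtuple(
    "TrecQuery", ["query_id", "title", "description", "narrative"]
)


def query_text(query, fields=("title",)):
    """Join the selected, non-empty fields of a query into one string."""
    parts = []
    for name in fields:
        value = getattr(query, name).strip()
        if value:  # skip fields that are empty or whitespace-only
            parts.append(value)
    return " ".join(parts)


# Invented topic for illustration only.
topic = TrecQuery("Q1", "short title", " longer description ", "")
print(query_text(topic))
print(query_text(topic, fields=("title", "description", "narrative")))
```

Title-only strings give short keyword queries, while adding description or narrative gives longer ones. Which is appropriate depends on the experiment.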