GitHub: datasets/trec_arabic.py

ir_datasets: TREC Arabic

Index
  1. trec-arabic
  2. trec-arabic/ar2001
  3. trec-arabic/ar2002

"trec-arabic"

A collection of news articles in Arabic, used for multi-lingual evaluation in TREC 2001 and TREC 2002.

Document collection from LDC2001T55.

docs

Language: ar

Document type:
TrecDoc: (namedtuple)
  1. doc_id: str
  2. text: str
  3. marked_up_doc: str

Example

import ir_datasets
dataset = ir_datasets.load('trec-arabic')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, marked_up_doc>
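
Because each document is a namedtuple, its fields can be read by name or by position. The sketch below shows this on a stand-in record rather than the real corpus, which has to be obtained from the LDC. The `TrecDoc` definition here only mirrors the field list above, and `sample_doc`, its ID, its contents, and the `summarize` helper are all hypothetical.

```python
from collections import namedtuple

# Stand-in for the TrecDoc type listed above (same field names and order).
# This is an assumption for illustration, not the library's own class.
TrecDoc = namedtuple("TrecDoc", ["doc_id", "text", "marked_up_doc"])

# Hypothetical record: the ID and contents are invented for this example.
sample_doc = TrecDoc(
    doc_id="EXAMPLE-0001",
    text="نص تجريبي",
    marked_up_doc="<DOC><DOCNO>EXAMPLE-0001</DOCNO><TEXT>نص تجريبي</TEXT></DOC>",
)


def summarize(doc, width=40):
    """Return the document ID and the first `width` characters of its text."""
    return doc.doc_id, doc.text[:width]


doc_id, preview = summarize(sample_doc)

# _asdict() is handy when records need to be serialized, e.g. to JSON.
fields = sample_doc._asdict()
```

With the real data available, the same helper could be applied to each `doc` yielded by `dataset.docs_iter()`.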

Citation

bibtex:
@misc{LDC2001T55,
  title={Arabic Newswire Part 1 LDC2001T55},
  author={Graff, David and Walker, Kevin},
  year={2001},
  url={https://catalog.ldc.upenn.edu/LDC2001T55},
  publisher={Linguistic Data Consortium}
}

"trec-arabic/ar2001"

Arabic benchmark from TREC 2001.

queries

Language: ar

Query type:
TrecQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. description: str
  4. narrative: str

Example

import ir_datasets
dataset = ir_datasets.load('trec-arabic/ar2001')
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>
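
TREC topics carry three text fields, and retrieval experiments often choose between searching with the title alone or with the title plus the description. A minimal sketch of that choice follows, assuming a stand-in `TrecQuery` type that mirrors the fields above. The `build_query_text` helper and the sample topics are hypothetical and are not part of ir_datasets.

```python
from collections import namedtuple

# Stand-in for the TrecQuery type listed above (assumed, for illustration).
TrecQuery = namedtuple(
    "TrecQuery", ["query_id", "title", "description", "narrative"]
)


def build_query_text(query, use_description=False):
    """Build a single search string from a topic's fields."""
    parts = [query.title]
    if use_description:
        parts.append(query.description)
    # Topic fields may contain line breaks or padding; collapse all whitespace.
    return " ".join(" ".join(parts).split())


# Hypothetical topics: the IDs and text are invented for this example.
sample_queries = [
    TrecQuery("Q1", "title one", "  a longer\ndescription ", "narrative one"),
    TrecQuery("Q2", "title two", "another description", "narrative two"),
]

# Map each query_id to the string that would be sent to a search system.
query_texts = {
    q.query_id: build_query_text(q, use_description=True)
    for q in sample_queries
}
```

To run this over the benchmark topics, `sample_queries` would be replaced by `dataset.queries_iter()`.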

docs

Language: ar

Document type:
TrecDoc: (namedtuple)
  1. doc_id: str
  2. text: str
  3. marked_up_doc: str

Example

import ir_datasets
dataset = ir_datasets.load('trec-arabic/ar2001')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, marked_up_doc>

qrels

Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
0     not relevant
1     relevant

Example

import ir_datasets
dataset = ir_datasets.load('trec-arabic/ar2001')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
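
Evaluation code usually needs the judgments grouped by query rather than as a flat stream. The sketch below builds a nested mapping of query_id to doc_id to relevance and then lists the relevant documents for one query. It assumes a stand-in `TrecQrel` type mirroring the fields above. The helpers `group_qrels` and `relevant_docs` and the sample judgments are hypothetical.

```python
from collections import defaultdict, namedtuple

# Stand-in for the TrecQrel type listed above (assumed, for illustration).
TrecQrel = namedtuple(
    "TrecQrel", ["query_id", "doc_id", "relevance", "iteration"]
)


def group_qrels(qrels):
    """Return {query_id: {doc_id: relevance}} from an iterable of qrels."""
    grouped = defaultdict(dict)
    for qrel in qrels:
        grouped[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(grouped)


def relevant_docs(grouped, query_id, min_relevance=1):
    """List doc_ids judged at or above `min_relevance` for one query."""
    judged = grouped.get(query_id, {})
    return sorted(d for d, rel in judged.items() if rel >= min_relevance)


# Hypothetical judgments: the IDs are invented for this example.
sample_qrels = [
    TrecQrel("Q1", "D1", 1, "0"),
    TrecQrel("Q1", "D2", 0, "0"),
    TrecQrel("Q1", "D3", 1, "0"),
    TrecQrel("Q2", "D1", 0, "0"),
]

grouped = group_qrels(sample_qrels)
q1_relevant = relevant_docs(grouped, "Q1")
```

The default `min_relevance=1` matches the binary scale in the table above, where 1 means relevant. Passing `dataset.qrels_iter()` in place of `sample_qrels` would group the real judgments.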

Citation

bibtex:
@inproceedings{gey2001arabic,
  title={The TREC-2001 Cross-Language Information Retrieval Track: Searching Arabic using English, French or Arabic Queries},
  author={Fredric Gey and Douglas Oard},
  booktitle={TREC},
  year={2001}
}

"trec-arabic/ar2002"

Arabic benchmark from TREC 2002.

queries

Language: ar

Query type:
TrecQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. description: str
  4. narrative: str

Example

import ir_datasets
dataset = ir_datasets.load('trec-arabic/ar2002')
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>

docs

Language: ar

Document type:
TrecDoc: (namedtuple)
  1. doc_id: str
  2. text: str
  3. marked_up_doc: str

Example

import ir_datasets
dataset = ir_datasets.load('trec-arabic/ar2002')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, marked_up_doc>

qrels

Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
0     not relevant
1     relevant

Example

import ir_datasets
dataset = ir_datasets.load('trec-arabic/ar2002')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
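
Tools such as trec_eval read judgments from a plain-text file with four whitespace-separated columns per line: query ID, iteration, document ID, relevance. The sketch below writes records with the fields above in that column order. It assumes a stand-in `TrecQrel` type, and the `write_trec_qrels` helper and the sample judgments are hypothetical.

```python
import io
from collections import namedtuple

# Stand-in for the TrecQrel type listed above (assumed, for illustration).
TrecQrel = namedtuple(
    "TrecQrel", ["query_id", "doc_id", "relevance", "iteration"]
)


def write_trec_qrels(qrels, out):
    """Write qrels to `out` in TREC column order; return the line count."""
    count = 0
    for qrel in qrels:
        # Column order differs from the namedtuple's field order.
        out.write(
            f"{qrel.query_id} {qrel.iteration} {qrel.doc_id} {qrel.relevance}\n"
        )
        count += 1
    return count


# Hypothetical judgments: the IDs are invented for this example.
sample_qrels = [
    TrecQrel("Q1", "D1", 1, "0"),
    TrecQrel("Q1", "D2", 0, "0"),
]

# An in-memory buffer stands in for a file opened with open(path, "w").
buffer = io.StringIO()
written = write_trec_qrels(sample_qrels, buffer)
qrels_text = buffer.getvalue()
```

Writing to any file-like object keeps the helper usable both with a real file and with a buffer like the one here.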

Citation

bibtex:
@inproceedings{gey2002arabic,
  title={The TREC-2002 Arabic/English CLIR Track},
  author={Fredric Gey and Douglas Oard},
  booktitle={TREC},
  year={2002}
}