Github: datasets/trec_arabic.py

`ir_datasets`: TREC Arabic

Index

trec-arabic
trec-arabic/ar2001
trec-arabic/ar2002

`"trec-arabic"`

A collection of Arabic news articles, used for cross-language retrieval evaluation in TREC 2001 and TREC 2002.

Document collection from LDC2001T55.

docs

Language: ar

Document type:

TrecDoc: (namedtuple)

doc_id: str
text: str
marked_up_doc: str

Example


import ir_datasets
dataset = ir_datasets.load('trec-arabic')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, marked_up_doc>
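
When exploring the collection, it is often enough to look at the first few documents rather than iterating over all of them. The following is a minimal sketch, assuming only that `docs_iter()` yields records with the three fields listed above. `TrecDoc` here is a local stand-in with made-up sample values, and `head` is a hypothetical helper, not part of `ir_datasets`.

```python
from collections import namedtuple
from itertools import islice

# Local stand-in mirroring the documented fields; not the ir_datasets class.
TrecDoc = namedtuple("TrecDoc", ["doc_id", "text", "marked_up_doc"])


def head(records, n=3):
    """Return the first n records from an iterator, leaving the rest unread."""
    return list(islice(records, n))


# Made-up sample records. With the real data, the iterator would come from
# ir_datasets.load('trec-arabic').docs_iter() instead.
sample_docs = iter([
    TrecDoc("doc-1", "placeholder text one", "<DOC>placeholder text one</DOC>"),
    TrecDoc("doc-2", "placeholder text two", "<DOC>placeholder text two</DOC>"),
    TrecDoc("doc-3", "placeholder text three", "<DOC>placeholder text three</DOC>"),
])

first_two = head(sample_docs, 2)
for doc in first_two:
    print(doc.doc_id, len(doc.text))
```

Because `islice` stops after `n` items, the remaining records stay in the iterator and are not read.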

Citation

bibtex: @misc{LDC2001T55, title={Arabic Newswire Part 1 LDC2001T55}, author={Graff, David and Walker, Kevin}, year={2001}, url={https://catalog.ldc.upenn.edu/LDC2001T55}, publisher={Linguistic Data Consortium} }

`"trec-arabic/ar2001"`

Arabic retrieval benchmark from the TREC 2001 Cross-Language Information Retrieval track.

Task Overview Paper

queries

Language: ar

Query type:

TrecQuery: (namedtuple)

query_id: str
title: str
description: str
narrative: str

Example


import ir_datasets
dataset = ir_datasets.load('trec-arabic/ar2001')
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>
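
Since qrels refer to queries by `query_id`, a lookup table keyed on that field is often convenient. The following is a minimal sketch, assuming only the four fields listed above. `TrecQuery` is a local stand-in with made-up sample values, and `index_queries` is a hypothetical helper, not part of `ir_datasets`.

```python
from collections import namedtuple

# Local stand-in mirroring the documented fields; not the ir_datasets class.
TrecQuery = namedtuple("TrecQuery", ["query_id", "title", "description", "narrative"])


def index_queries(queries):
    """Build a dict mapping query_id to the full query record."""
    return {query.query_id: query for query in queries}


# Made-up sample records. With the real data, the iterable would come from
# ir_datasets.load('trec-arabic/ar2001').queries_iter() instead.
sample_queries = [
    TrecQuery("q1", "title one", "description one", "narrative one"),
    TrecQuery("q2", "title two", "description two", "narrative two"),
]

queries_by_id = index_queries(sample_queries)
print(queries_by_id["q2"].title)
```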

docs

Language: ar

Document type:

TrecDoc: (namedtuple)

doc_id: str
text: str
marked_up_doc: str

Example


import ir_datasets
dataset = ir_datasets.load('trec-arabic/ar2001')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, marked_up_doc>

qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition
0	not relevant
1	relevant

Example


import ir_datasets
dataset = ir_datasets.load('trec-arabic/ar2001')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
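
Evaluation code usually needs the judgments grouped by query. The following is a minimal sketch of that grouping, assuming only the four fields listed above. `TrecQrel` is a local stand-in with made-up sample values, and `group_qrels` is a hypothetical helper, not part of `ir_datasets`.

```python
from collections import defaultdict, namedtuple

# Local stand-in mirroring the documented fields; not the ir_datasets class.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def group_qrels(qrels):
    """Return {query_id: {doc_id: relevance}} built from flat qrel records."""
    grouped = defaultdict(dict)
    for qrel in qrels:
        grouped[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(grouped)


# Made-up sample records. With the real data, the iterable would come from
# ir_datasets.load('trec-arabic/ar2001').qrels_iter() instead.
sample_qrels = [
    TrecQrel("q1", "doc-1", 1, "0"),
    TrecQrel("q1", "doc-2", 0, "0"),
    TrecQrel("q2", "doc-1", 0, "0"),
]

judgments = group_qrels(sample_qrels)
print(judgments["q1"])
```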

Citation

bibtex: @inproceedings{gey2001arabic, title={The TREC-2001 Cross-Language Information Retrieval Track: Searching Arabic using English, French or Arabic Queries}, author={Fredric Gey and Douglas Oard}, booktitle={TREC}, year={2001} }

`"trec-arabic/ar2002"`

Arabic retrieval benchmark from the TREC 2002 Arabic/English CLIR track.

Task Overview Paper

queries

Language: ar

Query type:

TrecQuery: (namedtuple)

query_id: str
title: str
description: str
narrative: str

Example


import ir_datasets
dataset = ir_datasets.load('trec-arabic/ar2002')
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>
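
Each query carries three text fields, and a retrieval system typically receives one string. The following is a minimal sketch of one way to combine them, assuming only the fields listed above. `TrecQuery` is a local stand-in with made-up sample values, `query_text` is a hypothetical helper, and the default of title plus description is an illustrative choice rather than anything prescribed by the track.

```python
from collections import namedtuple

# Local stand-in mirroring the documented fields; not the ir_datasets class.
TrecQuery = namedtuple("TrecQuery", ["query_id", "title", "description", "narrative"])


def query_text(query, fields=("title", "description")):
    """Join the chosen fields into one string, skipping empty ones."""
    parts = (getattr(query, field).strip() for field in fields)
    return " ".join(part for part in parts if part)


# Made-up sample record. With the real data, records would come from
# ir_datasets.load('trec-arabic/ar2002').queries_iter() instead.
sample_query = TrecQuery("q1", "title one", " description one ", "")

print(query_text(sample_query))
print(query_text(sample_query, fields=("title",)))
```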

docs

Language: ar

Document type:

TrecDoc: (namedtuple)

doc_id: str
text: str
marked_up_doc: str

Example


import ir_datasets
dataset = ir_datasets.load('trec-arabic/ar2002')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, text, marked_up_doc>

qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition
0	not relevant
1	relevant

Example


import ir_datasets
dataset = ir_datasets.load('trec-arabic/ar2002')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
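
With binary relevance levels like those above, a ranked result list can be scored against the judgments for one query. The following is a minimal sketch of precision at k, assuming the judgments have already been collected into a `{doc_id: relevance}` dict for that query. `precision_at_k` is a hypothetical helper, the sample data is made up, and treating unjudged documents as not relevant is an assumption of this sketch.

```python
def precision_at_k(ranked_doc_ids, judgments, k):
    """Fraction of the top-k ranked documents judged relevant (relevance >= 1).

    Unjudged documents count as not relevant (an assumption of this sketch).
    The denominator is always k, even if fewer than k documents were ranked.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = ranked_doc_ids[:k]
    hits = sum(1 for doc_id in top_k if judgments.get(doc_id, 0) >= 1)
    return hits / k


# Made-up judgments and ranking for a single query.
sample_judgments = {"doc-a": 1, "doc-b": 0, "doc-d": 1}
sample_ranking = ["doc-a", "doc-b", "doc-c", "doc-d"]

print(precision_at_k(sample_ranking, sample_judgments, 2))
print(precision_at_k(sample_ranking, sample_judgments, 4))
```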

Citation

bibtex: @inproceedings{gey2002arabic, title={The TREC-2002 Arabic/English CLIR Track}, author={Fredric Gey and Douglas Oard}, booktitle={TREC}, year={2002} }
