ir_datasets: TREC Fair Ranking 2021
The TREC Fair Ranking track evaluates how fairly systems rank documents. The 2021 track focuses on fairly prioritising Wikimedia articles for editing, so that articles from different groups receive fair exposure.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-fair-2021")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text, marked_up_text, url, quality_score, geographic_locations, quality_score_disk>
You can find more details about the Python API here.
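Each doc is a namedtuple, so its fields are read by name. As a minimal sketch, the snippet below groups doc_ids by geographic location. So that it runs without downloading the corpus, it uses a hypothetical stand-in namedtuple with the documented field list. It also assumes `geographic_locations` holds a list of strings (possibly empty); check this against real records from `docs_iter()`.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in mirroring the documented field list; real records
# come from ir_datasets.load("trec-fair-2021").docs_iter().
Doc = namedtuple("Doc", ["doc_id", "title", "text", "marked_up_text", "url",
                         "quality_score", "geographic_locations", "quality_score_disk"])


def docs_by_location(docs):
    """Map each geographic location to the doc_ids tagged with it.

    Assumes geographic_locations is a list of strings, or empty/None.
    """
    groups = defaultdict(list)
    for doc in docs:
        for loc in doc.geographic_locations or []:
            groups[loc].append(doc.doc_id)
    return dict(groups)


sample = [
    Doc("1", "A", "...", "...", "https://example.org/A", 0.5, ["Europe"], None),
    Doc("2", "B", "...", "...", "https://example.org/B", 0.2, ["Africa", "Europe"], None),
    Doc("3", "C", "...", "...", "https://example.org/C", 0.9, [], None),
]
print(docs_by_location(sample))  # {'Europe': ['1', '2'], 'Africa': ['2']}
```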
ir_datasets export trec-fair-2021 docs
[doc_id] [title] [text] [marked_up_text] [url] [quality_score] [geographic_locations] [quality_score_disk]
...
You can find more details about the CLI here.
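The export prints one record per line, with fields in the bracketed order shown above. The sketch below shows one way to consume that output downstream. It assumes the lines are tab-separated, with no header row and no tabs inside field values, so verify this on your own export. All values are kept as raw strings.

```python
import io

# Field order copied from the export header above.
DOC_FIELDS = ["doc_id", "title", "text", "marked_up_text", "url",
              "quality_score", "geographic_locations", "quality_score_disk"]


def parse_export(lines, fields=DOC_FIELDS):
    """Yield one dict per non-empty tab-separated line.

    Assumes no header row and no tab characters inside field values.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        yield dict(zip(fields, line.split("\t")))


# StringIO stands in for the piped output of `ir_datasets export trec-fair-2021 docs`.
sample = io.StringIO(
    "12\tSome title\tbody text\t<p>body text</p>\thttps://example.org/x\t0.7\tEurope\t0.1\n"
)
records = list(parse_export(sample))
print(records[0]["title"])  # Some title
```

In a shell pipeline you would pass `sys.stdin` to `parse_export` instead of the `StringIO` object.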
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-fair-2021')
# Index trec-fair-2021
indexer = pt.IterDictIndexer('./indices/trec-fair-2021')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'text', 'url'])
You can find more details about PyTerrier indexing here.
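PyTerrier's `IterDictIndexer` consumes dicts that carry a `docno` key plus the text fields named in `fields`, and `get_corpus_iter()` already yields dicts in that shape. To control the fields yourself, for example to index title and text as one searchable field, you could build the dicts from the raw ir_datasets docs. The helper below and its `body` field are hypothetical illustrations, not part of either library.

```python
from collections import namedtuple
from typing import Dict, Iterable, Iterator

# Trimmed hypothetical stand-in; real ir_datasets docs have more fields,
# but only these four are read below.
Doc = namedtuple("Doc", ["doc_id", "title", "text", "url"])


def to_indexer_dicts(docs: Iterable[Doc]) -> Iterator[Dict[str, str]]:
    """Yield dicts keyed by 'docno' (the id key PyTerrier indexers expect).

    'body' is a hypothetical combined field holding the title and text.
    """
    for doc in docs:
        yield {"docno": doc.doc_id, "body": f"{doc.title}\n{doc.text}"}


docs = [Doc("42", "Ada Lovelace", "English mathematician ...",
            "https://en.wikipedia.org/wiki/Ada_Lovelace")]
print(next(to_indexer_dicts(docs)))
```

With a dataset from `ir_datasets.load("trec-fair-2021")`, you would then call `indexer.index(to_indexer_dicts(dataset.docs_iter()), fields=['body'])`, mirroring the `fields` usage in the example above.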
trec-fair-2021/dev: the official dev set.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-fair-2021/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, keywords, scope, homepage>
You can find more details about the Python API here.
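Besides `text`, each query carries `keywords`, `scope` and `homepage`. One possible (hypothetical) use is to append a few keywords to the query text before retrieval, stripping punctuation because it can trip up query parsers. This sketch assumes `keywords` is a list of strings and uses a stand-in namedtuple. The expansion strategy is an illustration, not something the dataset or track prescribes.

```python
import re
from collections import namedtuple

# Hypothetical stand-in with the documented query fields.
Query = namedtuple("Query", ["query_id", "text", "keywords", "scope", "homepage"])


def expanded_query(query, max_keywords=5):
    """Join the query text with up to max_keywords keywords, dropping punctuation.

    Assumes keywords is a list of strings (or None).
    """
    terms = [query.text] + list(query.keywords or [])[:max_keywords]
    # Replace every non-word, non-space character with a space, then collapse runs of whitespace.
    cleaned = re.sub(r"[^\w\s]", " ", " ".join(terms))
    return " ".join(cleaned.split())


q = Query("1", "Agriculture", ["crops", "farming", "agronomy"], "Global", "https://example.org")
print(expanded_query(q))                  # Agriculture crops farming agronomy
print(expanded_query(q, max_keywords=1))  # Agriculture crops
```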
ir_datasets export trec-fair-2021/dev queries
[query_id] [text] [keywords] [scope] [homepage]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-fair-2021/dev')
index_ref = pt.IndexRef.of('./indices/trec-fair-2021') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
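`wmodel='BM25'` selects Terrier's BM25 weighting model. To illustrate what BM25 scores, here is a minimal self-contained sketch. It uses the common defaults k1=1.2 and b=0.75, a smoothed IDF of log(1 + (N - df + 0.5) / (df + 0.5)), and lowercased whitespace tokenisation. Terrier's implementation differs in details such as tokenisation, stemming and the exact IDF, so its scores will not match these.

```python
import math
from collections import Counter
from typing import Dict


def bm25_scores(query: str, docs: Dict[str, str], k1: float = 1.2, b: float = 0.75) -> Dict[str, float]:
    """Score every doc against the query with BM25 (lowercased whitespace tokens)."""
    tokenised = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenised)
    avg_len = sum(len(toks) for toks in tokenised.values()) / n_docs
    # Document frequency: the number of docs containing each term at least once.
    df = Counter(term for toks in tokenised.values() for term in set(toks))
    scores = {}
    for doc_id, toks in tokenised.items():
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) with document-length normalisation (b).
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(toks) / avg_len))
            score += idf * norm
        scores[doc_id] = score
    return scores


docs = {
    "d1": "fair ranking of wikipedia articles",
    "d2": "ranking articles for editing",
    "d3": "history of the printing press",
}
ranked = sorted(bm25_scores("fair ranking", docs).items(), key=lambda kv: kv[1], reverse=True)
print([doc_id for doc_id, _ in ranked])  # ['d1', 'd2', 'd3']
```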
Language: en
Note: Uses docs from trec-fair-2021
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-fair-2021/dev")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text, marked_up_text, url, quality_score, geographic_locations, quality_score_disk>
You can find more details about the Python API here.
ir_datasets export trec-fair-2021/dev docs
[doc_id] [title] [text] [marked_up_text] [url] [quality_score] [geographic_locations] [quality_score_disk]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-fair-2021/dev')
# Index trec-fair-2021
indexer = pt.IterDictIndexer('./indices/trec-fair-2021')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'text', 'url'])
You can find more details about PyTerrier indexing here.
Relevance levels
| Rel. | Definition |
|---|---|
| 1 | relevant |
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-fair-2021/dev")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
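The dev qrels use a single relevance level (1 = relevant), so a common pattern is to collapse them into a set of relevant doc_ids per query. Below is a minimal sketch that uses a hypothetical stand-in namedtuple with the fields listed above. It treats any relevance above 0 as relevant.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in; real qrels come from dataset.qrels_iter().
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def relevant_by_query(qrels):
    """Collect the doc_ids with relevance > 0 for each query_id."""
    rel = defaultdict(set)
    for qrel in qrels:
        if qrel.relevance > 0:
            rel[qrel.query_id].add(qrel.doc_id)
    return dict(rel)


sample = [Qrel("q1", "d1", 1, "0"), Qrel("q1", "d7", 1, "0"), Qrel("q2", "d3", 1, "0")]
print(relevant_by_query(sample))
```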
ir_datasets export trec-fair-2021/dev qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:trec-fair-2021/dev')
index_ref = pt.IndexRef.of('./indices/trec-fair-2021') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
[pipeline],
dataset.get_topics('text'),
dataset.get_qrels(),
[MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
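For intuition about the two measures in the experiment: MAP averages each query's average precision. nDCG@20 sums gains discounted by log2(rank + 1) over the top 20 results and divides by the score of an ideal ranking. The sketch below implements the textbook definitions for binary judgments in plain Python. It is an illustration, not the `pyterrier.measures` implementation, and may differ from it in edge cases such as ties or unjudged documents.

```python
import math
from typing import Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Sum of precision@k at each rank k holding a relevant doc, divided by |relevant|."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)


def ndcg_at_k(ranking: List[str], relevant: Set[str], k: int = 20) -> float:
    """Binary-gain nDCG@k with a log2(rank + 1) discount."""
    dcg = sum(1 / math.log2(i + 1) for i, d in enumerate(ranking[:k], start=1) if d in relevant)
    idcg = sum(1 / math.log2(i + 1) for i in range(1, min(len(relevant), k) + 1))
    return dcg / idcg if idcg > 0 else 0.0


def evaluate(run: Dict[str, List[str]], qrels: Dict[str, Set[str]]) -> Dict[str, float]:
    """Average both measures over the queries that have at least one relevant doc."""
    qids = [q for q in qrels if qrels[q]]
    if not qids:
        return {"MAP": 0.0, "nDCG@20": 0.0}
    return {
        "MAP": sum(average_precision(run.get(q, []), qrels[q]) for q in qids) / len(qids),
        "nDCG@20": sum(ndcg_at_k(run.get(q, []), qrels[q]) for q in qids) / len(qids),
    }


res = evaluate({"q1": ["d1", "d2", "d3"]}, {"q1": {"d1", "d3"}})
print(res)
```

Here AP for q1 is (1/1 + 2/3) / 2 = 5/6. The relevant docs at ranks 1 and 3 give a DCG of 1 + 1/log2(4) = 1.5, against an ideal DCG of 1 + 1/log2(3).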