ir_datasets: TREC Fair Ranking
The TREC Fair Ranking track evaluates systems according to how well they fairly rank documents.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-fair/2021")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text, marked_up_text, url, quality_score, geographic_locations, quality_score_disk>
You can find more details about the Python API here.
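As a quick illustration, each field of the document namedtuple can be accessed by name; a minimal sketch:
import ir_datasets
dataset = ir_datasets.load("trec-fair/2021")
for doc in dataset.docs_iter():
    # each namedtuple field is available as an attribute
    print(doc.doc_id, doc.title, doc.quality_score)
    break # the corpus has ~6.3M documents, so stop after one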
ir_datasets export trec-fair/2021 docs
[doc_id] [title] [text] [marked_up_text] [url] [quality_score] [geographic_locations] [quality_score_disk]
...
You can find more details about the CLI here.
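Back in the Python API, documents can also be fetched by doc_id (random access) via docs_store(); a minimal sketch, where the doc_id looked up is purely illustrative:
import ir_datasets
dataset = ir_datasets.load("trec-fair/2021")
docs_store = dataset.docs_store()
doc = docs_store.get("123") # look up a single document by doc_id (id is illustrative)
print(doc.title)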
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-fair/2021')
# Index trec-fair/2021
indexer = pt.IterDictIndexer('./indices/trec-fair_2021')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'text', 'url'])
You can find more details about PyTerrier indexing here.
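If indexing runs into metadata length limits, one possible tweak (a sketch using IterDictIndexer's meta argument; the size chosen is an assumption based on the corpus statistics below) is to size the docno field explicitly:
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-fair/2021')
# doc_ids in this corpus are at most 8 characters (see the corpus metadata below)
indexer = pt.IterDictIndexer('./indices/trec-fair_2021', meta={'docno': 8})
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'text', 'url'])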
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.trec-fair.2021')
for doc in dataset.iter_documents():
    print(doc) # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{ "docs": { "count": 6280328, "fields": { "doc_id": { "max_len": 8, "common_prefix": "" } } } }
trec-fair/2021/eval: Official TREC Fair Ranking 2021 evaluation set.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-fair/2021/eval")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, keywords, scope>
You can find more details about the Python API here.
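A small sketch collecting the evaluation queries into a query_id-to-text mapping, e.g. for downstream lookup:
import ir_datasets
dataset = ir_datasets.load("trec-fair/2021/eval")
queries = {query.query_id: query.text for query in dataset.queries_iter()}
print(len(queries)) # 49 evaluation queries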
ir_datasets export trec-fair/2021/eval queries
[query_id] [text] [keywords] [scope]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-fair/2021/eval')
index_ref = pt.IndexRef.of('./indices/trec-fair_2021') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
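To keep the retrieval results for later scoring, one option is to write them out as a TREC-format run file; a sketch using pt.io.write_results (the output filename is illustrative):
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-fair/2021/eval')
index_ref = pt.IndexRef.of('./indices/trec-fair_2021') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
results = pipeline(dataset.get_topics('text'))
pt.io.write_results(results, 'trec-fair-2021-eval.bm25.res') # TREC-format run file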
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.trec-fair.2021.eval.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from trec-fair/2021
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-fair/2021/eval")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text, marked_up_text, url, quality_score, geographic_locations, quality_score_disk>
You can find more details about the Python API here.
ir_datasets export trec-fair/2021/eval docs
[doc_id] [title] [text] [marked_up_text] [url] [quality_score] [geographic_locations] [quality_score_disk]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-fair/2021/eval')
# Index trec-fair/2021
indexer = pt.IterDictIndexer('./indices/trec-fair_2021')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'text', 'url'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.trec-fair.2021.eval')
for doc in dataset.iter_documents():
    print(doc) # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | relevant | 14K | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-fair/2021/eval")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
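Since every qrel here has relevance level 1, a quick sketch of counting how many relevant documents each evaluation topic has:
from collections import Counter
import ir_datasets
dataset = ir_datasets.load("trec-fair/2021/eval")
per_query = Counter(qrel.query_id for qrel in dataset.qrels_iter())
print(per_query.most_common(5)) # the 5 topics with the most relevant documents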
ir_datasets export trec-fair/2021/eval qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:trec-fair/2021/eval')
index_ref = pt.IndexRef.of('./indices/trec-fair_2021') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.trec-fair.2021.eval.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{ "docs": { "count": 6280328, "fields": { "doc_id": { "max_len": 8, "common_prefix": "" } } }, "queries": { "count": 49 }, "qrels": { "count": 13757, "fields": { "relevance": { "counts_by_value": { "1": 13757 } } } } }
trec-fair/2021/train: Official TREC Fair Ranking 2021 train set.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-fair/2021/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, keywords, scope, homepage>
You can find more details about the Python API here.
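Note that the train queries carry a homepage field that the eval queries lack; a minimal sketch printing it:
import ir_datasets
dataset = ir_datasets.load("trec-fair/2021/train")
for query in dataset.queries_iter():
    print(query.query_id, query.text, query.homepage)
    break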
ir_datasets export trec-fair/2021/train queries
[query_id] [text] [keywords] [scope] [homepage]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-fair/2021/train')
index_ref = pt.IndexRef.of('./indices/trec-fair_2021') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.trec-fair.2021.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from trec-fair/2021
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-fair/2021/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text, marked_up_text, url, quality_score, geographic_locations, quality_score_disk>
You can find more details about the Python API here.
ir_datasets export trec-fair/2021/train docs
[doc_id] [title] [text] [marked_up_text] [url] [quality_score] [geographic_locations] [quality_score_disk]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-fair/2021/train')
# Index trec-fair/2021
indexer = pt.IterDictIndexer('./indices/trec-fair_2021')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'text', 'url'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.trec-fair.2021.train')
for doc in dataset.iter_documents():
    print(doc) # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | relevant | 2.2M | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-fair/2021/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
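With roughly 2.2M qrels over 57 queries, a per-query lookup structure can be useful; a sketch building a query_id-to-relevant-doc_ids mapping (this pass touches every qrel, so it takes a while):
import ir_datasets
dataset = ir_datasets.load("trec-fair/2021/train")
relevant = {}
for qrel in dataset.qrels_iter():
    relevant.setdefault(qrel.query_id, set()).add(qrel.doc_id)
print(len(relevant)) # 57 train queries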
ir_datasets export trec-fair/2021/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:trec-fair/2021/train')
index_ref = pt.IndexRef.of('./indices/trec-fair_2021') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
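For per-topic rather than aggregate scores, pt.Experiment accepts a perquery flag; a sketch:
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:trec-fair/2021/train')
index_ref = pt.IndexRef.of('./indices/trec-fair_2021') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [MAP, nDCG@20],
    perquery=True # one row per query/measure instead of aggregates
)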
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.trec-fair.2021.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{ "docs": { "count": 6280328, "fields": { "doc_id": { "max_len": 8, "common_prefix": "" } } }, "queries": { "count": 57 }, "qrels": { "count": 2185446, "fields": { "relevance": { "counts_by_value": { "1": 2185446 } } } } }
trec-fair/2022: The TREC Fair Ranking 2022 track focuses on fairly prioritising Wikimedia articles for editing, to provide fair exposure to articles from different groups.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-fair/2022")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text, url, pred_qual, qual_cat, page_countries, page_subcont_regions, source_countries, source_subcont_regions, gender, occupations, years, num_sitelinks, relative_pageviews, first_letter, creation_date, first_letter_category, gender_category, creation_date_category, years_category, relative_pageviews_category, num_sitelinks_category>
You can find more details about the Python API here.
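The 2022 documents carry many fairness-related attributes; a sketch tallying one of them (gender_category) over a small sample:
from collections import Counter
from itertools import islice
import ir_datasets
dataset = ir_datasets.load("trec-fair/2022")
# sample the first 1,000 documents; the full corpus is ~6.5M
counts = Counter(doc.gender_category for doc in islice(dataset.docs_iter(), 1000))
print(counts.most_common())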
ir_datasets export trec-fair/2022 docs
[doc_id] [title] [text] [url] [pred_qual] [qual_cat] [page_countries] [page_subcont_regions] [source_countries] [source_subcont_regions] [gender] [occupations] [years] [num_sitelinks] [relative_pageviews] [first_letter] [creation_date] [first_letter_category] [gender_category] [creation_date_category] [years_category] [relative_pageviews_category] [num_sitelinks_category]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-fair/2022')
# Index trec-fair/2022
indexer = pt.IterDictIndexer('./indices/trec-fair_2022')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'text', 'url'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.trec-fair.2022')
for doc in dataset.iter_documents():
    print(doc) # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{ "docs": { "count": 6475537, "fields": { "doc_id": { "max_len": 8, "common_prefix": "" } } } }
trec-fair/2022/train: Official TREC Fair Ranking 2022 train set.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-fair/2022/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text, url>
You can find more details about the Python API here.
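The 2022 train queries are lighter-weight than the 2021 ones (just a text and a url); a one-loop sketch listing them:
import ir_datasets
dataset = ir_datasets.load("trec-fair/2022/train")
for query in dataset.queries_iter():
    print(query.query_id, query.text, query.url) # 50 train queries in total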
ir_datasets export trec-fair/2022/train queries
[query_id] [text] [url]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-fair/2022/train')
index_ref = pt.IndexRef.of('./indices/trec-fair_2022') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pipeline(dataset.get_topics('text'))
You can find more details about PyTerrier retrieval here.
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.trec-fair.2022.train.queries') # AdhocTopics
for topic in topics.iter():
    print(topic) # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from trec-fair/2022
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-fair/2022/train")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, text, url, pred_qual, qual_cat, page_countries, page_subcont_regions, source_countries, source_subcont_regions, gender, occupations, years, num_sitelinks, relative_pageviews, first_letter, creation_date, first_letter_category, gender_category, creation_date_category, years_category, relative_pageviews_category, num_sitelinks_category>
You can find more details about the Python API here.
ir_datasets export trec-fair/2022/train docs
[doc_id] [title] [text] [url] [pred_qual] [qual_cat] [page_countries] [page_subcont_regions] [source_countries] [source_subcont_regions] [gender] [occupations] [years] [num_sitelinks] [relative_pageviews] [first_letter] [creation_date] [first_letter_category] [gender_category] [creation_date_category] [years_category] [relative_pageviews_category] [num_sitelinks_category]
...
You can find more details about the CLI here.
import pyterrier as pt
pt.init()
dataset = pt.get_dataset('irds:trec-fair/2022/train')
# Index trec-fair/2022
indexer = pt.IterDictIndexer('./indices/trec-fair_2022')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'text', 'url'])
You can find more details about PyTerrier indexing here.
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.trec-fair.2022.train')
for doc in dataset.iter_documents():
    print(doc) # a document from the AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
Rel. | Definition | Count | % |
---|---|---|---|
1 | relevant | 2.1M | 100.0% |
Examples:
import ir_datasets
dataset = ir_datasets.load("trec-fair/2022/train")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export trec-fair/2022/train qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
import pyterrier as pt
from pyterrier.measures import *
pt.init()
dataset = pt.get_dataset('irds:trec-fair/2022/train')
index_ref = pt.IndexRef.of('./indices/trec-fair_2022') # assumes you have already built an index
pipeline = pt.BatchRetrieve(index_ref, wmodel='BM25')
# (optionally other pipeline components)
pt.Experiment(
    [pipeline],
    dataset.get_topics('text'),
    dataset.get_qrels(),
    [MAP, nDCG@20]
)
You can find more details about PyTerrier experiments here.
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.trec-fair.2022.train.qrels') # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels) # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
{ "docs": { "count": 6475537, "fields": { "doc_id": { "max_len": 8, "common_prefix": "" } } }, "queries": { "count": 50 }, "qrels": { "count": 2088306, "fields": { "relevance": { "counts_by_value": { "1": 2088306 } } } } }