ir-measures is a Python package that interfaces with several IR evaluation tools, including pytrec_eval, gdeval, trectools, and others.
To get started with ir-measures, see this guide.
ir-measures accepts qrels provided by ir_datasets directly in its Python API:
import ir_datasets
import ir_measures
from ir_measures import nDCG, P, Judged  # measure objects used in the call below
qrels = ir_datasets.load('trec-robust04').qrels_iter()
run = ir_measures.read_trec_run('path/to/run')
ir_measures.calc_aggregate([nDCG@10, P@5, P(rel=2)@5, Judged@10], qrels, run)
{
nDCG@10: 0.3793,
P@5: 0.4185,
P(rel=2)@5: 0.0803,
Judged@10: 0.9628
}
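For readers unfamiliar with the measure names, the following standard-library sketch illustrates what P@k, P(rel=2)@k, and Judged@k compute for a single query. It is a toy illustration, not how ir-measures implements them; the function names and the data are hypothetical.

```python
# Toy illustration of P@k, P(rel=2)@k and Judged@k for one query.
# Not ir-measures' implementation; all names and data here are hypothetical.

def precision_at_k(ranking, judgments, k, min_rel=1):
    """Fraction of the top-k documents whose relevance label is >= min_rel."""
    top_k = ranking[:k]
    hits = sum(1 for doc_id in top_k if judgments.get(doc_id, 0) >= min_rel)
    return hits / k

def judged_at_k(ranking, judgments, k):
    """Fraction of the top-k documents that have any relevance judgment."""
    top_k = ranking[:k]
    return sum(1 for doc_id in top_k if doc_id in judgments) / k

# Relevance labels for one query: doc_id -> graded relevance.
judgments = {"d1": 2, "d2": 0, "d3": 1, "d5": 2}
# System ranking for the same query, best first ("d4" is unjudged).
ranking = ["d1", "d4", "d3", "d2", "d5"]

p5 = precision_at_k(ranking, judgments, k=5)                  # d1, d3, d5 count
p5_rel2 = precision_at_k(ranking, judgments, k=5, min_rel=2)  # only d1, d5 count
judged5 = judged_at_k(ranking, judgments, k=5)                # 4 of 5 are judged
print(p5, p5_rel2, judged5)
```

Raising the minimum relevance level can only lower precision, which is why P(rel=2)@5 is never larger than P@5 in the results above.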
If you are using the ir-measures CLI with a shell like bash, you can use process substitution to treat the output of the ir_datasets export command as a file:
ir_measures <(ir_datasets export trec-robust04 qrels) path/to/run 'nDCG@10 P@5 P(rel=2)@5 Judged@10'
nDCG@10 0.3793
P@5 0.4185
P(rel=2)@5 0.0803
Judged@10 0.9628
Alternatively, you can always save the output of ir_datasets export as a file:
ir_datasets export trec-robust04 qrels > trec-robust04.qrels
ir_measures trec-robust04.qrels path/to/run 'nDCG@10 P@5 P(rel=2)@5 Judged@10'
nDCG@10 0.3793
P@5 0.4185
P(rel=2)@5 0.0803
Judged@10 0.9628
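To show what such a saved file contains, here is a minimal sketch of a parser. It assumes the exported qrels use the whitespace-separated TREC layout (query_id, iteration, doc_id, relevance). The helper name and the sample lines are hypothetical; in practice ir-measures' own reader is the better choice.

```python
# Hypothetical helper for illustration only.
# Assumes each line is: query_id iteration doc_id relevance
from collections import defaultdict

def parse_trec_qrels(lines):
    """Return {query_id: {doc_id: relevance}} from TREC-style qrels lines."""
    qrels = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        query_id, _iteration, doc_id, relevance = line.split()
        qrels[query_id][doc_id] = int(relevance)
    return dict(qrels)

# Made-up sample lines in the assumed layout.
sample = ["q1 0 docA 1", "q1 0 docB 0", "q2 0 docC 2"]
parsed = parse_trec_qrels(sample)
print(parsed)
```

Because the function accepts any iterable of lines, an open file handle for trec-robust04.qrels could be passed in place of the sample list.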