ir_datasets: Catalog
ir_datasets provides a common interface to many IR ranking datasets.
Install with pip:
pip install --upgrade ir_datasets
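Once the package is installed, a dataset is loaded by its ID and its documents are read through an iterator. The sketch below is a minimal illustration under stated assumptions rather than official usage: `preview_docs` and its `loader` parameter are hypothetical names introduced here, while `ir_datasets.load` and `docs_iter()` are the package calls it relies on. The import is deferred so the helper can be exercised with a stand-in loader when the package or the data is unavailable.

```python
from itertools import islice


def preview_docs(dataset_id, n=3, loader=None):
    """Return the first `n` documents of a dataset as a list.

    `loader` is a hypothetical hook for illustration: any callable that
    maps a dataset ID to an object exposing `docs_iter()`. When omitted,
    `ir_datasets.load` is used.
    """
    if loader is None:
        # Deferred import: only needed when no stand-in loader is supplied.
        import ir_datasets
        loader = ir_datasets.load
    dataset = loader(dataset_id)
    # islice stops after n items, so a large corpus is never fully read.
    return list(islice(dataset.docs_iter(), n))
```

For example, `preview_docs("vaswani")` would fetch the first three documents, assuming `vaswani` is an ID available as an automatic download in your installed version. The first call may trigger a download.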
✅: Data available as automatic download
⚠️: Data available from a third party
⬆️: Data inherited from a parent dataset (highlights which one on hover)
The following dataset IDs are deprecated. We keep them in the package for reproducibility, but better alternative dataset IDs exist (e.g., with improved corpus parsing):
trec-fair-2021, trec-fair-2021/eval, trec-fair-2021/train, trec-robust04, trec-robust04/fold1, trec-robust04/fold2, trec-robust04/fold3, trec-robust04/fold4, trec-robust04/fold5
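A script that accepts dataset IDs from configuration may want to flag the deprecated ones before loading. The following is a rough sketch of one way to do that: the ID set is copied from the list above, and `is_deprecated` and `warn_if_deprecated` are hypothetical helper names, not part of ir_datasets.

```python
import warnings

# Deprecated IDs copied from the catalog list above.
DEPRECATED_IDS = frozenset({
    "trec-fair-2021",
    "trec-fair-2021/eval",
    "trec-fair-2021/train",
    "trec-robust04",
    "trec-robust04/fold1",
    "trec-robust04/fold2",
    "trec-robust04/fold3",
    "trec-robust04/fold4",
    "trec-robust04/fold5",
})


def is_deprecated(dataset_id):
    """Return True if the ID appears in the deprecated list."""
    return dataset_id in DEPRECATED_IDS


def warn_if_deprecated(dataset_id):
    """Emit a DeprecationWarning for deprecated IDs; return the ID unchanged."""
    if is_deprecated(dataset_id):
        warnings.warn(
            f"Dataset ID {dataset_id!r} is deprecated; "
            "check its documentation page for the recommended alternative.",
            DeprecationWarning,
            stacklevel=2,
        )
    return dataset_id
```

Because the helper returns the ID unchanged, it can wrap the argument to whatever loading call the script already uses. The set is a snapshot of this page and would need updating if the catalog changes.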
When using datasets provided by this package, be sure to cite them properly. BibTeX for each dataset can be found on that dataset's documentation page.
If you use this tool, please cite our SIGIR resource paper:
@inproceedings{macavaney:sigir2021-irds,
  author = {MacAvaney, Sean and Yates, Andrew and Feldman, Sergey and Downey, Doug and Cohan, Arman and Goharian, Nazli},
  title = {Simplified Data Wrangling with ir_datasets},
  year = {2021},
  booktitle = {SIGIR}
}