GitHub: allenai/ir_datasets

ir_datasets: Catalog

ir_datasets provides a common interface to many IR ranking datasets.

Getting Started

Install with pip:

pip install ir_datasets==0.5.3
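
Once the package is installed, datasets are loaded by the IDs listed in the index below. The following is a minimal sketch rather than official usage: it assumes the `antique/test` dataset and assumes that its query records expose `query_id` and `text` fields (other datasets may use different fields). The helper names `first_queries` and `demo` are hypothetical and exist only for illustration.

```python
from itertools import islice
from typing import List, Tuple


def first_queries(dataset, limit: int = 3) -> List[Tuple[str, str]]:
    """Return up to `limit` (query_id, text) pairs from a dataset object.

    Assumes `dataset` offers queries_iter() and that each query record
    has `query_id` and `text` attributes.
    """
    return [(q.query_id, q.text) for q in islice(dataset.queries_iter(), limit)]


def demo(dataset_id: str = "antique/test") -> None:
    """Load a dataset by ID and print a few of its queries."""
    # Imported here so the helper above stays usable without the package.
    import ir_datasets

    # Data files are typically fetched when they are first iterated.
    dataset = ir_datasets.load(dataset_id)
    for query_id, text in first_queries(dataset):
        print(query_id, text)
```

Calling `demo()` prints a handful of queries. Documents and relevance judgments can be iterated the same way with `docs_iter()` and `qrels_iter()` where a dataset provides them.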

Dataset Index

✅: Data available as automatic download

⚠️: Data available from a third party

⬆️: Data inherited from a parent dataset (highlights which one on hover)
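
Dataset IDs in the index are slash-separated, and subsets marked ⬆️ reuse data from a dataset higher in that hierarchy. The sketch below lists the candidate ancestors of an ID by trimming path segments. It assumes that inheritance follows the ID prefixes, which is an inference from the naming scheme rather than a documented rule, and not every prefix is necessarily a registered dataset. The function name `ancestor_ids` is hypothetical.

```python
from typing import List


def ancestor_ids(dataset_id: str) -> List[str]:
    """List the prefixes of a slash-separated dataset ID, nearest first."""
    parts = dataset_id.split("/")
    # Drop one trailing segment at a time until only the top level remains.
    return ["/".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]


if __name__ == "__main__":
    print(ancestor_ids("car/v1.5/trec-y1/auto"))
```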

Dataset docs queries qrels scoreddocs docpairs qlogs
antique
antique/test⬆️
antique/test/non-offensive⬆️
antique/train⬆️
antique/train/split200-train⬆️
antique/train/split200-valid⬆️
aol-ia⚠️
aquaint⚠️
aquaint/trec-robust-2005⬆️
argsme
argsme/1.0
argsme/1.0-cleaned
argsme/1.0/touche-2020-task-1/uncorrected⬆️
argsme/2020-04-01
argsme/2020-04-01/debateorg
argsme/2020-04-01/debatepedia
argsme/2020-04-01/debatewise
argsme/2020-04-01/idebate
argsme/2020-04-01/parliamentary
argsme/2020-04-01/touche-2020-task-1⬆️
argsme/2020-04-01/touche-2020-task-1/uncorrected⬆️⬆️
argsme/2020-04-01/touche-2021-task-1⬆️
beir
beir/arguana
beir/climate-fever
beir/cqadupstack/android
beir/cqadupstack/english
beir/cqadupstack/gaming
beir/cqadupstack/gis
beir/cqadupstack/mathematica
beir/cqadupstack/physics
beir/cqadupstack/programmers
beir/cqadupstack/stats
beir/cqadupstack/tex
beir/cqadupstack/unix
beir/cqadupstack/webmasters
beir/cqadupstack/wordpress
beir/dbpedia-entity
beir/dbpedia-entity/dev⬆️
beir/dbpedia-entity/test⬆️
beir/fever
beir/fever/dev⬆️
beir/fever/test⬆️
beir/fever/train⬆️
beir/fiqa
beir/fiqa/dev⬆️
beir/fiqa/test⬆️
beir/fiqa/train⬆️
beir/hotpotqa
beir/hotpotqa/dev⬆️
beir/hotpotqa/test⬆️
beir/hotpotqa/train⬆️
beir/msmarco
beir/msmarco/dev⬆️
beir/msmarco/test⬆️
beir/msmarco/train⬆️
beir/nfcorpus
beir/nfcorpus/dev⬆️
beir/nfcorpus/test⬆️
beir/nfcorpus/train⬆️
beir/nq
beir/quora
beir/quora/dev⬆️
beir/quora/test⬆️
beir/scidocs
beir/scifact
beir/scifact/test⬆️
beir/scifact/train⬆️
beir/trec-covid
beir/webis-touche2020
beir/webis-touche2020/v2
c4
c4/en-noclean-tr
c4/en-noclean-tr/trec-misinfo-2021⬆️
car
car/v1.5
car/v1.5/test200⬆️
car/v1.5/train/fold0⬆️
car/v1.5/train/fold1⬆️
car/v1.5/train/fold2⬆️
car/v1.5/train/fold3⬆️
car/v1.5/train/fold4⬆️
car/v1.5/trec-y1⬆️
car/v1.5/trec-y1/auto⬆️⬆️
car/v1.5/trec-y1/manual⬆️⬆️
car/v2.0
clinicaltrials
clinicaltrials/2017
clinicaltrials/2017/trec-pm-2017⬆️
clinicaltrials/2017/trec-pm-2018⬆️
clinicaltrials/2019
clinicaltrials/2019/trec-pm-2019⬆️
clinicaltrials/2021
clinicaltrials/2021/trec-ct-2021⬆️
clinicaltrials/2021/trec-ct-2022⬆️
clirmatrix
clueweb09⚠️
clueweb09/ar⚠️
clueweb09/catb⚠️
clueweb09/catb/trec-web-2009⬆️
clueweb09/catb/trec-web-2009/diversity⬆️
clueweb09/catb/trec-web-2010⬆️
clueweb09/catb/trec-web-2010/diversity⬆️
clueweb09/catb/trec-web-2011⬆️
clueweb09/catb/trec-web-2011/diversity⬆️
clueweb09/catb/trec-web-2012⬆️
clueweb09/catb/trec-web-2012/diversity⬆️
clueweb09/de⚠️
clueweb09/en⚠️
clueweb09/en/trec-web-2009⬆️
clueweb09/en/trec-web-2009/diversity⬆️
clueweb09/en/trec-web-2010⬆️
clueweb09/en/trec-web-2010/diversity⬆️
clueweb09/en/trec-web-2011⬆️
clueweb09/en/trec-web-2011/diversity⬆️
clueweb09/en/trec-web-2012⬆️
clueweb09/en/trec-web-2012/diversity⬆️
clueweb09/es⚠️
clueweb09/fr⚠️
clueweb09/it⚠️
clueweb09/ja⚠️
clueweb09/ko⚠️
clueweb09/pt⚠️
clueweb09/trec-mq-2009⬆️
clueweb09/zh⚠️
clueweb12⚠️
clueweb12/b13⚠️
clueweb12/b13/clef-ehealth⬆️
clueweb12/b13/clef-ehealth/cs⬆️
clueweb12/b13/clef-ehealth/de⬆️
clueweb12/b13/clef-ehealth/fr⬆️
clueweb12/b13/clef-ehealth/hu⬆️
clueweb12/b13/clef-ehealth/pl⬆️
clueweb12/b13/clef-ehealth/sv⬆️
clueweb12/b13/ntcir-www-1⬆️
clueweb12/b13/ntcir-www-2⬆️
clueweb12/b13/ntcir-www-3⬆️
clueweb12/b13/trec-misinfo-2019⬆️
clueweb12/touche-2020-task-2⬆️
clueweb12/touche-2021-task-2⬆️
clueweb12/trec-web-2013⬆️
clueweb12/trec-web-2013/diversity⬆️
clueweb12/trec-web-2014⬆️
clueweb12/trec-web-2014/diversity⬆️
codec⚠️
codec/economics⬆️
codec/history⬆️
codec/politics⬆️
codesearchnet
codesearchnet/challenge⬆️
codesearchnet/test⬆️
codesearchnet/train⬆️
codesearchnet/valid⬆️
cord19
cord19/fulltext
cord19/fulltext/trec-covid⬆️
cord19/trec-covid⬆️
cord19/trec-covid/round1
cord19/trec-covid/round2
cord19/trec-covid/round3
cord19/trec-covid/round4
cord19/trec-covid/round5⬆️⬆️
cranfield
disks45
disks45/nocr⚠️
disks45/nocr/trec-robust-2004⬆️
disks45/nocr/trec-robust-2004/fold1⬆️
disks45/nocr/trec-robust-2004/fold2⬆️
disks45/nocr/trec-robust-2004/fold3⬆️
disks45/nocr/trec-robust-2004/fold4⬆️
disks45/nocr/trec-robust-2004/fold5⬆️
disks45/nocr/trec7⬆️
disks45/nocr/trec8⬆️
dpr-w100
dpr-w100/natural-questions/dev⬆️
dpr-w100/natural-questions/train⬆️
dpr-w100/trivia-qa/dev⬆️
dpr-w100/trivia-qa/train⬆️
gov⚠️
gov/trec-web-2002⬆️
gov/trec-web-2002/named-page⬆️
gov/trec-web-2003⬆️
gov/trec-web-2003/named-page⬆️
gov/trec-web-2004⬆️
gov2⚠️
gov2/trec-mq-2007⬆️
gov2/trec-mq-2008⬆️
gov2/trec-tb-2004⬆️
gov2/trec-tb-2005⬆️
gov2/trec-tb-2005/efficiency⬆️
gov2/trec-tb-2005/named-page⬆️
gov2/trec-tb-2006⬆️
gov2/trec-tb-2006/efficiency⬆️
gov2/trec-tb-2006/efficiency/10k⬆️
gov2/trec-tb-2006/efficiency/stream1⬆️
gov2/trec-tb-2006/efficiency/stream2⬆️
gov2/trec-tb-2006/efficiency/stream3⬆️
gov2/trec-tb-2006/efficiency/stream4⬆️
gov2/trec-tb-2006/named-page⬆️
hc4
hc4/fa⚠️
hc4/fa/dev⬆️
hc4/fa/test⬆️
hc4/fa/train⬆️
hc4/ru⚠️
hc4/ru/dev⬆️
hc4/ru/test⬆️
hc4/ru/train⬆️
hc4/zh⚠️
hc4/zh/dev⬆️
hc4/zh/test⬆️
hc4/zh/train⬆️
highwire
highwire/trec-genomics-2006⬆️
highwire/trec-genomics-2007⬆️
istella22
istella22/test⬆️
istella22/test/fold1⬆️
istella22/test/fold2⬆️
istella22/test/fold3⬆️
istella22/test/fold4⬆️
istella22/test/fold5⬆️
kilt
kilt/codec⬆️
kilt/codec/economics⬆️
kilt/codec/history⬆️
kilt/codec/politics⬆️
lotte
lotte/lifestyle/dev
lotte/lifestyle/dev/forum⬆️
lotte/lifestyle/dev/search⬆️
lotte/lifestyle/test
lotte/lifestyle/test/forum⬆️
lotte/lifestyle/test/search⬆️
lotte/pooled/dev
lotte/pooled/dev/forum⬆️
lotte/pooled/dev/search⬆️
lotte/pooled/test
lotte/pooled/test/forum⬆️
lotte/pooled/test/search⬆️
lotte/recreation/dev
lotte/recreation/dev/forum⬆️
lotte/recreation/dev/search⬆️
lotte/recreation/test
lotte/recreation/test/forum⬆️
lotte/recreation/test/search⬆️
lotte/science/dev
lotte/science/dev/forum⬆️
lotte/science/dev/search⬆️
lotte/science/test
lotte/science/test/forum⬆️
lotte/science/test/search⬆️
lotte/technology/dev
lotte/technology/dev/forum⬆️
lotte/technology/dev/search⬆️
lotte/technology/test
lotte/technology/test/forum⬆️
lotte/technology/test/search⬆️
lotte/writing/dev
lotte/writing/dev/forum⬆️
lotte/writing/dev/search⬆️
lotte/writing/test
lotte/writing/test/forum⬆️
lotte/writing/test/search⬆️
medline
medline/2004
medline/2004/trec-genomics-2004⬆️
medline/2004/trec-genomics-2005⬆️
medline/2017
medline/2017/trec-pm-2017⬆️
medline/2017/trec-pm-2018⬆️
mmarco
mmarco/de
mmarco/de/dev⬆️
mmarco/de/dev/small⬆️
mmarco/de/train⬆️
mmarco/es
mmarco/es/dev⬆️
mmarco/es/dev/small⬆️
mmarco/es/train⬆️
mmarco/fr
mmarco/fr/dev⬆️
mmarco/fr/dev/small⬆️
mmarco/fr/train⬆️
mmarco/id
mmarco/id/dev⬆️
mmarco/id/dev/small⬆️
mmarco/id/train⬆️
mmarco/it
mmarco/it/dev⬆️
mmarco/it/dev/small⬆️
mmarco/it/train⬆️
mmarco/pt
mmarco/pt/dev⬆️
mmarco/pt/dev/small⬆️
mmarco/pt/dev/small/v1.1⬆️⬆️
mmarco/pt/dev/v1.1⬆️⬆️
mmarco/pt/train⬆️
mmarco/pt/train/v1.1⬆️⬆️⬆️
mmarco/ru
mmarco/ru/dev⬆️
mmarco/ru/dev/small⬆️
mmarco/ru/train⬆️
mmarco/v2/ar
mmarco/v2/ar/dev⬆️
mmarco/v2/ar/dev/small⬆️
mmarco/v2/ar/train⬆️
mmarco/v2/de
mmarco/v2/de/dev⬆️
mmarco/v2/de/dev/small⬆️
mmarco/v2/de/train⬆️
mmarco/v2/dt
mmarco/v2/dt/dev⬆️
mmarco/v2/dt/dev/small⬆️
mmarco/v2/dt/train⬆️
mmarco/v2/es
mmarco/v2/es/dev⬆️
mmarco/v2/es/dev/small⬆️
mmarco/v2/es/train⬆️
mmarco/v2/fr
mmarco/v2/fr/dev⬆️
mmarco/v2/fr/dev/small⬆️
mmarco/v2/fr/train⬆️
mmarco/v2/hi
mmarco/v2/hi/dev⬆️
mmarco/v2/hi/dev/small⬆️
mmarco/v2/hi/train⬆️
mmarco/v2/id
mmarco/v2/id/dev⬆️
mmarco/v2/id/dev/small⬆️
mmarco/v2/id/train⬆️
mmarco/v2/it
mmarco/v2/it/dev⬆️
mmarco/v2/it/dev/small⬆️
mmarco/v2/it/train⬆️
mmarco/v2/ja
mmarco/v2/ja/dev⬆️
mmarco/v2/ja/dev/small⬆️
mmarco/v2/ja/train⬆️
mmarco/v2/pt
mmarco/v2/pt/dev⬆️
mmarco/v2/pt/dev/small⬆️
mmarco/v2/pt/train⬆️
mmarco/v2/ru
mmarco/v2/ru/dev⬆️
mmarco/v2/ru/dev/small⬆️
mmarco/v2/ru/train⬆️
mmarco/v2/vi
mmarco/v2/vi/dev⬆️
mmarco/v2/vi/dev/small⬆️
mmarco/v2/vi/train⬆️
mmarco/v2/zh
mmarco/v2/zh/dev⬆️
mmarco/v2/zh/dev/small⬆️
mmarco/v2/zh/train⬆️
mmarco/zh
mmarco/zh/dev⬆️
mmarco/zh/dev/small⬆️
mmarco/zh/dev/small/v1.1⬆️⬆️
mmarco/zh/dev/v1.1⬆️⬆️
mmarco/zh/train⬆️
mr-tydi
mr-tydi/ar
mr-tydi/ar/dev⬆️
mr-tydi/ar/test⬆️
mr-tydi/ar/train⬆️
mr-tydi/bn
mr-tydi/bn/dev⬆️
mr-tydi/bn/test⬆️
mr-tydi/bn/train⬆️
mr-tydi/en
mr-tydi/en/dev⬆️
mr-tydi/en/test⬆️
mr-tydi/en/train⬆️
mr-tydi/fi
mr-tydi/fi/dev⬆️
mr-tydi/fi/test⬆️
mr-tydi/fi/train⬆️
mr-tydi/id
mr-tydi/id/dev⬆️
mr-tydi/id/test⬆️
mr-tydi/id/train⬆️
mr-tydi/ja
mr-tydi/ja/dev⬆️
mr-tydi/ja/test⬆️
mr-tydi/ja/train⬆️
mr-tydi/ko
mr-tydi/ko/dev⬆️
mr-tydi/ko/test⬆️
mr-tydi/ko/train⬆️
mr-tydi/ru
mr-tydi/ru/dev⬆️
mr-tydi/ru/test⬆️
mr-tydi/ru/train⬆️
mr-tydi/sw
mr-tydi/sw/dev⬆️
mr-tydi/sw/test⬆️
mr-tydi/sw/train⬆️
mr-tydi/te
mr-tydi/te/dev⬆️
mr-tydi/te/test⬆️
mr-tydi/te/train⬆️
mr-tydi/th
mr-tydi/th/dev⬆️
mr-tydi/th/test⬆️
mr-tydi/th/train⬆️
msmarco-document
msmarco-document/anchor-text
msmarco-document/dev⬆️
msmarco-document/eval⬆️
msmarco-document/orcas⬆️
msmarco-document/train⬆️
msmarco-document/trec-dl-2019⬆️
msmarco-document/trec-dl-2019/judged⬆️⬆️
msmarco-document/trec-dl-2020⬆️
msmarco-document/trec-dl-2020/judged⬆️⬆️
msmarco-document/trec-dl-hard⬆️
msmarco-document/trec-dl-hard/fold1⬆️
msmarco-document/trec-dl-hard/fold2⬆️
msmarco-document/trec-dl-hard/fold3⬆️
msmarco-document/trec-dl-hard/fold4⬆️
msmarco-document/trec-dl-hard/fold5⬆️
msmarco-document-v2
msmarco-document-v2/anchor-text
msmarco-document-v2/dev1⬆️
msmarco-document-v2/dev2⬆️
msmarco-document-v2/train⬆️
msmarco-document-v2/trec-dl-2019⬆️
msmarco-document-v2/trec-dl-2019/judged⬆️⬆️
msmarco-document-v2/trec-dl-2020⬆️
msmarco-document-v2/trec-dl-2020/judged⬆️⬆️
msmarco-document-v2/trec-dl-2021⬆️
msmarco-document-v2/trec-dl-2021/judged⬆️⬆️
msmarco-document-v2/trec-dl-2022⬆️
msmarco-passage
msmarco-passage/dev⬆️
msmarco-passage/dev/judged⬆️⬆️
msmarco-passage/dev/small⬆️
msmarco-passage/eval⬆️
msmarco-passage/eval/small⬆️
msmarco-passage/train⬆️
msmarco-passage/train/judged⬆️⬆️⬆️
msmarco-passage/train/medical⬆️
msmarco-passage/train/split200-train⬆️
msmarco-passage/train/split200-valid⬆️
msmarco-passage/train/triples-small⬆️⬆️⬆️⬆️
msmarco-passage/train/triples-v2⬆️⬆️⬆️⬆️
msmarco-passage/trec-dl-2019⬆️
msmarco-passage/trec-dl-2019/judged⬆️⬆️
msmarco-passage/trec-dl-2020⬆️
msmarco-passage/trec-dl-2020/judged⬆️⬆️
msmarco-passage/trec-dl-hard⬆️
msmarco-passage/trec-dl-hard/fold1⬆️
msmarco-passage/trec-dl-hard/fold2⬆️
msmarco-passage/trec-dl-hard/fold3⬆️
msmarco-passage/trec-dl-hard/fold4⬆️
msmarco-passage/trec-dl-hard/fold5⬆️
msmarco-passage-v2
msmarco-passage-v2/dev1⬆️
msmarco-passage-v2/dev2⬆️
msmarco-passage-v2/train⬆️
msmarco-passage-v2/trec-dl-2021⬆️
msmarco-passage-v2/trec-dl-2021/judged⬆️⬆️
msmarco-passage-v2/trec-dl-2022⬆️
msmarco-qna
msmarco-qna/dev⬆️
msmarco-qna/eval⬆️
msmarco-qna/train⬆️
natural-questions
natural-questions/dev⬆️
natural-questions/train⬆️
neuclir
neuclir/1
neuclir/1/fa⚠️
neuclir/1/fa/hc4-filtered⚠️
neuclir/1/ru⚠️
neuclir/1/ru/hc4-filtered⚠️
neuclir/1/zh⚠️
neuclir/1/zh/hc4-filtered⚠️
neumarco
neumarco/fa
neumarco/fa/dev⬆️
neumarco/fa/dev/judged⬆️⬆️
neumarco/fa/dev/small⬆️
neumarco/fa/train⬆️
neumarco/fa/train/judged⬆️⬆️⬆️
neumarco/ru
neumarco/ru/dev⬆️
neumarco/ru/dev/judged⬆️⬆️
neumarco/ru/dev/small⬆️
neumarco/ru/train⬆️
neumarco/ru/train/judged⬆️⬆️⬆️
neumarco/zh
neumarco/zh/dev⬆️
neumarco/zh/dev/judged⬆️⬆️
neumarco/zh/dev/small⬆️
neumarco/zh/train⬆️
neumarco/zh/train/judged⬆️⬆️⬆️
nfcorpus
nfcorpus/dev⬆️
nfcorpus/dev/nontopic⬆️
nfcorpus/dev/video⬆️
nfcorpus/test⬆️
nfcorpus/test/nontopic⬆️
nfcorpus/test/video⬆️
nfcorpus/train⬆️
nfcorpus/train/nontopic⬆️
nfcorpus/train/video⬆️
nyt⚠️
nyt/trec-core-2017⬆️
nyt/wksup⬆️⚠️⚠️
nyt/wksup/train⬆️⚠️⚠️
nyt/wksup/valid⬆️⚠️⚠️
pmc
pmc/v1
pmc/v1/trec-cds-2014⬆️
pmc/v1/trec-cds-2015⬆️
pmc/v2
pmc/v2/trec-cds-2016⬆️
trec-arabic⚠️
trec-arabic/ar2001⬆️
trec-arabic/ar2002⬆️
trec-cast
trec-cast/v0⚠️
trec-cast/v0/train⬆️
trec-cast/v0/train/judged⬆️⬆️
trec-cast/v1
trec-cast/v1/2019⬆️
trec-cast/v1/2019/judged⬆️⬆️
trec-cast/v1/2020⬆️
trec-cast/v1/2020/judged⬆️⬆️
trec-fair
trec-fair/2021
trec-fair/2021/eval⬆️
trec-fair/2021/train⬆️
trec-fair/2022
trec-fair/2022/train⬆️
trec-mandarin⚠️
trec-mandarin/trec5⬆️
trec-mandarin/trec6⬆️
trec-spanish⚠️
trec-spanish/trec3⬆️
trec-spanish/trec4⬆️
tripclick⚠️
tripclick/logs⚠️⚠️
tripclick/test⬆️⚠️⚠️
tripclick/test/head⬆️⚠️⚠️
tripclick/test/tail⬆️⚠️⚠️
tripclick/test/torso⬆️⚠️⚠️
tripclick/train⬆️⚠️⚠️⚠️
tripclick/train/head⬆️⚠️⚠️
tripclick/train/head/dctr⬆️⬆️⚠️
tripclick/train/hofstaetter-triples⬆️⬆️⬆️
tripclick/train/tail⬆️⚠️⚠️
tripclick/train/torso⬆️⚠️⚠️
tripclick/val⬆️⚠️⚠️⚠️
tripclick/val/head⬆️⚠️⚠️⚠️
tripclick/val/head/dctr⬆️⬆️⚠️⬆️
tripclick/val/tail⬆️⚠️⚠️⚠️
tripclick/val/torso⬆️⚠️⚠️⚠️
tweets2013-ia
tweets2013-ia/trec-mb-2013⬆️
tweets2013-ia/trec-mb-2014⬆️
vaswani
wapo
wapo/v2⚠️
wapo/v2/trec-core-2018⬆️
wapo/v2/trec-news-2018⬆️
wapo/v2/trec-news-2019⬆️
wapo/v3/trec-news-2020
wikiclir
wikiclir/ar
wikiclir/ca
wikiclir/cs
wikiclir/de
wikiclir/en-simple
wikiclir/es
wikiclir/fi
wikiclir/fr
wikiclir/it
wikiclir/ja
wikiclir/ko
wikiclir/nl
wikiclir/nn
wikiclir/no
wikiclir/pl
wikiclir/pt
wikiclir/ro
wikiclir/ru
wikiclir/sv
wikiclir/sw
wikiclir/tl
wikiclir/tr
wikiclir/uk
wikiclir/vi
wikiclir/zh
wikir
wikir/en1k
wikir/en1k/test⬆️
wikir/en1k/training⬆️
wikir/en1k/validation⬆️
wikir/en59k
wikir/en59k/test⬆️
wikir/en59k/training⬆️
wikir/en59k/validation⬆️
wikir/en78k
wikir/en78k/test⬆️
wikir/en78k/training⬆️
wikir/en78k/validation⬆️
wikir/ens78k
wikir/ens78k/test⬆️
wikir/ens78k/training⬆️
wikir/ens78k/validation⬆️
wikir/es13k
wikir/es13k/test⬆️
wikir/es13k/training⬆️
wikir/es13k/validation⬆️
wikir/fr14k
wikir/fr14k/test⬆️
wikir/fr14k/training⬆️
wikir/fr14k/validation⬆️
wikir/it16k
wikir/it16k/test⬆️
wikir/it16k/training⬆️
wikir/it16k/validation⬆️

Deprecated

These datasets have been deprecated. We keep them in the package for reproducibility, but better alternative dataset IDs exist (e.g., with improved corpus parsing).

trec-fair-2021, trec-fair-2021/eval, trec-fair-2021/train, trec-robust04, trec-robust04/fold1, trec-robust04/fold2, trec-robust04/fold3, trec-robust04/fold4, trec-robust04/fold5

Citation

When using datasets provided by this package, be sure to cite them properly. BibTeX for each dataset can be found on that dataset's documentation page.

If you use this tool, please cite our SIGIR resource paper:

@inproceedings{macavaney:sigir2021-irds,
  author = {MacAvaney, Sean and Yates, Andrew and Feldman, Sergey and Downey, Doug and Cohan, Arman and Goharian, Nazli},
  title = {Simplified Data Wrangling with ir_datasets},
  year = {2021},
  booktitle = {SIGIR}
}