Github: allenai/ir_datasets

ir_datasets: Counts

K: Thousand (×1,000)

M: Million (×1,000,000)

B: Billion (×1,000,000,000)

/q: Per query (value divided by query count)
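The legend above can be turned into a small helper for working with the abbreviated values in the table. The following is a minimal sketch, not part of ir_datasets: the names `SUFFIXES`, `expand_count`, and `per_query` are hypothetical and chosen for illustration only. Because the abbreviated values are rounded, a per-query figure recomputed from them may differ slightly from the /q value printed in the table, which appears to be derived from exact counts.

```python
# Multipliers for the suffixes defined in the legend (K, M, B).
SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def expand_count(text):
    """Convert an abbreviated count such as '404K' to an approximate integer.

    Values without a suffix (e.g. '200') are returned unchanged as integers.
    """
    text = text.strip()
    if text and text[-1] in SUFFIXES:
        return round(float(text[:-1]) * SUFFIXES[text[-1]])
    return int(text)


def per_query(value, query_count):
    """Return value divided by the query count, rounded to one decimal place."""
    return round(value / query_count, 1)


print(expand_count("404K"))  # 404000
print(expand_count("1.1B"))  # 1100000000

# antique/test row: 200 queries and roughly 6.6K qrels.
print(per_query(expand_count("6.6K"), expand_count("200")))  # 33.0
```

The table lists 32.9 for that row rather than 33.0, which is consistent with the /q column being computed from the exact qrels count instead of the rounded 6.6K.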

Dataset docs queries qrels qrels/q scoreddocs scoreddocs/q docpairs docpairs/q qlogs
antique 404K
antique/test 404K 200  6.6K 32.9
antique/test/non-offensive 404K 176  5.8K 32.7
antique/train 404K 2.4K 27K 11.3
antique/train/split200-train 404K 2.2K 25K 11.3
antique/train/split200-valid 404K 200  2.2K 11.0
aol-ia 1.5M 10.0M 19M 2.0 36M
aquaint 1.0M
aquaint/trec-robust-2005 1.0M 50  38K 756.0
argsme
argsme/1.0 388K
argsme/1.0-cleaned 383K
argsme/1.0/touche-2020-task-1/uncorrected 388K 49  3.0K 60.5
argsme/2020-04-01 388K
argsme/2020-04-01/debateorg 339K
argsme/2020-04-01/debatepedia 21K
argsme/2020-04-01/debatewise 14K
argsme/2020-04-01/idebate 14K
argsme/2020-04-01/parliamentary 48 
argsme/2020-04-01/touche-2020-task-1 388K 49  2.3K 46.9
argsme/2020-04-01/touche-2020-task-1/uncorrected 388K 49  2.3K 46.9
argsme/2020-04-01/touche-2021-task-1 388K 50  3.7K 74.2
beir
beir/arguana 8.7K 1.4K 1.4K 1.0
beir/climate-fever 5.4M 1.5K 4.7K 3.0
beir/cqadupstack/android 23K 699  1.7K 2.4
beir/cqadupstack/english 40K 1.6K 3.8K 2.4
beir/cqadupstack/gaming 45K 1.6K 2.3K 1.4
beir/cqadupstack/gis 38K 885  1.1K 1.3
beir/cqadupstack/mathematica 17K 804  1.4K 1.7
beir/cqadupstack/physics 38K 1.0K 1.9K 1.9
beir/cqadupstack/programmers 32K 876  1.7K 1.9
beir/cqadupstack/stats 42K 652  913  1.4
beir/cqadupstack/tex 68K 2.9K 5.2K 1.8
beir/cqadupstack/unix 47K 1.1K 1.7K 1.6
beir/cqadupstack/webmasters 17K 506  1.4K 2.8
beir/cqadupstack/wordpress 49K 541  744  1.4
beir/dbpedia-entity 4.6M 467 
beir/dbpedia-entity/dev 4.6M 67  5.7K 84.7
beir/dbpedia-entity/test 4.6M 400  44K 108.8
beir/fever 5.4M 123K
beir/fever/dev 5.4M 6.7K 8.1K 1.2
beir/fever/test 5.4M 6.7K 7.9K 1.2
beir/fever/train 5.4M 110K 140K 1.3
beir/fiqa 58K 6.6K
beir/fiqa/dev 58K 500  1.2K 2.5
beir/fiqa/test 58K 648  1.7K 2.6
beir/fiqa/train 58K 5.5K 14K 2.6
beir/hotpotqa 5.2M 98K
beir/hotpotqa/dev 5.2M 5.4K 11K 2.0
beir/hotpotqa/test 5.2M 7.4K 15K 2.0
beir/hotpotqa/train 5.2M 85K 170K 2.0
beir/msmarco 8.8M 510K
beir/msmarco/dev 8.8M 7.0K 7.4K 1.1
beir/msmarco/test 8.8M 43  9.3K 215.3
beir/msmarco/train 8.8M 503K 533K 1.1
beir/nfcorpus 3.6K 3.2K
beir/nfcorpus/dev 3.6K 324  11K 35.1
beir/nfcorpus/test 3.6K 323  12K 38.2
beir/nfcorpus/train 3.6K 2.6K 111K 42.7
beir/nq 2.7M 3.5K 4.2K 1.2
beir/quora 523K 15K
beir/quora/dev 523K 5.0K 7.6K 1.5
beir/quora/test 523K 10K 16K 1.6
beir/scidocs 26K 1.0K 30K 29.9
beir/scifact 5.2K 1.1K
beir/scifact/test 5.2K 300  339  1.1
beir/scifact/train 5.2K 809  919  1.1
beir/trec-covid 171K 50  66K 1326.7
beir/webis-touche2020 383K 49  3.0K 60.4
beir/webis-touche2020/v2 383K 49  2.2K 45.2
c4
c4/en-noclean-tr 1.1B
c4/en-noclean-tr/trec-misinfo-2021 1.1B 50 
car
car/v1.5 30M
car/v1.5/test200 30M 2.0K 4.7K 2.4
car/v1.5/train/fold0 30M 468K 1.1M 2.3
car/v1.5/train/fold1 30M 467K 1.1M 2.3
car/v1.5/train/fold2 30M 469K 1.1M 2.3
car/v1.5/train/fold3 30M 463K 1.0M 2.3
car/v1.5/train/fold4 30M 469K 1.1M 2.3
car/v1.5/trec-y1 30M 2.3K
car/v1.5/trec-y1/auto 30M 2.3K 5.8K 2.5
car/v1.5/trec-y1/manual 30M 2.3K 30K 12.9
car/v2.0 30M
clinicaltrials
clinicaltrials/2017 241K
clinicaltrials/2017/trec-pm-2017 241K 30  13K 434.0
clinicaltrials/2017/trec-pm-2018 241K 50  14K 283.8
clinicaltrials/2019 306K
clinicaltrials/2019/trec-pm-2019 306K 40  13K 324.9
clinicaltrials/2021 376K
clinicaltrials/2021/trec-ct-2021 376K 75  36K 477.8
clirmatrix
clueweb09 1.0B
clueweb09/ar 29M
clueweb09/catb 50M
clueweb09/catb/trec-web-2009 50M 50  13K 262.4
clueweb09/catb/trec-web-2010 50M 50  16K 316.9
clueweb09/catb/trec-web-2011 50M 50  13K 261.6
clueweb09/catb/trec-web-2012 50M 50  10K 200.4
clueweb09/de 50M
clueweb09/en 504M
clueweb09/en/trec-web-2009 504M 50  24K 472.0
clueweb09/en/trec-web-2010 504M 50  25K 506.6
clueweb09/en/trec-web-2011 504M 50  19K 387.6
clueweb09/en/trec-web-2012 504M 50  16K 321.1
clueweb09/es 79M
clueweb09/fr 51M
clueweb09/it 27M
clueweb09/ja 67M
clueweb09/ko 18M
clueweb09/pt 38M
clueweb09/trec-mq-2009 1.0B 40K 35K 0.9
clueweb09/zh 177M
clueweb12 733M
clueweb12/b13 52M
clueweb12/b13/clef-ehealth 52M 300  269K 897.4
clueweb12/b13/clef-ehealth/cs 52M 300  269K 897.4
clueweb12/b13/clef-ehealth/de 52M 300  269K 897.4
clueweb12/b13/clef-ehealth/fr 52M 300  269K 897.4
clueweb12/b13/clef-ehealth/hu 52M 300  269K 897.4
clueweb12/b13/clef-ehealth/pl 52M 300  269K 897.4
clueweb12/b13/clef-ehealth/sv 52M 300  269K 897.4
clueweb12/b13/ntcir-www-1 52M 100  25K 254.7
clueweb12/b13/ntcir-www-2 52M 80  28K 345.3
clueweb12/b13/ntcir-www-3 52M 160 
clueweb12/b13/trec-misinfo-2019 52M 51  23K 448.2
clueweb12/touche-2020-task-2 733M 50  1.8K 35.7
clueweb12/touche-2021-task-2 733M 50  2.1K 41.5
clueweb12/trec-web-2013 733M 50  14K 289.5
clueweb12/trec-web-2014 733M 50  14K 288.6
codec 36  5.1K 142.5
codec/economics 12  1.6K 132.7
codec/history 12  1.7K 141.2
codec/politics 12  1.8K 153.6
codesearchnet 2.1M
codesearchnet/challenge 2.1M 99  4.0K 40.5
codesearchnet/test 2.1M 101K 101K 1.0
codesearchnet/train 2.1M 1.9M 1.9M 1.0
codesearchnet/valid 2.1M 89K 89K 1.0
cord19 193K
cord19/fulltext 193K
cord19/fulltext/trec-covid 193K 50  69K 1386.4
cord19/trec-covid 193K 50  69K 1386.4
cord19/trec-covid/round1 51K 30  8.7K 289.7
cord19/trec-covid/round2 60K 35  12K 343.9
cord19/trec-covid/round3 128K 40  13K 317.8
cord19/trec-covid/round4 158K 45  13K 294.7
cord19/trec-covid/round5 193K 50  23K 463.0
cranfield 1.4K 225  1.8K 8.2
disks45
disks45/nocr 528K
disks45/nocr/trec-robust-2004 528K 250  311K 1245.6
disks45/nocr/trec-robust-2004/fold1 528K 50  63K 1255.8
disks45/nocr/trec-robust-2004/fold2 528K 50  64K 1278.3
disks45/nocr/trec-robust-2004/fold3 528K 50  63K 1258.0
disks45/nocr/trec-robust-2004/fold4 528K 50  58K 1159.2
disks45/nocr/trec-robust-2004/fold5 528K 50  64K 1276.8
disks45/nocr/trec7 528K 50  80K 1606.9
disks45/nocr/trec8 528K 50  87K 1736.6
dpr-w100 21M
dpr-w100/natural-questions/dev 21M 6.5K 980K 150.4
dpr-w100/natural-questions/train 21M 59K 8.9M 150.4
dpr-w100/trivia-qa/dev 21M 8.8K 884K 100.0
dpr-w100/trivia-qa/train 21M 79K 7.9M 100.0
gov 1.2M
gov/trec-web-2002 1.2M 50  57K 1133.0
gov/trec-web-2002/named-page 1.2M 150  170  1.1
gov/trec-web-2003 1.2M 50  51K 1021.2
gov/trec-web-2003/named-page 1.2M 300  352  1.2
gov/trec-web-2004 1.2M 225  89K 393.6
gov2 25M
gov2/trec-mq-2007 25M 10K 73K 7.3
gov2/trec-mq-2008 25M 10K 15K 1.5
gov2/trec-tb-2004 25M 50  58K 1161.5
gov2/trec-tb-2005 25M 50  45K 905.8
gov2/trec-tb-2005/efficiency 25M 50K 45K 0.9
gov2/trec-tb-2005/named-page 25M 252  12K 46.5
gov2/trec-tb-2006 25M 50  32K 639.7
gov2/trec-tb-2006/efficiency 25M 100K 32K 0.3
gov2/trec-tb-2006/efficiency/10k 25M 10K
gov2/trec-tb-2006/efficiency/stream1 25M 25K
gov2/trec-tb-2006/efficiency/stream2 25M 25K
gov2/trec-tb-2006/efficiency/stream3 25M 25K 32K 1.3
gov2/trec-tb-2006/efficiency/stream4 25M 25K
gov2/trec-tb-2006/named-page 25M 181  2.4K 13.0
hc4
hc4/fa 486K
hc4/fa/dev 486K 10  565  56.5
hc4/fa/test 486K 50  2.5K 50.4
hc4/fa/train 486K 8  112  14.0
hc4/ru 4.7M
hc4/ru/dev 4.7M 4  265  66.2
hc4/ru/test 4.7M 50  3.0K 59.4
hc4/ru/train 4.7M 7  92  13.1
hc4/zh 646K
hc4/zh/dev 646K 10  466  46.6
hc4/zh/test 646K 50  2.8K 55.0
hc4/zh/train 646K 23  341  14.8
highwire 162K
highwire/trec-genomics-2006 162K 28  28K 1000.0
highwire/trec-genomics-2007 162K 36  36K 999.9
kilt 5.9M
kilt/codec 5.9M 36  10K 284.4
kilt/codec/economics 5.9M 12  1.6K 132.7
kilt/codec/history 5.9M 12  1.7K 141.2
kilt/codec/politics 5.9M 12  1.8K 153.6
lotte
lotte/lifestyle/dev 269K
lotte/lifestyle/dev/forum 269K 2.1K 13K 6.2
lotte/lifestyle/dev/search 269K 417  1.4K 3.3
lotte/lifestyle/test 119K
lotte/lifestyle/test/forum 119K 2.0K 10K 5.1
lotte/lifestyle/test/search 119K 661  1.8K 2.7
lotte/pooled/dev 2.4M
lotte/pooled/dev/forum 2.4M 10K 69K 6.8
lotte/pooled/dev/search 2.4M 2.9K 8.6K 2.9
lotte/pooled/test 2.8M
lotte/pooled/test/forum 2.8M 10K 62K 6.1
lotte/pooled/test/search 2.8M 3.9K 11K 2.9
lotte/recreation/dev 263K
lotte/recreation/dev/forum 263K 2.0K 13K 6.4
lotte/recreation/dev/search 263K 563  1.8K 3.1
lotte/recreation/test 167K
lotte/recreation/test/forum 167K 2.0K 6.9K 3.5
lotte/recreation/test/search 167K 924  2.0K 2.2
lotte/science/dev 344K
lotte/science/dev/forum 344K 2.0K 12K 6.1
lotte/science/dev/search 344K 538  1.5K 2.8
lotte/science/test 1.7M
lotte/science/test/forum 1.7M 2.0K 16K 7.7
lotte/science/test/search 1.7M 617  1.7K 2.8
lotte/technology/dev 1.3M
lotte/technology/dev/forum 1.3M 2.0K 16K 7.9
lotte/technology/dev/search 1.3M 916  2.7K 2.9
lotte/technology/test 639K
lotte/technology/test/forum 639K 2.0K 16K 7.9
lotte/technology/test/search 639K 596  2.0K 3.4
lotte/writing/dev 277K
lotte/writing/dev/forum 277K 2.0K 15K 7.5
lotte/writing/dev/search 277K 497  1.3K 2.6
lotte/writing/test 200K
lotte/writing/test/forum 200K 2.0K 13K 6.5
lotte/writing/test/search 200K 1.1K 3.5K 3.3
medline
medline/2004 3.7M
medline/2004/trec-genomics-2004 3.7M 50  8.3K 165.4
medline/2004/trec-genomics-2005 3.7M 50  40K 799.2
medline/2017 27M
medline/2017/trec-pm-2017 27M 30  23K 754.7
medline/2017/trec-pm-2018 27M 50  22K 448.6
mmarco
mmarco/de 8.8M
mmarco/de/dev 8.8M 101K 59K 0.6
mmarco/de/dev/small 8.8M 7.0K 7.4K 1.1 6.6M 944.7
mmarco/de/train 8.8M 809K 533K 0.7 40M 49.2
mmarco/es 8.8M
mmarco/es/dev 8.8M 101K 59K 0.6
mmarco/es/dev/small 8.8M 7.0K 7.4K 1.1 6.8M 972.3
mmarco/es/train 8.8M 809K 533K 0.7 40M 49.2
mmarco/fr 8.8M
mmarco/fr/dev 8.8M 101K 59K 0.6
mmarco/fr/dev/small 8.8M 7.0K 7.4K 1.1 6.8M 972.2
mmarco/fr/train 8.8M 809K 533K 0.7 40M 49.2
mmarco/id 8.8M
mmarco/id/dev 8.8M 101K 59K 0.6
mmarco/id/dev/small 8.8M 7.0K 7.4K 1.1 6.8M 980.2
mmarco/id/train 8.8M 809K 533K 0.7 40M 49.2
mmarco/it 8.8M
mmarco/it/dev 8.8M 101K 59K 0.6
mmarco/it/dev/small 8.8M 7.0K 7.4K 1.1 7.0M 998.1
mmarco/it/train 8.8M 809K 533K 0.7 40M 49.2
mmarco/pt 8.8M
mmarco/pt/dev 8.8M 102K 59K 0.6
mmarco/pt/dev/small 8.8M 7.0K 7.4K 1.1
mmarco/pt/dev/small/v1.1 8.8M 7.0K 7.4K 1.1 7.0M 999.5
mmarco/pt/dev/v1.1 8.8M 101K 59K 0.6
mmarco/pt/train 8.8M 812K 533K 0.7 40M 49.0
mmarco/pt/train/v1.1 8.8M 809K 533K 0.7 40M 49.2
mmarco/ru 8.8M
mmarco/ru/dev 8.8M 101K 59K 0.6
mmarco/ru/dev/small 8.8M 7.0K 7.4K 1.1 7.0M 997.0
mmarco/ru/train 8.8M 809K 533K 0.7 40M 49.2
mmarco/v2/ar 8.8M
mmarco/v2/ar/dev 8.8M 101K 59K 0.6
mmarco/v2/ar/dev/small 8.8M 7.0K 7.4K 1.1 6.8M 981.2
mmarco/v2/ar/train 8.8M 809K 533K 0.7 40M 49.2
mmarco/v2/de 8.8M
mmarco/v2/de/dev 8.8M 101K 59K 0.6
mmarco/v2/de/dev/small 8.8M 7.0K 7.4K 1.1 6.6M 943.7
mmarco/v2/de/train 8.8M 809K 533K 0.7 40M 49.2
mmarco/v2/dt 8.8M
mmarco/v2/dt/dev 8.8M 101K 59K 0.6
mmarco/v2/dt/dev/small 8.8M 7.0K 7.4K 1.1 6.6M 946.7
mmarco/v2/dt/train 8.8M 809K 533K 0.7 40M 49.2
mmarco/v2/es 8.8M
mmarco/v2/es/dev 8.8M 101K 59K 0.6
mmarco/v2/es/dev/small 8.8M 7.0K 7.4K 1.1 6.8M 970.9
mmarco/v2/es/train 8.8M 809K 533K 0.7 40M 49.2
mmarco/v2/fr 8.8M
mmarco/v2/fr/dev 8.8M 101K 59K 0.6
mmarco/v2/fr/dev/small 8.8M 7.0K 7.4K 1.1 6.8M 978.8
mmarco/v2/fr/train 8.8M 809K 533K 0.7 40M 49.2
mmarco/v2/hi 8.8M
mmarco/v2/hi/dev 8.8M 101K 59K 0.6
mmarco/v2/hi/dev/small 8.8M 7.0K 7.4K 1.1 7.0M 997.4
mmarco/v2/hi/train 8.8M 809K 533K 0.7 40M 49.2
mmarco/v2/id 8.8M
mmarco/v2/id/dev 8.8M 101K 59K 0.6
mmarco/v2/id/dev/small 8.8M 7.0K 7.4K 1.1 6.8M 973.0
mmarco/v2/id/train 8.8M 809K 533K 0.7 40M 49.2
mmarco/v2/it 8.8M
mmarco/v2/it/dev 8.8M 101K 59K 0.6
mmarco/v2/it/dev/small 8.8M 7.0K 7.4K 1.1 7.0M 996.1
mmarco/v2/it/train 8.8M 809K 533K 0.7 40M 49.2
mmarco/v2/ja 8.8M
mmarco/v2/ja/dev 8.8M 101K 59K 0.6
mmarco/v2/ja/dev/small 8.8M 7.0K 7.4K 1.1 6.8M 976.7
mmarco/v2/ja/train 8.8M 809K 533K 0.7 40M 49.2
mmarco/v2/pt 8.8M
mmarco/v2/pt/dev 8.8M 101K 59K 0.6
mmarco/v2/pt/dev/small 8.8M 7.0K 7.4K 1.1 7.0M 999.3
mmarco/v2/pt/train 8.8M 809K 533K 0.7 40M 49.2
mmarco/v2/ru 8.8M
mmarco/v2/ru/dev 8.8M 101K 59K 0.6
mmarco/v2/ru/dev/small 8.8M 7.0K 7.4K 1.1 6.9M 993.1
mmarco/v2/ru/train 8.8M 809K 533K 0.7 40M 49.2
mmarco/v2/vi 8.8M
mmarco/v2/vi/dev 8.8M 101K 59K 0.6
mmarco/v2/vi/dev/small 8.8M 7.0K 7.4K 1.1 7.0M 999.5
mmarco/v2/vi/train 8.8M 809K 533K 0.7 40M 49.2
mmarco/v2/zh 8.8M
mmarco/v2/zh/dev 8.8M 101K 59K 0.6
mmarco/v2/zh/dev/small 8.8M 7.0K 7.4K 1.1 7.0M 999.9
mmarco/v2/zh/train 8.8M 809K 533K 0.7 40M 49.2
mmarco/zh 8.8M
mmarco/zh/dev 8.8M 101K 59K 0.6
mmarco/zh/dev/small 8.8M 7.0K 7.4K 1.1
mmarco/zh/dev/small/v1.1 8.8M 7.0K 7.4K 1.1 1.0M 148.2
mmarco/zh/dev/v1.1 8.8M 101K 59K 0.6
mmarco/zh/train 8.8M 809K 533K 0.7 40M 49.2
mr-tydi
mr-tydi/ar 2.1M 17K 17K 1.0
mr-tydi/ar/dev 2.1M 3.1K 3.1K 1.0
mr-tydi/ar/test 2.1M 1.1K 1.3K 1.2
mr-tydi/ar/train 2.1M 12K 12K 1.0
mr-tydi/bn 304K 2.3K 2.3K 1.0
mr-tydi/bn/dev 304K 440  443  1.0
mr-tydi/bn/test 304K 111  130  1.2
mr-tydi/bn/train 304K 1.7K 1.7K 1.0
mr-tydi/en 33M 5.2K 5.4K 1.0
mr-tydi/en/dev 33M 878  878  1.0
mr-tydi/en/test 33M 744  935  1.3
mr-tydi/en/train 33M 3.5K 3.5K 1.0
mr-tydi/fi 1.9M 9.6K 9.8K 1.0
mr-tydi/fi/dev 1.9M 1.7K 1.7K 1.0
mr-tydi/fi/test 1.9M 1.3K 1.5K 1.2
mr-tydi/fi/train 1.9M 6.6K 6.6K 1.0
mr-tydi/id 1.5M 7.0K 7.1K 1.0
mr-tydi/id/dev 1.5M 1.2K 1.2K 1.0
mr-tydi/id/test 1.5M 829  961  1.2
mr-tydi/id/train 1.5M 4.9K 4.9K 1.0
mr-tydi/ja 7.0M 5.4K 5.5K 1.0
mr-tydi/ja/dev 7.0M 928  928  1.0
mr-tydi/ja/test 7.0M 720  923  1.3
mr-tydi/ja/train 7.0M 3.7K 3.7K 1.0
mr-tydi/ko 1.5M 2.0K 2.1K 1.0
mr-tydi/ko/dev 1.5M 303  307  1.0
mr-tydi/ko/test 1.5M 421  492  1.2
mr-tydi/ko/train 1.5M 1.3K 1.3K 1.0
mr-tydi/ru 9.6M 7.8K 7.9K 1.0
mr-tydi/ru/dev 9.6M 1.4K 1.4K 1.0
mr-tydi/ru/test 9.6M 995  1.2K 1.2
mr-tydi/ru/train 9.6M 5.4K 5.4K 1.0
mr-tydi/sw 137K 3.3K 3.8K 1.2
mr-tydi/sw/dev 137K 526  623  1.2
mr-tydi/sw/test 137K 670  743  1.1
mr-tydi/sw/train 137K 2.1K 2.4K 1.2
mr-tydi/te 548K 5.5K 5.5K 1.0
mr-tydi/te/dev 548K 983  983  1.0
mr-tydi/te/test 548K 646  677  1.0
mr-tydi/te/train 548K 3.9K 3.9K 1.0
mr-tydi/th 569K 5.3K 5.5K 1.0
mr-tydi/th/dev 569K 807  817  1.0
mr-tydi/th/test 569K 1.2K 1.4K 1.1
mr-tydi/th/train 569K 3.3K 3.4K 1.0
msmarco-document 3.2M
msmarco-document/anchor-text 1.7M
msmarco-document/dev 3.2M 5.2K 5.2K 1.0 519K 100.0
msmarco-document/eval 3.2M 5.8K 579K 100.0
msmarco-document/orcas 3.2M 10M 19M 1.8 983M 94.5
msmarco-document/train 3.2M 367K 367K 1.0 37M 100.0
msmarco-document/trec-dl-2019 3.2M 200  16K 81.3 20K 100.0
msmarco-document/trec-dl-2019/judged 3.2M 43  16K 378.1 4.3K 100.0
msmarco-document/trec-dl-2020 3.2M 200  9.1K 45.5 20K 100.0
msmarco-document/trec-dl-2020/judged 3.2M 45  9.1K 202.2 4.5K 100.0
msmarco-document/trec-dl-hard 3.2M 50  8.5K 170.9
msmarco-document/trec-dl-hard/fold1 3.2M 10  1.6K 155.7
msmarco-document/trec-dl-hard/fold2 3.2M 10  1.3K 134.5
msmarco-document/trec-dl-hard/fold3 3.2M 10  474  47.4
msmarco-document/trec-dl-hard/fold4 3.2M 10  1.1K 105.4
msmarco-document/trec-dl-hard/fold5 3.2M 10  4.1K 411.4
msmarco-document-v2 12M
msmarco-document-v2/anchor-text 4.8M
msmarco-document-v2/dev1 12M 4.6K 4.7K 1.0 455K 100.0
msmarco-document-v2/dev2 12M 5.0K 5.2K 1.0 500K 100.0
msmarco-document-v2/train 12M 322K 332K 1.0 32M 100.0
msmarco-document-v2/trec-dl-2019 12M 200  14K 69.7
msmarco-document-v2/trec-dl-2019/judged 12M 43  14K 324.2
msmarco-document-v2/trec-dl-2020 12M 200  7.9K 39.7
msmarco-document-v2/trec-dl-2020/judged 12M 45  7.9K 176.5
msmarco-document-v2/trec-dl-2021 12M 477  13K 27.4 48K 100.0
msmarco-document-v2/trec-dl-2021/judged 12M 57  13K 229.1 5.7K 100.0
msmarco-passage 8.8M
msmarco-passage/dev 8.8M 101K 59K 0.6
msmarco-passage/dev/judged 8.8M 56K 59K 1.1
msmarco-passage/dev/small 8.8M 7.0K 7.4K 1.1 6.7M 955.4
msmarco-passage/eval 8.8M 101K
msmarco-passage/eval/small 8.8M 6.8K 6.5M 953.0
msmarco-passage/train 8.8M 809K 533K 0.7 478M 591.1 270M 333.8
msmarco-passage/train/judged 8.8M 503K 533K 1.1 478M 950.4 270M 536.7
msmarco-passage/train/medical 8.8M 79K 55K 0.7 49M 619.2 29M 367.2
msmarco-passage/train/split200-train 8.8M 809K 533K 0.7 478M 591.1 270M 333.8
msmarco-passage/train/split200-valid 8.8M 200  131  0.7 119K 595.1 64K 320.8
msmarco-passage/train/triples-small 8.8M 809K 533K 0.7 478M 591.1 40M 49.2
msmarco-passage/train/triples-v2 8.8M 809K 533K 0.7 478M 591.1 398M 491.8
msmarco-passage/trec-dl-2019 8.8M 200  9.3K 46.3 190K 949.4
msmarco-passage/trec-dl-2019/judged 8.8M 43  9.3K 215.3 41K 954.5
msmarco-passage/trec-dl-2020 8.8M 200  11K 56.9 191K 953.5
msmarco-passage/trec-dl-2020/judged 8.8M 54  11K 210.9 50K 926.4
msmarco-passage/trec-dl-hard 8.8M 50  4.3K 85.1
msmarco-passage/trec-dl-hard/fold1 8.8M 10  1.1K 107.2
msmarco-passage/trec-dl-hard/fold2 8.8M 10  898  89.8
msmarco-passage/trec-dl-hard/fold3 8.8M 10  444  44.4
msmarco-passage/trec-dl-hard/fold4 8.8M 10  716  71.6
msmarco-passage/trec-dl-hard/fold5 8.8M 10  1.1K 112.6
msmarco-passage-v2 138M
msmarco-passage-v2/dev1 138M 3.9K 4.0K 1.0 390K 100.0
msmarco-passage-v2/dev2 138M 4.3K 4.4K 1.0 428K 100.0
msmarco-passage-v2/train 138M 277K 284K 1.0 28M 100.0
msmarco-passage-v2/trec-dl-2021 138M 477  11K 22.7 48K 100.0
msmarco-passage-v2/trec-dl-2021/judged 138M 53  11K 204.3 5.3K 100.0
msmarco-qna 9.0M
msmarco-qna/dev 9.0M 101K 1.0M 10.0 1.0M 10.0
msmarco-qna/eval 9.0M 101K 1.0M 10.0
msmarco-qna/train 9.0M 809K 8.1M 10.0 8.1M 10.0
natural-questions 28M
natural-questions/dev 28M 7.8K 7.7K 1.0 973K 124.3
natural-questions/train 28M 307K 152K 0.5 40M 131.4
neuclir
neuclir/1
neuclir/1/fa 2.2M
neuclir/1/fa/hc4-filtered 392K 60  3.1K 51.5
neuclir/1/ru 4.6M
neuclir/1/ru/hc4-filtered 965K 54  3.2K 59.9
neuclir/1/zh 3.2M
neuclir/1/zh/hc4-filtered 520K 60  3.2K 53.6
neumarco
neumarco/fa 8.8M
neumarco/fa/dev 8.8M 101K 59K 0.6
neumarco/fa/dev/judged 8.8M 56K 59K 1.1
neumarco/fa/dev/small 8.8M 7.0K 7.4K 1.1
neumarco/fa/train 8.8M 809K 533K 0.7 270M 333.8
neumarco/fa/train/judged 8.8M 503K 533K 1.1 270M 536.7
neumarco/ru 8.8M
neumarco/ru/dev 8.8M 101K 59K 0.6
neumarco/ru/dev/judged 8.8M 56K 59K 1.1
neumarco/ru/dev/small 8.8M 7.0K 7.4K 1.1
neumarco/ru/train 8.8M 809K 533K 0.7 270M 333.8
neumarco/ru/train/judged 8.8M 503K 533K 1.1 270M 536.7
neumarco/zh 8.8M
neumarco/zh/dev 8.8M 101K 59K 0.6
neumarco/zh/dev/judged 8.8M 56K 59K 1.1
neumarco/zh/dev/small 8.8M 7.0K 7.4K 1.1
neumarco/zh/train 8.8M 809K 533K 0.7 270M 333.8
neumarco/zh/train/judged 8.8M 503K 533K 1.1 270M 536.7
nfcorpus 5.4K
nfcorpus/dev 5.4K 325  15K 44.9
nfcorpus/dev/nontopic 5.4K 144  4.4K 30.2
nfcorpus/dev/video 5.4K 102  3.1K 30.1
nfcorpus/test 5.4K 325  16K 48.7
nfcorpus/test/nontopic 5.4K 144  4.5K 31.5
nfcorpus/test/video 5.4K 102  3.1K 30.5
nfcorpus/train 5.4K 2.6K 139K 53.7
nfcorpus/train/nontopic 5.4K 1.1K 37K 32.8
nfcorpus/train/video 5.4K 812  27K 33.8
nyt 1.9M
nyt/trec-core-2017 1.9M 50  30K 600.6
nyt/wksup 1.9M 1.9M 1.9M 1.0
nyt/wksup/train 1.9M 1.9M 1.9M 1.0
nyt/wksup/valid 1.9M 1.0K 1.0K 1.0
pmc
pmc/v1 733K
pmc/v1/trec-cds-2014 733K 30  38K 1265.0
pmc/v1/trec-cds-2015 733K 30  38K 1260.2
pmc/v2 1.3M
pmc/v2/trec-cds-2016 1.3M 30  38K 1256.9
trec-arabic 384K
trec-arabic/ar2001 384K 25  23K 909.8
trec-arabic/ar2002 384K 50  38K 768.6
trec-cast
trec-cast/v0 48M
trec-cast/v0/train 48M 269  2.4K 8.9 269K 1000.0
trec-cast/v0/train/judged 48M 120  2.4K 20.0 120K 1000.0
trec-cast/v1 39M
trec-cast/v1/2019 39M 479  29K 61.3 479K 1000.0
trec-cast/v1/2019/judged 39M 173  29K 169.7 173K 1000.0
trec-cast/v1/2020 39M 216  40K 187.3
trec-cast/v1/2020/judged 39M 208  40K 194.5
trec-fair-2021 6.3M
trec-fair-2021/eval 6.3M 49 
trec-fair-2021/train 6.3M 57  2.2M 38341.2
trec-mandarin 165K
trec-mandarin/trec5 165K 28  16K 556.7
trec-mandarin/trec6 165K 26  9.2K 355.2
trec-robust04 528K 250  311K 1245.6
trec-robust04/fold1 528K 50  63K 1255.8
trec-robust04/fold2 528K 50  64K 1278.3
trec-robust04/fold3 528K 50  63K 1258.0
trec-robust04/fold4 528K 50  58K 1159.2
trec-robust04/fold5 528K 50  64K 1276.8
trec-spanish 121K
trec-spanish/trec3 121K 25  19K 760.2
trec-spanish/trec4 121K 25  13K 524.4
tripclick 1.5M
tripclick/logs 5.2M 5.3M
tripclick/test 1.5M 3.5K 3.5M 989.1
tripclick/test/head 1.5M 1.2K 1.2M 986.6
tripclick/test/tail 1.5M 1.2K 1.2M 991.6
tripclick/test/torso 1.5M 1.2K 1.2M 988.9
tripclick/train 1.5M 686K 2.7M 3.9 23M 33.9
tripclick/train/head 1.5M 3.5K 117K 33.1
tripclick/train/head/dctr 1.5M 3.5K 128K 36.4
tripclick/train/hofstaetter-triples 1.5M 686K 2.7M 3.9 10M 14.6
tripclick/train/tail 1.5M 576K 1.6M 2.8
tripclick/train/torso 1.5M 106K 967K 9.1
tripclick/val 1.5M 3.5K 82K 23.4 3.5M 993.8
tripclick/val/head 1.5M 1.2K 64K 54.8 1.2M 993.0
tripclick/val/head/dctr 1.5M 1.2K 67K 56.9 1.2M 993.0
tripclick/val/tail 1.5M 1.2K 3.9K 3.3 1.2M 992.5
tripclick/val/torso 1.5M 1.2K 14K 12.0 1.2M 996.0
tweets2013-ia 253M
tweets2013-ia/trec-mb-2013 253M 60  71K 1188.0
tweets2013-ia/trec-mb-2014 253M 55  58K 1054.3
vaswani 11K 93  2.1K 22.4
wapo
wapo/v2 595K
wapo/v2/trec-core-2018 595K 50  26K 524.7
wapo/v2/trec-news-2018 595K 50  8.5K 170.2
wapo/v2/trec-news-2019 595K 60  16K 260.9
wapo/v3/trec-news-2020 50  18K 355.3
wikiclir
wikiclir/ar 535K 324K 519K 1.6
wikiclir/ca 549K 340K 965K 2.8
wikiclir/cs 387K 234K 954K 4.1
wikiclir/de 2.1M 938K 5.6M 5.9
wikiclir/en-simple 127K 115K 250K 2.2
wikiclir/es 1.3M 782K 2.9M 3.7
wikiclir/fi 419K 274K 940K 3.4
wikiclir/fr 1.9M 1.1M 5.1M 4.7
wikiclir/it 1.3M 809K 3.4M 4.3
wikiclir/ja 1.1M 426K 3.3M 7.8
wikiclir/ko 394K 225K 568K 2.5
wikiclir/nl 1.9M 688K 2.3M 3.4
wikiclir/nn 133K 99K 250K 2.5
wikiclir/no 471K 300K 964K 3.2
wikiclir/pl 1.2M 694K 2.5M 3.6
wikiclir/pt 973K 612K 1.7M 2.8
wikiclir/ro 377K 199K 451K 2.3
wikiclir/ru 1.4M 665K 2.3M 3.5
wikiclir/sv 3.8M 639K 2.1M 3.2
wikiclir/sw 37K 23K 58K 2.5
wikiclir/tl 79K 49K 72K 1.5
wikiclir/tr 296K 185K 381K 2.1
wikiclir/uk 705K 348K 913K 2.6
wikiclir/vi 1.4M 354K 611K 1.7
wikiclir/zh 951K 463K 926K 2.0
wikir
wikir/en1k 370K
wikir/en1k/test 370K 100  4.4K 44.4 10K 100.0
wikir/en1k/training 370K 1.4K 48K 33.0 144K 100.0
wikir/en1k/validation 370K 100  5.0K 49.8 10K 100.0
wikir/en59k 2.5M
wikir/en59k/test 2.5M 1.0K 105K 104.7 100K 100.0
wikir/en59k/training 2.5M 57K 2.4M 42.7 5.7M 100.0
wikir/en59k/validation 2.5M 1.0K 69K 68.9 100K 100.0
wikir/en78k 2.5M
wikir/en78k/test 2.5M 7.9K 353K 44.9 786K 99.9
wikir/en78k/training 2.5M 63K 2.4M 38.7 6.3M 99.9
wikir/en78k/validation 2.5M 7.9K 272K 34.6 786K 99.9
wikir/ens78k 2.5M
wikir/ens78k/test 2.5M 7.9K 353K 44.9 786K 100.0
wikir/ens78k/training 2.5M 63K 2.4M 38.7 6.3M 100.0
wikir/ens78k/validation 2.5M 7.9K 272K 34.6 786K 100.0
wikir/es13k 646K
wikir/es13k/test 646K 1.3K 71K 54.9 130K 100.0
wikir/es13k/training 646K 11K 477K 42.6 1.1M 100.0
wikir/es13k/validation 646K 1.3K 59K 45.2 130K 100.0
wikir/fr14k 737K
wikir/fr14k/test 737K 1.4K 56K 39.7 140K 100.0
wikir/fr14k/training 737K 11K 609K 53.7 1.1M 100.0
wikir/fr14k/validation 737K 1.4K 81K 58.0 140K 100.0
wikir/it16k 503K
wikir/it16k/test 503K 1.6K 49K 30.8 160K 100.0
wikir/it16k/training 503K 13K 382K 28.5 1.3M 100.0
wikir/it16k/validation 503K 1.6K 45K 28.1 160K 100.0