ir_datasets: Medline
Medical articles from Medline. This collection was used by TREC Genomics 2004-05 (2004 version of dataset) and by TREC Precision Medicine 2017-18 (2017 version).
3M Medline articles including titles and abstracts, used for the TREC 2004-05 Genomics track.
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('medline/2004')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, abstract>
The TREC Genomics Track 2004 benchmark. Contains 50 queries with article-level relevance judgments.
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('medline/2004/trec-genomics-2004')
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, need, context>
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('medline/2004/trec-genomics-2004')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, abstract>
Relevance levels
Rel. | Definition |
---|---|
0 | not relevant |
1 | possibly relevant |
2 | definitely relevant |
Example
import ir_datasets
dataset = ir_datasets.load('medline/2004/trec-genomics-2004')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
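A quick way to sanity-check the judgments is to tally how many fall into each relevance level from the table above. The following is a minimal sketch, not part of ir_datasets: `count_by_relevance` is a hypothetical helper, and the `Qrel` namedtuple and its sample values are invented stand-ins that mirror the fields listed above so the snippet runs without downloading the collection.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in that mirrors the qrel fields shown above.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def count_by_relevance(qrels):
    """Return a Counter mapping each relevance level to its number of judgments."""
    return Counter(qrel.relevance for qrel in qrels)


# Invented sample judgments, for illustration only (not real dataset content).
sample_qrels = [
    Qrel("1", "100", 2, "0"),
    Qrel("1", "101", 0, "0"),
    Qrel("2", "100", 1, "0"),
    Qrel("2", "102", 0, "0"),
]

counts = count_by_relevance(sample_qrels)
for level in sorted(counts):
    print(level, counts[level])
```

With the real data, the same helper should accept `dataset.qrels_iter()` directly, since it only reads the `relevance` field of each record.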
The TREC Genomics Track 2005 benchmark. Contains 50 queries with article-level relevance judgments.
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('medline/2004/trec-genomics-2005')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('medline/2004/trec-genomics-2005')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, abstract>
Relevance levels
Rel. | Definition |
---|---|
0 | not relevant |
1 | possibly relevant |
2 | definitely relevant |
Example
import ir_datasets
dataset = ir_datasets.load('medline/2004/trec-genomics-2005')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
26M Medline and AACR/ASCO Proceedings articles including titles and abstracts. This collection is used for the TREC 2017-18 Precision Medicine track.
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('medline/2017')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, abstract>
The TREC Precision Medicine (PM) Track 2017 benchmark. Contains 30 queries containing disease, gene, and target demographic information.
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('medline/2017/trec-pm-2017')
for query in dataset.queries_iter():
    query # namedtuple<query_id, disease, gene, demographic, other>
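Because these queries are structured rather than free text, a keyword retriever usually needs them flattened into one string first. The sketch below shows one possible approach under stated assumptions: `pm_query_text` is a hypothetical helper (not an ir_datasets function), the choice of fields to concatenate is arbitrary, and the `PMQuery` values are invented for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in that mirrors the 2017 query fields shown above.
PMQuery = namedtuple("PMQuery", ["query_id", "disease", "gene", "demographic", "other"])


def pm_query_text(query, fields=("disease", "gene", "demographic")):
    """Join the selected structured fields into one whitespace-separated string.

    Fields that are missing on the record or empty are skipped, so the same
    helper also works for records without an `other` field.
    """
    parts = []
    for name in fields:
        value = getattr(query, name, None)
        if value and value.strip():
            parts.append(value.strip())
    return " ".join(parts)


# Invented example values, for illustration only (not a real topic).
example_query = PMQuery("1", "melanoma", "BRAF (V600E)", "64-year-old male", "")
query_text = pm_query_text(example_query)
print(query_text)
```

Whether to include `demographic` or `other` in the search string is a design choice that depends on the retrieval model; the default here is only a starting point.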
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('medline/2017/trec-pm-2017')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, abstract>
Relevance levels
Rel. | Definition |
---|---|
0 | not relevant |
1 | possibly relevant |
2 | definitely relevant |
Example
import ir_datasets
dataset = ir_datasets.load('medline/2017/trec-pm-2017')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
The TREC Precision Medicine (PM) Track 2018 benchmark. Contains 50 queries containing disease, gene, and target demographic information.
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('medline/2017/trec-pm-2018')
for query in dataset.queries_iter():
    query # namedtuple<query_id, disease, gene, demographic>
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('medline/2017/trec-pm-2018')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, abstract>
Relevance levels
Rel. | Definition |
---|---|
0 | not relevant |
1 | possibly relevant |
2 | definitely relevant |
Example
import ir_datasets
dataset = ir_datasets.load('medline/2017/trec-pm-2018')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
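Many evaluation measures expect binary relevance, so the graded labels above often need a cutoff. The following is a minimal sketch of grouping relevant documents per query: `relevant_docs_by_query` and its `min_relevance` parameter are hypothetical names, the threshold of 1 (counting "possibly relevant" as relevant) is an assumption rather than a track rule, and the sample records are invented.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in that mirrors the qrel fields shown above.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def relevant_docs_by_query(qrels, min_relevance=1):
    """Map each query_id to the set of doc_ids judged at or above min_relevance."""
    relevant = defaultdict(set)
    for qrel in qrels:
        if qrel.relevance >= min_relevance:
            relevant[qrel.query_id].add(qrel.doc_id)
    return dict(relevant)


# Invented sample judgments, for illustration only (not real dataset content).
sample_judgments = [
    Qrel("1", "100", 2, "0"),
    Qrel("1", "101", 1, "0"),
    Qrel("1", "102", 0, "0"),
    Qrel("2", "100", 0, "0"),
]

lenient = relevant_docs_by_query(sample_judgments)                   # levels 1 and 2
strict = relevant_docs_by_query(sample_judgments, min_relevance=2)   # level 2 only
print(lenient)
print(strict)
```

Queries with no document at or above the threshold are simply absent from the result, which is worth keeping in mind when averaging per-query scores.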