ir_datasets: Medline
Medical articles from Medline. This collection was used by TREC Genomics 2004-05 (2004 version of dataset) and by TREC Precision Medicine 2017-18 (2017 version).
3M Medline articles including titles and abstracts, used for the TREC 2004-05 Genomics track.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("medline/2004")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, abstract>
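Retrieval pipelines often index the title and abstract as a single text field. The following is a minimal sketch of that step, assuming only the `doc_id`, `title`, and `abstract` fields shown above. The helper name `doc_text` and the stand-in `Doc` tuple are hypothetical and not part of ir_datasets.

```python
from collections import namedtuple

# Hypothetical stand-in with the same fields as a medline document record.
Doc = namedtuple("Doc", ["doc_id", "title", "abstract"])


def doc_text(doc):
    """Join title and abstract into one string, skipping missing or blank fields."""
    parts = [doc.title, doc.abstract]
    return " ".join(p.strip() for p in parts if p and p.strip())


# Illustrative record, not taken from the collection.
sample = Doc("1", "A title", "An abstract.")
print(doc_text(sample))
```

With the library installed, the same helper could be applied to each item yielded by `dataset.docs_iter()`.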
You can find more details about the Python API here.
The TREC Genomics Track 2004 benchmark. Contains 50 queries with article-level relevance judgments.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("medline/2004/trec-genomics-2004")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, title, need, context>
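The relevance judgments can be grouped per query before evaluation. The sketch below assumes the dataset object exposes `qrels_iter()` and that each record carries `query_id`, `doc_id`, and `relevance` fields, which is the usual ir_datasets convention for TREC-style judgments. The helper `relevant_docs_by_query`, its `min_relevance` threshold, and the stand-in classes are hypothetical.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-ins so the sketch runs without downloading the collection.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance"])


class FakeDataset:
    """Mimics only the qrels_iter() method of a loaded dataset."""

    def __init__(self, qrels):
        self._qrels = qrels

    def qrels_iter(self):
        return iter(self._qrels)


def relevant_docs_by_query(dataset, min_relevance=1):
    """Map each query_id to the set of doc_ids judged at or above min_relevance."""
    relevant = defaultdict(set)
    for qrel in dataset.qrels_iter():
        if qrel.relevance >= min_relevance:
            relevant[qrel.query_id].add(qrel.doc_id)
    return dict(relevant)


# Illustrative judgments, not taken from the track.
demo = FakeDataset([Qrel("1", "a", 2), Qrel("1", "b", 0), Qrel("2", "c", 1)])
print(relevant_docs_by_query(demo))
```

Passing the object returned by `ir_datasets.load("medline/2004/trec-genomics-2004")` in place of `demo` should work the same way, provided the assumed field names hold.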
The TREC Genomics Track 2005 benchmark. Contains 50 queries with article-level relevance judgments.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("medline/2004/trec-genomics-2005")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
26M Medline and AACR/ASCO Proceedings articles including titles and abstracts. This collection is used for the TREC 2017-18 Precision Medicine track.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("medline/2017")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, abstract>
The TREC Precision Medicine (PM) Track 2017 benchmark. Contains 30 queries, each with disease, gene, and target demographic information.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("medline/2017/trec-pm-2017")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, disease, gene, demographic, other>
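The structured query fields can be flattened into a single keyword string for a text retrieval system. This is a minimal sketch, assuming only the `disease`, `gene`, and `demographic` fields listed above. The helper `pm_query_text`, its `include_demographic` flag, and the stand-in `PmQuery` tuple are hypothetical, and the sample values are made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in exposing the fields the helper reads.
PmQuery = namedtuple("PmQuery", ["query_id", "disease", "gene", "demographic"])


def pm_query_text(query, include_demographic=False):
    """Build a keyword query from disease and gene, optionally adding demographic."""
    fields = [query.disease, query.gene]
    if include_demographic:
        fields.append(query.demographic)
    return " ".join(f.strip() for f in fields if f and f.strip())


# Illustrative values, not taken from the track topics.
sample = PmQuery("1", "melanoma", "BRAF (V600E)", "64-year-old male")
print(pm_query_text(sample))
```

The helper reads attributes by name, so it should also accept the 2017 records, which carry an additional `other` field.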
The TREC Precision Medicine (PM) Track 2018 benchmark. Contains 50 queries, each with disease, gene, and target demographic information.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("medline/2017/trec-pm-2018")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, disease, gene, demographic>