ir_datasets: PubMed Central (TREC CDS)
Biomedical articles from PubMed Central. Currently includes only the subsets used for the TREC Clinical Decision Support (CDS) 2014-16 tasks.
Subset of PMC articles used for the TREC 2014 and 2015 tasks (v1). Includes titles, abstracts, and full text. Collected from the open access segment on January 21, 2014.
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('pmc/v1')
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, journal, title, abstract, body>
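Since each document exposes separate title, abstract, and body fields, a common first step is to join them into one string for indexing. The following is a minimal sketch under stated assumptions: `PmcDoc`, `doc_to_text`, and the sample record are hypothetical stand-ins that mirror the documented field names so the snippet runs without downloading the collection.

```python
from collections import namedtuple

# Stand-in mirroring the documented fields; real records come from dataset.docs_iter().
PmcDoc = namedtuple("PmcDoc", ["doc_id", "journal", "title", "abstract", "body"])

def doc_to_text(doc, max_body_chars=None):
    """Join title, abstract, and body into one string, skipping empty fields."""
    body = doc.body if max_body_chars is None else doc.body[:max_body_chars]
    parts = [doc.title, doc.abstract, body]
    return "\n".join(p.strip() for p in parts if p and p.strip())

# Hypothetical record, for illustration only.
sample = PmcDoc("0000001", "Example Journal", "A title", "", "Full text of the article.")
text = doc_to_text(sample, max_body_chars=9)
```

With real data, the same function could be applied to each `doc` yielded by `dataset.docs_iter()`. Truncating the body is optional and is shown only because full-text articles can be long.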
The TREC Clinical Decision Support (CDS) track from 2014.
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('pmc/v1/trec-cds-2014')
for query in dataset.queries_iter():
    query  # namedtuple<query_id, type, description, summary>
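Each query carries a `type` field alongside its `description` and `summary`, so topics can be grouped by type before running experiments. This is a rough sketch only: `CdsQuery`, `group_by_type`, and the sample topics (including their type values) are hypothetical placeholders, not data from the track.

```python
from collections import defaultdict, namedtuple

# Stand-in mirroring the documented query fields; real records come from queries_iter().
CdsQuery = namedtuple("CdsQuery", ["query_id", "type", "description", "summary"])

def group_by_type(queries):
    """Map each topic type to the list of query_ids of that type."""
    groups = defaultdict(list)
    for query in queries:
        groups[query.type].append(query.query_id)
    return dict(groups)

# Hypothetical topics, for illustration only.
sample_queries = [
    CdsQuery("1", "diagnosis", "long description A", "short summary A"),
    CdsQuery("2", "treatment", "long description B", "short summary B"),
    CdsQuery("3", "diagnosis", "long description C", "short summary C"),
]
by_type = group_by_type(sample_queries)
```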
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('pmc/v1/trec-cds-2014')
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, journal, title, abstract, body>
Relevance levels
Rel. | Definition |
---|---|
0 | not relevant |
1 | possibly relevant |
2 | definitely relevant |
Example
import ir_datasets
dataset = ir_datasets.load('pmc/v1/trec-cds-2014')
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
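The three relevance levels are often collapsed to binary labels for measures that expect them. The sketch below assumes a threshold of 1 (so "possibly relevant" counts as relevant), which is a choice made for illustration, not something the source prescribes. `Qrel`, `qrels_to_dict`, and the sample judgments are hypothetical.

```python
from collections import namedtuple

# Stand-in mirroring the documented qrel fields; real records come from qrels_iter().
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])

def qrels_to_dict(qrels, min_relevance=1):
    """Build {query_id: {doc_id: 0 or 1}}, treating relevance >= min_relevance as relevant."""
    judged = {}
    for qrel in qrels:
        label = int(qrel.relevance >= min_relevance)
        judged.setdefault(qrel.query_id, {})[qrel.doc_id] = label
    return judged

# Hypothetical judgments, for illustration only.
sample_qrels = [
    Qrel("1", "docA", 2, "0"),
    Qrel("1", "docB", 1, "0"),
    Qrel("1", "docC", 0, "0"),
    Qrel("2", "docA", 0, "0"),
]
lenient = qrels_to_dict(sample_qrels)                  # levels 1 and 2 count as relevant
strict = qrels_to_dict(sample_qrels, min_relevance=2)  # only level 2 counts as relevant
```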
The TREC Clinical Decision Support (CDS) track from 2015.
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('pmc/v1/trec-cds-2015')
for query in dataset.queries_iter():
    query  # namedtuple<query_id, type, description, summary>
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('pmc/v1/trec-cds-2015')
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, journal, title, abstract, body>
Relevance levels
Rel. | Definition |
---|---|
0 | not relevant |
1 | possibly relevant |
2 | definitely relevant |
Example
import ir_datasets
dataset = ir_datasets.load('pmc/v1/trec-cds-2015')
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
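Graded judgments like these can also be used directly as gains in a rank-based measure. The following sketch computes nDCG with linear gains and a log2 discount. This is one common formulation chosen for illustration and is not necessarily the measure the track reported; `dcg`, `ndcg_at_k`, and the sample judgments are hypothetical.

```python
import math

def dcg(gains):
    """Discounted cumulative gain with linear gains and a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg_at_k(ranked_doc_ids, judgments, k=10):
    """nDCG@k for one query; judgments maps doc_id to a relevance level (0, 1, or 2)."""
    gains = [judgments.get(doc_id, 0) for doc_id in ranked_doc_ids[:k]]  # unjudged = 0
    ideal = sorted(judgments.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical judgments for a single query, for illustration only.
judgments = {"docA": 2, "docB": 1, "docC": 0}
best = ndcg_at_k(["docA", "docB", "docC"], judgments)
worst = ndcg_at_k(["docC", "docB", "docA"], judgments)
```

Here `best` is the score of a ranking that matches the ideal order, and `worst` is the score of the reversed ranking. For reported results, an established evaluation tool is a safer choice than a hand-rolled measure.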
Subset of PMC articles used for the TREC 2016 task (v2). Includes titles, abstracts, and full text. Collected from the open access segment on March 28, 2016.
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('pmc/v2')
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, journal, title, abstract, body>
The TREC Clinical Decision Support (CDS) track from 2016.
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('pmc/v2/trec-cds-2016')
for query in dataset.queries_iter():
    query  # namedtuple<query_id, type, note, description, summary>
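The 2016 queries add a `note` field that the 2014 and 2015 queries lack, so code shared across years needs to tolerate both shapes. A minimal sketch, assuming a fallback order of note, then description, then summary (an illustrative choice, not a recommendation from the source); `Query2014`, `Query2016`, and `query_text` are hypothetical names.

```python
from collections import namedtuple

# Stand-ins mirroring the two documented query shapes.
Query2014 = namedtuple("Query2014", ["query_id", "type", "description", "summary"])
Query2016 = namedtuple("Query2016", ["query_id", "type", "note", "description", "summary"])

def query_text(query, preferred=("note", "description", "summary")):
    """Return the first non-empty field in `preferred` that the query actually has."""
    for field in preferred:
        value = getattr(query, field, None)  # None when the field does not exist
        if value:
            return value
    return ""

# Hypothetical topics, for illustration only.
old_style = Query2014("1", "diagnosis", "a description", "a summary")
new_style = Query2016("1", "diagnosis", "a note", "a description", "a summary")
old_text = query_text(old_style)
new_text = query_text(new_style)
summary_only = query_text(new_style, preferred=("summary",))
```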
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('pmc/v2/trec-cds-2016')
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, journal, title, abstract, body>
Relevance levels
Rel. | Definition |
---|---|
0 | not relevant |
1 | possibly relevant |
2 | definitely relevant |
Example
import ir_datasets
dataset = ir_datasets.load('pmc/v2/trec-cds-2016')
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
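The four qrel fields correspond to the columns of the conventional TREC qrels text format, which orders them as query ID, iteration, document ID, relevance. The sketch below serializes records in that order. `Qrel`, `format_qrel`, and the sample judgments are hypothetical stand-ins so the snippet runs offline.

```python
from collections import namedtuple

# Stand-in mirroring the documented qrel fields; real records come from qrels_iter().
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])

def format_qrel(qrel):
    """Render one judgment as a TREC-style line: query_id iteration doc_id relevance."""
    return f"{qrel.query_id} {qrel.iteration} {qrel.doc_id} {qrel.relevance}"

# Hypothetical judgments, for illustration only.
sample_qrels = [Qrel("1", "docA", 2, "0"), Qrel("1", "docB", 0, "0")]
lines = [format_qrel(q) for q in sample_qrels]
qrels_text = "\n".join(lines) + "\n"
```

Writing `qrels_text` to a file would give input in the layout that TREC-style evaluation tools generally expect.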