GitHub: datasets/pmc.py

ir_datasets: PubMed Central (TREC CDS)

Index
  1. pmc
  2. pmc/v1
  3. pmc/v1/trec-cds-2014
  4. pmc/v1/trec-cds-2015
  5. pmc/v2
  6. pmc/v2/trec-cds-2016

"pmc"

Biomedical articles from PubMed Central (PMC). Currently, only the subsets used for the TREC Clinical Decision Support (CDS) tasks from 2014 to 2016 are included.


"pmc/v1"

Subset of PMC articles used for the TREC 2014 and 2015 CDS tasks (v1). Includes titles, abstracts, and full text. Collected from the open access segment of PMC on January 21, 2014.

docs

Language: en

Document type:
PmcDoc: (namedtuple)
  1. doc_id: str
  2. journal: str
  3. title: str
  4. abstract: str
  5. body: str

Example

import ir_datasets
dataset = ir_datasets.load('pmc/v1')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, journal, title, abstract, body>
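Many retrieval pipelines index a single text string per document, so the separate PmcDoc fields usually need to be combined. The sketch below assumes that joining title, abstract, and body is what you want; that is a modeling choice, not something ir_datasets prescribes. `doc_to_text` is a hypothetical helper, and the local `PmcDoc` class is only a stand-in with the same fields as the records returned by `docs_iter()`.

```python
from typing import NamedTuple


class PmcDoc(NamedTuple):
    # Local stand-in with the same fields as the PmcDoc records above;
    # real records come from dataset.docs_iter().
    doc_id: str
    journal: str
    title: str
    abstract: str
    body: str


def doc_to_text(doc, fields=("title", "abstract", "body")):
    """Join the chosen text fields with newlines, skipping empty ones."""
    parts = (getattr(doc, name) for name in fields)
    return "\n".join(part for part in parts if part)


doc = PmcDoc("123", "Example J", "A title", "", "Body text.")
print(doc_to_text(doc))  # the empty abstract is skipped
```

With a loaded dataset, the same helper would be applied inside the loop shown above, e.g. `text = doc_to_text(doc)`.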

"pmc/v1/trec-cds-2014"

The TREC Clinical Decision Support (CDS) track from 2014.

queries

Language: en

Query type:
TrecCdsQuery: (namedtuple)
  1. query_id: str
  2. type: str
  3. description: str
  4. summary: str

Example

import ir_datasets
dataset = ir_datasets.load('pmc/v1/trec-cds-2014')
for query in dataset.queries_iter():
    query # namedtuple<query_id, type, description, summary>
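The `type` field records the clinical question type of each topic (for example, diagnosis), which makes it easy to evaluate or tune per type. The following is a minimal sketch, assuming you want query IDs grouped by type; `group_by_type` is a hypothetical helper, the `TrecCdsQuery` definition is a local stand-in, and the toy topics are invented, not taken from the track data.

```python
from collections import defaultdict, namedtuple

# Local stand-in with the TrecCdsQuery fields listed above.
TrecCdsQuery = namedtuple("TrecCdsQuery", ["query_id", "type", "description", "summary"])


def group_by_type(queries):
    """Map each topic type to the query_ids of that type, in input order."""
    groups = defaultdict(list)
    for query in queries:
        groups[query.type].append(query.query_id)
    return dict(groups)


# Hypothetical toy topics, not taken from the track data.
queries = [
    TrecCdsQuery("1", "diagnosis", "A longer case description.", "A short summary."),
    TrecCdsQuery("2", "test", "A longer case description.", "A short summary."),
    TrecCdsQuery("3", "diagnosis", "A longer case description.", "A short summary."),
]
print(group_by_type(queries))  # {'diagnosis': ['1', '3'], 'test': ['2']}
```

On real data, pass `dataset.queries_iter()` in place of the toy list.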
docs

Language: en

Document type:
PmcDoc: (namedtuple)
  1. doc_id: str
  2. journal: str
  3. title: str
  4. abstract: str
  5. body: str

Example

import ir_datasets
dataset = ir_datasets.load('pmc/v1/trec-cds-2014')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, journal, title, abstract, body>
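When only a handful of documents are needed (for instance, the ones judged for one topic), a single pass over `docs_iter()` that keeps just those IDs can be enough; ir_datasets also provides `dataset.docs_store()` for random access by ID. This is a minimal sketch of the one-pass approach; `collect_docs` is a hypothetical helper and `PmcDoc` is a local stand-in.

```python
from collections import namedtuple

# Local stand-in with the PmcDoc fields listed above.
PmcDoc = namedtuple("PmcDoc", ["doc_id", "journal", "title", "abstract", "body"])


def collect_docs(docs, wanted_ids):
    """Return {doc_id: doc} for the requested ids in a single pass over docs."""
    wanted = set(wanted_ids)
    found = {}
    if not wanted:
        return found
    for doc in docs:
        if doc.doc_id in wanted:
            found[doc.doc_id] = doc
            if len(found) == len(wanted):
                break  # every requested id has been seen; stop iterating
    return found


docs = [PmcDoc(i, "J", f"Title {i}", "", "") for i in ("a", "b", "c")]
print(sorted(collect_docs(docs, ["c", "a"])))  # ['a', 'c']
```

IDs that never appear in the corpus are simply absent from the result, and in that case the whole iterator is consumed.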
qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
0     not relevant
1     possibly relevant
2     definitely relevant
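A quick sanity check on a qrels file is to count how many judgments fall into each level of the table above. The sketch assumes the relevance values are plain integers as in `TrecQrel.relevance`; `level_histogram` is a hypothetical helper.

```python
from collections import Counter

# Labels copied from the relevance-level table above.
LEVELS = {0: "not relevant", 1: "possibly relevant", 2: "definitely relevant"}


def level_histogram(relevances):
    """Count judgments per relevance level; levels outside the table are ignored."""
    counts = Counter(relevances)
    return {LEVELS[level]: counts[level] for level in sorted(LEVELS)}


print(level_histogram([0, 2, 2, 1, 0, 0]))
# {'not relevant': 3, 'possibly relevant': 1, 'definitely relevant': 2}
```

With a loaded dataset, the input could be `(qrel.relevance for qrel in dataset.qrels_iter())`.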

Example

import ir_datasets
dataset = ir_datasets.load('pmc/v1/trec-cds-2014')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
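Evaluation tools such as pytrec_eval take relevance judgments as a nested `{query_id: {doc_id: relevance}}` mapping. Below is a minimal sketch of building that shape from qrels records; `qrels_to_dict` is a hypothetical helper, `TrecQrel` is a local stand-in, and the toy judgments are invented.

```python
from collections import namedtuple

# Local stand-in with the TrecQrel fields listed above.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def qrels_to_dict(qrels):
    """Build {query_id: {doc_id: relevance}}; a repeated pair keeps the last value."""
    nested = {}
    for qrel in qrels:
        nested.setdefault(qrel.query_id, {})[qrel.doc_id] = qrel.relevance
    return nested


qrels = [
    TrecQrel("1", "d1", 2, "0"),
    TrecQrel("1", "d2", 0, "0"),
    TrecQrel("2", "d1", 1, "0"),
]
print(qrels_to_dict(qrels))  # {'1': {'d1': 2, 'd2': 0}, '2': {'d1': 1}}
```

The `iteration` field is dropped here because it is not needed for scoring.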
Citation
bibtex: @inproceedings{Simpson2014TrecCds, title={Overview of the TREC 2014 Clinical Decision Support Track}, author={Matthew S. Simpson and Ellen M. Voorhees and William Hersh}, booktitle={TREC}, year={2014} }

"pmc/v1/trec-cds-2015"

The TREC Clinical Decision Support (CDS) track from 2015.

queries

Language: en

Query type:
TrecCdsQuery: (namedtuple)
  1. query_id: str
  2. type: str
  3. description: str
  4. summary: str

Example

import ir_datasets
dataset = ir_datasets.load('pmc/v1/trec-cds-2015')
for query in dataset.queries_iter():
    query # namedtuple<query_id, type, description, summary>
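Each topic provides both a longer `description` and a shorter `summary`, and which one to issue as the query is up to the experimenter. The sketch assumes a retrieval loop that wants a `{query_id: text}` mapping built from one chosen field; `queries_to_dict` is a hypothetical helper and `TrecCdsQuery` is a local stand-in.

```python
from collections import namedtuple

# Local stand-in with the TrecCdsQuery fields listed above.
TrecCdsQuery = namedtuple("TrecCdsQuery", ["query_id", "type", "description", "summary"])


def queries_to_dict(queries, field="summary"):
    """Map query_id to the text of one topic field ("description" or "summary")."""
    if field not in ("description", "summary"):
        raise ValueError(f"unsupported field: {field!r}")
    return {query.query_id: getattr(query, field) for query in queries}


topics = [TrecCdsQuery("1", "treatment", "A long case description.", "Short summary.")]
print(queries_to_dict(topics))                 # {'1': 'Short summary.'}
print(queries_to_dict(topics, "description"))  # {'1': 'A long case description.'}
```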
docs

Language: en

Document type:
PmcDoc: (namedtuple)
  1. doc_id: str
  2. journal: str
  3. title: str
  4. abstract: str
  5. body: str

Example

import ir_datasets
dataset = ir_datasets.load('pmc/v1/trec-cds-2015')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, journal, title, abstract, body>
qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
0     not relevant
1     possibly relevant
2     definitely relevant

Example

import ir_datasets
dataset = ir_datasets.load('pmc/v1/trec-cds-2015')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
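Because the judgments are graded, a binary view requires picking a threshold: counting levels 1 and 2 as relevant is one option, counting only level 2 is another. The sketch below counts relevant documents per query under an adjustable `min_level`; `relevant_per_query` is a hypothetical helper and `TrecQrel` is a local stand-in.

```python
from collections import Counter, namedtuple

# Local stand-in with the TrecQrel fields listed above.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def relevant_per_query(qrels, min_level=1):
    """Count judged documents with relevance >= min_level for each query."""
    counts = Counter()
    for qrel in qrels:
        if qrel.relevance >= min_level:
            counts[qrel.query_id] += 1
    return dict(counts)


qrels = [
    TrecQrel("1", "d1", 2, "0"),
    TrecQrel("1", "d2", 1, "0"),
    TrecQrel("1", "d3", 0, "0"),
    TrecQrel("2", "d1", 0, "0"),
]
print(relevant_per_query(qrels))               # {'1': 2}
print(relevant_per_query(qrels, min_level=2))  # {'1': 1}
```

Queries with no document at or above the threshold are absent from the result rather than mapped to zero.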
Citation
bibtex: @inproceedings{Roberts2015TrecCds, title={Overview of the TREC 2015 Clinical Decision Support Track}, author={Kirk Roberts and Matthew S. Simpson and Ellen Voorhees and William R. Hersh}, booktitle={TREC}, year={2015} }

"pmc/v2"

Subset of PMC articles used for the TREC 2016 CDS task (v2). Includes titles, abstracts, and full text. Collected from the open access segment of PMC on March 28, 2016.

docs

Language: en

Document type:
PmcDoc: (namedtuple)
  1. doc_id: str
  2. journal: str
  3. title: str
  4. abstract: str
  5. body: str

Example

import ir_datasets
dataset = ir_datasets.load('pmc/v2')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, journal, title, abstract, body>

"pmc/v2/trec-cds-2016"

The TREC Clinical Decision Support (CDS) track from 2016.

queries

Language: en

Query type:
TrecCds2016Query: (namedtuple)
  1. query_id: str
  2. type: str
  3. note: str
  4. description: str
  5. summary: str

Example

import ir_datasets
dataset = ir_datasets.load('pmc/v2/trec-cds-2016')
for query in dataset.queries_iter():
    query # namedtuple<query_id, type, note, description, summary>
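The 2016 query type adds a `note` field alongside `description` and `summary`, so there are three candidate texts per topic. The sketch assumes you want to pick one field by preference and fall back to the next if it is blank; `topic_text` and its default order are hypothetical choices, and `TrecCds2016Query` is a local stand-in.

```python
from collections import namedtuple

# Local stand-in with the TrecCds2016Query fields listed above.
TrecCds2016Query = namedtuple(
    "TrecCds2016Query", ["query_id", "type", "note", "description", "summary"]
)


def topic_text(query, order=("summary", "description", "note")):
    """Return the first non-blank field in `order`, or "" if all are blank."""
    for field in order:
        text = getattr(query, field)
        if text and text.strip():
            return text
    return ""


q = TrecCds2016Query("1", "diagnosis", "Raw note text.", "Longer description.", "")
print(topic_text(q))                   # Longer description.
print(topic_text(q, order=("note",)))  # Raw note text.
```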
docs

Language: en

Document type:
PmcDoc: (namedtuple)
  1. doc_id: str
  2. journal: str
  3. title: str
  4. abstract: str
  5. body: str

Example

import ir_datasets
dataset = ir_datasets.load('pmc/v2/trec-cds-2016')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, journal, title, abstract, body>
qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
0     not relevant
1     possibly relevant
2     definitely relevant

Example

import ir_datasets
dataset = ir_datasets.load('pmc/v2/trec-cds-2016')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
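As an illustration of using these judgments to score a run, the sketch below computes precision at k for one query. It assumes `judged` is a per-query `{doc_id: relevance}` mapping built from `qrels_iter()` and `ranking` is a list of doc IDs from your own system; both names and the `min_level` threshold are hypothetical. This is a simple illustrative measure, not necessarily the track's official evaluation, which is described in the overview paper cited below.

```python
def precision_at_k(ranking, judged, k=10, min_level=1):
    """Fraction of the top-k ranked doc_ids with relevance >= min_level.

    Unjudged documents count as non-relevant, and the denominator is always k,
    even if the ranking is shorter than k.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranking[:k] if judged.get(doc_id, 0) >= min_level)
    return hits / k


judged = {"d1": 2, "d2": 0, "d3": 1}
print(precision_at_k(["d1", "d2", "d3", "d4"], judged, k=4))  # 0.5
```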
Citation
bibtex: @inproceedings{Roberts2016TrecCds, title={Overview of the TREC 2016 Clinical Decision Support Track}, author={Kirk Roberts and Dina Demner-Fushman and Ellen M. Voorhees and William R. Hersh}, booktitle={TREC}, year={2016} }