GitHub: datasets/pmc.py

ir_datasets: PubMed Central (TREC CDS)

Index
  1. pmc
  2. pmc/v1
  3. pmc/v1/trec-cds-2014
  4. pmc/v1/trec-cds-2015
  5. pmc/v2
  6. pmc/v2/trec-cds-2016

"pmc"

Biomedical articles from PubMed Central. Currently includes only the subsets used for the TREC Clinical Decision Support (CDS) tasks from 2014 to 2016.


"pmc/v1"

Subset of PMC articles used for the TREC CDS 2014 and 2015 tasks (v1). Includes titles, abstracts, and full text. Collected from the open access subset on January 21, 2014.

docs

Language: en

Document type:
PmcDoc: (namedtuple)
  1. doc_id: str
  2. journal: str
  3. title: str
  4. abstract: str
  5. body: str

Example

import ir_datasets
dataset = ir_datasets.load('pmc/v1')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, journal, title, abstract, body>
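Many retrieval pipelines index a single text field per document, so the title, abstract, and body usually need to be merged. A minimal sketch, assuming plain concatenation is acceptable; `doc_to_text` is a hypothetical helper, and the local `PmcDoc` only mirrors the documented field names so the snippet runs without downloading the corpus.

```python
from typing import NamedTuple


class PmcDoc(NamedTuple):
    # Local stand-in with the documented field names; not imported from ir_datasets.
    doc_id: str
    journal: str
    title: str
    abstract: str
    body: str


def doc_to_text(doc: PmcDoc) -> str:
    """Join the non-empty title, abstract, and body into one indexable string."""
    parts = (doc.title, doc.abstract, doc.body)
    return "\n\n".join(p.strip() for p in parts if p and p.strip())


# Toy document with made-up values; the empty body is skipped.
toy = PmcDoc("0001", "Example Journal", "A title", "An abstract.", "")
print(doc_to_text(toy))
```

With the real dataset, the same function should apply to each item from `dataset.docs_iter()`, since it only reads the `title`, `abstract`, and `body` attributes.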

"pmc/v1/trec-cds-2014"

The TREC Clinical Decision Support (CDS) track from 2014.

queries, docs, qrels, citation

Language: en

Query type:
TrecCdsQuery: (namedtuple)
  1. query_id: str
  2. type: str
  3. description: str
  4. summary: str

Example

import ir_datasets
dataset = ir_datasets.load('pmc/v1/trec-cds-2014')
for query in dataset.queries_iter():
    query # namedtuple<query_id, type, description, summary>
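Each topic carries a `type` field, so a per-type breakdown of results starts by partitioning the query IDs. A minimal sketch; `group_by_type` is a hypothetical helper, the local `TrecCdsQuery` only mirrors the documented fields, and the toy topics (including their type strings) are illustrative rather than taken from the track data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class TrecCdsQuery(NamedTuple):
    # Local stand-in with the documented field names; not imported from ir_datasets.
    query_id: str
    type: str
    description: str
    summary: str


def group_by_type(queries: Iterable[TrecCdsQuery]) -> Dict[str, List[str]]:
    """Map each query type to the list of query IDs that have it, in input order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for q in queries:
        groups[q.type].append(q.query_id)
    return dict(groups)


toy = [
    TrecCdsQuery("1", "diagnosis", "long case description", "short summary"),
    TrecCdsQuery("2", "test", "long case description", "short summary"),
    TrecCdsQuery("3", "diagnosis", "long case description", "short summary"),
]
print(group_by_type(toy))
```

Passing `dataset.queries_iter()` instead of the toy list should work the same way, because only the `query_id` and `type` attributes are read.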

"pmc/v1/trec-cds-2015"

The TREC Clinical Decision Support (CDS) track from 2015.

queries, docs, qrels, citation

Language: en

Query type:
TrecCdsQuery: (namedtuple)
  1. query_id: str
  2. type: str
  3. description: str
  4. summary: str

Example

import ir_datasets
dataset = ir_datasets.load('pmc/v1/trec-cds-2015')
for query in dataset.queries_iter():
    query # namedtuple<query_id, type, description, summary>
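Each topic offers two free-text fields, a longer `description` and a shorter `summary`, so a run has to pick one as the query text; one might also prepend the topic `type` as an extra term. A minimal sketch of that choice; `query_text` and its `field` and `with_type` parameters are hypothetical and not part of ir_datasets, and the local `TrecCdsQuery` only mirrors the documented fields.

```python
from typing import NamedTuple


class TrecCdsQuery(NamedTuple):
    # Local stand-in with the documented field names; not imported from ir_datasets.
    query_id: str
    type: str
    description: str
    summary: str


def query_text(q: TrecCdsQuery, field: str = "summary", with_type: bool = False) -> str:
    """Return the chosen free-text field, optionally prefixed with the topic type."""
    if field not in ("description", "summary"):
        raise ValueError(f"unsupported field: {field!r}")
    text = getattr(q, field)
    return f"{q.type} {text}" if with_type else text


q = TrecCdsQuery("1", "treatment", "longer case description", "short summary")
print(query_text(q))
print(query_text(q, "description", with_type=True))
```

Keeping the field choice as a parameter makes it easy to run the same retrieval setup twice and compare the two query formulations.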

"pmc/v2"

Subset of PMC articles used for the TREC CDS 2016 task (v2). Includes titles, abstracts, and full text. Collected from the open access subset on March 28, 2016.

docs

Language: en

Document type:
PmcDoc: (namedtuple)
  1. doc_id: str
  2. journal: str
  3. title: str
  4. abstract: str
  5. body: str

Example

import ir_datasets
dataset = ir_datasets.load('pmc/v2')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, journal, title, abstract, body>
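When inspecting a few judged documents, it can help to pull them out by ID in a single pass instead of holding the whole collection in memory. A minimal sketch with a hypothetical `build_lookup` helper and a local `PmcDoc` that only mirrors the documented fields; on the real corpus, ir_datasets' `dataset.docs_store()` offers random access by ID and is usually the better choice.

```python
from typing import Dict, Iterable, NamedTuple, Set


class PmcDoc(NamedTuple):
    # Local stand-in with the documented field names; not imported from ir_datasets.
    doc_id: str
    journal: str
    title: str
    abstract: str
    body: str


def build_lookup(docs: Iterable[PmcDoc], wanted: Set[str]) -> Dict[str, PmcDoc]:
    """Scan docs once, keeping only those whose doc_id is in `wanted`."""
    found: Dict[str, PmcDoc] = {}
    for doc in docs:
        if doc.doc_id in wanted:
            found[doc.doc_id] = doc
            if len(found) == len(wanted):
                break  # stop early once every requested ID has been seen
    return found


toy_docs = [
    PmcDoc("1", "J", "First", "", ""),
    PmcDoc("2", "J", "Second", "", ""),
    PmcDoc("3", "J", "Third", "", ""),
]
print(sorted(build_lookup(toy_docs, {"1", "3"})))
```

The early `break` only helps when every requested ID exists; missing IDs force a full scan, which is another reason to prefer an indexed store for repeated lookups.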

"pmc/v2/trec-cds-2016"

The TREC Clinical Decision Support (CDS) track from 2016.

queries, docs, qrels, citation

Language: en

Query type:
TrecCds2016Query: (namedtuple)
  1. query_id: str
  2. type: str
  3. note: str
  4. description: str
  5. summary: str

Example

import ir_datasets
dataset = ir_datasets.load('pmc/v2/trec-cds-2016')
for query in dataset.queries_iter():
    query # namedtuple<query_id, type, note, description, summary>
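The 2016 query type adds a `note` field alongside `description` and `summary`, giving three candidate query texts per topic. A minimal sketch that turns topics into (query_id, text) pairs for one chosen field, so each formulation can be run separately; `queries_for_field` and `TEXT_FIELDS` are hypothetical names, and the local `TrecCds2016Query` only mirrors the documented fields.

```python
from typing import Iterable, List, NamedTuple, Tuple


class TrecCds2016Query(NamedTuple):
    # Local stand-in with the documented field names; not imported from ir_datasets.
    query_id: str
    type: str
    note: str
    description: str
    summary: str


TEXT_FIELDS = ("note", "description", "summary")


def queries_for_field(queries: Iterable[TrecCds2016Query], field: str) -> List[Tuple[str, str]]:
    """Return (query_id, text) pairs built from one of the free-text fields."""
    if field not in TEXT_FIELDS:
        raise ValueError(f"field must be one of {TEXT_FIELDS}, got {field!r}")
    return [(q.query_id, getattr(q, field)) for q in queries]


toy = [TrecCds2016Query("1", "diagnosis", "raw note text", "case description", "short summary")]
print(queries_for_field(toy, "note"))
```

With the real dataset, `dataset.queries_iter()` can be passed in place of the toy list.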