GitHub: datasets/pmc.py

ir_datasets: PubMed Central (TREC CDS)

Index
  1. pmc
  2. pmc/v1
  3. pmc/v1/trec-cds-2014
  4. pmc/v1/trec-cds-2015
  5. pmc/v2
  6. pmc/v2/trec-cds-2016

"pmc"

Biomedical articles from PubMed Central. Currently, only the subsets used for the TREC Clinical Decision Support (CDS) 2014-2016 tasks are included.


"pmc/v1"

Subset of PMC articles used for the TREC 2014 and 2015 tasks (v1). Includes titles, abstracts, and full text. Collected from the open access segment on January 21, 2014.

Provides: docs
733K docs

Language: en

Document type:
PmcDoc: (namedtuple)
  1. doc_id: str
  2. journal: str
  3. title: str
  4. abstract: str
  5. body: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("pmc/v1")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, journal, title, abstract, body>
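
For indexing, the text fields of a PmcDoc are often flattened into a single string. The following is a minimal sketch, not part of ir_datasets: `doc_text` and `first_n` are hypothetical helpers, and the local `PmcDoc` namedtuple is a stand-in that mirrors the field list above so the block runs without downloading the collection. The sample documents are made up.

```python
from collections import namedtuple
from itertools import islice

# Stand-in mirroring the documented fields; the real type comes from ir_datasets.
PmcDoc = namedtuple("PmcDoc", ["doc_id", "journal", "title", "abstract", "body"])


def doc_text(doc):
    """Join the non-empty text fields of a PmcDoc-like tuple with newlines."""
    parts = (doc.title, doc.abstract, doc.body)
    return "\n".join(p for p in parts if p)


def first_n(docs, n):
    """Take at most n items from an iterator without reading the rest."""
    return list(islice(docs, n))


# Made-up sample documents, for illustration only.
sample = [
    PmcDoc("1", "Example Journal", "A title", "", "Body text."),
    PmcDoc("2", "Example Journal", "Another title", "An abstract.", ""),
]
texts = [doc_text(d) for d in first_n(iter(sample), 1)]
print(texts)
```

With the real collection, `first_n(dataset.docs_iter(), 5)` would sample the first five documents in the same way.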

More details are available in the ir_datasets Python API documentation.


"pmc/v1/trec-cds-2014"

The TREC Clinical Decision Support (CDS) track from 2014.

Provides: queries, docs, qrels
30 queries

Language: en

Query type:
TrecCdsQuery: (namedtuple)
  1. query_id: str
  2. type: str
  3. description: str
  4. summary: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("pmc/v1/trec-cds-2014")
for query in dataset.queries_iter():
    query # namedtuple<query_id, type, description, summary>
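
Each query carries a `type` field, so a common first step is to group topics by it. This is a minimal sketch under stated assumptions: `group_by_type` is a hypothetical helper, the local `TrecCdsQuery` is a stand-in mirroring the fields listed above, and the `type` values and texts in the sample are illustrative placeholders rather than actual topics.

```python
from collections import defaultdict, namedtuple

# Stand-in mirroring the documented fields; the real type comes from ir_datasets.
TrecCdsQuery = namedtuple(
    "TrecCdsQuery", ["query_id", "type", "description", "summary"]
)


def group_by_type(queries):
    """Map each query type to the list of query_ids of that type."""
    grouped = defaultdict(list)
    for query in queries:
        grouped[query.type].append(query.query_id)
    return dict(grouped)


# Made-up sample queries, for illustration only.
sample = [
    TrecCdsQuery("1", "diagnosis", "long description", "short summary"),
    TrecCdsQuery("2", "treatment", "long description", "short summary"),
    TrecCdsQuery("3", "diagnosis", "long description", "short summary"),
]
print(group_by_type(sample))
```

The same function should accept `dataset.queries_iter()` directly, since it only reads the `query_id` and `type` attributes.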



"pmc/v1/trec-cds-2015"

The TREC Clinical Decision Support (CDS) track from 2015.

Provides: queries, docs, qrels
30 queries

Language: en

Query type:
TrecCdsQuery: (namedtuple)
  1. query_id: str
  2. type: str
  3. description: str
  4. summary: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("pmc/v1/trec-cds-2015")
for query in dataset.queries_iter():
    query # namedtuple<query_id, type, description, summary>
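
This dataset also lists qrels, which are typically reduced to a per-query set of relevant documents before evaluation. The sketch below assumes that each qrel exposes `query_id`, `doc_id`, `relevance`, and `iteration` (the usual TREC layout in ir_datasets; this page does not show the qrel type). `relevant_docs` is a hypothetical helper, `TrecQrel` is a local stand-in, and the judgments are made up.

```python
from collections import defaultdict, namedtuple

# Stand-in; assumed to match the tuples yielded by dataset.qrels_iter().
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def relevant_docs(qrels, min_relevance=1):
    """Map query_id to the set of doc_ids judged at or above min_relevance."""
    relevant = defaultdict(set)
    for qrel in qrels:
        if qrel.relevance >= min_relevance:
            relevant[qrel.query_id].add(qrel.doc_id)
    return dict(relevant)


# Made-up judgments, for illustration only.
sample = [
    TrecQrel("1", "100", 2, "0"),
    TrecQrel("1", "101", 0, "0"),
    TrecQrel("2", "100", 1, "0"),
]
print(relevant_docs(sample))
```

Raising `min_relevance` restricts the result to the more strongly relevant judgments, if the relevance scale is graded.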



"pmc/v2"

Subset of PMC articles used for the TREC 2016 task (v2). Includes titles, abstracts, and full text. Collected from the open access segment on March 28, 2016.

Provides: docs
1.3M docs

Language: en

Document type:
PmcDoc: (namedtuple)
  1. doc_id: str
  2. journal: str
  3. title: str
  4. abstract: str
  5. body: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("pmc/v2")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, journal, title, abstract, body>
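
With 1.3M documents, iterating the whole collection to find a few articles is wasteful, so lookups by `doc_id` are usually preferable. The sketch below assumes that `dataset.docs_store()` returns an object with a `get_many(doc_ids)` method that returns a dict of the documents it found; this page does not document that interface. `DictDocStore` is a hypothetical in-memory stand-in so the block runs offline, and `titles_for` is a hypothetical helper.

```python
from collections import namedtuple

# Stand-in mirroring the documented fields; the real type comes from ir_datasets.
PmcDoc = namedtuple("PmcDoc", ["doc_id", "journal", "title", "abstract", "body"])


class DictDocStore:
    """Hypothetical in-memory stand-in for the object from dataset.docs_store()."""

    def __init__(self, docs):
        self._docs = {doc.doc_id: doc for doc in docs}

    def get_many(self, doc_ids):
        # Assumed behaviour: unknown ids are omitted from the result.
        return {i: self._docs[i] for i in doc_ids if i in self._docs}


def titles_for(store, doc_ids):
    """Return {doc_id: title} for every requested id the store knows."""
    found = store.get_many(doc_ids)
    return {doc_id: doc.title for doc_id, doc in found.items()}


# Made-up documents, for illustration only.
store = DictDocStore([
    PmcDoc("10", "Example Journal", "First title", "", ""),
    PmcDoc("11", "Example Journal", "Second title", "", ""),
])
print(titles_for(store, ["10", "99"]))
```

If the assumption holds, replacing `store` with `dataset.docs_store()` should leave `titles_for` unchanged.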



"pmc/v2/trec-cds-2016"

The TREC Clinical Decision Support (CDS) track from 2016.

Provides: queries, docs, qrels
30 queries

Language: en

Query type:
TrecCds2016Query: (namedtuple)
  1. query_id: str
  2. type: str
  3. note: str
  4. description: str
  5. summary: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("pmc/v2/trec-cds-2016")
for query in dataset.queries_iter():
    query # namedtuple<query_id, type, note, description, summary>
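
The 2016 query type adds a `note` field that the 2014 and 2015 type lacks, so code shared across the three tracks has to tolerate the missing attribute. One possible approach, sketched with a hypothetical `query_text` helper and local stand-in namedtuples that mirror the documented field lists (the sample texts are made up):

```python
from collections import namedtuple

# Stand-ins mirroring the documented fields; the real types come from ir_datasets.
TrecCdsQuery = namedtuple(
    "TrecCdsQuery", ["query_id", "type", "description", "summary"]
)
TrecCds2016Query = namedtuple(
    "TrecCds2016Query", ["query_id", "type", "note", "description", "summary"]
)


def query_text(query, preferred=("note", "description", "summary")):
    """Return the first non-empty field in `preferred` that the query has."""
    for field in preferred:
        value = getattr(query, field, "")  # absent fields count as empty
        if value:
            return value
    return ""


# Made-up sample queries, for illustration only.
q2016 = TrecCds2016Query("1", "diagnosis", "note text", "description text", "summary text")
q2014 = TrecCdsQuery("1", "diagnosis", "description text", "summary text")
print(query_text(q2016))
print(query_text(q2014))
```

Passing `preferred=("summary",)` selects the summary for either query type.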
