`ir_datasets`: PubMed Central (TREC CDS)

Index

pmc
pmc/v1
pmc/v1/trec-cds-2014
pmc/v1/trec-cds-2015
pmc/v2
pmc/v2/trec-cds-2016

`"pmc"`

Biomedical articles from PubMed Central. Currently includes only the subsets used for the TREC Clinical Decision Support (CDS) tasks from 2014 to 2016.

`"pmc/v1"`

Subset of PMC articles used for the TREC 2014 and 2015 tasks (v1). Includes titles, abstracts, and full text. Collected from the open access segment on January 21, 2014.

Information on documents

docs

Language: en

Document type:

PmcDoc: (namedtuple)

doc_id: str
journal: str
title: str
abstract: str
body: str

Example


```python
import ir_datasets
dataset = ir_datasets.load('pmc/v1')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, journal, title, abstract, body>
```
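Because each `PmcDoc` splits an article into separate fields, a common first step before indexing is to join them into a single text string. The sketch below assumes you want title, abstract, and body joined by blank lines and that empty fields should be skipped. `doc_to_text` is a hypothetical helper, and the local `PmcDoc` definition only mirrors the field list above so the snippet runs without downloading the collection; the values are toy data.

```python
from collections import namedtuple

# Local stand-in mirroring the PmcDoc fields listed above (not imported from ir_datasets).
PmcDoc = namedtuple("PmcDoc", ["doc_id", "journal", "title", "abstract", "body"])


def doc_to_text(doc, fields=("title", "abstract", "body")):
    """Join the selected text fields of a PmcDoc, skipping empty ones (hypothetical helper)."""
    parts = [getattr(doc, name).strip() for name in fields]
    return "\n\n".join(part for part in parts if part)


# Toy document: the abstract is empty, so it is left out of the joined text.
doc = PmcDoc("doc-1", "Toy Journal", "A title", "", "Body text.")
print(doc_to_text(doc))
```

With the real dataset, you would pass each `doc` yielded by `dataset.docs_iter()` instead of the hand-built example.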

`"pmc/v1/trec-cds-2014"`

The TREC Clinical Decision Support (CDS) track from 2014.

queries

Language: en

Query type:

TrecCdsQuery: (namedtuple)

query_id: str
type: str
description: str
summary: str

Example


```python
import ir_datasets
dataset = ir_datasets.load('pmc/v1/trec-cds-2014')
for query in dataset.queries_iter():
    query # namedtuple<query_id, type, description, summary>
```
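Each topic's `type` field records which clinical question it asks (the TREC CDS topics were split into diagnosis, test, and treatment questions). Grouping by that field is handy for per-type analysis. A minimal sketch, assuming a local `TrecCdsQuery` stand-in that mirrors the fields above and toy values in place of real topics; `group_by_type` is a hypothetical helper.

```python
from collections import defaultdict, namedtuple

# Local stand-in mirroring the TrecCdsQuery fields listed above.
TrecCdsQuery = namedtuple("TrecCdsQuery", ["query_id", "type", "description", "summary"])


def group_by_type(queries):
    """Map each query type to the list of query_ids that carry it."""
    groups = defaultdict(list)
    for query in queries:
        groups[query.type].append(query.query_id)
    return dict(groups)


# Toy topics; with the real dataset, pass dataset.queries_iter() instead.
toy_queries = [
    TrecCdsQuery("1", "diagnosis", "long case description", "short summary"),
    TrecCdsQuery("2", "test", "another description", "another summary"),
    TrecCdsQuery("3", "diagnosis", "third description", "third summary"),
]
print(group_by_type(toy_queries))  # {'diagnosis': ['1', '3'], 'test': ['2']}
```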

docs

Language: en

Document type:

PmcDoc: (namedtuple)

doc_id: str
journal: str
title: str
abstract: str
body: str

Example


```python
import ir_datasets
dataset = ir_datasets.load('pmc/v1/trec-cds-2014')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, journal, title, abstract, body>
```
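Since `docs_iter()` is consumed as an iterator, you can inspect a few documents without walking the whole collection. A sketch using the standard library's `itertools.islice`; `fake_docs_iter` is a hypothetical stand-in for `dataset.docs_iter()` that yields toy documents lazily, and `PmcDoc` is a local mirror of the fields above.

```python
import itertools
from collections import namedtuple

# Local stand-in mirroring the PmcDoc fields listed above.
PmcDoc = namedtuple("PmcDoc", ["doc_id", "journal", "title", "abstract", "body"])


def fake_docs_iter():
    """Hypothetical stand-in for dataset.docs_iter(); yields toy documents forever."""
    for i in itertools.count():
        yield PmcDoc(f"doc-{i}", "Toy Journal", f"Title {i}", "", "")


# islice stops after three items, so the (here infinite) generator is never exhausted.
first_three = list(itertools.islice(fake_docs_iter(), 3))
print([doc.doc_id for doc in first_three])  # ['doc-0', 'doc-1', 'doc-2']
```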

qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition
0	not relevant
1	possibly relevant
2	definitely relevant

Example


```python
import ir_datasets
dataset = ir_datasets.load('pmc/v1/trec-cds-2014')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
```
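The qrels use the three-level scale above. Measures that expect binary judgments need a cutoff, and whether "possibly relevant" (1) counts as relevant is a choice worth making explicit. A minimal sketch with a configurable threshold, assuming a local `TrecQrel` stand-in and toy judgments; `binarize` is a hypothetical helper.

```python
from collections import namedtuple

# Local stand-in mirroring the TrecQrel fields listed above.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def binarize(qrels, min_relevance=1):
    """Return the set of (query_id, doc_id) pairs judged at or above min_relevance."""
    return {(q.query_id, q.doc_id) for q in qrels if q.relevance >= min_relevance}


toy_qrels = [
    TrecQrel("1", "doc-a", 2, "0"),
    TrecQrel("1", "doc-b", 1, "0"),
    TrecQrel("1", "doc-c", 0, "0"),
]
print(sorted(binarize(toy_qrels)))  # [('1', 'doc-a'), ('1', 'doc-b')]
print(sorted(binarize(toy_qrels, min_relevance=2)))  # [('1', 'doc-a')]
```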

Citation

bibtex: @inproceedings{Simpson2014TrecCds, title={Overview of the TREC 2014 Clinical Decision Support Track}, author={Matthew S. Simpson and Ellen M. Voorhees and William Hersh}, booktitle={TREC}, year={2014} }

`"pmc/v1/trec-cds-2015"`

The TREC Clinical Decision Support (CDS) track from 2015.

queries

Language: en

Query type:

TrecCdsQuery: (namedtuple)

query_id: str
type: str
description: str
summary: str

Example


```python
import ir_datasets
dataset = ir_datasets.load('pmc/v1/trec-cds-2015')
for query in dataset.queries_iter():
    query # namedtuple<query_id, type, description, summary>
```

docs

Language: en

Document type:

PmcDoc: (namedtuple)

doc_id: str
journal: str
title: str
abstract: str
body: str

Example


```python
import ir_datasets
dataset = ir_datasets.load('pmc/v1/trec-cds-2015')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, journal, title, abstract, body>
```

qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition
0	not relevant
1	possibly relevant
2	definitely relevant

Example


```python
import ir_datasets
dataset = ir_datasets.load('pmc/v1/trec-cds-2015')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
```
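To score a ranked list against these judgments, precision at a cutoff k is one of the simplest measures. The sketch below treats any document with relevance of at least 1 as relevant (an assumption; adjust it to your protocol) and counts unjudged documents as non-relevant. `precision_at_k` is a hypothetical helper operating on toy data, not an official track measure implementation; for comparable scores, a dedicated evaluation tool is the safer choice.

```python
def precision_at_k(ranked_doc_ids, judgments, k=10, min_relevance=1):
    """Fraction of the top-k ranked documents judged at or above min_relevance.

    judgments maps doc_id -> graded relevance for a single query;
    unjudged documents count as non-relevant, and the denominator is always k.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked_doc_ids[:k]
    hits = sum(1 for doc_id in top if judgments.get(doc_id, 0) >= min_relevance)
    return hits / k


# Toy judgments for one query and a toy ranking containing an unjudged document.
judgments = {"doc-a": 2, "doc-b": 1, "doc-c": 0}
ranking = ["doc-c", "doc-a", "doc-x", "doc-b"]
print(precision_at_k(ranking, judgments, k=4))  # 0.5
```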

Citation

bibtex: @inproceedings{Roberts2015TrecCds, title={Overview of the TREC 2015 Clinical Decision Support Track}, author={Kirk Roberts and Matthew S. Simpson and Ellen Voorhees and William R. Hersh}, booktitle={TREC}, year={2015} }

`"pmc/v2"`

Subset of PMC articles used for the TREC 2016 task (v2). Includes titles, abstracts, and full text. Collected from the open access segment on March 28, 2016.

Information on documents

docs

Language: en

Document type:

PmcDoc: (namedtuple)

doc_id: str
journal: str
title: str
abstract: str
body: str

Example


```python
import ir_datasets
dataset = ir_datasets.load('pmc/v2')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, journal, title, abstract, body>
```
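If you only need a handful of documents by `doc_id` (for example, to read the text of judged documents), one option is a single pass over the document stream that keeps just the requested IDs. This is a sketch under the assumption that the requested subset fits in memory; `collect_docs` is a hypothetical helper, and `PmcDoc` is a local stand-in with toy values.

```python
from collections import namedtuple

# Local stand-in mirroring the PmcDoc fields listed above.
PmcDoc = namedtuple("PmcDoc", ["doc_id", "journal", "title", "abstract", "body"])


def collect_docs(docs, wanted_ids):
    """Scan a document stream once, keeping only documents whose doc_id is in wanted_ids."""
    wanted = set(wanted_ids)
    found = {}
    if not wanted:
        return found
    for doc in docs:
        if doc.doc_id in wanted:
            found[doc.doc_id] = doc
            if len(found) == len(wanted):
                break  # stop early once every requested document has been seen
    return found


# Toy stream; with the real dataset, pass dataset.docs_iter() instead.
toy_docs = [PmcDoc(f"doc-{i}", "Toy Journal", f"Title {i}", "", "") for i in range(5)]
hits = collect_docs(toy_docs, ["doc-3", "doc-1", "doc-9"])
print(sorted(hits))  # ['doc-1', 'doc-3']
```

IDs that never appear in the stream (here `doc-9`) are simply absent from the result, so callers should check for missing keys.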

`"pmc/v2/trec-cds-2016"`

The TREC Clinical Decision Support (CDS) track from 2016.

queries

Language: en

Query type:

TrecCds2016Query: (namedtuple)

query_id: str
type: str
note: str
description: str
summary: str

Example


```python
import ir_datasets
dataset = ir_datasets.load('pmc/v2/trec-cds-2016')
for query in dataset.queries_iter():
    query # namedtuple<query_id, type, note, description, summary>
```
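The 2016 topics add a `note` field alongside `description` and `summary`, so each case comes in several versions of different length. Which one to search with is an experimental choice. A minimal sketch that picks the first non-empty field in a preferred order; `query_text` and its default order are assumptions, not part of ir_datasets, and `TrecCds2016Query` is a local stand-in with toy values.

```python
from collections import namedtuple

# Local stand-in mirroring the TrecCds2016Query fields listed above.
TrecCds2016Query = namedtuple(
    "TrecCds2016Query", ["query_id", "type", "note", "description", "summary"]
)


def query_text(query, preferred=("summary", "description", "note")):
    """Return the first non-empty field in preferred order, or '' if all are empty."""
    for field in preferred:
        value = getattr(query, field).strip()
        if value:
            return value
    return ""


# Toy topic with an empty summary, so the default order falls back to the description.
q = TrecCds2016Query("1", "diagnosis", "raw note text", "case description", "")
print(query_text(q))  # case description
print(query_text(q, preferred=("note",)))  # raw note text
```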

docs

Language: en

Document type:

PmcDoc: (namedtuple)

doc_id: str
journal: str
title: str
abstract: str
body: str

Example


```python
import ir_datasets
dataset = ir_datasets.load('pmc/v2/trec-cds-2016')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, journal, title, abstract, body>
```

qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition
0	not relevant
1	possibly relevant
2	definitely relevant

Example


```python
import ir_datasets
dataset = ir_datasets.load('pmc/v2/trec-cds-2016')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
```
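Many evaluation workflows want judgments nested by query, and a quick count per relevance level is a useful sanity check on the scale above. A sketch using only the standard library; `qrels_to_dict` and `level_counts` are hypothetical helpers, and `TrecQrel` is a local stand-in with toy judgments.

```python
from collections import Counter, defaultdict, namedtuple

# Local stand-in mirroring the TrecQrel fields listed above.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def qrels_to_dict(qrels):
    """Nest judgments as {query_id: {doc_id: relevance}}."""
    nested = defaultdict(dict)
    for qrel in qrels:
        nested[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(nested)


def level_counts(qrels):
    """Count how many judgments fall at each relevance level."""
    return Counter(qrel.relevance for qrel in qrels)


# Toy judgments; with the real dataset, pass list(dataset.qrels_iter()) instead.
toy_qrels = [
    TrecQrel("1", "doc-a", 2, "0"),
    TrecQrel("1", "doc-b", 0, "0"),
    TrecQrel("2", "doc-a", 1, "0"),
]
print(qrels_to_dict(toy_qrels))  # {'1': {'doc-a': 2, 'doc-b': 0}, '2': {'doc-a': 1}}
print(sorted(level_counts(toy_qrels).items()))  # [(0, 1), (1, 1), (2, 1)]
```

Both helpers take a list here; if you pass a single-use iterator such as `qrels_iter()` directly, each call consumes it, so materialize it once first.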

Citation

bibtex: @inproceedings{Roberts2016TrecCds, title={Overview of the TREC 2016 Clinical Decision Support Track}, author={Kirk Roberts and Dina Demner-Fushman and Ellen M. Voorhees and William R. Hersh}, booktitle={TREC}, year={2016} }
