Github: datasets/medline.py

ir_datasets: Medline

Index
  1. medline
  2. medline/2004
  3. medline/2004/trec-genomics-2004
  4. medline/2004/trec-genomics-2005
  5. medline/2017
  6. medline/2017/trec-pm-2017
  7. medline/2017/trec-pm-2018

"medline"

Medical articles from Medline. This collection was used by the TREC Genomics track in 2004-05 (the 2004 version of the dataset) and by the TREC Precision Medicine track in 2017-18 (the 2017 version).


"medline/2004"

3M Medline articles including titles and abstracts, used for the TREC 2004-05 Genomics track.

docs

Language: en

Document type:
MedlineDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. abstract: str

Example

import ir_datasets
dataset = ir_datasets.load('medline/2004')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, abstract>
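
Individual articles can also be fetched by doc_id rather than by iterating the full collection. A minimal sketch, assuming the generic ir_datasets docs_store() lookup API applies to this collection (it is not shown on this page):

import ir_datasets
dataset = ir_datasets.load('medline/2004')
docstore = dataset.docs_store()
# take an arbitrary doc_id from the stream, then look it up directly
some_id = next(iter(dataset.docs_iter())).doc_id
doc = docstore.get(some_id)  # MedlineDoc<doc_id, title, abstract>
print(doc.title)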

"medline/2004/trec-genomics-2004"

The TREC Genomics Track 2004 benchmark. Contains 50 queries with article-level relevance judgments.

queries

Language: en

Query type:
TrecGenomicsQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. need: str
  4. context: str

Example

import ir_datasets
dataset = ir_datasets.load('medline/2004/trec-genomics-2004')
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, need, context>
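
Systems that expect a single query string can concatenate the three text fields. A minimal sketch (plain concatenation is an assumption, not part of the track definition):

import ir_datasets
dataset = ir_datasets.load('medline/2004/trec-genomics-2004')
for query in dataset.queries_iter():
    # join title, information need, and context into one keyword query
    flat_text = ' '.join([query.title, query.need, query.context])
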
docs

Language: en

Document type:
MedlineDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. abstract: str

Example

import ir_datasets
dataset = ir_datasets.load('medline/2004/trec-genomics-2004')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, abstract>
qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
0     not relevant
1     possibly relevant
2     definitely relevant

Example

import ir_datasets
dataset = ir_datasets.load('medline/2004/trec-genomics-2004')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
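
The relevance levels above can be used to build per-query judgment sets, e.g. keeping only "definitely relevant" (2) articles. A minimal sketch using qrels_iter():

import ir_datasets
from collections import defaultdict
dataset = ir_datasets.load('medline/2004/trec-genomics-2004')
relevant = defaultdict(set)
for qrel in dataset.qrels_iter():
    if qrel.relevance == 2:  # keep only 'definitely relevant' judgments
        relevant[qrel.query_id].add(qrel.doc_id)
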
Citation
bibtex:
@inproceedings{Hersh2004TrecGenomics,
  title={TREC 2004 Genomics Track Overview},
  author={William R. Hersh and Ravi Teja Bhuptiraju and Laura Ross and Phoebe Johnson and Aaron M. Cohen and Dale F. Kraemer},
  booktitle={TREC},
  year={2004}
}

"medline/2004/trec-genomics-2005"

The TREC Genomics Track 2005 benchmark. Contains 50 queries with article-level relevance judgments.

queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('medline/2004/trec-genomics-2005')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
docs

Language: en

Document type:
MedlineDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. abstract: str

Example

import ir_datasets
dataset = ir_datasets.load('medline/2004/trec-genomics-2005')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, abstract>
qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
0     not relevant
1     possibly relevant
2     definitely relevant

Example

import ir_datasets
dataset = ir_datasets.load('medline/2004/trec-genomics-2005')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
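
To see how the judgments are distributed over the relevance levels above, the qrels can be tallied with a Counter. A minimal sketch:

import ir_datasets
from collections import Counter
dataset = ir_datasets.load('medline/2004/trec-genomics-2005')
counts = Counter(qrel.relevance for qrel in dataset.qrels_iter())
print(counts)  # counts per relevance level (0, 1, 2)
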
Citation
bibtex:
@inproceedings{Hersh2005TrecGenomics,
  title={TREC 2005 Genomics Track Overview},
  author={William Hersh and Aaron Cohen and Jianji Yang and Ravi Teja Bhupatiraju and Phoebe Roberts and Marti Hearst},
  booktitle={TREC},
  year={2007}
}

"medline/2017"

26M Medline and AACR/ASCO Proceedings articles including titles and abstracts. This collection is used for the TREC 2017-18 Precision Medicine track.

docs

Language: en

Document type:
MedlineDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. abstract: str

Example

import ir_datasets
dataset = ir_datasets.load('medline/2017')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, abstract>
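
Since this collection holds roughly 26M documents, it is often convenient to work with a slice instead of the full stream. A minimal sketch, assuming the slicing support that ir_datasets provides on docs_iter():

import ir_datasets
dataset = ir_datasets.load('medline/2017')
for doc in dataset.docs_iter()[:1000]:  # first 1000 documents only
    doc # namedtuple<doc_id, title, abstract>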

"medline/2017/trec-pm-2017"

The TREC Precision Medicine (PM) Track 2017 benchmark. Contains 30 queries with disease, gene, and target demographic information.

queries

Language: en

Query type:
TrecPm2017Query: (namedtuple)
  1. query_id: str
  2. disease: str
  3. gene: str
  4. demographic: str
  5. other: str

Example

import ir_datasets
dataset = ir_datasets.load('medline/2017/trec-pm-2017')
for query in dataset.queries_iter():
    query # namedtuple<query_id, disease, gene, demographic, other>
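
When a retrieval system does not handle the structured fields separately, they can be flattened into one keyword query. A minimal sketch (simply concatenating all non-empty fields, including other, is an assumption):

import ir_datasets
dataset = ir_datasets.load('medline/2017/trec-pm-2017')
for query in dataset.queries_iter():
    parts = [query.disease, query.gene, query.demographic, query.other]
    flat_text = ' '.join(p for p in parts if p)  # skip empty fields
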
docs

Language: en

Document type:
MedlineDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. abstract: str

Example

import ir_datasets
dataset = ir_datasets.load('medline/2017/trec-pm-2017')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, abstract>
qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
0     not relevant
1     possibly relevant
2     definitely relevant

Example

import ir_datasets
dataset = ir_datasets.load('medline/2017/trec-pm-2017')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
Citation
bibtex:
@inproceedings{Roberts2017TrecPm,
  title={Overview of the TREC 2017 Precision Medicine Track},
  author={Kirk Roberts and Dina Demner-Fushman and Ellen M. Voorhees and William R. Hersh and Steven Bedrick and Alexander J. Lazar and Shubham Pant},
  booktitle={TREC},
  year={2017}
}

"medline/2017/trec-pm-2018"

The TREC Precision Medicine (PM) Track 2018 benchmark. Contains 50 queries with disease, gene, and target demographic information.

queries

Language: en

Query type:
TrecPmQuery: (namedtuple)
  1. query_id: str
  2. disease: str
  3. gene: str
  4. demographic: str

Example

import ir_datasets
dataset = ir_datasets.load('medline/2017/trec-pm-2018')
for query in dataset.queries_iter():
    query # namedtuple<query_id, disease, gene, demographic>
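
Unlike TrecPm2017Query, this query type has no other field. A minimal sketch for code that handles both years uniformly (using getattr with a default is a convenience assumption, not an ir_datasets feature):

import ir_datasets
for dataset_id in ['medline/2017/trec-pm-2017', 'medline/2017/trec-pm-2018']:
    dataset = ir_datasets.load(dataset_id)
    for query in dataset.queries_iter():
        other = getattr(query, 'other', '')  # only present in the 2017 topics
        parts = [query.disease, query.gene, query.demographic, other]
        flat_text = ' '.join(p for p in parts if p)
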
docs

Language: en

Document type:
MedlineDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. abstract: str

Example

import ir_datasets
dataset = ir_datasets.load('medline/2017/trec-pm-2018')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, abstract>
qrels
Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
0     not relevant
1     possibly relevant
2     definitely relevant

Example

import ir_datasets
dataset = ir_datasets.load('medline/2017/trec-pm-2018')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
Citation
bibtex:
@inproceedings{Roberts2018TrecPm,
  title={Overview of the TREC 2018 Precision Medicine Track},
  author={Kirk Roberts and Dina Demner-Fushman and Ellen M. Voorhees and William R. Hersh and Steven Bedrick and Alexander J. Lazar},
  booktitle={TREC},
  year={2018}
}