GitHub: datasets/medline.py

ir_datasets: Medline

Index
  1. medline
  2. medline/2004
  3. medline/2004/trec-genomics-2004
  4. medline/2004/trec-genomics-2005
  5. medline/2017
  6. medline/2017/trec-pm-2017
  7. medline/2017/trec-pm-2018

"medline"

Medical articles from Medline. This collection was used by TREC Genomics 2004-05 (2004 version of dataset) and by TREC Precision Medicine 2017-18 (2017 version).


"medline/2004"

3M Medline articles including titles and abstracts, used for the TREC 2004-05 Genomics track.

docs

Language: en

Document type:
MedlineDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. abstract: str

Example

import ir_datasets
dataset = ir_datasets.load('medline/2004')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, abstract>
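
As a rough sketch of how the three MedlineDoc fields might be combined for indexing, the snippet below joins the title and abstract into one text string. It uses a local stand-in namedtuple with the documented field names so that it runs without downloading the collection; the `doc_to_text` helper and the sample record are hypothetical and not part of ir_datasets.

```python
from typing import NamedTuple


class MedlineDoc(NamedTuple):
    # Local stand-in mirroring the documented fields; not the ir_datasets class itself.
    doc_id: str
    title: str
    abstract: str


def doc_to_text(doc):
    """Join the non-empty title and abstract with a single space (hypothetical helper)."""
    parts = [doc.title.strip(), doc.abstract.strip()]
    return " ".join(p for p in parts if p)


# Made-up record for illustration only; the abstract is left empty on purpose.
sample = MedlineDoc(doc_id="0000001", title="An example title.", abstract="")
print(doc_to_text(sample))
```

Skipping empty fields avoids a trailing space when a record has a title but no abstract.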

"medline/2004/trec-genomics-2004"

The TREC Genomics Track 2004 benchmark. Contains 50 queries with article-level relevance judgments.

queries

Language: en

Query type:
TrecGenomicsQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. need: str
  4. context: str

Example

import ir_datasets
dataset = ir_datasets.load('medline/2004/trec-genomics-2004')
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, need, context>
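
Since each TrecGenomicsQuery carries three text fields, a retrieval run has to decide which of them to feed to the search engine. The sketch below shows one possible way to build a query string from a chosen subset of fields. The namedtuple is a local stand-in with the documented field names, and `query_text` and the sample values are hypothetical.

```python
from typing import NamedTuple


class TrecGenomicsQuery(NamedTuple):
    # Local stand-in mirroring the documented fields; not the ir_datasets class itself.
    query_id: str
    title: str
    need: str
    context: str


def query_text(query, fields=("title", "need")):
    """Concatenate the selected, non-empty fields in the given order (hypothetical helper)."""
    values = (getattr(query, name).strip() for name in fields)
    return " ".join(v for v in values if v)


# Made-up topic for illustration only.
sample = TrecGenomicsQuery(
    query_id="1",
    title="example gene function",
    need="Find articles about the example gene.",
    context="",
)
print(query_text(sample, fields=("title",)))
print(query_text(sample))
```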

"medline/2004/trec-genomics-2005"

The TREC Genomics Track 2005 benchmark. Contains 50 queries with article-level relevance judgments.

queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('medline/2004/trec-genomics-2005')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

"medline/2017"

26M Medline and AACR/ASCO Proceedings articles including titles and abstracts. This collection is used for the TREC 2017-18 Precision Medicine track.

docs

Language: en

Document type:
MedlineDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. abstract: str

Example

import ir_datasets
dataset = ir_datasets.load('medline/2017')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, abstract>
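
With roughly 26M documents, a full pass over docs_iter() takes a while. One way to inspect only the first few records is itertools.islice, sketched below. The `peek_docs` helper and the `FakeDataset` stub are hypothetical; the stub only stands in for a loaded dataset so that the snippet runs offline.

```python
from itertools import islice
from typing import NamedTuple


class MedlineDoc(NamedTuple):
    # Local stand-in mirroring the documented fields; not the ir_datasets class itself.
    doc_id: str
    title: str
    abstract: str


class FakeDataset:
    """Hypothetical stub that exposes docs_iter() the way a loaded dataset does."""

    def docs_iter(self):
        for i in range(1000):
            yield MedlineDoc(str(i), f"title {i}", f"abstract {i}")


def peek_docs(dataset, n=3):
    """Return the first n documents without consuming the rest of the iterator."""
    return list(islice(dataset.docs_iter(), n))


first = peek_docs(FakeDataset(), n=2)
print([d.doc_id for d in first])
```

With the real collection, the same helper would be called as peek_docs(ir_datasets.load('medline/2017')).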

"medline/2017/trec-pm-2017"

The TREC Precision Medicine (PM) Track 2017 benchmark. Contains 30 queries, each specifying disease, gene, and target demographic information.

queries

Language: en

Query type:
TrecPm2017Query: (namedtuple)
  1. query_id: str
  2. disease: str
  3. gene: str
  4. demographic: str
  5. other: str

Example

import ir_datasets
dataset = ir_datasets.load('medline/2017/trec-pm-2017')
for query in dataset.queries_iter():
    query # namedtuple<query_id, disease, gene, demographic, other>

"medline/2017/trec-pm-2018"

The TREC Precision Medicine (PM) Track 2018 benchmark. Contains 50 queries, each specifying disease, gene, and target demographic information.

queries

Language: en

Query type:
TrecPmQuery: (namedtuple)
  1. query_id: str
  2. disease: str
  3. gene: str
  4. demographic: str

Example

import ir_datasets
dataset = ir_datasets.load('medline/2017/trec-pm-2018')
for query in dataset.queries_iter():
    query # namedtuple<query_id, disease, gene, demographic>
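
The 2017 and 2018 query types differ only in the extra `other` field of TrecPm2017Query. A minimal sketch of handling both years with the same code is shown below, assuming local stand-in namedtuples with the documented field names; `normalize_pm_query` and the sample values are hypothetical.

```python
from typing import NamedTuple


class TrecPm2017Query(NamedTuple):
    # Local stand-in for the 2017 query type.
    query_id: str
    disease: str
    gene: str
    demographic: str
    other: str


class TrecPmQuery(NamedTuple):
    # Local stand-in for the 2018 query type (no `other` field).
    query_id: str
    disease: str
    gene: str
    demographic: str


def normalize_pm_query(query):
    """Return a dict with the same keys for both query types; `other` defaults to ''."""
    record = dict(query._asdict())
    record.setdefault("other", "")
    return record


# Made-up topics for illustration only.
q2017 = TrecPm2017Query("1", "example disease", "GENE1", "40-year-old female", "none")
q2018 = TrecPmQuery("1", "example disease", "GENE1", "40-year-old female")
print(sorted(normalize_pm_query(q2017)) == sorted(normalize_pm_query(q2018)))
```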