GitHub: datasets/medline.py

ir_datasets: Medline

Index
  1. medline
  2. medline/2004
  3. medline/2004/trec-genomics-2004
  4. medline/2004/trec-genomics-2005
  5. medline/2017
  6. medline/2017/trec-pm-2017
  7. medline/2017/trec-pm-2018

"medline"

Medical articles from Medline. This collection was used by TREC Genomics 2004-05 (the 2004 version of the dataset) and by TREC Precision Medicine 2017-18 (the 2017 version).


"medline/2004"

About 3.7M Medline articles, including titles and abstracts, used for the TREC 2004-05 Genomics track.

3.7M docs

Language: en

Document type:
MedlineDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. abstract: str

Examples:

import ir_datasets
dataset = ir_datasets.load("medline/2004")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, abstract>

More details about the Python API are in the ir_datasets documentation.
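Many indexing pipelines treat a document's title and abstract as a single text field. The sketch below shows one way to build that field. It assumes you want the two parts joined with a newline, and that an empty abstract should simply be dropped. `doc_text` is a hypothetical helper, not part of ir_datasets. The `MedlineDoc` namedtuple is a local stand-in with the documented fields, and the two records are made-up toy data, so the snippet runs without downloading the corpus.

```python
from collections import namedtuple

# Local stand-in with the fields documented for MedlineDoc; with the real
# corpus these objects come from ir_datasets.load("medline/2004").docs_iter().
MedlineDoc = namedtuple("MedlineDoc", ["doc_id", "title", "abstract"])


def doc_text(doc):
    """Join title and abstract with a newline, dropping empty parts."""
    parts = (doc.title.strip(), doc.abstract.strip())
    return "\n".join(part for part in parts if part)


# Made-up toy records, not real Medline data.
docs = [
    MedlineDoc("toy-1", "A study of gene X", "We examine gene X in mice."),
    MedlineDoc("toy-2", "A title-only record", ""),
]
for doc in docs:
    print(doc.doc_id, repr(doc_text(doc)))
```

With the real collection, call `doc_text(doc)` inside the `for doc in dataset.docs_iter():` loop shown above.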


"medline/2004/trec-genomics-2004"

The TREC Genomics Track 2004 benchmark. Contains 50 queries with article-level relevance judgments.

50 queries

Language: en

Query type:
TrecGenomicsQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. need: str
  4. context: str

Examples:

import ir_datasets
dataset = ir_datasets.load("medline/2004/trec-genomics-2004")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, need, context>

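`TrecGenomicsQuery` splits each topic into `title`, `need`, and `context`. Which of these fields to feed into a retrieval model is an experimental choice. The sketch below concatenates a chosen subset. The default of title plus need is an assumption made for illustration, not a track rule. `query_text` is a hypothetical helper, and the namedtuple is a local stand-in with the documented fields.

```python
from collections import namedtuple

# Local stand-in with the documented TrecGenomicsQuery fields; real queries come
# from ir_datasets.load("medline/2004/trec-genomics-2004").queries_iter().
TrecGenomicsQuery = namedtuple(
    "TrecGenomicsQuery", ["query_id", "title", "need", "context"])


def query_text(query, fields=("title", "need")):
    """Concatenate the chosen fields in order, skipping empty ones."""
    values = (getattr(query, name).strip() for name in fields)
    return " ".join(value for value in values if value)


# Made-up toy topic, not a real Genomics 2004 query.
q = TrecGenomicsQuery("toy-1", "gene X function", "Find articles on gene X.",
                      "Background text.")
print(query_text(q))                                     # title + need
print(query_text(q, fields=("title", "need", "context")))  # all three fields
```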


"medline/2004/trec-genomics-2005"

The TREC Genomics Track 2005 benchmark. Contains 50 queries with article-level relevance judgments.

50 queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("medline/2004/trec-genomics-2005")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

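Both Genomics benchmarks come with article-level relevance judgments. Evaluation code usually wants them grouped per query. This page does not list the qrels type. The sketch below therefore assumes the judgments arrive from `dataset.qrels_iter()` as ir_datasets' TREC-style qrel tuples, with the fields `query_id`, `doc_id`, `relevance`, and `iteration`. `qrels_to_dict` and `relevant_counts` are hypothetical helpers. The threshold `min_relevance=1` is also an assumption, so check the track's relevance scale before relying on it.

```python
from collections import defaultdict, namedtuple

# Local stand-in for ir_datasets' TREC-style qrel tuple (assumed field names).
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def qrels_to_dict(qrels):
    """Group judgments as {query_id: {doc_id: relevance}}."""
    grouped = defaultdict(dict)
    for qrel in qrels:
        grouped[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(grouped)


def relevant_counts(qrels_dict, min_relevance=1):
    """Count judged documents at or above min_relevance for each query."""
    return {
        qid: sum(1 for rel in judged.values() if rel >= min_relevance)
        for qid, judged in qrels_dict.items()
    }


# Made-up toy judgments, not real Genomics 2005 qrels.
toy = [
    TrecQrel("100", "d1", 2, "0"),
    TrecQrel("100", "d2", 0, "0"),
    TrecQrel("101", "d3", 1, "0"),
]
qd = qrels_to_dict(toy)
print(relevant_counts(qd))
```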


"medline/2017"

26M Medline and AACR/ASCO Proceedings articles, including titles and abstracts. This collection is used for the TREC 2017-18 Precision Medicine track.

27M docs

Language: en

Document type:
MedlineDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. abstract: str

Examples:

import ir_datasets
dataset = ir_datasets.load("medline/2017")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, abstract>

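At roughly 27M documents, this corpus is best processed as a stream rather than loaded with `list(dataset.docs_iter())`. The sketch below computes simple statistics in one pass and keeps only running totals. It uses `itertools.islice` to optionally stop after the first N documents. `corpus_stats` and `make_toy_docs` are hypothetical helpers. The toy generator stands in for `dataset.docs_iter()`, so the snippet runs offline.

```python
import itertools
from collections import namedtuple

# Local stand-in with the documented MedlineDoc fields.
MedlineDoc = namedtuple("MedlineDoc", ["doc_id", "title", "abstract"])


def corpus_stats(docs, limit=None):
    """One pass over a doc iterator: count docs and empty abstracts.

    Only two counters are kept, so memory use does not grow with the corpus;
    limit=None (the islice default) means "no limit".
    """
    total = empty_abstracts = 0
    for doc in itertools.islice(docs, limit):
        total += 1
        if not doc.abstract.strip():
            empty_abstracts += 1
    return total, empty_abstracts


def make_toy_docs(n=10):
    """Yield made-up docs; every third one has an empty abstract."""
    for i in range(n):
        yield MedlineDoc(f"toy-{i}", f"Title {i}", "" if i % 3 == 0 else "Some text.")


print(corpus_stats(make_toy_docs()))           # (10, 4)
print(corpus_stats(make_toy_docs(), limit=5))  # (5, 2)
```

With the real collection, pass `dataset.docs_iter()` in place of `make_toy_docs()`. Use a fresh iterator for each pass.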


"medline/2017/trec-pm-2017"

The TREC Precision Medicine (PM) Track 2017 benchmark. Contains 30 queries with disease, gene, and target demographic information.

30 queries

Language: en

Query type:
TrecPm2017Query: (namedtuple)
  1. query_id: str
  2. disease: str
  3. gene: str
  4. demographic: str
  5. other: str

Examples:

import ir_datasets
dataset = ir_datasets.load("medline/2017/trec-pm-2017")
for query in dataset.queries_iter():
    query # namedtuple<query_id, disease, gene, demographic, other>

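Because PM queries are split into fields, you can weight the fields differently instead of flattening them into one string. The sketch below builds a term-to-weight bag. Each occurrence of a term adds its field's weight. The weights in `FIELD_WEIGHTS` are arbitrary illustrations, not values from the track or from ir_datasets. The tokenizer is also an assumption: a simple lowercase alphanumeric split. `weighted_terms` is a hypothetical helper, and the namedtuple is a local stand-in with the documented fields.

```python
import re
from collections import Counter, namedtuple

# Local stand-in with the documented TrecPm2017Query fields.
TrecPm2017Query = namedtuple(
    "TrecPm2017Query", ["query_id", "disease", "gene", "demographic", "other"])

# Illustrative weights only; tune or replace them for real experiments.
FIELD_WEIGHTS = {"disease": 2.0, "gene": 2.0, "demographic": 0.5, "other": 0.5}


def weighted_terms(query, weights=FIELD_WEIGHTS):
    """Return a Counter mapping each term to its summed field weight."""
    bag = Counter()
    for field, weight in weights.items():
        # Naive tokenizer: lowercase, keep runs of letters and digits.
        for term in re.findall(r"[a-z0-9]+", getattr(query, field).lower()):
            bag[term] += weight
    return bag


# Made-up toy topic, not a real TREC PM 2017 query.
q = TrecPm2017Query("toy-1", "Disease X", "GENE1 amplification",
                    "50-year-old female", "")
print(weighted_terms(q))
```

A bag like this could be turned into a boosted query for an engine that supports per-term weights. How you serialize it depends on that engine.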


"medline/2017/trec-pm-2018"

The TREC Precision Medicine (PM) Track 2018 benchmark. Contains 50 queries with disease, gene, and target demographic information.

50 queries

Language: en

Query type:
TrecPmQuery: (namedtuple)
  1. query_id: str
  2. disease: str
  3. gene: str
  4. demographic: str

Examples:

import ir_datasets
dataset = ir_datasets.load("medline/2017/trec-pm-2018")
for query in dataset.queries_iter():
    query # namedtuple<query_id, disease, gene, demographic>

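The 2018 query type (`TrecPmQuery`) lacks the `other` field that `TrecPm2017Query` has. Code that handles both years therefore needs to tolerate the missing attribute. The sketch below maps either type to one dict, filling absent fields with an empty string. It also tags each record with its year, in case query IDs repeat across the two benchmarks. `normalize_pm_query` is a hypothetical helper. Both namedtuples are local stand-ins with the fields documented on this page.

```python
from collections import namedtuple

# Local stand-ins with the fields documented on this page.
TrecPm2017Query = namedtuple(
    "TrecPm2017Query", ["query_id", "disease", "gene", "demographic", "other"])
TrecPmQuery = namedtuple(
    "TrecPmQuery", ["query_id", "disease", "gene", "demographic"])

# Union of the fields across both query types.
PM_FIELDS = ("query_id", "disease", "gene", "demographic", "other")


def normalize_pm_query(query, year):
    """Map either query type to one dict; fields it lacks become ''."""
    record = {name: getattr(query, name, "") for name in PM_FIELDS}
    record["year"] = year
    return record


# Made-up toy topics, not real TREC PM queries.
queries = [
    (TrecPm2017Query("1", "Disease X", "GENE1", "50-year-old female", "none given"), 2017),
    (TrecPmQuery("1", "Disease Y", "GENE2", "60-year-old male"), 2018),
]
rows = [normalize_pm_query(q, year) for q, year in queries]
for row in rows:
    print(row)
```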