GitHub: datasets/cord19.py

ir_datasets: CORD-19

Index
  1. cord19
  2. cord19/fulltext
  3. cord19/fulltext/trec-covid
  4. cord19/trec-covid

"cord19"

Collection of scientific articles related to COVID-19.

Uses the 2020-07-16 version of the dataset, corresponding to the "complete" collection used for TREC COVID.

Note that this version of the document collection provides only article metadata. To get the full text, use cord19/fulltext.

Provides: docs

Language: en

Document type:
Cord19Doc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. doi: str
  4. date: str
  5. abstract: str

Example

import ir_datasets
dataset = ir_datasets.load('cord19')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, doi, date, abstract>
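Each record is a plain namedtuple, so its fields are read by attribute. Below is a minimal sketch, assuming you want a single text string per document (for example, to feed an indexer). The `Cord19Doc` here is a local stand-in with the same fields listed above, not the ir_datasets class, so the snippet runs without downloading the collection. `doc_to_text` is a hypothetical helper, and the record is toy data.

```python
from collections import namedtuple

# Local stand-in mirroring the documented Cord19Doc fields (assumption:
# shape only; the real records come from dataset.docs_iter()).
Cord19Doc = namedtuple('Cord19Doc', ['doc_id', 'title', 'doi', 'date', 'abstract'])


def doc_to_text(doc):
    """Hypothetical helper: join the non-empty title and abstract into one string."""
    parts = [doc.title, doc.abstract]
    return '\n'.join(p for p in parts if p)


# Toy record for illustration only.
doc = Cord19Doc(doc_id='abc123', title='A title', doi='', date='2020-03-01',
                abstract='An abstract.')
print(doc.doc_id, doc_to_text(doc))
```

With the real dataset, the same helper would be called inside the `for doc in dataset.docs_iter()` loop shown above. Empty fields are skipped, so documents without an abstract still yield their title.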

"cord19/fulltext"

Version of the cord19 dataset that includes article full texts. This dataset takes longer to load than the version that includes only article metadata.

Provides: docs

Language: en

Document type:
Cord19FullTextDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. doi: str
  4. date: str
  5. abstract: str
  6. body: Tuple[Cord19FullTextSection, ...]
     Cord19FullTextSection: (namedtuple)
       1. title: str
       2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('cord19/fulltext')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, doi, date, abstract, body>
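Because `body` is a tuple of section namedtuples, flattening a full-text document means walking that tuple. Below is a minimal sketch, assuming you want the title, abstract, and every section title and text joined into one string. `Cord19FullTextSection` and `Cord19FullTextDoc` are local stand-ins with the documented fields (not the ir_datasets classes), `full_text` is a hypothetical helper, and the record is toy data.

```python
from collections import namedtuple

# Local stand-ins mirroring the documented types (assumption: shape only).
Cord19FullTextSection = namedtuple('Cord19FullTextSection', ['title', 'text'])
Cord19FullTextDoc = namedtuple(
    'Cord19FullTextDoc', ['doc_id', 'title', 'doi', 'date', 'abstract', 'body'])


def full_text(doc):
    """Hypothetical helper: flatten title, abstract, and body sections into one string."""
    parts = [doc.title, doc.abstract]
    for section in doc.body:  # body is a tuple of Cord19FullTextSection
        parts.append(section.title)
        parts.append(section.text)
    # Drop empty strings (e.g. sections without a title).
    return '\n'.join(p for p in parts if p)


# Toy record for illustration only.
doc = Cord19FullTextDoc(
    doc_id='abc123', title='A title', doi='', date='2020-03-01',
    abstract='An abstract.',
    body=(Cord19FullTextSection('Introduction', 'Intro text.'),
          Cord19FullTextSection('', 'Untitled section text.')))
print(full_text(doc))
```

Keeping section titles in the output is a design choice; drop the `section.title` line if only running text is wanted.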

"cord19/fulltext/trec-covid"

Version of the cord19/trec-covid dataset that includes article full texts. This dataset takes longer to load than the version that includes only article metadata.

Queries and qrels are the same as in cord19/trec-covid; only the documents differ, as this dataset uses the full-text documents from cord19/fulltext.

Provides: queries, docs, qrels

Language: en

Query type:
TrecQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. description: str
  4. narrative: str

Example

import ir_datasets
dataset = ir_datasets.load('cord19/fulltext/trec-covid')
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>
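Retrieval experiments usually turn each query into a single search string. Below is a minimal sketch, assuming you concatenate a chosen subset of the `TrecQuery` fields. The `TrecQuery` here is a local stand-in with the documented fields, `query_text` and its `fields` parameter are hypothetical, and the query is toy data.

```python
from collections import namedtuple

# Local stand-in with the documented TrecQuery fields (assumption: shape only).
TrecQuery = namedtuple('TrecQuery', ['query_id', 'title', 'description', 'narrative'])


def query_text(query, fields=('title', 'description')):
    """Hypothetical helper: join the chosen, non-empty query fields with spaces."""
    return ' '.join(getattr(query, f) for f in fields if getattr(query, f))


# Toy query for illustration only.
q = TrecQuery(query_id='1', title='example title',
              description='An example description.',
              narrative='An example narrative.')
print(query_text(q))
print(query_text(q, fields=('title',)))
```

Which fields to combine is an experimental choice. Title-only and title-plus-description queries are both common in TREC-style runs, and the narrative is typically written for human assessors rather than as a search query.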

"cord19/trec-covid"

The TREC COVID collection: queries related to COVID-19, with deep relevance judgments.

Provides: queries, docs, qrels

Language: en

Query type:
TrecQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. description: str
  4. narrative: str

Example

import ir_datasets
dataset = ir_datasets.load('cord19/trec-covid')
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>
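The relevance judgments are read with `dataset.qrels_iter()`. This page does not list the qrel type. The sketch below assumes records with `query_id`, `doc_id`, `relevance`, and `iteration` fields, matching ir_datasets' `TrecQrel`, and uses a local stand-in with toy values. It groups the judgments into the nested `{query_id: {doc_id: relevance}}` dict accepted by evaluation tools such as pytrec_eval. `qrels_to_dict` is a hypothetical helper.

```python
from collections import defaultdict, namedtuple

# Local stand-in (assumption: mirrors the ir_datasets TrecQrel fields).
TrecQrel = namedtuple('TrecQrel', ['query_id', 'doc_id', 'relevance', 'iteration'])


def qrels_to_dict(qrels):
    """Hypothetical helper: group judgments as {query_id: {doc_id: relevance}}.

    If a (query_id, doc_id) pair appears more than once, the last judgment wins.
    """
    result = defaultdict(dict)
    for qrel in qrels:
        result[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(result)


# Toy judgments for illustration only.
toy_qrels = [
    TrecQrel('1', 'docA', 2, '0'),
    TrecQrel('1', 'docB', 0, '0'),
    TrecQrel('2', 'docA', 1, '0'),
]
print(qrels_to_dict(toy_qrels))
# With the real data: qrels_to_dict(dataset.qrels_iter())
```

Because the helper only iterates once, it works directly on the generator returned by `qrels_iter()` without first materializing a list.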