GitHub: datasets/cord19.py

ir_datasets: CORD-19

Index
  1. cord19
  2. cord19/fulltext
  3. cord19/fulltext/trec-covid
  4. cord19/trec-covid
  5. cord19/trec-covid/round1
  6. cord19/trec-covid/round2
  7. cord19/trec-covid/round3
  8. cord19/trec-covid/round4
  9. cord19/trec-covid/round5

"cord19"

Collection of scientific articles related to COVID-19.

Uses the 2020-07-16 version of the dataset, corresponding to the "complete" collection used for TREC COVID.

Note that this version of the document collection only provides article meta-data. To get the full text, use cord19/fulltext.

193K docs

Language: en

Document type:
Cord19Doc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. doi: str
  4. date: str
  5. abstract: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("cord19")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, doi, date, abstract>

More details about the Python API are available in the ir_datasets documentation.
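As a minimal sketch of working with these fields, the snippet below builds one indexable string per document from its title and abstract. To stay runnable without downloading the collection, it uses a stand-in namedtuple with the same field names as Cord19Doc. The sample records and the helper name `index_text` are hypothetical and not part of ir_datasets.

```python
from collections import namedtuple

# Stand-in with the same fields as the Cord19Doc type documented above.
Cord19Doc = namedtuple("Cord19Doc", ["doc_id", "title", "doi", "date", "abstract"])


def index_text(doc):
    """Join title and abstract, skipping whichever field is empty."""
    parts = [doc.title.strip(), doc.abstract.strip()]
    return "\n".join(p for p in parts if p)


# Hypothetical records for illustration only; not taken from the collection.
sample_docs = [
    Cord19Doc("doc-a", "A made-up title", "", "2020-01-01", "A made-up abstract."),
    Cord19Doc("doc-b", "Title only", "", "", ""),
]

texts = {doc.doc_id: index_text(doc) for doc in sample_docs}
```

With ir_datasets installed, the same helper could presumably be applied to the documents yielded by `dataset.docs_iter()` in place of `sample_docs`.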


"cord19/fulltext"

Version of the cord19 dataset that includes article full texts. This dataset takes longer to load than the version that only includes article meta-data.

193K docs

Language: en

Document type:
Cord19FullTextDoc: (namedtuple)
  1. doc_id: str
  2. title: str
  3. doi: str
  4. date: str
  5. abstract: str
  6. body: Tuple[
    Cord19FullTextSection: (namedtuple)
    1. title: str
    2. text: str
    , ...]

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("cord19/fulltext")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, title, doi, date, abstract, body>
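Because `body` is a tuple of sections rather than a single string, a common first step is to flatten it. The sketch below shows one way to do that, assuming stand-in namedtuples that mirror the Cord19FullTextDoc and Cord19FullTextSection types listed above. The helper `flatten_body` and the sample document are hypothetical.

```python
from collections import namedtuple

# Stand-ins mirroring the documented types, so the sketch runs offline.
Cord19FullTextSection = namedtuple("Cord19FullTextSection", ["title", "text"])
Cord19FullTextDoc = namedtuple(
    "Cord19FullTextDoc", ["doc_id", "title", "doi", "date", "abstract", "body"]
)


def flatten_body(doc):
    """Render each section as heading plus text; separate sections by a blank line."""
    chunks = []
    for section in doc.body:
        parts = [section.title.strip(), section.text.strip()]
        chunk = "\n".join(p for p in parts if p)
        if chunk:  # skip sections that are entirely empty
            chunks.append(chunk)
    return "\n\n".join(chunks)


# Hypothetical document for illustration only.
sample = Cord19FullTextDoc(
    "doc-a", "A made-up title", "", "2020-01-01", "A made-up abstract.",
    (
        Cord19FullTextSection("Introduction", "First section text."),
        Cord19FullTextSection("", "Untitled section text."),
    ),
)
flat = flatten_body(sample)
```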



"cord19/fulltext/trec-covid"

Version of the cord19/trec-covid dataset that includes article full texts. This dataset takes longer to load than the version that only includes article meta-data.

Queries and qrels are the same as cord19/trec-covid; it just uses the extended documents from cord19/fulltext.

50 queries

Language: en

Query type:
TrecQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. description: str
  4. narrative: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("cord19/fulltext/trec-covid")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>
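Each query carries three text fields, so a retrieval pipeline has to decide which of them to search with. The following sketch, which assumes a stand-in namedtuple with the TrecQuery fields listed above, concatenates a chosen subset of fields. The helper `query_text` and the sample query are hypothetical.

```python
from collections import namedtuple

# Stand-in mirroring the documented TrecQuery type.
TrecQuery = namedtuple("TrecQuery", ["query_id", "title", "description", "narrative"])


def query_text(query, fields=("title",)):
    """Concatenate the selected fields into one search string, skipping empty ones."""
    values = (getattr(query, name).strip() for name in fields)
    return " ".join(v for v in values if v)


# Hypothetical query for illustration only.
sample = TrecQuery("q0", "made-up keywords", "A made-up question?", "A made-up narrative.")

short_form = query_text(sample)
long_form = query_text(sample, fields=("title", "description"))
```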



"cord19/trec-covid"

The complete TREC COVID collection. Contains queries related to COVID-19, along with deep relevance judgments.

50 queries

Language: en

Query type:
TrecQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. description: str
  4. narrative: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("cord19/trec-covid")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>
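The relevance judgments are usually consumed per query. The sketch below groups judgment records by query ID. The record layout (`query_id`, `doc_id`, `relevance`, `iteration`) is an assumption, since the qrel type is not shown in this section, and the helper `group_qrels` and the sample records are hypothetical.

```python
from collections import defaultdict, namedtuple

# Assumed field layout of a judgment record; not shown in the text above.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def group_qrels(qrels):
    """Map each query_id to a {doc_id: relevance} dictionary."""
    grouped = defaultdict(dict)
    for qrel in qrels:
        grouped[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(grouped)


# Hypothetical judgments for illustration only.
sample_qrels = [
    TrecQrel("q0", "doc-a", 2, "0"),
    TrecQrel("q0", "doc-b", 0, "0"),
    TrecQrel("q1", "doc-a", 1, "0"),
]

by_query = group_qrels(sample_qrels)
# Documents judged with a positive relevance grade for query "q0".
relevant_q0 = sorted(d for d, rel in by_query["q0"].items() if rel > 0)
```

With ir_datasets installed, the records from `dataset.qrels_iter()` could be passed to the same helper, provided they expose these field names.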



"cord19/trec-covid/round1"

Round 1 of the TREC COVID task. Includes 30 queries related to COVID-19. This uses the "2020-04-10" version of the collection.

30 queries

Language: en

Query type:
TrecQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. description: str
  4. narrative: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("cord19/trec-covid/round1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>



"cord19/trec-covid/round2"

Round 2 of the TREC COVID task. Includes 35 queries related to COVID-19. This uses the "2020-05-01" version of the collection.

Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).

35 queries

Language: en

Query type:
TrecQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. description: str
  4. narrative: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("cord19/trec-covid/round2")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>



"cord19/trec-covid/round3"

Round 3 of the TREC COVID task. Includes 40 queries related to COVID-19. This uses the "2020-05-19" version of the collection.

Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).

40 queries

Language: en

Query type:
TrecQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. description: str
  4. narrative: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("cord19/trec-covid/round3")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>



"cord19/trec-covid/round4"

Round 4 of the TREC COVID task. Includes 45 queries related to COVID-19. This uses the "2020-06-19" version of the collection.

Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).

45 queries

Language: en

Query type:
TrecQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. description: str
  4. narrative: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("cord19/trec-covid/round4")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>



"cord19/trec-covid/round5"

Round 5 of the TREC COVID task. Includes 50 queries related to COVID-19. This uses the "2020-07-16" version of the collection.

Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).

50 queries

Inherits queries from cord19/trec-covid

Language: en

Query type:
TrecQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. description: str
  4. narrative: str

Examples:

Python API
import ir_datasets
dataset = ir_datasets.load("cord19/trec-covid/round5")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>
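The five round datasets follow a regular naming pattern, so they can be selected programmatically. The sketch below collects the query counts and collection versions stated in the round descriptions on this page into a lookup table. The names `ROUNDS` and `round_dataset_id` are hypothetical helpers, not part of ir_datasets.

```python
# Per-round facts as stated in the round descriptions on this page.
ROUNDS = {
    1: {"queries": 30, "collection": "2020-04-10"},
    2: {"queries": 35, "collection": "2020-05-01"},
    3: {"queries": 40, "collection": "2020-05-19"},
    4: {"queries": 45, "collection": "2020-06-19"},
    5: {"queries": 50, "collection": "2020-07-16"},
}


def round_dataset_id(round_number):
    """Return the dataset ID for a TREC COVID round, e.g. for ir_datasets.load()."""
    if round_number not in ROUNDS:
        raise ValueError(f"unknown round: {round_number!r}")
    return f"cord19/trec-covid/round{round_number}"


# Growth in the query count from each round to the next.
query_count_increase = {
    r: ROUNDS[r]["queries"] - ROUNDS[r - 1]["queries"] for r in range(2, 6)
}
```

The returned string could then be passed to `ir_datasets.load(...)`, as in the examples above.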
