ir_datasets: CORD-19
Collection of scientific articles related to COVID-19.
Uses the 2020-07-16 version of the dataset, corresponding to the "complete" collection used for TREC COVID.
Note that this version of the document collection only provides article meta-data. To get the full text, use cord19/fulltext.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("cord19")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, doi, date, abstract>
You can find more details about the Python API here.
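Iterating over the whole collection can take a while, so a quick preview is often enough to inspect the document fields. The following is a minimal sketch: `preview_docs` and `demo` are hypothetical helper names, not part of ir_datasets, and the sketch relies only on the `doc_id` and `title` fields listed in the example above.

```python
from itertools import islice


def preview_docs(docs, n=3):
    """Return (doc_id, title) pairs for the first n documents of an iterable."""
    return [(doc.doc_id, doc.title) for doc in islice(docs, n)]


def demo():
    # Imported lazily so the helper above can be used without the package.
    import ir_datasets

    dataset = ir_datasets.load("cord19")
    for doc_id, title in preview_docs(dataset.docs_iter()):
        print(doc_id, title)
```

Calling `demo()` loads the real collection; `preview_docs` itself works on any iterable of records with those two fields.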
Version of the cord19 dataset that includes article full texts. This dataset takes longer to load than the version that only includes article meta-data.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("cord19/fulltext")
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, doi, date, abstract, body>
Version of the cord19/trec-covid dataset that includes article full texts. This dataset takes longer to load than the version that only includes article meta-data.
Queries and qrels are the same as cord19/trec-covid; it just uses the extended documents from cord19/fulltext.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("cord19/fulltext/trec-covid")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, title, description, narrative>
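The dataset IDs on this page combine two independent choices: metadata only versus full text, and documents only versus the TREC COVID queries and qrels. As a rough sketch, a hypothetical helper (`cord19_dataset_id` is not part of ir_datasets) could build the ID from those two choices:

```python
def cord19_dataset_id(full_text=False, trec_covid=False):
    """Build one of the dataset IDs described on this page.

    full_text:  include article bodies (slower to load).
    trec_covid: include the complete TREC COVID queries and qrels.
    """
    parts = ["cord19"]
    if full_text:
        parts.append("fulltext")
    if trec_covid:
        parts.append("trec-covid")
    return "/".join(parts)
```

The result can be passed straight to the loader, for example `ir_datasets.load(cord19_dataset_id(full_text=True))`. The per-round IDs are deliberately left out of this sketch.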
The complete TREC COVID collection. Queries related to COVID-19, including deep relevance judgments.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("cord19/trec-covid")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, title, description, narrative>
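The relevance judgments can be summarized by counting how often each label occurs. This sketch assumes that `dataset.qrels_iter()` yields records with a `relevance` field, which the examples on this page do not show; `relevance_counts` and `demo` are hypothetical names.

```python
from collections import Counter


def relevance_counts(qrels):
    """Count how many judgments carry each relevance label."""
    return Counter(qrel.relevance for qrel in qrels)


def demo():
    # Imported lazily so the helper above can be used without the package.
    import ir_datasets

    dataset = ir_datasets.load("cord19/trec-covid")
    # Assumption: qrels records expose a `relevance` attribute.
    for label, count in sorted(relevance_counts(dataset.qrels_iter()).items()):
        print(label, count)
```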
Round 1 of the TREC COVID task. Includes 30 queries related to COVID-19. This uses the "2020-04-10" version of the collection.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("cord19/trec-covid/round1")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, title, description, narrative>
Round 2 of the TREC COVID task. Includes 35 queries related to COVID-19. This uses the "2020-05-01" version of the collection.
Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("cord19/trec-covid/round2")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, title, description, narrative>
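Because a round's qrels omit judgments from prior rounds, it can be useful to see which (query, document) pairs are judged in the complete collection but missing from a given round. The sketch below assumes qrels records expose `query_id` and `doc_id` fields, which this page does not show; `judged_pairs` and `missing_from_round` are hypothetical helpers.

```python
def judged_pairs(qrels):
    """Collect the (query_id, doc_id) pairs that have a judgment."""
    return {(qrel.query_id, qrel.doc_id) for qrel in qrels}


def missing_from_round(round_qrels, complete_qrels):
    """Pairs judged in the complete collection but absent from one round."""
    return judged_pairs(complete_qrels) - judged_pairs(round_qrels)
```

With the real data, the two arguments would come from `ir_datasets.load("cord19/trec-covid/round2").qrels_iter()` and `ir_datasets.load("cord19/trec-covid").qrels_iter()`.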
Round 3 of the TREC COVID task. Includes 40 queries related to COVID-19. This uses the "2020-05-19" version of the collection.
Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("cord19/trec-covid/round3")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, title, description, narrative>
Round 4 of the TREC COVID task. Includes 45 queries related to COVID-19. This uses the "2020-06-19" version of the collection.
Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("cord19/trec-covid/round4")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, title, description, narrative>
Round 5 of the TREC COVID task. Includes 50 queries related to COVID-19. This uses the "2020-07-16" version of the collection.
Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).
Inherits queries from cord19/trec-covid
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("cord19/trec-covid/round5")
for query in dataset.queries_iter():
    query  # namedtuple<query_id, title, description, narrative>
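The per-round query counts stated above (30, 35, 40, 45, 50) can be checked against what the loader actually returns. This is a minimal sketch: `EXPECTED_QUERIES` and `check_query_counts` are hypothetical names, and the `load` argument is expected to behave like `ir_datasets.load`.

```python
# Query counts per round, as stated in the round descriptions on this page.
EXPECTED_QUERIES = {
    "cord19/trec-covid/round1": 30,
    "cord19/trec-covid/round2": 35,
    "cord19/trec-covid/round3": 40,
    "cord19/trec-covid/round4": 45,
    "cord19/trec-covid/round5": 50,
}


def check_query_counts(load, expected=EXPECTED_QUERIES):
    """Return {dataset_id: (expected, actual)} for every mismatching round."""
    mismatches = {}
    for dataset_id, want in expected.items():
        # Count by iterating, so only queries_iter() is required.
        got = sum(1 for _ in load(dataset_id).queries_iter())
        if got != want:
            mismatches[dataset_id] = (want, got)
    return mismatches
```

Running `check_query_counts(ir_datasets.load)` after `import ir_datasets` should return an empty dict if the counts match the descriptions.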