ir_datasets: CORD-19
Collection of scientific articles related to COVID-19.
Uses the 2020-07-16 version of the dataset, corresponding to the "complete" collection used for TREC COVID.
Note that this version of the document collection provides only article metadata. To get the full text, use cord19/fulltext.
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('cord19')
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, doi, date, abstract>
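Because this collection carries only metadata, a common first step is to check which fields are actually populated. The following is a minimal sketch, assuming the five fields listed above and assuming some records may have an empty abstract. `Cord19Doc`, `docs_with_abstract`, and the sample records are hypothetical stand-ins so the snippet runs without downloading the collection; with the real data, the iterator from `dataset.docs_iter()` could be passed in instead.

```python
from collections import namedtuple

# Hypothetical stand-in mirroring the fields listed above (not the library's own class).
Cord19Doc = namedtuple("Cord19Doc", ["doc_id", "title", "doi", "date", "abstract"])


def docs_with_abstract(docs):
    """Yield only the documents whose abstract is non-empty."""
    for doc in docs:
        if doc.abstract and doc.abstract.strip():
            yield doc


# Invented sample records, for illustration only.
sample_docs = [
    Cord19Doc("d1", "Title A", "10.0000/a", "2020-01-01", "Some abstract text."),
    Cord19Doc("d2", "Title B", "", "2020-02-01", ""),
]

kept = [doc.doc_id for doc in docs_with_abstract(sample_docs)]
print(kept)
```

The filter is written as a generator so it can wrap a large document iterator without holding the whole collection in memory.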
Version of the cord19 dataset that includes article full texts. This dataset takes longer to load than the version that only includes article metadata.
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('cord19/fulltext')
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, doi, date, abstract, body>
Version of the cord19/trec-covid dataset that includes article full texts. This dataset takes longer to load than the version that only includes article metadata.
Queries and qrels are the same as cord19/trec-covid; it just uses the extended documents from cord19/fulltext.
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('cord19/fulltext/trec-covid')
for query in dataset.queries_iter():
    query  # namedtuple<query_id, title, description, narrative>
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('cord19/fulltext/trec-covid')
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, doi, date, abstract, body>
Relevance levels
Rel. | Definition |
---|---|
0 | Not Relevant: everything else. |
1 | Partially Relevant: the article answers part of the question but would need to be combined with other information to get a complete answer. |
2 | Relevant: the article is fully responsive to the information need as expressed by the topic, i.e. answers the Question in the topic. The article need not contain all information on the topic, but must, on its own, provide an answer to the question. |
Example
import ir_datasets
dataset = ir_datasets.load('cord19/fulltext/trec-covid')
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
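To make the relevance levels above concrete, the following sketch tallies judgments by label. It assumes the three levels from the table; `TrecQrel`, `RELEVANCE_LABELS`, `count_by_label`, and the sample judgments are hypothetical stand-ins so the snippet runs without downloading anything. With the real data, the iterator from `dataset.qrels_iter()` could be passed in instead.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in mirroring the qrel fields listed above.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])

# Short labels taken from the relevance-level table.
RELEVANCE_LABELS = {0: "Not Relevant", 1: "Partially Relevant", 2: "Relevant"}


def count_by_label(qrels):
    """Count judgments per relevance label.

    Any value not listed in the table is bucketed as "Unlisted"
    rather than raising an error.
    """
    return Counter(RELEVANCE_LABELS.get(q.relevance, "Unlisted") for q in qrels)


# Invented sample judgments, for illustration only.
sample_qrels = [
    TrecQrel("1", "d1", 2, "1"),
    TrecQrel("1", "d2", 0, "1"),
    TrecQrel("2", "d1", 1, "2"),
    TrecQrel("2", "d3", 2, "2"),
]

counts = count_by_label(sample_qrels)
print(counts)
```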
The TREC COVID collection. Queries related to COVID-19, including deep relevance judgments.
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('cord19/trec-covid')
for query in dataset.queries_iter():
    query  # namedtuple<query_id, title, description, narrative>
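Each query exposes three text fields (title, description, narrative), so a retrieval experiment has to decide which of them to search with. The sketch below shows one possible way to build a search string from a chosen subset of fields. `TrecQuery`, `query_text`, and the sample query are hypothetical stand-ins for illustration, not part of ir_datasets.

```python
from collections import namedtuple

# Hypothetical stand-in mirroring the query fields listed above.
TrecQuery = namedtuple("TrecQuery", ["query_id", "title", "description", "narrative"])


def query_text(query, fields=("title",)):
    """Join the selected, non-empty query fields into one search string."""
    values = (getattr(query, name) for name in fields)
    return " ".join(value for value in values if value)


# Invented sample query, for illustration only.
sample = TrecQuery("q1", "example title", "An example description?", "An example narrative.")

print(query_text(sample))
print(query_text(sample, fields=("title", "description")))
```

Defaulting to the title alone is an arbitrary choice for this sketch; which fields work best depends on the retrieval system.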
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('cord19/trec-covid')
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, title, doi, date, abstract>
Relevance levels
Rel. | Definition |
---|---|
0 | Not Relevant: everything else. |
1 | Partially Relevant: the article answers part of the question but would need to be combined with other information to get a complete answer. |
2 | Relevant: the article is fully responsive to the information need as expressed by the topic, i.e. answers the Question in the topic. The article need not contain all information on the topic, but must, on its own, provide an answer to the question. |
Example
import ir_datasets
dataset = ir_datasets.load('cord19/trec-covid')
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, relevance, iteration>
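Evaluation code usually needs the judgments grouped per query, with a threshold deciding what counts as relevant. The sketch below groups judged documents by query and keeps those at or above a minimum level; whether level 1 (Partially Relevant) counts is left to the caller. `TrecQrel`, `relevant_docs_by_query`, and the sample judgments are hypothetical stand-ins so the snippet runs without downloading anything.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in mirroring the qrel fields listed above.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def relevant_docs_by_query(qrels, min_relevance=1):
    """Map each query_id to the set of doc_ids judged at or above min_relevance.

    Queries with no document at or above the threshold are omitted.
    """
    relevant = defaultdict(set)
    for qrel in qrels:
        if qrel.relevance >= min_relevance:
            relevant[qrel.query_id].add(qrel.doc_id)
    return dict(relevant)


# Invented sample judgments, for illustration only.
sample_qrels = [
    TrecQrel("1", "d1", 2, "1"),
    TrecQrel("1", "d2", 0, "1"),
    TrecQrel("1", "d3", 1, "1"),
    TrecQrel("2", "d1", 0, "1"),
]

lenient = relevant_docs_by_query(sample_qrels)                  # levels 1 and 2
strict = relevant_docs_by_query(sample_qrels, min_relevance=2)  # level 2 only
print(lenient)
print(strict)
```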