ir_datasets: Highwire (TREC Genomics 2006-07)
Medical document collection from Highwire Press. Includes 162,259 scientific articles from 49 journals.
This dataset is used for the TREC Genomics track in 2006 and 2007.
Note that these documents are split into passages based on paragraph tags in the HTML.
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('highwire')
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, journal, title, spans>
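Because each document is split into passages, it can help to flatten a document into one record per passage. The following is a minimal sketch, not the library's own API: it assumes each entry of `spans` exposes `start`, `length`, and `text` fields (the listing above only names the `spans` field), and it uses hypothetical stand-in namedtuples with made-up values so that it runs without downloading the collection.

```python
from collections import namedtuple

# Hypothetical stand-ins mirroring the documented fields. Real records come
# from dataset.docs_iter(). The span fields (start, length, text) are assumed.
Span = namedtuple("Span", ["start", "length", "text"])
Doc = namedtuple("Doc", ["doc_id", "journal", "title", "spans"])


def passages(doc):
    """Yield one (doc_id, start, length, text) tuple per passage of a document."""
    for span in doc.spans:
        yield doc.doc_id, span.start, span.length, span.text


# Made-up example document, for illustration only.
example_doc = Doc(
    doc_id="12345",
    journal="Example Journal",
    title="An example title",
    spans=(Span(0, 11, "First para."), Span(20, 12, "Second para.")),
)

passage_list = list(passages(example_doc))
for doc_id, start, length, text in passage_list:
    print(doc_id, start, length, text)
```

With the real collection, the same `passages` helper could be applied to each `doc` yielded by `dataset.docs_iter()`, provided the span fields match the assumption above.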
The TREC Genomics Track 2006 benchmark. Contains 28 queries with passage-level relevance judgments.
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('highwire/trec-genomics-2006')
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('highwire/trec-genomics-2006')
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, journal, title, spans>
Relevance levels
Rel. | Definition |
---|---|
0 | NOT |
1 | POSSIBLY |
2 | DEFINITELY |
Example
import ir_datasets
dataset = ir_datasets.load('highwire/trec-genomics-2006')
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, start, length, relevance>
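Since each judgment carries a `start` and `length`, one way to relate a judged passage to a document's passages is an interval-overlap check. The sketch below assumes that qrel offsets and span offsets share the same coordinate system and that spans expose `start`, `length`, and `text`; neither assumption is confirmed by the listing above. The namedtuples and sample values are hypothetical.

```python
from collections import namedtuple

# Hypothetical stand-ins mirroring the documented fields. Span fields are assumed.
Span = namedtuple("Span", ["start", "length", "text"])
Doc = namedtuple("Doc", ["doc_id", "journal", "title", "spans"])
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "start", "length", "relevance"])


def overlaps(a_start, a_length, b_start, b_length):
    """True if half-open ranges [a_start, a_start + a_length) and b's range intersect."""
    return a_start < b_start + b_length and b_start < a_start + a_length


def spans_for_qrel(qrel, doc):
    """Return the spans of doc that overlap the judged passage (same doc_id only)."""
    if qrel.doc_id != doc.doc_id:
        return []
    return [
        span for span in doc.spans
        if overlaps(qrel.start, qrel.length, span.start, span.length)
    ]


# Made-up data, for illustration only.
sample_doc = Doc(
    "doc-a", "Example Journal", "An example title",
    (Span(0, 100, "A"), Span(150, 100, "B"), Span(300, 50, "C")),
)
sample_qrel = Qrel("q1", "doc-a", 120, 60, 2)

matched = spans_for_qrel(sample_qrel, sample_doc)
print([span.text for span in matched])
```

Here the judged range covers offsets 120 up to 180, so only the span starting at 150 is matched.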
The TREC Genomics Track 2007 benchmark. Contains 36 queries with passage-level relevance judgments.
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('highwire/trec-genomics-2007')
for query in dataset.queries_iter():
    query  # namedtuple<query_id, text>
Language: en
Example
import ir_datasets
dataset = ir_datasets.load('highwire/trec-genomics-2007')
for doc in dataset.docs_iter():
    doc  # namedtuple<doc_id, journal, title, spans>
Relevance levels
Rel. | Definition |
---|---|
0 | NOT_RELEVANT |
1 | RELEVANT |
Example
import ir_datasets
dataset = ir_datasets.load('highwire/trec-genomics-2007')
for qrel in dataset.qrels_iter():
    qrel  # namedtuple<query_id, doc_id, start, length, relevance>
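Passage-level judgments can be rolled up to document level when a document-ranking evaluation is wanted. The sketch below takes the highest relevance seen among the judged passages of each (query, document) pair. This roll-up rule is an illustrative choice, not a measure defined by the track, and the `Qrel` namedtuple and sample values are hypothetical stand-ins for records from `dataset.qrels_iter()`.

```python
from collections import namedtuple

# Hypothetical stand-in mirroring the documented qrel fields.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "start", "length", "relevance"])


def document_level(qrels):
    """Map (query_id, doc_id) to the highest relevance among its judged passages."""
    best = {}
    for qrel in qrels:
        key = (qrel.query_id, qrel.doc_id)
        best[key] = max(best.get(key, 0), qrel.relevance)
    return best


# Made-up judgments, for illustration only.
sample_qrels = [
    Qrel("q1", "doc-a", 0, 40, 0),
    Qrel("q1", "doc-a", 90, 25, 1),
    Qrel("q1", "doc-b", 10, 30, 0),
    Qrel("q2", "doc-a", 0, 40, 1),
]

doc_relevance = document_level(sample_qrels)
for (query_id, doc_id), relevance in sorted(doc_relevance.items()):
    print(query_id, doc_id, relevance)
```

The same helper would also work with the graded 2006 judgments, where the maximum would range over 0, 1, and 2.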