Github: datasets/gov.py

ir_datasets: GOV

Index
  1. gov
  2. gov/trec-web-2002
  3. gov/trec-web-2002/named-page
  4. gov/trec-web-2003
  5. gov/trec-web-2003/named-page
  6. gov/trec-web-2004

"gov"

GOV web document collection. Used for early TREC Web Tracks. Not to be confused with gov2.

The dataset is available for a fee from the University of Glasgow (UoG) and is shipped on a hard drive. More information is provided here.

docs

Language: en

Document type:
GovDoc: (namedtuple)
  1. doc_id: str
  2. url: str
  3. http_headers: str
  4. body: bytes
  5. body_content_type: str

Example

import ir_datasets
dataset = ir_datasets.load('gov')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>
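
Because body is typed as bytes, it has to be decoded before any text processing. The following is a minimal sketch of one way to do that, not part of ir_datasets: the GovDoc stand-in, the body_text helper, and the sample record are all hypothetical, and the UTF-8-then-latin-1 strategy is an assumption rather than something the collection specifies.

```python
from collections import namedtuple

# Stand-in with the same fields as GovDoc, so the sketch runs without the collection.
GovDoc = namedtuple("GovDoc", ["doc_id", "url", "http_headers", "body", "body_content_type"])

def body_text(doc, fallback_encoding="latin-1"):
    """Decode doc.body to str: try UTF-8 first, then fall back to a single-byte encoding."""
    try:
        return doc.body.decode("utf-8")
    except UnicodeDecodeError:
        # With the default latin-1, every byte maps to a code point, so this cannot raise.
        return doc.body.decode(fallback_encoding)

# Made-up record; the ID, URL, and content are illustrative only.
sample = GovDoc(
    doc_id="EXAMPLE-0001",
    url="http://example.gov/",
    http_headers="HTTP/1.1 200 OK\nContent-Type: text/html",
    body=b"<html><body>Caf\xe9 hours</body></html>",
    body_content_type="text/html",
)
text = body_text(sample)
```

With the real collection, the same helper could be applied to each doc yielded by docs_iter(). Checking body_content_type first is advisable, since not every document is necessarily HTML.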

"gov/trec-web-2002"

The TREC Web Track 2002 ad-hoc ranking benchmark.

queries

Language: en

Query type:
TrecQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. description: str
  4. narrative: str

Example

import ir_datasets
dataset = ir_datasets.load('gov/trec-web-2002')
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>
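
A TrecQuery carries three text fields, so a retrieval system has to decide which of them to search with. The sketch below shows one possible way to build a query string from a chosen subset of fields. The TrecQuery stand-in mirrors the documented fields, while query_text and the sample topic are hypothetical and not part of ir_datasets.

```python
from collections import namedtuple

# Stand-in with the same fields as TrecQuery, so the sketch runs without the collection.
TrecQuery = namedtuple("TrecQuery", ["query_id", "title", "description", "narrative"])

def query_text(query, fields=("title",)):
    """Join the selected fields into one string, collapsing runs of whitespace."""
    parts = (" ".join(getattr(query, field).split()) for field in fields)
    return " ".join(part for part in parts if part)

# Made-up topic; the real topics come from dataset.queries_iter().
sample_query = TrecQuery(
    query_id="1",
    title="intellectual property",
    description="Find documents\n  about patents.",
    narrative="Not used here.",
)
short_form = query_text(sample_query)
long_form = query_text(sample_query, fields=("title", "description"))
```

Title-only strings resemble short keyword queries; adding the description gives a longer, more verbose variant.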

docs

Language: en

Document type:
GovDoc: (namedtuple)
  1. doc_id: str
  2. url: str
  3. http_headers: str
  4. body: bytes
  5. body_content_type: str

Example

import ir_datasets
dataset = ir_datasets.load('gov/trec-web-2002')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>
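
The http_headers field is a single string. The sketch below splits it into a status line and a dictionary of header fields. It assumes the string looks like a raw HTTP response head (status line first, then "Name: value" lines), which this page does not confirm; parse_http_headers is a hypothetical helper, not an ir_datasets function.

```python
def parse_http_headers(raw):
    """Split a raw header string into (status_line, {lowercased_name: value})."""
    lines = raw.splitlines()
    status = lines[0].strip() if lines else ""
    headers = {}
    for line in lines[1:]:
        # Split at the first colon only, so values containing colons stay intact.
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return status, headers

# Made-up header string in the assumed format.
raw_headers = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Mon, 01 Jan 2002 00:00:00 GMT\r\n"
    "Content-Type: text/html; charset=iso-8859-1\r\n"
)
status, headers = parse_http_headers(raw_headers)
```

If a header appears more than once, this sketch keeps only the last value.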

qrels

Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
0     Not Relevant
1     Relevant

Example

import ir_datasets
dataset = ir_datasets.load('gov/trec-web-2002')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
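
Evaluation code usually needs judgments grouped by query rather than as a flat stream. The sketch below shows one way to build that lookup. The TrecQrel stand-in mirrors the documented fields, and qrels_to_dict and the sample judgments are hypothetical.

```python
from collections import defaultdict, namedtuple

# Stand-in with the same fields as TrecQrel, so the sketch runs without the collection.
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])

def qrels_to_dict(qrels):
    """Group judgments as {query_id: {doc_id: relevance}}."""
    by_query = defaultdict(dict)
    for qrel in qrels:
        by_query[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(by_query)

# Made-up judgments; the real ones come from dataset.qrels_iter().
sample_qrels = [
    TrecQrel("1", "doc-a", 1, "0"),
    TrecQrel("1", "doc-b", 0, "0"),
    TrecQrel("2", "doc-a", 1, "0"),
]
judgments = qrels_to_dict(sample_qrels)
```

The same function accepts the iterator returned by dataset.qrels_iter() directly, since it only reads the query_id, doc_id, and relevance attributes.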

Citation

bibtex:
@inproceedings{Craswell2002TrecWeb,
  title={Overview of the TREC-2002 Web Track},
  author={Nick Craswell and David Hawking},
  booktitle={TREC},
  year={2002}
}

"gov/trec-web-2002/named-page"

The TREC Web Track 2002 named page ranking benchmark.

queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('gov/trec-web-2002/named-page')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

docs

Language: en

Document type:
GovDoc: (namedtuple)
  1. doc_id: str
  2. url: str
  3. http_headers: str
  4. body: bytes
  5. body_content_type: str

Example

import ir_datasets
dataset = ir_datasets.load('gov/trec-web-2002/named-page')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>

qrels

Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
1     Name refers to this page

Example

import ir_datasets
dataset = ir_datasets.load('gov/trec-web-2002/named-page')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
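
Since the only relevance level here marks the page a name refers to, one common way to score a named-page run is by the rank of the first correct page. The sketch below computes reciprocal rank and its mean over queries. It is an illustration under that assumption, not the official track evaluation; the function names and sample data are hypothetical.

```python
def reciprocal_rank(ranked_doc_ids, target_doc_ids):
    """Return 1/rank of the first target found in the ranking, or 0.0 if none is found."""
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in target_doc_ids:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(run, targets):
    """Average reciprocal rank over every query that has at least one judged target.

    run:     {query_id: [doc_id, ...]} in ranked order
    targets: {query_id: set of doc_ids judged as the named page}
    """
    if not targets:
        return 0.0
    total = sum(reciprocal_rank(run.get(qid, []), docs) for qid, docs in targets.items())
    return total / len(targets)

# Made-up run and judgments.
sample_run = {"NP1": ["doc-c", "doc-a"], "NP2": ["doc-z"]}
sample_targets = {"NP1": {"doc-a"}, "NP2": {"doc-b"}}
mrr = mean_reciprocal_rank(sample_run, sample_targets)
```

Queries missing from the run count as zero rather than being skipped, so an incomplete run is not rewarded.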

Citation

bibtex:
@inproceedings{Craswell2002TrecWeb,
  title={Overview of the TREC-2002 Web Track},
  author={Nick Craswell and David Hawking},
  booktitle={TREC},
  year={2002}
}

"gov/trec-web-2003"

The TREC Web Track 2003 ad-hoc ranking benchmark.

queries

Language: en

Query type:
GovWeb02Query: (namedtuple)
  1. query_id: str
  2. title: str
  3. description: str

Example

import ir_datasets
dataset = ir_datasets.load('gov/trec-web-2003')
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description>

docs

Language: en

Document type:
GovDoc: (namedtuple)
  1. doc_id: str
  2. url: str
  3. http_headers: str
  4. body: bytes
  5. body_content_type: str

Example

import ir_datasets
dataset = ir_datasets.load('gov/trec-web-2003')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>

qrels

Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
0     Not Relevant
1     Relevant

Example

import ir_datasets
dataset = ir_datasets.load('gov/trec-web-2003')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
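
With binary relevance levels (0 and 1), a simple measure such as precision at k can be computed directly from the judgments. The sketch below is one illustrative implementation, not the measure prescribed by the track. precision_at_k and the sample data are hypothetical, and unjudged documents are assumed to be non-relevant.

```python
def precision_at_k(ranked_doc_ids, judgments, k=10):
    """Fraction of the top-k ranked documents judged relevant (relevance > 0).

    judgments: {doc_id: relevance} for a single query; unjudged docs count as 0.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_doc_ids[:k] if judgments.get(doc_id, 0) > 0)
    # Divide by k, not by the list length, so short rankings are not rewarded.
    return hits / k

# Made-up ranking and judgments for one query.
sample_ranking = ["doc-a", "doc-b", "doc-c", "doc-d"]
sample_judgments = {"doc-a": 1, "doc-b": 0, "doc-d": 1}
p_at_4 = precision_at_k(sample_ranking, sample_judgments, k=4)
```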

Citation

bibtex:
@inproceedings{Craswell2003TrecWeb,
  title={Overview of the TREC 2003 Web Track},
  author={Nick Craswell and David Hawking and Ross Wilkinson and Mingfang Wu},
  booktitle={TREC},
  year={2003}
}

"gov/trec-web-2003/named-page"

The TREC Web Track 2003 named page ranking benchmark.

queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('gov/trec-web-2003/named-page')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

docs

Language: en

Document type:
GovDoc: (namedtuple)
  1. doc_id: str
  2. url: str
  3. http_headers: str
  4. body: bytes
  5. body_content_type: str

Example

import ir_datasets
dataset = ir_datasets.load('gov/trec-web-2003/named-page')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>

qrels

Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
1     Name refers to this page

Example

import ir_datasets
dataset = ir_datasets.load('gov/trec-web-2003/named-page')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

Citation

bibtex:
@inproceedings{Craswell2003TrecWeb,
  title={Overview of the TREC 2003 Web Track},
  author={Nick Craswell and David Hawking and Ross Wilkinson and Mingfang Wu},
  booktitle={TREC},
  year={2003}
}

"gov/trec-web-2004"

The TREC Web Track 2004 ad-hoc ranking benchmark.

Queries include a combination of topic distillation, homepage finding, and named page finding.

queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('gov/trec-web-2004')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

docs

Language: en

Document type:
GovDoc: (namedtuple)
  1. doc_id: str
  2. url: str
  3. http_headers: str
  4. body: bytes
  5. body_content_type: str

Example

import ir_datasets
dataset = ir_datasets.load('gov/trec-web-2004')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>

qrels

Query relevance judgment type:
TrecQrel: (namedtuple)
  1. query_id: str
  2. doc_id: str
  3. relevance: int
  4. iteration: str

Relevance levels

Rel.  Definition
0     Not Relevant
1     Relevant

Example

import ir_datasets
dataset = ir_datasets.load('gov/trec-web-2004')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

Citation

bibtex:
@inproceedings{Craswell2004TrecWeb,
  title={Overview of the TREC-2004 Web Track},
  author={Nick Craswell and David Hawking},
  booktitle={TREC},
  year={2004}
}