ir_datasets: GOV2

import ir_datasets
dataset = ir_datasets.load("gov2")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>

You can find more details about the Python API here.

CLI

ir_datasets export gov2 docs



[doc_id]    [url]    [http_headers]    [body]    [body_content_type]
...

You can find more details about the CLI here.

No example available for PyTerrier

{
  "docs": {
    "count": 25205179,
    "fields": {
      "doc_id": {
        "max_len": 17,
        "common_prefix": "GX"
      }
    }
  }
}
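The metadata above records two facts about `doc_id` values: a common prefix of `GX` and a maximum length of 17 characters. As a minimal sketch, these can be turned into a cheap sanity check for IDs read from run files or judgments. `looks_like_gov2_id` is a hypothetical helper, not part of ir_datasets, and the sample IDs are made up for illustration.

```python
# Values copied from the "docs" metadata block above.
DOC_ID_PREFIX = "GX"
DOC_ID_MAX_LEN = 17


def looks_like_gov2_id(doc_id: str) -> bool:
    """Return True if doc_id is compatible with the documented prefix and length.

    This is a necessary condition only: passing the check does not prove
    that the ID exists in the collection.
    """
    return doc_id.startswith(DOC_ID_PREFIX) and len(doc_id) <= DOC_ID_MAX_LEN


# Made-up IDs, used only to exercise the check.
print(looks_like_gov2_id("GX000-00-0000000"))
print(looks_like_gov2_id("AB000-00-0000000"))
```

A check like this can catch IDs from a different collection early, before a lookup against 25 million documents is attempted.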

`"gov2/trec-mq-2007"`

TREC 2007 Million Query track.

10K queries

Language: multiple/other/unknown

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-mq-2007")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export gov2/trec-mq-2007 queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

docs

25M docs

Inherits docs from gov2

Language: en

Document type:

Gov2Doc: (namedtuple)

doc_id: str
url: str
http_headers: str
body: bytes
body_content_type: str

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-mq-2007")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>

You can find more details about the Python API here.
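Because `body` is `bytes` while `body_content_type` is a `str`, downstream code usually has to decode the body before tokenizing it. The following is a minimal sketch, assuming that textual documents carry a content type starting with `text/` and that UTF-8 with replacement characters is an acceptable decoding; neither assumption is stated on this page. `Gov2DocStub` is a local stand-in with the documented fields so the snippet runs without the collection; with the real data you would pass the objects yielded by `dataset.docs_iter()`.

```python
from typing import NamedTuple, Optional


class Gov2DocStub(NamedTuple):
    """Local stand-in mirroring the documented Gov2Doc fields."""
    doc_id: str
    url: str
    http_headers: str
    body: bytes
    body_content_type: str


def body_text(doc) -> Optional[str]:
    """Decode doc.body for textual content types; return None otherwise."""
    if not doc.body_content_type.startswith("text/"):
        return None  # non-text content would need a dedicated parser
    # errors="replace" keeps going on bytes that are not valid UTF-8.
    return doc.body.decode("utf-8", errors="replace")


# Made-up document, used only to exercise the helper.
example = Gov2DocStub(
    doc_id="GX000-00-0000000",
    url="http://example.gov/",
    http_headers="HTTP/1.1 200 OK",
    body=b"<html><body>Hello</body></html>",
    body_content_type="text/html",
)
print(body_text(example))
```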

CLI

ir_datasets export gov2/trec-mq-2007 docs



[doc_id]    [url]    [http_headers]    [body]    [body_content_type]
...

You can find more details about the CLI here.

No example available for PyTerrier

qrels

73K qrels

Query relevance judgment type:

TrecPrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
method: int
iprob: float

Relevance levels

Rel.	Definition	Count	%
0	Not Relevant	`54K`	74.4%
1	Relevant	`15K`	20.1%
2	Highly Relevant	`4.0K`	5.5%

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-mq-2007")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, method, iprob>

You can find more details about the Python API here.
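A common next step is to index the judgments by query. The sketch below groups `TrecPrel`-shaped records into a nested dictionary. `TrecPrelStub` and the sample rows are made up so the snippet runs without the data, and `group_by_query` is a hypothetical helper rather than an ir_datasets function. With the real dataset, `dataset.qrels_iter()` would be passed instead of `sample`.

```python
from collections import defaultdict
from typing import NamedTuple


class TrecPrelStub(NamedTuple):
    """Local stand-in mirroring the documented TrecPrel fields."""
    query_id: str
    doc_id: str
    relevance: int
    method: int
    iprob: float


def group_by_query(qrels):
    """Map query_id -> {doc_id: relevance}."""
    grouped = defaultdict(dict)
    for qrel in qrels:
        grouped[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(grouped)


# Made-up judgments, used only to exercise the helper.
sample = [
    TrecPrelStub("1", "GX000-00-0000000", 0, 1, 0.5),
    TrecPrelStub("1", "GX000-00-0000001", 2, 1, 0.5),
    TrecPrelStub("2", "GX000-00-0000002", 1, 1, 0.25),
]
by_query = group_by_query(sample)
print(by_query["1"])
```

The sketch deliberately drops `method` and `iprob`; code that needs them can store the whole record instead of only `relevance`.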

CLI

ir_datasets export gov2/trec-mq-2007 qrels --format tsv



[query_id]    [doc_id]    [relevance]    [method]    [iprob]
...

You can find more details about the CLI here.

No example available for PyTerrier

\cite{Allen2007MQ}

Bibtex:

@inproceedings{Allen2007MQ, title={Million Query Track 2007 Overview}, author={James Allan and Ben Carterette and Javed A. Aslam and Virgil Pavlu and Blagovest Dachev and Evangelos Kanoulas}, booktitle={TREC}, year={2007} }

{
  "docs": {
    "count": 25205179,
    "fields": {
      "doc_id": {
        "max_len": 17,
        "common_prefix": "GX"
      }
    }
  },
  "queries": {
    "count": 10000
  },
  "qrels": {
    "count": 73015,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "0": 54333,
          "1": 14689,
          "2": 3993
        }
      }
    }
  }
}
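The percentages in the relevance table for this subset follow directly from `counts_by_value` in the metadata above. As a short worked check (the dictionary is copied by hand from that block, and `level_shares` is a hypothetical helper):

```python
# Copied from "qrels" -> "fields" -> "relevance" -> "counts_by_value" above.
counts_by_value = {0: 54333, 1: 14689, 2: 3993}


def level_shares(counts):
    """Return {relevance_level: percentage of all judgments, one decimal}."""
    total = sum(counts.values())
    return {level: round(100 * n / total, 1) for level, n in counts.items()}


total_qrels = sum(counts_by_value.values())
print(total_qrels)                    # matches "qrels" -> "count"
print(level_shares(counts_by_value))  # matches the % column of the table
```

The same helper applies unchanged to the `counts_by_value` blocks of the other subsets on this page.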

`"gov2/trec-mq-2008"`

TREC 2008 Million Query track.

10K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-mq-2008")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export gov2/trec-mq-2008 queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

docs

25M docs

Inherits docs from gov2

Language: en

Document type:

Gov2Doc: (namedtuple)

doc_id: str
url: str
http_headers: str
body: bytes
body_content_type: str

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-mq-2008")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>

You can find more details about the Python API here.

CLI

ir_datasets export gov2/trec-mq-2008 docs



[doc_id]    [url]    [http_headers]    [body]    [body_content_type]
...

You can find more details about the CLI here.

No example available for PyTerrier

qrels

15K qrels

Query relevance judgment type:

TrecPrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
method: int
iprob: float

Relevance levels

Rel.	Definition	Count	%
0	Not Relevant	`12K`	80.7%
1	Relevant	`2.9K`	19.3%
2	Highly Relevant	`0`	0.0%

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-mq-2008")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, method, iprob>

You can find more details about the Python API here.

CLI

ir_datasets export gov2/trec-mq-2008 qrels --format tsv



[query_id]    [doc_id]    [relevance]    [method]    [iprob]
...

You can find more details about the CLI here.

No example available for PyTerrier

\cite{Allen2008MQ}

Bibtex:

@inproceedings{Allen2008MQ, title={Million Query Track 2008 Overview}, author={James Allan and Javed A. Aslam and Ben Carterette and Virgil Pavlu and Evangelos Kanoulas}, booktitle={TREC}, year={2008} }

{
  "docs": {
    "count": 25205179,
    "fields": {
      "doc_id": {
        "max_len": 17,
        "common_prefix": "GX"
      }
    }
  },
  "queries": {
    "count": 10000
  },
  "qrels": {
    "count": 15211,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "0": 12279,
          "1": 2932
        }
      }
    }
  }
}

`"gov2/trec-tb-2004"`

The TREC Terabyte Track 2004 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

50 queries

Language: en

Query type:

TrecQuery: (namedtuple)

query_id: str
title: str
description: str
narrative: str

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2004")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>

You can find more details about the Python API here.
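Since a `TrecQuery` carries three text fields, retrieval code has to pick which one to use as the query string. A minimal sketch, assuming the title is the short keyword form and the description a longer sentence (the usual TREC topic convention, which this page does not spell out). `TrecQueryStub`, `query_text`, and the sample topic are made up for illustration.

```python
from typing import NamedTuple


class TrecQueryStub(NamedTuple):
    """Local stand-in mirroring the documented TrecQuery fields."""
    query_id: str
    title: str
    description: str
    narrative: str


def query_text(query, field: str = "title") -> str:
    """Return the chosen topic field with whitespace normalized."""
    if field not in ("title", "description", "narrative"):
        raise ValueError(f"unknown topic field: {field!r}")
    # A field may contain line breaks; collapse all whitespace runs.
    return " ".join(getattr(query, field).split())


# Made-up topic, used only to exercise the helper.
topic = TrecQueryStub(
    query_id="0",
    title="example topic",
    description="What is an\nexample topic?",
    narrative="Relevant documents discuss example topics.",
)
print(query_text(topic))
print(query_text(topic, "description"))
```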

CLI

ir_datasets export gov2/trec-tb-2004 queries



[query_id]    [title]    [description]    [narrative]
...

You can find more details about the CLI here.

No example available for PyTerrier

docs

25M docs

Inherits docs from gov2

Language: en

Document type:

Gov2Doc: (namedtuple)

doc_id: str
url: str
http_headers: str
body: bytes
body_content_type: str

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2004")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>

You can find more details about the Python API here.

CLI

ir_datasets export gov2/trec-tb-2004 docs



[doc_id]    [url]    [http_headers]    [body]    [body_content_type]
...

You can find more details about the CLI here.

No example available for PyTerrier

qrels

58K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
0	Not Relevant	`47K`	81.7%
1	Relevant	`9.3K`	16.1%
2	Highly Relevant	`1.3K`	2.2%
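Metrics that expect binary judgments need the three levels above collapsed into two. A minimal sketch, assuming that levels 1 and 2 both count as relevant; this is a choice made for illustration, and the threshold is a parameter. `relevant_docs` is a hypothetical helper, `TrecQrelStub` is a local stand-in, and the rows are made up.

```python
from collections import defaultdict
from typing import NamedTuple


class TrecQrelStub(NamedTuple):
    """Local stand-in mirroring the documented TrecQrel fields."""
    query_id: str
    doc_id: str
    relevance: int
    iteration: str


def relevant_docs(qrels, min_relevance: int = 1):
    """Map query_id -> set of doc_ids judged at or above min_relevance.

    Queries without any qualifying judgment are absent from the result.
    """
    relevant = defaultdict(set)
    for qrel in qrels:
        if qrel.relevance >= min_relevance:
            relevant[qrel.query_id].add(qrel.doc_id)
    return dict(relevant)


# Made-up judgments, used only to exercise the helper.
sample = [
    TrecQrelStub("1", "GX000-00-0000000", 0, "0"),
    TrecQrelStub("1", "GX000-00-0000001", 1, "0"),
    TrecQrelStub("1", "GX000-00-0000002", 2, "0"),
    TrecQrelStub("2", "GX000-00-0000003", 0, "0"),
]
print(relevant_docs(sample))
print(relevant_docs(sample, min_relevance=2))
```

Passing `min_relevance=2` keeps only the "Highly Relevant" level, which is a much smaller set according to the counts in the table.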

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2004")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export gov2/trec-tb-2004 qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...
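The exported lines can be read back into typed records. The sketch below assumes one judgment per line, tab-separated, in the column order shown above and without a header row; `QrelRow` and `parse_qrel_line` are hypothetical names, and the sample line is made up.

```python
from typing import NamedTuple


class QrelRow(NamedTuple):
    """One exported judgment, in the column order shown above."""
    query_id: str
    doc_id: str
    relevance: int
    iteration: str


def parse_qrel_line(line: str) -> QrelRow:
    """Parse one tab-separated line into a QrelRow."""
    # Unpacking raises ValueError if the line does not have exactly 4 columns.
    query_id, doc_id, relevance, iteration = line.rstrip("\n").split("\t")
    return QrelRow(query_id, doc_id, int(relevance), iteration)


# Made-up line, used only to exercise the parser.
row = parse_qrel_line("1\tGX000-00-0000000\t2\t0\n")
print(row)
```

In practice the lines would come from the output of `ir_datasets export gov2/trec-tb-2004 qrels --format tsv`, for example redirected to a file and read line by line.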

You can find more details about the CLI here.

No example available for PyTerrier

\cite{Clarke2004TrecTerabyte}

Bibtex:

@inproceedings{Clarke2004TrecTerabyte, title={Overview of the TREC 2004 Terabyte Track}, author={Charles Clarke and Nick Craswell and Ian Soboroff}, booktitle={TREC}, year={2004} }

{
  "docs": {
    "count": 25205179,
    "fields": {
      "doc_id": {
        "max_len": 17,
        "common_prefix": "GX"
      }
    }
  },
  "queries": {
    "count": 50
  },
  "qrels": {
    "count": 58077,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "0": 47460,
          "1": 9327,
          "2": 1290
        }
      }
    }
  }
}

`"gov2/trec-tb-2005"`

The TREC Terabyte Track 2005 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

50 queries

Language: en

Query type:

TrecQuery: (namedtuple)

query_id: str
title: str
description: str
narrative: str

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2005")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>

You can find more details about the Python API here.

CLI

ir_datasets export gov2/trec-tb-2005 queries



[query_id]    [title]    [description]    [narrative]
...

You can find more details about the CLI here.

No example available for PyTerrier

docs

25M docs

Inherits docs from gov2

Language: en

Document type:

Gov2Doc: (namedtuple)

doc_id: str
url: str
http_headers: str
body: bytes
body_content_type: str

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2005")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>

You can find more details about the Python API here.

CLI

ir_datasets export gov2/trec-tb-2005 docs



[doc_id]    [url]    [http_headers]    [body]    [body_content_type]
...

You can find more details about the CLI here.

No example available for PyTerrier

qrels

45K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
0	Not Relevant	`35K`	77.0%
1	Relevant	`7.8K`	17.2%
2	Highly Relevant	`2.6K`	5.8%

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2005")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export gov2/trec-tb-2005 qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

\cite{Clarke2005TrecTerabyte}

Bibtex:

@inproceedings{Clarke2005TrecTerabyte, title={The TREC 2005 Terabyte Track}, author={Charles L. A. Clarke and Falk Scholer and Ian Soboroff}, booktitle={TREC}, year={2005} }

{
  "docs": {
    "count": 25205179,
    "fields": {
      "doc_id": {
        "max_len": 17,
        "common_prefix": "GX"
      }
    }
  },
  "queries": {
    "count": 50
  },
  "qrels": {
    "count": 45291,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "0": 34884,
          "1": 7772,
          "2": 2635
        }
      }
    }
  }
}

`"gov2/trec-tb-2005/efficiency"`

The TREC Terabyte Track 2005 efficiency ranking benchmark. Contains 50,000 queries from a search engine, including the 50 topics from gov2/trec-tb-2005. Only the 50 topics have judgments.
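Because only the 50 topics carry judgments, effectiveness evaluation on this subset first has to narrow the query stream to the judged queries. A minimal sketch, using local stand-ins for the documented `GenericQuery` and `TrecQrel` shapes; `judged_queries` is a hypothetical helper and the rows are made up. With the real dataset, `dataset.queries_iter()` and `dataset.qrels_iter()` would be passed instead.

```python
from typing import NamedTuple


class GenericQueryStub(NamedTuple):
    """Local stand-in mirroring the documented GenericQuery fields."""
    query_id: str
    text: str


class TrecQrelStub(NamedTuple):
    """Local stand-in mirroring the documented TrecQrel fields."""
    query_id: str
    doc_id: str
    relevance: int
    iteration: str


def judged_queries(queries, qrels):
    """Return the queries that have at least one judgment, in input order."""
    judged_ids = {qrel.query_id for qrel in qrels}
    return [query for query in queries if query.query_id in judged_ids]


# Made-up data, used only to exercise the helper.
queries = [
    GenericQueryStub("1", "first query"),
    GenericQueryStub("2", "second query"),
    GenericQueryStub("3", "third query"),
]
qrels = [
    TrecQrelStub("2", "GX000-00-0000000", 1, "0"),
    TrecQrelStub("2", "GX000-00-0000001", 0, "0"),
]
print(judged_queries(queries, qrels))
```

The unjudged queries remain useful for timing runs, which is the purpose of an efficiency task; they are only excluded when computing effectiveness metrics.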

50K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2005/efficiency")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export gov2/trec-tb-2005/efficiency queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

docs

25M docs

Inherits docs from gov2

Language: en

Document type:

Gov2Doc: (namedtuple)

doc_id: str
url: str
http_headers: str
body: bytes
body_content_type: str

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2005/efficiency")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>

You can find more details about the Python API here.

CLI

ir_datasets export gov2/trec-tb-2005/efficiency docs



[doc_id]    [url]    [http_headers]    [body]    [body_content_type]
...

You can find more details about the CLI here.

No example available for PyTerrier

qrels

45K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
0	Not Relevant	`35K`	77.0%
1	Relevant	`7.8K`	17.2%
2	Highly Relevant	`2.6K`	5.8%

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2005/efficiency")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export gov2/trec-tb-2005/efficiency qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

\cite{Clarke2005TrecTerabyte}

Bibtex:

@inproceedings{Clarke2005TrecTerabyte, title={The TREC 2005 Terabyte Track}, author={Charles L. A. Clarke and Falk Scholer and Ian Soboroff}, booktitle={TREC}, year={2005} }

{
  "docs": {
    "count": 25205179,
    "fields": {
      "doc_id": {
        "max_len": 17,
        "common_prefix": "GX"
      }
    }
  },
  "queries": {
    "count": 50000
  },
  "qrels": {
    "count": 45291,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "0": 34884,
          "1": 7772,
          "2": 2635
        }
      }
    }
  }
}

`"gov2/trec-tb-2005/named-page"`

The TREC Terabyte Track 2005 named page ranking benchmark. Contains 252 queries with titles that resemble bookmark labels. Relevance judgments include near-duplicate pages and other pages that may satisfy the bookmark label.
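Named-page finding is commonly scored by the rank of the first acceptable page, for example with reciprocal rank. This page does not prescribe a metric, so treat the following as an illustrative sketch. Because the judgments list several acceptable pages per query (near-duplicates included), the hypothetical helper `reciprocal_rank` takes a set of relevant IDs rather than a single target; the IDs below are made up.

```python
def reciprocal_rank(ranked_doc_ids, relevant_ids) -> float:
    """Return 1/rank of the first relevant document, or 0.0 if none is found."""
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


# Made-up ranking and judgments, used only to exercise the helper.
ranking = ["GX000-00-0000005", "GX000-00-0000001", "GX000-00-0000002"]
relevant = {"GX000-00-0000001", "GX000-00-0000002"}  # page plus a near-duplicate
print(reciprocal_rank(ranking, relevant))
```

Averaging this value over all queries gives mean reciprocal rank.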

252 queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2005/named-page")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export gov2/trec-tb-2005/named-page queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

docs

25M docs

Inherits docs from gov2

Language: en

Document type:

Gov2Doc: (namedtuple)

doc_id: str
url: str
http_headers: str
body: bytes
body_content_type: str

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2005/named-page")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>

You can find more details about the Python API here.

CLI

ir_datasets export gov2/trec-tb-2005/named-page docs



[doc_id]    [url]    [http_headers]    [body]    [body_content_type]
...

You can find more details about the CLI here.

No example available for PyTerrier

qrels

12K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
0	Not Relevant	`0`	0.0%
1	Relevant	`12K`	100.0%

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2005/named-page")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export gov2/trec-tb-2005/named-page qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

\cite{Clarke2005TrecTerabyte}

Bibtex:

@inproceedings{Clarke2005TrecTerabyte, title={The TREC 2005 Terabyte Track}, author={Charles L. A. Clarke and Falk Scholer and Ian Soboroff}, booktitle={TREC}, year={2005} }

{
  "docs": {
    "count": 25205179,
    "fields": {
      "doc_id": {
        "max_len": 17,
        "common_prefix": "GX"
      }
    }
  },
  "queries": {
    "count": 252
  },
  "qrels": {
    "count": 11729,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 11729
        }
      }
    }
  }
}

`"gov2/trec-tb-2006"`

The TREC Terabyte Track 2006 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

50 queries

Language: en

Query type:

TrecQuery: (namedtuple)

query_id: str
title: str
description: str
narrative: str

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>

You can find more details about the Python API here.

CLI

ir_datasets export gov2/trec-tb-2006 queries



[query_id]    [title]    [description]    [narrative]
...

You can find more details about the CLI here.

No example available for PyTerrier

docs

25M docs

Inherits docs from gov2

Language: en

Document type:

Gov2Doc: (namedtuple)

doc_id: str
url: str
http_headers: str
body: bytes
body_content_type: str

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>

You can find more details about the Python API here.

CLI

ir_datasets export gov2/trec-tb-2006 docs



[doc_id]    [url]    [http_headers]    [body]    [body_content_type]
...

You can find more details about the CLI here.

No example available for PyTerrier

qrels

32K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
0	Not Relevant	`26K`	81.6%
1	Relevant	`5.5K`	17.1%
2	Highly Relevant	`426`	1.3%

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export gov2/trec-tb-2006 qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

\cite{Buttcher2006TrecTerabyte}

Bibtex:

@inproceedings{Buttcher2006TrecTerabyte, title={The TREC 2006 Terabyte Track}, author={Stefan B\"uttcher and Charles L. A. Clarke and Ian Soboroff}, booktitle={TREC}, year={2006} }

{
  "docs": {
    "count": 25205179,
    "fields": {
      "doc_id": {
        "max_len": 17,
        "common_prefix": "GX"
      }
    }
  },
  "queries": {
    "count": 50
  },
  "qrels": {
    "count": 31984,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "0": 26091,
          "1": 5467,
          "2": 426
        }
      }
    }
  }
}

`"gov2/trec-tb-2006/efficiency"`

The TREC Terabyte Track 2006 efficiency ranking benchmark. Contains 100,000 queries from a search engine, including the 50 topics from gov2/trec-tb-2006. Only the 50 topics have judgments.

100K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006/efficiency")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export gov2/trec-tb-2006/efficiency queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

docs

25M docs

Inherits docs from gov2

Language: en

Document type:

Gov2Doc: (namedtuple)

doc_id: str
url: str
http_headers: str
body: bytes
body_content_type: str

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006/efficiency")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>

You can find more details about the Python API here.

CLI

ir_datasets export gov2/trec-tb-2006/efficiency docs



[doc_id]    [url]    [http_headers]    [body]    [body_content_type]
...

You can find more details about the CLI here.

No example available for PyTerrier

qrels

32K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
0	Not Relevant	`26K`	81.6%
1	Relevant	`5.5K`	17.1%
2	Highly Relevant	`426`	1.3%

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006/efficiency")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export gov2/trec-tb-2006/efficiency qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

\cite{Buttcher2006TrecTerabyte}

Bibtex:

@inproceedings{Buttcher2006TrecTerabyte, title={The TREC 2006 Terabyte Track}, author={Stefan B\"uttcher and Charles L. A. Clarke and Ian Soboroff}, booktitle={TREC}, year={2006} }

{
  "docs": {
    "count": 25205179,
    "fields": {
      "doc_id": {
        "max_len": 17,
        "common_prefix": "GX"
      }
    }
  },
  "queries": {
    "count": 100000
  },
  "qrels": {
    "count": 31984,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "0": 26091,
          "1": 5467,
          "2": 426
        }
      }
    }
  }
}

`"gov2/trec-tb-2006/efficiency/10k"`

A small query stream from gov2/trec-tb-2006/efficiency, containing 10,000 queries.

10K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006/efficiency/10k")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export gov2/trec-tb-2006/efficiency/10k queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

docs

25M docs

Inherits docs from gov2

Language: en

Document type:

Gov2Doc: (namedtuple)

doc_id: str
url: str
http_headers: str
body: bytes
body_content_type: str

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006/efficiency/10k")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>

You can find more details about the Python API here.

CLI

ir_datasets export gov2/trec-tb-2006/efficiency/10k docs



[doc_id]    [url]    [http_headers]    [body]    [body_content_type]
...

You can find more details about the CLI here.

No example available for PyTerrier

\cite{Buttcher2006TrecTerabyte}

Bibtex:

@inproceedings{Buttcher2006TrecTerabyte, title={The TREC 2006 Terabyte Track}, author={Stefan B\"uttcher and Charles L. A. Clarke and Ian Soboroff}, booktitle={TREC}, year={2006} }

{
  "docs": {
    "count": 25205179,
    "fields": {
      "doc_id": {
        "max_len": 17,
        "common_prefix": "GX"
      }
    }
  },
  "queries": {
    "count": 10000
  }
}

`"gov2/trec-tb-2006/efficiency/stream1"`

Stream 1 of gov2/trec-tb-2006/efficiency (25,000 queries).

25K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006/efficiency/stream1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export gov2/trec-tb-2006/efficiency/stream1 queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

docs

25M docs

Inherits docs from gov2

Language: en

Document type:

Gov2Doc: (namedtuple)

doc_id: str
url: str
http_headers: str
body: bytes
body_content_type: str

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006/efficiency/stream1")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>

You can find more details about the Python API here.

CLI

ir_datasets export gov2/trec-tb-2006/efficiency/stream1 docs



[doc_id]    [url]    [http_headers]    [body]    [body_content_type]
...

You can find more details about the CLI here.

No example available for PyTerrier

\cite{Buttcher2006TrecTerabyte}

Bibtex:

@inproceedings{Buttcher2006TrecTerabyte, title={The TREC 2006 Terabyte Track}, author={Stefan B\"uttcher and Charles L. A. Clarke and Ian Soboroff}, booktitle={TREC}, year={2006} }

{
  "docs": {
    "count": 25205179,
    "fields": {
      "doc_id": {
        "max_len": 17,
        "common_prefix": "GX"
      }
    }
  },
  "queries": {
    "count": 25000
  }
}

`"gov2/trec-tb-2006/efficiency/stream2"`

Stream 2 of gov2/trec-tb-2006/efficiency (25,000 queries).

25K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006/efficiency/stream2")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export gov2/trec-tb-2006/efficiency/stream2 queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

docs

25M docs

Inherits docs from gov2

Language: en

Document type:

Gov2Doc: (namedtuple)

doc_id: str
url: str
http_headers: str
body: bytes
body_content_type: str

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006/efficiency/stream2")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>

You can find more details about the Python API here.

CLI

ir_datasets export gov2/trec-tb-2006/efficiency/stream2 docs



[doc_id]    [url]    [http_headers]    [body]    [body_content_type]
...

You can find more details about the CLI here.

No example available for PyTerrier

\cite{Buttcher2006TrecTerabyte}

Bibtex:

@inproceedings{Buttcher2006TrecTerabyte, title={The TREC 2006 Terabyte Track}, author={Stefan B\"uttcher and Charles L. A. Clarke and Ian Soboroff}, booktitle={TREC}, year={2006} }

{
  "docs": {
    "count": 25205179,
    "fields": {
      "doc_id": {
        "max_len": 17,
        "common_prefix": "GX"
      }
    }
  },
  "queries": {
    "count": 25000
  }
}

`"gov2/trec-tb-2006/efficiency/stream3"`

Stream 3 of gov2/trec-tb-2006/efficiency (25,000 queries).

25K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006/efficiency/stream3")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export gov2/trec-tb-2006/efficiency/stream3 queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

docs

25M docs

Inherits docs from gov2

Language: en

Document type:

Gov2Doc: (namedtuple)

doc_id: str
url: str
http_headers: str
body: bytes
body_content_type: str

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006/efficiency/stream3")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>

You can find more details about the Python API here.

CLI

ir_datasets export gov2/trec-tb-2006/efficiency/stream3 docs



[doc_id]    [url]    [http_headers]    [body]    [body_content_type]
...

You can find more details about the CLI here.

No example available for PyTerrier

qrels

32K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
0	Not Relevant	`26K`	81.6%
1	Relevant	`5.5K`	17.1%
2	Highly Relevant	`426`	1.3%

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006/efficiency/stream3")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export gov2/trec-tb-2006/efficiency/stream3 qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

\cite{Buttcher2006TrecTerabyte}

Bibtex:

@inproceedings{Buttcher2006TrecTerabyte, title={The TREC 2006 Terabyte Track}, author={Stefan B\"uttcher and Charles L. A. Clarke and Ian Soboroff}, booktitle={TREC}, year={2006} }

{
  "docs": {
    "count": 25205179,
    "fields": {
      "doc_id": {
        "max_len": 17,
        "common_prefix": "GX"
      }
    }
  },
  "queries": {
    "count": 25000
  },
  "qrels": {
    "count": 31984,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "0": 26091,
          "1": 5467,
          "2": 426
        }
      }
    }
  }
}

`"gov2/trec-tb-2006/efficiency/stream4"`

Stream 4 of gov2/trec-tb-2006/efficiency (25,000 queries).

25K queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006/efficiency/stream4")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export gov2/trec-tb-2006/efficiency/stream4 queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

docs

25M docs

Inherits docs from gov2

Language: en

Document type:

Gov2Doc: (namedtuple)

doc_id: str
url: str
http_headers: str
body: bytes
body_content_type: str

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006/efficiency/stream4")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>

You can find more details about the Python API here.

CLI

ir_datasets export gov2/trec-tb-2006/efficiency/stream4 docs



[doc_id]    [url]    [http_headers]    [body]    [body_content_type]
...

You can find more details about the CLI here.

No example available for PyTerrier

\cite{Buttcher2006TrecTerabyte}

Bibtex:

@inproceedings{Buttcher2006TrecTerabyte, title={The TREC 2006 Terabyte Track}, author={Stefan B\"uttcher and Charles L. A. Clarke and Ian Soboroff}, booktitle={TREC}, year={2006} }

{
  "docs": {
    "count": 25205179,
    "fields": {
      "doc_id": {
        "max_len": 17,
        "common_prefix": "GX"
      }
    }
  },
  "queries": {
    "count": 25000
  }
}

`"gov2/trec-tb-2006/named-page"`

The TREC Terabyte Track 2006 named page ranking benchmark. Contains 181 queries with titles that resemble bookmark labels. Relevance judgments include near-duplicate pages and other pages that may satisfy the bookmark label.

181 queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006/named-page")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

You can find more details about the Python API here.

CLI

ir_datasets export gov2/trec-tb-2006/named-page queries



[query_id]    [text]
...

You can find more details about the CLI here.

No example available for PyTerrier

docs

25M docs

Inherits docs from gov2

Language: en

Document type:

Gov2Doc: (namedtuple)

doc_id: str
url: str
http_headers: str
body: bytes
body_content_type: str

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006/named-page")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>

You can find more details about the Python API here.

CLI

ir_datasets export gov2/trec-tb-2006/named-page docs



[doc_id]    [url]    [http_headers]    [body]    [body_content_type]
...

You can find more details about the CLI here.

No example available for PyTerrier

qrels

2.4K qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition	Count	%
0	Not Relevant	`1.6K`	65.8%
1	Relevant	`807`	34.2%

Examples:

import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006/named-page")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

You can find more details about the Python API here.

CLI

ir_datasets export gov2/trec-tb-2006/named-page qrels --format tsv



[query_id]    [doc_id]    [relevance]    [iteration]
...

You can find more details about the CLI here.

No example available for PyTerrier

\cite{Buttcher2006TrecTerabyte}

Bibtex:

@inproceedings{Buttcher2006TrecTerabyte, title={The TREC 2006 Terabyte Track}, author={Stefan B\"uttcher and Charles L. A. Clarke and Ian Soboroff}, booktitle={TREC}, year={2006} }