ir_datasets: GOV2
To use this dataset, you need a copy of GOV2, provided by the University of Glasgow.
Your organization may already have a copy. If so, you may only need to complete a new "Individual Agreement". Otherwise, your organization will need to file the "Organizational agreement" and pay a fee to UoG to get a copy. The data are provided on hard drives that are shipped to you.
Once you have the data, ir_datasets will need the GOV2_data directory.
ir_datasets expects the above directory to be copied/linked under ~/.ir_datasets/gov2/corpus.
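The linking step can be scripted. The following is a minimal sketch, not part of ir_datasets: `link_corpus` is a hypothetical helper, and both paths are supplied by the caller (the source is wherever the shipped drive is mounted, the target is the corpus location named above).

```python
from pathlib import Path


def link_corpus(source, target):
    """Symlink `source` (the GOV2_data directory) at `target`.

    Hypothetical helper: both paths are chosen by the caller.
    """
    source = Path(source).expanduser().resolve()
    target = Path(target).expanduser()
    if not source.is_dir():
        raise FileNotFoundError(f"GOV2_data directory not found: {source}")
    if target.exists() or target.is_symlink():
        return target  # leave an existing copy or link untouched
    target.parent.mkdir(parents=True, exist_ok=True)
    target.symlink_to(source, target_is_directory=True)
    return target
```

Calling it twice is harmless, because an existing target is returned unchanged.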
GOV2 web document collection. Used for the TREC Terabyte Track.
The dataset is obtained for a fee from UoG, and is shipped as a hard drive. More information is provided here.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>
You can find more details about the Python API here.
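As an illustration of working with the document fields, the sketch below tallies `body_content_type` over the first few documents. `content_type_counts` is a hypothetical helper, not an ir_datasets function; it accepts any iterable whose items expose a `body_content_type` attribute, such as the result of `dataset.docs_iter()`.

```python
from collections import Counter
from itertools import islice


def content_type_counts(docs, limit=1000):
    """Count body_content_type values over the first `limit` documents."""
    return Counter(doc.body_content_type for doc in islice(docs, limit))


# Usage, assuming `dataset` was loaded as in the example above:
# print(content_type_counts(dataset.docs_iter(), limit=1000))
```

The `limit` keeps the scan short, which matters for a collection of this size.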
ir_datasets export gov2 docs
[doc_id]    [url]    [http_headers]    [body]    [body_content_type]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.gov2')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
{
  "docs": {
    "count": 25205179,
    "fields": {
      "doc_id": {
        "max_len": 17,
        "common_prefix": "GX"
      }
    }
  }
}
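The `doc_id` metadata above can serve as a cheap sanity check before looking up a document. This is a sketch only: `looks_like_gov2_doc_id` is a hypothetical helper, and passing the check does not mean the ID exists in the corpus.

```python
# Values copied from the "doc_id" metadata above.
GOV2_DOC_ID_META = {"max_len": 17, "common_prefix": "GX"}


def looks_like_gov2_doc_id(doc_id, meta=GOV2_DOC_ID_META):
    """Return True if doc_id has the shared prefix and fits within max_len."""
    return doc_id.startswith(meta["common_prefix"]) and len(doc_id) <= meta["max_len"]
```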
TREC 2007 Million Query track.
Language: multiple/other/unknown
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-mq-2007")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export gov2/trec-mq-2007 queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.gov2.trec-mq-2007.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from gov2
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-mq-2007")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>
You can find more details about the Python API here.
ir_datasets export gov2/trec-mq-2007 docs
[doc_id]    [url]    [http_headers]    [body]    [body_content_type]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.gov2.trec-mq-2007')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 54K | 74.4% | 
| 1 | Relevant | 15K | 20.1% | 
| 2 | Highly Relevant | 4.0K | 5.5% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-mq-2007")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, method, iprob>
You can find more details about the Python API here.
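Evaluation code usually wants the judgments grouped by query. A minimal sketch of that grouping follows; `qrels_by_query` is a hypothetical helper that only relies on the `query_id`, `doc_id` and `relevance` fields shown above.

```python
from collections import defaultdict


def qrels_by_query(qrels):
    """Group qrels into {query_id: {doc_id: relevance}}."""
    grouped = defaultdict(dict)
    for qrel in qrels:
        grouped[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(grouped)


# Usage, assuming `dataset` was loaded as in the example above:
# judgments = qrels_by_query(dataset.qrels_iter())
```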
ir_datasets export gov2/trec-mq-2007 qrels --format tsv
[query_id]    [doc_id]    [relevance]    [method]    [iprob]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.gov2.trec-mq-2007.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{Allen2007MQ, title={Million Query Track 2007 Overview}, author={James Allan and Ben Carterette and Javed A. Aslam and Virgil Pavlu and Blagovest Dachev and Evangelos Kanoulas}, booktitle={TREC}, year={2007} }
{
  "docs": {
    "count": 25205179,
    "fields": {
      "doc_id": {
        "max_len": 17,
        "common_prefix": "GX"
      }
    }
  },
  "queries": {
    "count": 10000
  },
  "qrels": {
    "count": 73015,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "0": 54333,
          "1": 14689,
          "2": 3993
        }
      }
    }
  }
}
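The percentages in the relevance table for gov2/trec-mq-2007 follow directly from `counts_by_value`. The sketch below reproduces them; `relevance_shares` is a hypothetical helper name, and the counts are copied from the metadata above.

```python
# Counts copied from "counts_by_value" for gov2/trec-mq-2007.
counts_by_value = {"0": 54333, "1": 14689, "2": 3993}


def relevance_shares(counts):
    """Return each relevance level's share of all judgments, in percent."""
    total = sum(counts.values())
    return {level: round(100 * n / total, 1) for level, n in counts.items()}


print(sum(counts_by_value.values()))      # 73015, the qrels count
print(relevance_shares(counts_by_value))  # {'0': 74.4, '1': 20.1, '2': 5.5}
```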
TREC 2008 Million Query track.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-mq-2008")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export gov2/trec-mq-2008 queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.gov2.trec-mq-2008.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from gov2
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-mq-2008")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>
You can find more details about the Python API here.
ir_datasets export gov2/trec-mq-2008 docs
[doc_id]    [url]    [http_headers]    [body]    [body_content_type]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.gov2.trec-mq-2008')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 12K | 80.7% | 
| 1 | Relevant | 2.9K | 19.3% | 
| 2 | Highly Relevant | 0 | 0.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-mq-2008")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, method, iprob>
You can find more details about the Python API here.
ir_datasets export gov2/trec-mq-2008 qrels --format tsv
[query_id]    [doc_id]    [relevance]    [method]    [iprob]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.gov2.trec-mq-2008.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{Allen2008MQ, title={Million Query Track 2008 Overview}, author={James Allan and Javed A. Aslam and Ben Carterette and Virgil Pavlu and Evangelos Kanoulas}, booktitle={TREC}, year={2008} }
{
  "docs": {
    "count": 25205179,
    "fields": {
      "doc_id": {
        "max_len": 17,
        "common_prefix": "GX"
      }
    }
  },
  "queries": {
    "count": 10000
  },
  "qrels": {
    "count": 15211,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "0": 12279,
          "1": 2932
        }
      }
    }
  }
}
The TREC Terabyte Track 2004 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2004")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>
You can find more details about the Python API here.
ir_datasets export gov2/trec-tb-2004 queries
[query_id]    [title]    [description]    [narrative]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.gov2.trec-tb-2004.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from gov2
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2004")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>
You can find more details about the Python API here.
ir_datasets export gov2/trec-tb-2004 docs
[doc_id]    [url]    [http_headers]    [body]    [body_content_type]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.gov2.trec-tb-2004')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 47K | 81.7% | 
| 1 | Relevant | 9.3K | 16.1% | 
| 2 | Highly Relevant | 1.3K | 2.2% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2004")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export gov2/trec-tb-2004 qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.gov2.trec-tb-2004.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{Clarke2004TrecTerabyte, title={Overview of the TREC 2004 Terabyte Track}, author={Charles Clarke and Nick Craswell and Ian Soboroff}, booktitle={TREC}, year={2004} }
{
  "docs": {
    "count": 25205179,
    "fields": {
      "doc_id": {
        "max_len": 17,
        "common_prefix": "GX"
      }
    }
  },
  "queries": {
    "count": 50
  },
  "qrels": {
    "count": 58077,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "0": 47460,
          "1": 9327,
          "2": 1290
        }
      }
    }
  }
}
The TREC Terabyte Track 2005 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2005")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>
You can find more details about the Python API here.
ir_datasets export gov2/trec-tb-2005 queries
[query_id]    [title]    [description]    [narrative]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.gov2.trec-tb-2005.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from gov2
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2005")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>
You can find more details about the Python API here.
ir_datasets export gov2/trec-tb-2005 docs
[doc_id]    [url]    [http_headers]    [body]    [body_content_type]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.gov2.trec-tb-2005')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 35K | 77.0% | 
| 1 | Relevant | 7.8K | 17.2% | 
| 2 | Highly Relevant | 2.6K | 5.8% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2005")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export gov2/trec-tb-2005 qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.gov2.trec-tb-2005.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{Clarke2005TrecTerabyte, title={The TREC 2005 Terabyte Track}, author={Charles L. A. Clarke and Falk Scholer and Ian Soboroff}, booktitle={TREC}, year={2005} }
{
  "docs": {
    "count": 25205179,
    "fields": {
      "doc_id": {
        "max_len": 17,
        "common_prefix": "GX"
      }
    }
  },
  "queries": {
    "count": 50
  },
  "qrels": {
    "count": 45291,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "0": 34884,
          "1": 7772,
          "2": 2635
        }
      }
    }
  }
}
The TREC Terabyte Track 2005 efficiency ranking benchmark. Contains 50,000 queries from a search engine, including the 50 topics from gov2/trec-tb-2005. Only the 50 topics have judgments.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2005/efficiency")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export gov2/trec-tb-2005/efficiency queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.gov2.trec-tb-2005.efficiency.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from gov2
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2005/efficiency")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>
You can find more details about the Python API here.
ir_datasets export gov2/trec-tb-2005/efficiency docs
[doc_id]    [url]    [http_headers]    [body]    [body_content_type]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.gov2.trec-tb-2005.efficiency')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 35K | 77.0% | 
| 1 | Relevant | 7.8K | 17.2% | 
| 2 | Highly Relevant | 2.6K | 5.8% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2005/efficiency")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export gov2/trec-tb-2005/efficiency qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.gov2.trec-tb-2005.efficiency.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{Clarke2005TrecTerabyte, title={The TREC 2005 Terabyte Track}, author={Charles L. A. Clarke and Falk Scholer and Ian Soboroff}, booktitle={TREC}, year={2005} }
{
  "docs": {
    "count": 25205179,
    "fields": {
      "doc_id": {
        "max_len": 17,
        "common_prefix": "GX"
      }
    }
  },
  "queries": {
    "count": 50000
  },
  "qrels": {
    "count": 45291,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "0": 34884,
          "1": 7772,
          "2": 2635
        }
      }
    }
  }
}
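Because only the 50 ad-hoc topics in gov2/trec-tb-2005/efficiency have judgments, effectiveness evaluation needs the judged subset of the query stream. The sketch below separates the two groups; `split_by_judgment` is a hypothetical helper that relies only on the `query_id` field of queries and qrels.

```python
def split_by_judgment(queries, qrels):
    """Split queries into (judged, unjudged) by whether any qrel names them."""
    judged_ids = {qrel.query_id for qrel in qrels}
    judged, unjudged = [], []
    for query in queries:
        (judged if query.query_id in judged_ids else unjudged).append(query)
    return judged, unjudged


# Usage, assuming the dataset is available locally:
# import ir_datasets
# dataset = ir_datasets.load("gov2/trec-tb-2005/efficiency")
# judged, unjudged = split_by_judgment(dataset.queries_iter(), dataset.qrels_iter())
```

The judged queries are the ones to score for effectiveness, while the full stream remains useful for timing runs.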
The TREC Terabyte Track 2005 named page ranking benchmark. Contains 252 queries with titles that resemble bookmark labels. Relevance judgments include near-duplicate pages and other pages that may satisfy the bookmark label.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2005/named-page")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export gov2/trec-tb-2005/named-page queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.gov2.trec-tb-2005.named-page.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from gov2
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2005/named-page")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>
You can find more details about the Python API here.
ir_datasets export gov2/trec-tb-2005/named-page docs
[doc_id]    [url]    [http_headers]    [body]    [body_content_type]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.gov2.trec-tb-2005.named-page')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 0 | 0.0% | 
| 1 | Relevant | 12K | 100.0% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2005/named-page")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export gov2/trec-tb-2005/named-page qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.gov2.trec-tb-2005.named-page.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{Clarke2005TrecTerabyte, title={The TREC 2005 Terabyte Track}, author={Charles L. A. Clarke and Falk Scholer and Ian Soboroff}, booktitle={TREC}, year={2005} }
{
  "docs": {
    "count": 25205179,
    "fields": {
      "doc_id": {
        "max_len": 17,
        "common_prefix": "GX"
      }
    }
  },
  "queries": {
    "count": 252
  },
  "qrels": {
    "count": 11729,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 11729
        }
      }
    }
  }
}
The TREC Terabyte Track 2006 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description, narrative>
You can find more details about the Python API here.
ir_datasets export gov2/trec-tb-2006 queries
[query_id]    [title]    [description]    [narrative]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.gov2.trec-tb-2006.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from gov2
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>
You can find more details about the Python API here.
ir_datasets export gov2/trec-tb-2006 docs
[doc_id]    [url]    [http_headers]    [body]    [body_content_type]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.gov2.trec-tb-2006')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 26K | 81.6% | 
| 1 | Relevant | 5.5K | 17.1% | 
| 2 | Highly Relevant | 426 | 1.3% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export gov2/trec-tb-2006 qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.gov2.trec-tb-2006.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{Buttcher2006TrecTerabyte, title={The TREC 2006 Terabyte Track}, author={Stefan B\"uttcher and Charles L. A. Clarke and Ian Soboroff}, booktitle={TREC}, year={2006} }
{
  "docs": {
    "count": 25205179,
    "fields": {
      "doc_id": {
        "max_len": 17,
        "common_prefix": "GX"
      }
    }
  },
  "queries": {
    "count": 50
  },
  "qrels": {
    "count": 31984,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "0": 26091,
          "1": 5467,
          "2": 426
        }
      }
    }
  }
}
The TREC Terabyte Track 2006 efficiency ranking benchmark. Contains 100,000 queries from a search engine, including the 50 topics from gov2/trec-tb-2006. Only the 50 topics have judgments.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006/efficiency")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export gov2/trec-tb-2006/efficiency queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.gov2.trec-tb-2006.efficiency.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from gov2
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006/efficiency")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>
You can find more details about the Python API here.
ir_datasets export gov2/trec-tb-2006/efficiency docs
[doc_id]    [url]    [http_headers]    [body]    [body_content_type]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.gov2.trec-tb-2006.efficiency')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 26K | 81.6% | 
| 1 | Relevant | 5.5K | 17.1% | 
| 2 | Highly Relevant | 426 | 1.3% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006/efficiency")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export gov2/trec-tb-2006/efficiency qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.gov2.trec-tb-2006.efficiency.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # the assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
Bibtex:
@inproceedings{Buttcher2006TrecTerabyte, title={The TREC 2006 Terabyte Track}, author={Stefan B\"uttcher and Charles L. A. Clarke and Ian Soboroff}, booktitle={TREC}, year={2006} }
{
  "docs": {
    "count": 25205179,
    "fields": {
      "doc_id": {
        "max_len": 17,
        "common_prefix": "GX"
      }
    }
  },
  "queries": {
    "count": 100000
  },
  "qrels": {
    "count": 31984,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "0": 26091,
          "1": 5467,
          "2": 426
        }
      }
    }
  }
}
Small stream from gov2/trec-tb-2006/efficiency, with 10,000 queries.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006/efficiency/10k")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export gov2/trec-tb-2006/efficiency/10k queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.gov2.trec-tb-2006.efficiency.10k.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from gov2
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006/efficiency/10k")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>
You can find more details about the Python API here.
ir_datasets export gov2/trec-tb-2006/efficiency/10k docs
[doc_id]    [url]    [http_headers]    [body]    [body_content_type]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.gov2.trec-tb-2006.efficiency.10k')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@inproceedings{Buttcher2006TrecTerabyte, title={The TREC 2006 Terabyte Track}, author={Stefan B\"uttcher and Charles L. A. Clarke and Ian Soboroff}, booktitle={TREC}, year={2006} }
{
  "docs": {
    "count": 25205179,
    "fields": {
      "doc_id": {
        "max_len": 17,
        "common_prefix": "GX"
      }
    }
  },
  "queries": {
    "count": 10000
  }
}
Stream 1 of gov2/trec-tb-2006/efficiency (25,000 queries).
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006/efficiency/stream1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export gov2/trec-tb-2006/efficiency/stream1 queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.gov2.trec-tb-2006.efficiency.stream1.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from gov2
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006/efficiency/stream1")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>
You can find more details about the Python API here.
ir_datasets export gov2/trec-tb-2006/efficiency/stream1 docs
[doc_id]    [url]    [http_headers]    [body]    [body_content_type]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.gov2.trec-tb-2006.efficiency.stream1')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@inproceedings{Buttcher2006TrecTerabyte, title={The TREC 2006 Terabyte Track}, author={Stefan B\"uttcher and Charles L. A. Clarke and Ian Soboroff}, booktitle={TREC}, year={2006} }
{
  "docs": {
    "count": 25205179,
    "fields": {
      "doc_id": {
        "max_len": 17,
        "common_prefix": "GX"
      }
    }
  },
  "queries": {
    "count": 25000
  }
}
Stream 2 of gov2/trec-tb-2006/efficiency (25,000 queries).
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006/efficiency/stream2")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export gov2/trec-tb-2006/efficiency/stream2 queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.gov2.trec-tb-2006.efficiency.stream2.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from gov2
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006/efficiency/stream2")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>
You can find more details about the Python API here.
ir_datasets export gov2/trec-tb-2006/efficiency/stream2 docs
[doc_id]    [url]    [http_headers]    [body]    [body_content_type]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.gov2.trec-tb-2006.efficiency.stream2')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@inproceedings{Buttcher2006TrecTerabyte, title={The TREC 2006 Terabyte Track}, author={Stefan B\"uttcher and Charles L. A. Clarke and Ian Soboroff}, booktitle={TREC}, year={2006} }
{
  "docs": {
    "count": 25205179,
    "fields": {
      "doc_id": {
        "max_len": 17,
        "common_prefix": "GX"
      }
    }
  },
  "queries": {
    "count": 25000
  }
}
Stream 3 of gov2/trec-tb-2006/efficiency (25,000 queries).
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006/efficiency/stream3")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export gov2/trec-tb-2006/efficiency/stream3 queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.gov2.trec-tb-2006.efficiency.stream3.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from gov2
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006/efficiency/stream3")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>
You can find more details about the Python API here.
ir_datasets export gov2/trec-tb-2006/efficiency/stream3 docs
[doc_id]    [url]    [http_headers]    [body]    [body_content_type]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.gov2.trec-tb-2006.efficiency.stream3')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 26K | 81.6% | 
| 1 | Relevant | 5.5K | 17.1% | 
| 2 | Highly Relevant | 426 | 1.3% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006/efficiency/stream3")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export gov2/trec-tb-2006/efficiency/stream3 qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.gov2.trec-tb-2006.efficiency.stream3.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # The assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
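Evaluation code usually needs judgments grouped by query rather than as a flat stream. The sketch below shows one way to do that with the `query_id`, `doc_id`, and `relevance` fields of the qrel namedtuple shown above. The `Qrel` stand-in, the `group_qrels` helper, and the sample judgments are hypothetical, so the block runs without a copy of the data.

```python
from collections import defaultdict, namedtuple

# Stand-in for the tuples yielded by dataset.qrels_iter(); the field names
# match the namedtuple shown in the Python example above.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def group_qrels(qrels):
    """Build {query_id: {doc_id: relevance}} from an iterable of qrels."""
    by_query = defaultdict(dict)
    for qrel in qrels:
        by_query[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(by_query)


# Hypothetical judgments, made up for illustration only.
sample = [
    Qrel("q1", "GX-doc-a", 2, "0"),
    Qrel("q1", "GX-doc-b", 0, "0"),
    Qrel("q2", "GX-doc-a", 1, "0"),
]
grouped = group_qrels(sample)
print(grouped["q1"])  # {'GX-doc-a': 2, 'GX-doc-b': 0}
```

With the dataset available, the same helper could be called as `group_qrels(dataset.qrels_iter())`.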
Bibtex:
@inproceedings{Buttcher2006TrecTerabyte, title={The TREC 2006 Terabyte Track}, author={Stefan B\"uttcher and Charles L. A. Clarke and Ian Soboroff}, booktitle={TREC}, year={2006} }
{
  "docs": {
    "count": 25205179,
    "fields": {
      "doc_id": {
        "max_len": 17,
        "common_prefix": "GX"
      }
    }
  },
  "queries": {
    "count": 25000
  },
  "qrels": {
    "count": 31984,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "0": 26091,
          "1": 5467,
          "2": 426
        }
      }
    }
  }
}
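The percentages in the relevance levels table can be re-derived from the `counts_by_value` field of the metadata above. The following is a small sketch of that arithmetic; `relevance_shares` is a hypothetical helper name.

```python
# Counts copied from the "qrels" metadata above.
counts_by_value = {"0": 26091, "1": 5467, "2": 426}


def relevance_shares(counts):
    """Return {relevance_level: percentage of all judgments}, rounded to 0.1."""
    total = sum(counts.values())
    return {level: round(100 * n / total, 1) for level, n in counts.items()}


print(sum(counts_by_value.values()))      # 31984
print(relevance_shares(counts_by_value))  # {'0': 81.6, '1': 17.1, '2': 1.3}
```

The total matches the `count` of 31984 reported for the qrels, and the shares match the table.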
Stream 4 of gov2/trec-tb-2006/efficiency (25,000 queries).
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006/efficiency/stream4")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export gov2/trec-tb-2006/efficiency/stream4 queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.gov2.trec-tb-2006.efficiency.stream4.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from gov2
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006/efficiency/stream4")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>
You can find more details about the Python API here.
ir_datasets export gov2/trec-tb-2006/efficiency/stream4 docs
[doc_id]    [url]    [http_headers]    [body]    [body_content_type]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.gov2.trec-tb-2006.efficiency.stream4')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Bibtex:
@inproceedings{Buttcher2006TrecTerabyte, title={The TREC 2006 Terabyte Track}, author={Stefan B\"uttcher and Charles L. A. Clarke and Ian Soboroff}, booktitle={TREC}, year={2006} }
{
  "docs": {
    "count": 25205179,
    "fields": {
      "doc_id": {
        "max_len": 17,
        "common_prefix": "GX"
      }
    }
  },
  "queries": {
    "count": 25000
  }
}
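The efficiency streams share one naming scheme (`stream1` through `stream4`), so they can be processed in a loop rather than loaded one at a time. The sketch below assumes exactly the four streams listed on this page; both helper names are hypothetical, and only `ir_datasets.load` and `queries_iter` are taken from the examples above.

```python
def efficiency_stream_ids(n_streams: int = 4):
    """Return the ir_datasets IDs of the efficiency streams on this page."""
    return [
        f"gov2/trec-tb-2006/efficiency/stream{i}"
        for i in range(1, n_streams + 1)
    ]


def iter_all_stream_queries():
    """Yield (dataset_id, query) pairs across every stream.

    ir_datasets is imported lazily so the ID helper above works without it.
    """
    import ir_datasets

    for dataset_id in efficiency_stream_ids():
        for query in ir_datasets.load(dataset_id).queries_iter():
            yield dataset_id, query


print(efficiency_stream_ids())
```

Since each stream is listed with 25,000 queries, iterating all four would be expected to yield 100,000 pairs.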
The TREC Terabyte Track 2006 named page ranking benchmark. Contains 181 queries with titles that resemble bookmark labels. Relevance judgments include near-duplicate pages and other pages that may satisfy the bookmark label.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006/named-page")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export gov2/trec-tb-2006/named-page queries
[query_id]    [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
topics = prepare_dataset('irds.gov2.trec-tb-2006.named-page.queries')  # AdhocTopics
for topic in topics.iter():
    print(topic)  # An AdhocTopic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocTopics.
Inherits docs from gov2
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006/named-page")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, http_headers, body, body_content_type>
You can find more details about the Python API here.
ir_datasets export gov2/trec-tb-2006/named-page docs
[doc_id]    [url]    [http_headers]    [body]    [body_content_type]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
dataset = prepare_dataset('irds.gov2.trec-tb-2006.named-page')
for doc in dataset.iter_documents():
    print(doc)  # an AdhocDocumentStore
    break
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocDocumentStore.
Relevance levels
| Rel. | Definition | Count | % | 
|---|---|---|---|
| 0 | Not Relevant | 1.6K | 65.8% | 
| 1 | Relevant | 807 | 34.2% | 
Examples:
import ir_datasets
dataset = ir_datasets.load("gov2/trec-tb-2006/named-page")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export gov2/trec-tb-2006/named-page qrels --format tsv
[query_id]    [doc_id]    [relevance]    [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
from datamaestro import prepare_dataset
qrels = prepare_dataset('irds.gov2.trec-tb-2006.named-page.qrels')  # AdhocAssessments
for topic_qrels in qrels.iter():
    print(topic_qrels)  # The assessments for one topic
This example requires that experimaestro-ir be installed. For more information about the returned object, see the documentation about AdhocAssessments.
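Because the named-page judgments are binary and several pages may satisfy the same query, a reciprocal-rank style measure is one natural way to score a ranking against them. This page does not name a metric, so treat the following as an illustrative sketch only; `reciprocal_rank` and the sample data are hypothetical.

```python
def reciprocal_rank(ranked_doc_ids, judgments):
    """Return 1/rank of the first relevant document, or 0.0 if none is ranked.

    judgments maps doc_id -> relevance for a single query; any value > 0
    counts as relevant, matching the two levels in the table above.
    """
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if judgments.get(doc_id, 0) > 0:
            return 1.0 / rank
    return 0.0


# Hypothetical ranking and judgments, made up for illustration.
judgments = {"GX-doc-a": 0, "GX-doc-b": 1}
print(reciprocal_rank(["GX-doc-a", "GX-doc-b"], judgments))  # 0.5
```

Unjudged documents are treated as not relevant here, which is a design choice of this sketch rather than something the dataset prescribes.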
Bibtex:
@inproceedings{Buttcher2006TrecTerabyte, title={The TREC 2006 Terabyte Track}, author={Stefan B\"uttcher and Charles L. A. Clarke and Ian Soboroff}, booktitle={TREC}, year={2006} }
{
  "docs": {
    "count": 25205179,
    "fields": {
      "doc_id": {
        "max_len": 17,
        "common_prefix": "GX"
      }
    }
  },
  "queries": {
    "count": 181
  },
  "qrels": {
    "count": 2361,
    "fields": {
      "relevance": {
        "counts_by_value": {
          "1": 807,
          "0": 1554
        }
      }
    }
  }
}