ir_datasets: ClueWeb12
To use this dataset, you need a copy of ClueWeb 2012, provided by CMU.
Your organization may already have a copy. If this is the case, you may only need to complete a new "Individual Agreement". Otherwise, your organization will need to file the "Organizational agreement" and pay a fee to CMU to get a copy. The data are provided on hard drives that are shipped to you.
Once you have the data, ir_datasets needs access to the source directories. These directories should be copied or linked under ~/.ir_datasets/clueweb12/corpus.
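For example, if your copy of the data is mounted somewhere on disk, the directories can be symlinked into place with a short script along the following lines (a minimal sketch, not part of ir_datasets; the source path is a placeholder for wherever your drives are mounted):
import os
from pathlib import Path

source = Path("/mnt/clueweb12")  # placeholder: location of your ClueWeb12 copy
target = Path.home() / ".ir_datasets" / "clueweb12" / "corpus"
target.mkdir(parents=True, exist_ok=True)

for child in sorted(source.iterdir()):
    link = target / child.name
    if not link.exists():
        os.symlink(child, link)  # link each top-level directory into place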
ClueWeb 2012 web document collection. Contains 733M web pages.
The dataset is obtained from CMU for a fee and is shipped on hard drives. More information is provided here.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, date, http_headers, body, body_content_type>
You can find more details about the Python API here.
ir_datasets export clueweb12 docs
[doc_id] [url] [date] [http_headers] [body] [body_content_type]
...
You can find more details about the CLI here.
No example available for PyTerrier
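Iterating over all 733M pages is rarely necessary; single documents can also be fetched by ID via the docs_store interface described in the Python API docs (a brief sketch; the document ID below is a placeholder):
import ir_datasets
dataset = ir_datasets.load("clueweb12")
docstore = dataset.docs_store()
doc = docstore.get("clueweb12-0000tw-05-12114")  # placeholder doc_id
doc.url, doc.body_content_type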
The official "B13" subset of the ClueWeb12 dataset, containing 52M web pages.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/b13")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, date, http_headers, body, body_content_type>
You can find more details about the Python API here.
ir_datasets export clueweb12/b13 docs
[doc_id] [url] [date] [http_headers] [body] [body_content_type]
...
You can find more details about the CLI here.
No example available for PyTerrier
The CLEF eHealth 2016-17 IR dataset. Contains consumer health queries and judgments that include trustworthiness and understandability scores in addition to the usual relevance assessments.
This dataset contains the combined 2016 and 2017 relevance judgments, since the same queries were used in both years. The assessment year can be distinguished using the iteration field (2016 is iteration 0, 2017 is iteration 1); see the sketch after the qrels examples below.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/b13/clef-ehealth")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export clueweb12/b13/clef-ehealth queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Language: en
Note: Uses docs from clueweb12/b13
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/b13/clef-ehealth")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, date, http_headers, body, body_content_type>
You can find more details about the Python API here.
ir_datasets export clueweb12/b13/clef-ehealth docs
[doc_id] [url] [date] [http_headers] [body] [body_content_type]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition |
---|---|
0 | Not relevant |
1 | Somewhat relevant |
2 | Highly relevant |
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/b13/clef-ehealth")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, trustworthiness, understandability, iteration>
You can find more details about the Python API here.
ir_datasets export clueweb12/b13/clef-ehealth qrels --format tsv
[query_id] [doc_id] [relevance] [trustworthiness] [understandability] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
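Because the 2016 and 2017 assessments are combined, the iteration field can be used to split them back apart (a minimal sketch, assuming iteration values are the strings "0" and "1" as described above):
import ir_datasets
dataset = ir_datasets.load("clueweb12/b13/clef-ehealth")
qrels_2016 = [q for q in dataset.qrels_iter() if q.iteration == "0"]  # 2016 assessments
qrels_2017 = [q for q in dataset.qrels_iter() if q.iteration == "1"]  # 2017 assessments
len(qrels_2016), len(qrels_2017)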
The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to Czech. See clueweb12/b13/clef-ehealth for more details.
Language: cs
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/b13/clef-ehealth/cs")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export clueweb12/b13/clef-ehealth/cs queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
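The translated query sets are meant to be run against the same English document collection, so a natural step is to pair each translation with its English source by query ID (a sketch, assuming the translated variants reuse the query_ids of clueweb12/b13/clef-ehealth):
import ir_datasets
english = {q.query_id: q.text for q in ir_datasets.load("clueweb12/b13/clef-ehealth").queries_iter()}
czech = ir_datasets.load("clueweb12/b13/clef-ehealth/cs")
for query in czech.queries_iter():
    print(query.query_id, query.text, "|", english.get(query.query_id))  # Czech text alongside its English source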
Language: en
Note: Uses docs from clueweb12/b13
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/b13/clef-ehealth/cs")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, date, http_headers, body, body_content_type>
You can find more details about the Python API here.
ir_datasets export clueweb12/b13/clef-ehealth/cs docs
[doc_id] [url] [date] [http_headers] [body] [body_content_type]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition |
---|---|
0 | Not relevant |
1 | Somewhat relevant |
2 | Highly relevant |
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/b13/clef-ehealth/cs")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, trustworthiness, understandability, iteration>
You can find more details about the Python API here.
ir_datasets export clueweb12/b13/clef-ehealth/cs qrels --format tsv
[query_id] [doc_id] [relevance] [trustworthiness] [understandability] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to German. See clueweb12/b13/clef-ehealth for more details.
Language: de
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/b13/clef-ehealth/de")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export clueweb12/b13/clef-ehealth/de queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Language: en
Note: Uses docs from clueweb12/b13
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/b13/clef-ehealth/de")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, date, http_headers, body, body_content_type>
You can find more details about the Python API here.
ir_datasets export clueweb12/b13/clef-ehealth/de docs
[doc_id] [url] [date] [http_headers] [body] [body_content_type]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition |
---|---|
0 | Not relevant |
1 | Somewhat relevant |
2 | Highly relevant |
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/b13/clef-ehealth/de")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, trustworthiness, understandability, iteration>
You can find more details about the Python API here.
ir_datasets export clueweb12/b13/clef-ehealth/de qrels --format tsv
[query_id] [doc_id] [relevance] [trustworthiness] [understandability] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to French. See clueweb12/b13/clef-ehealth for more details.
Language: fr
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/b13/clef-ehealth/fr")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export clueweb12/b13/clef-ehealth/fr queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Language: en
Note: Uses docs from clueweb12/b13
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/b13/clef-ehealth/fr")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, date, http_headers, body, body_content_type>
You can find more details about the Python API here.
ir_datasets export clueweb12/b13/clef-ehealth/fr docs
[doc_id] [url] [date] [http_headers] [body] [body_content_type]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition |
---|---|
0 | Not relevant |
1 | Somewhat relevant |
2 | Highly relevant |
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/b13/clef-ehealth/fr")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, trustworthiness, understandability, iteration>
You can find more details about the Python API here.
ir_datasets export clueweb12/b13/clef-ehealth/fr qrels --format tsv
[query_id] [doc_id] [relevance] [trustworthiness] [understandability] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to Hungarian. See clueweb12/b13/clef-ehealth for more details.
Language: hu
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/b13/clef-ehealth/hu")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export clueweb12/b13/clef-ehealth/hu queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Language: en
Note: Uses docs from clueweb12/b13
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/b13/clef-ehealth/hu")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, date, http_headers, body, body_content_type>
You can find more details about the Python API here.
ir_datasets export clueweb12/b13/clef-ehealth/hu docs
[doc_id] [url] [date] [http_headers] [body] [body_content_type]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition |
---|---|
0 | Not relevant |
1 | Somewhat relevant |
2 | Highly relevant |
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/b13/clef-ehealth/hu")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, trustworthiness, understandability, iteration>
You can find more details about the Python API here.
ir_datasets export clueweb12/b13/clef-ehealth/hu qrels --format tsv
[query_id] [doc_id] [relevance] [trustworthiness] [understandability] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to Polish. See clueweb12/b13/clef-ehealth for more details.
Language: pl
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/b13/clef-ehealth/pl")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export clueweb12/b13/clef-ehealth/pl queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Language: en
Note: Uses docs from clueweb12/b13
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/b13/clef-ehealth/pl")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, date, http_headers, body, body_content_type>
You can find more details about the Python API here.
ir_datasets export clueweb12/b13/clef-ehealth/pl docs
[doc_id] [url] [date] [http_headers] [body] [body_content_type]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition |
---|---|
0 | Not relevant |
1 | Somewhat relevant |
2 | Highly relevant |
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/b13/clef-ehealth/pl")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, trustworthiness, understandability, iteration>
You can find more details about the Python API here.
ir_datasets export clueweb12/b13/clef-ehealth/pl qrels --format tsv
[query_id] [doc_id] [relevance] [trustworthiness] [understandability] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to Swedish. See clueweb12/b13/clef-ehealth for more details.
Language: sv
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/b13/clef-ehealth/sv")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export clueweb12/b13/clef-ehealth/sv queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Language: en
Note: Uses docs from clueweb12/b13
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/b13/clef-ehealth/sv")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, date, http_headers, body, body_content_type>
You can find more details about the Python API here.
ir_datasets export clueweb12/b13/clef-ehealth/sv docs
[doc_id] [url] [date] [http_headers] [body] [body_content_type]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition |
---|---|
0 | Not relevant |
1 | Somewhat relevant |
2 | Highly relevant |
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/b13/clef-ehealth/sv")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, trustworthiness, understandability, iteration>
You can find more details about the Python API here.
ir_datasets export clueweb12/b13/clef-ehealth/sv qrels --format tsv
[query_id] [doc_id] [relevance] [trustworthiness] [understandability] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
The NTCIR-13 We Want Web (WWW) 1 ad-hoc ranking benchmark. Contains 100 queries with deep relevance judgments (avg 255 per query). Judgments aggregated from two assessors. Note that the qrels contain additional judgments from the NTCIR-14 CENTRE track.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/b13/ntcir-www-1")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
You can find more details about the Python API here.
ir_datasets export clueweb12/b13/ntcir-www-1 queries
[query_id] [text]
...
You can find more details about the CLI here.
No example available for PyTerrier
Language: en
Note: Uses docs from clueweb12/b13
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/b13/ntcir-www-1")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, date, http_headers, body, body_content_type>
You can find more details about the Python API here.
ir_datasets export clueweb12/b13/ntcir-www-1 docs
[doc_id] [url] [date] [http_headers] [body] [body_content_type]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition |
---|---|
0 | Two annotators rated as non-relevant |
1 | One annotator rated as relevant, one as non-relevant |
2 | Two annotators rated as relevant, OR one rated as highly relevant and one as non-relevant |
3 | One annotator rated as highly relevant, one as relevant |
4 | Two annotators rated as highly relevant |
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/b13/ntcir-www-1")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export clueweb12/b13/ntcir-www-1 qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
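The "avg 255 judgments per query" figure quoted above can be checked directly from the qrels (a small sketch):
import ir_datasets
from collections import defaultdict

dataset = ir_datasets.load("clueweb12/b13/ntcir-www-1")
per_query = defaultdict(int)
for qrel in dataset.qrels_iter():
    per_query[qrel.query_id] += 1  # count judgments per topic
print(sum(per_query.values()) / len(per_query))  # roughly 255 on average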
The NTCIR-14 We Want Web (WWW) 2 ad-hoc ranking benchmark. Contains 80 queries with deep relevance judgments (avg 345 per query). Judgments aggregated from two assessors.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/b13/ntcir-www-2")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description>
You can find more details about the Python API here.
ir_datasets export clueweb12/b13/ntcir-www-2 queries
[query_id] [title] [description]
...
You can find more details about the CLI here.
No example available for PyTerrier
Language: en
Note: Uses docs from clueweb12/b13
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/b13/ntcir-www-2")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, date, http_headers, body, body_content_type>
You can find more details about the Python API here.
ir_datasets export clueweb12/b13/ntcir-www-2 docs
[doc_id] [url] [date] [http_headers] [body] [body_content_type]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition |
---|---|
0 | Two annotators rated as non-relevant |
1 | One annotator rated as relevant, one as non-relevant |
2 | Two annotators rated as relevant, OR one rated as highly relevant and one as non-relevant |
3 | One annotator rated as highly relevant, one as relevant |
4 | Two annotators rated as highly relevant |
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/b13/ntcir-www-2")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export clueweb12/b13/ntcir-www-2 qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
The NTCIR-15 We Want Web (WWW) 3 ad-hoc ranking benchmark. Contains 160 queries with deep relevance judgments (to be released). 80 of the queries are from clueweb12/b13/ntcir-www-2.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/b13/ntcir-www-3")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, description>
You can find more details about the Python API here.
ir_datasets export clueweb12/b13/ntcir-www-3 queries
[query_id] [title] [description]
...
You can find more details about the CLI here.
No example available for PyTerrier
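Since 80 of the 160 WWW-3 queries are reused from clueweb12/b13/ntcir-www-2, the overlap can be inspected by comparing query IDs (a sketch, assuming reused queries keep their original query_id):
import ir_datasets
www2_ids = {q.query_id for q in ir_datasets.load("clueweb12/b13/ntcir-www-2").queries_iter()}
www3_ids = {q.query_id for q in ir_datasets.load("clueweb12/b13/ntcir-www-3").queries_iter()}
print(len(www3_ids & www2_ids), "query IDs shared with WWW-2")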
Language: en
Note: Uses docs from clueweb12/b13
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/b13/ntcir-www-3")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, date, http_headers, body, body_content_type>
You can find more details about the Python API here.
ir_datasets export clueweb12/b13/ntcir-www-3 docs
[doc_id] [url] [date] [http_headers] [body] [body_content_type]
...
You can find more details about the CLI here.
No example available for PyTerrier
The TREC Medical Misinformation 2019 dataset.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/b13/trec-misinfo-2019")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, cochranedoi, description, narrative>
You can find more details about the Python API here.
ir_datasets export clueweb12/b13/trec-misinfo-2019 queries
[query_id] [title] [cochranedoi] [description] [narrative]
...
You can find more details about the CLI here.
No example available for PyTerrier
Language: en
Note: Uses docs from clueweb12/b13
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/b13/trec-misinfo-2019")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, date, http_headers, body, body_content_type>
You can find more details about the Python API here.
ir_datasets export clueweb12/b13/trec-misinfo-2019 docs
[doc_id] [url] [date] [http_headers] [body] [body_content_type]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition |
---|---|
0 | Not relevant |
1 | Relevant |
2 | Highly relevant |
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/b13/trec-misinfo-2019")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, effectiveness, credibility>
You can find more details about the Python API here.
ir_datasets export clueweb12/b13/trec-misinfo-2019 qrels --format tsv
[query_id] [doc_id] [relevance] [effectiveness] [credibility]
...
You can find more details about the CLI here.
No example available for PyTerrier
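Each judgment carries effectiveness and credibility labels alongside topical relevance, so documents can be filtered on several axes at once (a sketch using the field names listed above):
import ir_datasets
dataset = ir_datasets.load("clueweb12/b13/trec-misinfo-2019")
for qrel in dataset.qrels_iter():
    if qrel.relevance == 2:  # keep only highly relevant documents
        print(qrel.query_id, qrel.doc_id, qrel.effectiveness, qrel.credibility)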
The TREC Web Track 2013 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/trec-web-2013")
for query in dataset.queries_iter():
    query # namedtuple<query_id, query, description, type, subtopics>
You can find more details about the Python API here.
ir_datasets export clueweb12/trec-web-2013 queries
[query_id] [query] [description] [type] [subtopics]
...
You can find more details about the CLI here.
No example available for PyTerrier
Language: en
Note: Uses docs from clueweb12
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/trec-web-2013")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, date, http_headers, body, body_content_type>
You can find more details about the Python API here.
ir_datasets export clueweb12/trec-web-2013 docs
[doc_id] [url] [date] [http_headers] [body] [body_content_type]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition |
---|---|
-2 | Junk: This page does not appear to be useful for any reasonable purpose; it may be spam or junk |
0 | Non: The content of this page does not provide useful information on the topic, but may provide useful information on other topics, including other interpretations of the same query. |
1 | Rel: The content of this page provides some information on the topic, which may be minimal; the relevant information must be on that page, not just promising-looking anchor text pointing to a possibly useful page. |
2 | HRel: The content of this page provides substantial information on the topic. |
3 | Key: This page or site is dedicated to the topic; authoritative and comprehensive, it is worthy of being a top result in a web search engine. |
4 | Nav: This page represents a home page of an entity directly named by the query; the user may be searching for this specific page or site. |
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/trec-web-2013")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export clueweb12/trec-web-2013 qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier
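Because the scale includes a -2 "Junk" level, a common preprocessing step is to drop those judgments before computing graded metrics (a minimal sketch):
import ir_datasets
dataset = ir_datasets.load("clueweb12/trec-web-2013")
qrels = [q for q in dataset.qrels_iter() if q.relevance >= 0]  # discard Junk (-2) judgments
print(len(qrels), "non-junk judgments")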
The TREC Web Track 2014 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/trec-web-2014")
for query in dataset.queries_iter():
    query # namedtuple<query_id, query, description, type, subtopics>
You can find more details about the Python API here.
ir_datasets export clueweb12/trec-web-2014 queries
[query_id] [query] [description] [type] [subtopics]
...
You can find more details about the CLI here.
No example available for PyTerrier
Language: en
Note: Uses docs from clueweb12
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/trec-web-2014")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, date, http_headers, body, body_content_type>
You can find more details about the Python API here.
ir_datasets export clueweb12/trec-web-2014 docs
[doc_id] [url] [date] [http_headers] [body] [body_content_type]
...
You can find more details about the CLI here.
No example available for PyTerrier
Relevance levels
Rel. | Definition |
---|---|
-2 | Junk: This page does not appear to be useful for any reasonable purpose; it may be spam or junk |
0 | Non: The content of this page does not provide useful information on the topic, but may provide useful information on other topics, including other interpretations of the same query. |
1 | Rel: The content of this page provides some information on the topic, which may be minimal; the relevant information must be on that page, not just promising-looking anchor text pointing to a possibly useful page. |
2 | HRel: The content of this page provides substantial information on the topic. |
3 | Key: This page or site is dedicated to the topic; authoritative and comprehensive, it is worthy of being a top result in a web search engine. |
4 | Nav: This page represents a home page of an entity directly named by the query; the user may be searching for this specific page or site. |
Examples:
import ir_datasets
dataset = ir_datasets.load("clueweb12/trec-web-2014")
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
You can find more details about the Python API here.
ir_datasets export clueweb12/trec-web-2014 qrels --format tsv
[query_id] [doc_id] [relevance] [iteration]
...
You can find more details about the CLI here.
No example available for PyTerrier