`ir_datasets`: NFCorpus (NutritionFacts)

Index

nfcorpus
nfcorpus/dev
nfcorpus/dev/nontopic
nfcorpus/dev/video
nfcorpus/test
nfcorpus/test/nontopic
nfcorpus/test/video
nfcorpus/train
nfcorpus/train/nontopic
nfcorpus/train/video

`"nfcorpus"`

"NFCorpus is a full-text English retrieval data set for Medical Information Retrieval. It contains a total of 3,244 natural language queries (written in non-technical English, harvested from the NutritionFacts.org site) with 169,756 automatically extracted relevance judgments for 9,964 medical documents (written in a complex terminology-heavy language), mostly from PubMed."

Dataset website

Dataset paper

docs

Language: en

Document type:

NfCorpusDoc: (namedtuple)

doc_id: str
url: str
title: str
abstract: str

Example


import ir_datasets
dataset = ir_datasets.load('nfcorpus')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, title, abstract>
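
When indexing the corpus, the `title` and `abstract` fields are often combined into a single text. The following is a minimal sketch of that step. The `doc_text` helper and the stand-in `Doc` namedtuple (with invented values) are assumptions for illustration and are not part of `ir_datasets`; the stand-in only mirrors the field names listed above so the sketch runs without downloading the corpus.

```python
from collections import namedtuple

# Stand-in with the same fields as NfCorpusDoc (hypothetical, for illustration).
# The values used below are invented, not taken from the corpus.
Doc = namedtuple("Doc", ["doc_id", "url", "title", "abstract"])


def doc_text(doc):
    """Join the non-empty title and abstract into one indexable string."""
    parts = [doc.title.strip(), doc.abstract.strip()]
    return "\n".join(part for part in parts if part)


example = Doc("D-1", "http://example.org/d1", "A sample title", "A sample abstract.")
print(doc_text(example))
```

With the library installed, the same helper could be applied to each record yielded by `dataset.docs_iter()`.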

Citation

bibtex:
@inproceedings{boteva16full,
  title = "A Full-Text Learning to Rank Dataset for Medical Information Retrieval",
  author = "Vera Boteva and Demian Gholipour and Artem Sokolov and Stefan Riezler",
  booktitle = "Proceedings of the European Conference on Information Retrieval ({ECIR})",
  location = "Padova, Italy",
  publisher = "Springer",
  year = 2016
}

`"nfcorpus/dev"`

Official dev set. Queries include both a title and a combined "all" text field (titles, descriptions, topics, transcripts, and comments).

queries

Language: en

Query type:

NfCorpusQuery: (namedtuple)

query_id: str
title: str
all: str

Example


import ir_datasets
dataset = ir_datasets.load('nfcorpus/dev')
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, all>

docs

Language: en

Document type:

NfCorpusDoc: (namedtuple)

doc_id: str
url: str
title: str
abstract: str

Example


import ir_datasets
dataset = ir_datasets.load('nfcorpus/dev')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, title, abstract>

qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition
0	Marginally relevant, based on topic containment.
1	A link exists from the query to another query that directly links to the document.
2	A direct link from the query to the document in the cited sources section of a page.

Example


import ir_datasets
dataset = ir_datasets.load('nfcorpus/dev')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
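
Because the relevance levels are graded, it can help to group judgments by query and to drop the weakest level when only link-based judgments are wanted. The sketch below shows one way to do that. The `group_qrels` helper, its `min_relevance` parameter, and the stand-in `Qrel` namedtuple with invented judgments are assumptions for illustration; they are not part of `ir_datasets`.

```python
from collections import defaultdict, namedtuple

# Stand-in with the same fields as TrecQrel (hypothetical, for illustration).
# The judgments below are invented, not taken from the dataset.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def group_qrels(qrels, min_relevance=0):
    """Map each query_id to {doc_id: relevance}, keeping only judgments
    whose relevance is at or above min_relevance."""
    grouped = defaultdict(dict)
    for qrel in qrels:
        if qrel.relevance >= min_relevance:
            grouped[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(grouped)


sample = [
    Qrel("Q-1", "D-1", 2, "0"),
    Qrel("Q-1", "D-2", 0, "0"),
    Qrel("Q-2", "D-1", 1, "0"),
]
# Keep only levels 1 and 2, i.e. judgments derived from links.
link_based = group_qrels(sample, min_relevance=1)
print(link_based)
```

With the library installed, `dataset.qrels_iter()` could be passed in place of `sample`.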

`"nfcorpus/dev/nontopic"`

Official dev set, filtered to exclude queries from topic pages.

queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Example


import ir_datasets
dataset = ir_datasets.load('nfcorpus/dev/nontopic')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

docs

Language: en

Document type:

NfCorpusDoc: (namedtuple)

doc_id: str
url: str
title: str
abstract: str

Example


import ir_datasets
dataset = ir_datasets.load('nfcorpus/dev/nontopic')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, title, abstract>

qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition
0	Marginally relevant, based on topic containment.
1	A link exists from the query to another query that directly links to the document.
2	A direct link from the query to the document in the cited sources section of a page.

Example


import ir_datasets
dataset = ir_datasets.load('nfcorpus/dev/nontopic')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

`"nfcorpus/dev/video"`

Official dev set, filtered to only include queries from video pages.

queries

Language: en

Query type:

NfCorpusVideoQuery: (namedtuple)

query_id: str
title: str
desc: str

Example


import ir_datasets
dataset = ir_datasets.load('nfcorpus/dev/video')
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, desc>
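
Video queries carry two text fields, so a retrieval experiment has to choose which one to issue as the query. The following sketch builds a lookup from `query_id` to the chosen field. The `query_texts` helper and the stand-in `VideoQuery` namedtuple with invented values are assumptions for illustration and are not part of `ir_datasets`.

```python
from collections import namedtuple

# Stand-in with the same fields as NfCorpusVideoQuery (hypothetical).
# The values below are invented, not taken from the dataset.
VideoQuery = namedtuple("VideoQuery", ["query_id", "title", "desc"])


def query_texts(queries, field="title"):
    """Return {query_id: text}, reading the text from the named field."""
    return {query.query_id: getattr(query, field) for query in queries}


sample = [VideoQuery("Q-1", "Short title", "A longer description of the video.")]
print(query_texts(sample))                 # uses the title field
print(query_texts(sample, field="desc"))   # uses the description field
```

With the library installed, `dataset.queries_iter()` could be passed in place of `sample`.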

docs

Language: en

Document type:

NfCorpusDoc: (namedtuple)

doc_id: str
url: str
title: str
abstract: str

Example


import ir_datasets
dataset = ir_datasets.load('nfcorpus/dev/video')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, title, abstract>

qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition
0	Marginally relevant, based on topic containment.
1	A link exists from the query to another query that directly links to the document.
2	A direct link from the query to the document in the cited sources section of a page.

Example


import ir_datasets
dataset = ir_datasets.load('nfcorpus/dev/video')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

`"nfcorpus/test"`

Official test set. Queries include both a title and a combined "all" text field (titles, descriptions, topics, transcripts, and comments).

queries

Language: en

Query type:

NfCorpusQuery: (namedtuple)

query_id: str
title: str
all: str

Example


import ir_datasets
dataset = ir_datasets.load('nfcorpus/test')
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, all>

docs

Language: en

Document type:

NfCorpusDoc: (namedtuple)

doc_id: str
url: str
title: str
abstract: str

Example


import ir_datasets
dataset = ir_datasets.load('nfcorpus/test')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, title, abstract>

qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition
0	Marginally relevant, based on topic containment.
1	A link exists from the query to another query that directly links to the document.
2	A direct link from the query to the document in the cited sources section of a page.

Example


import ir_datasets
dataset = ir_datasets.load('nfcorpus/test')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
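
Before evaluating on a split, it can be useful to check how the judgments are distributed over the three relevance levels. The sketch below counts judgments per level. The `relevance_histogram` helper and the stand-in `Qrel` namedtuple with invented judgments are assumptions for illustration and are not part of `ir_datasets`.

```python
from collections import Counter, namedtuple

# Stand-in with the same fields as TrecQrel (hypothetical, for illustration).
# The judgments below are invented, not taken from the dataset.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def relevance_histogram(qrels):
    """Count how many judgments carry each relevance level."""
    return Counter(qrel.relevance for qrel in qrels)


sample = [
    Qrel("Q-1", "D-1", 2, "0"),
    Qrel("Q-1", "D-2", 0, "0"),
    Qrel("Q-2", "D-3", 0, "0"),
]
histogram = relevance_histogram(sample)
for level in sorted(histogram):
    print(level, histogram[level])
```

With the library installed, `dataset.qrels_iter()` could be passed in place of `sample`.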

`"nfcorpus/test/nontopic"`

Official test set, filtered to exclude queries from topic pages.

queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Example


import ir_datasets
dataset = ir_datasets.load('nfcorpus/test/nontopic')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

docs

Language: en

Document type:

NfCorpusDoc: (namedtuple)

doc_id: str
url: str
title: str
abstract: str

Example


import ir_datasets
dataset = ir_datasets.load('nfcorpus/test/nontopic')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, title, abstract>

qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition
0	Marginally relevant, based on topic containment.
1	A link exists from the query to another query that directly links to the document.
2	A direct link from the query to the document in the cited sources section of a page.

Example


import ir_datasets
dataset = ir_datasets.load('nfcorpus/test/nontopic')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

`"nfcorpus/test/video"`

Official test set, filtered to only include queries from video pages.

queries

Language: en

Query type:

NfCorpusVideoQuery: (namedtuple)

query_id: str
title: str
desc: str

Example


import ir_datasets
dataset = ir_datasets.load('nfcorpus/test/video')
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, desc>

docs

Language: en

Document type:

NfCorpusDoc: (namedtuple)

doc_id: str
url: str
title: str
abstract: str

Example


import ir_datasets
dataset = ir_datasets.load('nfcorpus/test/video')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, title, abstract>

qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition
0	Marginally relevant, based on topic containment.
1	A link exists from the query to another query that directly links to the document.
2	A direct link from the query to the document in the cited sources section of a page.

Example


import ir_datasets
dataset = ir_datasets.load('nfcorpus/test/video')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

`"nfcorpus/train"`

Official train set. Queries include both a title and a combined "all" text field (titles, descriptions, topics, transcripts, and comments).

queries

Language: en

Query type:

NfCorpusQuery: (namedtuple)

query_id: str
title: str
all: str

Example


import ir_datasets
dataset = ir_datasets.load('nfcorpus/train')
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, all>

docs

Language: en

Document type:

NfCorpusDoc: (namedtuple)

doc_id: str
url: str
title: str
abstract: str

Example


import ir_datasets
dataset = ir_datasets.load('nfcorpus/train')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, title, abstract>

qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition
0	Marginally relevant, based on topic containment.
1	A link exists from the query to another query that directly links to the document.
2	A direct link from the query to the document in the cited sources section of a page.

Example


import ir_datasets
dataset = ir_datasets.load('nfcorpus/train')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
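
For learning-to-rank experiments, the three record types of the train set are typically joined into labelled query-document pairs. The following is a minimal sketch of such a join, assuming the query `title` and the document `abstract` are the texts of interest. The `training_examples` helper and the stand-in namedtuples with invented values are assumptions for illustration and are not part of `ir_datasets`.

```python
from collections import namedtuple

# Stand-ins mirroring NfCorpusQuery, NfCorpusDoc and TrecQrel (hypothetical).
# All values below are invented, not taken from the dataset.
Query = namedtuple("Query", ["query_id", "title", "all"])
Doc = namedtuple("Doc", ["doc_id", "url", "title", "abstract"])
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def training_examples(queries, docs, qrels):
    """Yield (query title, document abstract, relevance) for every judgment
    whose query and document are both present."""
    query_by_id = {query.query_id: query for query in queries}
    doc_by_id = {doc.doc_id: doc for doc in docs}
    for qrel in qrels:
        query = query_by_id.get(qrel.query_id)
        doc = doc_by_id.get(qrel.doc_id)
        if query is None or doc is None:
            continue  # skip judgments that point at missing records
        yield query.title, doc.abstract, qrel.relevance


queries = [Query("Q-1", "sample query", "sample query with more text")]
docs = [Doc("D-1", "http://example.org/d1", "Sample doc", "Sample abstract.")]
qrels = [Qrel("Q-1", "D-1", 2, "0"), Qrel("Q-1", "D-9", 1, "0")]
examples = list(training_examples(queries, docs, qrels))
print(examples)
```

With the library installed, the three arguments could be `dataset.queries_iter()`, `dataset.docs_iter()`, and `dataset.qrels_iter()`. The sketch holds all documents in memory, which is a simplification.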

`"nfcorpus/train/nontopic"`

Official train set, filtered to exclude queries from topic pages.

queries

Language: en

Query type:

GenericQuery: (namedtuple)

query_id: str
text: str

Example


import ir_datasets
dataset = ir_datasets.load('nfcorpus/train/nontopic')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

docs

Language: en

Document type:

NfCorpusDoc: (namedtuple)

doc_id: str
url: str
title: str
abstract: str

Example


import ir_datasets
dataset = ir_datasets.load('nfcorpus/train/nontopic')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, title, abstract>

qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition
0	Marginally relevant, based on topic containment.
1	A link exists from the query to another query that directly links to the document.
2	A direct link from the query to the document in the cited sources section of a page.

Example


import ir_datasets
dataset = ir_datasets.load('nfcorpus/train/nontopic')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>

`"nfcorpus/train/video"`

Official train set, filtered to only include queries from video pages.

queries

Language: en

Query type:

NfCorpusVideoQuery: (namedtuple)

query_id: str
title: str
desc: str

Example


import ir_datasets
dataset = ir_datasets.load('nfcorpus/train/video')
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, desc>

docs

Language: en

Document type:

NfCorpusDoc: (namedtuple)

doc_id: str
url: str
title: str
abstract: str

Example


import ir_datasets
dataset = ir_datasets.load('nfcorpus/train/video')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, title, abstract>

qrels

Query relevance judgment type:

TrecQrel: (namedtuple)

query_id: str
doc_id: str
relevance: int
iteration: str

Relevance levels

Rel.	Definition
0	Marginally relevant, based on topic containment.
1	A link exists from the query to another query that directly links to the document.
2	A direct link from the query to the document in the cited sources section of a page.

Example


import ir_datasets
dataset = ir_datasets.load('nfcorpus/train/video')
for qrel in dataset.qrels_iter():
    qrel # namedtuple<query_id, doc_id, relevance, iteration>
