GitHub: datasets/nfcorpus.py

ir_datasets: NFCorpus (NutritionFacts)

Index
  1. nfcorpus
  2. nfcorpus/dev
  3. nfcorpus/dev/nontopic
  4. nfcorpus/dev/video
  5. nfcorpus/test
  6. nfcorpus/test/nontopic
  7. nfcorpus/test/video
  8. nfcorpus/train
  9. nfcorpus/train/nontopic
  10. nfcorpus/train/video

"nfcorpus"

"NFCorpus is a full-text English retrieval data set for Medical Information Retrieval. It contains a total of 3,244 natural language queries (written in non-technical English, harvested from the NutritionFacts.org site) with 169,756 automatically extracted relevance judgments for 9,964 medical documents (written in a complex terminology-heavy language), mostly from PubMed."

Provides: docs

Language: en

Document type:
NfCorpusDoc: (namedtuple)
  1. doc_id: str
  2. url: str
  3. title: str
  4. abstract: str

Example

import ir_datasets
dataset = ir_datasets.load('nfcorpus')
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, title, abstract>
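
Retrieval pipelines often need one text string per document. The sketch below joins the title and abstract fields listed above and previews the first few documents. The helper names (doc_text, peek_docs, demo) and the stand-in Doc type are hypothetical and not part of ir_datasets; only load() and docs_iter() come from the example above.

```python
from collections import namedtuple
from itertools import islice

# Stand-in mirroring the NfCorpusDoc fields listed above (hypothetical,
# only used so the helpers can be tried without downloading the corpus).
Doc = namedtuple("Doc", ["doc_id", "url", "title", "abstract"])


def doc_text(doc):
    """Join title and abstract into one string, dropping empty edges."""
    return f"{doc.title}\n{doc.abstract}".strip()


def peek_docs(dataset, n=3):
    """Return (doc_id, text) pairs for the first n documents."""
    return [(d.doc_id, doc_text(d)) for d in islice(dataset.docs_iter(), n)]


def demo():
    # Imported lazily: the first use of the dataset downloads the corpus.
    import ir_datasets
    dataset = ir_datasets.load("nfcorpus")
    for doc_id, text in peek_docs(dataset):
        print(doc_id, text[:80])
```

Calling demo() prints the first three document IDs with the start of their text. Whether to index title and abstract together or separately is a design choice this sketch does not settle.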

"nfcorpus/dev"

Official dev set. Each query includes both a title field and a combined "all" text field (titles, descriptions, topics, transcripts, and comments).

Provides: queries, docs, qrels

Language: en

Query type:
NfCorpusQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. all: str

Example

import ir_datasets
dataset = ir_datasets.load('nfcorpus/dev')
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, all>
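
Because each query carries both a short title and the longer combined "all" field, experiments usually pick one of the two. A minimal sketch of that choice follows. The names query_texts and demo and the stand-in Query type are hypothetical; only load() and queries_iter() come from the example above.

```python
from collections import namedtuple

# Stand-in with the NfCorpusQuery fields listed above (hypothetical,
# only used for trying the helper without downloading the data).
Query = namedtuple("Query", ["query_id", "title", "all"])


def query_texts(dataset, field="title"):
    """Map query_id to the chosen text field ("title" or "all")."""
    if field not in ("title", "all"):
        raise ValueError(f"unknown field: {field!r}")
    return {q.query_id: getattr(q, field) for q in dataset.queries_iter()}


def demo():
    # Imported lazily: the first use of the dataset downloads the data.
    import ir_datasets
    dataset = ir_datasets.load("nfcorpus/dev")
    short_queries = query_texts(dataset, "title")
    long_queries = query_texts(dataset, "all")
    print(len(short_queries), len(long_queries))
```

Both dictionaries share the same keys, so a run over title queries can be compared directly with a run over the "all" field.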

"nfcorpus/dev/nontopic"

Official dev set, filtered to exclude queries from topic pages.

Provides: queries, docs, qrels

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('nfcorpus/dev/nontopic')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
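
This subset also provides qrels, although their record type is not shown here. The sketch below groups judgments by query, assuming that dataset.qrels_iter() yields records with query_id, doc_id, and relevance fields (the TREC-style layout ir_datasets commonly uses). That field layout is an assumption, and the names group_qrels, demo, and Qrel are hypothetical.

```python
from collections import defaultdict, namedtuple

# Assumed qrel layout (not confirmed by this page); stand-in for offline use.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance"])


def group_qrels(dataset, min_relevance=1):
    """Return {query_id: {doc_id: relevance}} for judgments at or above the threshold."""
    grouped = defaultdict(dict)
    for qrel in dataset.qrels_iter():
        if qrel.relevance >= min_relevance:
            grouped[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(grouped)


def demo():
    # Imported lazily: the first use of the dataset downloads the data.
    import ir_datasets
    dataset = ir_datasets.load("nfcorpus/dev/nontopic")
    judged = group_qrels(dataset)
    for query in dataset.queries_iter():
        print(query.query_id, query.text, len(judged.get(query.query_id, {})))
```

The min_relevance threshold is illustrative; check the relevance levels actually used by the dataset before filtering.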

"nfcorpus/dev/video"

Official dev set, filtered to only include queries from video pages.

Provides: queries, docs, qrels

Language: en

Query type:
NfCorpusVideoQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. desc: str

Example

import ir_datasets
dataset = ir_datasets.load('nfcorpus/dev/video')
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, desc>

"nfcorpus/test"

Official test set. Each query includes both a title field and a combined "all" text field (titles, descriptions, topics, transcripts, and comments).

Provides: queries, docs, qrels

Language: en

Query type:
NfCorpusQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. all: str

Example

import ir_datasets
dataset = ir_datasets.load('nfcorpus/test')
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, all>

"nfcorpus/test/nontopic"

Official test set, filtered to exclude queries from topic pages.

Provides: queries, docs, qrels

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('nfcorpus/test/nontopic')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

"nfcorpus/test/video"

Official test set, filtered to only include queries from video pages.

Provides: queries, docs, qrels

Language: en

Query type:
NfCorpusVideoQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. desc: str

Example

import ir_datasets
dataset = ir_datasets.load('nfcorpus/test/video')
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, desc>

"nfcorpus/train"

Official train set. Each query includes both a title field and a combined "all" text field (titles, descriptions, topics, transcripts, and comments).

Provides: queries, docs, qrels

Language: en

Query type:
NfCorpusQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. all: str

Example

import ir_datasets
dataset = ir_datasets.load('nfcorpus/train')
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, all>

"nfcorpus/train/nontopic"

Official train set, filtered to exclude queries from topic pages.

Provides: queries, docs, qrels

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Example

import ir_datasets
dataset = ir_datasets.load('nfcorpus/train/nontopic')
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>

"nfcorpus/train/video"

Official train set, filtered to only include queries from video pages.

Provides: queries, docs, qrels

Language: en

Query type:
NfCorpusVideoQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. desc: str

Example

import ir_datasets
dataset = ir_datasets.load('nfcorpus/train/video')
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, desc>