GitHub: datasets/nfcorpus.py

ir_datasets: NFCorpus (NutritionFacts)

Index
  1. nfcorpus
  2. nfcorpus/dev
  3. nfcorpus/dev/nontopic
  4. nfcorpus/dev/video
  5. nfcorpus/test
  6. nfcorpus/test/nontopic
  7. nfcorpus/test/video
  8. nfcorpus/train
  9. nfcorpus/train/nontopic
  10. nfcorpus/train/video
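
The ten identifiers in the index follow a regular pattern: the base corpus, then each split on its own and with the `nontopic` and `video` filters. A minimal sketch that rebuilds the list, so that a loop could pass each ID to `ir_datasets.load`. The helper name `nfcorpus_ids` is hypothetical and not part of ir_datasets.

```python
def nfcorpus_ids():
    """Rebuild the dataset IDs from the index (hypothetical helper)."""
    ids = ["nfcorpus"]
    for split in ("dev", "test", "train"):
        # "" is the unfiltered split; the other two are the filtered views.
        for suffix in ("", "/nontopic", "/video"):
            ids.append(f"nfcorpus/{split}{suffix}")
    return ids


for dataset_id in nfcorpus_ids():
    print(dataset_id)
```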

"nfcorpus"

"NFCorpus is a full-text English retrieval data set for Medical Information Retrieval. It contains a total of 3,244 natural language queries (written in non-technical English, harvested from the NutritionFacts.org site) with 169,756 automatically extracted relevance judgments for 9,964 medical documents (written in a complex terminology-heavy language), mostly from PubMed."

docs | Citation | Metadata
5.4K docs

Language: en

Document type:
NfCorpusDoc: (namedtuple)
  1. doc_id: str
  2. url: str
  3. title: str
  4. abstract: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("nfcorpus")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, title, abstract>

More details about the Python API are available in the ir_datasets documentation.
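
Since each document carries a separate `title` and `abstract`, a retrieval pipeline usually needs one text string per document. The following is a minimal sketch of one way to build it. The `Doc` stand-in and the `index_text` helper are hypothetical and only mirror the `NfCorpusDoc` fields listed above so the example runs without downloading the corpus. The sample record is made up for illustration.

```python
from collections import namedtuple

# Stand-in with the same fields as NfCorpusDoc (assumption for illustration).
# Records yielded by dataset.docs_iter() could be passed in its place.
Doc = namedtuple("Doc", ["doc_id", "url", "title", "abstract"])


def index_text(doc):
    """Join title and abstract into one string, skipping empty fields."""
    parts = [doc.title.strip(), doc.abstract.strip()]
    return "\n".join(p for p in parts if p)


# Made-up record, not taken from the corpus.
example = Doc("D1", "http://example.org/d1", "Example title", "Example abstract.")
print(index_text(example))
```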


"nfcorpus/dev"

Official dev set. Queries include both a title field and a combined "all" text field (titles, descriptions, topics, transcripts, and comments).

queries | docs | qrels | Citation | Metadata
325 queries

Language: en

Query type:
NfCorpusQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. all: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("nfcorpus/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, all>
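
Because `NfCorpusQuery` exposes two text fields, downstream code has to pick one. A small sketch of a selector follows. The `Query` stand-in and the `query_text` helper are hypothetical and mirror only the field names listed above. The sample values are made up.

```python
from collections import namedtuple

# Stand-in with the same fields as NfCorpusQuery (assumption for illustration).
Query = namedtuple("Query", ["query_id", "title", "all"])


def query_text(query, field="title"):
    """Return the chosen text field: the short title or the combined "all" text."""
    if field not in ("title", "all"):
        raise ValueError(f"unknown query field: {field!r}")
    return getattr(query, field)


# Made-up record, not taken from the dataset.
q = Query("Q1", "short title", "short title plus longer text")
print(query_text(q))
print(query_text(q, field="all"))
```

Using `title` gives short keyword-like queries, while `all` gives long verbose ones, so the choice may change retrieval results noticeably.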



"nfcorpus/dev/nontopic"

Official dev set, filtered to exclude queries from topic pages.

queries | docs | qrels | Citation | Metadata
144 queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("nfcorpus/dev/nontopic")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
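
With only `query_id` and `text`, a `GenericQuery` stream maps naturally onto a lookup table. A minimal sketch, using a hypothetical `GenericQuery` stand-in and a hypothetical helper `queries_by_id`; the sample records are made up.

```python
from collections import namedtuple

# Stand-in with the same fields as GenericQuery (assumption for illustration).
GenericQuery = namedtuple("GenericQuery", ["query_id", "text"])


def queries_by_id(queries):
    """Build a query_id -> text dictionary from any iterable of queries."""
    return {q.query_id: q.text for q in queries}


# Made-up records; dataset.queries_iter() could be passed instead.
sample = [GenericQuery("Q1", "first query"), GenericQuery("Q2", "second query")]
lookup = queries_by_id(sample)
print(lookup["Q2"])
```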



"nfcorpus/dev/video"

Official dev set, filtered to only include queries from video pages.

queries | docs | qrels | Citation | Metadata
102 queries

Language: en

Query type:
NfCorpusVideoQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. desc: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("nfcorpus/dev/video")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, desc>
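
Since the records are namedtuples, they can be converted to dictionaries with `_asdict()` and exported, for example as JSON lines. The sketch below assumes a hypothetical `VideoQuery` stand-in with the `NfCorpusVideoQuery` fields and a hypothetical helper `to_jsonl`; the sample record is made up.

```python
import json
from collections import namedtuple

# Stand-in with the same fields as NfCorpusVideoQuery (assumption for illustration).
VideoQuery = namedtuple("VideoQuery", ["query_id", "title", "desc"])


def to_jsonl(queries):
    """Serialize each query as one JSON object per line."""
    return "\n".join(json.dumps(q._asdict()) for q in queries)


# Made-up record; dataset.queries_iter() could be passed instead.
sample = [VideoQuery("Q1", "a video title", "a longer description")]
print(to_jsonl(sample))
```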



"nfcorpus/test"

Official test set. Queries include both a title field and a combined "all" text field (titles, descriptions, topics, transcripts, and comments).

queries | docs | qrels | Citation | Metadata
325 queries

Language: en

Query type:
NfCorpusQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. all: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("nfcorpus/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, all>
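
For evaluation on the test set, the relevance judgments are typically grouped per query. This page does not list the qrels fields, so the sketch below assumes TREC-style records with `query_id`, `doc_id`, and `relevance`, as yielded by `dataset.qrels_iter()` in ir_datasets. The `Qrel` stand-in and the helper `group_qrels` are hypothetical, and the sample judgments are made up.

```python
from collections import defaultdict, namedtuple

# Assumed qrel shape; the fields are not documented on this page.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance"])


def group_qrels(qrels):
    """Group judgments as {query_id: {doc_id: relevance}}."""
    grouped = defaultdict(dict)
    for qrel in qrels:
        grouped[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(grouped)


# Made-up judgments, not taken from the dataset.
sample = [Qrel("Q1", "D1", 2), Qrel("Q1", "D2", 1), Qrel("Q2", "D1", 1)]
print(group_qrels(sample))
```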



"nfcorpus/test/nontopic"

Official test set, filtered to exclude queries from topic pages.

queries | docs | qrels | Citation | Metadata
144 queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("nfcorpus/test/nontopic")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>



"nfcorpus/test/video"

Official test set, filtered to only include queries from video pages.

queries | docs | qrels | Citation | Metadata
102 queries

Language: en

Query type:
NfCorpusVideoQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. desc: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("nfcorpus/test/video")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, desc>



"nfcorpus/train"

Official train set. Queries include both a title field and a combined "all" text field (titles, descriptions, topics, transcripts, and comments).

queries | docs | qrels | Citation | Metadata
2.6K queries

Language: en

Query type:
NfCorpusQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. all: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("nfcorpus/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, all>
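
Before training, it can be worth checking that the train queries do not leak into another split. A minimal sketch of such a check follows, comparing `query_id` values only. The `Query` stand-in and the helper `shared_query_ids` are hypothetical, and the sample records are made up; whether the real splits overlap is not stated on this page.

```python
from collections import namedtuple

# Stand-in with the same fields as NfCorpusQuery (assumption for illustration).
Query = namedtuple("Query", ["query_id", "title", "all"])


def shared_query_ids(queries_a, queries_b):
    """Return the sorted query IDs that occur in both iterables."""
    ids_a = {q.query_id for q in queries_a}
    ids_b = {q.query_id for q in queries_b}
    return sorted(ids_a & ids_b)


# Made-up records; two queries_iter() streams could be passed instead.
train = [Query("Q1", "t1", "a1"), Query("Q2", "t2", "a2")]
test = [Query("Q2", "t2", "a2"), Query("Q3", "t3", "a3")]
print(shared_query_ids(train, test))
```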



"nfcorpus/train/nontopic"

Official train set, filtered to exclude queries from topic pages.

queries | docs | qrels | Citation | Metadata
1.1K queries

Language: en

Query type:
GenericQuery: (namedtuple)
  1. query_id: str
  2. text: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("nfcorpus/train/nontopic")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>



"nfcorpus/train/video"

Official train set, filtered to only include queries from video pages.

queries | docs | qrels | Citation | Metadata
812 queries

Language: en

Query type:
NfCorpusVideoQuery: (namedtuple)
  1. query_id: str
  2. title: str
  3. desc: str

Examples:

Python API:
import ir_datasets
dataset = ir_datasets.load("nfcorpus/train/video")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, desc>
