ir_datasets: NFCorpus (NutritionFacts)
"NFCorpus is a full-text English retrieval data set for Medical Information Retrieval. It contains a total of 3,244 natural language queries (written in non-technical English, harvested from the NutritionFacts.org site) with 169,756 automatically extracted relevance judgments for 9,964 medical documents (written in a complex terminology-heavy language), mostly from PubMed."
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("nfcorpus")
for doc in dataset.docs_iter():
    doc # namedtuple<doc_id, url, title, abstract>
You can find more details about the Python API here.
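Because docs_iter() yields one record at a time, looking up a document by ID usually means building an index first. The sketch below shows one minimal way to do that. Note that build_doc_index is a hypothetical helper, not part of ir_datasets, and it assumes only that each record has a doc_id attribute, as in the namedtuple shown above. The sample records are made-up placeholders so the snippet runs without downloading the corpus.

```python
from collections import namedtuple

# Stand-in for the record type shown above; field names follow the
# namedtuple<doc_id, url, title, abstract> comment.
Doc = namedtuple("Doc", ["doc_id", "url", "title", "abstract"])


def build_doc_index(docs):
    """Map doc_id -> record for any iterable of records with a doc_id field.

    Hypothetical helper for illustration; not part of ir_datasets.
    """
    return {doc.doc_id: doc for doc in docs}


# Placeholder records (not real NFCorpus data).
sample_docs = [
    Doc("D1", "http://example.org/1", "First title", "First abstract."),
    Doc("D2", "http://example.org/2", "Second title", "Second abstract."),
]

doc_index = build_doc_index(sample_docs)
print(doc_index["D2"].title)
```

With the real corpus, the same helper could presumably be called as build_doc_index(dataset.docs_iter()), at the cost of holding every document in memory.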
Official dev set. Queries include both a title and a combined "all" text field (titles, descriptions, topics, transcripts, and comments).
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("nfcorpus/dev")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, all>
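The relevance judgments mentioned in the dataset description are typically consumed per query. The sketch below groups judgment records into a nested dictionary. It assumes each record exposes query_id, doc_id, and relevance attributes; this page does not show the judgment record type, so treat those field names as an assumption. group_qrels is a hypothetical helper rather than an ir_datasets function, and the sample records are made up.

```python
from collections import defaultdict, namedtuple

# Assumed record shape for a relevance judgment (not shown on this page).
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance"])


def group_qrels(qrels):
    """Return {query_id: {doc_id: relevance}} for an iterable of judgments.

    Hypothetical helper for illustration; not part of ir_datasets.
    """
    grouped = defaultdict(dict)
    for qrel in qrels:
        grouped[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(grouped)


# Placeholder judgments (not real NFCorpus data).
sample_qrels = [
    Qrel("Q1", "D1", 2),
    Qrel("Q1", "D2", 1),
    Qrel("Q2", "D1", 3),
]

judgments = group_qrels(sample_qrels)
print(judgments["Q1"])
```

If the split provides judgments, passing dataset.qrels_iter() in place of sample_qrels should work the same way, provided the records carry the assumed fields.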
Official dev set, filtered to exclude queries from topic pages.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("nfcorpus/dev/nontopic")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
Official dev set, filtered to only include queries from video pages.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("nfcorpus/dev/video")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, desc>
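The three dev variants expose different query fields (title and all, text, or title and desc), so code meant to accept any of them has to pick a field explicitly. The following is a minimal sketch that assumes the namedtuple comments above are accurate. query_text and its default preference order are illustrative choices, not part of ir_datasets, and the sample queries are placeholders.

```python
from collections import namedtuple

# Stand-ins for the three query record shapes listed above.
AllQuery = namedtuple("AllQuery", ["query_id", "title", "all"])
TextQuery = namedtuple("TextQuery", ["query_id", "text"])
VideoQuery = namedtuple("VideoQuery", ["query_id", "title", "desc"])


def query_text(query, preferred=("text", "title", "desc", "all")):
    """Return the first non-empty field of `query` named in `preferred`.

    Hypothetical helper for illustration; the preference order is an
    assumption, not something ir_datasets prescribes.
    """
    for field in preferred:
        value = getattr(query, field, None)
        if value:
            return value
    raise ValueError(f"no usable text field on query {query.query_id!r}")


# Placeholder queries (not real NFCorpus data).
plain = TextQuery("Q1", "short question")
combined = AllQuery("Q2", "A title", "A title plus longer combined text")
video = VideoQuery("Q3", "Video title", "Video description")

print(query_text(plain))
print(query_text(combined))
print(query_text(video, preferred=("desc", "title")))
```

Passing an explicit preferred tuple keeps the choice visible at the call site, which matters when comparing runs that use short titles against runs that use the longer combined text.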
Official test set. Queries include both a title and a combined "all" text field (titles, descriptions, topics, transcripts, and comments).
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("nfcorpus/test")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, all>
Official test set, filtered to exclude queries from topic pages.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("nfcorpus/test/nontopic")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
Official test set, filtered to only include queries from video pages.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("nfcorpus/test/video")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, desc>
Official train set. Queries include both a title and a combined "all" text field (titles, descriptions, topics, transcripts, and comments).
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("nfcorpus/train")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, all>
Official train set, filtered to exclude queries from topic pages.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("nfcorpus/train/nontopic")
for query in dataset.queries_iter():
    query # namedtuple<query_id, text>
Official train set, filtered to only include queries from video pages.
Language: en
Examples:
import ir_datasets
dataset = ir_datasets.load("nfcorpus/train/video")
for query in dataset.queries_iter():
    query # namedtuple<query_id, title, desc>