Datasets API

amid.amos.dataset.AMOS

AMOS provides 500 CT and 100 MRI scans collected from multi-center, multi-vendor, multi-modality, multi-phase, multi-disease patients, each with voxel-level annotations of 15 abdominal organs, providing challenging examples and test-bed for studying robust segmentation algorithms under diverse targets and scenarios. [1]

Parameters:

Name Type Description Default
root (str, Path)

Absolute path to the root containing the downloaded archive and meta. If not provided, the cache is assumed to be already populated.

required
Notes

Download link: https://zenodo.org/record/7262581/files/amos22.zip

Examples:

>>> # Download the archive and meta to any folder and pass the path to the constructor:
>>> ds = AMOS(root='/path/to/the/downloaded/files')
>>> print(len(ds.ids))
# 961
>>> print(ds.image(ds.ids[0]).shape)
# (768, 768, 90)
>>> print(ds.mask(ds.ids[26]).shape)
# (512, 512, 124)
References

.. [1] JI YUANFENG. (2022). Amos: A large-scale abdominal multi-organ benchmark for versatile medical image segmentation [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7262581

birth_date(id: str)

sex(id: str)

age(id: str)

manufacturer_model(id: str)

manufacturer(id: str)

acquisition_date(id: str)

site(id: str)

ids()

image(id: str)

Corresponding 3D image.

affine(id: str)

The 4x4 matrix that gives the image's spatial orientation.
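As a rough illustration of how such a matrix can be used, the sketch below recovers the voxel spacing as the Euclidean norms of the first three columns of the upper-left 3x3 block. It assumes the usual NIfTI-style voxel-to-world convention, which the text above does not state explicitly. `spacing_from_affine` is a hypothetical helper, not part of amid.

```python
import math


def spacing_from_affine(affine):
    """Voxel spacing along (x, y, z): column norms of the upper-left 3x3 block.

    Works with nested lists and with any array that supports affine[row][col].
    """
    return tuple(
        math.sqrt(sum(affine[row][col] ** 2 for row in range(3)))
        for col in range(3)
    )


# A made-up affine: 0.8 x 0.8 x 5.0 voxels, no rotation, arbitrary translation.
example_affine = [
    [0.8, 0.0, 0.0, -120.0],
    [0.0, 0.8, 0.0, -95.0],
    [0.0, 0.0, 5.0, 30.0],
    [0.0, 0.0, 0.0, 1.0],
]
print(spacing_from_affine(example_affine))
```

With a real dataset the call would look like `spacing_from_affine(ds.affine(ds.ids[0]))`, assuming `affine` returns a 4x4 array-like object.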

mask(id: str)

image_modality(id: str)

Returns image modality, CT or MRI.

amid.bimcv.BIMCVCovid19

BIMCV COVID-19 dataset, CT images only. It includes the BIMCV COVID-19 positive partition (https://arxiv.org/pdf/2006.01174.pdf) and the negative partition (https://ieee-dataport.org/open-access/bimcv-covid-19-large-annotated-dataset-rx-and-ct-images-covid-19-patients-0).

PCR tests are not used

GitHub page: https://github.com/BIMCV-CSUSP/BIMCV-COVID-19

Parameters:

Name Type Description Default
root (str, Path)

path to the folder containing the downloaded and parsed data.

required
Notes

The dataset has 2 partitions: bimcv-covid19-positive and bimcv-covid19-negative. Each partition is spread over 81 different tgz archives. The archives include metadata about subjects, sessions, and labels. There are also tgz archives with NIfTI images in nii.gz format.

Examples:

>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = BIMCVCovid19(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
# 201
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 163)
>>> print(ds.is_positive(ds.ids[0]))
# True
>>> print(ds.subject_info(ds.ids[80]))
# {'modality_dicom': "['CT']",
#  'body_parts': "[['chest']]",
#  'age': '[80]',
#  'gender': 'M'}
References

.. [1] Maria De La Iglesia Vayá, Jose Manuel Saborit, Joaquim Angel Montell, Antonio Pertusa, Aurelia Bustos, Miguel Cazorla, Joaquin Galant, Xavier Barber, Domingo Orozco-Beltrán, Francisco Garcia, Marisa Caparrós, Germán González, and Jose María Salinas. BIMCV COVID-19+: a large annotated dataset of RX and CT images from COVID-19 patients. arXiv:2006.01174, 2020.

.. [2] Maria de la Iglesia Vayá, Jose Manuel Saborit-Torres, Joaquim Angel Montell Serrano, Elena Oliver-Garcia, Antonio Pertusa, Aurelia Bustos, Miguel Cazorla, Joaquin Galant, Xavier Barber, Domingo Orozco-Beltrán, Francisco García-García, Marisa Caparrós, Germán González, Jose María Salinas, 2021. BIMCV COVID-19-: a large annotated dataset of RX and CT images from COVID-19 patients. Available at: https://dx.doi.org/10.21227/m4j2-ap59.

ids()

session_id(id: str)

subject_id(id: str)

is_positive(id: str)

image(id: str)

affine(id: str)

tags(id: str) -> dict

dicom tags

label_info(id: str) -> dict

labelCUIS, Report, LocalizationsCUIS etc.

subject_info(id: str) -> dict

modality_dicom (=[CT]), body_parts(=[chest]), age, gender

age(id: str) -> int

Minimum of (possibly two) available ages. The maximum difference between max and min age for every patient is 1 year.
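The `subject_info` example above shows the age stored as a string such as `'[80]'`. A minimal sketch of the "minimum of the available ages" rule follows, assuming the field is always a Python-literal list of numbers (an assumption; the library's own parsing may differ). `min_age` is a hypothetical helper.

```python
import ast


def min_age(raw_age):
    """Parse a string like '[80]' or '[63, 64]' and return the smallest age."""
    ages = ast.literal_eval(raw_age)  # safe parsing of a literal list
    return int(min(ages))


print(min_age('[80]'))      # 80
print(min_age('[63, 64]'))  # 63
```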

sex(id: str) -> str

session_info(id: str) -> dict

study_date, medical_evaluation

amid.brats2021.BraTS2021

Parameters:

Name Type Description Default
root (str, Path)

path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated.

required
version str

the data version. Only has effect if the library was installed from a cloned git repository.

required
Notes

Download links: 2021: http://www.braintumorsegmentation.org/

Examples:

>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = BraTS2021(root='/path/to/archives/root')
>>> print(len(ds.ids))
# 5880
>>> print(ds.image(ds.ids[0]).shape)
# (240, 240, 155)

ids()

fold(id: str)

mapping21_17(id: str) -> pd.DataFrame

subject_id(id: str) -> str

modality(id: str) -> str

image(id: str)

mask(id: str)

spacing(id: str)

Returns the voxel spacing along axes (x, y, z).

affine(id: str)

Returns 4x4 matrix that gives the image's spatial orientation.

amid.cc359.dataset.CC359

A (C)algary-(C)ampinas public brain MR dataset with (359) volumetric images [1]_.

There are three segmentation tasks on this dataset: (i) brain, (ii) hippocampus, and (iii) White-Matter (WM), Gray-Matter (GM), and Cerebrospinal Fluid (CSF) segmentation.

Parameters:

Name Type Description Default
root (str, Path)

path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated.

required
version str

the data version. Only has effect if the library was installed from a cloned git repository.

required
Notes

homepage (upd): https://sites.google.com/view/calgary-campinas-dataset/home

homepage (old): https://miclab.fee.unicamp.br/calgary-campinas-359-updated-05092017

To obtain MR images and brain and hippocampus segmentation masks, please, follow the instructions at the download platform: https://portal.conp.ca/dataset?id=projects/calgary-campinas.

Via the datalad library, you need to download three zip archives:
- Original.zip (the original MR images)
- hippocampus_staple.zip (Silver-standard hippocampus masks generated using STAPLE)
- Silver-standard-machine-learning.zip (Silver-standard brain masks generated using a machine learning method)

At the time of writing, the WM, GM, and CSF masks can be downloaded only from Google Drive: https://drive.google.com/drive/u/0/folders/0BxLb0NB2MjVZNm9JY1pWNFp6WTA?resourcekey=0-2sXMr8q-n2Nn6iY3PbBAdA.

From the Google Drive root above, manually download the folder CC359/Reconstructed/CC359/WM-GM-CSF/

So the root folder to pass to this dataset class should contain four objects:
- three zip archives (Original.zip, hippocampus_staple.zip, and Silver-standard-machine-learning.zip)
- one folder WM-GM-CSF with the original structure:
  <...>/WM-GM-CSF/CC0319_ge_3_45_M.nii.gz
  <...>/WM-GM-CSF/CC0324_ge_3_56_M.nii.gz
  ...
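One way to catch layout mistakes before constructing the dataset is to check for the four expected entries. The sketch below is only an illustration: `missing_cc359_entries` is a hypothetical helper (not part of amid), and it assumes the archive and folder names listed above must match exactly.

```python
import tempfile
from pathlib import Path

# Names taken from the notes above; exact matching is an assumption.
EXPECTED_ARCHIVES = (
    'Original.zip',
    'hippocampus_staple.zip',
    'Silver-standard-machine-learning.zip',
)
EXPECTED_FOLDER = 'WM-GM-CSF'


def missing_cc359_entries(root):
    """Return the expected archives/folder that are absent from `root`."""
    root = Path(root)
    missing = [name for name in EXPECTED_ARCHIVES if not (root / name).is_file()]
    if not (root / EXPECTED_FOLDER).is_dir():
        missing.append(EXPECTED_FOLDER)
    return missing


# Demo on a throwaway directory that holds only one of the four entries.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / 'Original.zip').touch()
    demo_missing = missing_cc359_entries(tmp)
    print(demo_missing)
```

An empty list means the folder looks complete and can be passed as `root`.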

Examples:

>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> cc359 = CC359(root='/path/to/downloaded/data/folder/')
>>> print(len(cc359.ids))
# 359
>>> print(cc359.image(cc359.ids[0]).shape)
# (171, 256, 256)
>>> print(cc359.wm_gm_csf(cc359.ids[80]).shape)
# (180, 240, 240)
References

.. [1] Souza, Roberto, et al. "An open, multi-vendor, multi-field-strength brain MR dataset and analysis of publicly available skull stripping methods agreement." NeuroImage 170 (2018): 482-494. https://www.sciencedirect.com/science/article/pii/S1053811917306687

ids()

vendor(id: str)

field(id: str)

age(id: str)

sex(id: str)

image(id: str)

affine(id: str)

The 4x4 matrix that gives the image's spatial orientation.

voxel_spacing(id: str)

spacing(id: str)

Returns voxel spacing along axes (x, y, z).

brain(id: str)

hippocampus(id: str)

wm_gm_csf(id: str)

amid.cl_detection.CLDetection2023

The data for the "Cephalometric Landmark Detection in Lateral X-ray Images" Challenge, held with the MICCAI-2023 conference.

Notes

The data can only be obtained by contacting the organizers by email. See the challenge home page for details.

Parameters:

Name Type Description Default
root (str, Path)

path to the folder containing the raw downloaded and unarchived data. If not provided, the cache is assumed to be already populated.

required
version str

the data version. Only has effect if the library was installed from a cloned git repository.

required

Examples:

>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = CLDetection2023(root='/path/to/data/root/folder')
>>> print(len(ds.ids))
# 400
>>> print(ds.image(ds.ids[0]).shape)
# (2400, 1935)

ids()

image(id: str)

points(id: str)

spacing(id: str)

amid.crlm.CRLM

Parameters:

Name Type Description Default
root (str, Path)

path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated.

required
version str

the data version. Only has effect if the library was installed from a cloned git repository.

required
Notes

Download links: https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=89096268#89096268b2cc35fce0664a2b875b5ec675ba9446

This collection consists of DICOM images and DICOM Segmentation Objects (DSOs) for 197 patients with Colorectal Liver Metastases (CRLM). Comprised of Original DICOM CTs and Segmentations for each subject. The segmentations include 'Liver', 'Liver_Remnant' (liver that will remain after surgery based on a preoperative CT plan), 'Hepatic' and 'Portal' veins, and 'Tumor_x', where 'x' denotes the various tumor occurrences in the case

Examples:

>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = CRLM(root='/path/to/archives/root')
>>> print(len(ds.ids))
# 197
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 52)

ids()

image(id: str)

mask(id: str) -> Dict[str, np.ndarray]

Returns dict: {'liver': ..., 'hepatic': ..., 'tumor_x': ...}

spacing(id: str)

Returns the voxel spacing along axes (x, y, z).

slice_locations(id: str)

affine(id: str)

Returns 4x4 matrix that gives the image's spatial orientation.

amid.ct_ich.CT_ICH

(C)omputed (T)omography Images for (I)ntracranial (H)emorrhage Detection and (S)egmentation.

This dataset contains 75 head CT scans including 36 scans for patients diagnosed with intracranial hemorrhage with the following types: Intraventricular, Intraparenchymal, Subarachnoid, Epidural and Subdural.

Parameters:

Name Type Description Default
root (str, Path)

path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated.

required
version str

the data version. Only has effect if the library was installed from a cloned git repository.

required
Notes

Data can be downloaded here: https://physionet.org/content/ct-ich/1.3.1/. Then, the folder with raw downloaded data should contain folders ct_scans and masks along with other files.

Examples:

>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = CT_ICH(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
# 75
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 39)
>>> print(ds.mask(ds.ids[0]).shape)
# (512, 512, 39)

ids()

image(id: str)

mask(id: str)

affine(id: str)

The 4x4 matrix that gives the image's spatial orientation.

voxel_spacing(id: str)

spacing(id: str)

Returns voxel spacing along axes (x, y, z).

age(id: str) -> float

sex(id: str) -> str

intraventricular_hemorrhage(id: str)

Returns True if hemorrhage exists and its type is intraventricular.

intraparenchymal_hemorrhage(id: str)

Returns True if hemorrhage was diagnosed and its type is intraparenchymal.

subarachnoid_hemorrhage(id: str)

Returns True if hemorrhage was diagnosed and its type is subarachnoid.

epidural_hemorrhage(id: str)

Returns True if hemorrhage was diagnosed and its type is epidural.

subdural_hemorrhage(id: str)

Returns True if hemorrhage was diagnosed and its type is subdural.
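The five per-type flags can be collected into a single list per scan, which is handy for building a binary "any hemorrhage" label. This is a sketch under assumptions: the helper names are hypothetical, and the stand-in object below only mimics the documented method names so that the snippet runs without the data.

```python
from types import SimpleNamespace

# Method names as documented for CT_ICH.
HEMORRHAGE_FIELDS = (
    'intraventricular_hemorrhage',
    'intraparenchymal_hemorrhage',
    'subarachnoid_hemorrhage',
    'epidural_hemorrhage',
    'subdural_hemorrhage',
)


def hemorrhage_types(ds, scan_id):
    """Names of the hemorrhage types flagged as True for `scan_id`."""
    return [name for name in HEMORRHAGE_FIELDS if getattr(ds, name)(scan_id)]


def has_hemorrhage(ds, scan_id):
    """True if at least one hemorrhage type is flagged."""
    return bool(hemorrhage_types(ds, scan_id))


# Stand-in for a real dataset object (illustration only).
fake_ds = SimpleNamespace(
    intraventricular_hemorrhage=lambda scan_id: False,
    intraparenchymal_hemorrhage=lambda scan_id: True,
    subarachnoid_hemorrhage=lambda scan_id: False,
    epidural_hemorrhage=lambda scan_id: False,
    subdural_hemorrhage=lambda scan_id: True,
)
print(hemorrhage_types(fake_ds, 'any-id'))
# ['intraparenchymal_hemorrhage', 'subdural_hemorrhage']
```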

fracture(id: str)

Returns True if skull fracture was diagnosed.

notes(id: str)

Returns special notes if they exist.

hemorrhage_diagnosis_raw_metadata(id: str)

amid.crossmoda.CrossMoDA

Parameters:

Name Type Description Default
root (str, Path)

path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated.

required
version str

the data version. Only has effect if the library was installed from a cloned git repository.

required
Notes

Download links: 2021 & 2022: https://zenodo.org/record/6504722#.YsgwnNJByV4

Examples:

>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = CrossMoDA(root='/path/to/archives/root')
>>> print(len(ds.ids))
# 484
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 214)

ids()

train_source_df(id: str)

image(id: str) -> Union[np.ndarray, None]

pixel_spacing(id: str)

spacing(id: str)

Returns pixel spacing along axes (x, y, z)

affine(id: str)

The 4x4 matrix that gives the image's spatial orientation

split(id: str) -> str

The split in which this entry is contained: training_source, training_target, validation

year(id: str) -> int

The year in which this entry was published: 2021 or 2022

masks(id: str) -> Union[np.ndarray, None]

Combined mask of schwannoma and cochlea (1 and 2 respectively)

koos_grade(id: str)

VS Tumour characteristic according to Koos grading scale: [1..4] or (-1 - post operative)

amid.deeplesion.DeepLesion

DeepLesion is composed of 33,688 bookmarked radiology images from 10,825 studies of 4,477 unique patients. For every bookmarked image, a bounding box is created to cover the target lesion based on its measured diameters [1].

Parameters:

Name Type Description Default
root (str, Path)

path to the folder containing DL_info.csv file and a subfolder Images_nifti with 20094 nii.gz files.

required
Notes

Dataset is available at https://nihcc.app.box.com/v/DeepLesion

To download the data, we recommend using batch_download_zips.py, a Python script provided by the authors. Once you have downloaded the data and unarchived all 56 zip archives, run DL_save_nifti.py, also provided by the authors, to convert the 2D PNGs into 20094 nii.gz files.

Examples:

>>> ds = DeepLesion(root='/path/to/folder')
>>> print(len(ds.ids))
# 20094

References

.. [1] Yan, Ke, Xiaosong Wang, Le Lu, and Ronald M. Summers. "Deeplesion: Automated deep mining, categorization and detection of significant radiology image findings using large-scale clinical lesion annotations." arXiv preprint arXiv:1710.01766 (2017).

ids()

patient_id(id: str)

study_id(id: str)

series_id(id: str)

sex(id: str)

age(id: str)

Patient Age might be different for different studies (dataset contains longitudinal records).

ct_window(id: str)

CT window extracted from DICOMs. Note that these are the min-max values for windowing, not width-level.
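For readers used to the width-level convention, the conversion from a min-max window is simple arithmetic. The sketch below assumes `ct_window` yields a `(min, max)` pair with `max > min`; both helper names are hypothetical.

```python
def window_to_level_width(window_min, window_max):
    """Convert a min-max window into the (level, width) convention."""
    level = (window_min + window_max) / 2
    width = window_max - window_min
    return level, width


def apply_window(value, window_min, window_max):
    """Clip one intensity value to the window and rescale it to [0, 1]."""
    clipped = min(max(value, window_min), window_max)
    return (clipped - window_min) / (window_max - window_min)


# Made-up window values, for illustration only.
print(window_to_level_width(-175, 275))  # (50.0, 450)
print(apply_window(50, -175, 275))       # 0.5
```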

affine(id: str)

spacing(id: str)

image(id: str)

Some 3D volumes are stored as separate subvolumes, e.g. ds.ids[15000] and ds.ids[15001].

train_val_test(id: str)

Authors' defined randomly generated patient-level data split, train=1, validation=2, test=3, 70/15/15 ratio.
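The numeric codes can be turned into named groups of ids with a small helper. This is a sketch: `group_by_split` is hypothetical, and it takes the ids and a lookup callable as arguments, so it does not depend on how the dataset object is built.

```python
from collections import defaultdict

# Codes as documented: train=1, validation=2, test=3.
SPLIT_NAMES = {1: 'train', 2: 'validation', 3: 'test'}


def group_by_split(ids, split_of):
    """Group ids by split name; `split_of(id)` must return 1, 2 or 3."""
    groups = defaultdict(list)
    for entry_id in ids:
        groups[SPLIT_NAMES[int(split_of(entry_id))]].append(entry_id)
    return dict(groups)


# Made-up ids and codes standing in for real entries.
demo_codes = {'a': 1, 'b': 3, 'c': 1, 'd': 2}
demo_groups = group_by_split(demo_codes, demo_codes.get)
print(demo_groups)
```

With a real dataset the call would presumably be `group_by_split(ds.ids, ds.train_val_test)`.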

lesion_position(id: str)

Lesion measurements as they appear in DL_info.csv; for details see https://nihcc.app.box.com/v/DeepLesion/file/306056134060 .

mask(id: str)

Mask of the provided bounding boxes. Note that the bounding-box annotation is very coarse: it covers only a single 2D slice.

amid.egd.EGD

The Erasmus Glioma Database (EGD): Structural MRI scans, WHO 2016 subtypes, and segmentations of 774 patients with glioma [1]_.

Parameters:

Name Type Description Default
root (str, Path)

path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated.

required
version str

the data version. Only has effect if the library was installed from a cloned git repository.

required
Notes

Access to the dataset can be requested at the XNAT portal [https://xnat.bmia.nl/data/archive/projects/egd].

To download the data in a compatible structure, we recommend using the egd-downloader script [https://zenodo.org/record/4761089#.YtZpLtJBxhF]. Please refer to its README for further information.

Examples:

>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> egd = EGD(root='/path/to/downloaded/data/folder/')
>>> print(len(egd.ids))
# 774
>>> print(egd.t1gd(egd.ids[215]).shape)
# (197, 233, 189)
>>> print(egd.manufacturer(egd.ids[444]))
# Philips Medical Systems
References

.. [1] van der Voort, Sebastian R., et al. "The Erasmus Glioma Database (EGD): Structural MRI scans, WHO 2016 subtypes, and segmentations of 774 patients with glioma." Data in brief 37 (2021): 107191. https://www.sciencedirect.com/science/article/pii/S2352340921004753

ids()

brain_mask(id: str)

deface_mask(id: str)

modality(id: str)

subject_id(id: str)

affine(id: str)

voxel_spacing(id: str)

spacing(id: str)

image(id: str)

genetic_and_histological_label_idh(id: str)

genetic_and_histological_label_1p19q(id: str)

genetic_and_histological_label_grade(id: str)

age(id: str)

sex(id: str)

observer(id: str)

original_scan(id: str)

manufacturer(id: str)

system(id: str)

field(id: str)

mask(id: str)

amid.flare2022.FLARE2022

An abdominal organ segmentation dataset for semi-supervised learning [1]_.

The dataset was used at the MICCAI FLARE 2022 challenge.

Parameters:

Name Type Description Default
root (str, Path)

path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated.

required
Notes

Download link: https://flare22.grand-challenge.org/Dataset/

The root folder should contain the two downloaded folders, namely: "Training" and "Validation".

Examples:

>>> # Place the downloaded folders in any folder and pass the path to the constructor:
>>> ds = FLARE2022(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
# 2100
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 110)
>>> print(ds.mask(ds.ids[25]).shape)
# (512, 512, 104)
References

.. [1] Ma, Jun, et al. "Fast and Low-GPU-memory abdomen CT organ segmentation: The FLARE challenge." Medical Image Analysis 82 (2022): 102616.

ids()

image(id: str)

affine(id: str)

The 4x4 matrix that gives the image's spatial orientation

mask(id: str)

amid.hcp.HCP

ids()

image(id: str)

affine(id: str)

spacing(id: str)

amid.kits.KiTS23

Kidney and Kidney Tumor Segmentation Challenge. The 2023 Kidney and Kidney Tumor Segmentation challenge (abbreviated KiTS23) is a competition in which teams compete to develop the best system for automatic semantic segmentation of kidneys, renal tumors, and renal cysts.

Competition page is https://kits-challenge.org/kits23/, official competition repository is https://github.com/neheller/kits23/.

For usage, clone the repository https://github.com/neheller/kits23/, install and run kits23_download_data.

Parameters:

Name Type Description Default
root
required
Example

ids()

image(id: str)

mask(id: str)

affine(id: str)

The 4x4 matrix that gives the image's spatial orientation.

amid.lidc.dataset.LIDC

The (L)ung (I)mage (D)atabase (C)onsortium image collection (LIDC-IDRI) [1]_ consists of diagnostic and lung cancer screening thoracic computed tomography (CT) scans with marked-up annotated lesions and a lung nodule segmentation task. Scans contain multiple expert annotations.

Number of CT scans: 1018.

Parameters:

Name Type Description Default
root (str, Path)

path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated.

required
version str

the data version. Only has effect if the library was installed from a cloned git repository.

required
Notes

Follow the download instructions at https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=1966254.

Then, the folder with raw downloaded data should contain folder LIDC-IDRI, which contains folders LIDC-IDRI-*.

Examples:

>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = LIDC(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
# 1018
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 194)
>>> print(ds.cancer(ds.ids[0]).shape)
# (512, 512, 194)
References

.. [1] Armato III, McLennan, et al. "The lung image database consortium (lidc) and image database resource initiative (idri): a completed reference database of lung nodules on ct scans." Medical physics 38(2) (2011): 915–931. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3041807/

ids()

image(id: str)

study_uid(id: str)

series_uid(id: str)

patient_id(id: str)

sop_uids(id: str)

pixel_spacing(id: str)

slice_locations(id: str)

voxel_spacing(id: str)

Returns voxel spacing along axes (x, y, z).

spacing(id: str)

Volumetric spacing of the image. The maximum relative difference in slice_locations < 1e-3 (except 4 images listed below), so we allow ourselves to use the common spacing for the whole 3D image.

Note

The slice_locations attribute typically (but not always!) has a constant step. In the LIDC dataset, only 4 images have a difference in slice_locations > 1e-3:
- 1.3.6.1.4.1.14519.5.2.1.6279.6001.526570782606728516388531252230
- 1.3.6.1.4.1.14519.5.2.1.6279.6001.329334252028672866365623335798
- 1.3.6.1.4.1.14519.5.2.1.6279.6001.245181799370098278918756923992
- 1.3.6.1.4.1.14519.5.2.1.6279.6001.103115201714075993579787468219

These differences appear in at most 3 slices, so we consider their impact negligible.
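The uniformity check described in this note can be sketched as follows. This is not the library's implementation: the use of the median step as the common spacing and the exact definition of "relative difference" are assumptions, and `common_slice_step` is a hypothetical name.

```python
import statistics


def common_slice_step(slice_locations, rel_tol=1e-3):
    """Return (step, is_uniform) for a sequence of slice locations.

    `step` is the median distance between consecutive sorted locations;
    `is_uniform` is True when every step deviates from it by less than
    `rel_tol` in relative terms.
    """
    locations = sorted(slice_locations)
    if len(locations) < 2:
        raise ValueError('need at least two slice locations')
    steps = [b - a for a, b in zip(locations, locations[1:])]
    step = statistics.median(steps)
    if step <= 0:
        raise ValueError('slice locations must be distinct')
    max_rel_diff = max(abs(s - step) for s in steps) / step
    return step, max_rel_diff < rel_tol


# Made-up locations: the first series is uniform, the second is not.
print(common_slice_step([0.0, 2.5, 5.0, 7.5]))  # (2.5, True)
print(common_slice_step([0.0, 2.5, 5.0, 8.0]))  # (2.5, False)
```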

contrast_used(id: str)

If the DICOM file for the scan had any Contrast tag, this is marked as True.

is_from_initial(id: str)

Indicates whether or not this PatientID was tagged as part of the initial 399 release.

orientation_matrix(id: str)

sex(id: str)

age(id: str)

conv_kernel(id: str)

kvp(id: str)

tube_current(id: str)

study_date(id: str)

accession_number(id: str)

nodules(id: str)

nodules_masks(id: str)

cancer(id: str)

amid.lits.dataset.LiTS

A (Li)ver (T)umor (S)egmentation dataset [1] from Medical Segmentation Decathlon [2]

There are two segmentation tasks on this dataset: liver and liver tumor segmentation.

Parameters:

Name Type Description Default
root (str, Path)

path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated.

required
version str

the data version. Only has effect if the library was installed from a cloned git repository.

required
Notes

Follow the download instructions at https://competitions.codalab.org/competitions/17094.

Then, the folder with raw downloaded data should contain two zip archives with the train data (Training_Batch1.zip and Training_Batch2.zip) and a folder with the test data (LITS-Challenge-Test-Data).

The folder with test data should have the original structure:
<...>/LITS-Challenge-Test-Data/test-volume-0.nii
<...>/LITS-Challenge-Test-Data/test-volume-1.nii
...

P.S. Organs boxes are also provided from a separate source https://github.com/superxuang/caffe_3d_faster_rcnn.

Examples:

>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = LiTS(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
# 201
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 163)
>>> print(ds.tumor_mask(ds.ids[80]).shape)
# (512, 512, 771)
References

.. [1] Bilic, Patrick, et al. "The liver tumor segmentation benchmark (lits)." arXiv preprint arXiv:1901.04056 (2019).

.. [2] Antonelli, Michela, et al. "The medical segmentation decathlon." arXiv preprint arXiv:2106.05735 (2021).

ids()

fold(id: str)

image(id: str)

affine(id: str)

The 4x4 matrix that gives the image's spatial orientation.

voxel_spacing(id: str)

spacing(id: str)

Returns voxel spacing along axes (x, y, z).

mask(id: str)

amid.liver_medseg.LiverMedseg

LiverMedseg is a public CT segmentation dataset with 50 annotated images: a case collection of 50 livers with their segments. The images were obtained from the Medical Segmentation Decathlon competition.

Parameters:

Name Type Description Default
root (str, Path)

path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated.

required
version str

the data version. Only has effect if the library was installed from a cloned git repository.

required
Notes

Download links: https://www.medseg.ai/database/liver-segments-50-cases

Examples:

>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = LiverMedseg(root='/path/to/archives/root')
>>> print(len(ds.ids))
# 50
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 38)

ids()

image(id: str) -> np.ndarray

affine(id: str) -> np.ndarray

The 4x4 matrix that gives the image's spatial orientation.

voxel_spacing(id: str) -> tuple

spacing(id: str) -> tuple

mask(id: str) -> np.ndarray

amid.midrc.MIDRC

MIDRC-RICORD dataset 1a is a public COVID-19 CT segmentation dataset with 120 scans.

Parameters:

Name Type Description Default
root (str, Path)

path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated.

required
version str

the data version. Only has effect if the library was installed from a cloned git repository.

required
Notes

Follow the download instructions at https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=80969742. Download both Images and Annotations to the same folder.

The folder with the downloaded data should then contain two entries, with this structure:
<...>//MIDRC-RICORD-1A
<...>//MIDRC-RICORD-1a_annotations_labelgroup_all_2020-Dec-8.json

Examples:

>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = MIDRC(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
# 155
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 112)
>>> print(ds.mask(ds.ids[80]).shape)
# (6, 512, 512, 450)

ids()

image(id: str)

image_meta(id: str)

spacing(id: str)

labels(id: str)

mask(id: str)

amid.mood.MOOD

A (M)edical (O)ut-(O)f-(D)istribution analysis challenge [1]_

This dataset contains raw brain MRI and abdominal CT images.

Number of training samples:
- Brain: 800 scans (256 x 256 x 256)
- Abdominal: 550 scans (512 x 512 x 512)

For each setup there are 4 toy test samples with OOD cases.

Parameters:

Name Type Description Default
root (str, Path)

path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated.

required
version str

the data version. Only has effect if the library was installed from a cloned git repository.

required
Notes

Follow the download instructions at https://www.synapse.org/#!Synapse:syn21343101/wiki/599515.

Then, the folder with raw downloaded data should contain four zip archives with data (abdom_toy.zip, abdom_train.zip, brain_toy.zip and brain_train.zip).

Examples:

>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = MOOD(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
# 1358
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 512)
>>> print(ds.pixel_label(ds.ids[0]).shape)
# (512, 512, 512)
References

.. [1] Zimmerer, Petersen, et al. "Medical Out-of-Distribution Analysis Challenge 2022." doi: 10.5281/zenodo.6362313 (2022).

ids()

fold(id: str)

Returns fold: train or toy (test).

task(id: str)

Returns task: brain (MRI) or abdominal (CT).

image(id: str)

affine(id: str)

The 4x4 matrix that gives the image's spatial orientation.

voxel_spacing(id: str)

spacing(id: str)

Returns voxel spacing along axes (x, y, z).

sample_label(id: str)

Returns sample-level OOD score for toy examples and None otherwise. 0 indicates no abnormality and 1 indicates abnormal input.

pixel_label(id: str)

Returns voxel-level OOD scores for toy examples and None otherwise. 0 indicates no abnormality and 1 indicates abnormal input.
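Because both label fields return None outside the toy fold, evaluation code usually needs to filter first. A minimal sketch, with a hypothetical helper name and made-up ids standing in for real entries:

```python
def labelled_samples(ids, sample_label):
    """Return (id, label) pairs for entries whose label is not None."""
    pairs = []
    for entry_id in ids:
        label = sample_label(entry_id)
        if label is not None:
            pairs.append((entry_id, label))
    return pairs


# Stand-in labels: train entries have None, toy entries have 0 or 1.
fake_labels = {'train_case': None, 'toy_case_0': 0, 'toy_case_1': 1}
print(labelled_samples(fake_labels, fake_labels.get))
# [('toy_case_0', 0), ('toy_case_1', 1)]
```

With a real dataset the call would presumably be `labelled_samples(ds.ids, ds.sample_label)`.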

amid.msd.MSD

MSD is a Medical Segmentation Decathlon Challenge with 10 tasks.

Parameters:

Name Type Description Default
root (str, Path)

path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated.

required
version str

the data version. Only has effect if the library was installed from a cloned git repository.

required
Notes

Data can be downloaded here: http://medicaldecathlon.com/ or here: https://msd-for-monai.s3-us-west-2.amazonaws.com/ or here: https://drive.google.com/drive/folders/1HqEgzS8BV2c7xYNrZdEAnrHk7osJJ--2/

Then, the folder with raw downloaded data should contain a tar archive with data and masks (Task03_Liver.tar).

ids() -> tuple

train_test(id: str) -> str

task(id: str) -> str

image(id: str)

affine(id: str)

The 4x4 matrix that gives the image's spatial orientation.

image_modality(id: str) -> str

segmentation_labels(id: str) -> dict

Returns segmentation labels for the task

mask(id: str)

amid.mslub.dataset.MSLUB

ids()

image(id: str)

mask(id: str)

patient(id: str)

affine(id: str)

amid.medseg9.Medseg9

Medseg9 is a public COVID-19 CT segmentation dataset with 9 annotated images.

Parameters:

Name Type Description Default
root (str, Path)

path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated.

required
version str

the data version. Only has effect if the library was installed from a cloned git repository.

required
Notes

Data can be downloaded here: http://medicalsegmentation.com/covid19/.

Then, the folder with raw downloaded data should contain three zip archives with data and masks (rp_im.zip, rp_lung_msk.zip, rp_msk.zip).

Examples:

>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = Medseg9(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
# 9
>>> print(ds.image(ds.ids[0]).shape)
# (630, 630, 45)
>>> print(ds.covid(ds.ids[0]).shape)
# (630, 630, 45)

ids()

image(id: str)

affine(id: str)

The 4x4 matrix that gives the image's spatial orientation.

voxel_spacing(id: str)

spacing(id: str)

Returns voxel spacing along axes (x, y, z).

lungs(id: str)

covid(id: str)

int16 mask. 0 - normal, 1 - ground-glass opacities, 2 - consolidation.
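To summarize how much of a scan falls into each class, the label codes can be counted and mapped to names. This is a sketch: `label_counts` is a hypothetical helper that takes a flat iterable of label values, so it runs here on a plain list.

```python
from collections import Counter

# Label codes as documented for the covid mask.
COVID_LABELS = {0: 'normal', 1: 'ground-glass opacities', 2: 'consolidation'}


def label_counts(flat_values):
    """Count the voxels per class in a flat iterable of label codes."""
    counts = Counter(int(value) for value in flat_values)
    return {name: counts.get(code, 0) for code, name in COVID_LABELS.items()}


# Made-up label values standing in for a flattened mask.
print(label_counts([0, 0, 1, 2, 2, 2]))
# {'normal': 2, 'ground-glass opacities': 1, 'consolidation': 3}
```

For a real mask, something like `label_counts(ds.covid(ds.ids[0]).ravel())` should work, assuming the mask is returned as a NumPy array.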

amid.cancer_500.dataset.MoscowCancer500

The Moscow Radiology Cancer-500 dataset.

Parameters:

Name Type Description Default
root (str, Path)

path to the folder containing the raw downloaded files. If not provided, the cache is assumed to be already populated.

required
version str

the data version. Only has effect if the library was installed from a cloned git repository.

required
Notes

Download links: https://mosmed.ai/en/datasets/mosmeddata-kt-s-priznakami-raka-legkogo-tip-viii/

After pressing the download button you will have to provide an email address to which further instructions will be sent.

Examples:

>>> # Place the downloaded files in any folder and pass the path to the constructor:
>>> ds = MoscowCancer500(root='/path/to/files/root')
>>> print(len(ds.ids))
# 979
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 67)

ids()

image(id: str)

study_uid(id: str)

series_uid(id: str)

sop_uids(id: str)

pixel_spacing(id: str)

slice_locations(id: str)

orientation_matrix(id: str)

instance_numbers(id: str)

conv_kernel(id: str)

kvp(id: str)

patient_id(id: str)

study_date(id: str)

accession_number(id: str)

nodules(id: str)

amid.covid_1110.MoscowCovid1110

The Moscow Radiology COVID-19 dataset.

Parameters:

Name Type Description Default
root (str, Path)

path to the folder containing the raw downloaded files. If not provided, the cache is assumed to be already populated.

required
version str

the data version. Only has effect if the library was installed from a cloned git repository.

required
Notes

Download links: https://mosmed.ai/en/datasets/covid191110/

Examples:

>>> # Place the downloaded files in any folder and pass the path to the constructor:
>>> ds = MoscowCovid1110(root='/path/to/files/root')
>>> print(len(ds.ids))
# 1110
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 43)

ids()

image(id: str)

affine(id: str)

label(id: str)

mask(id: str)

amid.nlst.NLST

Dataset with low-dose CT scans of 26,254 patients acquired during the National Lung Screening Trial.

Parameters:

Name Type Description Default
root (str, Path)

path to the folder (usually called NLST) containing the patient subfolders (like 101426). If not provided, the cache is assumed to be already populated.

required
version str

the data version. Only has effect if the library was installed from a cloned git repository.

required
Notes

Follow the download instructions at https://wiki.cancerimagingarchive.net/display/NLST/National+Lung+Screening+Trial. The DICOM files should be placed in the following folder structure: <...>//////*.dcm

Examples:

>>> ds = NLST(root='/path/to/NLST/')
>>> print(len(ds.ids))
 ...
>>> print(ds.image(ds.ids[0]).shape)
 ...
>>> print(ds.mask(ds.ids[80]).shape)
 ...

ids()

image(id: str)

study_uid(id: str)

series_uid(id: str)

sop_uids(id: str)

pixel_spacing(id: str)

slice_locations(id: str)

orientation_matrix(id: str)

conv_kernel(id: str)

kvp(id: str)

patient_id(id: str)

study_date(id: str)

accession_number(id: str)

amid.nsclc.NSCLC

NSCLC-Radiomics is a public non-small cell lung cancer segmentation dataset with 422 patients.

Parameters:

Name Type Description Default
root (str, Path)

path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated.

required
version str

the data version. Only has effect if the library was installed from a cloned git repository.

required
Notes

Follow the download instructions at https://wiki.cancerimagingarchive.net/display/Public/NSCLC-Radiomics

The folder with downloaded data should contain two paths

The folder should have this structure: <...>//NSCLC-Radiomics/LUNG1-XXX

Examples:

>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = NSCLC(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
 422
>>> print(ds.image(ds.ids[0]).shape)
 (512, 512, 134)
>>> print(ds.mask(ds.ids[80]).shape)
 (512, 512, 108)

ids()

image(id: str)

image_meta(id: str)

sex(id: str) -> str

Sex of the patient.

age(id: str) -> Union[int, None]

Age of the patient. The dataset contains 97 patients with unknown age.
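Because the return type allows `None`, code that aggregates ages has to skip the patients whose age is unknown. The sketch below shows one way to do that. It assumes an unknown age is reported as `None`, as the annotation suggests; the helper name and the stand-in data are hypothetical.

```python
from typing import Callable, Iterable, List, Optional


def ids_with_known_age(ids: Iterable[str],
                       age: Callable[[str], Optional[int]]) -> List[str]:
    """Keep only the identifiers whose age is not None."""
    return [i for i in ids if age(i) is not None]


# Hypothetical stand-in for the dataset's age method; identifiers are made up
fake_ages = {'case-a': 64, 'case-b': None, 'case-c': 71}

known = ids_with_known_age(fake_ages, fake_ages.get)
mean_age = sum(fake_ages[i] for i in known) / len(known)
```

With a real dataset object, the call would look like `ids_with_known_age(ds.ids, ds.age)`.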

spacing(id: str)

mask(id: str)

lung_left(id: str)

lung_right(id: str)

lungs_total(id: str)

heart(id: str)

esophagus(id: str)

spinal_cord(id: str)

amid.rsna_bc.dataset.RSNABreastCancer

site_id(id: str)

patient_id(id: str)

image_id(id: str)

laterality(id: str)

view(id: str)

age(id: str)

cancer(id: str)

biopsy(id: str)

invasive(id: str)

BIRADS(id: str)

implant(id: str)

density(id: str)

machine_id(id: str)

prediction_id(id: str)

difficult_negative_case(id: str)

ids()

image(id: str)

padding_value(id: str)

intensity_sign(id: str)

amid.ribfrac.dataset.RibFrac

The RibFrac dataset is a benchmark for developing algorithms for rib fracture detection, segmentation and classification. We hope this large-scale dataset could facilitate both clinical research for automatic rib fracture detection and diagnosis, and engineering research for 3D detection, segmentation and classification.

Parameters:

Name Type Description Default
root (str, Path)

path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated.

required
Notes

Data downloaded from here:
https://doi.org/10.5281/zenodo.3893507 -- train Part1 (300 images)
https://doi.org/10.5281/zenodo.3893497 -- train Part2 (120 images)
https://doi.org/10.5281/zenodo.3893495 -- val (80 images)
https://zenodo.org/record/3993380 -- test (160 images without annotation)

References

Jiancheng Yang, Liang Jin, Bingbing Ni, & Ming Li. (2020). RibFrac Dataset: A Benchmark for Rib Fracture Detection, Segmentation and Classification

ids()

image(id: str)

label(id: str)

affine(id: str)

The 4x4 matrix that gives the image's spatial orientation

amid.stanford_coca.StanfordCoCa

Stanford AIMI's Co(ronary) Ca(lcium) dataset.

Parameters:

Name Type Description Default
root (str, Path)

path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated.

required
version str

the data version. Only has effect if the library was installed from a cloned git repository.

required
Notes

Follow the download instructions at https://stanfordaimi.azurewebsites.net/datasets/e8ca74dc-8dd4-4340-815a-60b41f6cb2aa. You'll need to register and accept the terms of use. After that, copy the files from Azure:

azcopy copy 'some-generated-access-link' /path/to/downloaded/data/ --recursive=true

Then, the folder with raw downloaded data should contain two subfolders: a subset with gated coronary CT scans and corresponding coronary calcium segmentation masks (Gated_release_final), and a folder with non-gated CT scans and corresponding coronary artery calcium scores (deidentified_nongated).

The folder with gated data should keep its original structure:
./Gated_release_final/patient/0/folder-with-dcms/
./Gated_release_final/calcium_xml/0.xml
...

The folder with nongated data should keep its original structure:
./deidentified_nongated/0/folder-with-dcms/
...

Examples:

>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = StanfordCoCa(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
# 971
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 57)

ids()

image(id: str)

series_uid(id: str)

study_uid(id: str)

pixel_spacing(id: str)

slice_locations(id: str)

orientation_matrix(id: str)

calcifications(id: str)

Returns list of Calcifications

score(id: str)

amid.tbad.TBAD

A dataset of 3D Computed Tomography (CT) images for Type-B Aortic Dissection segmentation.

Notes

The data can only be obtained by contacting the authors by email. See the dataset home page for details.

Parameters:

Name Type Description Default
root (str, Path)

path to the folder containing the raw downloaded files. If not provided, the cache is assumed to be already populated.

required
version str

the data version. Only has effect if the library was installed from a cloned git repository.

required

Examples:

>>> # Place the downloaded files in any folder and pass the path to the constructor:
>>> ds = TBAD(root='/path/to/files/root')
>>> print(len(ds.ids))
# 100
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 327)
References

.. [1] Yao, Zeyang & Xie, Wen & Zhang, Jiawei & Dong, Yuhao & Qiu, Hailong & Haiyun, Yuan & Jia, Qianjun & Tianchen, Wang & Shi, Yiyi & Zhuang, Jian & Que, Lifeng & Xu, Xiaowei & Huang, Meiping. (2021). ImageTBAD: A 3D Computed Tomography Angiography Image Dataset for Automatic Segmentation of Type-B Aortic Dissection. Frontiers in Physiology. 12. 732711. 10.3389/fphys.2021.732711.

ids()

image(id: str)

affine(id: str)

The 4x4 matrix that gives the image's spatial orientation.

mask(id: str)

amid.totalsegmentator.dataset.Totalsegmentator

In 1204 CT images we segmented 104 anatomical structures (27 organs, 59 bones, 10 muscles, 8 vessels) covering a majority of relevant classes for most use cases.

The CT images were randomly sampled from clinical routine, thus representing a real world dataset which generalizes to clinical application.

The dataset contains a wide range of different pathologies, scanners, sequences and institutions. [1]

Parameters:

Name Type Description Default
root (str, Path)

absolute path to the downloaded archive. If not provided, the cache is assumed to be already populated.

required
Notes

Download link: https://zenodo.org/record/6802614/files/Totalsegmentator_dataset.zip

Examples:

>>> # Download the archive to any folder and pass the path to the constructor:
>>> ds = Totalsegmentator(root='/path/to/the/downloaded/archive')
>>> print(len(ds.ids))
# 1204
>>> print(ds.image(ds.ids[0]).shape)
# (294, 192, 179)
>>> print(ds.aorta(ds.ids[25]).shape)
# (320, 320, 145)
References

.. [1] Jakob Wasserthal (2022) Dataset with segmentations of 104 important anatomical structures in 1204 CT images. Available at: https://zenodo.org/record/6802614#.Y6M2MxXP1D8

ids()

image(id: str)

affine(id: str)

The 4x4 matrix that gives the image's spatial orientation

amid.upenn_gbm.upenn_gbm.UPENN_GBM

Multi-parametric magnetic resonance imaging (mpMRI) scans for de novo Glioblastoma (GBM) patients from the University of Pennsylvania Health System (UPENN-GBM). Dataset contains 630 patients.

All samples are registered to a common atlas (SRI) using uniform preprocessing, and the segmentations are aligned with them.

Parameters:

Name Type Description Default
root (str, Path)

path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated.

required
Notes

Follow the download instructions at https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=70225642. Download the NIfTI images and metadata to the root folder. Organise the folder as follows:

<...>//NIfTI-files/images_segm/UPENN-GBM-00054_11_segm.nii.gz
<...>//NIfTI-files/...

<...>//UPENN-GBM_clinical_info_v1.0.csv
<...>//UPENN-GBM_acquisition.csv

Examples:

>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = UPENN_GBM(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
# 671
>>> print(ds.image(ds.ids[215]).shape)
# (4, 240, 240, 155)
>>> print(ds.acqusition_info(ds.ids[215]).manufacturer)
# SIEMENS
References

.. [1] Bakas, S., Sako, C., Akbari, H., Bilello, M., Sotiras, A., Shukla, G., Rudie, J. D., Flores Santamaria, N., Fathi Kazerooni, A., Pati, S., Rathore, S., Mamourian, E., Ha, S. M., Parker, W., Doshi, J., Baid, U., Bergman, M., Binder, Z. A., Verma, R., … Davatzikos, C. (2021). Multi-parametric magnetic resonance imaging (mpMRI) scans for de novo Glioblastoma (GBM) patients from the University of Pennsylvania Health System (UPENN-GBM) (Version 2) [Data set]. The Cancer Imaging Archive. https://doi.org/10.7937/TCIA.709X-DN49

ids()

modalities(id: str)

dsc_modalities(id: str)

dti_modalities(id: str)

mask(id: str)

is_mask_automated(id: str)

image(id: str)

image_unstripped(id: str)

image_DTI(id: str)

image_DSC(id: str)

clinical_info(id: str) -> ClinicalInfo

acqusition_info(id: str) -> AcquisitionInfo

subject_id(id: str)

affine(id: str)

spacing(id: str)

amid.vs_seg.dataset.VSSEG

Segmentation of vestibular schwannoma from MRI, an open annotated dataset ... (VS-SEG) [1].

The dataset contains 250 pairs of T1c and T2 images of the brain for the vestibular schwannoma segmentation task.

Parameters:

Name Type Description Default
root (str, Path)

path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated.

required
version str

the data version. Only has effect if the library was installed from a cloned git repository.

required
Notes

The dataset and corresponding metadata can be downloaded from the TCIA page: https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=70229053.

To download the DICOM images using the .tcia file, we used a public build of the TCIA downloader: https://github.com/ygidtu/NBIA_data_retriever_CLI.

Then, download the rest of the metadata from the TCIA page:
- DirectoryNamesMappingModality.csv
- Vestibular-Schwannoma-SEG_matrices Mar 2021.zip
- Vestibular-Schwannoma-SEG contours Mar 2021.zip

and unzip the latter two .zip archives.

The root folder should then contain 3 folders and 1 .csv file:
<...>/DirectoryNamesMappingModality.csv
<...>/Vestibular-Schwannoma-SEG/
├── VS-SEG-001/...
├── VS-SEG-002/...
└── ...
<...>/contours/
<...>/registration_matrices/

Examples:

>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = VSSEG(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
# 484
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 120)
>>> print(ds.schwannoma(ds.ids[1]).shape)
# (384, 384, 80)
References

.. [1] Shapey, Jonathan, et al. "Segmentation of vestibular schwannoma from MRI, an open annotated dataset and baseline algorithm." Scientific Data 8.1 (2021): 1-6. https://www.nature.com/articles/s41597-021-01064-w

ids()

modality(id: str)

subject_id(id: str)

image(id: str)

spacing(id: str)

The maximum relative difference in slice_locations is below 1e-12, so a common spacing is used for the whole 3D image.
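The check described above can be sketched as follows. The page does not say how the relative difference is measured, so this example assumes it is the spread of the slice-to-slice steps divided by their mean; the function name is hypothetical.

```python
import numpy as np


def common_slice_spacing(slice_locations, rtol: float = 1e-12) -> float:
    """Return one spacing for the slice axis, or raise if the slices are uneven."""
    locations = np.sort(np.asarray(slice_locations, dtype=float))
    if locations.size < 2:
        raise ValueError('need at least two slices')
    steps = np.diff(locations)
    # Assumed definition: spread of the steps relative to their mean
    relative_diff = (steps.max() - steps.min()) / steps.mean()
    if relative_diff >= rtol:
        raise ValueError(f'non-uniform slice spacing: {relative_diff:.3g}')
    return float(steps.mean())


# Made-up locations, deliberately unsorted
z_spacing = common_slice_spacing([3.0, 0.0, 4.5, 1.5])
```

A series with a missing slice, for example locations `[0.0, 1.0, 3.0]`, would raise `ValueError` instead of silently returning an averaged spacing.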

schwannoma(id: str)

cochlea(id: str)

meningioma(id: str)

study_uid(id: str)

series_uid(id: str)

patient_id(id: str)

study_date(id: str)

amid.verse.VerSe

A Vertebral Segmentation Dataset with Fracture Grading [1]

The dataset was used in the MICCAI-2019 and MICCAI-2020 Vertebrae Segmentation Challenges.

Parameters:

Name Type Description Default
root (str, Path)

path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated.

required
version str

the data version. Only has effect if the library was installed from a cloned git repository.

required
Notes

Download links:
2019: https://osf.io/jtfa5/
2020: https://osf.io/4skx2/

Examples:

>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = VerSe(root='/path/to/archives/root')
>>> print(len(ds.ids))
# 374
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 214)
References

.. [1] Löffler MT, Sekuboyina A, Jacob A, et al. A Vertebral Segmentation Dataset with Fracture Grading. Radiol Artif Intell. 2020;2(4):e190138. Published 2020 Jul 29. doi:10.1148/ryai.2020190138

ids()

image(id: str)

affine(id: str)

The 4x4 matrix that gives the image's spatial orientation

split(id: str)

The split in which this entry is contained: training, validate, test

patient(id: str)

The unique patient id

year(id: str)

The year in which this entry was published: 2019, 2020

centers(id: str)

Vertebrae centers in format {label: [x, y, z]}
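A small sketch of working with this format: it measures the distance between the centers of consecutive vertebrae. It assumes that the centers are given in voxel coordinates, that the labels are integers which sort in anatomical order, and that a voxel spacing is available separately; none of this is confirmed by the page. The function name and the example values are made up.

```python
import numpy as np


def adjacent_center_distances(centers: dict, spacing) -> dict:
    """Physical distance between the centers of consecutive labels."""
    labels = sorted(centers)
    scale = np.asarray(spacing, dtype=float)
    distances = {}
    for a, b in zip(labels, labels[1:]):
        delta = np.asarray(centers[b], dtype=float) - np.asarray(centers[a], dtype=float)
        # Scale the voxel offset by the spacing before taking the norm
        distances[(a, b)] = float(np.linalg.norm(delta * scale))
    return distances


# Made-up centers in the documented {label: [x, y, z]} format
centers = {20: [100, 120, 30], 21: [100, 123, 34], 22: [100, 126, 38]}
distances = adjacent_center_distances(centers, spacing=(2.0, 2.0, 2.0))
```

With a real dataset object, the dictionary would come from a call such as `ds.centers(ds.ids[0])`.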

masks(id: str) -> Union[np.ndarray, None]

Vertebrae masks