Datasets API
amid.amos.dataset.AMOS
AMOS provides 500 CT and 100 MRI scans collected from multi-center, multi-vendor, multi-modality, multi-phase, multi-disease patients, each with voxel-level annotations of 15 abdominal organs, providing challenging examples and a test-bed for studying robust segmentation algorithms under diverse targets and scenarios. [1]
Parameters:
Name | Type | Description | Default
---|---|---|---
root | (str, Path) | Absolute path to the root containing the downloaded archive and meta. If not provided, the cache is assumed to be already populated. | required
Notes
Download link: https://zenodo.org/record/7262581/files/amos22.zip
Examples:
>>> # Download the archive and meta to any folder and pass the path to the constructor:
>>> ds = AMOS(root='/path/to/the/downloaded/files')
>>> print(len(ds.ids))
# 961
>>> print(ds.image(ds.ids[0]).shape)
# (768, 768, 90)
>>> print(ds.mask(ds.ids[26]).shape)
# (512, 512, 124)
References
.. [1] Ji Yuanfeng. (2022). AMOS: A large-scale abdominal multi-organ benchmark for versatile medical image segmentation [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7262581
birth_date(id: str)
sex(id: str)
age(id: str)
manufacturer_model(id: str)
manufacturer(id: str)
acquisition_date(id: str)
site(id: str)
ids()
image(id: str)
Corresponding 3D image.
affine(id: str)
The 4x4 matrix that gives the image's spatial orientation.
mask(id: str)
image_modality(id: str)
Returns the image modality: CT or MRI.
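For example, the image_modality accessor can be used to split the ids into CT and MRI subsets. A minimal sketch, assuming the cache is already populated and the modality strings match the docstring above ('CT' / 'MRI'); the root path is a placeholder:
>>> ds = AMOS(root='/path/to/the/downloaded/files')
>>> ct_ids = [i for i in ds.ids if ds.image_modality(i) == 'CT']
>>> mri_ids = [i for i in ds.ids if ds.image_modality(i) == 'MRI']
>>> print(len(ct_ids), len(mri_ids))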
amid.bimcv.BIMCVCovid19
BIMCV COVID-19 Dataset, CT images only. It includes the BIMCV COVID-19 positive partition (https://arxiv.org/pdf/2006.01174.pdf) and the negative partition (https://ieee-dataport.org/open-access/bimcv-covid-19-large-annotated-dataset-rx-and-ct-images-covid-19-patients-0).
PCR tests are not used.
GitHub page: https://github.com/BIMCV-CSUSP/BIMCV-COVID-19
Parameters:
Name | Type | Description | Default
---|---|---|---
root | (str, Path) | path to the folder containing the downloaded and parsed data. | required
Notes
The dataset has 2 partitions: bimcv-covid19-positive and bimcv-covid19-negative. Each partition is spread over 81 different tgz archives. The archives include metadata about subjects, sessions, and labels; there are also tgz archives with NIfTI images in nii.gz format.
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = BIMCVCovid19(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
# 201
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 163)
>>> print(ds.is_positive(ds.ids[0]))
# True
>>> print(ds.subject_info[80])
# {'modality_dicom': "['CT']",
# 'body_parts': "[['chest']]",
# 'age': '[80]',
# 'gender': 'M'}
References
.. [1] Maria De La Iglesia Vayá, Jose Manuel Saborit, Joaquim Angel Montell, Antonio Pertusa, Aurelia Bustos, Miguel Cazorla, Joaquin Galant, Xavier Barber, Domingo Orozco-Beltrán, Francisco Garcia, Marisa Caparrós, Germán González, and Jose María Salinas. BIMCV COVID-19+: a large annotated dataset of RX and CT images from COVID-19 patients. arXiv:2006.01174, 2020.
.. [2] Maria de la Iglesia Vayá, Jose Manuel Saborit-Torres, Joaquim Angel Montell Serrano, Elena Oliver-Garcia, Antonio Pertusa, Aurelia Bustos, Miguel Cazorla, Joaquin Galant, Xavier Barber, Domingo Orozco-Beltrán, Francisco García-García, Marisa Caparrós, Germán González, Jose María Salinas, 2021. BIMCV COVID-19-: a large annotated dataset of RX and CT images from COVID-19 patients. Available at: https://dx.doi.org/10.21227/m4j2-ap59.
ids()
session_id(id: str)
subject_id(id: str)
is_positive(id: str)
image(id: str)
affine(id: str)
tags(id: str) -> dict
dicom tags
label_info(id: str) -> dict
labelCUIS, Report, LocalizationsCUIS etc.
subject_info(id: str) -> dict
modality_dicom (=[CT]), body_parts(=[chest]), age, gender
age(id: str) -> int
Minimum of the (possibly two) available ages. The maximum difference between the max and min age for any patient is 1 year.
sex(id: str) -> str
session_info(id: str) -> dict
study_date, medical_evaluation
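For example, the COVID label can be used to select the positive scans, and subject_info provides basic demographics. A minimal sketch following the signatures above (the root path is a placeholder):
>>> ds = BIMCVCovid19(root='/path/to/downloaded/data/folder/')
>>> positive_ids = [i for i in ds.ids if ds.is_positive(i)]
>>> info = ds.subject_info(positive_ids[0])  # e.g. modality_dicom, body_parts, age, gender
>>> print(len(positive_ids), info.get('gender'))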
amid.brats2021.BraTS2021
Parameters:
Name | Type | Description | Default
---|---|---|---
root | (str, Path) | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required
version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required
Notes
Download links: 2021: http://www.braintumorsegmentation.org/
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = BraTS2021(root='/path/to/archives/root')
>>> print(len(ds.ids))
# 5880
>>> print(ds.image(ds.ids[0]).shape)
# (240, 240, 155)
ids()
fold(id: str)
mapping21_17(id: str) -> pd.DataFrame
subject_id(id: str) -> str
modality(id: str) -> str
image(id: str)
mask(id: str)
spacing(id: str)
Returns the voxel spacing along axes (x, y, z).
affine(id: str)
Returns 4x4 matrix that gives the image's spatial orientation.
amid.cc359.dataset.CC359
A (C)algary-(C)ampinas public brain MR dataset with (359) volumetric images [1]_.
There are three segmentation tasks on this dataset: (i) brain, (ii) hippocampus, and (iii) White Matter (WM), Gray Matter (GM), and Cerebrospinal Fluid (CSF) segmentation.
Parameters:
Name | Type | Description | Default
---|---|---|---
root | (str, Path) | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required
version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required
Notes
Homepage (new): https://sites.google.com/view/calgary-campinas-dataset/home
Homepage (old): https://miclab.fee.unicamp.br/calgary-campinas-359-updated-05092017
To obtain the MR images and the brain and hippocampus segmentation masks, follow the instructions at the download platform: https://portal.conp.ca/dataset?id=projects/calgary-campinas.
Via the datalad library you need to download three zip archives:
- Original.zip (the original MR images)
- hippocampus_staple.zip (silver-standard hippocampus masks generated using STAPLE)
- Silver-standard-machine-learning.zip (silver-standard brain masks generated using a machine learning method)
To date, the WM, GM, and CSF masks can be downloaded only from Google Drive: https://drive.google.com/drive/u/0/folders/0BxLb0NB2MjVZNm9JY1pWNFp6WTA?resourcekey=0-2sXMr8q-n2Nn6iY3PbBAdA.
From the Google Drive root above, manually download the folder CC359/Reconstructed/CC359/WM-GM-CSF/.
So the root folder to pass to this dataset class should contain four objects:
- three zip archives (Original.zip, hippocampus_staple.zip, and Silver-standard-machine-learning.zip)
- one folder WM-GM-CSF with the original structure:
<...>/WM-GM-CSF/CC0319_ge_3_45_M.nii.gz
<...>/WM-GM-CSF/CC0324_ge_3_56_M.nii.gz
...
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> cc359 = CC359(root='/path/to/downloaded/data/folder/')
>>> print(len(cc359.ids))
# 359
>>> print(cc359.image(cc359.ids[0]).shape)
# (171, 256, 256)
>>> print(cc359.wm_gm_csf(cc359.ids[80]).shape)
# (180, 240, 240)
References
.. [1] Souza, Roberto, et al. "An open, multi-vendor, multi-field-strength brain MR dataset and analysis of publicly available skull stripping methods agreement." NeuroImage 170 (2018): 482-494. https://www.sciencedirect.com/science/article/pii/S1053811917306687
ids()
vendor(id: str)
field(id: str)
age(id: str)
sex(id: str)
image(id: str)
affine(id: str)
The 4x4 matrix that gives the image's spatial orientation.
voxel_spacing(id: str)
spacing(id: str)
Returns voxel spacing along axes (x, y, z).
brain(id: str)
hippocampus(id: str)
wm_gm_csf(id: str)
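As an illustration of the three segmentation targets, each has its own accessor (brain, hippocampus, wm_gm_csf). A minimal sketch using only the methods listed above (the root path is a placeholder):
>>> cc359 = CC359(root='/path/to/downloaded/data/folder/')
>>> i = cc359.ids[0]
>>> image = cc359.image(i)
>>> brain_mask = cc359.brain(i)              # brain segmentation
>>> hippocampus_mask = cc359.hippocampus(i)  # silver-standard hippocampus segmentation
>>> print(image.shape, brain_mask.shape, hippocampus_mask.shape)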
amid.cl_detection.CLDetection2023
The data for the "Cephalometric Landmark Detection in Lateral X-ray Images" Challenge, held with the MICCAI-2023 conference.
Notes
The data can only be obtained by contacting the organizers by email. See the challenge home page for details.
Parameters:
Name | Type | Description | Default
---|---|---|---
root | (str, Path) | path to the folder containing the raw downloaded and unarchived data. If not provided, the cache is assumed to be already populated. | required
version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = CLDetection2023(root='/path/to/data/root/folder')
>>> print(len(ds.ids))
# 400
>>> print(ds.image(ds.ids[0]).shape)
# (2400, 1935)
ids()
image(id: str)
points(id: str)
spacing(id: str)
amid.crlm.CRLM
Parameters:
Name | Type | Description | Default
---|---|---|---
root | (str, Path) | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required
version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required
Notes
Download links: https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=89096268#89096268b2cc35fce0664a2b875b5ec675ba9446
This collection consists of DICOM images and DICOM Segmentation Objects (DSOs) for 197 patients with Colorectal Liver Metastases (CRLM). It comprises the original DICOM CTs and segmentations for each subject. The segmentations include 'Liver', 'Liver_Remnant' (liver that will remain after surgery based on a preoperative CT plan), 'Hepatic' and 'Portal' veins, and 'Tumor_x', where 'x' indexes the tumor occurrences in the case.
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = CRLM(root='/path/to/archives/root')
>>> print(len(ds.ids))
# 197
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 52)
ids()
image(id: str)
mask(id: str) -> Dict[str, np.ndarray]
Returns dict: {'liver': ..., 'hepatic': ..., 'tumor_x': ...}
spacing(id: str)
Returns the voxel spacing along axes (x, y, z).
slice_locations(id: str)
affine(id: str)
Returns 4x4 matrix that gives the image's spatial orientation.
amid.ct_ich.CT_ICH
(C)omputed (T)omography Images for (I)ntracranial (H)emorrhage Detection and (S)egmentation.
This dataset contains 75 head CT scans including 36 scans for patients diagnosed with intracranial hemorrhage with the following types: Intraventricular, Intraparenchymal, Subarachnoid, Epidural and Subdural.
Parameters:
Name | Type | Description | Default
---|---|---|---
root | (str, Path) | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required
version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required
Notes
Data can be downloaded here: https://physionet.org/content/ct-ich/1.3.1/.
Then, the folder with raw downloaded data should contain the folders ct_scans and masks along with other files.
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = CT_ICH(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
# 75
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 39)
>>> print(ds.mask(ds.ids[0]).shape)
# (512, 512, 39)
ids()
image(id: str)
mask(id: str)
affine(id: str)
The 4x4 matrix that gives the image's spatial orientation.
voxel_spacing(id: str)
spacing(id: str)
Returns voxel spacing along axes (x, y, z).
age(id: str) -> float
sex(id: str) -> str
intraventricular_hemorrhage(id: str)
Returns True if hemorrhage was diagnosed and its type is intraventricular.
intraparenchymal_hemorrhage(id: str)
Returns True if hemorrhage was diagnosed and its type is intraparenchymal.
subarachnoid_hemorrhage(id: str)
Returns True if hemorrhage was diagnosed and its type is subarachnoid.
epidural_hemorrhage(id: str)
Returns True if hemorrhage was diagnosed and its type is epidural.
subdural_hemorrhage(id: str)
Returns True if hemorrhage was diagnosed and its type is subdural.
fracture(id: str)
Returns True if skull fracture was diagnosed.
notes(id: str)
Returns special notes if they exist.
hemorrhage_diagnosis_raw_metadata(id: str)
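For example, the per-type boolean accessors above can be combined into a single diagnosis record per scan. A minimal sketch (the root path is a placeholder):
>>> ds = CT_ICH(root='/path/to/downloaded/data/folder/')
>>> i = ds.ids[0]
>>> diagnosis = {
...     'intraventricular': ds.intraventricular_hemorrhage(i),
...     'intraparenchymal': ds.intraparenchymal_hemorrhage(i),
...     'subarachnoid': ds.subarachnoid_hemorrhage(i),
...     'epidural': ds.epidural_hemorrhage(i),
...     'subdural': ds.subdural_hemorrhage(i),
...     'fracture': ds.fracture(i),
... }
>>> print(any(diagnosis.values()))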
amid.crossmoda.CrossMoDA
Parameters:
Name | Type | Description | Default
---|---|---|---
root | (str, Path) | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required
version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required
Notes
Download links: 2021 & 2022: https://zenodo.org/record/6504722#.YsgwnNJByV4
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = CrossMoDA(root='/path/to/archives/root')
>>> print(len(ds.ids))
# 484
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 214)
ids()
train_source_df(id: str)
image(id: str) -> Union[np.ndarray, None]
pixel_spacing(id: str)
spacing(id: str)
Returns pixel spacing along axes (x, y, z)
affine(id: str)
The 4x4 matrix that gives the image's spatial orientation
split(id: str) -> str
The split in which this entry is contained: training_source, training_target, validation
year(id: str) -> int
The year in which this entry was published: 2021 or 2022
masks(id: str) -> Union[np.ndarray, None]
Combined mask of schwannoma and cochlea (1 and 2 respectively)
koos_grade(id: str)
VS tumour grade according to the Koos grading scale: 1..4, or -1 for post-operative cases.
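For example, the split accessor can be used to select the annotated source-domain training cases. A minimal sketch, assuming the split strings match the docstring above (the root path is a placeholder):
>>> ds = CrossMoDA(root='/path/to/archives/root')
>>> source_ids = [i for i in ds.ids if ds.split(i) == 'training_source']
>>> mask = ds.masks(source_ids[0])  # combined mask: schwannoma == 1, cochlea == 2
>>> print(len(source_ids), ds.year(source_ids[0]))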
amid.deeplesion.DeepLesion
DeepLesion is composed of 33,688 bookmarked radiology images from 10,825 studies of 4,477 unique patients. For every bookmarked image, a bounding box is created to cover the target lesion based on its measured diameters [1].
Parameters:
Name | Type | Description | Default
---|---|---|---
root | (str, Path) | path to the folder containing | required
Notes
The dataset is available at https://nihcc.app.box.com/v/DeepLesion
To download the data we recommend using the Python script batch_download_zips.py provided by the authors.
Once you download the data and unarchive all 56 zip archives, you should run DL_save_nifti.py provided by the authors to convert the 2D PNGs into 20094 nii.gz files.
Examples:
>>> ds = DeepLesion(root='/path/to/folder')
>>> print(len(ds.ids))
# 20094
References
.. [1] Yan, Ke, Xiaosong Wang, Le Lu, and Ronald M. Summers. "Deeplesion: Automated deep mining, categorization and detection of significant radiology image findings using large-scale clinical lesion annotations." arXiv preprint arXiv:1710.01766 (2017).
ids()
patient_id(id: str)
study_id(id: str)
series_id(id: str)
sex(id: str)
age(id: str)
Patient Age might be different for different studies (dataset contains longitudinal records).
ct_window(id: str)
CT window extracted from DICOMs. Recall that these are min-max values for windowing, not width/level (see the sketch after this list).
affine(id: str)
spacing(id: str)
image(id: str)
Some 3D volumes are stored as separate subvolumes, e.g. ds.ids[15000] and ds.ids[15001].
train_val_test(id: str)
Authors' defined randomly generated patient-level data split, train=1, validation=2, test=3, 70/15/15 ratio.
lesion_position(id: str)
Lesion measurements as they appear in DL_info.csv; for details see https://nihcc.app.box.com/v/DeepLesion/file/306056134060 .
mask(id: str)
Mask of the provided bounding boxes. Recall that the bounding-box annotation is very coarse: it only covers a single 2D slice.
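The note on ct_window above means the stored values are intensity bounds rather than a width/level pair, so they can be applied directly by clipping. A minimal sketch, assuming ct_window yields a (min, max) pair of HU bounds (the root path is a placeholder):
>>> import numpy as np
>>> ds = DeepLesion(root='/path/to/folder')
>>> i = ds.ids[0]
>>> image = ds.image(i)
>>> bounds = ds.ct_window(i)              # min-max window extracted from the DICOMs
>>> low, high = min(bounds), max(bounds)
>>> windowed = np.clip(image, low, high)  # apply the stored window by clipping intensities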
amid.egd.EGD
The Erasmus Glioma Database (EGD): Structural MRI scans, WHO 2016 subtypes, and segmentations of 774 patients with glioma [1]_.
Parameters:
Name | Type | Description | Default
---|---|---|---
root | (str, Path) | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required
version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required
Notes
Access to the dataset can be requested at the XNAT portal [https://xnat.bmia.nl/data/archive/projects/egd].
To download the data in a compatible structure we recommend using the egd-downloader script [https://zenodo.org/record/4761089#.YtZpLtJBxhF]. Please refer to its README for further information.
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> egd = EGD(root='/path/to/downloaded/data/folder/')
>>> print(len(egd.ids))
# 774
>>> print(egd.t1gd(egd.ids[215]).shape)
# (197, 233, 189)
>>> print(egd.manufacturer(egd.ids[444]))
# Philips Medical Systems
References
.. [1] van der Voort, Sebastian R., et al. "The Erasmus Glioma Database (EGD): Structural MRI scans, WHO 2016 subtypes, and segmentations of 774 patients with glioma." Data in brief 37 (2021): 107191. https://www.sciencedirect.com/science/article/pii/S2352340921004753
ids()
brain_mask(id: str)
deface_mask(id: str)
modality(id: str)
subject_id(id: str)
affine(id: str)
voxel_spacing(id: str)
spacing(id: str)
image(id: str)
genetic_and_histological_label_idh(id: str)
genetic_and_histological_label_1p19q(id: str)
genetic_and_histological_label_grade(id: str)
age(id: str)
sex(id: str)
observer(id: str)
original_scan(id: str)
manufacturer(id: str)
system(id: str)
field(id: str)
mask(id: str)
amid.flare2022.FLARE2022
An abdominal organ segmentation dataset for semi-supervised learning [1]_.
The dataset was used at the MICCAI FLARE 2022 challenge.
Parameters:
Name | Type | Description | Default
---|---|---|---
root | (str, Path) | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required
Notes
Download link: https://flare22.grand-challenge.org/Dataset/
The root folder should contain the two downloaded folders, namely "Training" and "Validation".
Examples:
>>> # Place the downloaded folders in any folder and pass the path to the constructor:
>>> ds = FLARE2022(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
# 2100
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 110)
>>> print(ds.mask(ds.ids[25]).shape)
# (512, 512, 104)
References
.. [1] Ma, Jun, et al. "Fast and Low-GPU-memory abdomen CT organ segmentation: The FLARE challenge." Medical Image Analysis 82 (2022): 102616.
ids()
image(id: str)
affine(id: str)
The 4x4 matrix that gives the image's spatial orientation
mask(id: str)
amid.hcp.HCP
ids()
image(id: str)
affine(id: str)
spacing(id: str)
amid.kits.KiTS23
The 2023 Kidney and Kidney Tumor Segmentation challenge (abbreviated KiTS23) is a competition in which teams compete to develop the best system for automatic semantic segmentation of kidneys, renal tumors, and renal cysts.
The competition page is https://kits-challenge.org/kits23/; the official competition repository is https://github.com/neheller/kits23/.
To obtain the data, clone the repository https://github.com/neheller/kits23/, install it and run kits23_download_data.
Parameters:
Name | Type | Description | Default
---|---|---|---
root |  |  | required
ids()
image(id: str)
mask(id: str)
affine(id: str)
The 4x4 matrix that gives the image's spatial orientation.
amid.lidc.dataset.LIDC
The (L)ung (I)mage (D)atabase (C)onsortium image collection (LIDC-IDRI) [1]_ consists of diagnostic and lung cancer screening thoracic computed tomography (CT) scans with marked-up annotated lesions and a lung nodule segmentation task. Scans contain multiple expert annotations.
Number of CT scans: 1018.
Parameters:
Name | Type | Description | Default
---|---|---|---
root | (str, Path) | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required
version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required
Notes
Follow the download instructions at https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=1966254.
Then, the folder with raw downloaded data should contain the folder LIDC-IDRI, which contains folders LIDC-IDRI-*.
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = LIDC(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
# 1018
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 194)
>>> print(ds.cancer(ds.ids[0]).shape)
# (512, 512, 194)
References
.. [1] Armato III, McLennan, et al. "The lung image database consortium (lidc) and image database resource initiative (idri): a completed reference database of lung nodules on ct scans." Medical physics 38(2) (2011): 915–931. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3041807/
ids()
image(id: str)
study_uid(id: str)
series_uid(id: str)
patient_id(id: str)
sop_uids(id: str)
pixel_spacing(id: str)
slice_locations(id: str)
voxel_spacing(id: str)
Returns voxel spacing along axes (x, y, z).
spacing(id: str)
Volumetric spacing of the image.
The maximum relative difference in slice_locations is < 1e-3 (except for the 4 images listed below), so we allow ourselves to use a common spacing for the whole 3D image.
Note
The slice_locations attribute typically (but not always!) has a constant step.
In the LIDC dataset, only 4 images have a relative difference in slice_locations > 1e-3:
1.3.6.1.4.1.14519.5.2.1.6279.6001.526570782606728516388531252230
1.3.6.1.4.1.14519.5.2.1.6279.6001.329334252028672866365623335798
1.3.6.1.4.1.14519.5.2.1.6279.6001.245181799370098278918756923992
1.3.6.1.4.1.14519.5.2.1.6279.6001.103115201714075993579787468219
These differences appear in at most 3 slices, so we consider their impact negligible.
contrast_used(id: str)
If the DICOM file for the scan had any Contrast tag, this is marked as True.
is_from_initial(id: str)
Indicates whether or not this PatientID was tagged as part of the initial 399 release.
orientation_matrix(id: str)
sex(id: str)
age(id: str)
conv_kernel(id: str)
kvp(id: str)
tube_current(id: str)
study_date(id: str)
accession_number(id: str)
nodules(id: str)
nodules_masks(id: str)
cancer(id: str)
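Since spacing returns a single volumetric spacing for the whole image (see the note above), physical volumes can be estimated directly from a mask. A minimal sketch, assuming cancer returns a binary mask and spacing is in millimetres (the root path is a placeholder):
>>> import numpy as np
>>> ds = LIDC(root='/path/to/downloaded/data/folder/')
>>> i = ds.ids[0]
>>> cancer_mask = ds.cancer(i)
>>> voxel_volume = np.prod(ds.spacing(i))    # mm^3 per voxel
>>> print(cancer_mask.sum() * voxel_volume)  # annotated volume in mm^3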
amid.lits.dataset.LiTS
A (Li)ver (T)umor (S)egmentation dataset [1] from Medical Segmentation Decathlon [2]
There are two segmentation tasks on this dataset: liver and liver tumor segmentation.
Parameters:
Name | Type | Description | Default
---|---|---|---
root | (str, Path) | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required
version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required
Notes
Follow the download instructions at https://competitions.codalab.org/competitions/17094.
Then, the folder with raw downloaded data should contain two zip archives with the train data (Training_Batch1.zip and Training_Batch2.zip) and a folder with the test data (LITS-Challenge-Test-Data).
The folder with test data should have the original structure:
<...>/LITS-Challenge-Test-Data/test-volume-0.nii
<...>/LITS-Challenge-Test-Data/test-volume-1.nii
...
P.S. Organ boxes are also provided from a separate source: https://github.com/superxuang/caffe_3d_faster_rcnn.
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = LiTS(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
# 201
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 163)
>>> print(ds.tumor_mask(ds.ids[80]).shape)
# (512, 512, 771)
References
.. [1] Bilic, Patrick, et al. "The liver tumor segmentation benchmark (lits)." arXiv preprint arXiv:1901.04056 (2019).
.. [2] Antonelli, Michela, et al. "The medical segmentation decathlon." arXiv preprint arXiv:2106.05735 (2021).
ids()
fold(id: str)
image(id: str)
affine(id: str)
The 4x4 matrix that gives the image's spatial orientation.
voxel_spacing(id: str)
spacing(id: str)
Returns voxel spacing along axes (x, y, z).
mask(id: str)
amid.liver_medseg.LiverMedseg
LiverMedseg is a public CT segmentation dataset with 50 annotated images: a case collection of 50 livers with their segments. The images were obtained from the Medical Segmentation Decathlon competition.
Parameters:
Name | Type | Description | Default
---|---|---|---
root | (str, Path) | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required
version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required
Notes
Download links: https://www.medseg.ai/database/liver-segments-50-cases
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = LiverMedseg(root='/path/to/archives/root')
>>> print(len(ds.ids))
# 50
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 38)
ids()
image(id: str) -> np.ndarray
affine(id: str) -> np.ndarray
The 4x4 matrix that gives the image's spatial orientation.
voxel_spacing(id: str) -> tuple
spacing(id: str) -> tuple
mask(id: str) -> np.ndarray
amid.midrc.MIDRC
MIDRC-RICORD dataset 1a is a public COVID-19 CT segmentation dataset with 120 scans.
Parameters:
Name | Type | Description | Default
---|---|---|---
root | (str, Path) | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required
version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required
Notes
Follow the download instructions at https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=80969742 and download both Images and Annotations to the same folder.
Then, the folder with downloaded data should contain two paths with the data and have this structure:
<...>/
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = MIDRC(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
# 155
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 112)
>>> print(ds.mask(ds.ids[80]).shape)
# (6, 512, 512, 450)
ids()
image(id: str)
image_meta(id: str)
spacing(id: str)
labels(id: str)
mask(id: str)
amid.mood.MOOD
A (M)edical (O)ut-(O)f-(D)istribution analysis challenge [1]_.
This dataset contains raw brain MRI and abdominal CT images.
Number of training samples:
- Brain: 800 scans (256 x 256 x 256)
- Abdominal: 550 scans (512 x 512 x 512)
For each setup there are 4 toy test samples with OOD cases.
Parameters:
Name | Type | Description | Default
---|---|---|---
root | (str, Path) | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required
version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required
Notes
Follow the download instructions at https://www.synapse.org/#!Synapse:syn21343101/wiki/599515.
Then, the folder with raw downloaded data should contain four zip archives with the data (abdom_toy.zip, abdom_train.zip, brain_toy.zip and brain_train.zip).
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = MOOD(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
# 1358
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 512)
>>> print(ds.pixel_label(ds.ids[0]).shape)
# (512, 512, 512)
References
.. [1] Zimmerer, Petersen, et al. "Medical Out-of-Distribution Analysis Challenge 2022." doi: 10.5281/zenodo.6362313 (2022).
ids()
fold(id: str)
Returns fold: train or toy (test).
task(id: str)
Returns task: brain (MRI) or abdominal (CT).
image(id: str)
affine(id: str)
The 4x4 matrix that gives the image's spatial orientation.
voxel_spacing(id: str)
spacing(id: str)
Returns voxel spacing along axes (x, y, z).
sample_label(id: str)
Returns sample-level OOD score for toy examples and None otherwise. 0 indicates no abnormality and 1 indicates abnormal input.
pixel_label(id: str)
Returns voxel-level OOD scores for toy examples and None otherwise. 0 indicates no abnormality and 1 indicates abnormal input.
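For example, the fold and task accessors can be used to pick out the toy test cases, for which sample-level OOD labels are available. A minimal sketch, assuming the fold/task strings match the docstrings above (the root path is a placeholder):
>>> ds = MOOD(root='/path/to/downloaded/data/folder/')
>>> toy_brain_ids = [i for i in ds.ids if ds.fold(i) == 'toy' and ds.task(i) == 'brain']
>>> labels = [ds.sample_label(i) for i in toy_brain_ids]  # 0 = normal, 1 = abnormal
>>> print(len(toy_brain_ids), labels)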
amid.msd.MSD
MSD is the Medical Segmentation Decathlon challenge with 10 tasks.
Parameters:
Name | Type | Description | Default
---|---|---|---
root | (str, Path) | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required
version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required
Notes
Data can be downloaded here: http://medicaldecathlon.com/
or here: https://msd-for-monai.s3-us-west-2.amazonaws.com/
or here: https://drive.google.com/drive/folders/1HqEgzS8BV2c7xYNrZdEAnrHk7osJJ--2/
Then, the folder with raw downloaded data should contain the tar archive with data and masks (e.g. Task03_Liver.tar).
ids() -> tuple
train_test(id: str) -> str
task(id: str) -> str
image(id: str)
affine(id: str)
The 4x4 matrix that gives the image's spatial orientation.
image_modality(id: str) -> str
segmentation_labels(id: str) -> dict
Returns segmentation labels for the task
mask(id: str)
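For example, the task and train_test accessors can be used to select the training cases of a single task. A minimal sketch; the exact task string ('Task03_Liver') and split string ('train') are assumptions for illustration, not confirmed by the docs (the root path is a placeholder):
>>> ds = MSD(root='/path/to/downloaded/data/folder/')
>>> liver_train = [i for i in ds.ids
...                if ds.task(i) == 'Task03_Liver' and ds.train_test(i) == 'train']
>>> print(len(liver_train))
>>> print(ds.segmentation_labels(liver_train[0]))  # mapping of label indices to structures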
amid.mslub.dataset.MSLUB
ids()
image(id: str)
mask(id: str)
patient(id: str)
affine(id: str)
amid.medseg9.Medseg9
Medseg9 is a public COVID-19 CT segmentation dataset with 9 annotated images.
Parameters:
Name | Type | Description | Default
---|---|---|---
root | (str, Path) | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required
version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required
Notes
Data can be downloaded here: http://medicalsegmentation.com/covid19/.
Then, the folder with raw downloaded data should contain three zip archives with data and masks (rp_im.zip, rp_lung_msk.zip, rp_msk.zip).
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = Medseg9(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
# 9
>>> print(ds.image(ds.ids[0]).shape)
# (630, 630, 45)
>>> print(ds.covid(ds.ids[0]).shape)
# (630, 630, 45)
ids()
image(id: str)
affine(id: str)
The 4x4 matrix that gives the image's spatial orientation.
voxel_spacing(id: str)
spacing(id: str)
Returns voxel spacing along axes (x, y, z).
lungs(id: str)
covid(id: str)
int16 mask: 0 - normal, 1 - ground-glass opacities, 2 - consolidation.
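Given the class encoding of the covid mask above, per-class lesion volumes can be computed with the voxel spacing. A minimal sketch, assuming voxel_spacing is in millimetres (the root path is a placeholder):
>>> import numpy as np
>>> ds = Medseg9(root='/path/to/downloaded/data/folder/')
>>> i = ds.ids[0]
>>> covid_mask = ds.covid(i)
>>> voxel_volume = np.prod(ds.voxel_spacing(i))                    # mm^3 per voxel
>>> ggo_volume = (covid_mask == 1).sum() * voxel_volume            # ground-glass opacities
>>> consolidation_volume = (covid_mask == 2).sum() * voxel_volume  # consolidation
>>> print(ggo_volume, consolidation_volume)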
amid.cancer_500.dataset.MoscowCancer500
The Moscow Radiology Cancer-500 dataset.
Parameters:
Name | Type | Description | Default
---|---|---|---
root | (str, Path) | path to the folder containing the raw downloaded files. If not provided, the cache is assumed to be already populated. | required
version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required
Notes
Download link: https://mosmed.ai/en/datasets/mosmeddata-kt-s-priznakami-raka-legkogo-tip-viii/
After pressing the download button you will have to provide an email address to which further instructions will be sent.
Examples:
>>> # Place the downloaded files in any folder and pass the path to the constructor:
>>> ds = MoscowCancer500(root='/path/to/files/root')
>>> print(len(ds.ids))
# 979
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 67)
ids()
image(id: str)
study_uid(id: str)
series_uid(id: str)
sop_uids(id: str)
pixel_spacing(id: str)
slice_locations(id: str)
orientation_matrix(id: str)
instance_numbers(id: str)
conv_kernel(id: str)
kvp(id: str)
patient_id(id: str)
study_date(id: str)
accession_number(id: str)
nodules(id: str)
amid.covid_1110.MoscowCovid1110
The Moscow Radiology COVID-19 dataset.
Parameters:
Name | Type | Description | Default
---|---|---|---
root | (str, Path) | path to the folder containing the raw downloaded files. If not provided, the cache is assumed to be already populated. | required
version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required
Notes
Download links: https://mosmed.ai/en/datasets/covid191110/
Examples:
>>> # Place the downloaded files in any folder and pass the path to the constructor:
>>> ds = MoscowCovid1110(root='/path/to/files/root')
>>> print(len(ds.ids))
# 1110
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 43)
ids()
image(id: str)
affine(id: str)
label(id: str)
mask(id: str)
amid.nlst.NLST
Dataset with low-dose CT scans of 26,254 patients acquired during the National Lung Screening Trial.
Parameters:
Name | Type | Description | Default
---|---|---|---
root | (str, Path) | path to the folder (usually called NLST) containing the patient subfolders (like 101426). If not provided, the cache is assumed to be already populated. | required
version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required
Notes
Follow the download instructions at https://wiki.cancerimagingarchive.net/display/NLST/National+Lung+Screening+Trial.
The DICOM files should be placed under the following folder structure:
<...>/
Examples:
>>> ds = NLST(root='/path/to/NLST/')
>>> print(len(ds.ids))
...
>>> print(ds.image(ds.ids[0]).shape)
...
>>> print(ds.mask(ds.ids[80]).shape)
...
ids()
image(id: str)
study_uid(id: str)
series_uid(id: str)
sop_uids(id: str)
pixel_spacing(id: str)
slice_locations(id: str)
orientation_matrix(id: str)
conv_kernel(id: str)
kvp(id: str)
patient_id(id: str)
study_date(id: str)
accession_number(id: str)
amid.nsclc.NSCLC
NSCLC-Radiomics is a public non-small cell lung cancer segmentation dataset with 422 patients.
Parameters:
Name | Type | Description | Default
---|---|---|---
root | (str, Path) | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required
version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required
Notes
Follow the download instructions at https://wiki.cancerimagingarchive.net/display/Public/NSCLC-Radiomics.
The folder with downloaded data should contain two paths and have this structure:
<...>/
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = NSCLC(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
# 422
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 134)
>>> print(ds.mask(ds.ids[80]).shape)
# (512, 512, 108)
ids()
image(id: str)
image_meta(id: str)
sex(id: str) -> str
Sex of the patient.
age(id: str) -> Union[int, None]
Age of the patient; the dataset contains 97 patients with unknown age.
spacing(id: str)
mask(id: str)
lung_left(id: str)
lung_right(id: str)
lungs_total(id: str)
heart(id: str)
esophagus(id: str)
spinal_cord(id: str)
amid.rsna_bc.dataset.RSNABreastCancer
site_id(id: str)
patient_id(id: str)
image_id(id: str)
laterality(id: str)
view(id: str)
age(id: str)
cancer(id: str)
biopsy(id: str)
invasive(id: str)
BIRADS(id: str)
implant(id: str)
density(id: str)
machine_id(id: str)
prediction_id(id: str)
difficult_negative_case(id: str)
ids()
image(id: str)
padding_value(id: str)
intensity_sign(id: str)
amid.ribfrac.dataset.RibFrac
The RibFrac dataset is a benchmark for developing algorithms for rib fracture detection, segmentation and classification. We hope this large-scale dataset will facilitate both clinical research on automatic rib fracture detection and diagnosis, and engineering research on 3D detection, segmentation and classification.
Parameters:
Name | Type | Description | Default
---|---|---|---
root | (str, Path) | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required
Notes
The data can be downloaded here:
- https://doi.org/10.5281/zenodo.3893507 -- train Part1 (300 images)
- https://doi.org/10.5281/zenodo.3893497 -- train Part2 (120 images)
- https://doi.org/10.5281/zenodo.3893495 -- val (80 images)
- https://zenodo.org/record/3993380 -- test (160 images without annotation)
References
.. [1] Jiancheng Yang, Liang Jin, Bingbing Ni, & Ming Li. (2020). RibFrac Dataset: A Benchmark for Rib Fracture Detection, Segmentation and Classification.
ids()
image(id: str)
label(id: str)
affine(id: str)
The 4x4 matrix that gives the image's spatial orientation
amid.stanford_coca.StanfordCoCa
Stanford AIMI's Co(ronary) Ca(lcium) dataset.
Parameters:
Name | Type | Description | Default
---|---|---|---
root | (str, Path) | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required
version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required
Notes
Follow the download instructions at https://stanfordaimi.azurewebsites.net/datasets/e8ca74dc-8dd4-4340-815a-60b41f6cb2aa. You'll need to register and accept the terms of use. After that, copy the files from Azure:
azcopy copy 'some-generated-access-link' /path/to/downloaded/data/ --recursive=true
Then, the folder with raw downloaded data should contain two subfolders: a subset with gated coronary CT scans and the corresponding coronary calcium segmentation masks (Gated_release_final), and a folder with the non-gated CT scans with the corresponding coronary artery calcium scores (deidentified_nongated).
The folder with gated data should have the original structure:
./Gated_release_final/patient/0/folder-with-dcms/
./Gated_release_final/calcium_xml/0.xml
...
The folder with nongated data should have the original structure:
./deidentified_nongated/0/folder-with-dcms/
...
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = StanfordCoCa(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
# 971
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 57)
ids()
image(id: str)
series_uid(id: str)
study_uid(id: str)
pixel_spacing(id: str)
slice_locations(id: str)
orientation_matrix(id: str)
calcifications(id: str)
Returns list of Calcifications
score(id: str)
amid.tbad.TBAD
A dataset of 3D Computed Tomography (CT) images for Type-B Aortic Dissection segmentation.
Notes
The data can only be obtained by contacting the authors by email. See the dataset home page for details.
Parameters:
Name | Type | Description | Default
---|---|---|---
root | (str, Path) | path to the folder containing the raw downloaded files. If not provided, the cache is assumed to be already populated. | required
version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required
Examples:
>>> # Place the downloaded files in any folder and pass the path to the constructor:
>>> ds = TBAD(root='/path/to/files/root')
>>> print(len(ds.ids))
# 100
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 327)
References
.. [1] Yao, Zeyang & Xie, Wen & Zhang, Jiawei & Dong, Yuhao & Qiu, Hailong & Haiyun, Yuan & Jia, Qianjun & Tianchen, Wang & Shi, Yiyi & Zhuang, Jian & Que, Lifeng & Xu, Xiaowei & Huang, Meiping. (2021). ImageTBAD: A 3D Computed Tomography Angiography Image Dataset for Automatic Segmentation of Type-B Aortic Dissection. Frontiers in Physiology. 12. 732711. 10.3389/fphys.2021.732711.
ids()
image(id: str)
affine(id: str)
The 4x4 matrix that gives the image's spatial orientation.
mask(id: str)
amid.totalsegmentator.dataset.Totalsegmentator
In 1204 CT images we segmented 104 anatomical structures (27 organs, 59 bones, 10 muscles, 8 vessels) covering a majority of relevant classes for most use cases.
The CT images were randomly sampled from clinical routine, thus representing a real world dataset which generalizes to clinical application.
The dataset contains a wide range of different pathologies, scanners, sequences and institutions. [1]
Parameters:
Name | Type | Description | Default
---|---|---|---
root | (str, Path) | absolute path to the downloaded archive. If not provided, the cache is assumed to be already populated. | required
Notes
Download link: https://zenodo.org/record/6802614/files/Totalsegmentator_dataset.zip
Examples:
>>> # Download the archive to any folder and pass the path to the constructor:
>>> ds = Totalsegmentator(root='/path/to/the/downloaded/archive')
>>> print(len(ds.ids))
# 1204
>>> print(ds.image(ds.ids[0]).shape)
# (294, 192, 179)
>>> print(ds.aorta(ds.ids[25]).shape)
# (320, 320, 145)
References
.. [1] Jakob Wasserthal (2022) Dataset with segmentations of 104 important anatomical structures in 1204 CT images. Available at: https://zenodo.org/record/6802614#.Y6M2MxXP1D8
ids()
image(id: str)
affine(id: str)
The 4x4 matrix that gives the image's spatial orientation
amid.upenn_gbm.upenn_gbm.UPENN_GBM
Multi-parametric magnetic resonance imaging (mpMRI) scans for de novo Glioblastoma (GBM) patients from the University of Pennsylvania Health System (UPENN-GBM). Dataset contains 630 patients.
All samples are registered to a common atlas (SRI) using a uniform preprocessing, and the segmentations are aligned with them.
Parameters:
Name | Type | Description | Default
---|---|---|---
root | (str, Path) | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required
Notes
Follow the download instructions at https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=70225642 and download the NIfTI images and metadata to the root folder. Organise the folder as follows:
<...>/
<...>/
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = UPENN_GBM(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
# 671
>>> print(ds.image(ds.ids[215]).shape)
# (4, 240, 240, 155)
>>> print(ds.acqusition_info(ds.ids[215]).manufacturer)
# SIEMENS
References
.. [1] Bakas, S., Sako, C., Akbari, H., Bilello, M., Sotiras, A., Shukla, G., Rudie, J. D., Flores Santamaria, N., Fathi Kazerooni, A., Pati, S., Rathore, S., Mamourian, E., Ha, S. M., Parker, W., Doshi, J., Baid, U., Bergman, M., Binder, Z. A., Verma, R., … Davatzikos, C. (2021). Multi-parametric magnetic resonance imaging (mpMRI) scans for de novo Glioblastoma (GBM) patients from the University of Pennsylvania Health System (UPENN-GBM) (Version 2) [Data set]. The Cancer Imaging Archive. https://doi.org/10.7937/TCIA.709X-DN49
ids()
modalities(id: str)
dsc_modalities(id: str)
dti_modalities(id: str)
mask(id: str)
is_mask_automated(id: str)
image(id: str)
image_unstripped(id: str)
image_DTI(id: str)
image_DSC(id: str)
clinical_info(id: str) -> ClinicalInfo
acqusition_info(id: str) -> AcquisitionInfo
subject_id(id: str)
affine(id: str)
spacing(id: str)
amid.vs_seg.dataset.VSSEG
Segmentation of vestibular schwannoma from MRI, an open annotated dataset ... (VS-SEG) [1]_.
The dataset contains 250 pairs of T1c and T2 images of the brain with the vestibular schwannoma segmentation task.
Parameters:
Name | Type | Description | Default
---|---|---|---
root | (str, Path) | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required
version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required
Notes
The dataset and corresponding metadata can be downloaded from the TCIA page: https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=70229053.
To download the DICOM images using the .tcia file, we used a public build of the TCIA downloader: https://github.com/ygidtu/NBIA_data_retriever_CLI.
Then, download the rest of the metadata from the TCIA page:
- DirectoryNamesMappingModality.csv
- Vestibular-Schwannoma-SEG_matrices Mar 2021.zip
- Vestibular-Schwannoma-SEG contours Mar 2021.zip
and unzip the latter two .zip archives.
So the root folder should contain 3 folders and 1 .csv file:
<...>/DirectoryNamesMappingModality.csv
<...>/Vestibular-Schwannoma-SEG/
├── VS-SEG-001/...
├── VS-SEG-002/...
└── ...
<...>/contours/
<...>/registration_matrices/
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = VSSEG(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
# 484
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 120)
>>> print(ds.schwannoma(ds.ids[1]).shape)
# (384, 384, 80)
References
.. [1] Shapey, Jonathan, et al. "Segmentation of vestibular schwannoma from MRI, an open annotated dataset and baseline algorithm." Scientific Data 8.1 (2021): 1-6. https://www.nature.com/articles/s41597-021-01064-w
ids()
modality(id: str)
subject_id(id: str)
image(id: str)
spacing(id: str)
The maximum relative difference in slice_locations is < 1e-12, so we allow ourselves to use a common spacing for the whole 3D image.
schwannoma(id: str)
cochlea(id: str)
meningioma(id: str)
study_uid(id: str)
series_uid(id: str)
patient_id(id: str)
study_date(id: str)
amid.verse.VerSe
A Vertebral Segmentation Dataset with Fracture Grading [1]_
The dataset was used in the MICCAI-2019 and MICCAI-2020 Vertebrae Segmentation Challenges.
Parameters:
Name | Type | Description | Default
---|---|---|---
root | (str, Path) | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required
version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required
Notes
Download links:
2019: https://osf.io/jtfa5/
2020: https://osf.io/4skx2/
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = VerSe(root='/path/to/archives/root')
>>> print(len(ds.ids))
# 374
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 214)
References
.. [1] Löffler MT, Sekuboyina A, Jacob A, et al. A Vertebral Segmentation Dataset with Fracture Grading. Radiol Artif Intell. 2020;2(4):e190138. Published 2020 Jul 29. doi:10.1148/ryai.2020190138
ids()
image(id: str)
affine(id: str)
The 4x4 matrix that gives the image's spatial orientation
split(id: str)
The split in which this entry is contained: training, validate, test
patient(id: str)
The unique patient id
year(id: str)
The year in which this entry was published: 2019, 2020
centers(id: str)
Vertebrae centers in format {label: [x, y, z]}
masks(id: str) -> Union[np.ndarray, None]
Vertebrae masks
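For example, the split accessor can be used to select the training cases, while centers and masks give the vertebra annotations. A minimal sketch, assuming the split string matches the docstring above (the root path is a placeholder):
>>> import numpy as np
>>> ds = VerSe(root='/path/to/archives/root')
>>> train_ids = [i for i in ds.ids if ds.split(i) == 'training']
>>> i = train_ids[0]
>>> centers = ds.centers(i)  # {vertebra label: [x, y, z]}
>>> mask = ds.masks(i)       # labeled vertebrae mask, or None if not annotated
>>> print(sorted(centers), None if mask is None else np.unique(mask))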