Datasets API
        amid.amos.dataset.AMOS
  AMOS provides 500 CT and 100 MRI scans collected from multi-center, multi-vendor, multi-modality, multi-phase, multi-disease patients, each with voxel-level annotations of 15 abdominal organs, providing challenging examples and test-bed for studying robust segmentation algorithms under diverse targets and scenarios. [1]
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| root | str, Path | Absolute path to the root containing the downloaded archive and meta. If not provided, the cache is assumed to be already populated. | required | 
Notes
Download link: https://zenodo.org/record/7262581/files/amos22.zip
Examples:
>>> # Download the archive and meta to any folder and pass the path to the constructor:
>>> ds = AMOS(root='/path/to/the/downloaded/files')
>>> print(len(ds.ids))
# 961
>>> print(ds.image(ds.ids[0]).shape)
# (768, 768, 90)
>>> print(ds.mask(ds.ids[26]).shape)
# (512, 512, 124)
References
.. [1] JI YUANFENG. (2022). Amos: A large-scale abdominal multi-organ benchmark for versatile medical image segmentation [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7262581
ids()
  
image(id: str)
  
affine(id: str)
  The 4x4 matrix that gives the image's spatial orientation
mask(id: str)
  
        amid.bimcv.BIMCVCovid19
  BIMCV COVID-19 dataset, CT images only. It includes the BIMCV COVID-19 positive partition (https://arxiv.org/pdf/2006.01174.pdf) and the negative partition (https://ieee-dataport.org/open-access/bimcv-covid-19-large-annotated-dataset-rx-and-ct-images-covid-19-patients-0)
PCR tests are not used
GitHub page: https://github.com/BIMCV-CSUSP/BIMCV-COVID-19
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| root | str, Path | path to the folder containing the downloaded and parsed data. | required | 
Notes
The dataset has 2 partitions: bimcv-covid19-positive and bimcv-covid19-negative. Each partition is spread over 81 different tgz archives. The archives include metadata about subjects, sessions, and labels. There are also tgz archives with NIfTI images in nii.gz format.
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = BIMCVCovid19(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
# 201
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 163)
>>> print(ds.is_positive(ds.ids[0]))
# True
>>> print(ds.subject_info(ds.ids[80]))
# {'modality_dicom': "['CT']",
#  'body_parts': "[['chest']]",
#  'age': '[80]',
#  'gender': 'M'}
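The values printed above are strings holding Python-style literals (for example `'[80]'`). A minimal sketch of decoding them with the standard library, assuming every record follows the format of the printed example; `parse_subject_info` is a hypothetical helper and not part of the library:

```python
import ast


def parse_subject_info(info: dict) -> dict:
    """Decode string-encoded literals such as "['CT']" or '[80]'.

    Hypothetical helper: values that are not valid Python literals
    (e.g. the plain string 'M') are returned unchanged.
    """
    parsed = {}
    for key, value in info.items():
        try:
            parsed[key] = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            parsed[key] = value
    return parsed


# The record below copies the example output shown above.
raw = {
    'modality_dicom': "['CT']",
    'body_parts': "[['chest']]",
    'age': '[80]',
    'gender': 'M',
}
info = parse_subject_info(raw)
print(info['age'][0], info['body_parts'][0][0], info['gender'])
```

`ast.literal_eval` only evaluates literals, so it is a safer choice than `eval` for metadata read from disk.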
References
.. [1] Maria De La Iglesia Vayá, Jose Manuel Saborit, Joaquim Angel Montell, Antonio Pertusa, Aurelia Bustos, Miguel Cazorla, Joaquin Galant, Xavier Barber, Domingo Orozco-Beltrán, Francisco Garcia, Marisa Caparrós, Germán González, and Jose María Salinas. BIMCV COVID-19+: a large annotated dataset of RX and CT images from COVID-19 patients. arXiv:2006.01174, 2020.
.. [2] Maria de la Iglesia Vayá, Jose Manuel Saborit-Torres, Joaquim Angel Montell Serrano, Elena Oliver-Garcia, Antonio Pertusa, Aurelia Bustos, Miguel Cazorla, Joaquin Galant, Xavier Barber, Domingo Orozco-Beltrán, Francisco García-García, Marisa Caparrós, Germán González, Jose María Salinas, 2021. BIMCV COVID-19-: a large annotated dataset of RX and CT images from COVID-19 patients. Available at: https://dx.doi.org/10.21227/m4j2-ap59.
ids()
  
session_id(id: str)
  
subject_id(id: str)
  
is_positive(id: str)
  
image(id: str)
  
affine(id: str)
  
tags(id: str) -> dict
  DICOM tags
label_info(id: str) -> dict
  labelCUIS, Report, LocalizationsCUIS etc.
subject_info(id: str) -> dict
  modality_dicom (=[CT]), body_parts(=[chest]), age, gender
session_info(id: str) -> dict
  study_date, medical_evaluation
        amid.brats2021.BraTS2021
  Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| root | str, Path | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required | 
| version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required | 
Notes
Download links: 2021: http://www.braintumorsegmentation.org/
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = BraTS2021(root='/path/to/archives/root')
>>> print(len(ds.ids))
# 5880
>>> print(ds.image(ds.ids[0]).shape)
# (240, 240, 155)
ids()
  
fold(id: str)
  
mapping21_17(id: str) -> pd.DataFrame
  
subject_id(id: str) -> str
  
modality(id: str) -> str
  
image(id: str)
  
mask(id: str)
  
spacing(id: str)
  Returns the voxel spacing along axes (x, y, z).
affine(id: str)
  Returns 4x4 matrix that gives the image's spatial orientation.
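The voxel size along each voxel axis equals the Euclidean norm of the corresponding column of the affine's upper-left 3x3 block. A minimal sketch in plain Python; `spacing_from_affine` is a hypothetical helper, the affine values are illustrative, and nothing here is claimed to be how `spacing` is computed inside the library:

```python
import math


def spacing_from_affine(affine):
    """Return voxel sizes as the column norms of the upper-left 3x3 block."""
    return tuple(
        math.sqrt(sum(affine[row][col] ** 2 for row in range(3)))
        for col in range(3)
    )


# Illustrative affine: 1 x 1 x 2.5 mm voxels, flipped x axis, shifted origin.
affine = [
    [-1.0, 0.0, 0.0, 120.0],
    [0.0, 1.0, 0.0, -95.5],
    [0.0, 0.0, 2.5, -40.0],
    [0.0, 0.0, 0.0, 1.0],
]
spacing = spacing_from_affine(affine)
print(spacing)  # (1.0, 1.0, 2.5)
```

The helper only uses `affine[row][col]` indexing, so it accepts nested lists as well as array-like objects that support that form of indexing.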
        amid.cc359.dataset.CC359
  A (C)algary-(C)ampinas public brain MR dataset with (359) volumetric images [1]_.
There are three segmentation tasks on this dataset: (i) brain, (ii) hippocampus, and (iii) White-Matter (WM), Gray-Matter (GM), and Cerebrospinal Fluid (CSF) segmentation.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| root | str, Path | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required | 
| version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required | 
Notes
homepage (upd): https://sites.google.com/view/calgary-campinas-dataset/home
homepage (old): https://miclab.fee.unicamp.br/calgary-campinas-359-updated-05092017
To obtain MR images and brain and hippocampus segmentation masks, please, follow the instructions at the download platform: https://portal.conp.ca/dataset?id=projects/calgary-campinas.
Using the datalad library, download three zip archives:
    - Original.zip (the original MR images)
    - hippocampus_staple.zip (Silver-standard hippocampus masks generated using STAPLE)
    - Silver-standard-machine-learning.zip (Silver-standard brain masks generated using a machine learning method)
As of this writing, the WM, GM, and CSF masks can be downloaded only from Google Drive: https://drive.google.com/drive/u/0/folders/0BxLb0NB2MjVZNm9JY1pWNFp6WTA?resourcekey=0-2sXMr8q-n2Nn6iY3PbBAdA.
From the Google Drive root above, manually download the folder
CC359/Reconstructed/CC359/WM-GM-CSF/
So the root folder to pass to this dataset class should contain four objects:
    - three zip archives (Original.zip, hippocampus_staple.zip, and Silver-standard-machine-learning.zip)
    - one folder WM-GM-CSF with the original structure:
        <...>/WM-GM-CSF/CC0319_ge_3_45_M.nii.gz
        <...>/WM-GM-CSF/CC0324_ge_3_56_M.nii.gz
        ...
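Before constructing the dataset, it can help to confirm that the root folder holds the four objects listed above. A minimal sketch using only the standard library; `missing_cc359_entries` is a hypothetical helper and the library itself may validate the folder differently:

```python
import tempfile
from pathlib import Path

# Names taken from the notes above.
REQUIRED_ARCHIVES = (
    'Original.zip',
    'hippocampus_staple.zip',
    'Silver-standard-machine-learning.zip',
)
REQUIRED_FOLDER = 'WM-GM-CSF'


def missing_cc359_entries(root) -> list:
    """Return the names of required entries that are absent from `root`."""
    root = Path(root)
    missing = [name for name in REQUIRED_ARCHIVES if not (root / name).is_file()]
    if not (root / REQUIRED_FOLDER).is_dir():
        missing.append(REQUIRED_FOLDER)
    return missing


# Demonstration on a temporary folder that holds only one archive.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / 'Original.zip').touch()
    missing = missing_cc359_entries(tmp)
    print(missing)
```

An empty list means all four objects were found and the path can be passed as `root`.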
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> cc359 = CC359(root='/path/to/downloaded/data/folder/')
>>> print(len(cc359.ids))
# 359
>>> print(cc359.image(cc359.ids[0]).shape)
# (171, 256, 256)
>>> print(cc359.wm_gm_csf(cc359.ids[80]).shape)
# (180, 240, 240)
References
.. [1] Souza, Roberto, et al. "An open, multi-vendor, multi-field-strength brain MR dataset and analysis of publicly available skull stripping methods agreement." NeuroImage 170 (2018): 482-494. https://www.sciencedirect.com/science/article/pii/S1053811917306687
ids()
  
vendor(id: str)
  
field(id: str)
  
age(id: str)
  
gender(id: str)
  
image(id: str)
  
affine(id: str)
  The 4x4 matrix that gives the image's spatial orientation.
voxel_spacing(id: str)
  
spacing(id: str)
  Returns voxel spacing along axes (x, y, z).
brain(id: str)
  
hippocampus(id: str)
  
wm_gm_csf(id: str)
  
        amid.cl_detection.CLDetection2023
  The data for the "Cephalometric Landmark Detection in Lateral X-ray Images" Challenge, held with the MICCAI-2023 conference.
Notes
The data can only be obtained by contacting the organizers by email. See the challenge home page for details.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| root | str, Path | path to the folder containing the raw downloaded and unarchived data. If not provided, the cache is assumed to be already populated. | required | 
| version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required | 
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = CLDetection2023(root='/path/to/data/root/folder')
>>> print(len(ds.ids))
# 400
>>> print(ds.image(ds.ids[0]).shape)
# (2400, 1935)
ids()
  
image(id: str)
  
points(id: str)
  
spacing(id: str)
  
normalizer(id: str)
  classmethod
        amid.crlm.CRLM
  Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| root | str, Path | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required | 
| version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required | 
Notes
Download links: https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=89096268#89096268b2cc35fce0664a2b875b5ec675ba9446
This collection consists of DICOM images and DICOM Segmentation Objects (DSOs) for 197 patients with Colorectal Liver Metastases (CRLM). It comprises the original DICOM CTs and segmentations for each subject. The segmentations include 'Liver', 'Liver_Remnant' (liver that will remain after surgery based on a preoperative CT plan), 'Hepatic' and 'Portal' veins, and 'Tumor_x', where 'x' denotes the various tumor occurrences in the case.
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = CRLM(root='/path/to/archives/root')
>>> print(len(ds.ids))
# 197
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 52)
ids()
  
image(id: str)
  
mask(id: str) -> Dict[str, np.ndarray]
  Returns dict: {'liver': ..., 'hepatic': ..., 'tumor_x': ...}
spacing(id: str)
  Returns the voxel spacing along axes (x, y, z).
slice_locations(id: str)
  
affine(id: str)
  Returns 4x4 matrix that gives the image's spatial orientation.
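Because `mask` returns one entry per tumor occurrence, a single binary tumor mask has to be assembled from the dictionary. A minimal sketch, assuming the tumor keys start with `tumor_` (matched case-insensitively, since the notes spell it `Tumor_x`) and using small nested lists in place of real volumes; `union` and `merge_tumors` are hypothetical helpers:

```python
from functools import reduce


def union(a, b):
    """Element-wise logical OR of two equally shaped nested lists."""
    if isinstance(a, list):
        return [union(x, y) for x, y in zip(a, b)]
    return bool(a) or bool(b)


def merge_tumors(masks: dict):
    """Combine every 'tumor_*' entry into one binary mask (None if there is none)."""
    tumors = [m for name, m in masks.items() if name.lower().startswith('tumor_')]
    return reduce(union, tumors) if tumors else None


# Toy 2 x 2 masks standing in for the real volumes.
masks = {
    'liver': [[1, 1], [1, 1]],
    'tumor_1': [[1, 0], [0, 0]],
    'tumor_2': [[0, 0], [0, 1]],
}
merged = merge_tumors(masks)
print(merged)
```

With NumPy arrays, as returned according to the signature above, `np.logical_or.reduce(tumors)` should give the same result without the recursive helper.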
        amid.ct_ich.CT_ICH
  (C)omputed (T)omography Images for (I)ntracranial (H)emorrhage Detection and (S)egmentation.
This dataset contains 75 head CT scans including 36 scans for patients diagnosed with intracranial hemorrhage with the following types: Intraventricular, Intraparenchymal, Subarachnoid, Epidural and Subdural.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| root | str, Path | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required | 
| version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required | 
Notes
Data can be downloaded here: https://physionet.org/content/ct-ich/1.3.1/.
Then, the folder with raw downloaded data should contain folders ct_scans and masks along with other files.
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = CT_ICH(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
# 75
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 39)
>>> print(ds.mask(ds.ids[0]).shape)
# (512, 512, 39)
ids()
  
image(id: str)
  
mask(id: str)
  
affine(id: str)
  The 4x4 matrix that gives the image's spatial orientation.
voxel_spacing(id: str)
  
spacing(id: str)
  Returns voxel spacing along axes (x, y, z).
age(id: str)
  
gender(id: str)
  
intraventricular_hemorrhage(id: str)
  Returns True if hemorrhage was diagnosed and its type is intraventricular.
intraparenchymal_hemorrhage(id: str)
  Returns True if hemorrhage was diagnosed and its type is intraparenchymal.
subarachnoid_hemorrhage(id: str)
  Returns True if hemorrhage was diagnosed and its type is subarachnoid.
epidural_hemorrhage(id: str)
  Returns True if hemorrhage was diagnosed and its type is epidural.
subdural_hemorrhage(id: str)
  Returns True if hemorrhage was diagnosed and its type is subdural.
fracture(id: str)
  Returns True if skull fracture was diagnosed.
notes(id: str)
  Returns special notes if they exist.
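The five per-type flags above can be collapsed into a list of diagnosed hemorrhage types for one scan. A minimal sketch; `hemorrhage_types` is a hypothetical helper, and `StubCTICH` with its ids is a stand-in that lets the example run without the downloaded data:

```python
HEMORRHAGE_TYPES = (
    'intraventricular',
    'intraparenchymal',
    'subarachnoid',
    'epidural',
    'subdural',
)


def hemorrhage_types(ds, id: str) -> list:
    """List the types whose `<type>_hemorrhage(id)` flag is True."""
    return [t for t in HEMORRHAGE_TYPES if getattr(ds, f'{t}_hemorrhage')(id)]


class StubCTICH:
    """Hypothetical stand-in for the dataset; not the real class."""

    def __init__(self, positives: dict):
        self._positives = positives

    def __getattr__(self, name):
        # Resolve any `<type>_hemorrhage` attribute to a flag lookup.
        if name.endswith('_hemorrhage'):
            kind = name[: -len('_hemorrhage')]
            return lambda id: kind in self._positives.get(id, ())
        raise AttributeError(name)


ds = StubCTICH({'049': {'subdural', 'epidural'}})
types = hemorrhage_types(ds, '049')
print(types)
```

With the real dataset, the same helper would be called as `hemorrhage_types(CT_ICH(root=...), some_id)`.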
hemorrhage_diagnosis_raw_metadata(id: str)
  
        amid.crossmoda.CrossMoDA
  Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| root | str, Path | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required | 
| version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required | 
Notes
Download links: 2021 & 2022: https://zenodo.org/record/6504722#.YsgwnNJByV4
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = CrossMoDA(root='/path/to/archives/root')
>>> print(len(ds.ids))
# 484
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 214)
ids()
  
train_source_df(id: str)
  
image(id: str) -> Union[np.ndarray, None]
  
pixel_spacing(id: str)
  
spacing(id: str)
  Returns voxel spacing along axes (x, y, z).
affine(id: str)
  The 4x4 matrix that gives the image's spatial orientation
split(id: str) -> str
  The split in which this entry is contained: training_source, training_target, validation
year(id: str) -> int
  The year in which this entry was published: 2021 or 2022
masks(id: str) -> Union[np.ndarray, None]
  Combined mask of schwannoma and cochlea (1 and 2 respectively)
koos_grade(id: str)
  Vestibular schwannoma (VS) tumour characteristic according to the Koos grading scale: 1 to 4, or -1 for post-operative cases.
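Since `masks` encodes both structures in one array and may return None, downstream code usually needs one binary mask per structure. A minimal sketch on nested lists; `split_labels` is a hypothetical helper, and the label values come from the `masks` description above:

```python
LABELS = {'schwannoma': 1, 'cochlea': 2}


def split_labels(mask):
    """Turn a combined label mask into one binary mask per structure.

    Returns None when no mask is available for the entry.
    """
    if mask is None:
        return None

    def binarize(node, value):
        if isinstance(node, list):
            return [binarize(child, value) for child in node]
        return node == value

    return {name: binarize(mask, value) for name, value in LABELS.items()}


# Toy 2 x 3 mask standing in for a real volume.
combined = [[0, 1, 1], [2, 0, 0]]
parts = split_labels(combined)
print(parts['schwannoma'])
print(parts['cochlea'])
```

For NumPy arrays the comparison `mask == value` produces the same binary masks directly.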
        amid.egd.EGD
  The Erasmus Glioma Database (EGD): Structural MRI scans, WHO 2016 subtypes, and segmentations of 774 patients with glioma [1]_.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| root | str, Path | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required | 
| version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required | 
Notes
Access to the dataset can be requested via the XNAT portal [https://xnat.bmia.nl/data/archive/projects/egd].
To download the data in a compatible structure, we recommend using the egd-downloader script [https://zenodo.org/record/4761089#.YtZpLtJBxhF]. Please refer to its README for further information.
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> egd = EGD(root='/path/to/downloaded/data/folder/')
>>> print(len(egd.ids))
# 774
>>> print(egd.t1gd(egd.ids[215]).shape)
# (197, 233, 189)
>>> print(egd.manufacturer(egd.ids[444]))
# Philips Medical Systems
References
.. [1] van der Voort, Sebastian R., et al. "The Erasmus Glioma Database (EGD): Structural MRI scans, WHO 2016 subtypes, and segmentations of 774 patients with glioma." Data in brief 37 (2021): 107191. https://www.sciencedirect.com/science/article/pii/S2352340921004753
ids()
  
brain_mask(id: str)
  
deface_mask(id: str)
  
modality(id: str)
  
subject_id(id: str)
  
affine(id: str)
  
voxel_spacing(id: str)
  
spacing(id: str)
  
image(id: str)
  
genetic_and_histological_label_idh(id: str)
  
genetic_and_histological_label_1p19q(id: str)
  
genetic_and_histological_label_grade(id: str)
  
age(id: str)
  
sex(id: str)
  
observer(id: str)
  
original_scan(id: str)
  
manufacturer(id: str)
  
system(id: str)
  
field(id: str)
  
mask(id: str)
  
        amid.flare2022.FLARE2022
  An abdominal organ segmentation dataset for semi-supervised learning [1]_.
The dataset was used at the MICCAI FLARE 2022 challenge.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| root | str, Path | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required | 
Notes
Download link: https://flare22.grand-challenge.org/Dataset/
The root folder should contain the two downloaded folders, namely: "Training" and "Validation".
Examples:
>>> # Place the downloaded folders in any folder and pass the path to the constructor:
>>> ds = FLARE2022(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
# 2100
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 110)
>>> print(ds.mask(ds.ids[25]).shape)
# (512, 512, 104)
References
.. [1] Ma, Jun, et al. "Fast and Low-GPU-memory abdomen CT organ segmentation: The FLARE challenge." Medical Image Analysis 82 (2022): 102616.
ids()
  
image(id: str)
  
affine(id: str)
  The 4x4 matrix that gives the image's spatial orientation
mask(id: str)
  
        amid.lidc.dataset.LIDC
  The (L)ung (I)mage (D)atabase (C)onsortium image collection (LIDC-IDRI) [1]_ consists of diagnostic and lung cancer screening thoracic computed tomography (CT) scans with marked-up annotated lesions, and defines a lung nodule segmentation task. Scans contain multiple expert annotations.
Number of CT scans: 1018.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| root | str, Path | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required | 
| version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required | 
Notes
Follow the download instructions at https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=1966254.
Then, the folder with raw downloaded data should contain folder LIDC-IDRI,
which contains folders LIDC-IDRI-*.
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = LIDC(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
# 1018
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 194)
>>> print(ds.cancer(ds.ids[0]).shape)
# (512, 512, 194)
References
.. [1] Armato III, McLennan, et al. "The lung image database consortium (lidc) and image database resource initiative (idri): a completed reference database of lung nodules on ct scans." Medical physics 38(2) (2011): 915–931. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3041807/
ids()
  
image(id: str)
  
study_uid(id: str)
  
series_uid(id: str)
  
patient_id(id: str)
  
sop_uids(id: str)
  
pixel_spacing(id: str)
  
slice_locations(id: str)
  
voxel_spacing(id: str)
  Returns voxel spacing along axes (x, y, z).
spacing(id: str)
  Volumetric spacing of the image.
The maximum relative difference in slice_locations is < 1e-3
(except for the 4 images listed below),
so we allow ourselves to use a common spacing for the whole 3D image.
Note
The slice_locations attribute typically (but not always!) has a constant step.
In the LIDC dataset, only 4 images have a difference in slice_locations > 1e-3:
    1.3.6.1.4.1.14519.5.2.1.6279.6001.526570782606728516388531252230
    1.3.6.1.4.1.14519.5.2.1.6279.6001.329334252028672866365623335798
    1.3.6.1.4.1.14519.5.2.1.6279.6001.245181799370098278918756923992
    1.3.6.1.4.1.14519.5.2.1.6279.6001.103115201714075993579787468219
These differences appear in at most 3 slices.
Therefore, we consider their impact negligible.
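A uniformity check of this kind can be sketched as follows. The exact definition of "relative difference" used for the note is not stated, so this version assumes the deviation of each slice step from the median step; `max_relative_step_deviation` is a hypothetical helper and expects at least two distinct slice locations:

```python
from statistics import median


def max_relative_step_deviation(slice_locations) -> float:
    """Largest relative deviation of a slice step from the median step."""
    locations = sorted(slice_locations)
    steps = [b - a for a, b in zip(locations, locations[1:])]
    common = median(steps)
    return max(abs(step - common) / common for step in steps)


uniform = [0.0, 2.5, 5.0, 7.5, 10.0]
irregular = [0.0, 2.5, 5.0, 8.0, 10.5]
print(max_relative_step_deviation(uniform))           # 0.0
print(max_relative_step_deviation(irregular) > 1e-3)  # True
```

A scan that passes the threshold can be described by a single z spacing, which is the simplification described in the note.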
contrast_used(id: str)
  If the DICOM file for the scan had any Contrast tag, this is marked as True.
is_from_initial(id: str)
  Indicates whether or not this PatientID was tagged as part of the initial 399 release.
orientation_matrix(id: str)
  
conv_kernel(id: str)
  
kvp(id: str)
  
study_date(id: str)
  
accession_number(id: str)
  
nodules(id: str)
  
nodules_masks(id: str)
  
cancer(id: str)
  
        amid.lits.dataset.LiTS
  A (Li)ver (T)umor (S)egmentation dataset [1] from the Medical Segmentation Decathlon [2].
There are two segmentation tasks on this dataset: liver and liver tumor segmentation.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| root | str, Path | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required | 
| version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required | 
Notes
Follow the download instructions at https://competitions.codalab.org/competitions/17094.
Then, the folder with raw downloaded data should contain two zip archives with the train data
(Training_Batch1.zip and Training_Batch2.zip)
and a folder with the test data
(LITS-Challenge-Test-Data).
The folder with test data should have the original structure:
    <...>/LITS-Challenge-Test-Data/test-volume-0.nii
    <...>/LITS-Challenge-Test-Data/test-volume-1.nii
    ...
P.S. Organ boxes are also provided from a separate source: https://github.com/superxuang/caffe_3d_faster_rcnn.
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = LiTS(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
# 201
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 163)
>>> print(ds.tumor_mask(ds.ids[80]).shape)
# (512, 512, 771)
References
.. [1] Bilic, Patrick, et al. "The liver tumor segmentation benchmark (lits)." arXiv preprint arXiv:1901.04056 (2019).
.. [2] Antonelli, Michela, et al. "The medical segmentation decathlon." arXiv preprint arXiv:2106.05735 (2021).
ids()
  
fold(id: str)
  
image(id: str)
  
affine(id: str)
  The 4x4 matrix that gives the image's spatial orientation.
voxel_spacing(id: str)
  
spacing(id: str)
  Returns voxel spacing along axes (x, y, z).
mask(id: str)
  
        amid.liver_medseg.LiverMedseg
  LiverMedseg is a public CT segmentation dataset with 50 annotated images: a case collection of 50 livers with their segments. The images were obtained from the Medical Segmentation Decathlon competition.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| root | str, Path | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required | 
| version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required | 
Notes
Download links: https://www.medseg.ai/database/liver-segments-50-cases
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = LiverMedseg(root='/path/to/archives/root')
>>> print(len(ds.ids))
# 50
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 38)
ids()
  
image(id: str) -> np.ndarray
  
affine(id: str) -> np.ndarray
  The 4x4 matrix that gives the image's spatial orientation.
voxel_spacing(id: str) -> tuple
  
spacing(id: str) -> tuple
  
mask(id: str) -> np.ndarray
  
        amid.midrc.MIDRC
  MIDRC-RICORD dataset 1a is a public COVID-19 CT segmentation dataset with 120 scans.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| root | str, Path | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required | 
| version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required | 
Notes
Follow the download instructions at https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=80969742
Download both Images and Annotations to the same folder.
Then, the folder with downloaded data should contain two paths with the data
The folder should have this structure:
    <...>/
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = MIDRC(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
# 155
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 112)
>>> print(ds.mask(ds.ids[80]).shape)
# (6, 512, 512, 450)
ids()
  
image(id: str)
  
image_meta(id: str)
  
spacing(id: str)
  
labels(id: str)
  
mask(id: str)
  
        amid.mood.MOOD
  A (M)edical (O)ut-(O)f-(D)istribution analysis challenge [1]_
This dataset contains raw brain MRI and abdominal CT images.
Number of training samples:
    - Brain: 800 scans (256 x 256 x 256)
    - Abdominal: 550 scans (512 x 512 x 512)
For each setup there are 4 toy test samples with OOD cases.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| root | str, Path | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required | 
| version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required | 
Notes
Follow the download instructions at https://www.synapse.org/#!Synapse:syn21343101/wiki/599515.
Then, the folder with raw downloaded data should contain four zip archives with data
(abdom_toy.zip, abdom_train.zip, brain_toy.zip and brain_train.zip).
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = MOOD(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
# 1358
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 512)
>>> print(ds.pixel_label(ds.ids[0]).shape)
# (512, 512, 512)
References
.. [1] Zimmerer, Petersen, et al. "Medical Out-of-Distribution Analysis Challenge 2022." doi: 10.5281/zenodo.6362313 (2022).
ids()
  
fold(id: str)
  Returns fold: train or toy (test).
task(id: str)
  Returns task: brain (MRI) or abdominal (CT).
image(id: str)
  
affine(id: str)
  The 4x4 matrix that gives the image's spatial orientation.
voxel_spacing(id: str)
  
spacing(id: str)
  Returns voxel spacing along axes (x, y, z).
sample_label(id: str)
  Returns sample-level OOD score for toy examples and None otherwise. 0 indicates no abnormality and 1 indicates abnormal input.
pixel_label(id: str)
  Returns voxel-level OOD scores for toy examples and None otherwise. 0 indicates no abnormality and 1 indicates abnormal input.
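The `task`, `fold` and `sample_label` accessors are enough to summarise the dataset and to pick out the labelled toy cases. A minimal sketch; `StubMOOD`, its ids, and the exact strings returned by `task` and `fold` are assumptions made so that the example runs without the downloaded archives:

```python
from collections import Counter


class StubMOOD:
    """Hypothetical stand-in exposing `ids`, `task`, `fold` and `sample_label`."""

    # id -> (task, fold, sample_label); the values are illustrative only.
    _rows = {
        'brain_train_0': ('brain', 'train', None),
        'brain_toy_0': ('brain', 'toy', 1),
        'abdom_train_0': ('abdominal', 'train', None),
        'abdom_toy_0': ('abdominal', 'toy', 0),
    }
    ids = list(_rows)

    def task(self, id: str):
        return self._rows[id][0]

    def fold(self, id: str):
        return self._rows[id][1]

    def sample_label(self, id: str):
        return self._rows[id][2]


ds = StubMOOD()
# Number of scans per (task, fold) pair.
counts = Counter((ds.task(i), ds.fold(i)) for i in ds.ids)
# Only toy samples carry a label; the others return None.
labelled = [i for i in ds.ids if ds.sample_label(i) is not None]
print(counts)
print(labelled)
```

Testing against `None` rather than truthiness matters here because a valid label of 0 (no abnormality) is falsy.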
        amid.medseg9.Medseg9
  Medseg9 is a public COVID-19 CT segmentation dataset with 9 annotated images.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| root | str, Path | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required | 
| version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required | 
Notes
Data can be downloaded here: http://medicalsegmentation.com/covid19/.
Then, the folder with raw downloaded data should contain three zip archives with data and masks
(rp_im.zip, rp_lung_msk.zip, rp_msk.zip).
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = Medseg9(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
# 9
>>> print(ds.image(ds.ids[0]).shape)
# (630, 630, 45)
>>> print(ds.covid(ds.ids[0]).shape)
# (630, 630, 45)
ids()
  
image(id: str)
  
affine(id: str)
  The 4x4 matrix that gives the image's spatial orientation.
voxel_spacing(id: str)
  
spacing(id: str)
  Returns voxel spacing along axes (x, y, z).
lungs(id: str)
  
covid(id: str)
  int16 mask: 0 - normal, 1 - ground-glass opacities, 2 - consolidation.
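One common use of such a label mask is to measure how much of the volume each class occupies. A minimal sketch over a flat sequence of labels; `label_fractions` is a hypothetical helper and the input is toy data:

```python
from collections import Counter

# Label meanings taken from the description above.
COVID_LABELS = {0: 'normal', 1: 'ground-glass opacities', 2: 'consolidation'}


def label_fractions(mask) -> dict:
    """Fraction of voxels per class in a flat iterable of integer labels."""
    counts = Counter(mask)
    total = sum(counts.values())
    return {name: counts.get(value, 0) / total for value, name in COVID_LABELS.items()}


# Toy data: eight voxels in place of a real volume.
flat_mask = [0, 0, 0, 0, 1, 1, 2, 0]
fractions = label_fractions(flat_mask)
print(fractions)
```

A NumPy mask would first need flattening, for example with `mask.ravel().tolist()`.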
        amid.cancer_500.dataset.MoscowCancer500
  The Moscow Radiology Cancer-500 dataset.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| root | str, Path | path to the folder containing the raw downloaded files. If not provided, the cache is assumed to be already populated. | required | 
| version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required | 
Notes
Download links:
https://mosmed.ai/en/datasets/mosmeddata-kt-s-priznakami-raka-legkogo-tip-viii/
After pressing the download button you will have to provide an email address to which further instructions
will be sent.
Examples:
>>> # Place the downloaded files in any folder and pass the path to the constructor:
>>> ds = MoscowCancer500(root='/path/to/files/root')
>>> print(len(ds.ids))
# 979
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 67)
ids()
  
image(id: str)
  
study_uid(id: str)
  
series_uid(id: str)
  
sop_uids(id: str)
  
pixel_spacing(id: str)
  
slice_locations(id: str)
  
orientation_matrix(id: str)
  
instance_numbers(id: str)
  
conv_kernel(id: str)
  
kvp(id: str)
  
patient_id(id: str)
  
study_date(id: str)
  
accession_number(id: str)
  
nodules(id: str)
  
        amid.covid_1110.MoscowCovid1110
  The Moscow Radiology COVID-19 dataset.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| root | str, Path | path to the folder containing the raw downloaded files. If not provided, the cache is assumed to be already populated. | required | 
| version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required | 
Notes
Download links: https://mosmed.ai/en/datasets/covid191110/
Examples:
>>> # Place the downloaded files in any folder and pass the path to the constructor:
>>> ds = MoscowCovid1110(root='/path/to/files/root')
>>> print(len(ds.ids))
# 1110
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 43)
ids()
  
image(id: str)
  
affine(id: str)
  
label(id: str)
  
mask(id: str)
  
        amid.nlst.NLST
  Dataset with low-dose CT scans of 26,254 patients acquired during the National Lung Screening Trial.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| root | str, Path | path to the folder (usually called NLST) containing the patient subfolders (like 101426). If not provided, the cache is assumed to be already populated. | required | 
| version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required | 
Notes
Follow the download instructions at
https://wiki.cancerimagingarchive.net/display/NLST/National+Lung+Screening+Trial.
The dicoms should be placed under the following folders' structure:
    <...>/
Examples:
>>> ds = NLST(root='/path/to/NLST/')
>>> print(len(ds.ids))
 ...
>>> print(ds.image(ds.ids[0]).shape)
 ...
>>> print(ds.mask(ds.ids[80]).shape)
 ...
ids()
  
image(id: str)
  
study_uid(id: str)
  
series_uid(id: str)
  
sop_uids(id: str)
  
pixel_spacing(id: str)
  
slice_locations(id: str)
  
orientation_matrix(id: str)
  
conv_kernel(id: str)
  
kvp(id: str)
  
patient_id(id: str)
  
study_date(id: str)
  
accession_number(id: str)
  
        amid.nsclc.NSCLC
  NSCLC-Radiomics is a public non-small cell lung cancer segmentation dataset with 422 patients.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| root | str, Path | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required | 
| version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required | 
Notes
Follow the download instructions at https://wiki.cancerimagingarchive.net/display/Public/NSCLC-Radiomics
The folder with downloaded data should contain two paths
The folder should have this structure:
    <...>/
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = NSCLC(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
# 422
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 134)
>>> print(ds.mask(ds.ids[80]).shape)
# (512, 512, 108)
ids()
  
image(id: str)
  
image_meta(id: str)
  
spacing(id: str)
  
mask(id: str)
  
lung_left(id: str)
  
lung_right(id: str)
  
lungs_total(id: str)
  
heart(id: str)
  
esophagus(id: str)
  
spinal_cord(id: str)
  
        amid.rsna_bc.dataset.RSNABreastCancer
  
ids()
  
image(id: str)
  
padding_value(id: str)
  
intensity_sign(id: str)
  
        amid.stanford_coca.StanfordCoCa
  Stanford AIMI's Co(ronary) Ca(lcium) dataset.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| root | str, Path | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required | 
| version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required | 
Notes
Follow the download instructions at https://stanfordaimi.azurewebsites.net/datasets/e8ca74dc-8dd4-4340-815a-60b41f6cb2aa. You'll need to register and accept the terms of use. After that, copy the files from Azure:
azcopy copy 'some-generated-access-link' /path/to/downloaded/data/ --recursive=true
Then, the folder with raw downloaded data should contain two subfolders: a subset with gated coronary CT scans
and corresponding coronary calcium segmentation masks (Gated_release_final),
and a folder with the non-gated CT scans and corresponding coronary artery calcium scores
(deidentified_nongated).
The folder with gated data should keep its original structure:
    ./Gated_release_final/patient/0/folder-with-dcms/
    ./Gated_release_final/calcium_xml/0.xml
    ...
The folder with nongated data should keep its original structure:
    ./deidentified_nongated/0/folder-with-dcms/
    ...
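Before constructing the dataset, it can help to confirm that both subfolders named in the notes are present. The following is a minimal sketch using only the standard library; `missing_subfolders` is a hypothetical helper and not part of the library, and it checks only the two top-level folder names, not their contents.

```python
from pathlib import Path

# Subfolder names as given in the notes for this dataset.
EXPECTED_SUBFOLDERS = ("Gated_release_final", "deidentified_nongated")


def missing_subfolders(root):
    """Return the expected subfolders that are absent from `root`.

    Hypothetical helper for illustration; the library may validate
    the layout differently.
    """
    root = Path(root)
    return [name for name in EXPECTED_SUBFOLDERS if not (root / name).is_dir()]
```

An empty list means both folders were found; any returned name points to a folder that still needs to be copied into the root.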
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = StanfordCoCa(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
# 971
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 57)
ids()
  
image(id: str)
  
series_uid(id: str)
  
study_uid(id: str)
  
pixel_spacing(id: str)
  
slice_locations(id: str)
  
orientation_matrix(id: str)
  
calcifications(id: str)
  Returns list of Calcifications
score(id: str)
  
        amid.totalsegmentator.dataset.Totalsegmentator
  In 1204 CT images we segmented 104 anatomical structures (27 organs, 59 bones, 10 muscles, 8 vessels) covering a majority of relevant classes for most use cases.
The CT images were randomly sampled from clinical routine, thus representing a real world dataset which generalizes to clinical application.
The dataset contains a wide range of different pathologies, scanners, sequences and institutions. [1]
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| root | str, Path | absolute path to the downloaded archive. If not provided, the cache is assumed to be already populated. | required | 
Notes
Download link: https://zenodo.org/record/6802614/files/Totalsegmentator_dataset.zip
Examples:
>>> # Download the archive to any folder and pass the path to the constructor:
>>> ds = Totalsegmentator(root='/path/to/the/downloaded/archive')
>>> print(len(ds.ids))
# 1204
>>> print(ds.image(ds.ids[0]).shape)
# (294, 192, 179)
>>> print(ds.aorta(ds.ids[25]).shape)
# (320, 320, 145)
References
.. [1] Jakob Wasserthal (2022) Dataset with segmentations of 104 important anatomical structures in 1204 CT images. Available at: https://zenodo.org/record/6802614#.Y6M2MxXP1D8
ids()
  
image(id: str)
  
affine(id: str)
  The 4x4 matrix that gives the image's spatial orientation
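A common use of such a matrix is to recover the voxel spacing. The sketch below assumes the NIfTI-style convention in which the upper-left 3x3 block maps voxel indices to world coordinates; the documentation above does not state this explicitly. `spacing_from_affine` and the example matrix are hypothetical and not part of the library.

```python
import math


def spacing_from_affine(affine):
    """Voxel size along each array axis, taken as the Euclidean norm of
    each of the first three columns of the upper-left 3x3 block."""
    return tuple(
        math.sqrt(sum(affine[row][col] ** 2 for row in range(3)))
        for col in range(3)
    )


# A made-up affine for illustration: 1.5 mm voxels, a flipped first axis,
# and a translation in the last column.
example_affine = [
    [-1.5, 0.0, 0.0, 120.0],
    [0.0, 1.5, 0.0, -90.0],
    [0.0, 0.0, 1.5, 30.0],
    [0.0, 0.0, 0.0, 1.0],
]
print(spacing_from_affine(example_affine))
# (1.5, 1.5, 1.5)
```

The function indexes the matrix as `affine[row][col]`, so it accepts nested lists as well as array types that support that indexing.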
        amid.upenn_gbm.upenn_gbm.UPENN_GBM
  Multi-parametric magnetic resonance imaging (mpMRI) scans for de novo Glioblastoma (GBM) patients from the University of Pennsylvania Health System (UPENN-GBM). Dataset contains 630 patients.
All samples are registered to a common atlas (SRI) using uniform preprocessing, and the segmentations are aligned with them.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| root | str, Path | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required | 
Notes
Follow the download instructions at https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=70225642 Download the NIfTI images and metadata to the root folder. Organise the folder as follows:
<...>/
<...>/
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = UPENN_GBM(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
# 671
>>> print(ds.image(ds.ids[215]).shape)
# (4, 240, 240, 155)
>>> print(ds.acqusition_info(ds.ids[215]).manufacturer)
# SIEMENS
References
.. [1] Bakas, S., Sako, C., Akbari, H., Bilello, M., Sotiras, A., Shukla, G., Rudie, J. D., Flores Santamaria, N., Fathi Kazerooni, A., Pati, S., Rathore, S., Mamourian, E., Ha, S. M., Parker, W., Doshi, J., Baid, U., Bergman, M., Binder, Z. A., Verma, R., … Davatzikos, C. (2021). Multi-parametric magnetic resonance imaging (mpMRI) scans for de novo Glioblastoma (GBM) patients from the University of Pennsylvania Health System (UPENN-GBM) (Version 2) [Data set]. The Cancer Imaging Archive. https://doi.org/10.7937/TCIA.709X-DN49
ids()
  
modalities(id: str)
  
dsc_modalities(id: str)
  
dti_modalities(id: str)
  
mask(id: str)
  
is_mask_automated(id: str)
  
image(id: str)
  
image_unstripped(id: str)
  
image_DTI(id: str)
  
image_DSC(id: str)
  
clinical_info(id: str) -> ClinicalInfo
  
acqusition_info(id: str) -> AcquisitionInfo
  
subject_id(id: str)
  
affine(id: str)
  
spacing(id: str)
  
        amid.vs_seg.dataset.VSSEG
  Segmentation of vestibular schwannoma from MRI, an open annotated dataset ... (VS-SEG) [1]_.
The dataset contains 250 pairs of T1c and T2 images of the brain with the vestibular schwannoma segmentation task.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| root | str, Path | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required | 
| version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required | 
Notes
The dataset and corresponding metadata can be downloaded from the TCIA page: https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=70229053.
To download the DICOM images using the .tcia file, we used a public build of the TCIA downloader:
https://github.com/ygidtu/NBIA_data_retriever_CLI.
Then, download the rest of the metadata from the TCIA page:
  - DirectoryNamesMappingModality.csv
  - Vestibular-Schwannoma-SEG_matrices Mar 2021.zip
  - Vestibular-Schwannoma-SEG contours Mar 2021.zip
and unzip the latter two .zip archives.
So the root folder should contain 3 folders and 1 .csv file:
    <...>/DirectoryNamesMappingModality.csv
    <...>/Vestibular-Schwannoma-SEG/
            ├── VS-SEG-001/...
            ├── VS-SEG-002/...
            └── ...
    <...>/contours/
    <...>/registration_matrices/
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = VSSEG(root='/path/to/downloaded/data/folder/')
>>> print(len(ds.ids))
# 484
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 120)
>>> print(ds.schwannoma(ds.ids[1]).shape)
# (384, 384, 80)
References
.. [1] Shapey, Jonathan, et al. "Segmentation of vestibular schwannoma from MRI, an open annotated dataset and baseline algorithm." Scientific Data 8.1 (2021): 1-6. https://www.nature.com/articles/s41597-021-01064-w
ids()
  
modality(id: str)
  
subject_id(id: str)
  
image(id: str)
  
spacing(id: str)
  The maximum relative difference in slice_locations is below 1e-12, so a single common spacing is used for the whole 3D image.
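One way to run a check of this kind is sketched below. The exact metric used by the library is not stated here, so this version is an assumption: it compares each gap between consecutive slices against the mean gap. `max_relative_spacing_difference` and the sample positions are hypothetical.

```python
def max_relative_spacing_difference(slice_locations):
    """Largest relative deviation of the gaps between consecutive slices
    from their mean gap. Needs at least two distinct locations."""
    locations = sorted(slice_locations)
    gaps = [b - a for a, b in zip(locations, locations[1:])]
    mean_gap = sum(gaps) / len(gaps)
    return max(abs(gap - mean_gap) / mean_gap for gap in gaps)


# Made-up slice positions in millimetres, evenly spaced at 1.5 mm.
uniform = [-30.0, -28.5, -27.0, -25.5]
print(max_relative_spacing_difference(uniform) < 1e-12)
# True
```

When the result stays below a small tolerance, a single through-plane spacing value describes the whole volume.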
schwannoma(id: str)
  
cochlea(id: str)
  
meningioma(id: str)
  
study_uid(id: str)
  
series_uid(id: str)
  
patient_id(id: str)
  
study_date(id: str)
  
        amid.verse.VerSe
  A Vertebral Segmentation Dataset with Fracture Grading [1]_
The dataset was used in the MICCAI-2019 and MICCAI-2020 Vertebrae Segmentation Challenges.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| root | str, Path | path to the folder containing the raw downloaded archives. If not provided, the cache is assumed to be already populated. | required | 
| version | str | the data version. Only has effect if the library was installed from a cloned git repository. | required | 
Notes
Download links:
    2019: https://osf.io/jtfa5/
    2020: https://osf.io/4skx2/
Examples:
>>> # Place the downloaded archives in any folder and pass the path to the constructor:
>>> ds = VerSe(root='/path/to/archives/root')
>>> print(len(ds.ids))
# 374
>>> print(ds.image(ds.ids[0]).shape)
# (512, 512, 214)
References
.. [1] Löffler MT, Sekuboyina A, Jacob A, et al. A Vertebral Segmentation Dataset with Fracture Grading. Radiol Artif Intell. 2020;2(4):e190138. Published 2020 Jul 29. doi:10.1148/ryai.2020190138
ids()
  
image(id: str)
  
affine(id: str)
  The 4x4 matrix that gives the image's spatial orientation
split(id: str)
  The split in which this entry is contained: training, validate, test
patient(id: str)
  The unique patient id
year(id: str)
  The year in which this entry was published: 2019, 2020
centers(id: str)
  Vertebrae centers in format {label: [x, y, z]}
masks(id: str) -> Union[np.ndarray, None]
  Vertebrae masks
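The per-entry fields above can be combined to partition the dataset. The following sketch groups ids by the value of `split`, relying only on `ids` and `split(id)` as documented; `group_ids_by_split` and the `FakeVerSe` stand-in (with made-up ids) are hypothetical and exist only so the example runs without the data.

```python
from collections import defaultdict


def group_ids_by_split(ds):
    """Map each split name to the list of ids it contains."""
    groups = defaultdict(list)
    for id_ in ds.ids:
        groups[ds.split(id_)].append(id_)
    return dict(groups)


class FakeVerSe:
    """Hypothetical stand-in with made-up ids, not the real dataset."""

    ids = ["a", "b", "c"]
    _splits = {"a": "training", "b": "test", "c": "training"}

    def split(self, id_):
        return self._splits[id_]


print(group_ids_by_split(FakeVerSe()))
# {'training': ['a', 'c'], 'test': ['b']}
```

With a real instance, the same function could be called as `group_ids_by_split(ds)`; the same pattern applies to `year` or `patient` by swapping the accessor.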