ml
datasets
- class sparcscore.ml.datasets.HDF5SingleCellDataset(*args: Any, **kwargs: Any)
Class for handling SPARCSpy single cell datasets stored in HDF5 files.
This class provides a convenient interface for SPARCSpy-formatted hdf5 files containing single cell datasets. It supports loading data from multiple hdf5 files within specified directories, applying transformations to the data, and returning the required information, such as label or id, along with the single cell data.
- root_dir
Root directory where the hdf5 files are located.
- Type
str
- dir_labels
List of labels corresponding to the directories in dir_list.
- Type
list of int
- dir_list
List of path(s) where the hdf5 files are stored. Supports specifying a path to a specific hdf5 file or directory containing hdf5 files.
- Type
list of str
- transform
An optional user-defined function to apply transformations to the data. Default is None.
- Type
callable, optional
- max_level
Maximum number of directory levels to search for hdf5 files. Default is 5.
- Type
int, optional
- return_id
Whether to return the index of the cell with the data. Default is False.
- Type
bool, optional
- return_fake_id
Whether to return a fake index (0) with the data. Default is False.
- Type
bool, optional
- select_channel
Specify a specific channel to select from the data. Default is None, which returns all channels.
- Type
int, optional
- add_hdf_to_index(current_label, path)
Adds single cell data from the hdf5 file located at ‘path’ with the specified ‘current_label’ to the index.
- scan_directory(path, current_label, levels_left)
Scans directories for hdf5 files and adds their data to the index with the specified ‘current_label’.
- stats()
Prints dataset statistics including total count and count per label.
- __len__()
Returns the total number of single cells in the dataset.
- __getitem__(idx)
Returns the data, label, and optional id/fake_id of the single cell specified by the index ‘idx’.
Examples
>>> hdf5_data = HDF5SingleCellDataset(dir_list=['data1.hdf5', 'data2.hdf5'], dir_labels=[0, 1], root_dir='/path/to/data', transform=None, return_id=True)
>>> len(hdf5_data)
2000
>>> sample = hdf5_data[0]
>>> sample[0].shape
torch.Size([1, 128, 128])
>>> sample[1]
tensor(0)
>>> sample[2]
tensor(0)
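The depth-limited search described for scan_directory and max_level can be pictured with the following minimal sketch. It is not the SPARCSpy implementation: the helper name find_hdf5_files, the accepted file extensions, and the exact way the depth budget is counted are all assumptions made for illustration.

```python
import os
import tempfile


def find_hdf5_files(path, levels_left=5, extensions=(".h5", ".hdf5")):
    """Collect HDF5 file paths below `path`, descending at most `levels_left` levels.

    Hypothetical helper: the extension list and depth semantics are assumptions.
    """
    # A path that points directly at a file is accepted if it has a matching extension.
    if os.path.isfile(path):
        return [path] if path.lower().endswith(extensions) else []

    found = []
    for entry in sorted(os.listdir(path)):
        full = os.path.join(path, entry)
        if os.path.isfile(full) and full.lower().endswith(extensions):
            found.append(full)
        elif os.path.isdir(full) and levels_left > 0:
            # Descend one level and spend one unit of the depth budget.
            found.extend(find_hdf5_files(full, levels_left - 1, extensions))
    return found


# Demo on a throwaway tree: one file at the top level, one two directories down.
with tempfile.TemporaryDirectory() as root:
    nested = os.path.join(root, "a", "b")
    os.makedirs(nested)
    for file_path in (
        os.path.join(root, "top.h5"),
        os.path.join(nested, "deep.hdf5"),
        os.path.join(root, "notes.txt"),
    ):
        open(file_path, "w").close()

    shallow = [os.path.basename(p) for p in find_hdf5_files(root, levels_left=1)]
    deep = [os.path.basename(p) for p in find_hdf5_files(root, levels_left=5)]
```

With a budget of one level the nested file is not reached, while the default budget of five finds both files. In the real class every file found this way would additionally be registered under its directory label.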
metrics
- sparcscore.ml.metrics.precision(predictions, labels, pos_label=0)
Calculate precision for predicting class pos_label.
- Parameters
predictions (torch.Tensor) – Model predictions.
labels (torch.Tensor) – Ground truth labels.
pos_label (int, optional, default = 0) – The positive label for which to calculate precision.
- Returns
precision – Precision for predicting class pos_label.
- Return type
float
- sparcscore.ml.metrics.recall(predictions, labels, pos_label=0)
Calculate recall for predicting class pos_label.
- Parameters
predictions (torch.Tensor) – Model predictions.
labels (torch.Tensor) – Ground truth labels.
pos_label (int, optional, default = 0) – The positive label for which to calculate recall.
- Returns
recall – Recall for predicting class pos_label.
- Return type
float
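As a reminder of what the two metrics measure, here is a minimal pure-Python sketch of the standard definitions. It assumes that predictions already hold class indices (not logits or probabilities) and that a zero denominator yields 0.0; neither assumption is confirmed by the reference above, and the function names are hypothetical stand-ins for the torch-based originals.

```python
def precision_sketch(predictions, labels, pos_label=0):
    """Precision for class `pos_label`: TP / (TP + FP)."""
    pairs = list(zip(predictions, labels))
    true_pos = sum(1 for p, y in pairs if p == pos_label and y == pos_label)
    false_pos = sum(1 for p, y in pairs if p == pos_label and y != pos_label)
    denominator = true_pos + false_pos
    # Assumption: return 0.0 when the class was never predicted.
    return true_pos / denominator if denominator else 0.0


def recall_sketch(predictions, labels, pos_label=0):
    """Recall for class `pos_label`: TP / (TP + FN)."""
    pairs = list(zip(predictions, labels))
    true_pos = sum(1 for p, y in pairs if p == pos_label and y == pos_label)
    false_neg = sum(1 for p, y in pairs if p != pos_label and y == pos_label)
    denominator = true_pos + false_neg
    # Assumption: return 0.0 when the class never occurs in the labels.
    return true_pos / denominator if denominator else 0.0


# Class 0 is predicted four times (two correct) and occurs three times in the labels.
example_predictions = [0, 0, 0, 0, 1, 1]
example_labels = [0, 0, 1, 1, 1, 0]
example_precision = precision_sketch(example_predictions, example_labels, pos_label=0)
example_recall = recall_sketch(example_predictions, example_labels, pos_label=0)
```

In this example precision is 2/4 and recall is 2/3, which shows why the two values should be reported together.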
models
- class sparcscore.ml.models.VGGBase(*args: Any, **kwargs: Any)
Bases:
torch.nn.Module
Base implementation of the VGG model architecture. Can be implemented with a varying number of convolutional layers and fully connected layers.
- make_layers(cfg, in_channels, batch_norm=True)
Create sequential model layers for the CNN according to the configuration provided in cfg, with optional batch normalization.
- Parameters
cfg (list) – A list of integers and “M” representing the specific VGG architecture.
in_channels (int) – Number of input channels for the first convolutional layer.
batch_norm (bool, optional, default=True) – Whether to include batch normalization layers.
- Returns
A sequential model representing the VGG architecture.
- Return type
nn.Sequential
- make_layers_MLP(cfg_MLP, cfg)
Create sequential model layers according to the chosen configuration provided in cfg for the MLP.
- Parameters
cfg (list) – A list of integers and “M” representing the specific VGG architecture of the CNN
cfg_MLP (list) – A list of integers and “M” representing the specific VGG architecture of the MLP
- Returns
A sequential model representing the VGG architecture.
- Return type
nn.Sequential
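To make the cfg format used by make_layers concrete, the sketch below expands such a list into human-readable layer descriptions. It follows the convention commonly used in VGG implementations (an integer adds a 3x3 convolution with that many output channels, "M" adds 2x2 max pooling); whether VGGBase uses exactly these kernel sizes and this layer order is an assumption, and describe_layers is a hypothetical helper that returns strings instead of building an nn.Sequential.

```python
def describe_layers(cfg, in_channels, batch_norm=True):
    """Expand a VGG-style cfg list into a list of layer descriptions (strings)."""
    layers = []
    channels = in_channels
    for item in cfg:
        if item == "M":
            # "M" marks a max-pooling step that halves the spatial resolution.
            layers.append("MaxPool2d(kernel_size=2, stride=2)")
        else:
            # An integer is the number of output channels of the next convolution.
            layers.append(f"Conv2d({channels}, {item}, kernel_size=3, padding=1)")
            if batch_norm:
                layers.append(f"BatchNorm2d({item})")
            layers.append("ReLU()")
            channels = item  # the next convolution consumes these channels
    return layers


# Two convolution blocks, each followed by pooling, on a 5-channel input.
example_layers = describe_layers([16, "M", 32, "M"], in_channels=5)
```

Each entry of the result corresponds to one module that a real implementation would append before wrapping the list in nn.Sequential.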
- class sparcscore.ml.models.VGG1(*args: Any, **kwargs: Any)
Bases:
sparcscore.ml.models.VGGBase
Instance of VGGBase with the model architecture 1.
- class sparcscore.ml.models.VGG2(*args: Any, **kwargs: Any)
Bases:
sparcscore.ml.models.VGGBase
Instance of VGGBase with the model architecture 2.
plmodels
- class sparcscore.ml.plmodels.MultilabelSupervisedModel(*args: Any, **kwargs: Any)
Bases:
pytorch_lightning.LightningModule
A pytorch lightning network module to use a multi-label supervised Model.
- Parameters
type (str, optional, default = "VGG2") – Network architecture to use in the model. Architectures are defined in sparcscore.ml.models. Valid options: “VGG1”, “VGG2”, “VGG1_old”, “VGG2_old”.
kwargs (dict) – Additional parameters passed to the model.
- network
The selected network architecture.
- Type
torch.nn.Module
- train_metrics
MetricCollection for evaluating model on training data.
- Type
torchmetrics.MetricCollection
- val_metrics
MetricCollection for evaluating model on validation data.
- Type
torchmetrics.MetricCollection
- test_metrics
MetricCollection for evaluating model on test data.
- Type
torchmetrics.MetricCollection
- forward(x)
Perform a forward pass of the model.
- configure_optimizers()
Optimization function
- on_train_epoch_end()
Callback function after each training epoch
- on_validation_epoch_end()
Callback function after each validation epoch
- confusion_plot(matrix)
Generate confusion matrix plot
- training_step(batch, batch_idx)
Perform a single training step
- validation_step(batch, batch_idx)
Perform a single validation step
- test_step(batch, batch_idx)
Perform a single test step
- test_epoch_end(outputs)
Callback function after testing epochs
pretrained_models
Collection of functions to load pretrained models to use in the SPARCSpy environment.
- sparcscore.ml.pretrained_models.autophagy_classifier1_0(device='cuda')
Load the binary autophagy classification model published as Model 1.0 in the original SPARCSpy publication.
- sparcscore.ml.pretrained_models.autophagy_classifier2_0(device='cuda')
Load the binary autophagy classification model published as Model 2.0 in the original SPARCSpy publication.
- sparcscore.ml.pretrained_models.autophagy_classifier2_1(device='cuda')
Load the binary autophagy classification model published as Model 2.1 in the original SPARCSpy publication.
transforms
- class sparcscore.ml.transforms.RandomRotation(choices=4, include_zero=True)
Randomly rotate input image in 90 degree steps.
- class sparcscore.ml.transforms.GaussianNoise(sigma=0.1, channels_to_exclude=[])
Add gaussian noise to the input image.
- class sparcscore.ml.transforms.GaussianBlur(kernel_size=[1, 1, 1, 1, 5, 5, 7, 9], sigma=(0.1, 2), channels=[])
Apply a gaussian blur to the input image.
- class sparcscore.ml.transforms.ChannelReducer(channels=5)
Reduce an imaging dataset to 5, 3, or 1 channel(s):
5: nuclei_mask, cell_mask, channel_nucleus, channel_cellmask, channel_of_interest
3: nuclei_mask, cell_mask, channel_of_interest
1: channel_of_interest
- class sparcscore.ml.transforms.ChannelSelector(channels=[0, 1, 2, 3, 4], num_channels=5)
Select the channel(s) used for prediction.
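The channel sets listed for ChannelReducer amount to picking fixed channel indices. The sketch below illustrates that mapping in plain Python. It assumes the input channels arrive in the five-channel order given in the ChannelReducer entry; reduce_channels and the lookup table are hypothetical, and the real transform operates on tensors rather than lists.

```python
# Assumed channel order of the full five-channel single-cell image.
CHANNEL_NAMES = [
    "nuclei_mask",
    "cell_mask",
    "channel_nucleus",
    "channel_cellmask",
    "channel_of_interest",
]

# Indices kept for each supported target channel count.
KEPT_INDICES = {
    5: [0, 1, 2, 3, 4],
    3: [0, 1, 4],
    1: [4],
}


def reduce_channels(image, channels=5):
    """Return only the channels that belong to the requested reduced layout."""
    if channels not in KEPT_INDICES:
        raise ValueError(f"channels must be one of {sorted(KEPT_INDICES)}, got {channels}")
    return [image[index] for index in KEPT_INDICES[channels]]


# The channel names stand in for the image planes to make the selection visible.
three_channel_layout = reduce_channels(CHANNEL_NAMES, channels=3)
one_channel_layout = reduce_channels(CHANNEL_NAMES, channels=1)
```

With a tensor of shape (channels, height, width), the same selection would be an index operation along the first axis.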
utils
- sparcscore.ml.utils.combine_datasets_balanced(list_of_datasets, class_labels, train_per_class, val_per_class, test_per_class, seed=None)
Combine multiple datasets to create a single balanced dataset with a specified number of samples per class for the train, validation, and test sets. A balanced dataset means that an equal number of data instances is drawn from each label source.
- Parameters
list_of_datasets (list[torch.utils.data.Dataset]) – List of datasets to be combined.
class_labels (list[str|int]) – List of class labels present in the datasets.
train_per_class (int) – Number of samples per class in the train set.
val_per_class (int) – Number of samples per class in the validation set.
test_per_class (int) – Number of samples per class in the test set.
seed (None | int) – Seed for the random number generator. Defaults to None.
- Returns
Combined train, validation, and test datasets, in that order, each with balanced samples per class.
- Return type
tuple of three torch.utils.data.Dataset
- Raises
ValueError – If a dataset’s length is too small to be split according to the provided sizes.
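The balancing idea can be sketched without torch by working on indices only. The example below assumes one source dataset per class and sampling without replacement, and it raises ValueError when a source is too small, mirroring the documented behaviour. The helper name balanced_split_indices and its return layout are hypothetical and do not reflect how sparcscore builds the resulting datasets.

```python
import random


def balanced_split_indices(dataset_sizes, train_per_class, val_per_class,
                           test_per_class, seed=None):
    """Draw disjoint train/val/test indices of equal size from every source dataset.

    Returns a dict mapping split name to a list of (dataset_id, sample_index) pairs.
    """
    rng = random.Random(seed)
    needed = train_per_class + val_per_class + test_per_class
    splits = {"train": [], "val": [], "test": []}

    for dataset_id, size in enumerate(dataset_sizes):
        if size < needed:
            raise ValueError(
                f"dataset {dataset_id} has {size} samples but {needed} are required"
            )
        # Shuffle once, then cut consecutive slices so the splits cannot overlap.
        indices = list(range(size))
        rng.shuffle(indices)
        train_end = train_per_class
        val_end = train_end + val_per_class
        splits["train"] += [(dataset_id, i) for i in indices[:train_end]]
        splits["val"] += [(dataset_id, i) for i in indices[train_end:val_end]]
        splits["test"] += [(dataset_id, i) for i in indices[val_end:needed]]
    return splits


# Two sources of very different size contribute the same number of samples.
example_splits = balanced_split_indices([10, 50], 4, 2, 2, seed=0)
```

Because every source contributes the same number of samples to each split, the larger source is subsampled rather than allowed to dominate training.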