ml

datasets

class sparcscore.ml.datasets.HDF5SingleCellDataset(*args: Any, **kwargs: Any)

Class for handling SPARCSpy single cell datasets stored in HDF5 files.

This class provides a convenient interface for SPARCSpy-formatted hdf5 files containing single cell datasets. It supports loading data from multiple hdf5 files within specified directories, applying transformations to the data, and returning the required information, such as label or id, along with the single cell data.

root_dir

Root directory where the hdf5 files are located.

Type

str

dir_labels

List of labels corresponding to the directories in dir_list.

Type

list of int

dir_list

List of path(s) where the hdf5 files are stored. Supports specifying a path to a specific hdf5 file or directory containing hdf5 files.

Type

list of str

transform

An optional user-defined function that applies transformations to the data. Default is None.

Type

callable, optional

max_level

Maximum number of directory levels to search for hdf5 files. Default is 5.

Type

int, optional

return_id

Whether to return the index of the cell with the data. Default is False.

Type

bool, optional

return_fake_id

Whether to return a fake index (0) with the data. Default is False.

Type

bool, optional

select_channel

Specify a specific channel to select from the data. Default is None, which returns all channels.

Type

int, optional

add_hdf_to_index(current_label, path)

Adds single cell data from the hdf5 file located at ‘path’ with the specified ‘current_label’ to the index.

scan_directory(path, current_label, levels_left)

Scans directories for hdf5 files and adds their data to the index with the specified ‘current_label’.

stats()

Prints dataset statistics including total count and count per label.

__len__()

Returns the total number of single cells in the dataset.

__getitem__(idx)

Returns the data, label, and optional id/fake_id of the single cell specified by the index ‘idx’.

Examples

>>> hdf5_data = HDF5SingleCellDataset(dir_list=['data1.hdf5', 'data2.hdf5'],
...                                   dir_labels=[0, 1],
...                                   root_dir='/path/to/data',
...                                   transform=None,
...                                   return_id=True)
>>> len(hdf5_data)
2000
>>> sample = hdf5_data[0]
>>> sample[0].shape
torch.Size([1, 128, 128])
>>> sample[1]
tensor(0)
>>> sample[2]
tensor(0)
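
The depth-limited search described for scan_directory and max_level can be illustrated with a small standalone sketch. It is not the SPARCSpy implementation: the function name scan_for_hdf5, the accepted file suffixes, and the rule that each subdirectory consumes one level are assumptions made for illustration, and the sketch only collects file paths instead of adding cells to an index.

```python
import os
import tempfile


def scan_for_hdf5(path, levels_left, suffixes=(".h5", ".hdf5")):
    """Return paths of HDF5 files found at most `levels_left` levels below `path`.

    Hypothetical helper: the suffix list and depth rule are assumptions.
    """
    found = []
    if os.path.isfile(path):
        # A path that points directly at a file is accepted as-is.
        if path.lower().endswith(suffixes):
            found.append(path)
        return found
    for entry in sorted(os.listdir(path)):
        full = os.path.join(path, entry)
        if os.path.isfile(full) and full.lower().endswith(suffixes):
            found.append(full)
        elif os.path.isdir(full) and levels_left > 0:
            # Descend one level and spend one unit of the depth budget.
            found.extend(scan_for_hdf5(full, levels_left - 1, suffixes))
    return found


# Build a throwaway tree: root/a.h5, root/sub/b.hdf5, root/sub/deep/c.h5
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub", "deep"))
for rel in ("a.h5", os.path.join("sub", "b.hdf5"), os.path.join("sub", "deep", "c.h5")):
    open(os.path.join(root, rel), "w").close()

shallow = scan_for_hdf5(root, levels_left=1)  # does not reach sub/deep
deep = scan_for_hdf5(root, levels_left=5)     # reaches every file
print(len(shallow), len(deep))
```

With a budget of one level the file in sub/deep is skipped, which is the kind of cutoff max_level is documented to provide.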

metrics

sparcscore.ml.metrics.precision(predictions, labels, pos_label=0)

Calculate precision for predicting class pos_label.

Parameters
  • predictions (torch.Tensor) – Model predictions.

  • labels (torch.Tensor) – Ground truth labels.

  • pos_label (int, optional, default = 0) – The positive label for which to calculate precision.

Returns

precision – Precision for predicting class pos_label.

Return type

float

sparcscore.ml.metrics.recall(predictions, labels, pos_label=0)

Calculate recall for predicting class pos_label.

Parameters
  • predictions (torch.Tensor) – Model predictions.

  • labels (torch.Tensor) – Ground truth labels.

  • pos_label (int, optional, default = 0) – The positive label for which to calculate recall.

Returns

recall – Recall for predicting class pos_label.

Return type

float
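
As a rough illustration of what precision and recall for a chosen pos_label mean, the sketch below computes both on plain Python lists. It is not the library code: the documented functions take torch.Tensor inputs, and this sketch assumes the predictions have already been reduced to class indices. The function names and the choice to return 0.0 when a denominator is zero are also assumptions.

```python
def precision_sketch(predicted_classes, labels, pos_label=0):
    """Fraction of samples predicted as pos_label that truly are pos_label."""
    true_pos = sum(
        1 for p, y in zip(predicted_classes, labels) if p == pos_label and y == pos_label
    )
    predicted_pos = sum(1 for p in predicted_classes if p == pos_label)
    # Assumption: report 0.0 when nothing was predicted as pos_label.
    return true_pos / predicted_pos if predicted_pos else 0.0


def recall_sketch(predicted_classes, labels, pos_label=0):
    """Fraction of true pos_label samples that were predicted as pos_label."""
    true_pos = sum(
        1 for p, y in zip(predicted_classes, labels) if p == pos_label and y == pos_label
    )
    actual_pos = sum(1 for y in labels if y == pos_label)
    # Assumption: report 0.0 when no sample carries pos_label.
    return true_pos / actual_pos if actual_pos else 0.0


predicted = [0, 0, 1, 0, 1, 1, 1]
truth = [0, 1, 1, 0, 0, 1, 0]

p0 = precision_sketch(predicted, truth, pos_label=0)  # 2 of 3 predicted zeros are correct
r0 = recall_sketch(predicted, truth, pos_label=0)     # 2 of 4 true zeros are recovered
print(p0, r0)
```

Changing pos_label to 1 swaps which class counts as positive, so the same predictions give a precision of 0.5 and a recall of 2/3.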

models

class sparcscore.ml.models.VGGBase(*args: Any, **kwargs: Any)

Bases: torch.nn.Module

Base implementation of the VGG model architecture. Can be instantiated with a varying number of convolutional layers and fully connected layers.

make_layers(cfg, in_channels, batch_norm=True)

Create sequential model layers for the CNN according to the configuration provided in cfg, with optional batch normalization.

Parameters
  • cfg (list) – A list of integers and “M” representing the specific VGG architecture.

  • in_channels (int) – Number of input channels for the first convolutional layer.

  • batch_norm (bool, optional, default=True) – Whether to include batch normalization layers.

Returns

A sequential model representing the VGG architecture.

Return type

nn.Sequential
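
To make the cfg format more concrete, the following sketch traces a list of integers and "M" entries and reports the resulting layer sequence and output shape. It relies on the common VGG convention, assumed here and not confirmed for SPARCSpy, that each integer is a 3x3 convolution with padding 1 followed by optional batch normalization and ReLU, and that "M" is a 2x2 max pooling with stride 2. The function name and the example configuration are hypothetical, and the sketch produces layer descriptions as strings and does not build torch modules.

```python
def describe_vgg_cfg(cfg, in_channels, input_size, batch_norm=True):
    """Trace a VGG-style cfg list and return (layer_names, out_channels, out_size)."""
    layers = []
    channels, size = in_channels, input_size
    for item in cfg:
        if item == "M":
            # Assumed: 2x2 max pooling with stride 2 halves the spatial size.
            layers.append("MaxPool2d(2, 2)")
            size //= 2
        else:
            # Assumed: 3x3 convolution with padding 1 keeps the spatial size.
            layers.append(f"Conv2d({channels}, {item}, 3, padding=1)")
            if batch_norm:
                layers.append(f"BatchNorm2d({item})")
            layers.append("ReLU")
            channels = item
    return layers, channels, size


# Hypothetical configuration, not one shipped with SPARCSpy.
cfg = [16, "M", 32, "M", 64, "M"]
layers, out_channels, out_size = describe_vgg_cfg(cfg, in_channels=1, input_size=128)

# Number of features a first fully connected layer would receive after flattening.
flat_features = out_channels * out_size * out_size
print(out_channels, out_size, flat_features)
```

Under these assumptions a 1 x 128 x 128 input ends as 64 channels of 16 x 16, which is why the MLP configuration needs to know the CNN configuration.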

make_layers_MLP(cfg_MLP, cfg)

Create sequential model layers for the MLP according to the chosen configuration provided in cfg.

Parameters
  • cfg (list) – A list of integers and “M” representing the specific VGG architecture of the CNN.

  • cfg_MLP (list) – A list of integers and “M” representing the specific VGG architecture of the MLP.

Returns

A sequential model representing the VGG architecture.

Return type

nn.Sequential

class sparcscore.ml.models.VGG1(*args: Any, **kwargs: Any)

Bases: sparcscore.ml.models.VGGBase

Instance of VGGBase with the model architecture 1.

class sparcscore.ml.models.VGG2(*args: Any, **kwargs: Any)

Bases: sparcscore.ml.models.VGGBase

Instance of VGGBase with the model architecture 2.

plmodels

class sparcscore.ml.plmodels.MultilabelSupervisedModel(*args: Any, **kwargs: Any)

Bases: pytorch_lightning.LightningModule

A PyTorch Lightning module for training a multi-label supervised model.

Parameters
  • type (str, optional, default = "VGG2") – Network architecture to use in the model. Architectures are defined in sparcscore.ml.models. Valid options: “VGG1”, “VGG2”, “VGG1_old”, “VGG2_old”.

  • kwargs (dict) – Additional parameters passed to the model.

network

The selected network architecture.

Type

torch.nn.Module

train_metrics

MetricCollection for evaluating model on training data.

Type

torchmetrics.MetricCollection

val_metrics

MetricCollection for evaluating model on validation data.

Type

torchmetrics.MetricCollection

test_metrics

MetricCollection for evaluating model on test data.

Type

torchmetrics.MetricCollection

forward(x)

Perform a forward pass of the model.

configure_optimizers()

Optimization function

on_train_epoch_end()

Callback function after each training epoch

on_validation_epoch_end()

Callback function after each validation epoch

confusion_plot(matrix)

Generate confusion matrix plot

training_step(batch, batch_idx)

Perform a single training step

validation_step(batch, batch_idx)

Perform a single validation step

test_step(batch, batch_idx)

Perform a single test step

test_epoch_end(outputs)

Callback function after testing epochs

pretrained_models

Collection of functions to load pretrained models to use in the SPARCSpy environment.

sparcscore.ml.pretrained_models.autophagy_classifier1_0(device='cuda')

Load the binary autophagy classification model published as Model 1.0 in the original SPARCSpy publication.

sparcscore.ml.pretrained_models.autophagy_classifier2_0(device='cuda')

Load the binary autophagy classification model published as Model 2.0 in the original SPARCSpy publication.

sparcscore.ml.pretrained_models.autophagy_classifier2_1(device='cuda')

Load the binary autophagy classification model published as Model 2.1 in the original SPARCSpy publication.

transforms

class sparcscore.ml.transforms.RandomRotation(choices=4, include_zero=True)

Randomly rotate input image in 90 degree steps.

class sparcscore.ml.transforms.GaussianNoise(sigma=0.1, channels_to_exclude=[])

Add Gaussian noise to the input image.

class sparcscore.ml.transforms.GaussianBlur(kernel_size=[1, 1, 1, 1, 5, 5, 7, 9], sigma=(0.1, 2), channels=[])

Apply a Gaussian blur to the input image.

class sparcscore.ml.transforms.ChannelReducer(channels=5)

Reduce an imaging dataset to 5, 3, or 1 channel(s):

  • 5: nuclei_mask, cell_mask, channel_nucleus, channel_cellmask, channel_of_interest

  • 3: nuclei_mask, cell_mask, channel_of_interest

  • 1: channel_of_interest
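
The channel layouts listed above can be read as index selections over a five-channel image. The sketch below assumes that the channels are stored in the order in which they are named (nuclei_mask first, channel_of_interest last). That ordering, the KEEP table, and the function name are assumptions for illustration and are not taken from the SPARCSpy source. Plain lists stand in for image planes.

```python
# Assumed storage order of the five channels.
CHANNEL_NAMES = [
    "nuclei_mask",
    "cell_mask",
    "channel_nucleus",
    "channel_cellmask",
    "channel_of_interest",
]

# Which channel indices survive for each supported target size (assumed mapping).
KEEP = {5: [0, 1, 2, 3, 4], 3: [0, 1, 4], 1: [4]}


def reduce_channels(image, channels=5):
    """Return the subset of channel planes for a 5-, 3- or 1-channel layout."""
    if channels not in KEEP:
        raise ValueError("channels must be 5, 3 or 1")
    return [image[i] for i in KEEP[channels]]


# Use the channel names themselves as stand-ins for the image planes.
three = reduce_channels(CHANNEL_NAMES, channels=3)
one = reduce_channels(CHANNEL_NAMES, channels=1)
print(three, one)
```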

class sparcscore.ml.transforms.ChannelSelector(channels=[0, 1, 2, 3, 4], num_channels=5)

Select the channels used for prediction.

utils

sparcscore.ml.utils.combine_datasets_balanced(list_of_datasets, class_labels, train_per_class, val_per_class, test_per_class, seed=None)

Combine multiple datasets to create a single balanced dataset with a specified number of samples per class for the train, validation, and test sets. A balanced dataset means that an equal number of data instances is drawn from each label source.

Parameters
  • list_of_datasets (list[torch.utils.data.Dataset]) – List of datasets to be combined.

  • class_labels (list[str|int]) – List of class labels present in the datasets.

  • train_per_class (int) – Number of samples per class in the train set.

  • val_per_class (int) – Number of samples per class in the validation set.

  • test_per_class (int) – Number of samples per class in the test set.

  • seed (None | int) – Seed for the random number generator. Defaults to None.

Returns

  • Combined train dataset with balanced samples per class.

  • Combined validation dataset with balanced samples per class.

  • Combined test dataset with balanced samples per class.

Return type

torch.utils.data.Dataset (for each of the three returned datasets)

Raises

ValueError – If a dataset’s length is too small to be split according to the provided sizes.
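
The balancing idea can be sketched without torch by treating each dataset as a plain sequence. The sketch assumes one input dataset per class, draws the same number of randomly chosen samples from each, and raises ValueError when a dataset is too small. The function name balanced_split and the use of Python lists in place of torch.utils.data.Dataset objects are assumptions, so this shows the sampling logic only and is not the library implementation.

```python
import random


def balanced_split(datasets, train_per_class, val_per_class, test_per_class, seed=None):
    """Draw equal-sized train/val/test subsets from each dataset (one per class)."""
    rng = random.Random(seed)
    needed = train_per_class + val_per_class + test_per_class
    train, val, test = [], [], []
    for dataset in datasets:
        if len(dataset) < needed:
            raise ValueError(
                f"dataset has {len(dataset)} samples but {needed} are required"
            )
        # Shuffle indices so the three subsets are random and non-overlapping.
        indices = list(range(len(dataset)))
        rng.shuffle(indices)
        picked = [dataset[i] for i in indices[:needed]]
        train.extend(picked[:train_per_class])
        val.extend(picked[train_per_class:train_per_class + val_per_class])
        test.extend(picked[train_per_class + val_per_class:])
    return train, val, test


# Two toy "datasets" of (sample, label) pairs with very different sizes.
class_a = [("a%d" % i, 0) for i in range(10)]
class_b = [("b%d" % i, 1) for i in range(50)]

train, val, test = balanced_split([class_a, class_b], 4, 2, 2, seed=0)
print(len(train), len(val), len(test))
```

Although class_b is five times larger than class_a, each split contains the same number of samples from both, which is the property the function description calls balanced.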