pipeline#
project#
Within scPortrait, all operations are centered around the concept of a Project. A Project is a Python class that manages all scPortrait processing steps and is the central element through which all operations are performed. Each Project maps directly to a directory on the file system that contains all inputs to a specific scPortrait run as well as the generated outputs. Depending on the structure of the data to be processed, a different Project class is required. Please see here for more information.
Project#
- class scportrait.pipeline.project.Project(project_location, config_path, segmentation_f=None, extraction_f=None, classification_f=None, selection_f=None, overwrite=False, debug=False)#
Bases:
Logable
- DEFAULT_IMAGE_DTYPE#
alias of
uint16
- DEFAULT_SEGMENTATION_DTYPE#
alias of
uint32
- DEFAULT_SINGLE_CELL_IMAGE_DTYPE#
alias of
float16
- update_classification_f(classification_f) None #
Update the classification method chosen for the project without reinitializing the entire project.
- Parameters:
classification_f (class) – The classification method that should be used for the project.
- load_input_from_tif_files(file_paths, channel_names=None, crop=None, overwrite=None, remap=None, cache=None)#
Load the input image from a list of files. The channels need to be specified in the following order: nucleus, cytosol, other channels.
- Parameters:
file_paths (List[str]) – List containing paths to each channel like [“path1/img.tiff”, “path2/img.tiff”, “path3/img.tiff”]. Expects a list of file paths with length “input_channel” as defined in the config.yml.
crop (List[Tuple], optional) – When set, it can be used to crop the input image. The first element refers to the first dimension of the image and so on. For example use “[(0,1000),(0,2000)]” to crop the image to 1000 px height and 2000 px width from the top left corner.
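The crop specification can be illustrated without scPortrait itself. The following is a minimal sketch, assuming a toy image stored as a nested list (rows first, then columns); the helpers `crop_to_slices` and `apply_crop` are hypothetical names introduced for illustration and are not part of the scPortrait API.

```python
def crop_to_slices(crop):
    """Convert [(start, stop), ...] into a tuple of slices, one per dimension."""
    return tuple(slice(start, stop) for start, stop in crop)


def apply_crop(image, crop):
    """Crop a 2D nested-list image; the first crop entry applies to rows (height)."""
    row_slice, col_slice = crop_to_slices(crop)
    return [row[col_slice] for row in image[row_slice]]


# A toy 4 x 6 "image" whose pixel value encodes its position (10 * row + column).
image = [[10 * r + c for c in range(6)] for r in range(4)]

# Keep 2 px of height and 3 px of width, starting from the top left corner.
cropped = apply_crop(image, [(0, 2), (0, 3)])
print(len(cropped), len(cropped[0]))  # 2 3
```

The first tuple always addresses the first image dimension, so `[(0,1000),(0,2000)]` keeps rows 0 to 999 and columns 0 to 1999.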
- load_input_from_sdata(sdata_path, input_image_name='input_image', nucleus_segmentation_name=None, cytosol_segmentation_name=None, overwrite=None)#
Load input image from a spatialdata object.
- select(cell_sets: list[dict], calibration_marker: ndarray | None = None, segmentation_name: str = 'seg_all_nucleus', name: str | None = None)#
Select specified classes using the defined selection method.
segmentation#
Segmentation#
- class scportrait.pipeline.segmentation.Segmentation(config, directory, nuc_seg_name, cyto_seg_name, _tmp_image_path, project_location, debug, overwrite, project, filehandler, **kwargs)#
Bases:
ProcessingStep
Segmentation helper class used for creating segmentation workflows.
- maps#
Segmentation workflows based on the Segmentation class can use maps for saving and loading checkpoints. Maps can be numpy arrays.
- Type:
dict(str)
- DEFAULT_FILTER_ADDTIONAL_FILE#
- Type:
str, default
filtered_classes.csv
- PRINT_MAPS_ON_DEBUG#
- Type:
bool, default
False
- identifier#
Only set if called by ShardedSegmentation. Unique index of the shard.
- Type:
int, default None
- window#
Only set if called by ShardedSegmentation. Defines the window which is assigned to the shard. The window will be applied to the input. The first element refers to the first dimension of the image and so on. For example, use [(0,1000),(0,2000)] to crop the image to 1000 px height and 2000 px width from the top left corner.
- Type:
list(tuple), default None
- input_path#
Only set if called by ShardedSegmentation. Location of the input hdf5 file. During sharded segmentation, the ShardedSegmentation derived helper class will save the input image in the form of an hdf5 file. This makes the input image available for parallel reading by the segmentation processes.
- Type:
str, default None
Example
    def process(self):
        # two maps are initialized
        self.maps = {"map0": None, "map1": None}

        # It is checked whether the segmentation directory already contains these maps;
        # if so, they are loaded. The index of the first map which has not been found
        # is returned. It indicates the step where computation needs to resume.
        current_step = self.load_maps_from_disk()

        if current_step <= 0:
            # do stuff and generate map0
            self.save_map("map0")

        if current_step <= 1:
            # do stuff and generate map1
            self.save_map("map1")
- save_map(map_name)#
Saves newly computed map.
- Parameters:
map_name (str) – Name of the map to be saved, as defined in self.maps.
Example
    # declare all intermediate maps
    self.maps = {"myMap": None}

    # load intermediate maps if possible and get current processing step
    current_step = self.load_maps_from_disk()

    if current_step <= 0:
        # do some computations
        self.maps["myMap"] = myNumpyArray

        # save map
        self.save_map("myMap")
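The checkpointing pattern described for maps can be tried out in isolation. The sketch below is a toy stand-in, not scPortrait code: the class `CheckpointedStep`, the JSON file naming scheme, and the list-based maps are all assumptions made for illustration. It only mirrors the documented contract that `load_maps_from_disk()` returns the index of the first map not found on disk.

```python
import json
import tempfile
from pathlib import Path


class CheckpointedStep:
    """Toy stand-in for the map checkpointing pattern (not scPortrait code)."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.maps = {}

    def _path(self, index, name):
        # Hypothetical naming scheme: "<index>_<name>.json"
        return self.directory / f"{index}_{name}.json"

    def save_map(self, map_name):
        index = list(self.maps).index(map_name)
        self._path(index, map_name).write_text(json.dumps(self.maps[map_name]))

    def load_maps_from_disk(self):
        """Load saved maps in order; return the index of the first missing one."""
        for index, name in enumerate(self.maps):
            path = self._path(index, name)
            if not path.exists():
                return index
            self.maps[name] = json.loads(path.read_text())
        return len(self.maps)

    def process(self):
        self.maps = {"map0": None, "map1": None}
        current_step = self.load_maps_from_disk()
        computed = []
        if current_step <= 0:
            self.maps["map0"] = [[0, 1], [1, 0]]  # placeholder computation
            self.save_map("map0")
            computed.append("map0")
        if current_step <= 1:
            self.maps["map1"] = [[1, 1], [0, 0]]  # placeholder computation
            self.save_map("map1")
            computed.append("map1")
        return computed


with tempfile.TemporaryDirectory() as tmp:
    first_run = CheckpointedStep(tmp).process()
    # Simulate an interrupted run by deleting the second checkpoint.
    (Path(tmp) / "1_map1.json").unlink()
    second_run = CheckpointedStep(tmp).process()

print(first_run)   # ['map0', 'map1']
print(second_run)  # ['map1']
```

On the second run only the missing map is recomputed, which is the behavior the `current_step` comparisons are meant to provide.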
- process(input_image)#
Process the input image with the segmentation method.
segmentation workflows#
- class scportrait.pipeline.segmentation.workflows.WGASegmentation(*args, **kwargs)#
- process(input_image)#
Process the input image with the segmentation method.
- class scportrait.pipeline.segmentation.workflows.ShardedWGASegmentation(*args, **kwargs)#
- method#
alias of
WGASegmentation
- class scportrait.pipeline.segmentation.workflows.DAPISegmentation(*args, **kwargs)#
- process(input_image)#
Process the input image with the segmentation method.
- class scportrait.pipeline.segmentation.workflows.ShardedDAPISegmentation(*args, **kwargs)#
- method#
alias of
DAPISegmentation
- class scportrait.pipeline.segmentation.workflows.DAPISegmentationCellpose(*args, **kwargs)#
- process(input_image)#
Process the input image with the segmentation method.
- class scportrait.pipeline.segmentation.workflows.ShardedDAPISegmentationCellpose(*args, **kwargs)#
- method#
alias of
DAPISegmentationCellpose
- class scportrait.pipeline.segmentation.workflows.CytosolSegmentationCellpose(*args, **kwargs)#
- process(input_image)#
Process the input image with the segmentation method.
- class scportrait.pipeline.segmentation.workflows.ShardedCytosolSegmentationCellpose(*args, **kwargs)#
- method#
alias of
CytosolSegmentationCellpose
- class scportrait.pipeline.segmentation.workflows.CytosolSegmentationDownsamplingCellpose(*args, **kwargs)#
- process(input_image)#
Process the input image with the segmentation method.
- class scportrait.pipeline.segmentation.workflows.ShardedCytosolSegmentationDownsamplingCellpose(*args, **kwargs)#
- method#
- class scportrait.pipeline.segmentation.workflows.CytosolOnlySegmentationCellpose(*args, **kwargs)#
- class scportrait.pipeline.segmentation.workflows.Sharded_CytosolOnly_Cellpose_Segmentation(*args, **kwargs)#
- method#
alias of
CytosolOnlySegmentationCellpose
- class scportrait.pipeline.segmentation.workflows.CytosolOnly_Segmentation_Downsampling_Cellpose(*args, **kwargs)#
- process(input_image) None #
Process the input image with the segmentation method.
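Each sharded workflow above exposes a `method` attribute that is an alias of its un-sharded counterpart. One way to read this is that the sharded class delegates the actual segmentation of each window to the class stored in `method`. The sketch below illustrates that delegation idea only; `ToySegmentation`, `ShardedToySegmentation`, the tiling logic, and the thresholding are hypothetical and do not reproduce scPortrait's implementation.

```python
class ToySegmentation:
    """Hypothetical un-sharded workflow: marks every pixel above zero."""

    def __init__(self, identifier=None, window=None):
        self.identifier = identifier  # index of the shard
        self.window = window          # [(row_start, row_stop), (col_start, col_stop)]

    def process(self, input_image):
        (r0, r1), (c0, c1) = self.window
        tile = [row[c0:c1] for row in input_image[r0:r1]]
        return [[1 if px > 0 else 0 for px in row] for row in tile]


class ShardedToySegmentation:
    """Hypothetical sharded wrapper that runs `method` once per window."""

    method = ToySegmentation  # alias of the un-sharded workflow

    def __init__(self, shard_size):
        self.shard_size = shard_size

    def windows(self, height, width):
        s = self.shard_size
        return [
            ((r, min(r + s, height)), (c, min(c + s, width)))
            for r in range(0, height, s)
            for c in range(0, width, s)
        ]

    def process(self, input_image):
        height, width = len(input_image), len(input_image[0])
        results = {}
        for identifier, window in enumerate(self.windows(height, width)):
            shard = self.method(identifier=identifier, window=window)
            results[identifier] = shard.process(input_image)
        return results


image = [
    [0, 5, 0, 0],
    [7, 0, 0, 0],
    [0, 0, 3, 3],
    [0, 0, 3, 0],
]
results = ShardedToySegmentation(shard_size=2).process(image)
print(len(results))  # 4
```

Keeping the workflow class as a class attribute means a new sharded variant only needs to point `method` at a different un-sharded workflow.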
extraction#
HDF5CellExtraction#
- class scportrait.pipeline.extraction.HDF5CellExtraction(*args, **kwargs)#
Bases:
ProcessingStep
A class to extract single-cell images from a segmented scPortrait project and save the results to an HDF5 file.
- process(partial=False, n_cells=None, seed=42)#
Extracts single cell images from a segmented scPortrait project and saves the results to an HDF5 file.
- Parameters:
input_segmentation_path (str) – Path of the segmentation HDF5 file. If this class is used as part of a project processing workflow, this argument will be provided automatically.
filtered_classes_path (str, optional) – Path to the filtered classes that should be used for extraction. Default is None. If not provided, will use the automatically generated paths.
Important
If this class is used as part of a project processing workflow, all of the arguments will be provided by the Project class based on the previous segmentation. The Project class will automatically pass the most recent segmentation forward together with the supplied parameters.
Examples
    # After the project is initialized and input data has been loaded and segmented
    project.extract()
Notes
The following parameters are required in the config file when running this method:
    HDF5CellExtraction:
        compression: True
        # threads used in multithreading
        threads: 80
        # image size in pixels
        image_size: 128
        # directory where intermediate results should be saved
        cache: "/mnt/temp/cache"
        # specs to define how HDF5 data should be chunked and saved
        hdf5_rdcc_nbytes: 5242880000 # 5GB 1024 * 1024 * 5000
        hdf5_rdcc_w0: 1
        hdf5_rdcc_nslots: 50000
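The value of `hdf5_rdcc_nbytes` follows from the arithmetic given in its comment. The short check below reproduces it and, as a rough sizing aid, estimates how many single-cell images would fit into a cache of that size. The estimate assumes one 128 x 128 image stored as float16 (the `DEFAULT_SINGLE_CELL_IMAGE_DTYPE` listed for the Project class) per cached chunk, which is an assumption about the chunk layout rather than documented behavior.

```python
MIB = 1024 * 1024

# Cache size as written in the config comment: 1024 * 1024 * 5000 bytes.
hdf5_rdcc_nbytes = MIB * 5000
print(hdf5_rdcc_nbytes)              # 5242880000
print(hdf5_rdcc_nbytes / 1024 ** 3)  # 4.8828125 (GiB)

# Assumed chunk: one 128 x 128 single-channel image stored as float16 (2 bytes).
image_size = 128
bytes_per_image = image_size * image_size * 2
images_in_cache = hdf5_rdcc_nbytes // bytes_per_image
print(images_in_cache)               # 160000
```

The "5GB" in the comment is therefore approximate: the configured value is 5000 MiB, or about 4.88 GiB.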
classification#
MLClusterClassifier#
- class scportrait.pipeline.classification.MLClusterClassifier(*args, **kwargs)#
Class for classifying single cells using a pre-trained machine learning model.
This class takes a pre-trained model and uses it to classify single cells, using the model’s forward function or encoder function, depending on the user’s choice. The classification results are saved to a CSV file.
- __call__(*args, debug=None, overwrite=None, **kwargs)#
Call the processing step.
- Parameters:
debug (bool, optional, default None) – Allows overriding the value set on initiation. When set to True, debug outputs will be printed where applicable.
overwrite (bool, optional, default None) – Allows overriding the value set on initiation. When set to True, the processing step directory will be completely deleted and newly created when called.
- DEFAULT_MODEL_CLASS#
alias of
MultilabelSupervisedModel
- DEFAULT_DATA_LOADER#
alias of
HDF5SingleCellDataset
- process(extraction_dir: str, size: int = 0)#
Perform classification on the provided HDF5 dataset.
- Parameters:
extraction_dir (str) – Directory containing the extracted HDF5 files from the project. If this class is used as part of a project processing workflow, this argument will be provided automatically.
size (int, optional) – How many cells should be selected for inference. Default is 0, which means all cells are selected.
- Returns:
Results are written to CSV files located in the project directory.
- Return type:
None
Important
If this class is used as part of a project processing workflow, the first argument will be provided by the Project class based on the previous single-cell extraction. Therefore, only the remaining arguments need to be provided. The Project class will automatically provide the most recent extracted single-cell dataset together with the supplied parameters.
Examples
    project.classify()
Notes
The following parameters are required in the config file:
    MLClusterClassifier:
        # Channel number on which the classification should be performed
        channel_classification: 4
        # Number of threads to use for dataloader
        dataloader_worker_number: 24
        # Batch size to pass to GPU
        batch_size: 900
        # Path to PyTorch checkpoint that should be used for inference
        network: "path/to/model/"
        # Classifier architecture implemented in scPortrait
        # Choose one of VGG1, VGG2, VGG1_old, VGG2_old
        classifier_architecture: "VGG2_old"
        # If more than one checkpoint is provided in the network directory, which checkpoint should be chosen
        # Should either be "max" or a numeric value indicating the epoch number
        epoch: "max"
        # Name of the classifier used for saving the classification results to a directory
        label: "Autophagy_15h_classifier1"
        # List of which inference methods should be performed
        # Available: "forward" and "encoder"
        # If "forward": images are passed through all layers of the model and the final inference results are written to file
        # If "encoder": activations at the end of the CNN are written to file
        encoders: ["forward", "encoder"]
        # On which device inference should be performed
        # For speed, should be "cuda"
        inference_device: "cuda"
        # define dataset transforms
        transforms:
            resize: 128
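The `epoch` setting chooses between several checkpoints in the network directory, either the latest one (`"max"`) or a specific epoch number. A minimal sketch of such a selection rule is shown below. The helper `select_checkpoint` and the `epoch=<N>` file naming pattern are assumptions for illustration; how scPortrait actually parses checkpoint names is not stated in this section.

```python
import re


def select_checkpoint(filenames, epoch="max"):
    """Pick a checkpoint file by epoch number (hypothetical helper).

    Assumes file names contain "epoch=<N>"; names without it are ignored.
    """
    by_epoch = {}
    for name in filenames:
        match = re.search(r"epoch=(\d+)", name)
        if match:
            by_epoch[int(match.group(1))] = name
    if not by_epoch:
        raise ValueError("no checkpoint with an epoch number found")
    key = max(by_epoch) if epoch == "max" else int(epoch)
    if key not in by_epoch:
        raise ValueError(f"no checkpoint found for epoch {key}")
    return by_epoch[key]


files = ["epoch=3-step=100.ckpt", "epoch=12-step=400.ckpt", "notes.txt"]
latest = select_checkpoint(files, epoch="max")
specific = select_checkpoint(files, epoch=3)
print(latest)    # epoch=12-step=400.ckpt
print(specific)  # epoch=3-step=100.ckpt
```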
CellFeaturizer#
- class scportrait.pipeline.classification.CellFeaturizer(*args, **kwargs)#
Class for extracting general image features from SPARCS single-cell image datasets. The extracted features are saved to a CSV file. The features are calculated on the basis of a specified channel.
The features which are calculated are:
Area of the masks in pixels
Mean intensity of the chosen channel in the regions labelled by each of the masks
Median intensity of the chosen channel in the regions labelled by each of the masks
75% quantile of the chosen channel in the regions labelled by each of the masks
25% quantile of the chosen channel in the regions labelled by each of the masks
Summed intensity of the chosen channel in the regions labelled by each of the masks
Summed intensity of the chosen channel in the region labelled by each of the masks normalized for area
The features are written to the CSV file in this order.
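The listed features are plain summary statistics over the pixels selected by a mask. The sketch below computes them for one toy mask using only the standard library. The function `featurize`, the dictionary keys, and the choice of the "inclusive" quantile method are assumptions for illustration; the CellFeaturizer itself may compute quantiles differently and runs batched on the configured device.

```python
import statistics


def featurize(image, mask):
    """Compute the listed summary features for pixels where mask is nonzero."""
    values = [
        px
        for img_row, mask_row in zip(image, mask)
        for px, m in zip(img_row, mask_row)
        if m
    ]
    area = len(values)
    # Quartiles of the masked pixel values (25%, 50%, 75%).
    q25, _, q75 = statistics.quantiles(values, n=4, method="inclusive")
    total = sum(values)
    # Keys are inserted in the same order as the feature list above.
    return {
        "area": area,
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "quant75": q75,
        "quant25": q25,
        "summed_intensity": total,
        "summed_intensity_area_normalized": total / area,
    }


image = [
    [1, 2, 9],
    [3, 4, 9],
    [5, 9, 9],
]
mask = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
]
features = featurize(image, mask)
print(features["area"], features["summed_intensity"])  # 5 15
```

Note that the area-normalized summed intensity is numerically the same as the mean when both are computed over the same mask.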
- __call__(*args, debug=None, overwrite=None, **kwargs)#
Call the processing step.
- Parameters:
debug (bool, optional, default None) – Allows overriding the value set on initiation. When set to True, debug outputs will be printed where applicable.
overwrite (bool, optional, default None) – Allows overriding the value set on initiation. When set to True, the processing step directory will be completely deleted and newly created when called.
- process(extraction_dir, size=0)#
Perform featurization on the provided HDF5 dataset.
- Parameters:
extraction_dir (str) – Directory containing the extracted HDF5 files from the project. If this class is used as part of a project processing workflow this argument will be provided automatically.
size (int, optional, default=0) – How many cells should be selected for inference. Default is 0, meaning all cells are selected.
- Returns:
Results are written to CSV files located in the project directory.
- Return type:
None
Important
If this class is used as part of a project processing workflow, the first argument will be provided by the Project class based on the previous single-cell extraction. Therefore, only the remaining arguments need to be provided. The Project class will automatically provide the most recent extraction results together with the supplied parameters.
Examples
    # Define accessory dataset: additional HDF5 datasets that you want to perform an inference on
    # Leave empty if you only want to infer on all extracted cells in the current project
    project.classify()
Notes
The following parameters are required in the config file:
    CellFeaturizer:
        # Channel number on which the featurization should be performed
        channel_classification: 4
        # Number of threads to use for dataloader
        dataloader_worker_number: 0 # needs to be 0 if using CPU
        # Batch size to pass to GPU
        batch_size: 900
        # On which device inference should be performed
        # For speed should be "cuda"
        inference_device: "cpu"
        # Label under which the results should be saved
        screen_label: "Ch3_Featurization"
selection#
LMDSelection#
- class scportrait.pipeline.selection.LMDSelection(*args, **kwargs)#
Bases:
ProcessingStep
Select single cells from a segmented sdata file and generate cutting data for the Leica LMD microscope. This method class relies on the functionality of the pylmd library.
- process(segmentation_name: str, cell_sets: list[dict], calibration_marker: array, name: str | None = None)#
Process function for selecting cells and generating their XML. Under the hood, this method relies on the pylmd library and utilizes its SegmentationLoader class.
- Parameters:
segmentation_name (str) – Name of the segmentation to be used for shape generation in the sdata object.
cell_sets (list of dict) – List of dictionaries containing the sets of cells which should be sorted into a single well. Mandatory keys for each dictionary are: name, classes. Optional keys are: well.
calibration_marker (numpy.array) – Array of size ‘(3,2)’ containing the calibration marker coordinates in the ‘(row, column)’ format.
Example
    import numpy as np

    # Calibration marker should be defined as (row, column).
    marker_0 = np.array([-10, -10])
    marker_1 = np.array([-10, 1100])
    marker_2 = np.array([1100, 505])

    # A numpy array of shape (3, 2) should be passed.
    calibration_marker = np.array([marker_0, marker_1, marker_2])

    # Sets of cells can be defined by providing a name and a list of classes in a dictionary.
    cells_to_select = [{"name": "dataset1", "classes": [1, 2, 3]}]

    # Alternatively, a path to a csv file can be provided.
    # If a relative path is provided, it is accessed relative to the project's base directory.
    cells_to_select += [{"name": "dataset2", "classes": "segmentation/class_subset.csv"}]

    # If desired, wells can be passed with the individual sets.
    cells_to_select += [{"name": "dataset3", "classes": [4, 5, 6], "well": "A1"}]

    project.select(cells_to_select, calibration_marker)
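Before calling the selection step, it can help to check the inputs against the documented requirements: each cell set needs the keys `name` and `classes` (with `well` optional), and the calibration marker needs three (row, column) pairs. The following is a small pre-flight check written for illustration; `validate_cell_sets` and `validate_calibration_marker` are hypothetical helpers, not functions provided by scPortrait or pylmd.

```python
MANDATORY_KEYS = {"name", "classes"}


def validate_cell_sets(cell_sets):
    """Raise ValueError if a cell set lacks a mandatory key; return set names."""
    names = []
    for index, cell_set in enumerate(cell_sets):
        missing = MANDATORY_KEYS - set(cell_set)
        if missing:
            raise ValueError(f"cell set {index} is missing keys: {sorted(missing)}")
        names.append(cell_set["name"])
    return names


def validate_calibration_marker(marker):
    """Check that the marker has shape (3, 2): three (row, column) points."""
    points = [tuple(point) for point in marker]
    if len(points) != 3 or any(len(point) != 2 for point in points):
        raise ValueError("calibration_marker must have shape (3, 2)")
    return points


cells_to_select = [
    {"name": "dataset1", "classes": [1, 2, 3]},
    {"name": "dataset2", "classes": "segmentation/class_subset.csv"},
    {"name": "dataset3", "classes": [4, 5, 6], "well": "A1"},
]
names = validate_cell_sets(cells_to_select)
points = validate_calibration_marker([[-10, -10], [-10, 1100], [1100, 505]])
print(names)  # ['dataset1', 'dataset2', 'dataset3']
```

The marker check works on any sequence of pairs, so a numpy array of shape (3, 2) can be passed as well as a nested list.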
Note
The following parameters are required in the config file:
    LMDSelection:
        # the number of threads with which multithreaded tasks should be executed
        threads: 10
        # the number of parallel processes to use for generation of cell sets; each set
        # will be processed with the designated number of threads
        processes_cell_sets: 1
        # defines the channel used for generating cutting masks
        # segmentation.hdf5 => labels => segmentation_channel
        # When using WGA segmentation:
        #    0 corresponds to nuclear masks
        #    1 corresponds to cytosolic masks.
        segmentation_channel: 0
        # dilation of the cutting mask in pixels
        shape_dilation: 10
        # Cutting masks are transformed by binary dilation and erosion
        binary_smoothing: 3
        # number of datapoints which are averaged for smoothing
        # the number of datapoints over a distance of n pixels is 2*n
        convolution_smoothing: 25
        # fold reduction of datapoints for compression
        poly_compression_factor: 30
        # Optimization of the cutting path in between shapes
        # optimized paths improve the cutting time and the microscope's focus
        # valid options are ["none", "hilbert", "greedy"]
        path_optimization: "hilbert"
        # Parameter required for Hilbert curve based path optimization.
        # Defines the order of the Hilbert curve used, which needs to be tuned with the total cutting area.
        # For areas of 1 x 1 mm we recommend at least p = 4, for whole slides we recommend p = 7.
        hilbert_p: 7
        # Parameter required for greedy path optimization.
        # Instead of a global distance matrix, the k nearest neighbours are approximated.
        # The optimization problem is then greedily solved for the known set of nearest neighbours until the first set of neighbours is exhausted.
        # Established edges are then removed and the nearest neighbour approximation is recursively repeated.
        greedy_k: 20
        # The LMD reads coordinates as integers which leads to rounding of decimal places.
        # Points spread between two whole coordinates are therefore collapsed to whole coordinates.
        # This can be mitigated by scaling the entire coordinate system by a defined factor.
        # For a resolution of 0.6 um / px a factor of 100 is recommended.
        xml_decimal_transform: 100
        # Overlapping shapes are merged based on a nearest neighbour heuristic.
        # All selected shapes closer than distance_heuristic pixels are checked for overlap.
        distance_heuristic: 300
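The effect described for `xml_decimal_transform` can be shown with a few numbers. In the sketch below, two nearby points collapse onto the same integer coordinate when rounded directly, but stay distinct once the coordinate system is scaled first. The function `to_lmd_coordinates` is a hypothetical helper for illustration and does not reflect how pylmd writes its XML output.

```python
def to_lmd_coordinates(points, xml_decimal_transform=1):
    """Scale (x, y) points by a factor, then round to integer coordinates."""
    return [
        (round(x * xml_decimal_transform), round(y * xml_decimal_transform))
        for x, y in points
    ]


# Two points that lie between the same pair of whole coordinates.
points = [(10.2, 5.1), (10.4, 5.3)]

unscaled = to_lmd_coordinates(points)
scaled = to_lmd_coordinates(points, xml_decimal_transform=100)

print(unscaled)  # [(10, 5), (10, 5)]
print(scaled)    # [(1020, 510), (1040, 530)]
```

Without scaling both points end up at (10, 5); with a factor of 100 the sub-pixel offsets survive the integer conversion.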