Featurization
MLClusterClassifier
- class scportrait.pipeline.featurization.MLClusterClassifier(*args, **kwargs)
Class for classifying single cells using a pre-trained machine learning model.
This class takes a pre-trained model and uses it to classify single cells, using the model’s forward function or encoder function, depending on the user’s choice. The classification results are saved to a CSV file.
- __call__(*args, debug=None, overwrite=None, **kwargs)
Call the processing step.
- Parameters:
debug (bool, optional, default None) – Allows overriding the value set at initialization. When set to True, debug outputs will be printed where applicable.
overwrite (bool, optional, default None) – Allows overriding the value set at initialization. When set to True, the processing step directory will be completely deleted and newly created when called.
- DEFAULT_MODEL_CLASS
alias of MultilabelSupervisedModel
- DEFAULT_DATA_LOADER
alias of HDF5SingleCellDataset
- process(extraction_dir: str, size: int = 0)
Perform classification on the provided HDF5 dataset.
- Parameters:
extraction_dir (str) – Directory containing the extracted HDF5 files from the project. If this class is used as part of a project processing workflow, this argument will be provided automatically.
size (int, optional) – How many cells should be selected for inference. Default is 0, which means all cells are selected.
- Returns:
Results are written to CSV files located in the project directory.
- Return type:
None
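The meaning of the size argument can be illustrated with a small sketch. The helper name resolve_sample_size is hypothetical and not part of scPortrait. Only the rule "0 selects all cells" comes from the description above. Capping the value at the number of available cells and rejecting negative values are assumptions made for illustration.

```python
def resolve_sample_size(n_cells: int, size: int = 0) -> int:
    """Return how many cells would be passed to inference.

    Hypothetical helper, not part of scPortrait.
    """
    if size < 0:
        # Assumption: negative sizes are treated as invalid input.
        raise ValueError("size must be >= 0")
    if size == 0:
        # Documented behavior: 0 means all cells are selected.
        return n_cells
    # Assumption: a request larger than the dataset is capped.
    return min(size, n_cells)


all_cells = resolve_sample_size(1000)      # size left at its default of 0
subset = resolve_sample_size(1000, 50)     # explicit subset
```

How the subset itself is drawn (for example randomly or from the start of the dataset) is not stated here, so the sketch only resolves the count.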
Important
If this class is used as part of a project processing workflow, the first argument will be provided by the Project class based on the previous single-cell extraction. Therefore, only the remaining arguments need to be provided. The Project class will automatically provide the most recent extracted single-cell dataset together with the supplied parameters.
Examples
project.classify()
Notes
The following parameters are required in the config file:
MLClusterClassifier:
    # Channel number on which the classification should be performed
    channel_selection: 4
    # Number of threads to use for dataloader
    dataloader_worker_number: 24
    # Batch size to pass to GPU
    batch_size: 900
    # Path to PyTorch checkpoint that should be used for inference
    network: "path/to/model/"
    # Classifier architecture implemented in scPortrait
    # Choose one of VGG1, VGG2, VGG1_old, VGG2_old
    classifier_architecture: "VGG2_old"
    # If more than one checkpoint is provided in the network directory, which checkpoint should be chosen
    # Should either be "max" or a numeric value indicating the epoch number
    epoch: "max"
    # Name of the classifier used for saving the classification results to a directory
    label: "Autophagy_15h_classifier1"
    # List of which inference methods should be performed
    # Available: "forward" and "encoder"
    # If "forward": images are passed through all layers of the model and the final inference results are written to file
    # If "encoder": activations at the end of the CNN are written to file
    encoders: ["forward", "encoder"]
    # On which device inference should be performed
    # For speed, should be "cuda"
    inference_device: "cuda"
    # Define dataset transforms
    transforms:
        resize: 128
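Before starting a long inference run, it can help to sanity-check the configuration. The sketch below assumes the MLClusterClassifier section has already been loaded into a plain Python dict. The function check_classifier_config is hypothetical and not part of scPortrait. The allowed values are taken from the comments in the example config above, and scPortrait may perform its own, different validation.

```python
# Allowed values copied from the comments in the example config.
ALLOWED_ARCHITECTURES = {"VGG1", "VGG2", "VGG1_old", "VGG2_old"}
ALLOWED_ENCODERS = {"forward", "encoder"}
REQUIRED_KEYS = {
    "channel_selection", "dataloader_worker_number", "batch_size",
    "network", "classifier_architecture", "epoch", "label",
    "encoders", "inference_device", "transforms",
}


def check_classifier_config(cfg: dict) -> list[str]:
    """Return a list of human-readable problems (empty if none found).

    Hypothetical helper, not part of scPortrait.
    """
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - cfg.keys())]

    arch = cfg.get("classifier_architecture")
    if arch is not None and arch not in ALLOWED_ARCHITECTURES:
        problems.append(f"unknown classifier_architecture: {arch!r}")

    # epoch is either the string "max" or a numeric epoch number.
    epoch = cfg.get("epoch")
    is_number = isinstance(epoch, int) and not isinstance(epoch, bool)
    if epoch is not None and epoch != "max" and not is_number:
        problems.append(f"epoch must be 'max' or an integer, got {epoch!r}")

    unknown = set(cfg.get("encoders", [])) - ALLOWED_ENCODERS
    if unknown:
        problems.append(f"unknown encoders: {sorted(unknown)}")
    return problems


example_cfg = {
    "channel_selection": 4,
    "dataloader_worker_number": 24,
    "batch_size": 900,
    "network": "path/to/model/",
    "classifier_architecture": "VGG2_old",
    "epoch": "max",
    "label": "Autophagy_15h_classifier1",
    "encoders": ["forward", "encoder"],
    "inference_device": "cuda",
    "transforms": {"resize": 128},
}
problems = check_classifier_config(example_cfg)
```

Returning a list of problems instead of raising on the first one lets all issues in a config file be reported in a single pass.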
CellFeaturizer
- class scportrait.pipeline.featurization.CellFeaturizer(*args, **kwargs)
Class for extracting general image features from SPARCS single-cell image datasets. The extracted features are saved to a CSV file. The features are calculated on the basis of a specified channel.
The features which are calculated are:
Area of the masks in pixels
Mean intensity of the chosen channel in the regions labelled by each of the masks
Median intensity of the chosen channel in the regions labelled by each of the masks
75% quantile of the chosen channel in the regions labelled by each of the masks
25% quantile of the chosen channel in the regions labelled by each of the masks
Summed intensity of the chosen channel in the regions labelled by each of the masks
Summed intensity of the chosen channel in the region labelled by each of the masks normalized for area
The features are written to the CSV file in this order.
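The listed features can be reproduced for a single cell with a few NumPy calls. This is a minimal sketch for illustration, not the scPortrait implementation. The function mask_features and the dictionary keys are hypothetical names, and the actual CSV column names may differ. It assumes a 2D mask in which non-zero pixels belong to the cell and a 2D intensity image of the same shape.

```python
import numpy as np


def mask_features(mask: np.ndarray, channel: np.ndarray) -> dict:
    """Compute the listed features for one mask/channel pair.

    Hypothetical helper, not part of scPortrait.
    """
    region = mask > 0                  # assumption: non-zero pixels mark the cell
    area = int(region.sum())           # area of the mask in pixels
    if area == 0:
        raise ValueError("mask is empty")
    values = channel[region]           # channel intensities inside the mask
    summed = float(values.sum())
    # Keys are inserted in the same order as the feature list above.
    return {
        "area": area,
        "mean": float(values.mean()),
        "median": float(np.median(values)),
        "quantile_75": float(np.quantile(values, 0.75)),
        "quantile_25": float(np.quantile(values, 0.25)),
        "summed_intensity": summed,
        "summed_intensity_area_normalized": summed / area,
    }


# Toy 3x3 example: four masked pixels with intensities 1, 2, 3, 4.
mask = np.array([[0, 1, 1],
                 [0, 1, 1],
                 [0, 0, 0]])
channel = np.array([[9.0, 1.0, 2.0],
                    [9.0, 3.0, 4.0],
                    [9.0, 9.0, 9.0]])
features = mask_features(mask, channel)
```

In this sketch the area-normalized summed intensity equals the mean by construction, because both are computed over the same masked pixels.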
- __call__(*args, debug=None, overwrite=None, **kwargs)
Call the processing step.
- Parameters:
debug (bool, optional, default None) – Allows overriding the value set at initialization. When set to True, debug outputs will be printed where applicable.
overwrite (bool, optional, default None) – Allows overriding the value set at initialization. When set to True, the processing step directory will be completely deleted and newly created when called.
- process(extraction_dir, size=0)
Perform featurization on the provided HDF5 dataset.
- Parameters:
extraction_dir (str) – Directory containing the extracted HDF5 files from the project. If this class is used as part of a project processing workflow, this argument will be provided automatically.
size (int, optional, default=0) – How many cells should be selected for inference. Default is 0, meaning all cells are selected.
- Returns:
Results are written to CSV files located in the project directory.
- Return type:
None
Important
If this class is used as part of a project processing workflow, the first argument will be provided by the Project class based on the previous single-cell extraction. Therefore, only the remaining arguments need to be provided. The Project class will automatically provide the most recent extraction results together with the supplied parameters.
Examples
# Define accessory dataset: additional HDF5 datasets that you want to perform an inference on
# Leave empty if you only want to infer on all extracted cells in the current project
project.classify()
Notes
The following parameters are required in the config file:
CellFeaturizer:
    # Channel number on which the featurization should be performed
    channel_selection: 4
    # Number of threads to use for dataloader
    dataloader_worker_number: 0 # needs to be 0 if using CPU
    # Batch size to pass to GPU
    batch_size: 900
    # On which device inference should be performed
    # For speed should be "cuda"
    inference_device: "cpu"
    # Label under which the results should be saved
    screen_label: "Ch3_Featurization"
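The config comment above notes that dataloader_worker_number needs to be 0 when running on the CPU. A small pre-flight check along those lines might look like the following sketch. The function check_featurizer_config is hypothetical and not part of scPortrait, and it assumes the CellFeaturizer section has already been loaded into a dict.

```python
def check_featurizer_config(cfg: dict) -> None:
    """Raise ValueError if the CPU/worker constraint from the config comment is violated.

    Hypothetical helper, not part of scPortrait.
    """
    device = cfg["inference_device"]
    workers = cfg["dataloader_worker_number"]
    if device == "cpu" and workers != 0:
        raise ValueError(
            f"dataloader_worker_number must be 0 when inference_device is 'cpu', got {workers}"
        )


featurizer_cfg = {
    "channel_selection": 4,
    "dataloader_worker_number": 0,
    "batch_size": 900,
    "inference_device": "cpu",
    "screen_label": "Ch3_Featurization",
}
check_featurizer_config(featurizer_cfg)  # passes silently for the example config
```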