Featurization#

MLClusterClassifier#

class scportrait.pipeline.featurization.MLClusterClassifier(*args, **kwargs)#

Class for classifying single cells using a pre-trained machine learning model.

This class takes a pre-trained model and uses it to classify single cells, using the model’s forward function or encoder function, depending on the user’s choice. The classification results are saved to a CSV file.

__call__(*args, debug=None, overwrite=None, **kwargs)#

Call the processing step.

Parameters:

debug (bool, optional, default None) – Overrides the value set at initialization. When set to True, debug output is printed where applicable.
overwrite (bool, optional, default None) – Overrides the value set at initialization. When set to True, the processing step directory is deleted and recreated when the step is called.
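
The override semantics of debug and overwrite can be pictured with a small sketch. This is not scPortrait's implementation: the class ToyStep and its attributes are hypothetical, and the sketch only assumes that None keeps the value set at initialization, that an explicit value replaces it, and that overwrite=True deletes and recreates the step directory.

```python
import shutil
import tempfile
from pathlib import Path


class ToyStep:
    """Hypothetical processing step; not part of scPortrait."""

    def __init__(self, directory, debug=False, overwrite=False):
        self.directory = Path(directory)
        self.debug = debug
        self.overwrite = overwrite

    def __call__(self, *args, debug=None, overwrite=None, **kwargs):
        # None keeps the value chosen at initialization (assumed behaviour).
        if debug is not None:
            self.debug = debug
        if overwrite is not None:
            self.overwrite = overwrite

        # With overwrite enabled, start from an empty step directory.
        if self.overwrite and self.directory.exists():
            shutil.rmtree(self.directory)
        self.directory.mkdir(parents=True, exist_ok=True)

        return self.process(*args, **kwargs)

    def process(self, *args, **kwargs):
        if self.debug:
            print(f"processing in {self.directory}")
        # Report what is currently stored in the step directory.
        return sorted(p.name for p in self.directory.iterdir())


workdir = Path(tempfile.mkdtemp()) / "step"
workdir.mkdir()
(workdir / "old_result.csv").write_text("stale")

step = ToyStep(workdir)
kept = step()                   # overwrite stays False: the old file survives
cleared = step(overwrite=True)  # directory is deleted and recreated
```

In this sketch an explicit value passed at call time also updates the instance attribute; whether the library persists the override or applies it to a single call only is not stated above.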

DEFAULT_MODEL_CLASS#: alias of MultilabelSupervisedModel

DEFAULT_DATA_LOADER#: alias of HDF5SingleCellDataset

process(extraction_dir: str, size: int = 0)#

Perform classification on the provided HDF5 dataset.

Parameters:

extraction_dir (str) – Directory containing the extracted HDF5 files from the project. If this class is used as part of a project processing workflow, this argument will be provided automatically.
size (int, optional) – How many cells should be selected for inference. Default is 0, which means all cells are selected.

Returns:

Results are written to CSV files located in the project directory.

Return type:

None
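
As an illustration of the size parameter, the sketch below shows one way "0 means all cells" could be interpreted. The helper select_cell_indices is hypothetical, and the random subsampling and the seed argument are assumptions; the library may select cells differently.

```python
import random


def select_cell_indices(n_cells, size=0, seed=42):
    """Return the indices of the cells to process (hypothetical helper)."""
    if size < 0:
        raise ValueError("size must be non-negative")
    if size == 0 or size >= n_cells:
        # 0 is the documented default: use every cell.
        return list(range(n_cells))
    # Assumption: a subset is drawn at random without replacement.
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_cells), size))


all_cells = select_cell_indices(5)          # size=0 selects everything
subset = select_cell_indices(100, size=10)  # ten distinct indices
```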

Important

If this class is used as part of a project processing workflow, the first argument (extraction_dir) is provided by the Project class based on the previous single-cell extraction, so only the remaining argument (size) needs to be provided. The Project class automatically supplies the most recent extracted single-cell dataset together with the supplied parameters.

Examples

project.classify()

Notes

The following parameters are required in the config file:

MLClusterClassifier:
    # Channel number on which the classification should be performed
    channel_selection: 4

    # Number of threads to use for dataloader
    dataloader_worker_number: 24

    # Batch size to pass to GPU
    batch_size: 900

    # Path to PyTorch checkpoint that should be used for inference
    network: "path/to/model/"

    # Classifier architecture implemented in scPortrait
    # Choose one of VGG1, VGG2, VGG1_old, VGG2_old
    classifier_architecture: "VGG2_old"

    # If more than one checkpoint is provided in the network directory, which checkpoint should be chosen
    # Should either be "max" or a numeric value indicating the epoch number
    epoch: "max"

    # Name of the classifier used for saving the classification results to a directory
    label: "Autophagy_15h_classifier1"

    # List of which inference methods should be performed
    # Available: "forward" and "encoder"
    # If "forward": images are passed through all layers of the model and the final inference results are written to file
    # If "encoder": activations at the end of the CNN are written to file
    encoders: ["forward", "encoder"]

    # On which device inference should be performed
    # For speed, should be "cuda"
    inference_device: "cuda"

    # Define dataset transforms
    transforms:
        resize: 128

CellFeaturizer#

class scportrait.pipeline.featurization.CellFeaturizer(*args, **kwargs)#

Class for extracting general image features from SPARCS single-cell image datasets. The extracted features are saved to a CSV file. The features are calculated on the basis of a specified channel.

The features which are calculated are:

Area of the masks in pixels
Mean intensity of the chosen channel in the regions labelled by each of the masks
Median intensity of the chosen channel in the regions labelled by each of the masks
75% quantile of the chosen channel in the regions labelled by each of the masks
25% quantile of the chosen channel in the regions labelled by each of the masks
Summed intensity of the chosen channel in the regions labelled by each of the masks
Summed intensity of the chosen channel in the regions labelled by each of the masks, normalized by area

The features are written to the CSV file in this order.
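
To make the feature list concrete, here is a minimal pure-Python sketch that computes the seven values for one mask and one channel. It is not the library's implementation: the function mask_features and the dictionary keys are hypothetical, the inputs are assumed to be equally sized 2D lists, and the quartiles use linear interpolation between data points.

```python
from statistics import mean, median, quantiles


def mask_features(mask, channel):
    """Compute the listed features for one mask/channel pair.

    Nonzero entries in `mask` mark the pixels that belong to the cell.
    """
    values = [
        pixel
        for mask_row, channel_row in zip(mask, channel)
        for in_mask, pixel in zip(mask_row, channel_row)
        if in_mask
    ]
    area = len(values)
    if area == 0:
        raise ValueError("mask is empty")
    if area == 1:
        # A single pixel has no spread; both quartiles equal its value.
        q25 = q75 = values[0]
    else:
        q25, _, q75 = quantiles(values, n=4, method="inclusive")
    total = sum(values)
    # Keys are inserted in the order in which the features are listed above.
    return {
        "area": area,
        "mean": mean(values),
        "median": median(values),
        "quant75": q75,
        "quant25": q25,
        "summed_intensity": total,
        "summed_intensity_area_normalized": total / area,
    }


mask = [[1, 1, 0], [1, 1, 0]]
channel = [[1, 2, 9], [3, 4, 9]]
features = mask_features(mask, channel)
```

Under the assumptions of this sketch, the area-normalized sum equals the mean, because both are taken over the same masked pixels; the library's values may differ if it normalizes differently.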

__call__(*args, debug=None, overwrite=None, **kwargs)#

Call the processing step.

Parameters:

debug (bool, optional, default None) – Overrides the value set at initialization. When set to True, debug output is printed where applicable.
overwrite (bool, optional, default None) – Overrides the value set at initialization. When set to True, the processing step directory is deleted and recreated when the step is called.

process(extraction_dir, size=0)#

Perform featurization on the provided HDF5 dataset.

Parameters:

extraction_dir (str) – Directory containing the extracted HDF5 files from the project. If this class is used as part of a project processing workflow, this argument will be provided automatically.
size (int, optional, default=0) – How many cells should be selected for inference. Default is 0, meaning all cells are selected.

Returns:

Results are written to CSV files located in the project directory.

Return type:

None

Important

If this class is used as part of a project processing workflow, the first argument (extraction_dir) is provided by the Project class based on the previous single-cell extraction, so only the remaining argument (size) needs to be provided. The Project class automatically supplies the most recent extraction results together with the supplied parameters.

Examples

# Define accessory dataset: additional HDF5 datasets that you want to perform an inference on
# Leave empty if you only want to infer on all extracted cells in the current project

project.classify()

Notes

The following parameters are required in the config file:

CellFeaturizer:
    # Channel number on which the featurization should be performed
    channel_selection: 4

    # Number of threads to use for dataloader
    dataloader_worker_number: 0 # needs to be 0 if using CPU

    # Batch size to pass to GPU
    batch_size: 900

    # On which device inference should be performed
    # For speed, should be "cuda"
    inference_device: "cpu"

    # Label under which the results should be saved
    screen_label: "Ch3_Featurization"
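
A configuration like the one above can be sanity-checked before a run. The sketch below assumes the CellFeaturizer section has already been loaded into a plain dictionary; the helper check_featurizer_config is hypothetical and only encodes what the comments above state: the listed keys are required, and dataloader_worker_number needs to be 0 when running on the CPU.

```python
REQUIRED_KEYS = (
    "channel_selection",
    "dataloader_worker_number",
    "batch_size",
    "inference_device",
    "screen_label",
)


def check_featurizer_config(config):
    """Return a list of problems found in a CellFeaturizer config dict."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]
    if (
        config.get("inference_device") == "cpu"
        and config.get("dataloader_worker_number", 0) != 0
    ):
        problems.append(
            "dataloader_worker_number must be 0 when inference_device is 'cpu'"
        )
    return problems


config = {
    "channel_selection": 4,
    "dataloader_worker_number": 0,
    "batch_size": 900,
    "inference_device": "cpu",
    "screen_label": "Ch3_Featurization",
}
problems = check_featurizer_config(config)
```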
