Featurization

MLClusterClassifier

class scportrait.pipeline.featurization.MLClusterClassifier(*args, **kwargs)

Class for classifying single cells using a pre-trained machine learning model.

This class takes a pre-trained model and uses it to classify single cells, using the model’s forward function or encoder function, depending on the user’s choice. The classification results are saved to a CSV file.
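The difference between the two inference modes can be illustrated with a toy model. This is a minimal sketch, not scPortrait's implementation: `ToyModel`, its arithmetic, and `run_inference` are hypothetical stand-ins. The only thing assumed about a real model is that it exposes an `encoder` function and a `forward` function, as described above.

```python
from typing import Dict, List, Sequence


class ToyModel:
    """Hypothetical stand-in for a pre-trained model (not a scPortrait class)."""

    def encoder(self, image: Sequence[float]) -> List[float]:
        # Toy "feature extractor": the sum and the maximum of the pixel values.
        return [sum(image), max(image)]

    def forward(self, image: Sequence[float]) -> List[float]:
        # Full pass: encoder features followed by a toy "classification head".
        total, peak = self.encoder(image)
        score = peak / total if total else 0.0
        return [score]


def run_inference(
    model: ToyModel,
    images: Sequence[Sequence[float]],
    methods: Sequence[str],
) -> Dict[str, List[List[float]]]:
    """Run each requested inference method on every image."""
    results: Dict[str, List[List[float]]] = {}
    for name in methods:
        if name not in ("forward", "encoder"):
            raise ValueError(f"unknown inference method: {name!r}")
        method = getattr(model, name)
        results[name] = [method(image) for image in images]
    return results


outputs = run_inference(ToyModel(), [[1.0, 3.0], [2.0, 2.0]], ["forward", "encoder"])
print(outputs["forward"])  # final predictions, one list per cell
print(outputs["encoder"])  # intermediate features, one list per cell
```

In this sketch each method produces its own result set, which mirrors the idea of writing the forward results and the encoder activations to separate outputs.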

__call__(*args, debug=None, overwrite=None, **kwargs)

Call the processing step.

Parameters:
  • debug (bool, optional, default None) – Allows overriding the value set on initiation. When set to True, debug outputs will be printed where applicable.

  • overwrite (bool, optional, default None) – Allows overriding the value set on initiation. When set to True, the processing step directory is completely deleted and recreated when the step is called.
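The override and overwrite behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not scPortrait's code: `resolve_flag` and `prepare_step_directory` are hypothetical helper names, and the sketch assumes that `None` means "keep the value set on initiation".

```python
import shutil
from pathlib import Path
from typing import Optional


def resolve_flag(call_value: Optional[bool], init_value: bool) -> bool:
    """Use the per-call value unless it is None, then fall back to the initial one."""
    return init_value if call_value is None else call_value


def prepare_step_directory(directory: Path, overwrite: bool) -> Path:
    """Create the step directory; with overwrite=True, delete any existing one first."""
    if overwrite and directory.exists():
        shutil.rmtree(directory)  # removes the directory and everything inside it
    directory.mkdir(parents=True, exist_ok=True)
    return directory
```

With `overwrite=False`, existing files in the directory are left untouched; with `overwrite=True`, earlier results in that directory are lost, so the flag should be set deliberately.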

DEFAULT_MODEL_CLASS

alias of MultilabelSupervisedModel

DEFAULT_DATA_LOADER

alias of HDF5SingleCellDataset

process(extraction_dir: str, size: int = 0)

Perform classification on the provided HDF5 dataset.

Parameters:
  • extraction_dir (str) – Directory containing the extracted HDF5 files from the project. If this class is used as part of a project processing workflow, this argument will be provided automatically.

  • size (int, optional) – How many cells should be selected for inference. Default is 0, which means all cells are selected.

Returns:

Results are written to CSV files located in the project directory.

Return type:

None
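The meaning of `size` can be made concrete with a small sketch. The helper `select_cell_indices` is hypothetical, and two points are assumptions rather than documented behaviour: that a subset is drawn at random without replacement, and that a `size` at or above the number of available cells selects all of them.

```python
import random
from typing import List


def select_cell_indices(n_cells: int, size: int = 0, seed: int = 0) -> List[int]:
    """Return the indices of the cells to run inference on."""
    if size < 0:
        raise ValueError("size must be 0 (all cells) or a positive integer")
    if size == 0 or size >= n_cells:
        # size == 0 means "use every cell" (documented default).
        return list(range(n_cells))
    # Assumption: draw a reproducible random subset without replacement.
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_cells), size))


all_cells = select_cell_indices(5)        # every index from 0 to 4
subset = select_cell_indices(100, 10)     # ten distinct indices below 100
```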

Important

If this class is used as part of a project processing workflow, the first argument will be provided by the Project class based on the previous single-cell extraction. Therefore, only the remaining arguments need to be provided. The Project class will automatically provide the most recent extracted single-cell dataset together with the supplied parameters.

Examples

project.classify()

Notes

The following parameters are required in the config file:

MLClusterClassifier:
    # Channel number on which the classification should be performed
    channel_selection: 4

    # Number of threads to use for dataloader
    dataloader_worker_number: 24

    # Batch size to pass to GPU
    batch_size: 900

    # Path to PyTorch checkpoint that should be used for inference
    network: "path/to/model/"

    # Classifier architecture implemented in scPortrait
    # Choose one of VGG1, VGG2, VGG1_old, VGG2_old
    classifier_architecture: "VGG2_old"

    # Which checkpoint to use if the network directory contains more than one
    # Either "max" or a numeric value indicating the epoch number
    epoch: "max"

    # Name of the classifier used for saving the classification results to a directory
    label: "Autophagy_15h_classifier1"

    # List of which inference methods should be performed
    # Available: "forward" and "encoder"
    # If "forward": images are passed through all layers of the model and the final inference results are written to file
    # If "encoder": activations at the end of the CNN are written to file
    encoders: ["forward", "encoder"]

    # On which device inference should be performed
    # For speed, should be "cuda"
    inference_device: "cuda"

    # Define dataset transforms
    transforms:
        resize: 128
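How the `epoch` setting could resolve to a single checkpoint is sketched below. This is not scPortrait's lookup logic: `pick_checkpoint` is a hypothetical helper, the `epoch=<n>` tag in the file names is an assumed naming scheme, and `"max"` is assumed to mean the highest epoch number present.

```python
import re
from typing import Dict, Iterable, Union

# Assumed naming scheme: each checkpoint file name contains "epoch=<number>".
_EPOCH_PATTERN = re.compile(r"epoch=(\d+)")


def pick_checkpoint(filenames: Iterable[str], epoch: Union[str, int] = "max") -> str:
    """Return the checkpoint file name matching the requested epoch."""
    by_epoch: Dict[int, str] = {}
    for name in filenames:
        match = _EPOCH_PATTERN.search(name)
        if match:  # files without an epoch tag are ignored
            by_epoch[int(match.group(1))] = name
    if not by_epoch:
        raise FileNotFoundError("no checkpoint with an 'epoch=<n>' tag found")
    wanted = max(by_epoch) if epoch == "max" else int(epoch)
    if wanted not in by_epoch:
        raise KeyError(f"no checkpoint found for epoch {wanted}")
    return by_epoch[wanted]


# Hypothetical directory listing used only for illustration.
checkpoints = ["epoch=4-step=500.ckpt", "epoch=12-step=1300.ckpt", "notes.txt"]
latest = pick_checkpoint(checkpoints)          # epoch: "max"
specific = pick_checkpoint(checkpoints, 4)     # epoch: 4
```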

CellFeaturizer

class scportrait.pipeline.featurization.CellFeaturizer(*args, **kwargs)

Class for extracting general image features from SPARCS single-cell image datasets. The extracted features are saved to a CSV file. The features are calculated on the basis of a specified channel.

The features which are calculated are:

  • Area of the masks in pixels

  • Mean intensity of the chosen channel in the regions labelled by each of the masks

  • Median intensity of the chosen channel in the regions labelled by each of the masks

  • 75% quantile of the chosen channel in the regions labelled by each of the masks

  • 25% quantile of the chosen channel in the regions labelled by each of the masks

  • Summed intensity of the chosen channel in the regions labelled by each of the masks

  • Summed intensity of the chosen channel in the regions labelled by each of the masks, normalized by mask area

The features are written to the CSV file in this order.
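The listed features can be sketched for a single mask in plain Python. This is an illustration only: `mask_features` and its dictionary keys are hypothetical names, images are represented as nested lists rather than arrays, and the quantiles use linear interpolation between data points, which is an assumption about the quantile definition.

```python
import statistics
from typing import Dict, Sequence


def mask_features(
    image: Sequence[Sequence[float]],
    mask: Sequence[Sequence[int]],
) -> Dict[str, float]:
    """Compute per-mask features for one channel, in the order listed above."""
    # Collect the channel intensities of all pixels where the mask is non-zero.
    values = [
        pixel
        for image_row, mask_row in zip(image, mask)
        for pixel, inside in zip(image_row, mask_row)
        if inside
    ]
    if not values:
        raise ValueError("mask is empty")
    area = len(values)
    if area > 1:
        # "inclusive" interpolates linearly between the sorted data points.
        q25, _, q75 = statistics.quantiles(values, n=4, method="inclusive")
    else:
        q25 = q75 = float(values[0])
    total = float(sum(values))
    return {
        "area": area,
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "q75": q75,
        "q25": q25,
        "sum": total,
        "sum_per_area": total / area,
    }


image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
mask = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
features = mask_features(image, mask)
```

In this sketch the area-normalized sum equals the mean, because both are computed over the same mask.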

__call__(*args, debug=None, overwrite=None, **kwargs)

Call the processing step.

Parameters:
  • debug (bool, optional, default None) – Allows overriding the value set on initiation. When set to True, debug outputs will be printed where applicable.

  • overwrite (bool, optional, default None) – Allows overriding the value set on initiation. When set to True, the processing step directory is completely deleted and recreated when the step is called.

process(extraction_dir, size=0)

Perform featurization on the provided HDF5 dataset.

Parameters:
  • extraction_dir (str) – Directory containing the extracted HDF5 files from the project. If this class is used as part of a project processing workflow, this argument will be provided automatically.

  • size (int, optional, default=0) – How many cells should be selected for inference. Default is 0, meaning all cells are selected.

Returns:

Results are written to CSV files located in the project directory.

Return type:

None

Important

If this class is used as part of a project processing workflow, the first argument will be provided by the Project class based on the previous single-cell extraction. Therefore, only the remaining arguments need to be provided. The Project class will automatically provide the most recent extraction results together with the supplied parameters.

Examples

# Define accessory dataset: additional HDF5 datasets that you want to perform an inference on
# Leave empty if you only want to infer on all extracted cells in the current project

project.classify()

Notes

The following parameters are required in the config file:

CellFeaturizer:
    # Channel number on which the featurization should be performed
    channel_selection: 4

    # Number of threads to use for dataloader
    dataloader_worker_number: 0 # needs to be 0 if using CPU

    # Batch size to pass to GPU
    batch_size: 900

    # On which device inference should be performed
    # For speed, should be "cuda"
    inference_device: "cpu"

    # Label under which the results should be saved
    screen_label: "Ch3_Featurization"
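Before starting a run, a configuration like the one above could be checked with a few lines of Python. This is a sketch, not part of scPortrait: `check_featurizer_config` is a hypothetical helper, and the configuration is assumed to be already loaded into a dictionary. The CPU rule comes from the comment on `dataloader_worker_number` above.

```python
from typing import Any, Dict, List

# Keys taken from the example configuration above.
REQUIRED_KEYS = (
    "channel_selection",
    "dataloader_worker_number",
    "batch_size",
    "inference_device",
    "screen_label",
)


def check_featurizer_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]
    on_cpu = config.get("inference_device") == "cpu"
    if on_cpu and config.get("dataloader_worker_number", 0) != 0:
        problems.append(
            "dataloader_worker_number must be 0 when inference_device is 'cpu'"
        )
    return problems


config = {
    "channel_selection": 4,
    "dataloader_worker_number": 0,
    "batch_size": 900,
    "inference_device": "cpu",
    "screen_label": "Ch3_Featurization",
}
print(check_featurizer_config(config))
```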