Featurization
MLClusterClassifier
- class scportrait.pipeline.featurization.MLClusterClassifier(*args, **kwargs)
Perform classification on scPortrait’s single-cell image datasets using a pretrained machine learning model.
- Parameters:
config – Configuration for the extraction passed over from the pipeline.Project.
directory – Directory for the extraction log and results. Will be created if it does not exist yet.
debug – Flag used to output debug information and map images.
overwrite – Flag used to overwrite existing results.
- process(dataset_paths: str | list[str], dataset_labels: int | list[int] = 0, size: int = 0, return_results: bool = False) → None | list[DataFrame]
- Parameters:
dataset_paths – Path(s) to the single-cell dataset files on which inference should be performed. If this class is used as part of a project processing workflow this argument will be provided automatically.
dataset_labels – Integer label(s) for the dataset(s) provided in dataset_paths.
size – Number of cells that should be selected for inference. Default is 0, which means all cells are selected.
return_results – Boolean indicating whether the classification results should be returned as a list of pandas DataFrames instead of being written to disk.
- Returns:
None unless return_results is True, in which case the results are returned as a list of pandas DataFrames. Otherwise, the results are written directly to file.
Important
If this class is used as part of a project processing workflow, the Project class will automatically provide the most recent extracted single-cell dataset. Therefore, only the second and third arguments need to be provided.
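The union types in the process signature (str | list[str] and int | list[int]) mean that a single path or label can be passed as well as a list. The sketch below shows one way such arguments could be brought into a uniform shape before a call. pair_paths_with_labels is a hypothetical helper, not part of scPortrait, and the rule that a single integer label is applied to every path is an assumption made for this example rather than documented behaviour.

```python
def pair_paths_with_labels(dataset_paths, dataset_labels=0):
    """Hypothetical helper: return a list of (path, label) tuples.

    Accepts the same shapes as the documented signature:
    a single path or a list of paths, a single int label or a list of ints.
    """
    # Wrap a single path in a list so both cases are handled the same way.
    if isinstance(dataset_paths, str):
        paths = [dataset_paths]
    else:
        paths = list(dataset_paths)

    # Assumption: a single integer label is reused for every path.
    if isinstance(dataset_labels, int):
        labels = [dataset_labels] * len(paths)
    else:
        labels = list(dataset_labels)

    if len(labels) != len(paths):
        raise ValueError(
            f"got {len(paths)} dataset path(s) but {len(labels)} label(s)"
        )
    return list(zip(paths, labels))
```

Checking the lengths up front gives a clear error message before any expensive inference starts.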
Example
Note
The following parameters are required in the config file:
MLClusterClassifier:
    # channel number on which the classification should be performed
    channel_selection: 4
    # batch size for inference
    batch_size: 900
    # device on which the inference should be performed
    inference_device: "cpu"
    # number of workers for the dataloader
    dataloader_worker_number: 10 #needs to be 0 if using cpu
    # pretrained model to use for classification
    network: "autophagy_classifier"
    # label that should be applied to the results
    label: "Autophagy_15h_classifier2_1"
    # which output of the model should be returned
    encoders: ["forward"]
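The config comment states that the dataloader worker count needs to be 0 when inference runs on the CPU. A small pre-flight check can catch this before a run starts. The sketch below works on the config section as a plain Python dict. check_classifier_config is a hypothetical helper, not an scPortrait function, and the list of required keys is simply taken from the example config above.

```python
# Keys listed in the example MLClusterClassifier config above.
REQUIRED_KEYS = (
    "channel_selection",
    "batch_size",
    "inference_device",
    "dataloader_worker_number",
    "network",
    "label",
    "encoders",
)


def check_classifier_config(config: dict) -> list[str]:
    """Hypothetical helper: return a list of problems found in `config`."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]

    device = str(config.get("inference_device", "")).lower()
    workers = config.get("dataloader_worker_number", 0)
    # The config comment says the worker count needs to be 0 on CPU.
    if device == "cpu" and workers != 0:
        problems.append(
            f"dataloader_worker_number is {workers} but needs to be 0 "
            "when inference_device is 'cpu'"
        )
    return problems


# Values copied from the example config above.
example_config = {
    "channel_selection": 4,
    "batch_size": 900,
    "inference_device": "cpu",
    "dataloader_worker_number": 10,
    "network": "autophagy_classifier",
    "label": "Autophagy_15h_classifier2_1",
    "encoders": ["forward"],
}

problems = check_classifier_config(example_config)
for problem in problems:
    print(problem)
```

With the example values as written, the check reports the worker count, because the example combines inference_device: "cpu" with ten workers. Setting dataloader_worker_number to 0 or switching the device clears the report.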
EnsembleClassifier
- class scportrait.pipeline.featurization.EnsembleClassifier(*args, **kwargs)
Perform classification on scPortrait’s single-cell image datasets using an ensemble of pretrained machine learning models.
- Parameters:
config – Configuration for the extraction passed over from the pipeline.Project.
directory – Directory for the extraction log and results. Will be created if it does not exist yet.
debug – Flag used to output debug information and map images.
overwrite – Flag used to overwrite existing results.
- process(dataset_paths: str, dataset_labels: int | list[int] = 0, size: int = 0, return_results: bool = False) → None | dict
- Parameters:
dataset_paths – Path(s) to the single-cell dataset files on which inference should be performed. If this class is used as part of a project processing workflow this argument will be provided automatically.
dataset_labels – Integer label(s) for the dataset(s) provided in dataset_paths.
size – Number of cells that should be selected for inference. Default is 0, which means all cells are selected.
return_results – Boolean indicating whether the classification results should be returned as a list of pandas DataFrames instead of being written to disk.
- Returns:
None unless return_results is True, in which case the results are returned as a list of pandas DataFrames. Otherwise, the results are written directly to file.
Important
If this class is used as part of a project processing workflow, the first argument will be provided by the Project class based on the previous single-cell extraction. Therefore, no parameters need to be provided.
Example
Note
The following parameters are required in the config file:
EnsembleClassifier:
    # channel number on which the classification should be performed
    channel_selection: 4
    # number of threads to use for dataloader
    dataloader_worker_number: 24
    # batch size to pass to GPU
    batch_size: 900
    # path to pytorch checkpoint that should be used for inference
    networks:
        model1: "path/to/model1/"
        model2: "path/to/model2/"
    # specify input size that the models expect, provided images will be rescaled to this size
    input_image_px: 128
    # label under which the results will be saved
    classification_label: "Autophagy_15h_classifier1"
    # on which device inference should be performed
    # for speed should be "cuda"
    inference_device: "cuda"
CellFeaturizer
- class scportrait.pipeline.featurization.CellFeaturizer(*args, **kwargs)
Class for extracting general image features from scPortrait’s single-cell image datasets. The extracted features are saved to a CSV file. The features are calculated on the basis of all channels.
The features which are calculated are:
Area of the masks in pixels
Mean intensity in the regions labelled by each of the masks
Median intensity in the regions labelled by each of the masks
75% quantile in the regions labelled by each of the masks
25% quantile in the regions labelled by each of the masks
Summed intensity in the regions labelled by each of the masks
Summed intensity in the region labelled by each of the masks normalized for area
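To make the listed features concrete, the sketch below computes them for one channel and one mask using only the Python standard library. It is an illustration of the statistics named above, not scPortrait's implementation: mask_features is a hypothetical helper, images and masks are plain nested lists, and the quantile interpolation (the "inclusive" method of statistics.quantiles) is an assumption.

```python
import statistics


def mask_features(image, mask):
    """Hypothetical helper: summary statistics of `image` inside `mask`.

    `image` and `mask` are 2D nested lists of the same shape; a truthy
    mask entry marks a pixel that belongs to the region.
    """
    # Collect the intensities of all pixels covered by the mask.
    values = [
        pixel
        for image_row, mask_row in zip(image, mask)
        for pixel, inside in zip(image_row, mask_row)
        if inside
    ]
    area = len(values)
    if area == 0:
        raise ValueError("mask does not cover any pixels")

    if area >= 2:
        # n=4 yields the three quartile cut points (25%, 50%, 75%).
        q25, _, q75 = statistics.quantiles(values, n=4, method="inclusive")
    else:
        # With a single pixel every quantile is that pixel's value.
        q25 = q75 = values[0]

    summed = sum(values)
    return {
        "area": area,
        "mean": summed / area,
        "median": statistics.median(values),
        "quantile_75": q75,
        "quantile_25": q25,
        "summed_intensity": summed,
        "summed_intensity_area_normalized": summed / area,
    }


image = [
    [0, 10, 20],
    [30, 40, 50],
    [60, 70, 80],
]
mask = [
    [False, True, True],
    [False, True, True],
    [False, False, False],
]
features = mask_features(image, mask)
```

In this sketch the mean and the area-normalized sum coincide because both are taken over the masked pixels only. The library's exact definitions may differ.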
- Parameters:
config – Configuration for the extraction passed over from the pipeline.Project.
directory – Directory for the extraction log and results. Will be created if it does not exist yet.
debug – Flag used to output debug information and map images.
overwrite – Flag used to overwrite existing results.
- process(dataset_paths: str | list[str], dataset_labels: int | list[int] = 0, size: int = 0, return_results: bool = False) → None | DataFrame
- Parameters:
dataset_paths – Paths to the single-cell dataset files on which inference should be performed. If this class is used as part of a project processing workflow this argument will be provided automatically.
dataset_labels – labels for the provided single-cell image datasets
size – How many cells should be selected for inference. Default is 0, meaning all cells are selected.
return_results – If True, the results are returned as a pandas DataFrame. Otherwise the results are written out to file.
- Returns:
None if return_results is False, otherwise a pandas DataFrame containing the results.
Important
If this class is used as part of a project processing workflow, the first argument will be provided by the Project class based on the previous single-cell extraction. Therefore, only the second and third arguments need to be provided. The Project class will automatically provide the most recent extraction results together with the supplied parameters.
Note
The following parameters are required in the config file:
CellFeaturizer:
    # Number of threads to use for dataloader
    dataloader_worker_number: 0 # needs to be 0 if using CPU
    # Batch size to pass to GPU
    batch_size: 900
    # On which device inference should be performed
    # For speed should be "cuda"
    inference_device: "cpu"
    # Label under which the results should be saved
    screen_label: "all_channels"
- class scportrait.pipeline.featurization.CellFeaturizer_single_channel(*args, **kwargs)
Class for extracting general image features from scPortrait’s single-cell image datasets. The extracted features are saved to a CSV file. The features are calculated on the basis of a single specified channel.
The features which are calculated are:
Area of the masks in pixels
Mean intensity of the chosen channel in the regions labelled by each of the masks
Median intensity of the chosen channel in the regions labelled by each of the masks
75% quantile of the chosen channel in the regions labelled by each of the masks
25% quantile of the chosen channel in the regions labelled by each of the masks
Summed intensity of the chosen channel in the regions labelled by each of the masks
Summed intensity of the chosen channel in the region labelled by each of the masks normalized for area
- Parameters:
config – Configuration for the extraction passed over from the pipeline.Project.
directory – Directory for the extraction log and results. Will be created if it does not exist yet.
debug – Flag used to output debug information and map images.
overwrite – Flag used to overwrite existing results.
- process(dataset_paths: str | list[str], dataset_labels: int | list[int] = 0, size=0, return_results: bool = False) → None | DataFrame
- Parameters:
dataset_paths – Paths to the single-cell dataset files on which inference should be performed. If this class is used as part of a project processing workflow this argument will be provided automatically.
dataset_labels – labels for the provided single-cell image datasets
size – How many cells should be selected for inference. Default is 0, meaning all cells are selected.
return_results – If True, the results are returned as a pandas DataFrame. Otherwise the results are written out to file.
- Returns:
None if return_results is False, otherwise a pandas DataFrame containing the results.
Important
If this class is used as part of a project processing workflow, the first argument will be provided by the Project class based on the previous single-cell extraction. Therefore, only the second and third arguments need to be provided. The Project class will automatically provide the most recent extraction results together with the supplied parameters.
Note
The following parameters are required in the config file:
CellFeaturizer:
    # Channel number on which the featurization should be performed
    channel_selection: 4
    # Number of threads to use for dataloader
    dataloader_worker_number: 0 # needs to be 0 if using CPU
    # Batch size to pass to GPU
    batch_size: 900
    # On which device inference should be performed
    # For speed should be "cuda"
    inference_device: "cpu"
    # Label under which the results should be saved
    screen_label: "Ch3_Featurization"
ConvNeXtFeaturizer
- class scportrait.pipeline.featurization.ConvNeXtFeaturizer(*args, **kwargs)
- CLEAN_LOG = True
Compute ConvNeXt features from scPortrait’s single-cell image datasets.
This class uses the pretrained ConvNeXt model available from the Hugging Face transformers library to extract features from single-cell image datasets. To use this class you need to install the optional dependencies for the transformers library. You can do this with pip install "scportrait[convnext]".
This method will not work with Python 3.12 or later, as the required version of the transformers library is not compatible with these Python versions.
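Because of the Python version constraint, it can help to check the interpreter before configuring this featurizer. The following is a minimal sketch using only the standard library. convnext_supported is a hypothetical helper name, not part of scPortrait, and the cut-off simply mirrors the "Python 3.12 or later" statement above.

```python
import sys


def convnext_supported(version_info=sys.version_info) -> bool:
    """Hypothetical helper: True if the interpreter is older than Python 3.12."""
    # Compare only (major, minor); patch releases do not matter here.
    return tuple(version_info[:2]) < (3, 12)


if not convnext_supported():
    print(
        "ConvNeXtFeaturizer is documented as incompatible with "
        f"Python {sys.version_info.major}.{sys.version_info.minor}"
    )
```

Passing the version tuple as an argument keeps the check easy to exercise for interpreters other than the one currently running.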
- Parameters:
config – Configuration for the extraction passed over from the pipeline.Project.
directory – Directory for the extraction log and results. Will be created if it does not exist yet.
debug – Flag used to output debug information and map images.
overwrite – Flag used to overwrite existing results.
- process(dataset_paths: str | list[str], dataset_labels: int | list[int] = 0, size: int = 0, return_results: bool = False) → None | DataFrame
- Parameters:
dataset_paths – Path(s) to the single-cell dataset files on which inference should be performed. If this class is used as part of a project processing workflow this argument will be provided automatically.
dataset_labels – Integer label(s) for the dataset(s) provided in dataset_paths.
size – Number of cells that should be selected for inference. Default is 0, which means all cells are selected.
return_results – Boolean indicating whether the results should be returned as a pandas DataFrame instead of being written to disk.
- Returns:
None if return_results is False, otherwise a pandas DataFrame containing the results.
Important
If this class is used as part of a project processing workflow, the first argument will be provided by the Project class based on the previous single-cell extraction. Therefore, only the second and third arguments need to be provided. The Project class will automatically provide the most recent extracted single-cell dataset together with the supplied parameters.
Example
Note
The following parameters are required in the config file:
ConvNeXtFeaturizer:
    # number of cells in a minibatch
    batch_size: 900
    # number of threads to use for dataloader
    dataloader_worker_number: 10 #needs to be 0 if using cpu
    # what device should be used for inference
    inference_device: "auto"
    # how the results should be saved
    label: "ConvNeXtFeaturizer"
    # which channels to run inference on
    channel_selection: 4