project#
At the core of scPortrait is the concept of a Project. A Project is a Python class that orchestrates all scPortrait processing steps, serving as the central element for all operations. Each Project corresponds to a directory on the file system, which houses the input data for a specific scPortrait run along with the generated outputs. The choice of the appropriate Project class depends on the structure of the data to be processed.
For more details, refer to here.
- class scportrait.pipeline.project.Project(project_location: str, config_path: str = None, segmentation_f=None, extraction_f=None, featurization_f=None, selection_f=None, overwrite: bool = False, debug: bool = False)#
Base implementation for a scPortrait project.
This class is designed to handle single-timepoint, single-location data, such as whole-slide images.
Segmentation methods should be based on Segmentation or ShardedSegmentation. Extraction methods should be based on HDF5CellExtraction.
- config#
Dictionary containing the config file.
- Type:
dict
- nuc_seg_name#
Name of the nucleus segmentation object.
- Type:
str
- cyto_seg_name#
Name of the cytosol segmentation object.
- Type:
str
- sdata_path#
Path to the spatialdata object.
- Type:
str
- filehandler#
Filehandler for the spatialdata object which manages all calls or updates to the spatialdata object.
- Type:
sdata_filehandler
- DEFAULT_IMAGE_DTYPE#
alias of uint16
- DEFAULT_SEGMENTATION_DTYPE#
alias of uint64
- DEFAULT_SINGLE_CELL_IMAGE_DTYPE#
alias of float16
- property sdata: SpatialData#
The SpatialData object associated with the project.
- update_featurization_f(featurization_f)#
Update the featurization method chosen for the project without reinitializing the entire project.
- Parameters:
featurization_f – The featurization method that should be used for the project.
- Returns:
The featurization method is updated in the project object.
- Return type:
None
Examples
Update the featurization method for a project:
from scportrait.pipeline.featurization import CellFeaturizer
project.update_featurization_f(CellFeaturizer)
- print_project_status()#
Print the current project status.
- view_sdata()#
Start an interactive napari viewer to look at the sdata object associated with the project.
Note: This only works in sessions with a visual interface.
- plot_input_image(max_width: int = 1000, select_region: tuple[int, int] | None = None, channels: list[int] | list[str] | None = None, normalize: bool = False, normalization_percentile: tuple[float, float] = (0.01, 0.99), fontsize: int = 20, figsize_single_tile=(8, 8), return_fig: bool = False, image_name='input_image') Figure | None #
Plot the input image associated with the project. If the image is large, a subset at the center of the image is plotted automatically.
- Parameters:
max_width – Maximum width of the image to be plotted in pixels.
select_region – Tuple containing the x and y coordinates of the center of the region to be plotted. If not set it will use the center of the image.
channels – List of channel names or indices to be plotted. If not set, the first 4 channels will be plotted.
fontsize – Fontsize of the title of the plot.
figsize_single_tile – Size of the single tile in the plot.
return_fig – If set to True, the function returns the figure object instead of displaying it.
- Returns:
A matplotlib figure object if return_fig is set to True.
Examples
Plot the input image of a project:
project.plot_input_image()
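The automatic cropping to a central subset can be pictured as choosing a window around a center point. The sketch below is only an illustration and not scPortrait's implementation: the helper name `center_window` is hypothetical, and it assumes a square window of side `max_width` that is clamped to the image bounds.

```python
def center_window(image_shape, max_width=1000, select_region=None):
    """Return (x_start, x_end, y_start, y_end) of the window to plot.

    Hypothetical helper for illustration; not part of scPortrait.
    image_shape is (height, width); select_region is an (x, y) center.
    """
    height, width = image_shape

    # Fall back to the image center when no region is given.
    if select_region is None:
        center_x, center_y = width // 2, height // 2
    else:
        center_x, center_y = select_region

    half = max_width // 2

    # Clamp the window so it never leaves the image.
    x_start = max(0, center_x - half)
    x_end = min(width, center_x + half)
    y_start = max(0, center_y - half)
    y_end = min(height, center_y + half)
    return x_start, x_end, y_start, y_end


# A 5000 x 8000 px image is reduced to a 1000 px window around its center.
print(center_window((5000, 8000), max_width=1000))  # (3500, 4500, 2000, 3000)
```

For an image smaller than `max_width`, the clamping means the whole image is returned.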
- plot_he_image(image_name: str = 'he_image', max_width: int | None = None, select_region: tuple[int, int] | None = None, return_fig: bool = False, fontsize: int = 20) None | Figure #
Plot the hematoxylin and eosin (HE) channel of the input image.
- Parameters:
image_name – Name of the element containing the H&E image in the spatialdata object.
max_width – Maximum width of the image to be plotted in pixels.
select_region – Tuple containing the x and y coordinates of the region to be plotted. If not set it will use the center of the image.
return_fig – If set to True, the function returns the figure object instead of displaying it.
- Returns:
A matplotlib figure object if return_fig is set to True.
Examples
Plot the HE channel of a project:
project.plot_he_image()
- plot_segmentation_masks(max_width: int = 1500, select_region: tuple[int, int] | None = None, normalize: bool = False, normalization_percentile: tuple[float, float] = (0.01, 0.99), image_name: str = 'input_image', mask_names: list[str] | None = None, fontsize: int = 20, linewidth: int = 1, return_fig: bool = False) None | Figure #
Plot the generated segmentation masks. If the image is large it will automatically plot a subset cropped to the center of the spatialdata object.
- Parameters:
return_fig – If set to True, the function returns the figure object instead of displaying it.
max_width – Maximum width of the image to be plotted in pixels.
select_region – Tuple containing the x and y coordinates of the region to be plotted. If not set it will use the center of the image.
- Returns:
A matplotlib figure object if return_fig is set to True.
Examples
Plot the segmentation masks of a project:
project.plot_segmentation_masks()
- load_input_from_array(array: ndarray, channel_names: list[str] = None, overwrite: bool | None = None, remap: list[int] = None) None #
Load input image from a numpy array.
In the array, the channels should be specified in the following order: nucleus, cytosol, other channels.
- Parameters:
array (np.ndarray) – Input image as a numpy array.
channel_names – List of channel names. Default is ["channel_0", "channel_1", ...].
overwrite (bool, None, optional) – If set to None, will read the overwrite value from the associated project. Otherwise can be set to a boolean value to override project specific settings for image loading.
remap – List of integers that can be used to shuffle the order of the channels. For example, use [1, 0, 2] to swap the first two channels. Default is None, in which case no reordering is performed. This transform is also applied to the channel names.
- Returns:
Image is written to the project associated sdata object.
The input image can be accessed using the project object:
project.input_image
- Return type:
None
Examples
Load an input image from a numpy array and attach it to an scportrait project:
import numpy as np
from scportrait.pipeline.project import Project
project = Project("path/to/project", config_path="path/to/config.yml", overwrite=True, debug=False)
array = np.random.rand(3, 1000, 1000)
channel_names = ["cytosol", "nucleus", "other_channel"]
project.load_input_from_array(array, channel_names=channel_names, remap=[1, 0, 2])
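To make the effect of remap concrete, the following sketch reorders a list of channels and their names in the same way. It is an illustration in plain Python, not scPortrait code: `apply_remap` is a hypothetical helper, and it assumes that position i of the output takes the input channel at index `remap[i]` (for a simple swap such as `[1, 0, 2]` the result is the same under either indexing convention).

```python
def apply_remap(channels, channel_names, remap=None):
    """Reorder channels and their names with the same index list.

    Hypothetical helper for illustration; not part of scPortrait.
    """
    if remap is None:
        # No reordering requested: return copies in the original order.
        return list(channels), list(channel_names)
    remapped_channels = [channels[i] for i in remap]
    remapped_names = [channel_names[i] for i in remap]
    return remapped_channels, remapped_names


# Channels were acquired as cytosol, nucleus, other, but the expected
# order is nucleus, cytosol, other channels.
channels = ["cytosol_image", "nucleus_image", "other_image"]
names = ["cytosol", "nucleus", "other_channel"]

new_channels, new_names = apply_remap(channels, names, remap=[1, 0, 2])
print(new_names)  # ['nucleus', 'cytosol', 'other_channel']
```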
- load_input_from_tif_files(file_paths: list[str], channel_names: list[str] = None, crop: list[tuple[int, int]] | None = None, overwrite: bool | None = None, remap: list[int] = None, cache: str | None = None)#
Load input image from a list of files. The channels need to be specified in the following order: nucleus, cytosol, other channels.
- Parameters:
file_paths – List containing paths to each channel tiff file, like ["path1/img.tiff", "path2/img.tiff", "path3/img.tiff"].
channel_names – List of channel names. Default is ["channel_0", "channel_1", ...].
crop (None, List[Tuple], optional) – When set, it can be used to crop the input image. The first element refers to the first dimension of the image and so on. For example, use [(0,1000),(0,2000)] to crop the image to 1000 px height and 2000 px width from the top left corner.
overwrite (bool, None, optional) – If set to None, will read the overwrite value from the associated project. Otherwise can be set to a boolean value to override project specific settings for image loading.
remap – List of integers that can be used to shuffle the order of the channels. For example, use [1, 0, 2] to swap the first two channels. Default is None, in which case no reordering is performed. This transform is also applied to the channel names.
cache – Path to a directory where the temporary files should be stored. Default is None, in which case the current working directory is used.
- Returns:
Image is written to the project associated sdata object.
The input image can be accessed using the project object:
project.input_image
- Return type:
None
Examples
Load input images from tif files and attach them to an scportrait project:
from scportrait.data._datasets import dataset_3
from scportrait.pipeline.project import Project
project = Project("path/to/project", config_path="path/to/config.yml", overwrite=True, debug=False)
path = dataset_3()
image_paths = [
    f"{path}/Ch2.tif",
    f"{path}/Ch1.tif",
    f"{path}/Ch3.tif",
]
channel_names = ["cytosol", "nucleus", "other_channel"]
project.load_input_from_tif_files(image_paths, channel_names=channel_names, remap=[1, 0, 2])
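The crop argument pairs one (start, end) tuple with each image dimension, first the rows (height) and then the columns (width). The sketch below shows that convention on a small nested list. It is a plain Python illustration with a hypothetical helper `crop_image`, and it assumes half-open ranges as in Python slicing; it does not reproduce how scPortrait reads the files.

```python
def crop_image(image, crop=None):
    """Crop a 2D image given as a list of rows.

    Hypothetical helper for illustration; not part of scPortrait.
    crop is [(row_start, row_end), (col_start, col_end)].
    """
    if crop is None:
        # No cropping requested: return a copy of the full image.
        return [row[:] for row in image]
    (row_start, row_end), (col_start, col_end) = crop
    return [row[col_start:col_end] for row in image[row_start:row_end]]


# A 4 x 6 toy image whose pixel value encodes its (row, column) position.
image = [[r * 10 + c for c in range(6)] for r in range(4)]

# Keep the top-left region: 2 px high and 3 px wide.
cropped = crop_image(image, crop=[(0, 2), (0, 3)])
print(cropped)  # [[0, 1, 2], [10, 11, 12]]
```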
- load_input_from_omezarr(ome_zarr_path: str, overwrite: None | bool = None, channel_names: None | list[str] = None, remap: list[int] = None) None #
Load input image from an ome-zarr file.
- Parameters:
ome_zarr_path – Path to the ome-zarr file.
overwrite (bool, None, optional) – If set to None, will read the overwrite value from the associated project. Otherwise can be set to a boolean value to override project specific settings for image loading.
remap – List of integers that can be used to shuffle the order of the channels. For example [1, 0, 2] to invert the first two channels. Default is None in which case no reordering is performed. This transform is also applied to the channel names.
- Returns:
Image is written to the project associated sdata object.
The input image can be accessed using the project object:
project.input_image
- Return type:
None
Examples
Load input images from an ome-zarr file and attach them to an scportrait project:
from scportrait.pipeline.project import Project
project = Project("path/to/project", config_path="path/to/config.yml", overwrite=True, debug=False)
ome_zarr_path = "path/to/ome.zarr"
project.load_input_from_omezarr(ome_zarr_path, remap=[1, 0, 2])
- load_input_from_dask(dask_array, channel_names: list[str], overwrite: bool | None = None) None #
Load input image from a dask array.
- Parameters:
dask_array – Dask array containing the input image.
channel_names – List of channel names. Default is ["channel_0", "channel_1", ...].
overwrite (bool, None, optional) – If set to None, will read the overwrite value from the associated project. Otherwise can be set to a boolean value to override project specific settings for image loading.
- Returns:
Image is written to the project associated sdata object.
The input image can be accessed using the project object:
project.input_image
- Return type:
None
Examples
Load input images from a dask array and attach them to an scportrait project:
import dask.array as da
from scportrait.pipeline.project import Project
project = Project("path/to/project", config_path="path/to/config.yml", overwrite=True, debug=False)
dask_array = da.random.random((3, 1000, 1000))
channel_names = ["cytosol", "nucleus", "other_channel"]
project.load_input_from_dask(dask_array, channel_names=channel_names)
- load_input_from_sdata(sdata_path, input_image_name: str, nucleus_segmentation_name: str | None = None, cytosol_segmentation_name: str | None = None, cell_id_identifier: str | None = None, overwrite: bool | None = None, keep_all: bool = True, remove_duplicates: bool = True) None #
Load input image from a spatialdata object.
- Parameters:
sdata_path – Path to the spatialdata object.
input_image_name – Name of the element in the spatial data object containing the input image.
nucleus_segmentation_name – Name of the element in the spatial data object containing the nucleus segmentation mask. Default is None.
cytosol_segmentation_name – Name of the element in the spatial data object containing the cytosol segmentation mask. Default is None.
cell_id_identifier – Column of the annotating tables that contains the values matching a segmentation mask. If not provided, this column is assumed to carry the same name as the segmentation mask before parsing.
overwrite (bool, None, optional) – If set to None, will read the overwrite value from the associated project. Otherwise can be set to a boolean value to override project specific settings for image loading.
keep_all – If set to True, will keep all existing elements in the sdata object in addition to renaming the desired ones. Default is True.
remove_duplicates – If keep_all and remove_duplicates are both True, only one copy of the spatialdata elements selected for use with scportrait processing steps will be kept. Otherwise, the element will be saved both under the original as well as the new name.
- Returns:
Image is written to the project associated sdata object and self.sdata is updated.
- Return type:
None
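The interplay of keep_all and remove_duplicates can be sketched with a plain dictionary standing in for the elements of a spatialdata object. This is an illustration under stated assumptions, not scPortrait's implementation: `select_elements` is a hypothetical helper, the element names are placeholders, and it assumes that keep_all=False retains only the selected elements under their new names.

```python
def select_elements(elements, rename_map, keep_all=True, remove_duplicates=True):
    """Rename selected elements and decide which copies to keep.

    Hypothetical helper for illustration; not part of scPortrait.
    elements maps element names to objects; rename_map maps the
    original name of each selected element to its new name.
    """
    # Start from all existing elements, or from nothing.
    result = dict(elements) if keep_all else {}
    for original_name, new_name in rename_map.items():
        result[new_name] = elements[original_name]
        # Drop the copy stored under the original name if requested.
        if keep_all and remove_duplicates and original_name != new_name:
            result.pop(original_name, None)
    return result


# Placeholder names, chosen only for this example.
elements = {"raw_image": "IMG", "nuclei": "MASK", "annotations": "TABLE"}
rename_map = {"raw_image": "input_image", "nuclei": "nucleus_mask"}

print(sorted(select_elements(elements, rename_map)))
# ['annotations', 'input_image', 'nucleus_mask']
print(sorted(select_elements(elements, rename_map, remove_duplicates=False)))
# ['annotations', 'input_image', 'nuclei', 'nucleus_mask', 'raw_image']
```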
- complete_segmentation(overwrite: bool | None = None, force_run: bool = False)#
If a sharded Segmentation was run but individual tiles failed to segment properly, this method can be called to repeat the segmentation on the failed tiles only. Already calculated segmentation masks will not be recalculated.
- Parameters:
overwrite – If set to None, will read the overwrite value from the associated project. Otherwise can be set to a boolean value to override project specific settings for image loading.
force_run – If set to True, will force complete_segmentation to run even if a finalized segmentation mask is already found in the spatialdata object.
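Several methods accept overwrite=None to mean "use the project's setting". A minimal sketch of that fallback logic is shown here; `resolve_overwrite` is a hypothetical helper written for illustration, not a function provided by scPortrait.

```python
def resolve_overwrite(method_overwrite, project_overwrite):
    """Pick the effective overwrite value for a single method call.

    Hypothetical helper for illustration; not part of scPortrait.
    """
    if method_overwrite is None:
        # Nothing passed to the method: fall back to the project setting.
        return project_overwrite
    # An explicit boolean overrides the project setting for this call.
    return bool(method_overwrite)


print(resolve_overwrite(None, project_overwrite=True))   # True
print(resolve_overwrite(False, project_overwrite=True))  # False
```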
- extract(partial=False, n_cells=None, seed: int = 42, overwrite: bool | None = None) None #
Extract single-cell images from the input image using the defined extraction method.
- Parameters:
partial – If set to True, will run the extraction on a subset of the image. Default is False.
n_cells – Number of cells to extract if partial is True.
seed – Seed for the random number generator during a partial extraction. Default is 42.
overwrite – If set to None, will read the overwrite value from the associated project. Otherwise can be set to a boolean value to override project specific settings for image loading.
- Returns:
Single-cell images are written to HDF5 files in the project associated extraction directory. File path can be accessed via project.extraction_f.output_path.
- Return type:
None
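The role of seed in a partial extraction can be illustrated with a seeded random subset of cell ids: the same seed yields the same subset on every run. The sketch below uses only the Python standard library. `sample_cell_ids` is a hypothetical helper, and the sampling strategy scPortrait actually uses may differ.

```python
import random


def sample_cell_ids(cell_ids, n_cells, seed=42):
    """Draw a reproducible random subset of cell ids.

    Hypothetical helper for illustration; not part of scPortrait.
    """
    # A dedicated generator keeps the global random state untouched.
    rng = random.Random(seed)
    # Never request more cells than are available.
    n_selected = min(n_cells, len(cell_ids))
    return sorted(rng.sample(cell_ids, n_selected))


cell_ids = list(range(1, 101))
subset = sample_cell_ids(cell_ids, n_cells=10, seed=42)

# Repeating the call with the same seed returns the same cells.
print(subset == sample_cell_ids(cell_ids, n_cells=10, seed=42))  # True
```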
- select(cell_sets: list[dict], calibration_marker: ndarray | None = None, name: str | None = None)#
Select specified classes using the defined selection method.