project#

At the core of scPortrait is the concept of a Project. A Project is a Python class that orchestrates all scPortrait processing steps, serving as the central element for all operations. Each Project corresponds to a directory on the file system, which houses the input data for a specific scPortrait run along with the generated outputs. The choice of the appropriate Project class depends on the structure of the data to be processed.

For more details, refer to here.

class scportrait.pipeline.project.Project(project_location: str, config_path: str = None, segmentation_f=None, extraction_f=None, featurization_f=None, selection_f=None, overwrite: bool = False, debug: bool = False)#

Base implementation for a scPortrait project.

This class is designed to handle single-timepoint, single-location data, such as whole-slide images.

Segmentation Methods should be based on Segmentation or ShardedSegmentation. Extraction Methods should be based on HDF5CellExtraction.

config#

Dictionary containing the config file.

Type:

dict

nuc_seg_name#

Name of the nucleus segmentation object.

Type:

str

cyto_seg_name#

Name of the cytosol segmentation object.

Type:

str

sdata_path#

Path to the spatialdata object.

Type:

str

filehandler#

Filehandler for the spatialdata object which manages all calls or updates to the spatialdata object.

Type:

sdata_filehandler

DEFAULT_IMAGE_DTYPE#

alias of uint16

DEFAULT_SEGMENTATION_DTYPE#

alias of uint64

DEFAULT_SINGLE_CELL_IMAGE_DTYPE#

alias of float16
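The three dtype defaults above determine the value ranges that images, label masks, and single-cell images can hold. The following is a minimal NumPy sketch of those ranges. The variable names and the rescaling step at the end are illustrative assumptions and are not taken from the scPortrait source.

```python
import numpy as np

# The three defaults listed above, expressed as NumPy dtypes.
image_dtype = np.uint16
segmentation_dtype = np.uint64
single_cell_dtype = np.float16

# Largest representable value of each dtype.
max_pixel_value = np.iinfo(image_dtype).max         # 16-bit intensity range
max_label_value = np.iinfo(segmentation_dtype).max  # upper bound for integer labels
max_float16 = float(np.finfo(single_cell_dtype).max)

# Hypothetical rescaling of 16-bit intensities to [0, 1] before casting to float16.
pixels = np.array([0, 32768, 65535], dtype=image_dtype)
rescaled = (pixels / max_pixel_value).astype(single_cell_dtype)
```

Note that float16 carries roughly three significant decimal digits, so values cast to it should be kept within a small range such as [0, 1].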

property sdata: SpatialData#

The SpatialData object associated with the project.

update_featurization_f(featurization_f)#

Update the featurization method chosen for the project without reinitializing the entire project.

Parameters:

featurization_f – The featurization method that should be used for the project.

Returns:

the featurization method is updated in the project object.

Return type:

None

Examples

Update the featurization method for a project:

from scportrait.pipeline.featurization import CellFeaturizer

project.update_featurization_f(CellFeaturizer)

print_project_status()#

Print the current project status.

view_sdata()#

Start an interactive napari viewer to look at the sdata object associated with the project.

Note: This only works in sessions with a visual interface.

plot_input_image(max_width: int = 1000, select_region: tuple[int, int] | None = None, channels: list[int] | list[str] | None = None, normalize: bool = False, normalization_percentile: tuple[float, float] = (0.01, 0.99), fontsize: int = 20, figsize_single_tile=(8, 8), return_fig: bool = False, image_name='input_image') → Figure | None#

Plot the input image associated with the project. If the image is large, a subset cropped to the center of the image is plotted automatically.

Parameters:
  • max_width – Maximum width of the image to be plotted in pixels.

  • select_region – Tuple containing the x and y coordinates of the center of the region to be plotted. If not set it will use the center of the image.

  • channels – List of channel names or indices to be plotted. If not set, the first 4 channels will be plotted.

  • fontsize – Fontsize of the title of the plot.

  • figsize_single_tile – Size of the single tile in the plot.

  • return_fig – If set to True, the function returns the figure object instead of displaying it.
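The interplay of max_width and select_region can be pictured as cutting a window around a center point. The sketch below is an illustration under stated assumptions: the helper name center_crop is hypothetical, the window is assumed to be square, and the center is taken as (x, y) as described for select_region. It is not the plotting code used by scPortrait.

```python
import numpy as np

def center_crop(image, max_width, center=None):
    """Return a window of at most max_width pixels per side around center (x, y)."""
    height, width = image.shape[-2:]
    if center is None:
        # Fall back to the image center when no region is selected.
        center = (width // 2, height // 2)
    cx, cy = center
    half = max_width // 2
    # Clamp the window to the image bounds.
    x0, x1 = max(cx - half, 0), min(cx + half, width)
    y0, y1 = max(cy - half, 0), min(cy + half, height)
    return image[..., y0:y1, x0:x1]

image = np.zeros((2, 400, 600), dtype=np.uint16)  # (channels, height, width)
crop = center_crop(image, max_width=100)
small = center_crop(np.zeros((2, 50, 60), dtype=np.uint16), max_width=100)
```

An image smaller than max_width is returned unchanged, which mirrors the idea that cropping only applies to large images.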

Returns:

A matplotlib figure object if return_fig is set to True.

Examples

Plot the input image of a project:

project.plot_input_image()

plot_he_image(image_name: str = 'he_image', max_width: int | None = None, select_region: tuple[int, int] | None = None, return_fig: bool = False, fontsize: int = 20) → None | Figure#

Plot the hematoxylin and eosin (HE) channel of the input image.

Parameters:
  • image_name – Name of the element containing the H&E image in the spatialdata object.

  • max_width – Maximum width of the image to be plotted in pixels.

  • select_region – Tuple containing the x and y coordinates of the region to be plotted. If not set it will use the center of the image.

  • return_fig – If set to True, the function returns the figure object instead of displaying it.

Returns:

A matplotlib figure object if return_fig is set to True.

Examples

Plot the HE channel of a project:

project.plot_he_image()

plot_segmentation_masks(max_width: int = 1500, select_region: tuple[int, int] | None = None, normalize: bool = False, normalization_percentile: tuple[float, float] = (0.01, 0.99), image_name: str = 'input_image', mask_names: list[str] | None = None, fontsize: int = 20, linewidth: int = 1, return_fig: bool = False) → None | Figure#

Plot the generated segmentation masks. If the image is large it will automatically plot a subset cropped to the center of the spatialdata object.

Parameters:
  • return_fig – If set to True, the function returns the figure object instead of displaying it.

  • max_width – Maximum width of the image to be plotted in pixels.

  • select_region – Tuple containing the x and y coordinates of the region to be plotted. If not set it will use the center of the image.
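The signature also exposes normalize and normalization_percentile, which are not described in the parameter list. A common way to implement this kind of contrast normalization is percentile clipping, sketched below. The helper name percentile_normalize is hypothetical, and reading the default (0.01, 0.99) as quantile fractions rather than percentages is an assumption based on the default values.

```python
import numpy as np

def percentile_normalize(channel, lower=0.01, upper=0.99):
    """Rescale a channel to [0, 1] between the given lower and upper quantiles."""
    low, high = np.quantile(channel, [lower, upper])
    if high == low:
        # A constant image has no contrast to stretch.
        return np.zeros(channel.shape, dtype=np.float32)
    scaled = (channel.astype(np.float32) - low) / (high - low)
    # Values outside the quantile range are clipped to the ends of [0, 1].
    return np.clip(scaled, 0.0, 1.0)

rng = np.random.default_rng(0)
channel = rng.integers(0, 65535, size=(256, 256)).astype(np.uint16)
normalized = percentile_normalize(channel)
flat = percentile_normalize(np.full((4, 4), 7, dtype=np.uint16))
```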

Returns:

A matplotlib figure object if return_fig is set to True.

Examples

Plot the segmentation masks of a project:

project.plot_segmentation_masks()

load_input_from_array(array: ndarray, channel_names: list[str] = None, overwrite: bool | None = None, remap: list[int] = None) → None#

Load input image from a numpy array.

In the array, the channels should be specified in the following order: nucleus, cytosol, other channels.

Parameters:
  • array (np.ndarray) – Input image as a numpy array.

  • channel_names – List of channel names. Default is ["channel_0", "channel_1", ...].

  • overwrite (bool, None, optional) – If set to None, will read the overwrite value from the associated project. Otherwise can be set to a boolean value to override project specific settings for image loading.

  • remap – List of integers that can be used to shuffle the order of the channels. For example [1, 0, 2] to invert the first two channels. Default is None in which case no reordering is performed. This transform is also applied to the channel names.
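The effect of channel_names and remap described above can be sketched with plain NumPy indexing. The helper name apply_remap is hypothetical, and the sketch assumes that remap[i] gives the source index of the channel that ends up at position i. For a simple swap such as [1, 0, 2] the direction makes no difference, but for other permutations it is an assumption.

```python
import numpy as np

def apply_remap(array, channel_names=None, remap=None):
    """Reorder the channel axis and the channel names with the same index list."""
    n_channels = array.shape[0]
    if channel_names is None:
        # Default naming scheme described in the parameter list.
        channel_names = [f"channel_{i}" for i in range(n_channels)]
    if remap is not None:
        array = array[remap]  # index along the first (channel) axis
        channel_names = [channel_names[i] for i in remap]
    return array, channel_names

# Three constant channels so the reordering is easy to see.
array = np.stack([np.full((4, 4), value) for value in (10, 20, 30)])
names = ["cytosol", "nucleus", "other_channel"]
remapped, remapped_names = apply_remap(array, names, remap=[1, 0, 2])
_, default_names = apply_remap(array)
```

After the remap, the nucleus channel comes first, which matches the channel order expected by the loading methods.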

Returns:

Image is written to the project associated sdata object.

The input image can be accessed using the project object:

project.input_image

Return type:

None

Examples

Load an input image from a numpy array and attach it to an scportrait project:

import numpy as np

from scportrait.pipeline.project import Project

project = Project("path/to/project", config_path="path/to/config.yml", overwrite=True, debug=False)
array = np.random.rand(3, 1000, 1000)
channel_names = ["cytosol", "nucleus", "other_channel"]
project.load_input_from_array(array, channel_names=channel_names, remap=[1, 0, 2])

load_input_from_tif_files(file_paths: list[str], channel_names: list[str] = None, crop: list[tuple[int, int]] | None = None, overwrite: bool | None = None, remap: list[int] = None, cache: str | None = None)#

Load input image from a list of files. The channels need to be specified in the following order: nucleus, cytosol, other channels.

Parameters:
  • file_paths – List containing paths to each channel tiff file, like ["path1/img.tiff", "path2/img.tiff", "path3/img.tiff"]

  • channel_names – List of channel names. Default is ["channel_0", "channel_1", ...].

  • crop (None, List[Tuple], optional) – When set, it can be used to crop the input image. The first element refers to the first dimension of the image and so on. For example use [(0,1000),(0,2000)] to crop the image to 1000 px height and 2000 px width from the top left corner.

  • overwrite (bool, None, optional) – If set to None, will read the overwrite value from the associated project. Otherwise can be set to a boolean value to override project specific settings for image loading.

  • remap – List of integers that can be used to shuffle the order of the channels. For example [1, 0, 2] to invert the first two channels. Default is None in which case no reordering is performed. This transform is also applied to the channel names.

  • cache – Path to a directory where temporary files should be stored. Default is None, in which case the current working directory is used.

Returns:

Image is written to the project associated sdata object.

The input image can be accessed using the project object:

project.input_image

Return type:

None
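The crop parameter described above maps naturally onto slices over the spatial axes. The following sketch shows that reading with NumPy. The helper name crop_image is hypothetical, and the sketch assumes a (channels, height, width) layout in which the crop tuples apply to the two spatial dimensions.

```python
import numpy as np

def crop_image(image, crop):
    """Crop the spatial axes of a (channels, height, width) image."""
    # One (start, stop) tuple per spatial dimension, in axis order.
    slices = tuple(slice(start, stop) for start, stop in crop)
    # Keep all channels and apply the slices to the remaining axes.
    return image[(slice(None), *slices)]

image = np.zeros((3, 1500, 2500), dtype=np.uint16)
cropped = crop_image(image, [(0, 1000), (0, 2000)])
```

With [(0, 1000), (0, 2000)] the result is 1000 px high and 2000 px wide, taken from the top left corner.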

Examples

Load input images from tif files and attach them to an scportrait project:

from scportrait.data._datasets import dataset_3
from scportrait.pipeline.project import Project

project = Project("path/to/project", config_path="path/to/config.yml", overwrite=True, debug=False)
path = dataset_3()
image_paths = [
    f"{path}/Ch2.tif",
    f"{path}/Ch1.tif",
    f"{path}/Ch3.tif",
]
channel_names = ["cytosol", "nucleus", "other_channel"]
project.load_input_from_tif_files(image_paths, channel_names=channel_names, remap=[1, 0, 2])

load_input_from_omezarr(ome_zarr_path: str, overwrite: None | bool = None, channel_names: None | list[str] = None, remap: list[int] = None) → None#

Load input image from an ome-zarr file.

Parameters:
  • ome_zarr_path – Path to the ome-zarr file.

  • overwrite (bool, None, optional) – If set to None, will read the overwrite value from the associated project. Otherwise can be set to a boolean value to override project specific settings for image loading.

  • remap – List of integers that can be used to shuffle the order of the channels. For example [1, 0, 2] to invert the first two channels. Default is None in which case no reordering is performed. This transform is also applied to the channel names.

Returns:

Image is written to the project associated sdata object.

The input image can be accessed using the project object:

project.input_image

Return type:

None

Examples

Load input images from an ome-zarr file and attach them to an scportrait project:

from scportrait.pipeline.project import Project

project = Project("path/to/project", config_path="path/to/config.yml", overwrite=True, debug=False)
ome_zarr_path = "path/to/ome.zarr"
project.load_input_from_omezarr(ome_zarr_path, remap=[1, 0, 2])

load_input_from_dask(dask_array, channel_names: list[str], overwrite: bool | None = None) → None#

Load input image from a dask array.

Parameters:
  • dask_array – Dask array containing the input image.

  • channel_names – List of channel names, one per channel in the dask array. Unlike the other loading methods, this argument has no default in the signature.

  • overwrite (bool, None, optional) – If set to None, will read the overwrite value from the associated project. Otherwise can be set to a boolean value to override project specific settings for image loading.
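The overwrite rule shared by the loading methods (None defers to the project, a boolean takes precedence) can be written as a one-line fallback. The function name resolve_overwrite is hypothetical and only illustrates the rule as documented.

```python
from typing import Optional

def resolve_overwrite(method_overwrite: Optional[bool], project_overwrite: bool) -> bool:
    """Return the overwrite value that takes effect for a single method call."""
    # None means "no opinion": fall back to the project-level setting.
    if method_overwrite is None:
        return project_overwrite
    # An explicit boolean overrides the project-level setting.
    return method_overwrite

inherited = resolve_overwrite(None, project_overwrite=True)
forced_off = resolve_overwrite(False, project_overwrite=True)
forced_on = resolve_overwrite(True, project_overwrite=False)
```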

Returns:

Image is written to the project associated sdata object.

The input image can be accessed using the project object:

project.input_image

Return type:

None

Examples

Load input images from a dask array and attach them to an scportrait project:

import dask.array as da

from scportrait.pipeline.project import Project

project = Project("path/to/project", config_path="path/to/config.yml", overwrite=True, debug=False)
dask_array = da.random.random((3, 1000, 1000))
channel_names = ["cytosol", "nucleus", "other_channel"]
project.load_input_from_dask(dask_array, channel_names=channel_names)

load_input_from_sdata(sdata_path, input_image_name: str, nucleus_segmentation_name: str | None = None, cytosol_segmentation_name: str | None = None, cell_id_identifier: str | None = None, overwrite: bool | None = None, keep_all: bool = True, remove_duplicates: bool = True) → None#

Load input image from a spatialdata object.

Parameters:
  • sdata_path – Path to the spatialdata object.

  • input_image_name – Name of the element in the spatial data object containing the input image.

  • nucleus_segmentation_name – Name of the element in the spatial data object containing the nucleus segmentation mask. Default is None.

  • cytosol_segmentation_name – Name of the element in the spatial data object containing the cytosol segmentation mask. Default is None.

  • cell_id_identifier – Column of the annotating tables that contains the values matching a segmentation mask. If not provided, the column is assumed to carry the same name as the segmentation mask before parsing.

  • overwrite (bool, None, optional) – If set to None, will read the overwrite value from the associated project. Otherwise can be set to a boolean value to override project specific settings for image loading.

  • keep_all – If set to True, will keep all existing elements in the sdata object in addition to renaming the desired ones. Default is True.

  • remove_duplicates – If keep_all and remove_duplicates are both True, only one copy of the spatialdata elements selected for use with scportrait processing steps is kept. Otherwise, each selected element is saved under both the original and the new name.

Returns:

Image is written to the project associated sdata object and self.sdata is updated.

Return type:

None
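The combination of keep_all and remove_duplicates is easiest to follow on a toy example. The sketch below models the elements of a spatialdata object as a plain dictionary. The helper name select_elements and the element names other than input_image are hypothetical, and the behavior for keep_all=False (only the renamed elements survive) is an interpretation of the parameter description, not a confirmed detail.

```python
def select_elements(elements, rename, keep_all=True, remove_duplicates=True):
    """Model which element names remain after loading from an existing object.

    elements: mapping of element name to element.
    rename: mapping of original name to the name used for processing.
    """
    # The selected elements are always available under their new names.
    result = {new: elements[old] for old, new in rename.items()}
    if keep_all:
        for name, element in elements.items():
            if name in rename and remove_duplicates:
                continue  # drop the copy stored under the original name
            result.setdefault(name, element)
    return result

elements = {"raw_image": "img", "nuclei": "mask", "transcripts": "points"}
rename = {"raw_image": "input_image", "nuclei": "nucleus_mask"}

deduplicated = select_elements(elements, rename)
with_copies = select_elements(elements, rename, remove_duplicates=False)
selected_only = select_elements(elements, rename, keep_all=False)
```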

complete_segmentation(overwrite: bool | None = None, force_run: bool = False)#

If a sharded Segmentation was run but individual tiles failed to segment properly, this method can be called to repeat the segmentation on the failed tiles only. Already calculated segmentation masks will not be recalculated.

Parameters:
  • overwrite – If set to None, will read the overwrite value from the associated project. Otherwise can be set to a boolean value to override the project-specific setting for this step.

  • force_run – If set to True, will force complete_segmentation to run even if a finalized segmentation mask is already found in the spatialdata object.
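The idea of repeating the segmentation on failed tiles only can be sketched independently of any segmentation backend. In the example below, a result of None marks a failed tile. The function complete_tiles, the stand-in segment_tile callback, and the dictionary layout are all hypothetical and do not reflect how scPortrait tracks tiles internally.

```python
def complete_tiles(results, segment_tile):
    """Re-run segment_tile only for tiles whose stored result is None."""
    rerun = []
    for tile_id, mask in results.items():
        if mask is None:
            # Only failed tiles are recomputed; existing masks stay untouched.
            results[tile_id] = segment_tile(tile_id)
            rerun.append(tile_id)
    return rerun

calls = []

def fake_segment(tile_id):
    """Stand-in for a real segmentation call that records which tiles it saw."""
    calls.append(tile_id)
    return f"mask_{tile_id}"

results = {0: "mask_0", 1: None, 2: "mask_2", 3: None}
rerun = complete_tiles(results, fake_segment)
```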

extract(partial=False, n_cells=None, seed: int = 42, overwrite: bool | None = None) → None#

Extract single-cell images from the input image using the defined extraction method.

Parameters:
  • partial – If set to True, will run the extraction on a subset of the image. Default is False.

  • n_cells – Number of cells to extract if partial is True.

  • seed – Seed for the random number generator during a partial extraction. Default is 42.

  • overwrite – If set to None, will read the overwrite value from the associated project. Otherwise can be set to a boolean value to override the project-specific setting for this step.

Returns:

Single-cell images are written to HDF5 files in the project associated extraction directory. File path can be accessed via project.extraction_f.output_path.

Return type:

None
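The role of n_cells and seed in a partial extraction can be illustrated with a seeded random draw of cell ids. This is a sketch under assumptions: the helper name sample_cell_ids is hypothetical, and both the use of NumPy's default_rng and sampling without replacement are illustrative choices, not confirmed details of the extraction method.

```python
import numpy as np

def sample_cell_ids(cell_ids, n_cells, seed=42):
    """Draw a reproducible random subset of cell ids without replacement."""
    rng = np.random.default_rng(seed)
    # Never request more cells than are available.
    n = min(n_cells, len(cell_ids))
    return np.sort(rng.choice(cell_ids, size=n, replace=False))

cell_ids = np.arange(1, 1001)
subset_a = sample_cell_ids(cell_ids, n_cells=100)
subset_b = sample_cell_ids(cell_ids, n_cells=100)
everything = sample_cell_ids(cell_ids, n_cells=5000)
```

Using the same seed yields the same subset on every call, which is what makes a partial extraction repeatable.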

select(cell_sets: list[dict], calibration_marker: ndarray | None = None, name: str | None = None)#

Select specified classes using the defined selection method.