io
- scportrait.io.read_h5sc(filename: str | Path) → AnnData
Read a file in scPortrait's single-cell image dataset format (.h5sc).
- Parameters:
filename – Path to the file to read.
- Returns:
An AnnData object whose obsm["single_cell_images"] holds a disk-backed (lazily loaded) array of the single-cell images.
- scportrait.io.numpy_to_h5sc(mask_names: Sequence[str], channel_names: Sequence[str], mask_imgs: numpy.ndarray, channel_imgs: numpy.ndarray, output_path: str | Path, cell_ids: numpy.ndarray[Any, numpy.dtype[numpy.integer]], cell_metadata: pandas.DataFrame | None = None, image_dtype=numpy.float16, compression_type: Literal['gzip', 'lzf'] = 'gzip') → None
Create and write an scPortrait-style .h5sc file from NumPy arrays of single-cell masks and image channels, with optional per-cell metadata.
This function builds a valid AnnData-backed HDF5 container following the scPortrait “H5SC” convention. Internally, the file is a standard AnnData .h5ad structure whose filename ends in .h5sc, and which contains a 4D image tensor stored at:
/obsm/single_cell_images
with shape:
(N, C, H, W)
- where:
N = number of cells
C = n_masks + n_image_channels
H = image height
W = image width
The mask channels are stored first, followed by the image channels. All data are written to a single HDF5 dataset of dtype image_dtype (float16 by default), with mask values encoded as 0.0 and 1.0.
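The channel ordering above can be illustrated with plain NumPy. This is a minimal sketch, assuming the inputs already have the documented (N, n_masks, H, W) and (N, n_image_channels, H, W) shapes; `assemble_tensor` is a hypothetical helper for illustration, not part of scPortrait's API. It concatenates masks before image channels along axis 1 and casts the result to float16, the documented default storage dtype.

```python
import numpy as np


def assemble_tensor(mask_imgs: np.ndarray, channel_imgs: np.ndarray, dtype=np.float16) -> np.ndarray:
    """Hypothetical helper: stack masks first, then image channels, into (N, C, H, W)."""
    # Both inputs are (N, channels, H, W); axis 1 is the channel axis.
    return np.concatenate([mask_imgs, channel_imgs], axis=1).astype(dtype)


# Two cells, one binary mask channel, two normalized image channels, 4x4 pixels.
rng = np.random.default_rng(0)
masks = rng.integers(0, 2, size=(2, 1, 4, 4))
channels = rng.random((2, 2, 4, 4))

tensor = assemble_tensor(masks, channels)
print(tensor.shape, tensor.dtype)  # (2, 3, 4, 4) float16
```

Index 0 on axis 1 is the mask channel; indices 1 and 2 are the image channels, so C = 1 + 2 = 3.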
Cell identifiers and optional per-cell metadata are written to adata.obs.
- Metadata are written redundantly:
At the AnnData level in adata.uns[…]
At the HDF5 level as attributes on /obsm/single_cell_images
This allows the file to be read both via AnnData and as a standalone HDF5 image container.
- Parameters:
mask_names – Names of the mask channels. Length must match mask_imgs.shape[1].
channel_names – Names of the image channels. Length must match channel_imgs.shape[1].
mask_imgs – Array of mask images with shape (N, n_masks, H, W). Masks are expected to be binary (0 or 1) and are cast to image_dtype (float16 by default).
channel_imgs – Array of image channels with shape (N, n_image_channels, H, W). Images should already be normalized (e.g., to [0, 1]) before writing.
output_path – Path of the .h5sc file to create, e.g. “/path/to/file.h5sc”. The file will be overwritten if it already exists.
cell_ids – Array of segmentation cell identifiers with shape (N,). These values are written into adata.obs[DEFAULT_CELL_ID_NAME] and define the mapping between row index and original segmentation label.
cell_metadata – Optional per-cell metadata to be written into adata.obs. Must have exactly N rows. Columns will be merged into obs alongside the cell ID column. The index is ignored and replaced by AnnData’s internal index.
image_dtype – Storage dtype of the image tensor (default numpy.float16).
compression_type – HDF5 compression algorithm used for the image tensor:
- "gzip": better compression, slower I/O
- "lzf": faster I/O, lower compression ratio
- File layout created:
The resulting file contains:
- /obs
Per-cell metadata including cell IDs and optional user-provided metadata.
- /var
Channel metadata (channel names and channel mapping).
- /uns
scPortrait metadata describing the image container.
- /obsm/single_cell_images
HDF5 dataset with shape (N, C, H, W), dtype image_dtype (float16 by default), chunked as (1, 1, H, W), and compressed with compression_type.
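Because masks precede image channels, a reader can map a channel name to its position on axis 1 of /obsm/single_cell_images. The sketch below uses a hypothetical `channel_index` helper and assumes names are unique across masks and image channels; the channel mapping that scPortrait stores in /var may be the more reliable source in practice.

```python
from collections.abc import Sequence


def channel_index(name: str, mask_names: Sequence[str], channel_names: Sequence[str]) -> int:
    """Hypothetical helper: position of `name` on the C axis of an (N, C, H, W) tensor."""
    # Masks occupy indices 0 .. n_masks - 1; image channels follow in order.
    ordered = list(mask_names) + list(channel_names)
    if name not in ordered:
        raise KeyError(f"unknown channel {name!r}")
    return ordered.index(name)


mask_names = ["nucleus_mask", "cytosol_mask"]
channel_names = ["dapi", "membrane", "marker"]
print(channel_index("membrane", mask_names, channel_names))  # 3
```

With the (1, 1, H, W) chunking, selecting one cell and one channel (for example `imgs[i, 3]`) touches exactly one chunk, which keeps per-cell, per-channel reads cheap.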
Notes
The file is technically an AnnData .h5ad file with a .h5sc extension.
Masks and image channels share a single dataset and dtype (image_dtype, float16 by default).
The function performs a single-threaded write; no file locking is used.
All input arrays are cast to the storage dtype before writing.
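Since every input is cast to the storage dtype, the precision of float16 is worth keeping in mind. The NumPy check below is illustrative only: binary mask values 0 and 1 survive the cast exactly, whereas normalized intensities in [0, 1] are rounded, with an error of at most half a float16 spacing just below 1.0 (2**-12).

```python
import numpy as np

masks = np.array([0, 1, 1, 0], dtype=np.uint8)
intensities = np.linspace(0.0, 1.0, 1001)  # normalized values in [0, 1]

masks_f16 = masks.astype(np.float16)
intens_f16 = intensities.astype(np.float16)

# Mask values are exactly representable in float16.
print(np.array_equal(masks_f16.astype(np.uint8), masks))  # True

# Intensities are rounded to nearest; in [0, 1] the error never exceeds 2**-12.
max_err = np.max(np.abs(intens_f16.astype(np.float64) - intensities))
print(max_err <= 2**-12)  # True
```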
Warning
- UserWarning: If mask_imgs or channel_imgs contain values outside [0, 1].
Mask images or channel images contain values outside the expected [0, 1] range. This does not follow scPortrait's convention, and unscaled data can produce unexpected results in downstream functions or require additional preprocessing before the images are passed to deep learning models.
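The range check behind this warning can be approximated as follows. This is a sketch assuming a simple element-wise min/max test; `warn_if_out_of_range` is hypothetical, and the exact condition and message scPortrait uses may differ. It also shows one simple way (global max-normalization, an assumption rather than scPortrait's method) to bring raw intensities into [0, 1] before writing.

```python
import warnings

import numpy as np


def warn_if_out_of_range(arr: np.ndarray, label: str) -> bool:
    """Hypothetical check: emit a UserWarning if any value lies outside [0, 1]."""
    out_of_range = bool(np.any((arr < 0) | (arr > 1)))
    if out_of_range:
        warnings.warn(
            f"{label} contain values outside the expected [0, 1] range.",
            UserWarning,
            stacklevel=2,
        )
    return out_of_range


raw = np.array([[0.0, 250.0], [12.5, 4095.0]])  # e.g. unscaled 12-bit intensities
scaled = raw / raw.max()  # simple global max-normalization into [0, 1]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_if_out_of_range(raw, "channel_imgs")
    warn_if_out_of_range(scaled, "channel_imgs")

print(len(caught))  # 1
```

Per-channel normalization is often preferable when channels have very different dynamic ranges; the global variant is used here only to keep the example short.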
- Raises:
Exception – If:
- mask_imgs or channel_imgs do not have 4 dimensions (N, C, H, W),
- mask_imgs and channel_imgs have different numbers of cells,
- mask_imgs and channel_imgs have different image sizes,
- the number of provided channel names does not match the array shapes,
- cell_metadata does not have N rows,
- an unsupported compression type is requested.
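To check inputs before calling numpy_to_h5sc, the documented error conditions can be mirrored in a small pre-flight function. This is a hypothetical re-implementation for illustration, not scPortrait's code: `validate_inputs` and its `n_metadata_rows` parameter are invented here, and it raises ValueError (a subclass of Exception) where the real function's exception type is documented only as Exception.

```python
from collections.abc import Sequence
from typing import Optional

import numpy as np


def validate_inputs(
    mask_names: Sequence[str],
    channel_names: Sequence[str],
    mask_imgs: np.ndarray,
    channel_imgs: np.ndarray,
    n_metadata_rows: Optional[int] = None,
    compression_type: str = "gzip",
) -> None:
    """Hypothetical pre-flight check mirroring the documented Raises conditions."""
    if mask_imgs.ndim != 4 or channel_imgs.ndim != 4:
        raise ValueError("mask_imgs and channel_imgs must be 4D (N, C, H, W)")
    if mask_imgs.shape[0] != channel_imgs.shape[0]:
        raise ValueError("mask_imgs and channel_imgs must contain the same number of cells")
    if mask_imgs.shape[2:] != channel_imgs.shape[2:]:
        raise ValueError("mask_imgs and channel_imgs must have the same image size (H, W)")
    if len(mask_names) != mask_imgs.shape[1] or len(channel_names) != channel_imgs.shape[1]:
        raise ValueError("number of names does not match the channel axis of the arrays")
    if n_metadata_rows is not None and n_metadata_rows != mask_imgs.shape[0]:
        raise ValueError("cell_metadata must have exactly N rows")
    if compression_type not in ("gzip", "lzf"):
        raise ValueError(f"unsupported compression type {compression_type!r}")


masks = np.zeros((3, 1, 8, 8))
channels = np.zeros((3, 2, 8, 8))
validate_inputs(["cell"], ["dapi", "marker"], masks, channels, n_metadata_rows=3)  # passes

try:
    validate_inputs(["cell"], ["dapi", "marker"], masks, channels[:2])
except ValueError as err:
    print(err)  # mask_imgs and channel_imgs must contain the same number of cells
```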