io#

memory-mapped file handling#

scportrait.io.daskmmap.dask_array_from_path(file_path: str, container_name: str = 'array') → Array#

Create a Dask array from an HDF5 file, supporting both contiguous and chunked datasets.

Parameters:
  • file_path – Path pointing to the HDF5 file

  • container_name – Name of the dataset in the HDF5 file

Returns:

Dask array representing the dataset

scportrait.io.daskmmap.calculate_chunk_sizes(shape: tuple[int, ...], dtype: dtype | str, target_size_gb: int = 5) → tuple[int, ...]#

Calculate chunk sizes that result in chunks of approximately the target size in GB.

Parameters:
  • shape – Shape of the array

  • dtype – Data type of the array

  • target_size_gb – Target size of each chunk in gigabytes

Returns:

Calculated chunk sizes for the Dask array
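The arithmetic behind such a helper is simple: divide the target byte budget by the byte size of one slice along the leading axis. The sketch below is a hypothetical re-implementation for illustration only (the helper name and the assumption of leading-axis-only chunking are mine, not scportrait's):

```python
import numpy as np

def calculate_chunk_sizes_sketch(shape, dtype, target_size_gb=5):
    """Illustrative sketch: pick how many leading-axis rows fit into the
    target chunk size, keeping the remaining axes whole. Hypothetical,
    not the actual scportrait implementation."""
    itemsize = np.dtype(dtype).itemsize
    # bytes occupied by one "row" (a full slice across the trailing axes)
    row_bytes = itemsize * int(np.prod(shape[1:])) if len(shape) > 1 else itemsize
    target_bytes = target_size_gb * 1024**3
    rows = max(1, min(shape[0], target_bytes // row_bytes))
    return (int(rows),) + tuple(shape[1:])
```

For example, a stack of 10000 `uint16` images of 1024 x 1024 pixels yields ~2 MiB per image, so roughly 2560 images fit into a 5 GiB chunk.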

scportrait.io.daskmmap.calculate_chunk_sizes_chunks(shape: tuple[int, ...], dtype: dtype | str, HDF5_chunk_size: tuple[int, ...], target_size_gb: int = 5) → tuple[int, ...]#

Calculate chunk sizes that result in chunks of approximately the target size in GB, aligned with the chunking of the existing HDF5 dataset.

Parameters:
  • shape – Shape of the array

  • dtype – Data type of the array

  • HDF5_chunk_size – Chunk sizes of the existing HDF5 data container

  • target_size_gb – Target size of each chunk in gigabytes

Returns:

Calculated chunk sizes for the Dask array
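When the dataset already has on-disk chunks, Dask chunks should be whole multiples of them so that no HDF5 chunk has to be read by two tasks. A hedged sketch of that alignment step, again assuming leading-axis chunking (the helper is hypothetical, not scportrait's code):

```python
import numpy as np

def aligned_chunk_sizes(shape, dtype, hdf5_chunks, target_size_gb=5):
    """Illustrative sketch: grow the leading-axis chunk in whole multiples
    of the on-disk HDF5 chunk until the target size is reached."""
    itemsize = np.dtype(dtype).itemsize
    row_bytes = itemsize * int(np.prod(shape[1:]))
    target_rows = (target_size_gb * 1024**3) // row_bytes
    # round down to a whole multiple of the on-disk chunk, never below one chunk
    mult = max(1, target_rows // hdf5_chunks[0])
    rows = min(shape[0], mult * hdf5_chunks[0])
    return (int(rows),) + tuple(shape[1:])
```

Aligning this way avoids partial-chunk reads, which would force HDF5 to decompress the same on-disk chunk for several Dask tasks.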

scportrait.io.daskmmap.mmap_dask_array_contigious(filename: str, shape: tuple[int, ...], dtype: dtype | str, offset: int = 0, chunks: tuple[int, ...] = (5,)) → Array#

Create a Dask array from raw binary data in filename by memory mapping.

Parameters:
  • filename – Path to the raw binary data file

  • shape – Shape of the array

  • dtype – Data type of the array

  • offset – Offset in bytes from the beginning of the file

  • chunks – Chunk sizes for the Dask array

Returns:

Dask array that is memory-mapped to disk

scportrait.io.daskmmap.mmap_dask_array_chunked(filename: str, shape: tuple[int, ...], dtype: dtype | str, container_name: str, chunks: tuple[int, ...] = (5,)) → Array#

Create a Dask array from a chunked dataset in an HDF5 file, loading chunks lazily.

Parameters:
  • filename – Path to the HDF5 file

  • shape – Shape of the array

  • dtype – Data type of the array

  • container_name – Name of the dataset in the HDF5 file

  • chunks – Chunk sizes for the Dask array

Returns:

Dask array that is memory-mapped to disk
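Both `mmap_dask_array_*` builders ultimately tile the array shape into a grid of slice tuples, one per Dask chunk, and hand each tuple to a loader. The enumeration step can be sketched as follows (`chunk_slices` is a hypothetical helper for illustration, not part of scportrait):

```python
import itertools
import numpy as np

def chunk_slices(shape, chunks):
    """Illustrative sketch: yield the slice tuple a chunked loader would
    request for every chunk in the grid, trimming the final partial chunks."""
    ranges = [range(0, s, c) for s, c in zip(shape, chunks)]
    for starts in itertools.product(*ranges):
        yield tuple(slice(st, min(st + c, s))
                    for st, c, s in zip(starts, chunks, shape))

# every element of a (5, 4) array is covered exactly once by (2, 3) chunks
covered = np.zeros((5, 4), dtype=int)
for sl in chunk_slices((5, 4), (2, 3)):
    covered[sl] += 1
```

Each slice tuple then becomes one delayed task, so only the chunks a computation actually touches are ever read from disk.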

scportrait.io.daskmmap.load_hdf5_contigious(filename: str, shape: tuple[int, ...], dtype: dtype | str, offset: int, slices: tuple[slice, ...]) → ndarray#

Memory map the given file with overall shape and dtype and return a slice.

Parameters:
  • filename – Path to the raw binary data file

  • shape – Shape of the array

  • dtype – Data type of the array

  • offset – Offset in bytes from the beginning of the file

  • slices – Tuple of slices specifying the chunk to load

Returns:

The sliced chunk from the memory-mapped array
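A contiguous HDF5 dataset is just a flat block of bytes at a known offset, so it can be mapped with `numpy.memmap` and sliced without reading the whole file. A hedged sketch of that pattern (the helper name, header size, and demo data are mine):

```python
import os
import tempfile
import numpy as np

def load_contiguous_slice(filename, shape, dtype, offset, slices):
    """Illustrative sketch: map the whole dataset read-only, then
    materialize only the requested chunk as a real in-memory array."""
    mm = np.memmap(filename, mode="r", shape=shape, dtype=dtype, offset=offset)
    return np.array(mm[slices])  # copy so the result outlives the map

# build a demo file: a pretend 8-byte header followed by a (4, 6) int32 array
data = np.arange(24, dtype=np.int32).reshape(4, 6)
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x00" * 8)
    f.write(data.tobytes())
    path = f.name
chunk = load_contiguous_slice(path, (4, 6), np.int32,
                              offset=8, slices=(slice(1, 3), slice(0, 2)))
os.unlink(path)
```

Because the map is opened read-only and only the sliced pages are faulted in, this stays cheap even for datasets far larger than RAM.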

scportrait.io.daskmmap.load_hdf5_chunk(file_path: str, container_name: str, slices: tuple[slice, ...]) → ndarray#

Load a chunk of data from a chunked HDF5 dataset.

Parameters:
  • file_path – Path to the HDF5 file

  • container_name – Name of the dataset in the HDF5 file

  • slices – Tuple of slices specifying the chunk to load

Returns:

The sliced chunk from the HDF5 dataset
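For chunked datasets, memory mapping is not possible, so each task opens the file and lets h5py perform the partial read. A hedged sketch of that loader (the helper name and demo dataset are illustrative, not scportrait's code), assuming h5py is available:

```python
import os
import tempfile
import numpy as np
import h5py

def load_hdf5_chunk_sketch(file_path, container_name, slices):
    """Illustrative sketch: open the file per task and read only the
    requested slice; h5py decompresses just the touched on-disk chunks."""
    with h5py.File(file_path, "r") as f:
        return f[container_name][slices]

# build a small chunked dataset to read from
path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("array", data=np.arange(24).reshape(4, 6), chunks=(2, 3))
chunk = load_hdf5_chunk_sketch(path, "array", (slice(0, 2), slice(3, 6)))
```

Opening the file inside the function keeps each Dask task self-contained, which avoids sharing a single HDF5 handle across worker processes.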

file readers#

scportrait.io.read.read_ome_zarr(path: str, magnification: str = '0', array: ndarray[Any, dtype[_ScalarType_co]] | None = None) → ndarray[Any, dtype[_ScalarType_co]] | None#

Read an OME-Zarr file from the given path.

Parameters:
  • path – Path to the OME-Zarr file

  • magnification – Magnification level to be read

  • array – Optional numpy array to store the image data. If None, returns a new array

Returns:

The image data as a numpy array if array is None, otherwise None after updating the provided array

Example

>>> image = read_ome_zarr("path/to/file.zarr")
>>> # Or with existing array:
>>> existing_array = np.zeros((100, 100))
>>> read_ome_zarr("path/to/file.zarr", array=existing_array)