io

memory-mapped file handling

scportrait.io.daskmmap.dask_array_from_path(file_path, container_name='array')

Create a Dask array from an HDF5 file, supporting both contiguous and chunked datasets.

Parameters:
  • file_path (str) – Path pointing to the HDF5 file.

  • container_name (str) – Name of the dataset in the HDF5 file. Defaults to 'array'.

Returns:

Dask array representing the dataset.

Return type:

dask.array.Array

scportrait.io.daskmmap.calculate_chunk_sizes(shape, dtype, target_size_gb=5)

Calculate chunk sizes that result in chunks of approximately the target size in GB.

Parameters:
  • shape (tuple) – Shape of the array.

  • dtype (np.dtype) – Data type of the array.

  • target_size_gb (int) – Target size of each chunk in gigabytes.

Returns:

Calculated chunk sizes for the Dask array.

Return type:

tuple
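
The idea of sizing chunks by a byte budget can be illustrated with a minimal sketch. This is not scPortrait's implementation: the helper name `approx_chunk_shape` is hypothetical, it assumes 1 GB = 1024**3 bytes, and it assumes chunking happens only along the first axis while all trailing axes are kept whole.

```python
import math

import numpy as np


def approx_chunk_shape(shape, dtype, target_size_gb=5):
    """Hypothetical helper: pick a chunk shape close to a byte budget.

    Assumption: only the first axis is split; trailing axes stay whole.
    """
    itemsize = np.dtype(dtype).itemsize
    target_bytes = target_size_gb * 1024**3  # assumption: binary gigabytes
    # Bytes needed for one index along the first axis.
    row_bytes = itemsize * math.prod(shape[1:])
    # At least one row, at most the full extent of the first axis.
    rows = max(1, min(shape[0], int(target_bytes // row_bytes)))
    return (rows,) + tuple(shape[1:])


chunk_shape = approx_chunk_shape((1000, 1024, 1024), np.uint16, target_size_gb=1)
print(chunk_shape)
```

An array that already fits inside the budget is returned as a single chunk with its original shape.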

scportrait.io.daskmmap.calculate_chunk_sizes_chunks(shape, dtype, HDF5_chunk_size, target_size_gb=5)

Calculate chunk sizes that are approximately the target size in GB and are equal multiples of the existing HDF5 chunk sizes.

Parameters:
  • shape (tuple) – Shape of the array.

  • dtype (np.dtype) – Data type of the array.

  • HDF5_chunk_size (tuple) – Chunk sizes of the existing HDF5 data container.

  • target_size_gb (int) – Target size of each chunk in gigabytes.

Returns:

Calculated chunk sizes for the Dask array.

Return type:

tuple
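
One way to read "equal multiples" is that every axis of the HDF5 chunk is scaled by the same integer factor. The sketch below follows that reading; it is an assumption, not a description of scPortrait's code. The helper name `aligned_chunk_shape` is hypothetical, and 1 GB is taken as 1024**3 bytes.

```python
import math

import numpy as np


def aligned_chunk_shape(shape, dtype, hdf5_chunks, target_size_gb=5):
    """Hypothetical helper: scale the HDF5 chunk by one integer factor per axis.

    Assumption: the same factor k is applied to every axis, and k is the
    largest integer for which the scaled chunk still fits the byte budget.
    """
    itemsize = np.dtype(dtype).itemsize
    target_bytes = target_size_gb * 1024**3  # assumption: binary gigabytes
    k = 1
    # Grow k while the next larger chunk would still fit the budget.
    while math.prod(c * (k + 1) for c in hdf5_chunks) * itemsize <= target_bytes:
        k += 1
    # Clip to the array extent; clipped axes are no longer exact multiples.
    return tuple(min(s, c * k) for s, c in zip(shape, hdf5_chunks))


chunks = aligned_chunk_shape((100_000, 100_000), np.uint8, (100, 100), target_size_gb=1)
print(chunks)
```

Keeping Dask chunk boundaries aligned with HDF5 chunk boundaries means no stored chunk has to be read by more than one Dask task.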

scportrait.io.daskmmap.mmap_dask_array_contigious(filename, shape, dtype, offset=0, chunks=(5,))

Create a Dask array from raw binary data in filename by memory mapping.

Parameters:
  • filename (str) – Path to the raw binary data file.

  • shape (tuple) – Shape of the array.

  • dtype (np.dtype) – Data type of the array.

  • offset (int, optional) – Offset in bytes from the beginning of the file.

  • chunks (tuple, optional) – Chunk sizes for the Dask array.

Returns:

Dask array that is memory-mapped to disk.

Return type:

dask.array.Array
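
The general pattern of building a lazy Dask array on top of a memory-mapped file can be sketched as follows. This follows the memory-mapping recipe from the Dask documentation and is not taken from scPortrait's source; `load_block`, `mmap_dask_array_sketch`, `demo`, and the `blocksize` parameter are hypothetical names, and the sketch only splits along the first axis.

```python
import os
import tempfile

import dask
import dask.array as da
import numpy as np


def load_block(filename, shape, dtype, offset, sl):
    """Memory-map the whole file and return a copy of one slice of it."""
    data = np.memmap(filename, mode="r", shape=shape, dtype=dtype, offset=offset)
    return np.array(data[sl])


def mmap_dask_array_sketch(filename, shape, dtype, offset=0, blocksize=5):
    """Hypothetical helper: one delayed memmap read per block of rows."""
    load = dask.delayed(load_block)
    blocks = []
    for start in range(0, shape[0], blocksize):
        stop = min(start + blocksize, shape[0])
        # Nothing is read here; the slice is loaded when the block is computed.
        block = da.from_delayed(
            load(filename, shape, dtype, offset, slice(start, stop)),
            shape=(stop - start,) + tuple(shape[1:]),
            dtype=dtype,
        )
        blocks.append(block)
    return da.concatenate(blocks, axis=0)


def demo():
    """Write a small raw file, wrap it lazily, and read it back."""
    expected = np.arange(24, dtype=np.int32).reshape(12, 2)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "raw.bin")
        expected.tofile(path)
        arr = mmap_dask_array_sketch(path, expected.shape, expected.dtype)
        return arr.chunks, arr.compute()
```

Each block is only read from disk when it is computed, so the full array never has to fit in memory at once.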

scportrait.io.daskmmap.mmap_dask_array_chunked(filename, shape, dtype, container_name, chunks=(5,))

Create a Dask array from raw binary data in filename by memory mapping.

Parameters:
  • filename (str) – Path to the raw binary data file.

  • shape (tuple) – Shape of the array.

  • dtype (np.dtype) – Data type of the array.

  • container_name (str) – Name of the dataset in the HDF5 file.

  • chunks (tuple, optional) – Chunk sizes for the Dask array.

Returns:

Dask array that is memory-mapped to disk.

Return type:

dask.array.Array

scportrait.io.daskmmap.load_hdf5_contigious(filename, shape, dtype, offset, slices)

Memory map the given file with overall shape and dtype and return a slice specified by slices.

Parameters:
  • filename (str) – Path to the raw binary data file.

  • shape (tuple) – Shape of the array.

  • dtype (np.dtype) – Data type of the array.

  • offset (int) – Offset in bytes from the beginning of the file.

  • slices (tuple) – Tuple of slices specifying the chunk to load.

Returns:

The sliced chunk from the memory-mapped array.

Return type:

np.ndarray
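
The role of the byte offset can be shown with plain NumPy. The sketch below writes a file with a 16-byte header followed by raw array data, then memory-maps it past the header and returns one slice. It uses only `np.memmap`; the helper names `load_slice_from_raw` and `offset_demo` are hypothetical, and the 16-byte header is an arbitrary stand-in for whatever precedes the data in a real file.

```python
import os
import tempfile

import numpy as np


def load_slice_from_raw(filename, shape, dtype, offset, slices):
    """Hypothetical helper: memory-map raw data and copy out one slice."""
    data = np.memmap(filename, mode="r", shape=shape, dtype=dtype, offset=offset)
    # Copy so the result no longer depends on the open memory map.
    return np.array(data[slices])


def offset_demo():
    """Write header + data, then read rows 1 and 2 by skipping the header."""
    data = np.arange(20, dtype=np.int32).reshape(4, 5)
    header = b"\x00" * 16  # stand-in for file metadata before the array
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "with_header.bin")
        with open(path, "wb") as fh:
            fh.write(header)
            fh.write(data.tobytes())
        return load_slice_from_raw(
            path, data.shape, data.dtype, len(header), (slice(1, 3), slice(None))
        )
```

Only the pages backing the requested slice are touched, which is what makes this access pattern suitable for arrays larger than memory.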

scportrait.io.daskmmap.load_hdf5_chunk(file_path, container_name, slices)

Load a chunk of data from a chunked HDF5 dataset.

Parameters:
  • file_path (str) – Path to the HDF5 file.

  • container_name (str) – Name of the dataset in the HDF5 file.

  • slices (tuple) – Tuple of slices specifying the chunk to load.

Returns:

The sliced chunk from the HDF5 dataset.

Return type:

np.ndarray

file readers

scportrait.io.read.read_ome_zarr(path, magnification='0', array=None)

Reads an OME-Zarr file from a given path.

Parameters:
  • path (str) – Path to the OME-Zarr file.

  • magnification (str) – Magnification level to be read.

  • array (None or np.array) – If None, the image data is read into memory and returned. Otherwise, the image data is read into the supplied NumPy array.

Returns:

The image data if array is None. Otherwise the supplied array is updated.

Return type:

np.array or None