io#
memory mapped file handling#
- scportrait.io.daskmmap.dask_array_from_path(file_path, container_name='array')#
Create a Dask array from an HDF5 file, supporting both contiguous and chunked datasets.
- Parameters:
file_path (str) – Path to the HDF5 file.
container_name (str) – Name of the dataset in the HDF5 file. Defaults to 'array'.
- Returns:
Dask array representing the dataset.
- Return type:
dask.array.Array
- scportrait.io.daskmmap.calculate_chunk_sizes(shape, dtype, target_size_gb=5)#
Calculate chunk sizes that result in chunks of approximately the target size in GB.
- Parameters:
shape (tuple) – Shape of the array.
dtype (np.dtype) – Data type of the array.
target_size_gb (int) – Target size of each chunk in gigabytes.
- Returns:
Calculated chunk sizes for the Dask array.
- Return type:
tuple
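The calculation described above can be sketched in plain Python. This is only an illustration of the idea, not scPortrait's implementation: the helper name `approx_chunk_shape` is hypothetical, the item size is passed in bytes instead of a `np.dtype`, and the sketch assumes that only the first axis is split while all other axes are kept whole.

```python
import math


def approx_chunk_shape(shape, itemsize, target_size_gb=5):
    """Hypothetical helper: pick how many rows of the first axis fit the target size."""
    target_bytes = int(target_size_gb * 1024**3)
    # Bytes occupied by one index along the first axis (all other axes in full).
    row_bytes = itemsize * math.prod(shape[1:])
    # At least one row per chunk, and never more rows than the array has.
    rows = max(1, min(shape[0], target_bytes // row_bytes))
    return (rows,) + tuple(shape[1:])


# 100,000 images with 5 channels of 128 x 128 pixels, 2 bytes per value (e.g. float16).
chunk_shape = approx_chunk_shape((100_000, 5, 128, 128), itemsize=2, target_size_gb=1)
print(chunk_shape)  # (6553, 5, 128, 128)
```

Each image here takes 163,840 bytes, so 6,553 images are the most that fit into 1 GiB.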
- scportrait.io.daskmmap.calculate_chunk_sizes_chunks(shape, dtype, HDF5_chunk_size, target_size_gb=5)#
Calculate chunk sizes of approximately the target size in GB that are multiples of the existing HDF5 chunk sizes.
- Parameters:
shape (tuple) – Shape of the array.
dtype (np.dtype) – Data type of the array.
HDF5_chunk_size (tuple) – Chunk sizes of the existing HDF5 data container.
target_size_gb (int) – Target size of each chunk in gigabytes.
- Returns:
Calculated chunk sizes for the Dask array.
- Return type:
tuple
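Aligning Dask chunks to the chunks already stored in the HDF5 file avoids reading the same stored chunk for two different Dask blocks. The sketch below illustrates one way to do this under stated assumptions; it is not scPortrait's code. The helper name `aligned_chunk_shape` is hypothetical, the item size is given in bytes, and only the first axis is grown, in whole multiples of the HDF5 chunk length along that axis.

```python
import math


def aligned_chunk_shape(shape, itemsize, hdf5_chunks, target_size_gb=5):
    """Hypothetical helper: grow the first axis in whole HDF5 chunks up to the target."""
    target_bytes = int(target_size_gb * 1024**3)
    # Bytes in one block spanning a single HDF5 chunk along the first axis
    # and the full extent of every other axis.
    block_bytes = itemsize * hdf5_chunks[0] * math.prod(shape[1:])
    # Use at least one such block, even if it is larger than the target.
    n_blocks = max(1, target_bytes // block_bytes)
    rows = min(shape[0], n_blocks * hdf5_chunks[0])
    return (rows,) + tuple(shape[1:])


aligned = aligned_chunk_shape(
    (100_000, 5, 128, 128), itemsize=2, hdf5_chunks=(50, 1, 128, 128), target_size_gb=1
)
print(aligned)  # (6550, 5, 128, 128)
```

The result stays just below 1 GiB and its first dimension, 6,550, is a multiple of the HDF5 chunk length of 50.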
- scportrait.io.daskmmap.mmap_dask_array_contigious(filename, shape, dtype, offset=0, chunks=(5,))#
Create a Dask array from raw binary data in filename by memory mapping.
- Parameters:
filename (str) – Path to the raw binary data file.
shape (tuple) – Shape of the array.
dtype (np.dtype) – Data type of the array.
offset (int, optional) – Offset in bytes from the beginning of the file.
chunks (tuple, optional) – Chunk sizes for the Dask array.
- Returns:
Dask array that is memory-mapped to disk.
- Return type:
dask.array.Array
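A Dask array of this kind is assembled from blocks that are loaded lazily, each one identified by a tuple of slices such as the `slices` argument of `load_hdf5_contigious`. The following sketch shows how such tuples could be generated for blocks along the first axis. The helper `first_axis_slices` is hypothetical, and chunking along the first axis only is an assumption suggested by the one-element default `chunks=(5,)`.

```python
def first_axis_slices(shape, chunk_rows):
    """Hypothetical helper: one tuple of slices per block along the first axis."""
    blocks = []
    for start in range(0, shape[0], chunk_rows):
        stop = min(start + chunk_rows, shape[0])  # the last block may be shorter
        # Slice the first axis and take every other axis in full.
        blocks.append((slice(start, stop),) + tuple(slice(0, n) for n in shape[1:]))
    return blocks


blocks = first_axis_slices((12, 3, 4), chunk_rows=5)
for block in blocks:
    print(block[0])
# slice(0, 5, None)
# slice(5, 10, None)
# slice(10, 12, None)
```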
- scportrait.io.daskmmap.mmap_dask_array_chunked(filename, shape, dtype, container_name, chunks=(5,))#
Create a Dask array from a chunked HDF5 dataset in filename.
- Parameters:
filename (str) – Path to the HDF5 file.
shape (tuple) – Shape of the array.
dtype (np.dtype) – Data type of the array.
container_name (str) – Name of the dataset in the HDF5 file.
chunks (tuple, optional) – Chunk sizes for the Dask array.
- Returns:
Dask array representing the dataset on disk.
- Return type:
dask.array.Array
- scportrait.io.daskmmap.load_hdf5_contigious(filename, shape, dtype, offset, slices)#
Memory map the given file with overall shape and dtype and return a slice specified by slices.
- Parameters:
filename (str) – Path to the raw binary data file.
shape (tuple) – Shape of the array.
dtype (np.dtype) – Data type of the array.
offset (int) – Offset in bytes from the beginning of the file.
slices (tuple) – Tuple of slices specifying the chunk to load.
- Returns:
The sliced chunk from the memory-mapped array.
- Return type:
np.ndarray
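The behaviour described above can be reproduced with `np.memmap`, which maps a region of a file starting at a byte offset and reads only the pages that are actually accessed. The function `load_slice` below is a hypothetical stand-in written for illustration, not the library's implementation, and the 16-byte header is an arbitrary placeholder for whatever precedes the data in a real file.

```python
import os
import tempfile

import numpy as np


def load_slice(filename, shape, dtype, offset, slices):
    """Hypothetical stand-in: memory-map the file and copy out one block."""
    data = np.memmap(filename, mode="r", shape=shape, dtype=dtype, offset=offset)
    # np.array copies the selected block into memory as a plain ndarray.
    return np.array(data[slices])


full = np.arange(24, dtype=np.int32).reshape(6, 4)
header = b"\x00" * 16  # placeholder bytes stored before the array data

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "raw.bin")
    with open(path, "wb") as fh:
        fh.write(header)
        fh.write(full.tobytes())
    block = load_slice(
        path, full.shape, full.dtype, offset=len(header), slices=(slice(2, 4), slice(0, 4))
    )

print(block)
# [[ 8  9 10 11]
#  [12 13 14 15]]
```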
- scportrait.io.daskmmap.load_hdf5_chunk(file_path, container_name, slices)#
Load a chunk of data from a chunked HDF5 dataset.
- Parameters:
file_path (str) – Path to the HDF5 file.
container_name (str) – Name of the dataset in the HDF5 file.
slices (tuple) – Tuple of slices specifying the chunk to load.
- Returns:
The sliced chunk from the HDF5 dataset.
- Return type:
np.ndarray
file readers#
- scportrait.io.read.read_ome_zarr(path, magnification='0', array=None)#
Reads an OME-Zarr file from a given path.
- Parameters:
path (str) – Path to the OME-Zarr file.
magnification (str) – Magnification level to be read.
array (None or np.array) – If None, the image data is read into memory and returned. If not None, the image data is read into the supplied numpy array.
- Returns:
If array is None, the image data is returned. Otherwise, the supplied array is updated in place and None is returned.
- Return type:
np.array or None
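The two modes of the `array` argument follow a common pattern: either return newly allocated data, or write into a buffer the caller has already allocated (for example a memory-mapped array). The sketch below illustrates only that calling convention. `read_into` is a hypothetical function that takes in-memory data instead of an OME-Zarr path, and it does not reflect how `read_ome_zarr` is implemented.

```python
import numpy as np


def read_into(source, array=None):
    """Hypothetical reader: return a new array, or fill the one supplied."""
    if array is None:
        return np.array(source)
    array[...] = source  # write in place into the caller's buffer
    return None


pixels = [[1, 2], [3, 4]]

# Mode 1: no buffer supplied, the data is returned.
returned = read_into(pixels)

# Mode 2: a preallocated buffer is filled and nothing is returned.
buffer = np.zeros((2, 2), dtype=np.uint16)
result = read_into(pixels, array=buffer)
```

In the second mode the caller keeps control over where the data lives and which dtype it has, which matters when the image is too large to allocate twice.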