alphapepttools.pp.impute_gaussian

alphapepttools.pp.impute_gaussian(adata, group_column=None, layer=None, std_offset=1.8, std_factor=0.3, random_state=42, *, copy=False)

Impute missing values in each feature (column) by random sampling from a Gaussian distribution.

For each feature, the distribution is centered std_offset feature standard deviations below the feature mean and has a standard deviation of std_factor times the feature standard deviation, i.e. imputed values are drawn from N(mean − std_offset · sd, (std_factor · sd)²). Imputation can be global, using all samples, or group-wise, using subsets of samples defined by a categorical variable.
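The per-feature sampling rule can be sketched in NumPy on a plain samples × features array. This is an illustrative approximation, not the alphapepttools implementation: the name impute_gaussian_sketch is hypothetical, and the use of the population standard deviation (ddof=0) is an assumption.

```python
import numpy as np


def impute_gaussian_sketch(X, std_offset=1.8, std_factor=0.3, random_state=42):
    """Fill NaNs column-wise with draws from a down-shifted, narrowed Gaussian.

    Hypothetical illustration; not the alphapepttools implementation.
    Assumes every column has at least one observed (non-NaN) value.
    """
    X = np.array(X, dtype=float)  # float copy; the input array is not modified
    rng = np.random.default_rng(random_state)
    mean = np.nanmean(X, axis=0)  # per-feature mean of observed values
    std = np.nanstd(X, axis=0)    # per-feature std of observed values (ddof=0, assumed)
    center = mean - std_offset * std  # shifted down by std_offset standard deviations
    width = std_factor * std          # narrowed to std_factor standard deviations
    rows, cols = np.nonzero(np.isnan(X))  # coordinates of the missing entries
    X[rows, cols] = rng.normal(loc=center[cols], scale=width[cols])
    return X
```

With the defaults, imputed values cluster well below each feature's observed values, reflecting the usual assumption that missing intensities tend to be low-abundance measurements.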

Parameters:
  • adata (AnnData) – AnnData object containing the data to be imputed.

  • group_column (Optional[str] (default: None)) – Column name in adata.obs defining groups for group-wise imputation. If None (default), computes statistics across all samples. If specified, computes statistics separately for each group and imputes missing values using the group-specific Gaussian distribution. If group_column contains NaNs, the respective observations are ignored.

  • layer (Optional[str] (default: None)) – Name of the layer to impute. If None (default), the data matrix X is used.

  • std_offset (float (default: 1.8)) – Number of feature standard deviations below the feature mean at which to center the Gaussian distribution.

  • std_factor (float (default: 0.3)) – Factor by which the feature’s standard deviation is multiplied to obtain the standard deviation of the Gaussian distribution.

  • random_state (int (default: 42)) – Random seed for reproducibility.

  • copy (bool (default: False)) – Whether to return a modified copy of the AnnData object (True). If False (default), modifies the object in place.
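Group-wise imputation applies the same rule within each group. The minimal NumPy sketch below is not the library's code: groups is a hypothetical per-sample label array standing in for adata.obs[group_column], the function names are hypothetical, and it assumes that "ignored" observations (label None or NaN) are neither used for statistics nor imputed, and that the population standard deviation is used.

```python
import numpy as np


def _fill_block(block, rng, std_offset, std_factor):
    """Impute NaNs in one block of rows using that block's own column statistics."""
    mean = np.nanmean(block, axis=0)
    std = np.nanstd(block, axis=0)  # ddof=0 (assumed)
    rows, cols = np.nonzero(np.isnan(block))
    block[rows, cols] = rng.normal(loc=mean[cols] - std_offset * std[cols],
                                   scale=std_factor * std[cols])
    return block


def impute_gaussian_groupwise_sketch(X, groups, std_offset=1.8, std_factor=0.3,
                                     random_state=42):
    """Hypothetical sketch of group-wise Gaussian imputation (not the library API)."""
    X = np.array(X, dtype=float)  # float copy of the samples x features matrix
    groups = np.asarray(groups, dtype=object)
    rng = np.random.default_rng(random_state)  # one generator shared by all groups
    # Samples labelled None or NaN (NaN != NaN) are skipped entirely.
    labelled = np.array([g is not None and g == g for g in groups], dtype=bool)
    for label in dict.fromkeys(groups[labelled]):  # unique labels, first-seen order
        idx = np.flatnonzero(labelled & (groups == label))
        X[idx] = _fill_block(X[idx], rng, std_offset, std_factor)
    return X
```

Sharing one seeded generator across groups keeps the whole result reproducible for a given random_state, provided the group order is stable.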

Return type:

AnnData | None

Returns:

None | anndata.AnnData – AnnData object with imputed values in the selected layer (or in X if layer=None). If copy=False, modifies the AnnData object in place and returns None. If copy=True, returns a modified copy.

Raises:

Notes

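Features that are fully missing (no observed values) cannot be imputed and remain missing. Filtering features beforehand with at.pp.filter_data_completeness() is therefore critical, so that every retained feature has enough observed values to estimate its mean and standard deviation.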
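A completeness filter keeps only features with a sufficient fraction of observed values; for a fully missing feature the mean and standard deviation are undefined, so there is no distribution to sample from. The helper below is a hypothetical stand-in to illustrate the idea; it does not reproduce the signature or defaults of at.pp.filter_data_completeness().

```python
import numpy as np


def completeness_mask(X, min_fraction):
    """Boolean mask of features (columns) whose observed fraction is >= min_fraction.

    Hypothetical helper; not the at.pp.filter_data_completeness() API.
    """
    X = np.asarray(X, dtype=float)
    observed_fraction = 1.0 - np.isnan(X).mean(axis=0)  # share of non-NaN values per column
    return observed_fraction >= min_fraction


X = np.array([[1.0, np.nan, np.nan],
              [2.0, 3.0, np.nan],
              [3.0, 4.0, np.nan],
              [np.nan, 5.0, np.nan]])
keep = completeness_mask(X, min_fraction=0.5)
X_filtered = X[:, keep]  # the fully missing third feature is dropped before imputation
```

A stricter threshold also avoids estimating a feature's mean and standard deviation from only one or two observed values.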

Example

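Impute the values in the .X matrix in place:

import numpy as np
import alphapepttools as at
at.pp.impute_gaussian(adata)  # copy=False (default): modifies adata.X in place, returns None
assert np.sum(np.isnan(adata.X)) == 0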

Impute data in a specific layer and return a modified copy:

adata = at.pp.impute_gaussian(adata, layer="layer2", copy=True)
assert np.sum(np.isnan(adata.layers["layer2"])) == 0

Impute group-wise based on a categorical column in adata.obs:

at.pp.impute_gaussian(adata, group_column="cell_type")
# Mean and standard deviation are computed separately within each "cell_type" group