alphapepttools.pp.impute_gaussian
- alphapepttools.pp.impute_gaussian(adata, group_column=None, layer=None, std_offset=1.8, std_factor=0.3, random_state=42, *, copy=False)
Impute missing values in each column by random sampling from a gaussian distribution.
The distribution is centered at std_offset * feature standard deviation below the feature mean and has a standard deviation of std_factor * feature standard deviation. Can perform global imputation using all samples or group-wise imputation using subsets of samples defined by a categorical variable.
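The sampling scheme described above can be sketched in plain NumPy. This is an illustrative approximation, not the library's implementation: the helper name `impute_gaussian_sketch` is hypothetical, and the use of `np.nanmean`/`np.nanstd` (population standard deviation, `ddof=0`) on a samples-by-features array is an assumption about how the statistics are computed.

```python
import numpy as np


def impute_gaussian_sketch(x, std_offset=1.8, std_factor=0.3, random_state=42):
    """Hypothetical helper illustrating the scheme; not alphapepttools code.

    x: 2D float array (rows = samples, columns = features), NaN = missing.
    """
    rng = np.random.default_rng(random_state)
    out = np.array(x, dtype=float)  # work on a copy of the input

    # Per-feature statistics from observed (non-NaN) values only.
    # Assumption: ddof=0; the library may use a different estimator.
    mean = np.nanmean(out, axis=0)
    std = np.nanstd(out, axis=0)

    # Sampling distribution: shifted below the mean and narrowed
    loc = mean - std_offset * std
    scale = std_factor * std

    # Draw one candidate per cell, keep draws only where data is missing
    draws = rng.normal(loc=loc, scale=scale, size=out.shape)
    missing = np.isnan(out)
    out[missing] = draws[missing]
    return out


x = np.array([[10.0, 20.0], [12.0, np.nan], [np.nan, 22.0], [11.0, 21.0]])
imputed = impute_gaussian_sketch(x)
```

Observed values are left untouched, and a fixed `random_state` makes the imputed values reproducible. A column that is entirely NaN has no defined mean or standard deviation, so this sketch would leave it as NaN.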
- Parameters:
adata (AnnData) – AnnData object containing the data to be imputed.
group_column (Optional[str] (default: None)) – Column name in adata.obs defining groups for group-wise imputation. If None (default), computes statistics across all samples. If specified, computes statistics separately for each group and imputes missing values using the group-specific gaussian distribution. If group_column contains NaNs, the respective observations are ignored.
layer (Optional[str] (default: None)) – Name of the layer to impute. If None (default), the data matrix X is used.
std_offset (float (default: 1.8)) – Number of standard deviations below the mean to center the gaussian distribution.
std_factor (float (default: 0.3)) – Factor to multiply the feature’s standard deviation with to get the standard deviation of the gaussian distribution.
random_state (int (default: 42)) – Random seed for reproducibility.
copy (bool (default: False)) – Whether to return a modified copy (True) of the anndata object. If False (default), modifies the object in place.
- Return type:
None | anndata.AnnData
- Returns:
AnnData object with imputed values in layer. If copy=False, modifies the anndata object at layer in place and returns None. If copy=True, returns a modified copy.
- Raises:
ValueError – If group_column contains NaNs.
ValueError – If a feature contains only NaNs.
Notes
Features that are fully missing will not be imputed. Appropriate filtering of features with at.pp.filter_data_completeness() is critical.
Example
Impute the values in the .X matrix:
    at.pp.impute_gaussian(adata)  # copy=False (default): modifies adata in place and returns None
    assert np.sum(np.isnan(adata.X)) == 0
Impute data in a specific layer:
    at.pp.impute_gaussian(adata, layer="layer2")
    assert np.sum(np.isnan(adata.layers["layer2"])) == 0
Impute group-wise based on a categorical column:
    at.pp.impute_gaussian(adata, group_column="cell_type")  # samples from group-specific gaussian distributions
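To make the difference between global and group-wise imputation concrete, the following NumPy sketch computes the sampling distribution separately for each group label. It is a hedged illustration only: the helper name `impute_gaussian_groupwise_sketch` is hypothetical, the statistics use `ddof=0` by assumption, and it does not handle missing group labels or features that are fully missing within a group.

```python
import numpy as np


def impute_gaussian_groupwise_sketch(
    values, groups, std_offset=1.8, std_factor=0.3, random_state=42
):
    """Hypothetical helper illustrating group-wise imputation; not library code.

    values: 2D float array (rows = samples, columns = features), NaN = missing.
    groups: one label per row, playing the role of adata.obs[group_column].
    """
    rng = np.random.default_rng(random_state)
    out = np.array(values, dtype=float)  # work on a copy of the input
    groups = np.asarray(groups)

    for label in np.unique(groups):
        rows = np.flatnonzero(groups == label)
        block = out[rows]  # integer indexing returns a copy of these rows

        # Statistics come from this group's observed values only
        mean = np.nanmean(block, axis=0)
        std = np.nanstd(block, axis=0)

        draws = rng.normal(
            loc=mean - std_offset * std,
            scale=std_factor * std,
            size=block.shape,
        )
        missing = np.isnan(block)
        block[missing] = draws[missing]
        out[rows] = block  # write the imputed rows back

    return out


values = np.array(
    [[10.0], [11.0], [np.nan], [12.0], [30.0], [31.0], [np.nan], [32.0]]
)
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
imputed = impute_gaussian_groupwise_sketch(values, groups)
```

In this toy data the two groups have very different intensity levels, so each missing value is drawn near the low end of its own group rather than from a single distribution pooled across all samples.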