alphapepttools.pp.impute_median

alphapepttools.pp.impute_median(adata, group_column=None, *, layer=None, copy=True)

Impute missing values using median imputation

Replace missing (NaN) values in the data matrix with the median of the non-missing values of each feature. The function performs either global imputation, using all samples, or group-wise imputation, using subsets of samples defined by a categorical variable.
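
The global case can be illustrated with a minimal sketch in plain Python. It is not the library's implementation: the helper name `impute_median_global` and the list-of-rows layout (samples as rows, features as columns) are assumptions made for illustration only.

```python
import math
from statistics import median


def impute_median_global(rows):
    """Return a copy of `rows` with NaNs replaced by the per-feature median."""
    n_features = len(rows[0])
    medians = []
    for j in range(n_features):
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        # A feature without any observed value has no median; it stays NaN.
        medians.append(median(observed) if observed else math.nan)
    return [
        [medians[j] if math.isnan(value) else value for j, value in enumerate(row)]
        for row in rows
    ]


nan = math.nan
data = [
    [1.0, nan, 10.0],
    [3.0, 4.0, nan],
    [5.0, 8.0, 30.0],
]
imputed = impute_median_global(data)
```

Each missing entry receives the median of the observed values in its own column, so the second feature is filled from the values 4.0 and 8.0 only.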

Parameters:
  • adata (AnnData) – AnnData object

  • layer (Optional[str] (default: None)) – Layer to use for imputation. If None (default), the .X matrix is imputed.

  • group_column (Optional[str] (default: None)) – Column name in adata.obs defining the groups for group-wise imputation. If None (default), the median is computed across all samples. If specified, the median is computed separately for each group, and missing values are imputed with the group-specific median. Observations with NaN in group_column are ignored.

  • copy (bool (default: True)) – Whether to return a modified copy of the AnnData object (True, default). If False, modifies the object in place.
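
The group-wise behaviour described for group_column can be sketched in plain Python as follows. This is a hedged illustration, not the library's code: the helper `impute_median_by_group` is hypothetical, a missing group label is represented by `None`, and leaving such rows untouched is one assumed reading of "the respective observations are ignored". Leaving a value as NaN when a feature is fully missing within a group is likewise an assumption.

```python
import math
from statistics import median


def impute_median_by_group(rows, groups):
    """Impute NaNs with the median of the same feature within the same group.

    `groups` holds one label per row. Rows labelled None are skipped
    (assumed behaviour for observations with a missing group label).
    """
    imputed = [list(row) for row in rows]
    for label in {g for g in groups if g is not None}:
        members = [i for i, g in enumerate(groups) if g == label]
        for j in range(len(rows[0])):
            observed = [rows[i][j] for i in members if not math.isnan(rows[i][j])]
            if not observed:
                continue  # feature fully missing in this group: left as NaN
            group_median = median(observed)
            for i in members:
                if math.isnan(rows[i][j]):
                    imputed[i][j] = group_median
    return imputed


nan = math.nan
data = [
    [1.0, 2.0],
    [nan, 4.0],
    [3.0, nan],
    [10.0, nan],
    [nan, 20.0],
    [nan, nan],
]
cell_type = ["A", "A", "A", "B", "B", None]  # last sample has no group label
result = impute_median_by_group(data, cell_type)
```

In this toy input, the missing values of group "A" are filled from group "A" samples only, and the unlabelled last sample keeps its NaNs.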

Return type:

AnnData

Returns:

ad.AnnData: Copy of the AnnData object with the modified layer.

Raises:
  • Warning – If group_column contains NaNs

  • Warning – If a feature contains only NaNs

Notes

Features that are missing in every sample have no median and are therefore not imputed. Filtering such features beforehand with at.pp.filter_data_completeness() is critical.
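
To see why fully missing features matter, the following plain-Python sketch locates the columns that would still contain NaNs after median imputation. The helper `fully_missing_features` is hypothetical and only illustrates the check; it does not reproduce at.pp.filter_data_completeness().

```python
import math


def fully_missing_features(rows):
    """Return the indices of features for which every sample is NaN."""
    return [
        j
        for j in range(len(rows[0]))
        if all(math.isnan(row[j]) for row in rows)
    ]


nan = math.nan
data = [
    [1.0, nan, nan],
    [nan, nan, 2.0],
    [3.0, nan, 4.0],
]
missing = fully_missing_features(data)

# Keep only the features that have at least one observed value.
filtered = [
    [value for j, value in enumerate(row) if j not in missing]
    for row in data
]
```

Only the second feature is reported, because the other two have at least one observed value from which a median can be computed.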

Example

Impute the values in the .X matrix:

import numpy as np
import alphapepttools as at

adata = at.pp.impute_median(adata)
assert np.sum(np.isnan(adata.X)) == 0  # holds when no feature is entirely NaN

Impute data in a specific layer:

adata = at.pp.impute_median(adata, layer="layer2")
assert np.sum(np.isnan(adata.layers["layer2"])) == 0

Impute groupwise based on a categorical column:

adata = at.pp.impute_median(adata, group_column="cell_type")
# Imputes group-wise medians