alphapepttools.pp.impute_knn



alphapepttools.pp.impute_knn(adata, group_column=None, layer=None, *, n_neighbors=2, weights='distance', copy=False, **kwargs)

Impute missing values using k-nearest neighbors (kNN) imputation

Replace missing (NaN) values of each feature in the data matrix with an estimate derived from the non-missing values of that feature in the k nearest observations. Imputation can be global, using all samples, or group-wise, using subsets of samples defined by a categorical variable in adata.obs.
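The sketch below illustrates the underlying idea. It calls scikit-learn's KNNImputer directly on a small NumPy matrix with made-up values, not on an AnnData object, so it is not the alphapepttools code path. With scikit-learn's default nan_euclidean metric, the distance between two observations uses only the features that both of them have observed.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy observations x features matrix (rows = samples, columns = features);
# NaN marks a missing intensity. Values are illustrative only.
X = np.array([
    [1.0, 2.0, np.nan],
    [1.1, 2.1, 3.0],
    [5.0, 6.0, 7.0],
    [0.9, 1.9, 2.0],
])

# For each missing entry, KNNImputer looks for the k nearest observations that
# have a value for that feature. Here those are rows 1 and 3, which are close
# to row 0 on features 0 and 1. It then averages their values for the feature.
imputer = KNNImputer(n_neighbors=2, weights="uniform")
X_imputed = imputer.fit_transform(X)
print(X_imputed[0, 2])  # 2.5, the mean of 3.0 and 2.0
```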

Parameters:
  • adata (AnnData) – AnnData object

  • layer (Optional[str] (default: None)) – Layer to use for imputation. If None, the .X matrix is imputed.

  • group_column (Optional[str] (default: None)) –

    Column name in adata.obs defining groups for group-wise imputation.
    • None (default): imputes using all samples.

    • str: performs kNN imputation separately within each group.

    If group_column contains NaNs, the respective observations are ignored.

  • n_neighbors (int (default: 2)) – Number of neighbors to consider during imputation

  • weights (Literal['distance', 'uniform'] (default: 'distance')) –

    Weighting strategy for kNN imputation.
    • uniform: All k-nearest neighbors are weighted equally for feature imputation

    • distance: The k-nearest neighbors are weighted based on their inverse distance to the imputed observation

  • copy (bool (default: False)) – Whether to return a modified copy (True) of the AnnData object. If False (default), modifies the object in place.

  • **kwargs – Additional keyword arguments passed to sklearn.impute.KNNImputer
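A toy matrix shows how the two weights options differ. This sketch calls scikit-learn's KNNImputer directly and assumes its n_neighbors and weights behave as the parameters above describe. The metric argument appears only as an example of an extra keyword argument that KNNImputer itself accepts.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Observation 0 is missing feature 1; observations 1 and 2 are the only donors.
# On feature 0, observation 1 is three times closer to observation 0 than
# observation 2 is.
X = np.array([
    [0.0, np.nan],
    [1.0, 10.0],
    [3.0, 30.0],
])

results = {}
for weights in ("uniform", "distance"):
    # "nan_euclidean" is scikit-learn's default metric; it ignores missing coordinates.
    imputer = KNNImputer(n_neighbors=2, weights=weights, metric="nan_euclidean")
    results[weights] = float(imputer.fit_transform(X)[0, 1])
    print(weights, round(results[weights], 6))
# uniform 20.0   -> plain mean of 10 and 30
# distance 15.0  -> inverse-distance weights 3:1, i.e. (3 * 10 + 1 * 30) / 4
```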

Return type:

AnnData | None

Returns:

None | anndata.AnnData – AnnData object with imputed values in layer (or in .X if layer is None). If copy=False, modifies the AnnData object in place and returns None. If copy=True, returns a modified copy.

Raises:
  • Warning – If group_column contains NaNs

  • Warning – If a feature contains only NaNs

  • ValueError – If any group has fewer members than n_neighbors
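The group-wise mode and the conditions listed above can be sketched with pandas and scikit-learn. The helper impute_knn_groupwise is hypothetical and takes a plain NumPy array plus one label per observation. It encodes several assumptions about how such a mode could behave:

- observations with a NaN label are skipped;
- features that are entirely missing within a group stay NaN;
- groups smaller than n_neighbors raise a ValueError.

It is not the alphapepttools implementation.

```python
import warnings

import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer


def impute_knn_groupwise(X, groups, n_neighbors=2, weights="distance"):
    """Hypothetical sketch of group-wise kNN imputation (not the alphapepttools code).

    X      -- 2D array (observations x features); NaN marks missing values
    groups -- one label per observation; observations with a NaN label are left unchanged
    """
    X_out = np.array(X, dtype=float, copy=True)
    labels = pd.Series(groups)
    if labels.isna().any():
        warnings.warn("Group labels contain NaN; those observations are left unchanged.")

    # Check every group size before modifying anything.
    sizes = labels.value_counts(dropna=True)
    too_small = sizes[sizes < n_neighbors]
    if not too_small.empty:
        raise ValueError(f"Groups with fewer than {n_neighbors} members: {list(too_small.index)}")

    for label in sizes.index:
        rows = np.flatnonzero((labels == label).to_numpy())
        block = X_out[rows]
        observed = ~np.isnan(block).all(axis=0)
        if not observed.all():
            warnings.warn(f"Group {label!r}: {int((~observed).sum())} feature(s) contain only NaNs and are not imputed.")
        cols = np.flatnonzero(observed)
        if cols.size == 0:
            continue
        # Impute only features with at least one observed value in this group.
        imputer = KNNImputer(n_neighbors=n_neighbors, weights=weights)
        X_out[np.ix_(rows, cols)] = imputer.fit_transform(block[:, cols])
    return X_out


# Usage: two groups; one value is missing in each group.
X = np.array([
    [1.0, np.nan],
    [1.2, 4.0],
    [1.1, 6.0],
    [9.0, np.nan],
    [9.1, 100.0],
    [9.2, 102.0],
])
out = impute_knn_groupwise(X, ["a", "a", "a", "b", "b", "b"], n_neighbors=2, weights="uniform")
print(out[:, 1].tolist())  # [5.0, 4.0, 6.0, 101.0, 100.0, 102.0]
```

In this sketch, giving every observation the same label reproduces the global mode.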

Notes

Features that are entirely missing are not imputed, so filtering features beforehand with at.pp.filter_data_completeness() is critical. Nearest-neighbor imputation assumes that values are missing at random. It is therefore not appropriate for values that are missing not at random, e.g. because of insufficient instrument sensitivity. In that case, kNN imputation systematically overestimates the intensities of the affected features, because every imputed value is derived from observations in which the feature was detected.
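The overestimation under missing-not-at-random conditions can be reproduced on simulated data. This sketch uses scikit-learn's KNNImputer directly. The data are entirely synthetic: five correlated log-intensity features, with every value of feature 0 below a detection limit hidden. Because each imputed value is a weighted average of donor values above that limit, the imputed values must come out higher than the hidden ones.

```python
import numpy as np
from sklearn.impute import KNNImputer

rng = np.random.default_rng(0)

# Synthetic data: 200 samples x 5 correlated features (log-scale intensities).
n_samples, n_features = 200, 5
base = rng.normal(20.0, 2.0, size=(n_samples, 1))
X_true = base + rng.normal(0.0, 0.5, size=(n_samples, n_features))

# Missing not at random: hide every value of feature 0 below a detection limit,
# mimicking insufficient instrument sensitivity.
X_obs = X_true.copy()
detection_limit = np.quantile(X_true[:, 0], 0.3)
mnar_mask = X_true[:, 0] < detection_limit
X_obs[mnar_mask, 0] = np.nan

# Donors for feature 0 are only the samples above the detection limit.
X_imp = KNNImputer(n_neighbors=5, weights="distance").fit_transform(X_obs)

true_mean = X_true[mnar_mask, 0].mean()
imputed_mean = X_imp[mnar_mask, 0].mean()
print(f"true mean of hidden values: {true_mean:.2f}")
print(f"mean of kNN-imputed values: {imputed_mean:.2f}")  # always the larger of the two
```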

Example

Impute the values in the .X matrix

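import numpy as np
import alphapepttools as at
at.pp.impute_knn(adata)  # copy=False (default): modifies adata in place and returns None
assert np.sum(np.isnan(adata.X)) == 0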

Impute data in a specific layer

at.pp.impute_knn(adata, layer="layer2")  # imputes adata.layers["layer2"] in place
assert np.sum(np.isnan(adata.layers["layer2"])) == 0

Impute group-wise based on a categorical column:

at.pp.impute_knn(adata, group_column="cell_type")  # kNN imputation within each cell_type group