alphapepttools.pp.impute_knn
alphapepttools.pp.impute_knn(adata, group_column=None, layer=None, *, n_neighbors=2, weights='distance', copy=False, **kwargs)
Impute missing values using k-nearest neighbors (kNN) imputation.
Replace missing (NaN) values of each feature in the data matrix with an estimate based on the non-missing values of that feature in the k nearest observations. Imputation can be global, using all samples, or group-wise, using subsets of samples defined by a categorical variable.
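The NumPy sketch below shows how a single missing value could be estimated from its k nearest neighbors. It assumes the nan-Euclidean distance that `sklearn.impute.KNNImputer` uses by default. That distance is computed over the coordinates observed in both rows and rescaled by total coordinates / shared observed coordinates. The helpers `nan_euclidean` and `knn_impute_entry` are hypothetical and are not part of alphapepttools.

```python
import numpy as np


def nan_euclidean(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance over coordinates observed in both rows, rescaled by
    (number of coordinates / number of shared observed coordinates)."""
    present = ~np.isnan(a) & ~np.isnan(b)
    if not present.any():
        return np.nan  # no shared coordinates: distance is undefined
    squared = np.sum((a[present] - b[present]) ** 2)
    return float(np.sqrt(a.size / present.sum() * squared))


def knn_impute_entry(X, row, col, n_neighbors=2, weights="distance"):
    """Hypothetical helper: estimate X[row, col] from the n_neighbors nearest
    rows that have feature `col` observed."""
    donors = np.array(
        [i for i in range(X.shape[0]) if i != row and not np.isnan(X[i, col])]
    )
    dists = np.array([nan_euclidean(X[row], X[i]) for i in donors])
    nearest = np.argsort(dists)[:n_neighbors]  # NaN distances sort last
    values = X[donors[nearest], col]
    if weights == "uniform":
        return float(values.mean())
    # weights == "distance": inverse-distance weighting (assumes no zero distances)
    w = 1.0 / dists[nearest]
    return float(np.sum(w * values) / np.sum(w))


X = np.array([
    [1.0, 2.0, np.nan],   # observation with a missing value in feature 2
    [1.0, 3.0, 3.0],
    [2.0, 3.0, 5.0],
    [10.0, 10.0, 10.0],   # far away; not among the 2 nearest neighbors
])
uniform = knn_impute_entry(X, row=0, col=2, weights="uniform")
distance = knn_impute_entry(X, row=0, col=2, weights="distance")
print(uniform, distance)  # 4.0 and 1 + 2*sqrt(2) ≈ 3.828
```

Only rows that observe the feature can act as donors. This is why each estimate always lies within the range of that feature's observed values.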
- Parameters:
  - adata (AnnData) – AnnData object.
  - layer (Optional[str], default: None) – Layer to use for imputation. If None, the .X matrix is used.
  - group_column (Optional[str], default: None) – Column name in adata.obs defining groups for group-wise imputation.
    - None (default): imputes all samples together.
    - str: imputes separately within each group, using only that group's samples as neighbors.
    If group_column contains NaNs, the respective observations are ignored.
  - n_neighbors (int, default: 2) – Number of neighbors to consider during imputation.
  - weights (Literal['distance', 'uniform'], default: 'distance') – Weighting strategy for kNN imputation.
    - uniform: all k nearest neighbors are weighted equally when imputing a feature.
    - distance: the k nearest neighbors are weighted by the inverse of their distance to the observation being imputed.
  - copy (bool, default: False) – Whether to return a modified copy of the AnnData object (True). If False (default), the object is modified in place.
  - **kwargs – Passed to sklearn.impute.KNNImputer.
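`**kwargs` are forwarded to `sklearn.impute.KNNImputer`, so `n_neighbors` and `weights` plausibly map onto the identically named KNNImputer parameters. This page does not confirm that mapping; treat it as an assumption. The sketch below calls KNNImputer directly on a small NumPy matrix to show how the two weighting strategies differ for the same missing entry.

```python
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([
    [1.0, 2.0, np.nan],   # missing value to impute
    [1.0, 3.0, 3.0],      # nearest neighbor
    [2.0, 3.0, 5.0],      # second-nearest neighbor
    [10.0, 10.0, 10.0],   # distant observation, ignored with n_neighbors=2
])

# uniform: plain mean of the two nearest neighbors' values, (3 + 5) / 2
uniform = KNNImputer(n_neighbors=2, weights="uniform").fit_transform(X)[0, 2]

# distance: the closer neighbor (value 3) gets a larger weight, pulling the estimate below 4
distance = KNNImputer(n_neighbors=2, weights="distance").fit_transform(X)[0, 2]

print(f"uniform={uniform:.3f}, distance={distance:.3f}")  # uniform=4.000, distance=3.828
```

With `weights="distance"`, the result depends on the scale of the features. Intensities are therefore usually compared on a common (for example log-transformed) scale before imputation.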
- Return type:
  None | anndata.AnnData
- Returns:
  AnnData object with imputed values in layer. If copy=False, modifies the AnnData object at layer in place and returns None. If copy=True, returns a modified copy.
- Raises:
  - Warning – If group_column contains NaNs.
  - Warning – If a feature contains only NaNs.
  - ValueError – If any group has fewer members than n_neighbors.
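The sketch below illustrates the group-wise mode and the warning/error conditions listed above. It is a hypothetical re-implementation for illustration only: `impute_knn_groupwise` is not an alphapepttools function, and the library's actual implementation may differ. It assumes a plain NumPy matrix and a list of group labels in which `None` marks a missing label.

```python
import warnings

import numpy as np
from sklearn.impute import KNNImputer


def impute_knn_groupwise(X, groups, n_neighbors=2, weights="distance"):
    """Hypothetical sketch: kNN-impute each group of rows independently.

    Rows whose group label is None are left untouched; features that are
    entirely missing within a group stay NaN.
    """
    X = np.array(X, dtype=float)  # work on a copy of the input
    if any(g is None for g in groups):
        warnings.warn("group labels contain missing values; those rows are ignored")

    for label in sorted({g for g in groups if g is not None}):
        idx = np.array([i for i, g in enumerate(groups) if g == label])
        if idx.size < n_neighbors:
            raise ValueError(
                f"group {label!r} has {idx.size} members, fewer than n_neighbors={n_neighbors}"
            )
        block = X[idx]  # fancy indexing returns a copy of the group's rows
        observed = ~np.isnan(block).all(axis=0)  # features with at least one value in this group
        if not observed.all():
            warnings.warn(f"group {label!r}: some features contain only NaNs and are not imputed")
        if observed.any():
            imputer = KNNImputer(n_neighbors=n_neighbors, weights=weights)
            block[:, observed] = imputer.fit_transform(block[:, observed])
        X[idx] = block  # write the imputed rows back
    return X


X = np.array([
    [1.0, np.nan],
    [1.0, 2.0],
    [1.0, 4.0],
    [10.0, np.nan],
    [10.0, 20.0],
    [10.0, 40.0],
])
groups = ["a", "a", "a", "b", "b", "b"]
imputed = impute_knn_groupwise(X, groups, weights="uniform")
# imputed[0, 1] == 3.0 (from group "a"), imputed[3, 1] == 30.0 (from group "b")
```

Restricting neighbors to a group keeps estimates within that group's intensity range. The trade-off is that each group must contain at least `n_neighbors` samples.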
Notes
Features that are fully missing will not be imputed, so appropriate filtering of features with at.pp.filter_data_completeness() is critical. Nearest-neighbors imputation assumes that the data are missing at random. It is therefore not appropriate for values that are missing not at random, e.g. due to insufficient instrument sensitivity. In that case, each estimate is built from neighbors in which the feature was detected, while the truly missing values lie below the detection limit. As a result, kNN imputation will systematically overestimate the intensities of the affected features.
Examples
Impute the values in the .X matrix (with the default copy=False, the object is modified in place and None is returned):

    import numpy as np
    import alphapepttools as at

    at.pp.impute_knn(adata)
    assert np.sum(np.isnan(adata.X)) == 0

Impute data in a specific layer:

    at.pp.impute_knn(adata, layer="layer2")
    assert np.sum(np.isnan(adata.layers["layer2"])) == 0

Impute group-wise based on a categorical column:

    at.pp.impute_knn(adata, group_column="cell_type")  # group-wise imputation
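The Notes warn about values that are missing not at random. The simulated sketch below illustrates that warning. It censors every value below a per-feature detection limit, imputes with `sklearn.impute.KNNImputer`, and compares the imputed values with the hidden true values. The data model is assumed: normally distributed log-intensities with a hard detection limit at the 20th percentile of each feature. This is not a benchmark of alphapepttools.

```python
import numpy as np
from sklearn.impute import KNNImputer

rng = np.random.default_rng(0)

# Synthetic log-intensities: 50 samples x 8 features (assumed data model)
true = rng.normal(loc=20.0, scale=2.0, size=(50, 8))

# Missing NOT at random: values below a per-feature detection limit are lost
limit = np.percentile(true, 20, axis=0)
censored = np.where(true < limit, np.nan, true)
mask = np.isnan(censored)

imputed = KNNImputer(n_neighbors=2, weights="distance").fit_transform(censored)

# Every donor value is at or above the detection limit, so every estimate is too,
# while the true missing values all lie below it
print("mean true value of missing entries:", true[mask].mean().round(2))
print("mean imputed value:                ", imputed[mask].mean().round(2))
```

Because estimates are weighted averages of observed donor values, no choice of `n_neighbors` or `weights` can move them below the detection limit.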