alphapepttools.pp.impute_bpca

alphapepttools.pp.impute_bpca(adata, *, n_components=50, layer=None, group_column=None, copy=False, **kwargs)

Impute missing values using Bayesian Principal Component Analysis (BPCA).

Estimates the latent covariance structure of the log-transformed data via Bayesian Principal Component Analysis. The imputation method uses the resulting principal component usages \(U\) and components \(L\) to reconstruct the full log-transformed data matrix:

\[\hat{X} = U L + \overline{X}\]

where \(\overline{X}\) is the mean of the data. Missing values are replaced with the estimated values from this reconstruction.
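The reconstruct-and-fill step can be sketched in plain Python. This is not the BPCA fit itself: the usages `U`, components `L`, and per-feature means are taken as given, and both the shape convention (`U` is samples × components, `L` is components × features) and the helper name `reconstruct_and_fill` are assumptions made for illustration, not part of alphapepttools.

```python
import math


def reconstruct_and_fill(X, U, L, feature_means):
    """Fill NaNs in X with the low-rank reconstruction U @ L + mean.

    Hypothetical helper: U, L and feature_means are assumed to come from an
    already fitted model. Observed values in X are left untouched.
    """
    n_components = len(L)
    filled = []
    for i, row in enumerate(X):
        new_row = []
        for j, value in enumerate(row):
            if math.isnan(value):
                # Reconstructed entry: sum_k U[i][k] * L[k][j] + mean_j
                estimate = sum(U[i][k] * L[k][j] for k in range(n_components))
                new_row.append(estimate + feature_means[j])
            else:
                new_row.append(value)
        filled.append(new_row)
    return filled


nan = float("nan")
X = [[1.0, nan], [3.0, 4.0]]   # one missing value in sample 0, feature 1
U = [[-1.0], [1.0]]            # one usage per sample (illustrative values)
L = [[1.0, 0.5]]               # one component across two features
feature_means = [2.0, 4.5]

X_filled = reconstruct_and_fill(X, U, L, feature_means)
# X_filled == [[1.0, 4.0], [3.0, 4.0]]
```

Only the missing entry changes: it becomes `-1.0 * 0.5 + 4.5 = 4.0`, while measured values pass through unmodified.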

Parameters:
  • adata (AnnData) – AnnData object.

  • n_components (int (default: 50)) – Number of components to use for the model fit. Using more components lets the model capture finer structure in the data. This can increase model accuracy but also propagates more measurement noise into the data reconstruction.

  • layer (Optional[str] (default: None)) – Layer to use for imputation. The data should be log-transformed to match the noise model of the BPCA method. If None, uses adata.X.

  • group_column (Optional[str] (default: None)) –

    Column name in adata.obs defining groups for group-wise imputation.
    • None (default, recommended): imputes all samples together.

    • str: fits the model and imputes separately for each group.

    If group_column contains NaNs, the respective observations are ignored.

  • copy (bool (default: False)) – Whether to return a modified copy (True) of the AnnData object. If False (default), modifies the object in place.

  • **kwargs – Passed to bpca.BPCA
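The group-wise behaviour of `group_column` can be pictured with a small, self-contained sketch. How alphapepttools partitions the data internally is not shown on this page, so the helper `group_row_indices` below is a hypothetical illustration of the documented rule that observations with a missing group label are ignored.

```python
import math
from collections import defaultdict


def group_row_indices(labels):
    """Map each group label to the row indices that carry it.

    Rows whose label is missing (None or NaN) are skipped, mirroring the
    documented behaviour that such observations are ignored.
    """
    groups = defaultdict(list)
    for index, label in enumerate(labels):
        if label is None or (isinstance(label, float) and math.isnan(label)):
            continue
        groups[label].append(index)
    return dict(groups)


# Illustrative labels, as they might appear in a column of adata.obs
labels = ["control", "treated", float("nan"), "control", "treated"]
groups = group_row_indices(labels)
# Each index list would then be imputed with its own, separately fitted model.
```

Row 2 has no group label, so it does not appear in any group and would not be imputed in a group-wise run.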

Return type:

AnnData | None

Returns:

None | anndata.AnnData – AnnData object with imputed values in layer. If copy=False, modifies the AnnData object at layer in place and returns None. If copy=True, returns a modified copy.

Raises:
  • Warning – If group_column contains NaNs

  • Warning – If a feature contains only NaNs
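A feature that contains only NaNs can be detected before imputation. The following is a minimal sketch in plain Python; the helper name `all_nan_features` and the warning text are assumptions for illustration and are not taken from the library.

```python
import math
import warnings


def all_nan_features(X):
    """Return the column indices whose values are NaN in every row."""
    n_features = len(X[0]) if X else 0
    return [
        j for j in range(n_features)
        if all(math.isnan(row[j]) for row in X)
    ]


nan = float("nan")
X = [[1.0, nan, 2.0], [nan, nan, 3.0]]  # feature 1 is never observed

empty = all_nan_features(X)
if empty:
    # Hypothetical message; the library's actual wording may differ.
    warnings.warn(f"Features {empty} contain only NaNs")
```

Running such a check up front makes it possible to drop or inspect unobserved features before calling the imputation.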

Example

import alphapepttools as at

# Log-transform the data (BPCA expects log-transformed values)
at.pp.nanlog(adata)

# Impute .X in place; returns None
at.pp.impute_bpca(adata, n_components=50, layer=None)

# Return a new AnnData object with imputed .X
adata_new = at.pp.impute_bpca(adata, n_components=50, layer=None, copy=True)

References

This implementation follows the reference implementation in [SRS+07].

See also

bpca.BPCA