alphapepttools.metrics.principal_component_regression

alphapepttools.metrics.principal_component_regression(adata, covariate, n_components=None, pca_key='X_pca', pca_key_uns='pca')

Compute principal component regression (PCR) score.

Estimates how much of the variation in a given covariate is captured in PCA space, based on the correlation between the covariate and each principal component (PC). The final score is computed as a weighted sum of squared correlations between the covariate and the first n_components PCs, with weights given by the variance explained by each PC:

\[\mathrm{PCR} = \sum_{n=1}^{N} \left( \mathrm{PCC}(C, PC_n)^2 \cdot \mathrm{Var}(PC_n) \right)\]

where \(\mathrm{PCC}(C, PC_n)\) is the Pearson correlation coefficient between the covariate \(C\) and the \(n\)-th principal component, \(\mathrm{Var}(PC_n)\) is the proportion of variance explained by that component, and \(N\) is the number of components considered (n_components).
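The formula above can be sketched in plain NumPy for a continuous covariate. This is an illustration of the weighted sum only, not the library's implementation; the helper name pcr_score_sketch and the toy arrays are hypothetical.

```python
import numpy as np


def pcr_score_sketch(covariate, pcs, variance_ratio):
    """Weighted sum of squared Pearson correlations (illustrative helper)."""
    covariate = np.asarray(covariate, dtype=float)
    pcs = np.asarray(pcs, dtype=float)
    variance_ratio = np.asarray(variance_ratio, dtype=float)
    # Pearson correlation between the covariate and each PC column
    r = np.array(
        [np.corrcoef(covariate, pcs[:, n])[0, 1] for n in range(pcs.shape[1])]
    )
    # Weight each squared correlation by the variance explained by that PC
    return float(np.sum(r**2 * variance_ratio))


# Toy data: two mean-centered, mutually orthogonal "PCs"
pc1 = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
pc2 = np.array([2.0, -1.0, -2.0, -1.0, 2.0])
pcs = np.column_stack([pc1, pc2])

# Covariate that is exactly linear in PC1 and uncorrelated with PC2
covariate = 3.0 * pc1 + 1.0

# Assumed explained-variance proportions for the two PCs
score = pcr_score_sketch(covariate, pcs, variance_ratio=[0.6, 0.3])
```

In this toy case the covariate correlates perfectly with the first component and not at all with the second, so the score reduces to the variance proportion assigned to the first component.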

Parameters:

adata (AnnData) – AnnData object with PCA results stored in adata.obsm and adata.uns.
covariate (str) – Covariate of interest, given as a column name in adata.obs. For continuous covariates, the Pearson correlation coefficient (PCC) is computed between the covariate and each principal component. Categorical covariates (dtype=category) are one-hot encoded.
n_components (int | None (default: None)) – Number of principal components to consider. If None, uses all available components.
pca_key (str (default: 'X_pca')) – Key in adata.obsm that stores PCA embeddings.
pca_key_uns (str (default: 'pca')) – Key in adata.uns that stores information on the PCA.
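For a categorical covariate, the page states only that the values are one-hot encoded. One common way to turn the encoded columns into a per-component squared correlation is the R² of a least-squares fit of each PC on the one-hot matrix, as done in the benchmarking work by Luecken et al. The sketch below assumes that approach; whether this function does the same is not confirmed here, and the helper names one_hot and r2_per_pc are hypothetical.

```python
import numpy as np


def one_hot(labels):
    """Return the sorted categories and a samples-by-categories indicator matrix."""
    categories, codes = np.unique(labels, return_inverse=True)
    return categories, np.eye(len(categories))[codes]


def r2_per_pc(design, pcs):
    """R² of a least-squares fit of each PC column on the design matrix."""
    # The one-hot columns sum to one per row, so they also act as the intercept
    coef, *_ = np.linalg.lstsq(design, pcs, rcond=None)
    residual = pcs - design @ coef
    ss_res = np.sum(residual**2, axis=0)
    ss_tot = np.sum((pcs - pcs.mean(axis=0)) ** 2, axis=0)
    return 1.0 - ss_res / ss_tot


# Toy data: three batches, two samples each
labels = np.array(["a", "a", "b", "b", "c", "c"])
pcs = np.array(
    [[1.0, 1.0], [1.0, -1.0], [2.0, 1.0], [2.0, -1.0], [3.0, 1.0], [3.0, -1.0]]
)

categories, design = one_hot(labels)
r2 = r2_per_pc(design, pcs)  # PC1 follows the batch, PC2 does not

# Assumed explained-variance proportions, combined as in the PCR formula
weighted = float(np.sum(r2 * np.array([0.7, 0.2])))
```

Here the first component is fully determined by the batch label and the second is unrelated to it, so only the first component contributes to the weighted sum.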

Return type:

float

Returns:

Principal component regression score: an estimate of how much variance in the covariate is explained by the principal components.

Raises:

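KeyError – If a required key is missing, e.g. covariate in adata.obs, pca_key in adata.obsm, or pca_key_uns in adata.uns.
TypeError – If the covariate dtype is neither numeric nor categorical.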

Example

import alphapepttools as at

# adata is assumed to be an existing AnnData object with a "batch" column in adata.obs
at.pp.pca(adata)
at.metrics.principal_component_regression(adata, covariate="batch")

# With custom PCA keys
at.pp.pca(adata, layer="layer_batch_corrected", key_added="pca_batch_corrected")
at.metrics.principal_component_regression(
    adata, covariate="batch", pca_key="pca_batch_corrected", pca_key_uns="pca_batch_corrected"
)

Notes

As originally discussed in Büttner et al. (2019), principal component regression assumes a linear relationship between the covariate and the principal components. This assumption may not hold in all cases. Furthermore, because the method captures both true and spurious correlations, it can overestimate the contribution of the covariate to variation in PCA space.
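The linearity assumption can be illustrated with a toy case in which the covariate is fully determined by a component but not linearly related to it. The arrays below are made up for illustration.

```python
import numpy as np

# A component that is symmetric around zero
pc1 = np.linspace(-1.0, 1.0, 101)

# Covariate that depends on pc1 deterministically, but quadratically
covariate = pc1**2

# Pearson correlation measures only the linear part of the relationship
r = np.corrcoef(covariate, pc1)[0, 1]
squared_r = r**2
```

The squared correlation is close to zero here, so a component like this would contribute almost nothing to the score even though it determines the covariate completely.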

References

Luecken, M.D., Büttner, M., Chaichoompu, K. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat Methods 19, 41-50 (2022). https://doi.org/10.1038/s41592-021-01336-8
Büttner, M., Miao, Z., Wolf, F.A. et al. A test metric for assessing single-cell RNA-seq batch correction. Nat Methods 16, 43-49 (2019). https://doi.org/10.1038/s41592-018-0254-1
