alphapepttools.metrics.principal_component_regression
- alphapepttools.metrics.principal_component_regression(adata, covariate, n_components=None, pca_key='X_pca', pca_key_uns='pca')
Compute principal component regression (PCR) score.
Estimates how much of the variation in a given covariate is captured in PCA space, based on the correlation between the covariate and each principal component (PC). The final score is computed as a weighted sum of squared correlations between the covariate and the first n_components PCs, with weights given by the variance explained by each PC:
\[\mathrm{PCR} = \sum_{n=1}^{N} \left( \mathrm{PCC}(C, PC_n)^2 \cdot \mathrm{Var}(PC_n) \right)\]
where \(\mathrm{PCC}(C, PC_n)\) is the Pearson correlation coefficient between the covariate \(C\) and the \(n\)-th principal component, \(\mathrm{Var}(PC_n)\) is the proportion of variance explained by that component, and \(N\) is the number of components considered (n_components).
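The formula above can be sketched in plain NumPy for a continuous covariate. This is an illustration only, not the alphapepttools implementation: the helper name `pcr_score`, the synthetic data, and the SVD-based PCA are assumptions made for the example.

```python
import numpy as np


def pcr_score(pcs, var_ratio, covariate, n_components=None):
    """Weighted sum of squared Pearson correlations (hypothetical helper)."""
    pcs = np.asarray(pcs, dtype=float)
    var_ratio = np.asarray(var_ratio, dtype=float)
    covariate = np.asarray(covariate, dtype=float)
    # Use all available components when n_components is None
    n = pcs.shape[1] if n_components is None else n_components
    score = 0.0
    for i in range(n):
        # Pearson correlation between the covariate and the i-th PC
        pcc = np.corrcoef(covariate, pcs[:, i])[0, 1]
        score += pcc**2 * var_ratio[i]
    return score


# Synthetic data: 200 observations, 5 features, centered per feature
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X -= X.mean(axis=0)

# PCA via SVD: PC scores and proportion of variance explained per PC
U, S, Vt = np.linalg.svd(X, full_matrices=False)
pcs = U * S
var_ratio = S**2 / np.sum(S**2)

# A covariate identical to PC1 correlates perfectly with PC1 only
covariate = pcs[:, 0]
score = pcr_score(pcs, var_ratio, covariate)
print(score, var_ratio[0])
```

Because the PCs are mutually uncorrelated, a covariate that coincides with PC1 yields a score equal to the proportion of variance explained by PC1, which gives a quick sanity check of the weighting.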
- Parameters:
adata (AnnData) – ad.AnnData object.
covariate (str) – Covariate of interest as a column in adata.obs. For continuous covariates, the Pearson correlation coefficient (PCC) is computed between the covariate and each principal component. Categorical covariates (dtype=category) are one-hot encoded.
n_components (Optional[int] (default: None)) – Number of principal components to consider. If None, uses all available components.
pca_key (str (default: 'X_pca')) – Key in adata.obsm that stores PCA embeddings.
pca_key_uns (str (default: 'pca')) – Key in adata.uns that stores information on the PCA.
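The description of `covariate` says categorical values are one-hot encoded but does not state how the encoded columns are scored against a PC. One common choice, assumed here and not confirmed for alphapepttools, is the R² of a linear regression of the PC on the one-hot columns. The helper name `r2_categorical` and the toy data are hypothetical.

```python
import numpy as np


def r2_categorical(labels, pc):
    """R^2 of regressing one PC on one-hot encoded labels (hypothetical helper)."""
    labels = np.asarray(labels)
    pc = np.asarray(pc, dtype=float)
    categories = np.unique(labels)
    # One-hot encoding: one indicator column per category
    one_hot = (labels[:, None] == categories[None, :]).astype(float)
    # Least-squares fit; the indicator columns sum to 1, so an intercept is implicit
    coef, *_ = np.linalg.lstsq(one_hot, pc, rcond=None)
    residual = pc - one_hot @ coef
    ss_res = np.sum(residual**2)
    ss_tot = np.sum((pc - pc.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# PC values fully determined by the category
labels = np.array(["a", "a", "b", "b", "c", "c"])
pc = np.array([1.0, 1.0, 2.0, 2.0, 5.0, 5.0])
r2 = r2_categorical(labels, pc)

# With two categories, R^2 equals the squared PCC with the indicator
labels2 = np.array(["x", "x", "y", "y", "y"])
pc2 = np.array([0.5, 1.5, 2.0, 3.0, 4.0])
indicator = (labels2 == "y").astype(float)
r2_two = r2_categorical(labels2, pc2)
pcc_sq = np.corrcoef(indicator, pc2)[0, 1] ** 2
print(r2, r2_two, pcc_sq)
```

Under this assumption the categorical case reduces to the squared Pearson correlation when there are only two categories, so it is consistent with the continuous case.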
- Return type:
float
- Returns:
Principal component regression score: an estimate of how much variance in the covariate is explained by the principal components.
- Raises:
Example
import alphapepttools as at

at.pp.pca(adata)
at.metrics.principal_component_regression(adata, covariate="batch")

# With custom PCA keys
at.pp.pca(adata, layer="layer_batch_corrected", key_added="pca_batch_corrected")
at.metrics.principal_component_regression(
    adata, covariate="batch", pca_key="pca_batch_corrected", pca_key_uns="pca_batch_corrected"
)
Notes
As originally discussed in Büttner et al. (2019), principal component regression assumes a linear relationship between the covariate and the principal components. This assumption may not hold in all cases. Furthermore, because this method captures both true and spurious correlations, it can potentially overestimate the contribution of the covariate to variation in PCA space.
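The linearity assumption can be illustrated with a small synthetic case. In this sketch, which uses made-up data and does not call alphapepttools, a covariate that depends on a PC quadratically is fully determined by that PC yet shows almost no Pearson correlation with it, so it would contribute almost nothing to the score.

```python
import numpy as np

# A single synthetic "principal component", symmetric around zero
pc1 = np.linspace(-1.0, 1.0, 201)

# Linear dependence: captured by the Pearson correlation
linear_covariate = 2.0 * pc1 + 1.0
# Quadratic dependence: fully determined by pc1, but not linearly
quadratic_covariate = pc1**2

pcc_linear = np.corrcoef(linear_covariate, pc1)[0, 1]
pcc_quadratic = np.corrcoef(quadratic_covariate, pc1)[0, 1]

print(pcc_linear**2)     # close to 1
print(pcc_quadratic**2)  # close to 0
```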
References
Luecken, M.D., Büttner, M., Chaichoompu, K. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat Methods 19, 41-50 (2022). https://doi.org/10.1038/s41592-021-01336-8
Büttner, M., Miao, Z., Wolf, F.A. et al. A test metric for assessing single-cell RNA-seq batch correction. Nat Methods 16, 43-49 (2019). https://doi.org/10.1038/s41592-018-0254-1