alphapepttools.tl.pca

alphapepttools.tl.pca(adata, layer=None, dim_space='obs', embeddings_name=None, n_comps=None, meta_data_mask_column_name=None, **pca_kwargs)

Principal component analysis.

Computes PCA coordinates, loadings, and the variance decomposition. The passed adata is changed to include the PCA results. Depending on the dim_space parameter, the result is a dimensionality-reduction projection of samples (obs) or of features (var). For PCA on the sample space (dim_space='obs'), the PCA coordinates are stored in adata.obsm, the feature loadings in adata.varm, and the variance decomposition in adata.uns. For PCA on the feature space (dim_space='var'), the PCA coordinates are stored in adata.varm, the sample loadings in adata.obsm, and the variance decomposition in adata.uns. Uses the Scanpy implementation, which in turn uses the scikit-learn implementation.
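The storage layout can be summarized in a few lines of plain Python. This is only a sketch that follows the keys listed under Parameters and Returns; resolve_pca_keys is a hypothetical helper written for illustration and is not part of alphapepttools.

```python
from typing import Dict, Optional, Tuple


def resolve_pca_keys(
    dim_space: str = "obs", embeddings_name: Optional[str] = None
) -> Dict[str, Tuple[str, str]]:
    """Hypothetical helper: map the documented options to (AnnData slot, key) pairs."""
    if dim_space not in ("obs", "var"):
        raise ValueError("dim_space must be 'obs' or 'var'")

    # Coordinates are stored on the projected axis, loadings on the other axis.
    if dim_space == "obs":
        coords_slot, loadings_slot = "obsm", "varm"
    else:
        coords_slot, loadings_slot = "varm", "obsm"

    if embeddings_name is not None:
        # A custom name replaces all three default keys.
        coords_key = loadings_key = variance_key = embeddings_name
    else:
        coords_key = f"X_pca_{dim_space}"
        loadings_key = f"PCs_{dim_space}"
        variance_key = f"variance_pca_{dim_space}"

    return {
        "coordinates": (coords_slot, coords_key),
        "loadings": (loadings_slot, loadings_key),
        "variance": ("uns", variance_key),
    }


keys_obs = resolve_pca_keys("obs")
keys_custom = resolve_pca_keys("var", embeddings_name="my_pca")
print(keys_obs)
print(keys_custom)
```

Reading the result for dim_space='obs', the sample coordinates would be looked up as adata.obsm['X_pca_obs'] and the feature loadings as adata.varm['PCs_obs'].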

Parameters:
  • adata (ad.AnnData) – The (annotated) data matrix of shape n_obs × n_vars. Rows correspond to samples (observations) and columns to features (variables).

  • layer (str, optional (default: None)) – If provided, which element of layers to use for PCA. If None, the .X attribute of adata is used.

  • dim_space (str, optional (default: "obs")) – The dimension to project PCA on. Can be either “obs” (default) for sample projection or “var” for feature projection.

  • embeddings_name (str, optional (default: None)) – Key under which to store the PCA results in adata.obsm, adata.varm, and adata.uns (see Returns). If provided, embeddings_name is used as the key in all three. If None, the default keys are used:
      - For dim_space='obs': X_pca_obs for the PC coordinates, PCs_obs for the feature loadings, variance_pca_obs for the variance.
      - For dim_space='var': X_pca_var for the PC coordinates, PCs_var for the sample loadings, variance_pca_var for the variance.

  • n_comps (int, optional (default: None)) – Number of principal components to compute. If None, defaults to 50, or to one less than the smallest dimension of the selected representation if that is smaller.

  • meta_data_mask_column_name (str, optional (default: None)) – If provided, the name of the column in adata.var to use as a mask for the features included in PCA. The column must be of boolean dtype. This is useful for running PCA with the core proteome as “mask_var”, so that features containing NaN values are excluded. If None, all features are used, so the data must not contain NaNs.

  • **pca_kwargs (dict, optional) – Additional keyword arguments passed on to scanpy.pp.pca(). None by default.
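Two of the parameter rules above can be illustrated in plain Python. The sketch below is not alphapepttools code: effective_n_comps and complete_feature_mask are hypothetical helpers, and the default rule is written out under the assumption that it matches the n_comps description (50, or one less than the smaller matrix dimension when that dimension does not exceed 50).

```python
import math
from typing import List, Optional

DEFAULT_N_COMPS = 50  # default stated in the n_comps description


def effective_n_comps(n_obs: int, n_vars: int, n_comps: Optional[int] = None) -> int:
    """Hypothetical helper: number of components used when n_comps is None."""
    if n_comps is not None:
        return n_comps
    min_dim = min(n_obs, n_vars)
    # Assumed rule: 50 if both dimensions exceed 50, otherwise min_dim - 1.
    return DEFAULT_N_COMPS if min_dim > DEFAULT_N_COMPS else min_dim - 1


def complete_feature_mask(matrix: List[List[float]]) -> List[bool]:
    """True for every feature (column) that has no NaN in any sample (row)."""
    return [not any(math.isnan(value) for value in column) for column in zip(*matrix)]


nan = float("nan")
# Toy intensity matrix: 3 samples (rows) x 4 features (columns).
intensities = [
    [1.0, 2.0, nan, 4.0],
    [1.5, nan, 3.0, 4.2],
    [0.9, 2.2, 3.1, 3.9],
]

mask = complete_feature_mask(intensities)
print(mask)  # only the first and last feature are complete

print(effective_n_comps(n_obs=200, n_vars=1000))  # both dimensions exceed 50
print(effective_n_comps(n_obs=20, n_vars=1000))   # limited by the 20 samples
```

In practice such a mask would be stored as a boolean column in adata.var, and that column's name passed as meta_data_mask_column_name.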

Return type:

AnnData | ndarray

Returns:

As returned by scanpy.pp.pca(): an updated AnnData object, unless changed by the keyword arguments passed on to Scanpy. Sets the following fields.

For dim_space='obs' (sample projection):

.obsm['X_pca_obs' | embeddings_name] : csr_matrix | csc_matrix | ndarray (shape (adata.n_obs, n_comps))

PCA representation of data.

.varm['PCs_obs' | embeddings_name] : ndarray (shape (adata.n_vars, n_comps))

The principal components containing the loadings.

.uns['variance_pca_obs' | embeddings_name]['variance_ratio'] : ndarray (shape (n_comps,))

Ratio of explained variance.

.uns['variance_pca_obs' | embeddings_name]['variance'] : ndarray (shape (n_comps,))

Explained variance, equivalent to the eigenvalues of the covariance matrix.

For dim_space='var' (feature projection):

.varm['X_pca_var' | embeddings_name] : csr_matrix | csc_matrix | ndarray (shape (adata.n_vars, n_comps))

PCA representation of data.

.obsm['PCs_var' | embeddings_name] : ndarray (shape (adata.n_obs, n_comps))

The principal components containing the loadings.

.uns['variance_pca_var' | embeddings_name]['variance_ratio'] : ndarray (shape (n_comps,))

Ratio of explained variance.

.uns['variance_pca_var' | embeddings_name]['variance'] : ndarray (shape (n_comps,))

Explained variance, equivalent to the eigenvalues of the covariance matrix.
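
The relationship between the 'variance' and 'variance_ratio' entries can be checked by hand on a tiny example. The following is a minimal sketch in plain Python, independent of alphapepttools and Scanpy: it computes the eigenvalues of a 2 × 2 sample covariance matrix in closed form and divides them by the total variance. The toy data and all variable names are assumptions chosen for illustration.

```python
import math

# Four samples, two features; chosen so the covariance matrix is easy to verify.
samples = [(2.0, 0.0), (-2.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
n = len(samples)

mean_x = sum(x for x, _ in samples) / n
mean_y = sum(y for _, y in samples) / n

# Entries of the sample covariance matrix (n - 1 in the denominator).
var_x = sum((x - mean_x) ** 2 for x, _ in samples) / (n - 1)
var_y = sum((y - mean_y) ** 2 for _, y in samples) / (n - 1)
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in samples) / (n - 1)

# Closed-form eigenvalues of the symmetric matrix [[var_x, cov_xy], [cov_xy, var_y]].
half_trace = (var_x + var_y) / 2
radius = math.sqrt(((var_x - var_y) / 2) ** 2 + cov_xy ** 2)
variance = [half_trace + radius, half_trace - radius]  # sorted, largest first

# Each component's share of the total variance.
total_variance = sum(variance)
variance_ratio = [v / total_variance for v in variance]

print(variance)
print(variance_ratio)
```

Here the first component carries four times the variance of the second, so the two ratios sum to one with most of the weight on the first entry.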