alphapepttools.tl.pca
- alphapepttools.tl.pca(adata, layer=None, dim_space='obs', embeddings_name=None, n_comps=None, meta_data_mask_column_name=None, **pca_kwargs)
Principal component analysis.
Computes PCA coordinates, loadings, and the variance decomposition. The passed `adata` is modified to include the PCA results. Depending on the `dim_space` parameter, the PCA result is a dimensionality-reduction projection of samples (obs) or of features (var). For a sample projection (`dim_space='obs'`), the updated `adata` object stores the PCA coordinates in `adata.obsm`, the feature loadings in `adata.varm`, and the variance decomposition in `adata.uns`. For a feature projection (`dim_space='var'`), the PCA coordinates are stored in `adata.varm`, the sample loadings in `adata.obsm`, and the variance decomposition in `adata.uns`. Uses the Scanpy implementation, which in turn uses the scikit-learn implementation.
- Parameters:
adata (ad.AnnData) – The (annotated) data matrix of shape `n_obs` × `n_vars`. Rows correspond to observations (samples) and columns to variables (features).
layer (str, optional (default: None)) – If provided, which element of `layers` to use for PCA. If None, the `.X` attribute of `adata` is used.
dim_space (str, optional (default: "obs")) – The dimension to project PCA on. Can be either “obs” (default) for sample projection or “var” for feature projection.
embeddings_name (str, optional (default: None)) – If provided, this name is used as the key under which the PCA results are stored in `adata.obsm`, `adata.varm`, and `adata.uns` (see Returns); the same key `embeddings_name` is used for all three entries. If None, the default keys are used:
- For `dim_space='obs'`: `X_pca_obs` for the PC coordinates, `PCs_obs` for the feature loadings, `variance_pca_obs` for the variance.
- For `dim_space='var'`: `X_pca_var` for the PC coordinates, `PCs_var` for the sample loadings, `variance_pca_var` for the variance.
n_comps (int, optional (default: 50)) – Number of principal components to compute. Defaults to 50, or to one less than the minimum dimension size of the selected representation if that is smaller.
meta_data_mask_column_name (str, optional (default: None)) – If provided, the name of the column in `adata.var` to use as a mask for the features included in the PCA. This is useful for running PCA on the core proteome only, passing it as “mask_var” to exclude features with NaN values. The column must be of boolean dtype. If None, all features are used (the data must then not contain NaNs).
**pca_kwargs (dict, optional) – Additional keyword arguments passed to `scanpy.pp.pca()`. By default None.
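The role of a boolean feature mask and of the component cap can be illustrated with plain NumPy. This is a minimal sketch and not the alphapepttools or Scanpy code path: the matrix `X`, the name `core_mask`, and the SVD-based decomposition are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical intensity matrix: 6 samples (rows) x 5 features (columns).
X = rng.normal(size=(6, 5))
X[1, 3] = np.nan  # feature 3 is missing in one sample

# Boolean mask with one entry per feature, analogous to a boolean column
# in adata.var. Here a feature counts as "core" only if it is quantified
# in every sample.
core_mask = ~np.isnan(X).any(axis=0)

# Only the masked-in features enter the decomposition.
X_core = X[:, core_mask]

# Component cap used in this sketch: 50, or the minimum dimension minus 1.
n_comps = min(50, min(X_core.shape) - 1)

# PCA via SVD of the column-centered matrix.
X_centered = X_core - X_core.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
coords = U[:, :n_comps] * S[:n_comps]  # sample coordinates
loadings = Vt[:n_comps].T              # loadings of the masked-in features
```

Without the mask, the single NaN in `X` would propagate through the centering step and the decomposition, which is why unmasked input must not contain missing values.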
- Return type:
- Returns:
(As output from the `scanpy.pp.pca` function.) Unless changed via the kwargs passed on to Scanpy, an updated `AnnData` object. Sets the following fields:
For `dim_space='obs'` (sample projection):
- `.obsm['X_pca_obs' | embeddings_name]`: csr_matrix | csc_matrix | ndarray (shape `(adata.n_obs, n_comps)`). PCA representation of the data.
- `.varm['PCs_obs' | embeddings_name]`: ndarray (shape `(adata.n_vars, n_comps)`). The principal components containing the loadings.
- `.uns['variance_pca_obs' | embeddings_name]['variance_ratio']`: ndarray (shape `(n_comps,)`). Ratio of explained variance.
- `.uns['variance_pca_obs' | embeddings_name]['variance']`: ndarray (shape `(n_comps,)`). Explained variance, equivalent to the eigenvalues of the covariance matrix.
For `dim_space='var'` (feature projection):
- `.varm['X_pca_var' | embeddings_name]`: csr_matrix | csc_matrix | ndarray (shape `(adata.n_vars, n_comps)`). PCA representation of the data.
- `.obsm['PCs_var' | embeddings_name]`: ndarray (shape `(adata.n_obs, n_comps)`). The principal components containing the loadings.
- `.uns['variance_pca_var' | embeddings_name]['variance_ratio']`: ndarray (shape `(n_comps,)`). Ratio of explained variance.
- `.uns['variance_pca_var' | embeddings_name]['variance']`: ndarray (shape `(n_comps,)`). Explained variance, equivalent to the eigenvalues of the covariance matrix.
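The difference between the two projections, and the meaning of the two variance entries, can be sketched with NumPy alone. The helper `pca_svd` below is hypothetical and is not part of alphapepttools or Scanpy; the sketch also assumes that a feature projection amounts to running PCA on the transposed matrix, which the text above does not state explicitly.

```python
import numpy as np


def pca_svd(M, n_comps):
    """PCA of the rows of M via SVD (illustrative helper, not a library API)."""
    Mc = M - M.mean(axis=0)                  # center each column
    U, S, Vt = np.linalg.svd(Mc, full_matrices=False)
    variance_all = S**2 / (M.shape[0] - 1)   # eigenvalues of the covariance matrix
    coords = U[:, :n_comps] * S[:n_comps]    # one row per row of M
    loadings = Vt[:n_comps].T                # one row per column of M
    variance = variance_all[:n_comps]
    variance_ratio = variance / variance_all.sum()
    return coords, loadings, variance, variance_ratio


rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))  # hypothetical data: 8 samples x 5 features
n_comps = 3

# Sample projection (analogous to dim_space='obs'): the rows of X are the points.
obs_coords, obs_loadings, obs_var, obs_ratio = pca_svd(X, n_comps)

# Feature projection (analogous to dim_space='var'): the rows of X.T are the points.
var_coords, var_loadings, var_var, var_ratio = pca_svd(X.T, n_comps)

# Shapes mirror the storage slots: coordinates of samples and loadings of
# samples both have n_obs rows (obsm); their feature counterparts have
# n_vars rows (varm).
print(obs_coords.shape, obs_loadings.shape)
print(var_coords.shape, var_loadings.shape)
```

In this sketch the coordinates of one projection and the loadings of the other share the same row count, which is consistent with the way the two modes swap their use of `.obsm` and `.varm`.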