alphapepttools.tl.prepare_pca_2d_loadings_data_to_plot
- alphapepttools.tl.prepare_pca_2d_loadings_data_to_plot(data, loadings_name, pc_x, pc_y, nfeatures, dim_space)
Prepare a DataFrame with PCA feature loadings for 2D plotting.
This function extracts the loadings of two specified principal components (PCs) from an AnnData object, filters features that contributed to the PCA (non-zero loadings), and flags the top nfeatures for each selected PC dimension.
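The steps described above can be sketched with plain NumPy and pandas. This is an illustrative stand-in, not the alphapepttools implementation: the helper name `sketch_2d_loadings`, the input matrix, and the choice to drop features whose loadings are zero in every component are all assumptions made for the example. The column names follow the ones listed in the example further down this page.

```python
import numpy as np
import pandas as pd


def sketch_2d_loadings(loadings, feature_names, pc_x, pc_y, nfeatures):
    """Hypothetical helper mirroring the documented steps (not the library code)."""
    # PC indices are 1-based in the documented API, so shift to 0-based columns.
    df = pd.DataFrame(
        {
            "feature": feature_names,
            "dim1_loadings": loadings[:, pc_x - 1],
            "dim2_loadings": loadings[:, pc_y - 1],
        }
    )

    # Assumption: a feature "contributed to the PCA" if any of its loadings is non-zero.
    used = np.any(loadings != 0, axis=1)
    df = df[used].reset_index(drop=True)

    # Rank features by absolute loading within each selected PC.
    df["abs_dim1"] = df["dim1_loadings"].abs()
    df["abs_dim2"] = df["dim2_loadings"].abs()
    top_x = df.nlargest(nfeatures, "abs_dim1").index
    top_y = df.nlargest(nfeatures, "abs_dim2").index

    # A feature is flagged if it is among the top features of either PC.
    df["is_top"] = df.index.isin(top_x.union(top_y))
    return df


# Made-up loadings: 5 features x 2 components; P5 did not take part in the PCA.
toy_loadings = np.array(
    [
        [0.70, 0.10],
        [0.50, -0.60],
        [-0.20, 0.75],
        [0.45, 0.25],
        [0.00, 0.00],
    ]
)
result = sketch_2d_loadings(toy_loadings, ["P1", "P2", "P3", "P4", "P5"], pc_x=1, pc_y=2, nfeatures=1)
print(result)
```

With `nfeatures=1`, the flag marks one feature per PC, so up to two rows end up with `is_top` set.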
- Parameters:
  - data (AnnData) – The AnnData object containing PCA results.
  - loadings_name (str) – The key where PCA loadings are stored.
  - pc_x (int) – The first principal component index (1-based) to extract loadings for.
  - pc_y (int) – The second principal component index (1-based) to extract loadings for.
  - nfeatures (int) – Number of top features per PC to highlight based on absolute loadings.
  - dim_space (str) – The dimension space used in PCA. Can be either "obs" or "var".
- Return type:
  DataFrame
- Returns:
  A pd.DataFrame containing the loadings for the selected PCs, the feature names, and boolean columns indicating whether a feature was used in the PCA and whether it is among the top features in either dimension.
Examples
Prepare 2D loadings data for biplot visualization:
import anndata as ad
import pandas as pd
import numpy as np
import alphapepttools as at

# Create a 5x5 dataset where 4 proteins are core (no missing values)
X = np.array(
    [
        [10.5, 12.3, 11.8, 9.2, np.nan],  # Sample 1
        [11.2, 13.1, 12.5, 10.1, 7.5],  # Sample 2
        [9.8, 11.9, 10.2, 8.9, np.nan],  # Sample 3
        [12.1, 14.2, 13.3, 11.3, 8.2],  # Sample 4
        [10.9, 12.7, 11.5, 9.8, np.nan],  # Sample 5
    ]
)
adata = ad.AnnData(
    X=X,
    obs=pd.DataFrame({"sample": ["S1", "S2", "S3", "S4", "S5"]}),
    var=pd.DataFrame({"protein": ["P1", "P2", "P3", "P4", "P5"], "is_core": [True, True, True, True, False]}),
)

# Run PCA on observation space
at.tl.pca(adata, meta_data_mask_column_name="is_core", n_comps=2, dim_space="obs")

# Get loadings for PC1 vs PC2 with top 2 features highlighted
loadings_2d = at.tl.prepare_pca_2d_loadings_data_to_plot(
    adata,
    loadings_name="PCs_obs",  # Default loadings key
    pc_x=1,  # PC1
    pc_y=2,  # PC2
    nfeatures=2,  # Top 2 features per PC
    dim_space="obs",
)
display(loadings_2d)

# DataFrame contains:
# - feature: Protein names (only P1-P4, P5 excluded as not core)
# - dim1_loadings: Loading values for PC1
# - dim2_loadings: Loading values for PC2
# - abs_dim1, abs_dim2: Absolute loading values
# - is_top: Boolean flag for top features in either dimension
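One way the returned columns might be consumed downstream is to pick out the flagged rows as label positions for a loadings plot. The sketch below builds a small DataFrame by hand using the column names listed above; the numbers are made up for illustration and are not real output from alphapepttools.

```python
import pandas as pd

# Made-up values for illustration only; not real output from alphapepttools.
loadings_2d = pd.DataFrame(
    {
        "feature": ["P1", "P2", "P3", "P4"],
        "dim1_loadings": [0.70, 0.50, -0.20, 0.45],
        "dim2_loadings": [0.10, -0.60, 0.75, 0.25],
        "is_top": [True, False, True, False],
    }
)

# Rows flagged as top features are the natural candidates for text labels.
to_label = loadings_2d.loc[loadings_2d["is_top"]]

# Map each flagged feature to its (x, y) position in loadings space.
label_positions = {
    row.feature: (row.dim1_loadings, row.dim2_loadings)
    for row in to_label.itertuples(index=False)
}
print(label_positions)
```

The resulting dictionary can be passed to any plotting library's annotation call, while the unflagged rows are drawn as unlabeled points.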