alphapepttools.tl.prepare_pca_1d_loadings_data_to_plot
- alphapepttools.tl.prepare_pca_1d_loadings_data_to_plot(data, dim_space, dim, nfeatures, embeddings_name=None)
Prepare the gene loadings (1d) of a PC for plotting.
- Parameters:
data (AnnData | DataFrame) – AnnData to plot.
dim_space (str) – The dimension space used in PCA. Can be either “obs” (default) for sample projection or “var” for feature projection.
dim (int) – The PC number from which to get loadings (1-indexed, i.e. the first PC is 1, not 0).
nfeatures (int) – The number of features with the largest absolute loadings to plot.
embeddings_name (str | None, default: None) – The custom embeddings name used in PCA. If None, uses the default naming convention.
- Return type:
DataFrame
- Returns:
pd.DataFrame containing the top nfeatures loadings for the specified PC dimension.
Examples
Get top contributing features for a principal component:
import anndata as ad
import pandas as pd
import numpy as np
import alphapepttools as at

# Create a 5x5 dataset where 4 proteins are core (no missing values)
X = np.array(
    [
        [10.5, 12.3, 11.8, 9.2, np.nan],  # Sample 1
        [11.2, 13.1, 12.5, 10.1, 7.5],  # Sample 2
        [9.8, 11.9, 10.2, 8.9, np.nan],  # Sample 3
        [12.1, 14.2, 13.3, 11.3, 8.2],  # Sample 4
        [10.9, 12.7, 11.5, 9.8, np.nan],  # Sample 5
    ]
)
adata = ad.AnnData(
    X=X,
    obs=pd.DataFrame({"sample": ["S1", "S2", "S3", "S4", "S5"]}),
    var=pd.DataFrame({"protein": ["P1", "P2", "P3", "P4", "P5"], "is_core": [True, True, True, True, False]}),
)

# Run PCA on observation space
at.tl.pca(adata, meta_data_mask_column_name="is_core", n_comps=2, dim_space="obs")

# Get top 3 protein loadings for PC1
loadings_df = at.tl.prepare_pca_1d_loadings_data_to_plot(
    adata,
    dim_space="obs",  # Since PCA was on obs, loadings are in varm
    dim=1,  # PC1
    nfeatures=3,  # Top 3 proteins
)
display(loadings_df)

# DataFrame contains:
# - feature: Protein names (P1, P2, P3, P4)
# - dim_loadings: Loading values for PC1
# - abs_loadings: Absolute loading values
# - index_int: Ranking index for plotting
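To make "top absolute loadings" and the 1-indexed dim concrete, the selection can be sketched with plain NumPy and pandas. This is only an illustration under stated assumptions, not the alphapepttools implementation: the helper top_abs_loadings and the toy loadings matrix are hypothetical, the column names are borrowed from the output described above, and the exact meaning and ordering of index_int in the real function may differ.

```python
import numpy as np
import pandas as pd


def top_abs_loadings(loadings, feature_names, dim, nfeatures):
    """Hypothetical helper: pick the nfeatures largest |loadings| on PC `dim` (1-indexed)."""
    n_pcs = loadings.shape[1]
    if dim < 1 or dim > n_pcs:
        raise ValueError(f"dim must be between 1 and {n_pcs}, got {dim}")

    # Convert the 1-indexed PC number to a 0-indexed column of the loadings matrix
    pc = loadings[:, dim - 1]

    df = pd.DataFrame(
        {
            "feature": feature_names,
            "dim_loadings": pc,
            "abs_loadings": np.abs(pc),
        }
    )

    # Rank by magnitude so strongly negative loadings are kept as well
    df = df.sort_values("abs_loadings", ascending=False).head(nfeatures)
    df = df.reset_index(drop=True)

    # Assumed ranking column (1 = largest magnitude); the library's convention may differ
    df["index_int"] = range(1, len(df) + 1)
    return df


# Toy loadings matrix: 4 features (rows) x 2 PCs (columns), values made up for illustration
loadings = np.array(
    [
        [0.5, 0.1],
        [-0.7, 0.2],
        [0.1, -0.9],
        [0.3, 0.4],
    ]
)
top = top_abs_loadings(loadings, ["P1", "P2", "P3", "P4"], dim=1, nfeatures=3)
print(top)
```

Sorting on the absolute value rather than the signed loading is what lets a feature such as P2 here, with a loading of -0.7, rank first even though its signed value is the smallest.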