alphapepttools.tl.prepare_pca_1d_loadings_data_to_plot

alphapepttools.tl.prepare_pca_1d_loadings_data_to_plot(data, dim_space, dim, nfeatures, embeddings_name=None)

Prepare the 1D feature loadings of a single principal component (PC) for plotting.

Parameters:
  • data (AnnData | DataFrame) – AnnData to plot.

  • dim_space (str) – The dimension space used in PCA. Either “obs” for sample projection or “var” for feature projection.

  • dim (int) – The PC number from which to get loadings (1-indexed, i.e. the first PC is 1, not 0).

  • nfeatures (int) – The number of features to plot, chosen as those with the largest absolute loadings.

  • embeddings_name (str | None (default: None)) – The custom embeddings name used in PCA. If None, uses default naming convention.
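To make the dim_space and dim parameters concrete, here is a minimal NumPy sketch of what “loadings” look like in each projection. It is not the alphapepttools implementation: the helper pca_loadings is hypothetical, it uses a plain SVD on a column-centered matrix, and how the library scales or stores its loadings is not stated on this page.

```python
import numpy as np

# Toy matrix: 5 samples (rows) x 4 features (columns), no missing values.
X = np.array(
    [
        [10.5, 12.3, 11.8, 9.2],
        [11.2, 13.1, 12.5, 10.1],
        [9.8, 11.9, 10.2, 8.9],
        [12.1, 14.2, 13.3, 11.3],
        [10.9, 12.7, 11.5, 9.8],
    ]
)


def pca_loadings(matrix: np.ndarray, n_comps: int) -> np.ndarray:
    """Hypothetical helper: loadings of shape (n_columns, n_comps) via SVD."""
    centered = matrix - matrix.mean(axis=0)
    # Rows of vt are the principal axes in the space of the matrix columns.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_comps].T


# "obs"-style PCA: samples are projected, so each feature gets a loading.
feature_loadings = pca_loadings(X, n_comps=2)  # shape (4, 2)

# "var"-style PCA: features are projected, so each sample gets a loading.
sample_loadings = pca_loadings(X.T, n_comps=2)  # shape (5, 2)

# dim is 1-indexed in the documented function, so PC1 maps to column 0 here.
dim = 1
pc1_feature_loadings = feature_loadings[:, dim - 1]
```

In this sketch the “obs” case yields one loading per feature and the “var” case one loading per sample, which is why the choice of dim_space determines what ends up on the plot.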

Return type:

DataFrame

Returns:

DataFrame containing the top nfeatures loadings for the specified PC dimension.

Examples

Get top contributing features for a principal component:

import anndata as ad
import pandas as pd
import numpy as np
import alphapepttools as at

# Create a 5x5 dataset where 4 proteins are core (no missing values)
X = np.array(
    [
        [10.5, 12.3, 11.8, 9.2, np.nan],  # Sample 1
        [11.2, 13.1, 12.5, 10.1, 7.5],  # Sample 2
        [9.8, 11.9, 10.2, 8.9, np.nan],  # Sample 3
        [12.1, 14.2, 13.3, 11.3, 8.2],  # Sample 4
        [10.9, 12.7, 11.5, 9.8, np.nan],  # Sample 5
    ]
)

adata = ad.AnnData(
    X=X,
    obs=pd.DataFrame({"sample": ["S1", "S2", "S3", "S4", "S5"]}),
    var=pd.DataFrame({"protein": ["P1", "P2", "P3", "P4", "P5"], "is_core": [True, True, True, True, False]}),
)

# Run PCA on observation space
at.tl.pca(adata, meta_data_mask_column_name="is_core", n_comps=2, dim_space="obs")

# Get top 3 protein loadings for PC1
loadings_df = at.tl.prepare_pca_1d_loadings_data_to_plot(
    adata,
    dim_space="obs",  # Since PCA was on obs, loadings are in varm
    dim=1,  # PC1
    nfeatures=3,  # Top 3 proteins
)
print(loadings_df)  # In a notebook, display(loadings_df) renders a formatted table instead

# DataFrame contains:
# - feature: Protein names (the top 3 among the core proteins P1, P2, P3, P4)
# - dim_loadings: Loading values for PC1
# - abs_loadings: Absolute loading values
# - index_int: Ranking index for plotting
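The column layout listed above (feature, dim_loadings, abs_loadings, index_int) can be mimicked with plain pandas to see how a “top absolute loadings” table might be assembled. This is a sketch under assumptions, not the library's code: top_abs_loadings is a hypothetical helper, the loading values are made up, and treating index_int as a 1-based rank is a guess based only on its description as a ranking index.

```python
import numpy as np
import pandas as pd


def top_abs_loadings(
    loadings: np.ndarray, feature_names: list[str], dim: int, nfeatures: int
) -> pd.DataFrame:
    """Hypothetical helper: rank features by |loading| on one PC (dim is 1-indexed)."""
    if not 1 <= dim <= loadings.shape[1]:
        raise ValueError(f"dim must be between 1 and {loadings.shape[1]}, got {dim}")
    table = pd.DataFrame(
        {
            "feature": feature_names,
            "dim_loadings": loadings[:, dim - 1],  # 1-indexed PC -> 0-indexed column
        }
    )
    table["abs_loadings"] = table["dim_loadings"].abs()
    # Keep the nfeatures rows with the largest absolute loading, strongest first.
    table = (
        table.sort_values("abs_loadings", ascending=False)
        .head(nfeatures)
        .reset_index(drop=True)
    )
    # Assumption: index_int is a simple 1..n rank used as a plotting position.
    table["index_int"] = range(1, len(table) + 1)
    return table


# Made-up loadings for 4 features on 2 PCs (illustrative numbers only).
toy_loadings = np.array(
    [
        [0.50, -0.10],
        [-0.70, 0.20],
        [0.30, 0.90],
        [0.40, -0.35],
    ]
)
top3 = top_abs_loadings(toy_loadings, ["P1", "P2", "P3", "P4"], dim=1, nfeatures=3)
print(top3)
```

Ranking by absolute value while keeping the signed loading in a separate column means a strongly negative contributor (P2 in the toy data) is retained, and its direction can still be shown on the plot.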