alphapepttools.tl.prepare_pca_2d_loadings_data_to_plot

alphapepttools.tl.prepare_pca_2d_loadings_data_to_plot(data, loadings_name, pc_x, pc_y, nfeatures, dim_space)

Prepare a DataFrame with PCA feature loadings for 2D plotting.

This function extracts the loadings of two specified principal components (PCs) from an AnnData object, filters features that contributed to the PCA (non-zero loadings), and flags the top nfeatures for each selected PC dimension.

Parameters:
  • data (AnnData) – The AnnData object containing PCA results.

  • loadings_name (str) – The key where PCA loadings are stored.

  • pc_x (int) – The first principal component index (1-based) to extract loadings for.

  • pc_y (int) – The second principal component index (1-based) to extract loadings for.

  • nfeatures (int) – Number of top features per PC to highlight based on absolute loadings.

  • dim_space (str) – The dimension space used in PCA. Can be either “obs” or “var”.

Return type:

DataFrame

Returns:

DataFrame containing the loadings for the selected PCs, the feature names, and boolean columns indicating whether a feature was used in the PCA and whether it is among the top features in either dimension.
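The selection logic described above (pick two PCs by 1-based index, keep features with non-zero loadings, flag the top nfeatures per PC by absolute loading) can be sketched with plain NumPy and pandas. This is an illustrative approximation, not the library's implementation: the helper name `sketch_top_loadings`, the assumed loadings layout (features in rows, components in columns), and the made-up loading values are all assumptions introduced here.

```python
import numpy as np
import pandas as pd


def sketch_top_loadings(loadings, feature_names, pc_x, pc_y, nfeatures):
    """Hypothetical helper mirroring the documented behaviour (not library code)."""
    # Assumption: rows are features, columns are components.
    # Convert the 1-based PC indices to 0-based column positions.
    df = pd.DataFrame(
        {
            "feature": feature_names,
            "dim1_loadings": loadings[:, pc_x - 1],
            "dim2_loadings": loadings[:, pc_y - 1],
        }
    )

    # Keep only features that contributed to the PCA (any non-zero loading).
    used = np.any(loadings != 0, axis=1)
    df = df[used].reset_index(drop=True)

    # Rank by absolute loading on each of the two selected PCs.
    df["abs_dim1"] = df["dim1_loadings"].abs()
    df["abs_dim2"] = df["dim2_loadings"].abs()
    top_x = df.nlargest(nfeatures, "abs_dim1").index
    top_y = df.nlargest(nfeatures, "abs_dim2").index

    # A feature is flagged if it is a top feature in either dimension.
    df["is_top"] = df.index.isin(top_x.union(top_y))
    return df


# Made-up loadings for five features and two components; P5 did not contribute.
loadings = np.array(
    [
        [0.8, 0.1],
        [0.1, -0.9],
        [-0.5, 0.3],
        [0.2, 0.2],
        [0.0, 0.0],
    ]
)
features = ["P1", "P2", "P3", "P4", "P5"]
result = sketch_top_loadings(loadings, features, pc_x=1, pc_y=2, nfeatures=1)
print(result)
```

Ranking on the absolute value means a strongly negative loading (P2 on the second component here) is flagged just like a strongly positive one.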

Examples

Prepare 2D loadings data for biplot visualization:

import anndata as ad
import pandas as pd
import numpy as np
import alphapepttools as at

# Create a 5x5 dataset where 4 proteins are core (no missing values)
X = np.array(
    [
        [10.5, 12.3, 11.8, 9.2, np.nan],  # Sample 1
        [11.2, 13.1, 12.5, 10.1, 7.5],  # Sample 2
        [9.8, 11.9, 10.2, 8.9, np.nan],  # Sample 3
        [12.1, 14.2, 13.3, 11.3, 8.2],  # Sample 4
        [10.9, 12.7, 11.5, 9.8, np.nan],  # Sample 5
    ]
)

adata = ad.AnnData(
    X=X,
    obs=pd.DataFrame({"sample": ["S1", "S2", "S3", "S4", "S5"]}),
    var=pd.DataFrame({"protein": ["P1", "P2", "P3", "P4", "P5"], "is_core": [True, True, True, True, False]}),
)

# Run PCA on observation space
at.tl.pca(adata, meta_data_mask_column_name="is_core", n_comps=2, dim_space="obs")

# Get loadings for PC1 vs PC2 with top 2 features highlighted
loadings_2d = at.tl.prepare_pca_2d_loadings_data_to_plot(
    adata,
    loadings_name="PCs_obs",  # Default loadings key
    pc_x=1,  # PC1
    pc_y=2,  # PC2
    nfeatures=2,  # Top 2 features per PC
    dim_space="obs",
)
print(loadings_2d)  # in a Jupyter notebook, display(loadings_2d) renders a table instead

# DataFrame contains:
# - feature: Protein names (only P1-P4, P5 excluded as not core)
# - dim1_loadings: Loading values for PC1
# - dim2_loadings: Loading values for PC2
# - abs_dim1, abs_dim2: Absolute loading values
# - is_top: Boolean flag for top features in either dimension
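As a follow-up, the is_top column can be used to decide which points to annotate in a biplot. The sketch below does not call alphapepttools; it builds a small stand-in DataFrame with the column names listed above and made-up loading values (flags chosen as if one top feature per PC had been requested), then selects the rows to label.

```python
import pandas as pd

# Stand-in for the returned DataFrame; values are invented for illustration.
loadings_2d = pd.DataFrame(
    {
        "feature": ["P1", "P2", "P3", "P4"],
        "dim1_loadings": [0.6, -0.2, 0.7, 0.1],
        "dim2_loadings": [0.1, 0.8, -0.3, -0.5],
        "is_top": [False, True, True, False],
    }
)

# Only flagged features get a text label; all features would still be drawn.
to_label = loadings_2d.loc[
    loadings_2d["is_top"], ["feature", "dim1_loadings", "dim2_loadings"]
]

for name, x, y in to_label.itertuples(index=False):
    print(f"{name}: ({x:+.2f}, {y:+.2f})")
```

Each printed pair is the (x, y) position at which the label would be placed in the plot of the two selected PCs.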