alphapepttools.tl.prepare_scree_data_to_plot


alphapepttools.tl.prepare_scree_data_to_plot(adata, n_pcs, dim_space, embeddings_name=None)

Prepare scree plot data from an AnnData object.

Parameters:
  • adata (AnnData) – AnnData object containing PCA results.

  • n_pcs (int) – Number of principal components to include.

  • dim_space (str) – The dimension space in which PCA was computed, either “obs” or “var”.

  • embeddings_name (str | None (default: None)) – Name under which custom PCA embeddings were stored, or None to use the default name.

Return type:

DataFrame

Returns:

pd.DataFrame – DataFrame with the PC numbers and explained variance values of the first n_pcs principal components.
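As a rough illustration of what such a table contains, the sketch below derives PC numbers and explained variance ratios from a complete (NaN-free) matrix using a plain SVD-based PCA. It is not the alphapepttools implementation: the helper `scree_table_from_matrix` is hypothetical, the column names are assumed from the comments in the example that follows, and the library may compute or normalize variance differently.

```python
import numpy as np
import pandas as pd


def scree_table_from_matrix(X: np.ndarray, n_pcs: int) -> pd.DataFrame:
    """Hypothetical helper: build a scree table from a complete (no-NaN) matrix.

    Rows of X are treated as observations and columns as features.
    """
    # Center each feature (column) so PCA captures variance around the mean
    Xc = X - X.mean(axis=0)
    # Squared singular values of the centered matrix are proportional to the
    # variance captured by each principal component
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s**2
    # Fraction of total variance per component, keeping only the first n_pcs
    ratio = (var / var.sum())[:n_pcs]
    return pd.DataFrame(
        {
            "PC": np.arange(1, len(ratio) + 1),
            "explained_variance": ratio,
            "explained_variance_percent": ratio * 100,
        }
    )


# Core proteins P1-P4 from the Examples section (5 samples x 4 proteins, no NaNs)
X = np.array(
    [
        [10.5, 12.3, 11.8, 9.2],
        [11.2, 13.1, 12.5, 10.1],
        [9.8, 11.9, 10.2, 8.9],
        [12.1, 14.2, 13.3, 11.3],
        [10.9, 12.7, 11.5, 9.8],
    ]
)
scree = scree_table_from_matrix(X, n_pcs=2)
print(scree)
```

Because SVD returns singular values in descending order, the explained variance column is non-increasing, which is the shape a scree plot visualizes.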

Examples

Prepare data for a scree plot after running PCA:

import anndata as ad
import pandas as pd
import numpy as np
import alphapepttools as at

# Create a 5x5 dataset where 4 proteins are core (no missing values)
X = np.array(
    [
        [10.5, 12.3, 11.8, 9.2, np.nan],  # Sample 1
        [11.2, 13.1, 12.5, 10.1, 7.5],  # Sample 2
        [9.8, 11.9, 10.2, 8.9, np.nan],  # Sample 3
        [12.1, 14.2, 13.3, 11.3, 8.2],  # Sample 4
        [10.9, 12.7, 11.5, 9.8, np.nan],  # Sample 5
    ]
)

adata = ad.AnnData(
    X=X,
    obs=pd.DataFrame({"sample": ["S1", "S2", "S3", "S4", "S5"]}),
    var=pd.DataFrame({"protein": ["P1", "P2", "P3", "P4", "P5"], "is_core": [True, True, True, True, False]}),
)

# Run PCA on observation space (samples)
at.tl.pca(adata, meta_data_mask_column_name="is_core", n_comps=2, dim_space="obs")

# Prepare scree plot data
scree_data = at.tl.prepare_scree_data_to_plot(adata, n_pcs=2, dim_space="obs")
print(scree_data)  # in a Jupyter notebook, display(scree_data) renders a table

# DataFrame contains:
# - PC: Principal component number (1, 2)
# - explained_variance: Proportion of variance explained (0-1)
# - explained_variance_percent: Variance explained as percentage (0-100)
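A scree table is often read alongside the cumulative share of variance. The minimal sketch below assumes the column names listed in the comments above and uses made-up illustrative values (not alphapepttools output) to find the smallest number of PCs reaching a chosen threshold; the 90% cutoff and the `cumulative_percent` column are hypothetical choices for illustration.

```python
import pandas as pd

# Hypothetical scree table with the documented column names;
# the values are illustrative only, not output from alphapepttools
scree_data = pd.DataFrame(
    {
        "PC": [1, 2, 3, 4],
        "explained_variance": [0.72, 0.19, 0.06, 0.03],
    }
)
scree_data["explained_variance_percent"] = scree_data["explained_variance"] * 100

# Running total of variance explained by the first k components
scree_data["cumulative_percent"] = scree_data["explained_variance_percent"].cumsum()

# Smallest number of PCs whose cumulative share reaches the threshold
threshold = 90.0
n_needed = int(scree_data.loc[scree_data["cumulative_percent"] >= threshold, "PC"].iloc[0])

print(scree_data)
print(f"PCs needed for {threshold}% of the variance: {n_needed}")
```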