alphapepttools.pl.volcano

alphapepttools.pl.volcano(data, x_column='log2fc', y_column='-log10(p_value)', ax=None, layers=None, color_dict=None, x_thresholds=(-1, 1), y_thresholds=np.float64(1.3010299956639813), label_layers=None, display_id_column=None, max_labels=None, x_label_anchors=None, y_display_start=1, y_padding_factor=4, xlims=None, ylims=None, scatter_kwargs=None, line_kwargs=None, label_kwargs=None, legend=None, legend_kwargs=None, default_color=(0.8274509803921568, 0.8274509803921568, 0.8274509803921568, 1.0), default_group='data')

Create a volcano plot to visualize differential expression results.

Volcano plots visualize differential expression results by plotting fold change (x-axis) against statistical significance (y-axis). This function creates layered scatter plots with threshold lines and optional point labeling.
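The regions of a volcano plot can be made concrete with a small sketch. It assumes the defaults documented for x_thresholds and y_thresholds (fold-change cutoffs at -1 and 1, and a y threshold of -log10(0.05) ≈ 1.301) and treats a point lying exactly on a threshold as not significant, which is an assumption. The helper classify_point is hypothetical and not part of alphapepttools; it only shows which region of the plot a single result falls into.

```python
import math

# Hypothetical defaults mirroring the documented thresholds:
# x_thresholds=(-1, 1) and a p-value cutoff of 0.05 (y threshold -log10(0.05)).
FC_CUTOFFS = (-1.0, 1.0)
Y_CUTOFF = -math.log10(0.05)  # about 1.301


def classify_point(log2fc, p_value, fc_cutoffs=FC_CUTOFFS, y_cutoff=Y_CUTOFF):
    """Return the volcano-plot region of one result (hypothetical helper)."""
    y = -math.log10(p_value)  # y-axis value: larger means more significant
    if y > y_cutoff and log2fc > fc_cutoffs[1]:
        return "upregulated"  # upper right, beyond the right vertical line
    if y > y_cutoff and log2fc < fc_cutoffs[0]:
        return "downregulated"  # upper left, beyond the left vertical line
    # Below the horizontal line or between the two vertical lines
    return "unchanged"


for fc, p in [(2.3, 0.001), (-1.8, 0.01), (2.0, 0.2), (0.1, 0.0001)]:
    print(f"log2fc={fc:+.1f}  p={p:<7}  -> {classify_point(fc, p)}")
```

Note that the example further down classifies genes by fold change alone; combining both cutoffs, as here, is a common alternative.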

Parameters:
  • data (AnnData | DataFrame) – Data containing fold changes and significance statistics, including the columns referenced by x_column, y_column and layers

  • x_column (str (default: 'log2fc')) – Column name for x-axis values (typically log fold change)

  • y_column (str (default: '-log10(p_value)')) – Column name for y-axis values (typically -log10 p-value)

  • ax (Axes | None (default: None)) – Axes to plot on. If None, a new figure is created

  • layers (list[tuple] | None (default: None)) –

    List of layer specifications for hierarchical plotting. Each tuple contains (column_name, value(s), color_key[, scatter_kwargs]). Points are plotted in reverse order, so the first layer is drawn on top. Example: [("gene_type", "housekeeping", "hk_color"), ("significance", "significant", "sig_color", {"s": 100})]

  • color_dict (dict[str, str | tuple] | None (default: None)) – Maps color keys from layers to actual colors. Example: {"hk_color": "blue", "sig_color": "red"}

  • x_thresholds (float | tuple (default: (-1, 1))) – X-axis values for vertical threshold lines. Default (-1, 1) for fold change cutoffs

  • y_thresholds (float | tuple (default: np.float64(1.3010299956639813))) – Y-axis value(s) for horizontal threshold lines. The default, -log10(0.05) ≈ 1.301, corresponds to a p-value cutoff of 0.05

  • label_layers (list[str] | None (default: None)) – Color keys of layers to label. Only points in these layers will have text labels added

  • display_id_column (str | None (default: None)) – Column containing labels to display. If None, uses data index

  • max_labels (int | None (default: None)) – Maximum number of labels to show. Labels are prioritized by y-value

  • x_label_anchors (list[float] | None (default: None)) – X-positions to anchor labels to (for alignment). If None, labels appear at data point positions

  • y_display_start (float (default: 1)) – Starting y-position for stacked labels, relative to the axes (1 = top, 0 = bottom). Default 1

  • y_padding_factor (float (default: 4)) – Vertical spacing multiplier between stacked labels. Default 4

  • xlims (tuple[float, float] | None (default: None)) – X-axis limits. If None, calculated from data with padding

  • ylims (tuple[float, float] | None (default: None)) – Y-axis limits. If None, calculated from data with padding

  • scatter_kwargs (dict | None (default: None)) – Additional arguments passed to the scatter plot (e.g., {"s": 50, "alpha": 0.5})

  • line_kwargs (dict | None (default: None)) – Additional arguments for threshold lines (e.g., {"linewidth": 2, "linestyle": "--"})

  • label_kwargs (dict | None (default: None)) – Additional arguments for axis labels

  • legend (str | Legend | None (default: None)) – Legend specification. If "auto", creates a legend from color_dict

  • legend_kwargs (dict | None (default: None)) – Additional arguments for legend

  • default_color (str | tuple (default: (0.8274509803921568, 0.8274509803921568, 0.8274509803921568, 1.0))) – Color for points not matching any layer. Default light grey (matplotlib "lightgrey", #D3D3D3)

  • default_group (str (default: 'data')) – Name for the default layer containing unassigned points. Default "data"
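To make the interaction between label_layers and max_labels concrete, here is a minimal sketch. It assumes that "prioritized by y-value" means the most significant points (largest y) are labeled first. The helper select_labels and its (label, group, y) tuple layout are hypothetical illustrations, not part of alphapepttools.

```python
def select_labels(points, label_layers, max_labels=None):
    """Pick which points to annotate (hypothetical helper).

    points: list of (label, group, y) tuples, where group is the color key
    of the layer the point was assigned to and y is its y-axis value.
    """
    # Only points in the requested layers are candidates for a text label.
    candidates = [p for p in points if p[1] in label_layers]
    # Assumed priority: largest y (most significant) first.
    candidates.sort(key=lambda p: p[2], reverse=True)
    if max_labels is not None:
        candidates = candidates[:max_labels]
    return [p[0] for p in candidates]


points = [
    ("gene_a", "POI", 2.1),
    ("gene_b", "upregulated", 5.0),  # not in label_layers, never labeled
    ("gene_c", "POI_hypothesis", 3.4),
    ("gene_d", "POI", 0.7),
]
print(select_labels(points, ["POI", "POI_hypothesis"], max_labels=2))
# ['gene_c', 'gene_a']
```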

Return type:

None

Returns:

None

See also

layered_plot – Core layering functionality

add_lines – Add threshold lines

label_plot – Add text labels

Notes

The layering system ensures each point appears in exactly one layer. Points are assigned to the first matching layer in the list. Unassigned points go to the default layer, which is plotted in the background.
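The assignment rule in these notes can be sketched in plain Python. The helper assign_layers is hypothetical and is not how alphapepttools implements layered_plot; it only illustrates first-match assignment, the fallback to default_group, and the back-to-front drawing order that puts the first layer on top. The optional fourth tuple element (per-layer scatter kwargs) is accepted but ignored in this sketch.

```python
def assign_layers(rows, layers, default_group="data"):
    """Assign each row to the first matching layer (hypothetical sketch).

    rows: list of dicts mapping column names to values.
    layers: list of (column, value_or_values, color_key[, scatter_kwargs]).
    Returns (groups, draw_order).
    """
    groups = []
    for row in rows:
        group = default_group
        for layer in layers:
            column, values, color_key = layer[:3]  # ignore optional kwargs
            # A single value is treated like a one-element list of values.
            if not isinstance(values, (list, tuple, set)):
                values = [values]
            if row.get(column) in values:
                group = color_key
                break  # first match wins: each point lands in exactly one layer
        groups.append(group)
    # Drawn back to front: default layer first, then layers in reverse
    # order, so the first layer in the list ends up on top.
    draw_order = [default_group] + [layer[2] for layer in reversed(layers)]
    return groups, draw_order


rows = [
    {"id": "P1", "label": "POI", "status": "upregulated"},
    {"id": "P2", "label": "other", "status": "upregulated"},
    {"id": "P3", "label": "other", "status": "unchanged"},
]
layers = [
    ("label", "POI", "POI"),
    ("status", ["upregulated"], "upregulated", {"s": 100}),
]
groups, draw_order = assign_layers(rows, layers)
print(groups)      # ['POI', 'upregulated', 'data']
print(draw_order)  # ['data', 'upregulated', 'POI']
```

P1 matches both layers but is assigned only to "POI", the first match, so it is drawn once, on top.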

Examples

Create a volcano plot with differential expression data:

import numpy as np
import pandas as pd
import alphapepttools as apt
from alphapepttools.pl import BaseColors

# Generate example differential expression data
rng = np.random.default_rng(seed=42)
testx = rng.normal(0, 1, 300)
testy = -np.cos(testx) + rng.normal(0, 0.2, 300)
# Derive p-values in (0, 1] so that -log10(p) = testy - testy.min() >= 0
testp = 10 ** -(testy - testy.min())

data = pd.DataFrame(
    {
        "id": [f"P{10000 + i}" for i in range(300)],
        "gene": [f"gene_{i}" for i in range(300)],
        "log2fc": testx,
        "pval": testp,
        "neg_log10pval": -np.log10(testp),
    }
)
data.index = data["id"]

# Add differential expression status
data["diff_exp_status"] = data["log2fc"].apply(
    lambda x: "upregulated" if x > 1 else ("downregulated" if x < -1 else "unchanged")
)

# Mark first 10 genes as proteins of interest
data["label"] = "other"
data.loc[data.index[:10], "label"] = "POI"

# Define specific proteins to highlight
pois = ["P10291", "P10292", "P10293", "P10294", "P10295"]

# Define visualization layers (plotted in reverse order)
plot_layers = [
    ("id", pois, "POI_hypothesis"),  # Specific hypothesis proteins on top
    ("label", "POI", "POI"),  # General POI proteins
    ("diff_exp_status", "upregulated", "upregulated"),  # Upregulated
    ("diff_exp_status", "downregulated", "downregulated"),  # Downregulated
    ("diff_exp_status", "unchanged", "unchanged"),  # Background
]

# Define colors for each layer
color_dict = {
    "upregulated": BaseColors.get("orange"),
    "downregulated": BaseColors.get("blue"),
    "unchanged": BaseColors.get("grey"),
    "POI": "black",
    "POI_hypothesis": BaseColors.get("purple", lighten=0.7),
}

# Specify which layers to label
label_layers = ["POI", "POI_hypothesis"]

# Create volcano plot
apt.pl.volcano(
    data=data,
    x_column="log2fc",
    y_column="neg_log10pval",
    color_dict=color_dict,
    layers=plot_layers,
    label_layers=label_layers,
    x_label_anchors=[-3.5, 3.5],  # Anchor labels to left/right
    y_padding_factor=1.7,  # Vertical spacing between labels
    y_display_start=0.75,  # Start labels at 75% from bottom
    xlims=(-6, 6),
)

This creates a volcano plot where:

  • Background points (unchanged) appear in grey

  • Differentially expressed genes are colored orange (up) or blue (down)

  • Proteins of interest (POI) are highlighted in black

  • Specific hypothesis proteins are emphasized in purple on top

  • Only POI and hypothesis proteins receive text labels