alphapepttools.pl.volcano
- alphapepttools.pl.volcano(data, x_column='log2fc', y_column='-log10(p_value)', ax=None, layers=None, color_dict=None, x_thresholds=(-1, 1), y_thresholds=np.float64(1.3010299956639813), label_layers=None, display_id_column=None, max_labels=None, x_label_anchors=None, y_display_start=1, y_padding_factor=4, xlims=None, ylims=None, scatter_kwargs=None, line_kwargs=None, label_kwargs=None, legend=None, legend_kwargs=None, default_color=(0.8274509803921568, 0.8274509803921568, 0.8274509803921568, 1.0), default_group='data')
Create a volcano plot for differential expression visualization
Volcano plots visualize differential expression results by plotting fold change (x-axis) against statistical significance (y-axis). This function creates layered scatter plots with threshold lines and optional point labeling.
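The two default column names, 'log2fc' and '-log10(p_value)', can be derived from summary statistics before plotting. The following is a minimal sketch in plain pandas and NumPy; the input columns mean_treated, mean_control and p_value are hypothetical and not part of alphapepttools.

```python
import numpy as np
import pandas as pd

# Hypothetical input: per-protein mean intensities in two conditions plus raw p-values.
stats = pd.DataFrame(
    {
        "mean_treated": [8.0, 2.0, 4.0],
        "mean_control": [2.0, 8.0, 4.0],
        "p_value": [0.001, 0.01, 0.5],
    },
    index=["P1", "P2", "P3"],
)

# x-axis: log2 fold change (positive means higher in the treated condition).
stats["log2fc"] = np.log2(stats["mean_treated"] / stats["mean_control"])

# y-axis: -log10 of the p-value, so smaller p-values plot higher.
stats["-log10(p_value)"] = -np.log10(stats["p_value"])

print(stats[["log2fc", "-log10(p_value)"]])
```

A table shaped like this matches the default x_column and y_column, so neither argument would need to be overridden.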
- Parameters:
data (AnnData | DataFrame) – Data containing expression values and statistics
x_column (str (default: 'log2fc')) – Column name for x-axis values (typically log fold change)
y_column (str (default: '-log10(p_value)')) – Column name for y-axis values (typically -log10 p-value)
ax (Axes | None (default: None)) – Axes to plot on. If None, creates new figure
layers (list[tuple] | None (default: None)) – List of layer specifications for hierarchical plotting. Each tuple contains (column_name, value(s), color_key[, scatter_kwargs]). Points are plotted in reverse order (first layer on top). Example: [("gene_type", "housekeeping", "hk_color"), ("significance", "significant", "sig_color", {"s": 100})]
color_dict (dict[str, str | tuple] | None (default: None)) – Maps color keys from layers to actual colors. Example: {"hk_color": "blue", "sig_color": "red"}
x_thresholds (float | tuple (default: (-1, 1))) – X-axis values for vertical threshold lines. Default (-1, 1) for fold change cutoffs
y_thresholds (float | tuple (default: np.float64(1.3010299956639813))) – Y-axis values for horizontal threshold lines. Default (-log10(0.05),) for p-value cutoff
label_layers (list[str] | None (default: None)) – Color keys of layers to label. Only points in these layers will have text labels added
display_id_column (str | None (default: None)) – Column containing labels to display. If None, uses data index
max_labels (int | None (default: None)) – Maximum number of labels to show. Labels are prioritized by y-value
x_label_anchors (list[float] | None (default: None)) – X-positions to anchor labels to (for alignment). If None, labels appear at data point positions
y_display_start (float (default: 1)) – Starting y-position for stacked labels (1=top, 0=bottom). Default 1
y_padding_factor (float (default: 4)) – Vertical spacing multiplier between stacked labels. Default 4
xlims (tuple[float, float] | None (default: None)) – X-axis limits. If None, calculated from data with padding
ylims (tuple[float, float] | None (default: None)) – Y-axis limits. If None, calculated from data with padding
scatter_kwargs (dict | None (default: None)) – Additional arguments passed to scatter plot (e.g., {"s": 50, "alpha": 0.5})
line_kwargs (dict | None (default: None)) – Additional arguments for threshold lines (e.g., {"linewidth": 2, "linestyle": "--"})
label_kwargs (dict | None (default: None)) – Additional arguments for axis labels
legend (str | Legend | None (default: None)) – Legend specification. If "auto", creates legend from color_dict
legend_kwargs (dict | None (default: None)) – Additional arguments for legend
default_color (str | tuple (default: (0.8274509803921568, 0.8274509803921568, 0.8274509803921568, 1.0))) – Color for points not matching any layer. Default grey
default_group (str (default: 'data')) – Name for the default layer containing unassigned points. Default "data"
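The default thresholds can be read as cutoffs on fold change and p-value. The sketch below reproduces the default y_thresholds value as -log10(0.05) and uses both defaults to build a status label per point. As documented, x_thresholds and y_thresholds only position the threshold lines, so the classify helper is a hypothetical way to prepare a column for layers, not something the function does itself.

```python
import math

# Default horizontal threshold from the signature: -log10(0.05).
y_threshold = -math.log10(0.05)
x_thresholds = (-1, 1)  # default vertical thresholds on log2 fold change


def classify(log2fc, neg_log10_p, x_thr=x_thresholds, y_thr=y_threshold):
    """Hypothetical helper: label one point relative to the threshold lines."""
    if neg_log10_p < y_thr:
        return "not significant"
    if log2fc >= x_thr[1]:
        return "upregulated"
    if log2fc <= x_thr[0]:
        return "downregulated"
    return "unchanged"


# (log2fc, -log10 p-value) pairs chosen to hit each branch once.
points = [(2.1, 3.0), (-1.5, 2.0), (0.2, 4.0), (3.0, 0.5)]
labels = [classify(x, y) for x, y in points]
print(round(y_threshold, 4), labels)
```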
- Return type:
None
- Returns:
None
See also
layered_plot – Core layering functionality
add_lines – Add threshold lines
label_plot – Add text labels
Notes
The layering system ensures each point appears in exactly one layer. Points are assigned to the first matching layer in the list. Unassigned points go to the default layer (plotted in background).
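The first-match rule described above can be illustrated with a small pandas sketch. This is an assumption-based reconstruction for illustration only, not the implementation used by layered_plot; the helper name assign_layers is hypothetical.

```python
import pandas as pd


def assign_layers(df, layers, default_group="data"):
    """Sketch of first-match assignment; not the library's implementation."""
    assigned = pd.Series(default_group, index=df.index, dtype="object")
    unassigned = pd.Series(True, index=df.index)
    for column, values, color_key, *_ in layers:
        # A layer may name a single value or a collection of values.
        wanted = values if isinstance(values, (list, tuple, set)) else [values]
        # Only points not claimed by an earlier layer can match.
        match = df[column].isin(wanted) & unassigned
        assigned[match] = color_key
        unassigned = unassigned & ~match
    return assigned


df = pd.DataFrame(
    {
        "id": ["P1", "P2", "P3", "P4"],
        "status": ["upregulated", "upregulated", "unchanged", "downregulated"],
    }
)
layers = [
    ("id", ["P1"], "poi"),  # listed first, so it claims P1
    ("status", "upregulated", "up"),
    ("status", "downregulated", "down", {"s": 100}),  # extra kwargs ignored here
]
result = assign_layers(df, layers)
print(result.tolist())
```

In this sketch P1 matches both the "poi" and "up" layers but is assigned only to "poi", and P3 matches no layer and falls back to the default group.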
Examples
Create a volcano plot with differential expression data:
    import numpy as np
    import pandas as pd
    import alphapepttools as apt
    from alphapepttools.pl import BaseColors

    # Generate example differential expression data
    rng = np.random.default_rng(seed=42)
    testx = rng.normal(0, 1, 300)
    testy = -np.cos(testx) + rng.normal(0, 0.2, 300)
    testp = 10 ** -(testy - min(testy))
    data = pd.DataFrame(
        {
            "id": [f"P{10000 + i}" for i in range(300)],
            "gene": [f"gene_{i}" for i in range(300)],
            "log2fc": testx,
            "pval": testp,
            "neg_log10pval": -np.log10(testp),
        }
    )
    data.index = data["id"]

    # Add differential expression status
    data["diff_exp_status"] = data["log2fc"].apply(
        lambda x: "upregulated" if x > 1 else ("downregulated" if x < -1 else "unchanged")
    )

    # Mark first 10 genes as proteins of interest
    data["label"] = "other"
    data.loc[data.index[:10], "label"] = "POI"

    # Define specific proteins to highlight
    pois = ["P10291", "P10292", "P10293", "P10294", "P10295"]

    # Define visualization layers (plotted in reverse order)
    plot_layers = [
        ("id", pois, "POI_hypothesis"),  # Specific hypothesis proteins on top
        ("label", "POI", "POI"),  # General POI proteins
        ("diff_exp_status", "upregulated", "upregulated"),  # Upregulated
        ("diff_exp_status", "downregulated", "downregulated"),  # Downregulated
        ("diff_exp_status", "unchanged", "unchanged"),  # Background
    ]

    # Define colors for each layer
    color_dict = {
        "upregulated": BaseColors.get("orange"),
        "downregulated": BaseColors.get("blue"),
        "unchanged": BaseColors.get("grey"),
        "POI": "black",
        "POI_hypothesis": BaseColors.get("purple", lighten=0.7),
    }

    # Specify which layers to label
    label_layers = ["POI", "POI_hypothesis"]

    # Create volcano plot
    apt.pl.volcano(
        data=data,
        x_column="log2fc",
        y_column="neg_log10pval",
        color_dict=color_dict,
        layers=plot_layers,
        label_layers=label_layers,
        x_label_anchors=[-3.5, 3.5],  # Anchor labels to left/right
        y_padding_factor=1.7,  # Vertical spacing between labels
        y_display_start=0.75,  # Start labels at 75% from bottom
        xlims=(-6, 6),
    )
This creates a volcano plot where:
- Background points (unchanged) appear in grey
- Differentially expressed genes are colored orange (up) or blue (down)
- Proteins of interest (POI) are highlighted in black
- Specific hypothesis proteins are emphasized in purple on top
- Only POI and hypothesis proteins receive text labels
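When many points fall into the labeled layers, max_labels caps how many receive text. The sketch below previews which points would keep a label under one reading of "prioritized by y-value", namely that larger y-values come first. That direction is an assumption, since the parameter description does not state it, and the small table with its layer column is hypothetical example data.

```python
import pandas as pd

# Hypothetical table: 'layer' holds the color key each point was assigned to.
points = pd.DataFrame(
    {
        "neg_log10pval": [0.4, 3.2, 1.1, 2.5, 0.9],
        "layer": ["POI", "POI", "unchanged", "POI_hypothesis", "POI"],
    },
    index=["P10000", "P10001", "P10002", "P10291", "P10003"],
)

label_layers = ["POI", "POI_hypothesis"]
max_labels = 3

# Only points in the labeled layers are candidates for text labels.
candidates = points[points["layer"].isin(label_layers)]

# Assumption: larger y-values are labeled first.
to_label = candidates.nlargest(max_labels, "neg_log10pval").index.tolist()
print(to_label)
```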