alphapepttools.pl.Plots#
- class alphapepttools.pl.Plots(config={'axes': {'label_size': 10, 'tick_size': 10, 'title_size': 10}, 'font_sizes': {'large': 12, 'medium': 10, 'small': 8}, 'highlight_colors': {'general': '#5ec962', 'high': '#9ecae1', 'low': '#fdae6b'}, 'legend': {'font_size': 10, 'title_size': 10}, 'linewidths': {'large': 1.25, 'medium': 0.5, 'small': 0.25}, 'marker_sizes': {'large': 15, 'medium': 10, 'small': 5}, 'na_color': (0.8274509803921568, 0.8274509803921568, 0.8274509803921568, 1.0), 'na_identifiers': ['nan'], 'preset_sizes': {'0.25': 22.5, '0.5': 45, '1': 89, '1.5': 135, '2': 183}, 'resolution': {'dpi': 300}})#
Class for creating figures with matplotlib
Configuration for matplotlib plots is loaded from the defaults module as a dictionary and used to generate consistent plots.
Overview#
The Plots class provides alphapepttools styled visualization methods for proteomics and other biological data. All methods accept either pandas DataFrames or AnnData objects and use column names to specify data to plot.
Available Plot Types#
- Distribution plots:
histogram(): Histograms with optional color grouping
violinplot(): Violin plots showing distribution density
boxplot(): Box plots showing quartiles and outliers
barplot(): Bar plots with error bars (mean ± std)
- Relationship plots:
scatter(): Scatter plots with flexible coloring options
rank_median_plot(): Ranked median intensity plots
Convenience wrapper plots: These plots summarize common visualization tasks in proteomics for ease of use.
plot_pca(): PCA scatter plots with optional labeling
scree_plot(): Eigenvalue/variance explained plots
plot_pca_loadings(): 1D loading plots for a single PC
plot_pca_loadings_2d(): 2D loading plots for two PCs
Common Parameters#
Most plotting methods share these common parameters:
- data
Input data object
- ax
Axes to plot on (created in alphapepttools style if not provided)
Notes
All methods are class methods and can be called directly without instantiation
Color handling is flexible: direct colors, categorical mapping, or continuous gradients
Plots automatically handle both DataFrame and AnnData inputs
Configuration is loaded as a dictionary from defaults.plot_settings via ‘defaults.plot_settings.to_dict()’
See also
add_legend_to_axes(): Add legends to plots
label_plot(): Add labels to scatter plots
add_lines(): Add reference lines to plots
Methods table#
| barplot() | Plot a bar chart from a DataFrame or AnnData object |
| boxplot() | Plot a box plot from a DataFrame or AnnData object |
| histogram() | Plot a histogram from a DataFrame or AnnData object |
| plot_pca() | PCA scatter plot showing principal component projections. |
| plot_pca_loadings() | 1D loadings plot showing top features contributing to a principal component. |
| plot_pca_loadings_2d() | 2D loadings plot showing top features contributing to two principal components. |
| rank_median_plot() | Rank plot showing median intensities across samples. |
| scatter() | Plot a scatterplot from a DataFrame or AnnData object |
| scree_plot() | Scree plot showing explained variance for each principal component. |
| violinplot() | Plot a violin plot from a DataFrame or AnnData object |
Methods#
- classmethod Plots.barplot(ax, data, grouping_column=None, value_column=None, direct_columns=None, color=(np.float64(0.21299500192233756), np.float64(0.5114186851211072), np.float64(0.730795847750865), np.float64(1.0)), color_dict=None)#
Plot a bar chart from a DataFrame or AnnData object
Creates a bar plot showing means with error bars (standard deviation) for grouped data. Each bar represents the mean of values within a group, with error bars showing the standard deviation. Bars have semi-transparent fill with opaque black outlines.
Two modes of operation:
1. Grouping mode: Use grouping_column/value_column to group data by categories
2. Direct mode: Use direct_columns to compare multiple columns directly
- Parameters:
ax (Axes) – Matplotlib axes object to plot on.
data (AnnData | DataFrame) – Data containing grouping and value columns or direct columns to plot.
grouping_column (list[str] | None (default: None)) – Column containing the groups to compare (categorical). Used with value_column for grouped comparisons.
value_column (list[str] | None (default: None)) – Column whose values should be plotted (numeric). Used with grouping_column for grouped comparisons.
direct_columns (list[str] | None (default: None)) – List of column names to compare directly. Each column becomes a separate bar. Overrides grouping_column and value_column.
color (tuple (default: BaseColors.get(“blue”))) – Default color for all bars.
color_dict (dict | None (default: None)) – Dictionary mapping group labels to specific colors. Overrides the color parameter for specified groups.
- Return type:
None
Examples
Grouped comparison (long format):
    import pandas as pd
    import anndata as ad
    from alphapepttools.pl.figure import create_figure
    from alphapepttools.pl.plots import Plots

    data = pd.DataFrame({"intensity": [1, 2, 3, 4, 5, 6, 7]})
    obs = pd.DataFrame({"group": ["A", "A", "B", "B", "B", "C", "C"]})
    adata = ad.AnnData(X=data.values, obs=obs, var=pd.DataFrame(index=data.columns))

    fig, axm = create_figure(1, 1, figsize=(6, 4))
    ax = axm.next()
    Plots.barplot(
        ax=ax,
        data=adata,
        grouping_column="group",
        value_column="intensity",
        color_dict={"A": "red", "B": "green", "C": "blue"},
    )
Direct column comparison (wide format):
    import pandas as pd
    import anndata as ad
    from alphapepttools.pl.figure import create_figure
    from alphapepttools.pl.plots import Plots

    data = pd.DataFrame({"protein1": [1, 2, 3], "protein2": [4, 5, 6], "protein3": [7, 8, 9]})
    adata = ad.AnnData(X=data.values, var=pd.DataFrame(index=data.columns))

    fig, axm = create_figure(1, 1, figsize=(6, 4))
    ax = axm.next()
    Plots.barplot(
        ax=ax,
        data=adata,
        direct_columns=["protein1", "protein2", "protein3"],
    )
Notes
Error bars show standard deviation of values within each group
Bars have 50% transparency with opaque black outlines
When using direct_columns, each column’s mean is calculated across all rows
Missing values (NaN) are excluded from mean and std calculations
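The statistics behind each bar can be reproduced outside the plotting call. The following is a minimal sketch in plain pandas, not the library's internal code; the example data is made up, and the use of the sample standard deviation (ddof=1) is an assumption, since the documentation does not state which variant barplot() uses.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data; one value is missing on purpose.
df = pd.DataFrame(
    {
        "group": ["A", "A", "B", "B", "B", "C", "C"],
        "intensity": [1.0, 2.0, 3.0, np.nan, 5.0, 6.0, 7.0],
    }
)

# pandas skips NaN by default in mean() and std().
# std() here is the sample standard deviation (ddof=1); this choice is an
# assumption and may differ from what barplot() computes internally.
summary = df.groupby("group")["intensity"].agg(["mean", "std"])
print(summary)
```

Each row of summary corresponds to one bar (mean) and its error bar (std).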
- classmethod Plots.boxplot(ax, data, grouping_column=None, value_column=None, direct_columns=None, color=(np.float64(0.21299500192233756), np.float64(0.5114186851211072), np.float64(0.730795847750865), np.float64(1.0)), color_dict=None)#
Plot a box plot from a DataFrame or AnnData object
Creates a box plot showing the distribution of values for grouped data. Each box shows the median, quartiles, and outliers for values within a group. Boxes have semi-transparent fill with opaque black outlines, medians, whiskers, and caps.
Two modes of operation:
1. Grouping mode: Use grouping_column/value_column to group data by categories
2. Direct mode: Use direct_columns to compare multiple columns directly
- Parameters:
ax (Axes) – Matplotlib axes object to plot on.
data (AnnData | DataFrame) – Data containing grouping and value columns or direct columns to plot.
grouping_column (list[str] | None (default: None)) – Column containing the groups to compare (categorical). Used with value_column for grouped comparisons.
value_column (list[str] | None (default: None)) – Column whose values should be plotted (numeric). Used with grouping_column for grouped comparisons.
direct_columns (list[str] | None (default: None)) – List of column names to compare directly. Each column becomes a separate box. Overrides grouping_column and value_column.
color (tuple (default: BaseColors.get(“blue”))) – Default color for all boxes.
color_dict (dict | None (default: None)) – Dictionary mapping group labels to specific colors. Overrides the color parameter for specified groups.
- Return type:
None
Examples
Grouped comparison (long format):
    import pandas as pd
    import anndata as ad
    from alphapepttools.pl.figure import create_figure
    from alphapepttools.pl.plots import Plots

    data = pd.DataFrame({"intensity": [1, 2, 3, 4, 5, 6, 7]})
    obs = pd.DataFrame({"group": ["A", "A", "B", "B", "B", "C", "C"]})
    adata = ad.AnnData(X=data.values, obs=obs, var=pd.DataFrame(index=data.columns))

    fig, axm = create_figure(1, 1, figsize=(6, 4))
    ax = axm.next()
    Plots.boxplot(
        ax=ax,
        data=adata,
        grouping_column="group",
        value_column="intensity",
        color_dict={"A": "red", "B": "green", "C": "blue"},
    )
Direct column comparison (wide format):
    import pandas as pd
    import anndata as ad
    from alphapepttools.pl.figure import create_figure
    from alphapepttools.pl.plots import Plots

    data = pd.DataFrame({"protein1": [1, 2, 3], "protein2": [4, 5, 6], "protein3": [7, 8, 9]})
    adata = ad.AnnData(X=data.values, var=pd.DataFrame(index=data.columns))

    fig, axm = create_figure(1, 1, figsize=(6, 4))
    ax = axm.next()
    Plots.boxplot(
        ax=ax,
        data=adata,
        direct_columns=["protein1", "protein2", "protein3"],
    )
Notes
Boxes show median (center line), quartiles (box edges), and outliers (points)
Whiskers extend to the most extreme data point that lies within 1.5 * IQR of the box edges; points beyond that are drawn as outliers
Boxes have 50% transparency with opaque black outlines
When using direct_columns, each column’s distribution is shown separately
Missing values (NaN) are excluded from the distribution calculations
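To make the box and whisker definitions in the notes concrete, the sketch below computes them by hand with NumPy. It assumes the 1.5 * IQR rule stated above and NumPy's default linear percentile interpolation; the data is made up and the variable names are illustrative only.

```python
import numpy as np

# Hypothetical values with one missing entry and one obvious outlier.
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 30.0, np.nan])
values = values[~np.isnan(values)]  # NaN is excluded before any statistics

q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Fences sit 1.5 * IQR beyond the box edges.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme data points inside the fences.
whisker_low = values[values >= lower_fence].min()
whisker_high = values[values <= upper_fence].max()
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(q1, median, q3, whisker_low, whisker_high, outliers)
```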
- classmethod Plots.histogram(data, value_column, color_map_column=None, bins=10, ax=None, color='blue', palette=None, color_dict=None, legend=None, hist_kwargs=None, legend_kwargs=None, xlim=None, ylim=None)#
Plot a histogram from a DataFrame or AnnData object
Creates a histogram showing the distribution of values, with optional grouping by a categorical column. When grouping is used, overlapping histograms are created with the same bin edges for easy comparison.
- Parameters:
data (DataFrame | AnnData) – Data to plot, must contain the value_column and optionally the color_map_column for grouping.
value_column (str) – Column containing numeric values to plot in the histogram.
color_map_column (str | None (default: None)) – Column for categorical grouping. Each unique value gets its own colored histogram overlay. NaN values are converted to strings.
bins (int (default: 10)) – Number of bins for the histogram.
ax (Axes | None (default: None)) – Matplotlib axes to plot on. If None, a new figure is created.
color (str (default: 'blue')) – Single color for ungrouped histogram.
palette (list[tuple] | None (default: None)) – Color palette for grouped histograms. Defaults to qualitative palette.
color_dict (dict[str, str | tuple] | None (default: None)) – Explicit mapping of groups to colors. Overrides palette if provided.
legend (str | Legend | None (default: None)) – If “auto”, creates legend for grouped data. Can also pass existing Legend.
hist_kwargs (dict | None (default: None)) – Additional arguments for matplotlib.hist() like:
  - alpha: transparency (0-1)
  - histtype: ‘bar’, ‘step’, ‘stepfilled’
  - edgecolor: outline color
  - linewidth: outline width
legend_kwargs (dict | None (default: None)) – Additional arguments for legend like title, loc, fontsize.
xlim (tuple[float, float] | None (default: None)) – X-axis limits as (min, max).
ylim (tuple[float, float] | None (default: None)) – Y-axis limits as (min, max).
- Return type:
None
Examples
Simple histogram:
    import pandas as pd
    from alphapepttools.pl.figure import create_figure
    from alphapepttools.pl.plots import Plots

    df = pd.DataFrame({"intensity": [1.5, 2.3, 2.8, 1.9, 3.1, 2.5]})

    fig, axm = create_figure(1, 1, figsize=(6, 4))
    ax = axm.next()
    Plots.histogram(data=df, value_column="intensity", bins=30, color="skyblue", ax=ax)
Grouped histogram with transparency:
    import pandas as pd
    from alphapepttools.pl.figure import create_figure
    from alphapepttools.pl.plots import Plots

    df = pd.DataFrame(
        {
            "values": [1.5, 2.3, 2.8, 1.9, 3.1, 2.5, 4.2, 3.8],
            "condition": ["A", "A", "B", "B", "A", "B", "A", "B"],
        }
    )

    fig, axm = create_figure(1, 1, figsize=(6, 4))
    ax = axm.next()
    Plots.histogram(
        data=df,
        value_column="values",
        color_map_column="condition",
        bins=20,
        legend="auto",
        hist_kwargs={"alpha": 0.7, "histtype": "stepfilled"},
        legend_kwargs={"title": "Condition"},
        ax=ax,
    )
Custom color mapping:
    import pandas as pd
    from alphapepttools.pl.figure import create_figure
    from alphapepttools.pl.plots import Plots

    example_df = pd.DataFrame(
        {
            "values": [1, 2, 3, 4, 5, 6, 7, 8, 9],
            "levels": ["A", "B", "C", "A", "B", "C", "A", "B", "C"],
        }
    )

    fig, axm = create_figure(1, 1, figsize=(6, 4))
    ax = axm.next()
    Plots.histogram(
        data=example_df,
        value_column="values",
        color_map_column="levels",
        color_dict={"A": "red", "B": "blue", "C": "green"},
        bins=20,
        ax=ax,
        legend="auto",
        hist_kwargs={"alpha": 0.7, "histtype": "stepfilled", "edgecolor": "k"},
        legend_kwargs={"title": "Levels", "loc": "upper left"},
    )
Notes
When grouping data, all groups use the same bin edges for comparison
Unmapped groups in color_dict default to grey
NaN values are excluded from the histogram
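The shared-bin-edges behavior can be illustrated with NumPy alone. This is a sketch of the general technique, not the library's implementation: bin edges are computed once from all values and then reused for each group, so the overlaid histograms are directly comparable. The data and variable names are made up.

```python
import numpy as np

# Hypothetical values and their group labels.
values = np.array([1.5, 2.3, 2.8, 1.9, 3.1, 2.5, 4.2, 3.8])
groups = np.array(["A", "A", "B", "B", "A", "B", "A", "B"])

# Compute the bin edges once, from the pooled data.
edges = np.histogram_bin_edges(values, bins=5)

# Reuse the same edges for every group.
counts = {
    group: np.histogram(values[groups == group], bins=edges)[0]
    for group in np.unique(groups).tolist()
}

print(edges)
print(counts)
```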
- classmethod Plots.plot_pca(data, x_column=1, y_column=2, color='blue', color_map_column=None, color_column=None, dim_space='obs', embeddings_name=None, label=False, label_column=None, ax=None, palette=None, color_dict=None, legend=None, scatter_kwargs=None)#
PCA scatter plot showing principal component projections.
Visualizes PCA results by plotting two principal components against each other. The function retrieves PCA embeddings from the AnnData object based on the dim_space parameter: use “obs” for sample projections (most common, shows how samples relate) or “var” for feature projections (shows how features/genes relate). Axes are automatically labeled with explained variance percentages.
- Parameters:
data (AnnData) – AnnData object containing PCA results (must have run PCA first).
x_column (int (default: 1)) – Principal component number for x-axis (1-indexed, so 1 = PC1, 2 = PC2, etc.).
y_column (int (default: 2)) – Principal component number for y-axis (1-indexed).
color (str (default: 'blue')) – Single color for all points. Overridden by color_map_column or color_column.
color_map_column (str | None (default: None)) – Column in data.obs (for dim_space=”obs”) or data.var (for dim_space=”var”) to use for color encoding. Values are mapped to colors using palette or color_dict. Overrides the color parameter.
color_column (str | None (default: None)) – Column containing actual color values (hex, RGBA, etc.). Overrides both color and color_map_column parameters.
dim_space (str (default: 'obs')) – PCA space to visualize:
  - “obs”: Sample projections (default), shows samples in PC space
  - “var”: Feature projections, shows features/genes in PC space
embeddings_name (str | None (default: None)) – Custom embeddings name if non-default name was used in the PCA function. If None, uses default naming convention (“X_pca_obs” or “X_pca_var”).
label (bool (default: False)) – Whether to add text labels to points in the scatter plot.
label_column (str | None (default: None)) – Column to use for point labels. If None and label=True, uses the index (data.obs.index for dim_space=”obs”, data.var.index for dim_space=”var”).
ax (Axes | None (default: None)) – Matplotlib axes to plot on. If None, a new figure is created.
palette (list[str | tuple] | None (default: None)) – List of colors for color encoding. If None, uses default qualitative palette.
color_dict (dict[str, str | tuple] | None (default: None)) – Dictionary mapping category values to specific colors. Overrides palette.
legend (str | Legend | None (default: None)) – Legend specification. Use “auto” to create legend from color_map_column.
scatter_kwargs (dict | None (default: None)) – Additional keyword arguments passed to matplotlib scatter (e.g., s, alpha).
- Return type:
Examples
Basic PCA plot with sample coloring:
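    import matplotlib.pyplot as plt
    from alphapepttools.pl.plots import Plots

    # adata: an AnnData object on which PCA has already been run
    fig, ax = plt.subplots()
    Plots.plot_pca(
        data=adata,
        ax=ax,
        x_column=1,
        y_column=2,
        color_map_column="replicate",
        legend="auto",
    )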
PCA with custom PC axes and labels:
    fig, ax = plt.subplots()
    Plots.plot_pca(
        data=adata,
        ax=ax,
        x_column=2,  # PC2
        y_column=3,  # PC3
        label=True,
        label_column="sample_id",
        color_map_column="treatment",
        color_dict={"Control": "gray", "Drug": "red"},
    )
Feature space PCA (var projection):
    # Show how proteins/genes relate to each other in PC space
    fig, ax = plt.subplots()
    Plots.plot_pca(
        data=adata,
        ax=ax,
        x_column=1,
        y_column=2,
        dim_space="var",  # Feature projection instead of sample
        color_map_column="protein_type",
        scatter_kwargs={"s": 20, "alpha": 0.6},
    )
Notes
PCA must be run on the AnnData object before calling this function
Axis labels automatically include explained variance percentages (e.g., “PC1 (45.2%)”)
dim_space=”obs” retrieves sample projections from obsm (most common usage)
dim_space=”var” retrieves feature projections from varm (less common)
PC numbers are 1-indexed: x_column=1 corresponds to the first principal component
This is a convenience wrapper around scatter() with automatic PCA data extraction
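The 1-indexed PC numbers and the axis label format from the notes can be sketched as a small helper. The function name pc_axis_label and the list of variance ratios are hypothetical and only illustrate the indexing and formatting; they are not part of alphapepttools.

```python
def pc_axis_label(pc_number, explained_variance_ratio):
    """Build a label such as 'PC1 (45.2%)' from a 1-indexed PC number.

    explained_variance_ratio is a sequence of fractions (0 to 1), one per PC.
    """
    if pc_number < 1 or pc_number > len(explained_variance_ratio):
        raise ValueError(f"pc_number must be between 1 and {len(explained_variance_ratio)}")
    # PC numbers are 1-indexed, Python sequences are 0-indexed.
    ratio = explained_variance_ratio[pc_number - 1]
    return f"PC{pc_number} ({ratio * 100:.1f}%)"


# Made-up variance ratios for three components.
ratios = [0.452, 0.213, 0.101]
x_label = pc_axis_label(1, ratios)
y_label = pc_axis_label(2, ratios)
print(x_label, y_label)
```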
- classmethod Plots.plot_pca_loadings(data, ax, dim_space='obs', embeddings_name=None, dim=1, nfeatures=20, scatter_kwargs=None)#
1D loadings plot showing top features contributing to a principal component.
Creates a scatter plot displaying the loadings (weights) of the top contributing features for a single principal component. Loadings indicate how much each feature (gene/protein) contributes to the PC. The plot shows the top N features ranked by absolute loading value.
- Parameters:
data (AnnData | DataFrame) – AnnData object containing PCA results (must have run PCA first).
ax (Axes) – Matplotlib axes object to plot on.
dim_space (str (default: 'obs')) – PCA space to retrieve loadings from:
  - “obs”: Sample space PCA (default), shows which features drive sample separation
  - “var”: Feature space PCA, shows which samples drive feature separation
embeddings_name (str | None (default: None)) – Custom embeddings name if non-default name was used in the PCA function. If None, uses default naming convention.
dim (int (default: 1)) – Principal component number to show loadings for (1-indexed, so 1 = PC1, 2 = PC2, etc.).
nfeatures (int (default: 20)) – Number of top features (by absolute loading value) to display.
scatter_kwargs (dict | None (default: None)) – Additional keyword arguments passed to matplotlib scatter (e.g., s, alpha).
- Return type:
Examples
Basic loadings plot for PC1:
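    import matplotlib.pyplot as plt
    from alphapepttools.pl.plots import Plots

    # adata: an AnnData object on which PCA has already been run
    fig, ax = plt.subplots()
    Plots.plot_pca_loadings(
        data=adata,
        ax=ax,
        dim=1,
        nfeatures=20,
    )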
Loadings plot for PC3 with more features:
    fig, ax = plt.subplots()
    Plots.plot_pca_loadings(
        data=adata,
        ax=ax,
        dim=3,
        nfeatures=30,
        scatter_kwargs={"s": 50, "alpha": 0.8},
    )
Feature space loadings (var projection):
    # Show which samples most influence feature PC1
    fig, ax = plt.subplots()
    Plots.plot_pca_loadings(
        data=adata,
        ax=ax,
        dim=1,
        dim_space="var",
        nfeatures=15,
    )
Notes
PCA must be run on the AnnData object before calling this function
Features are ranked by absolute loading value (magnitude, not sign)
Y-axis shows feature names, X-axis shows loading values
dim_space=”obs” shows feature loadings (most common - which proteins/genes matter)
dim_space=”var” shows sample loadings (which samples matter)
This is a convenience wrapper around scatter() with automatic loadings data extraction
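Ranking by absolute loading value means that strongly negative loadings rank as high as strongly positive ones, while the plotted value keeps its sign. A minimal pandas sketch of that selection, with made-up feature names and loadings (not the library's internal code):

```python
import pandas as pd

# Hypothetical loadings of five features on one principal component.
loadings = pd.Series({"P1": 0.10, "P2": -0.80, "P3": 0.45, "P4": -0.05, "P5": 0.60})
nfeatures = 3

# Rank by magnitude, then look up the signed values for the selected features.
top_names = loadings.abs().sort_values(ascending=False).index[:nfeatures]
top = loadings.loc[top_names]
print(top)
```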
- classmethod Plots.plot_pca_loadings_2d(data, ax, dim_space='obs', embeddings_name=None, pc_x=1, pc_y=2, nfeatures=20, *, add_labels=True, add_lines=False, scatter_kwargs=None)#
2D loadings plot showing top features contributing to two principal components.
Creates a scatter plot displaying the loadings of two principal components (pc_x and pc_y, by default PC1 and PC2) against each other. Loadings indicate how much each feature (gene/protein) contributes to each PC. The plot shows all features used in the PCA as grey points, with the top N features (by absolute loading value) highlighted in blue. Optionally, labels can be added to the top features.
- Parameters:
data (AnnData | DataFrame) – AnnData object to plot.
ax (Axes) – Matplotlib axes object to plot on.
dim_space (str (default: 'obs')) – The dimension space used in PCA. Either “obs” for sample projection or “var” for feature projection.
embeddings_name (str | None (default: None)) – The custom embeddings name used in PCA. If None, uses default naming convention.
pc_x (int (default: 1)) – Principal component to plot on the x-axis (1-indexed, i.e. the first PC is 1, not 0).
pc_y (int (default: 2)) – Principal component to plot on the y-axis (1-indexed, i.e. the first PC is 1, not 0).
nfeatures (int (default: 20)) – Number of top features (by absolute loading value) to label from each component.
add_labels (bool (default: True)) – Whether to add feature labels to the top nfeatures loadings.
add_lines (bool (default: False)) – If True, draw lines connecting the origin (0, 0) to the points representing the top nfeatures loadings.
scatter_kwargs (dict | None (default: None)) – Additional keyword arguments for the matplotlib scatter function.
- Return type:
Examples
Basic 2D PCA loadings plot:
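    import matplotlib.pyplot as plt
    from alphapepttools.pl.plots import Plots

    # adata: an AnnData object on which PCA has already been run
    fig, ax = plt.subplots()
    Plots.plot_pca_loadings_2d(
        data=adata,
        ax=ax,
        pc_x=1,
        pc_y=2,
        nfeatures=20,
        add_labels=True,
        add_lines=True,
        scatter_kwargs=None,
    )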
Notes
PCA must be run on the AnnData object before calling this function
Features are ranked by absolute loading value (magnitude, not sign)
X and Y axes show loading values for the specified principal components
dim_space=”obs” shows feature loadings (most common - which proteins/genes matter)
dim_space=”var” shows sample loadings (which samples matter)
This is a convenience wrapper around scatter() with automatic loadings data extraction
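With two components, the top features are selected per component. The sketch below shows one way to do that, under the assumption that the highlighted set is the union of the top nfeatures of each component; the documentation does not spell out how the two selections are combined, and the data is made up.

```python
import pandas as pd

# Hypothetical loadings of four features on two principal components.
loadings = pd.DataFrame(
    {"PC1": [0.9, -0.1, 0.2, -0.7], "PC2": [0.05, 0.8, -0.6, 0.1]},
    index=["P1", "P2", "P3", "P4"],
)
nfeatures = 1

# Top features by absolute loading, separately for each component.
top_per_pc = [loadings[pc].abs().nlargest(nfeatures).index for pc in ["PC1", "PC2"]]

# Assumption: features that are top-ranked on either component are highlighted.
highlighted = top_per_pc[0].union(top_per_pc[1])
print(list(highlighted))
```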
- classmethod Plots.rank_median_plot(data, ax, layer='X', color='blue', color_map_column=None, color_column=None, palette=None, color_dict=None, legend=None, scatter_kwargs=None)#
Rank plot showing median intensities across samples.
Computes the median intensity for each feature (protein/peptide) across all samples, ranks them from highest to lowest, and creates a scatter plot with rank on the x-axis and median intensity on the y-axis (log-scale). Useful for visualizing the dynamic range of detected features and identifying highly abundant vs low-abundance features.
- Parameters:
data (AnnData | DataFrame) – AnnData or DataFrame containing intensity values.
ax (Axes) – Matplotlib axes object to plot on.
layer (str (default: 'X')) – The AnnData layer to use for calculating median intensities.
color (str (default: 'blue')) – Single color for all points. Overridden by color_map_column or color_column.
color_map_column (str | None (default: None)) – Column in data.var (for AnnData) to use for color encoding. Values are mapped to colors using the palette or color_dict. Overrides the color parameter.
color_column (str | None (default: None)) – Column in data.var (for AnnData) containing actual color values (hex, RGBA, etc.). Overrides both color and color_map_column parameters.
palette (list[str | tuple] | None (default: None)) – List of colors to use for color encoding. If None, a default palette is used.
color_dict (dict[str, str | tuple] | None (default: None)) – Dictionary mapping category values to specific colors. If provided, palette is ignored.
legend (str | Legend | None (default: None)) – Legend specification. Use “auto” to automatically create a legend from color_map_column.
scatter_kwargs (dict | None (default: None)) – Additional keyword arguments passed to matplotlib scatter function (e.g., alpha, s).
- Return type:
Examples
Basic rank plot with single color:
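    import matplotlib.pyplot as plt
    from alphapepttools.pl.plots import Plots
    # Assumption: BaseColors lives next to BaseColormaps in alphapepttools.pl.colors
    from alphapepttools.pl.colors import BaseColors

    fig, ax = plt.subplots()
    Plots.rank_median_plot(
        data=adata,
        ax=ax,
        color=BaseColors.get("blue"),
        scatter_kwargs={"alpha": 0.7},
    )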
Color by protein category:
    fig, ax = plt.subplots()
    Plots.rank_median_plot(
        data=adata,
        ax=ax,
        color_map_column="protein_type",
        color_dict={"protein_type_A": "red", "protein_type_B": "green", "protein_type_C": "blue"},
        legend="auto",
        scatter_kwargs={"s": 20},
    )
Notes
The y-axis is automatically set to log scale
Features are ranked from highest to lowest median intensity
For AnnData objects, var annotations can be used for coloring via color_map_column
This is a convenience wrapper around the scatter() method with automatic data preparation
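The data preparation described above (median per feature, then ranking from highest to lowest) can be sketched with pandas. This is an illustration under assumptions, not the library's code: samples are rows and features are columns, NaN values are skipped when taking the median, and ranks start at 1.

```python
import numpy as np
import pandas as pd

# Hypothetical intensity matrix: rows are samples, columns are features.
intensities = pd.DataFrame(
    {
        "P1": [100.0, 120.0, 110.0],
        "P2": [10.0, np.nan, 14.0],
        "P3": [1000.0, 900.0, 1100.0],
    }
)

# Median per feature across samples; pandas skips NaN by default.
medians = intensities.median(axis=0)

# Rank 1 is the feature with the highest median intensity.
ranked = medians.sort_values(ascending=False)
rank_table = pd.DataFrame(
    {"rank": range(1, len(ranked) + 1), "median_intensity": ranked.to_numpy()},
    index=ranked.index,
)
print(rank_table)
```

Plotting rank against median_intensity on a log-scaled y-axis gives the shape of the rank plot.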
- classmethod Plots.scatter(data, x_column, y_column, color=None, color_map_column=None, color_column=None, ax=None, palette=None, color_dict=None, legend=None, scatter_kwargs=None, legend_kwargs=None, xlim=None, ylim=None)#
Plot a scatterplot from a DataFrame or AnnData object
Coloring works in three ways, with the following order of precedence: 1. color_column, 2. color_map_column, 3. color. If a color_column is provided, its values are interpreted directly as colors, i.e. they have to be something matplotlib can understand (e.g. RGBA, hex, etc.). If a color_map_column is provided, its values are mapped to colors in combination with palette or color_dict (see color mapping logic below). If neither color_column nor color_map_column is provided, the color parameter is used to color all points the same (defaults to blue).
Color mapping logic#
- color_map_column is non-numeric:
If color_dict is not None: Use color_dict to assign levels of color_map_column to colors (unmapped levels default to grey).
If color_dict is None, and palette is not None: Use palette to automatically assign colors to each level.
If color_dict is None and palette is None: Use a repeating default palette to assign colors to each level.
- color_map_column is numeric:
If palette is a matplotlib colormap: Quantitatively map values to colors using the colormap. This means that e.g. 1 and 3 will be closer in color than 1 and 10.
If palette is not a matplotlib colormap: Treat numeric values as categorical and color as described above.
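The categorical branch of the mapping logic above can be sketched in plain Python. The function map_levels_to_colors, its default palette, and the choice to assign palette colors in order of first appearance are all hypothetical; they illustrate the precedence rules rather than reproduce the library's implementation.

```python
from itertools import cycle


def map_levels_to_colors(levels, palette=None, color_dict=None, fallback="grey"):
    """Map categorical levels to colors: color_dict first, then palette, then a default."""
    # Hypothetical stand-in for the library's default qualitative palette.
    default_palette = ["tab:blue", "tab:orange", "tab:green"]

    if color_dict is not None:
        # color_dict takes precedence; unmapped levels fall back to grey.
        return [color_dict.get(level, fallback) for level in levels]

    # Otherwise assign palette colors to levels, repeating the palette if needed.
    colors = cycle(palette if palette is not None else default_palette)
    unique_levels = list(dict.fromkeys(levels))  # order of first appearance
    level_to_color = dict(zip(unique_levels, colors))
    return [level_to_color[level] for level in levels]


with_dict = map_levels_to_colors(["A", "B", "A", "C"], color_dict={"A": "red", "B": "blue"})
with_palette = map_levels_to_colors(["A", "B", "C", "A"], palette=["k", "w"])
print(with_dict, with_palette)
```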
- Parameters:
data (DataFrame | AnnData) – Data to plot, must contain the x_column and y_column and optionally the color_column or color_map_column.
x_column – Column in data to plot on the x-axis. Must contain numeric data.
y_column – Column in data to plot on the y-axis. Must contain numeric data.
color – Color to use for the scatterplot. By default “blue”.
color_map_column – Column in data to use for color encoding. These values are mapped to the palette or the color_dict (see above). Its values cannot contain NaNs, so color_map_column is coerced to string and missing values are replaced by a default filler string. Overrides the color parameter. By default None.
color_column – Column in data containing the colors to plot. This must contain actual color values (RGBA, hex, etc.). Overrides the color and color_map_column parameters. By default None.
ax (Axes | None (default: None)) – Matplotlib axes object to plot on. If None, a new figure is created.
palette – List of colors to use for color encoding. If None, a default palette is used. Can be a matplotlib Colormap for continuous gradients. By default None.
color_dict – Dictionary mapping levels to colors. Supersedes palette: if provided, palette is ignored. By default None.
legend – Legend to add to the plot. If “auto”, a legend is created from the color_map_column. By default None.
scatter_kwargs – Additional keyword arguments for the matplotlib scatter function (s, alpha, edgecolors, etc.). By default None.
legend_kwargs – Additional keyword arguments for the matplotlib legend function. By default None.
xlim – Limits for the x-axis. By default None.
ylim – Limits for the y-axis. By default None.
- Return type:
None
Examples
Simple scatter with single color:
    import pandas as pd
    from alphapepttools.pl.figure import create_figure
    from alphapepttools.pl.plots import Plots

    df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2, 4, 1, 3, 5]})

    fig, axm = create_figure(1, 1, figsize=(6, 4))
    ax = axm.next()
    Plots.scatter(data=df, x_column="x", y_column="y", color="red", ax=ax)
Categorical coloring with automatic palette:
    import pandas as pd
    from alphapepttools.pl.figure import create_figure
    from alphapepttools.pl.plots import Plots

    df = pd.DataFrame(
        {
            "x": [1, 2, 3, 4, 5],
            "y": [2, 4, 1, 3, 5],
            "category": ["A", "B", "A", "C", "B"],
        }
    )

    fig, axm = create_figure(1, 1, figsize=(6, 4))
    ax = axm.next()
    Plots.scatter(
        data=df,
        x_column="x",
        y_column="y",
        color_map_column="category",
        legend="auto",
        ax=ax,
    )
Custom color dictionary:
    import pandas as pd
    from alphapepttools.pl.figure import create_figure
    from alphapepttools.pl.plots import Plots

    df = pd.DataFrame(
        {
            "x": [1, 2, 3, 4, 5],
            "y": [2, 4, 1, 3, 5],
            "significance": ["significant", "not_significant", "significant", "not_significant", "significant"],
        }
    )

    fig, axm = create_figure(1, 1, figsize=(6, 4))
    ax = axm.next()
    Plots.scatter(
        data=df,
        x_column="x",
        y_column="y",
        color_map_column="significance",
        color_dict={"significant": "red", "not_significant": "gray"},
        legend="auto",
        scatter_kwargs={"s": 50, "alpha": 0.7},
        ax=ax,
    )
Quantitative gradient with numeric data:
    import pandas as pd
    from alphapepttools.pl.figure import create_figure
    from alphapepttools.pl.plots import Plots
    from alphapepttools.pl.colors import BaseColormaps

    df = pd.DataFrame(
        {
            "x": [1, 2, 3, 4, 5],
            "y": [2, 4, 1, 3, 5],
            "intensity": [1.0, 5.0, 10.0, 15.0, 20.0],
        }
    )

    fig, axm = create_figure(1, 1, figsize=(6, 4))
    ax = axm.next()
    Plots.scatter(
        data=df,
        x_column="x",
        y_column="y",
        color_map_column="intensity",
        palette=BaseColormaps.get("sequential"),
        ax=ax,
    )
Direct color values from column:
    import pandas as pd
    from alphapepttools.pl.figure import create_figure
    from alphapepttools.pl.plots import Plots

    df = pd.DataFrame(
        {
            "x": [1, 2, 3, 4, 5],
            "y": [2, 4, 1, 3, 5],
            "my_colors": ["#FF0000", "#00FF00", "#0000FF", "#FFFF00", "#FF00FF"],
        }
    )

    fig, axm = create_figure(1, 1, figsize=(6, 4))
    ax = axm.next()
    Plots.scatter(
        data=df,
        x_column="x",
        y_column="y",
        color_column="my_colors",
        ax=ax,
    )
Notes
Points are ordered by color frequency (most frequent in back) for better visibility
Unmapped values in color_dict default to grey
NaN values in color columns are handled as strings
- classmethod Plots.scree_plot(adata, ax, n_pcs=20, dim_space='obs', color='blue', embeddings_name=None, scatter_kwargs=None)#
Scree plot showing explained variance for each principal component.
Creates a scatter plot displaying the percentage of variance explained by each principal component. Useful for determining how many PCs capture most of the variation in the data and for deciding how many components to retain for analysis.
- Parameters:
adata (AnnData | DataFrame) – AnnData object containing PCA results (must have run PCA first).
ax (Axes) – Matplotlib axes object to plot on.
n_pcs (int (default: 20)) – Number of principal components to plot on the x-axis.
dim_space (str (default: 'obs')) – PCA space to retrieve variance from:
  - “obs”: Sample space PCA (default), variance explained across samples
  - “var”: Feature space PCA, variance explained across features
color (str (default: 'blue')) – Color for the scatter points.
embeddings_name (str | None (default: None)) – Custom embeddings name if non-default name was used in the PCA function. If None, uses default naming convention.
scatter_kwargs (dict | None (default: None)) – Additional keyword arguments passed to matplotlib scatter (e.g., s, alpha).
- Return type:
Examples
Basic scree plot:
    import matplotlib.pyplot as plt
    from alphapepttools.pl.plots import Plots

    fig, ax = plt.subplots()
    Plots.scree_plot(adata=adata, ax=ax, n_pcs=50)
Scree plot with custom styling:
    fig, ax = plt.subplots()
    Plots.scree_plot(
        adata=adata,
        ax=ax,
        n_pcs=30,
        color="red",
        scatter_kwargs={"s": 50, "alpha": 0.8},
    )
Feature space scree plot:
    # Show variance explained in feature space PCA
    fig, ax = plt.subplots()
    Plots.scree_plot(adata=adata, ax=ax, n_pcs=20, dim_space="var")
Notes
PCA must be run on the AnnData object before calling this function
Y-axis shows percentage of total variance explained by each PC
dim_space=”obs” shows variance for sample projections (most common)
dim_space=”var” shows variance for feature projections
This is a convenience wrapper around scatter() with automatic variance data extraction
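The quantity on the y-axis of a scree plot can be computed directly. The sketch below derives the percentage of variance explained per component from the singular values of a centered data matrix with NumPy; it illustrates the general PCA relationship and is not the library's PCA routine. The random data and the n_pcs value are made up.

```python
import numpy as np

# Hypothetical data matrix: 20 samples, 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))

# Center each feature, then take the singular values of the centered matrix.
X_centered = X - X.mean(axis=0)
singular_values = np.linalg.svd(X_centered, compute_uv=False)

# The variance captured by each PC is proportional to its squared singular value.
explained_percent = singular_values**2 / np.sum(singular_values**2) * 100

# Keep only the first n_pcs components, as a scree plot would display them.
n_pcs = 3
print(explained_percent[:n_pcs])
```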
- classmethod Plots.violinplot(ax, data, grouping_column=None, value_column=None, direct_columns=None, color=(np.float64(0.21299500192233756), np.float64(0.5114186851211072), np.float64(0.730795847750865), np.float64(1.0)), color_dict=None)#
Plot a violin plot from a DataFrame or AnnData object
Creates a violin plot showing the distribution density of values for grouped data. Each violin shows the kernel density estimation of the distribution, along with medians, quartiles, and min/max whiskers. Violins have semi-transparent fill with opaque black outlines and internal statistical markers.
Two modes of operation:
1. Grouping mode: Use grouping_column/value_column to group data by categories
2. Direct mode: Use direct_columns to compare multiple columns directly
- Parameters:
ax (Axes) – Matplotlib axes object to plot on.
data (AnnData | DataFrame) – Data containing grouping and value columns or direct columns to plot.
grouping_column (list[str] | None (default: None)) – Column containing the groups to compare (categorical). Used with value_column for grouped comparisons.
value_column (list[str] | None (default: None)) – Column whose values should be plotted (numeric). Used with grouping_column for grouped comparisons.
direct_columns (list[str] | None (default: None)) – List of column names to compare directly. Each column becomes a separate violin. Overrides grouping_column and value_column.
color (tuple (default: BaseColors.get(“blue”))) – Default color for all violins.
color_dict (dict | None (default: None)) – Dictionary mapping group labels to specific colors. Overrides the color parameter for specified groups.
- Return type:
None
Examples
Grouped comparison (long format):
    import pandas as pd
    import anndata as ad
    from alphapepttools.pl.figure import create_figure
    from alphapepttools.pl.plots import Plots

    data = pd.DataFrame({"intensity": [1, 2, 3, 4, 5, 6, 7]})
    obs = pd.DataFrame({"group": ["A", "A", "B", "B", "B", "C", "C"]})
    adata = ad.AnnData(X=data.values, obs=obs, var=pd.DataFrame(index=data.columns))

    fig, axm = create_figure(1, 1, figsize=(6, 4))
    ax = axm.next()
    Plots.violinplot(
        ax=ax,
        data=adata,
        grouping_column="group",
        value_column="intensity",
        color_dict={"A": "red", "B": "green", "C": "blue"},
    )
Direct column comparison (wide format):
    import pandas as pd
    import anndata as ad
    from alphapepttools.pl.figure import create_figure
    from alphapepttools.pl.plots import Plots

    data = pd.DataFrame({"protein1": [1, 2, 3], "protein2": [4, 5, 6], "protein3": [7, 8, 9]})
    adata = ad.AnnData(X=data.values, var=pd.DataFrame(index=data.columns))

    fig, axm = create_figure(1, 1, figsize=(6, 4))
    ax = axm.next()
    Plots.violinplot(
        ax=ax,
        data=adata,
        direct_columns=["protein1", "protein2", "protein3"],
    )
Notes
Violins show kernel density estimation of the distribution
Internal markers show median, quartiles, and min/max values
Violins have 50% transparency with opaque black outlines
When using direct_columns, each column’s distribution is shown separately
Missing values (NaN) are excluded from the distribution calculations