alphapepttools.metrics.coefficient_of_variation

alphapepttools.metrics.coefficient_of_variation(adata, *, group_column=None, min_valid=3, key_added='cv', layer=None, copy=False)

Coefficient of variation

Compute the coefficient of variation (CV) per feature, either across all samples or within sample groups.

\[CV = \frac{s(X)}{\hat{X}}\]

where \(s(X)\) is the empirical standard deviation of feature \(X\) and \(\hat{X}\) is its empirical mean.

The coefficient of variation is a scale-invariant measure of dispersion that enables comparison of variability across features with different abundance levels.
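As a minimal sketch of the formula and of its scale invariance, the following NumPy snippet computes the CV of one feature and of the same feature multiplied by a constant. It uses the population standard deviation (`ddof=0`), which is an assumption inferred from the grouped example output in the Examples section rather than from the library source. The helper name `cv_sketch` is hypothetical and not part of alphapepttools.

```python
import numpy as np


def cv_sketch(x, ddof=0):
    """Hypothetical helper: standard deviation divided by mean for a 1-D array."""
    x = np.asarray(x, dtype=float)
    return np.std(x, ddof=ddof) / np.mean(x)


intensities = np.array([1.0, 5.0, 6.0])

cv_original = cv_sketch(intensities)
cv_rescaled = cv_sketch(1000.0 * intensities)  # same feature at a different abundance scale

# Rescaling multiplies both the standard deviation and the mean, so it cancels in the ratio.
print(np.isclose(cv_original, cv_rescaled))
```

Because the scale factor cancels, features measured at very different intensities can be compared on the same CV axis.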

Within technical replicates, the CV indicates measurement reproducibility. Low CVs indicate good technical precision, while high CVs suggest issues with sample preparation, instrument performance, or quantification accuracy.

Between different biological samples, CVs reflect both biological and technical variation. CVs higher than those of technical replicates are expected and can indicate genuine biological heterogeneity.

Parameters:
  • adata (AnnData) – AnnData object

  • group_column (str | None (default: None)) – Column name in adata.obs defining groups for groupwise CV computation. If None (default), one CV per feature is computed across all samples and stored as a column in adata.var[key_added]. If specified, one CV per feature is computed within each group separately and stored as a DataFrame in adata.varm[key_added] with adata.var_names as the index and group names as columns.

  • min_valid (int (default: 3)) – Minimum number of non-missing values required to estimate the CV. If fewer are available, the CV is set to NaN.

  • key_added (str (default: 'cv')) – Name of the column added to adata.var (ungrouped) or the key added to adata.varm (grouped)

  • layer (str | None (default: None)) – Name of the layer to compute metric on. If None (default), the data matrix X is used

  • copy (bool (default: False)) – If False (default), modifies adata in place and returns None. If True, returns a modified copy of adata and leaves the input unchanged.

Return type:

None | AnnData

Returns:

None if copy=False; adata is modified in place. A modified copy of the AnnData object if copy=True. In both cases, the CVs are written to adata.var[key_added] if group_column=None, or to adata.varm[key_added] as a DataFrame if group_column is set.

Raises:

ValueError – If group_column is set and contains NaN values

Examples

Compute one CV per feature across all samples:

import numpy as np
import anndata as ad
import pandas as pd
import alphapepttools as apt

adata = ad.AnnData(
    X=np.array([[1, 2], [5, 1], [6, 6], [9, 3], [4, 8], [7, 4]]),
    obs=pd.DataFrame({"group": ["A", "A", "A", "B", "B", "B"]}),
    var=pd.DataFrame(index=["protein1", "protein2"]),
)

apt.metrics.coefficient_of_variation(adata)
print(adata.var["cv"])  # one CV per feature across all samples

Compute one CV per feature per group, e.g. for replicate groups:

apt.metrics.coefficient_of_variation(adata, group_column="group")
print(adata.varm["cv"])
# DataFrame indexed by var_names, columns are group labels:
#                  A         B
# protein1  0.540062  0.308221
# protein2  0.720082  0.432049
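The grouped values shown above can be reproduced with plain pandas, which may help when checking results by hand. This is only a sketch under one assumption: the standard deviation is the population form (`ddof=0`). That choice is inferred from the printed numbers, not taken from the alphapepttools source.

```python
import pandas as pd

# Same values and group labels as the AnnData example above.
values = pd.DataFrame(
    [[1, 2], [5, 1], [6, 6], [9, 3], [4, 8], [7, 4]],
    columns=["protein1", "protein2"],
    dtype=float,
)
groups = pd.Series(["A", "A", "A", "B", "B", "B"], name="group")

grouped = values.groupby(groups)

# Per-group standard deviation (ddof=0, an assumption) divided by the per-group mean,
# transposed so that features are rows and groups are columns.
cv_by_group = (grouped.std(ddof=0) / grouped.mean()).T
print(cv_by_group.round(6))
```

With the sample standard deviation (`ddof=1`) the ratios come out larger, so the two conventions are easy to tell apart on small groups like these.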

Notes

The CV only considers non-missing values and should be computed before imputation. Features with fewer than min_valid non-missing values are assigned a CV of NaN.
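The interaction between missing values and min_valid can be illustrated with a small NumPy sketch. The helper `cv_min_valid_sketch` is hypothetical and only mimics the documented behaviour; it assumes missing values are encoded as NaN and that the population standard deviation (`ddof=0`) is used.

```python
import numpy as np


def cv_min_valid_sketch(X, min_valid=3, ddof=0):
    """Hypothetical helper: per-feature CV over non-missing values only."""
    X = np.asarray(X, dtype=float)
    n_valid = np.sum(~np.isnan(X), axis=0)  # non-missing values per feature (column)
    cv = np.nanstd(X, axis=0, ddof=ddof) / np.nanmean(X, axis=0)
    # Features with too few valid values get NaN instead of an unreliable estimate.
    return np.where(n_valid >= min_valid, cv, np.nan)


X = np.array(
    [
        [1.0, 10.0],
        [5.0, np.nan],
        [6.0, 30.0],
        [np.nan, np.nan],
    ]
)

cv_strict = cv_min_valid_sketch(X, min_valid=3)   # second feature has only 2 valid values
cv_relaxed = cv_min_valid_sketch(X, min_valid=2)  # both features pass the threshold
print(cv_strict)
print(cv_relaxed)
```

If the matrix were imputed first, every entry would count as valid, so the min_valid filter would no longer reflect how many values were actually measured.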