alphapepttools.metrics.coefficient_of_variation

alphapepttools.metrics.coefficient_of_variation(adata, *, min_valid=3, key_added='cv', layer=None, copy=False)

Coefficient of variation

Compute the coefficient of variation (CV) for all features.

\[CV = \frac{s(X)}{\hat{X}}\]

where \(s(X)\) is the empirical standard deviation of feature \(X\) and \(\hat{X}\) is its empirical mean.

The coefficient of variation is a scale-invariant measure of dispersion that enables comparison of variability across features with different abundance levels.
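The scale invariance can be checked with a small NumPy sketch. The helper `cv` below is hypothetical and not part of alphapepttools; it assumes the sample standard deviation (`ddof=1`), which may differ from the normalization the library uses.

```python
import numpy as np


def cv(values, ddof=1):
    """Hypothetical helper: standard deviation divided by the mean.

    ddof=1 (sample standard deviation) is an assumption made for this sketch.
    """
    values = np.asarray(values, dtype=float)
    return np.std(values, ddof=ddof) / np.mean(values)


# Two features with the same relative spread but very different abundance
low = np.array([1.0, 2.0, 3.0])
high = low * 1000.0

cv_low = cv(low)
cv_high = cv(high)

# Rescaling a feature leaves its CV unchanged
print(cv_low, cv_high)
```

Because both the standard deviation and the mean scale by the same factor, the ratio is the same for both features.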

Within technical replicates, the CV indicates measurement reproducibility. Low CVs indicate good technical precision, while high CVs suggest issues with sample preparation, instrument performance, or quantification accuracy.

Between different biological samples, CVs reflect both biological and technical variation. CVs higher than those within technical replicates are expected and can indicate genuine biological heterogeneity.
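To compare within-group and overall variability, CVs can be computed per group. The following is a minimal pandas sketch of that idea, not the library's own implementation; the column names and the use of the sample standard deviation (pandas' default, `ddof=1`) are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical intensities: rows are samples, columns are features
df = pd.DataFrame(
    {
        "group": ["A", "A", "A", "B", "B", "B"],
        "protein1": [1.0, 5.0, 6.0, 9.0, 4.0, 7.0],
        "protein2": [2.0, 1.0, 6.0, 3.0, 8.0, 4.0],
    }
)

# CV per feature within each group (for example, within replicates)
grouped = df.groupby("group")
cv_by_group = grouped.std() / grouped.mean()

# CV per feature across all samples, ignoring the grouping
features = df[["protein1", "protein2"]]
cv_overall = features.std() / features.mean()

print(cv_by_group)
print(cv_overall)
```

With the library itself, a comparable result could presumably be obtained by subsetting the AnnData object by `adata.obs["group"]` and calling the function on each subset.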

Parameters:
  • adata (AnnData) – AnnData object with samples as observations and features as variables

  • min_valid (int (default: 3)) – Minimum number of non-missing values required to estimate the CV of a feature. Features with fewer non-missing values are assigned NaN

  • key_added (str (default: 'cv')) – Name of the column added to adata.var

  • layer (str | None (default: None)) – Name of the layer to compute metric on. If None (default), the data matrix X is used

  • copy (bool (default: False)) – Whether to return a modified copy of the AnnData object (True). If False (default), the object is modified in place

Return type:

None | AnnData

Returns:

If copy=True, returns a modified copy of the AnnData object with the computed CVs stored in adata.var[key_added]. If copy=False, modifies the AnnData object in place and returns None.

Examples

import numpy as np
import anndata as ad
import pandas as pd
import alphapepttools as at

# Create example data
adata = ad.AnnData(
    X=np.array([[1, 2], [5, 1], [6, 6], [9, 3], [4, 8], [7, 4]]),
    obs=pd.DataFrame({"group": ["A", "A", "A", "B", "B", "B"]}),
    var=pd.DataFrame(index=["protein1", "protein2"]),
)

# Compute CV for each protein across samples
at.metrics.coefficient_of_variation(adata)

# CVs are now stored in adata.var['cv']
print(adata.var["cv"])  # approximately protein1: 0.5, protein2: 0.6

Notes

The CV considers only non-missing values and should be computed before imputation. Features with fewer than min_valid non-missing values are assigned a CV of NaN.
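The handling of missing values described above can be sketched in plain NumPy. The function `cv_with_min_valid` is a hypothetical illustration, not the library's implementation, and the choice of `ddof=1` is an assumption.

```python
import numpy as np


def cv_with_min_valid(X, min_valid=3, ddof=1):
    """Hypothetical sketch: per-feature CV over non-missing values only.

    Features (columns) with fewer than `min_valid` observed values get NaN.
    ddof=1 is an assumption made for this sketch.
    """
    X = np.asarray(X, dtype=float)
    result = np.full(X.shape[1], np.nan)
    for j in range(X.shape[1]):
        column = X[:, j]
        observed = column[~np.isnan(column)]
        if observed.size >= min_valid:
            result[j] = observed.std(ddof=ddof) / observed.mean()
    return result


# Rows are samples, columns are features; NaN marks a missing value
X = np.array(
    [
        [1.0, np.nan],
        [2.0, 4.0],
        [3.0, np.nan],
        [np.nan, 6.0],
    ]
)

cvs = cv_with_min_valid(X, min_valid=3)
print(cvs)  # first feature has 3 observed values, second has only 2
```

Imputed values would count as observed and typically shrink the apparent spread, which is why the CV is best computed on the unimputed data.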