alphapepttools.metrics.coefficient_of_variation
- alphapepttools.metrics.coefficient_of_variation(adata, *, group_column=None, min_valid=3, key_added='cv', layer=None, copy=False)
Coefficient of variation
Compute the coefficient of variation (CV) per feature, either across all samples or within sample groups.
\[CV = \frac{s(X)}{\hat{X}}\]
with the empirical standard deviation \(s(X)\) of feature \(X\) and the empirical mean \(\hat{X}\).
The coefficient of variation is a scale-invariant measure of dispersion that enables comparison of variability across features with different abundance levels.
Within technical replicates, the CV indicates measurement reproducibility. Lower CVs indicate good technical precision, while high CVs suggest issues with sample preparation, instrument performance, or quantification accuracy.
Between different biological samples, CVs reflect both biological and technical variation. Higher CVs are expected and can indicate genuine biological heterogeneity.
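The formula and its scale invariance can be sketched with plain NumPy. This is an illustration only, not the library's implementation: the helper name cv_per_feature is hypothetical, and the choice of the population standard deviation (ddof=0) is an assumption made because it reproduces the values printed in the Examples section.

```python
import numpy as np


def cv_per_feature(X, ddof=0):
    """Hypothetical helper: CV per column of a samples x features matrix.

    NaNs are ignored. ddof=0 is an assumption, not a documented default.
    """
    X = np.asarray(X, dtype=float)
    mean = np.nanmean(X, axis=0)           # empirical mean per feature
    std = np.nanstd(X, axis=0, ddof=ddof)  # empirical standard deviation per feature
    return std / mean


# Three samples (rows) and two features (columns)
X = np.array([[1, 2], [5, 1], [6, 6]])

cv = cv_per_feature(X)
scaled = cv_per_feature(X * 1000)  # same data on a 1000x larger scale

print(cv)
print(np.allclose(cv, scaled))  # rescaling the data leaves the CV unchanged
```

Because the standard deviation and the mean scale by the same factor, their ratio is unchanged, which is what makes CVs comparable across features of very different abundance.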
- Parameters:
adata (AnnData) – AnnData object.
group_column (str | None (default: None)) – Column name in adata.obs defining groups for groupwise CV computation. If None (default), one CV per feature is computed across all samples and stored as a column in adata.var[key_added]. If specified, one CV per feature is computed within each group separately and stored as a DataFrame in adata.varm[key_added], with adata.var_names as the index and group names as columns.
min_valid (int (default: 3)) – Minimum number of non-missing samples required to estimate the CV. Features with fewer valid values are set to NaN.
key_added (str (default: 'cv')) – Name of the column added to adata.var (ungrouped) or the key added to adata.varm (grouped).
layer (str | None (default: None)) – Name of the layer to compute the metric on. If None (default), the data matrix X is used.
copy (bool (default: False)) – If False (default), modifies adata in place and returns None. If True, returns a copy of the adata object.
- Return type:
AnnData | None
- Returns:
AnnData object with computed CVs. If group_column=None, results are written to adata.var[key_added]. If group_column is set, results are written to adata.varm[key_added] as a DataFrame. If copy=False, the AnnData object is modified in place and None is returned. If copy=True, a modified copy is returned.
- Raises:
ValueError – If group_column is set and contains NaN values.
Examples
Compute one CV per feature across all samples:
    import numpy as np
    import anndata as ad
    import pandas as pd
    import alphapepttools as apt

    adata = ad.AnnData(
        X=np.array([[1, 2], [5, 1], [6, 6], [9, 3], [4, 8], [7, 4]]),
        obs=pd.DataFrame({"group": ["A", "A", "A", "B", "B", "B"]}),
        var=pd.DataFrame(index=["protein1", "protein2"]),
    )

    apt.metrics.coefficient_of_variation(adata)
    print(adata.var["cv"])  # one CV per feature across all samples
Compute one CV per feature per group, e.g. for replicate groups:
    apt.metrics.coefficient_of_variation(adata, group_column="group")
    print(adata.varm["cv"])
    # DataFrame indexed by var_names, columns are group labels:
    #                  A         B
    # protein1  0.540062  0.308221
    # protein2  0.720082  0.432049
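The grouped values above can be cross-checked independently with a pandas groupby. The sketch below does not call alphapepttools. It assumes the population standard deviation (ddof=0), which is the convention that reproduces the printed numbers; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Same data as in the example: six samples, two features, two groups
X = np.array([[1, 2], [5, 1], [6, 6], [9, 3], [4, 8], [7, 4]], dtype=float)
df = pd.DataFrame(X, columns=["protein1", "protein2"])
groups = pd.Series(["A", "A", "A", "B", "B", "B"], name="group")

grouped = df.groupby(groups)

# Per-group standard deviation divided by per-group mean.
# ddof=0 is an assumption chosen to match the printed example output.
cv_by_group = (grouped.std(ddof=0) / grouped.mean()).T  # features as rows, groups as columns

print(cv_by_group.round(6))
```

For example, protein1 in group A has values 1, 5 and 6: the mean is 4, the population variance is 14/3, and the resulting CV is about 0.540062, matching the table.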
Notes
The CV only considers non-missing values and should be computed before imputation. Features with fewer than min_valid non-missing values will return NaN for the CV.
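The effect of missing values and min_valid can be illustrated with a small NumPy sketch. It mirrors the behaviour described in this note under stated assumptions and is not the library's code: the helper cv_with_min_valid is hypothetical, and ddof=0 is assumed as in the example output.

```python
import numpy as np


def cv_with_min_valid(X, min_valid=3, ddof=0):
    """Hypothetical helper: per-feature CV over non-missing values only.

    Features with fewer than min_valid non-missing values get NaN.
    """
    X = np.asarray(X, dtype=float)
    n_valid = np.sum(~np.isnan(X), axis=0)  # non-missing count per feature
    cv = np.nanstd(X, axis=0, ddof=ddof) / np.nanmean(X, axis=0)
    return np.where(n_valid >= min_valid, cv, np.nan)


# protein1 has three valid values (1, 5, 6); protein2 has only two (2, 6)
X = np.array(
    [
        [1.0, 2.0],
        [5.0, np.nan],
        [6.0, np.nan],
        [np.nan, 6.0],
    ]
)

cv = cv_with_min_valid(X, min_valid=3)
print(cv)  # the second entry is NaN because only two values are valid
```

Imputing before computing the CV would replace the missing entries with filled-in values, so every feature would appear to meet min_valid and its dispersion would partly reflect the imputation method rather than the measurements.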