alphapepttools.pp.filter_data_completeness
- alphapepttools.pp.filter_data_completeness(adata, max_missing, group_column=None, groups=None, action='flag', var_colname='passed_threshold_missing_values')
Filter features based on missing values.
Filters AnnData features (columns) based on their fraction of missing values. If group_column and groups are provided, only the missingness within the selected levels of group_column is considered. This is especially useful for imbalanced classes, where filtering by global missingness may keep features with too many missing values in the smaller class.
To filter rows (observations) instead, transpose the AnnData object before calling this function and transpose the result back afterwards.
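The grouped filtering described above can be illustrated with a small pure-Python sketch. It is not the alphapepttools implementation: the function name completeness_mask, the list-of-rows matrix (observations as rows, features as columns, NaN marking a missing value) and, in particular, the rule that a feature must pass the threshold separately in every selected group are assumptions made for illustration. The last lines mimic the transpose trick for filtering rows.

```python
import math


def missing_fraction(values):
    """Fraction of NaN entries in a non-empty sequence of floats."""
    values = list(values)
    return sum(math.isnan(v) for v in values) / len(values)


def completeness_mask(matrix, max_missing, labels=None, groups=None):
    """Hypothetical sketch: return one bool per column (True = passes).

    matrix: list of rows (observations); each row holds one float per feature.
    labels: optional per-row group label (stand-in for an obs column).
    groups: optional subset of labels to check; None means all labels.
    Assumption: with labels, a feature must pass in every selected group.
    """
    if labels is None:
        row_sets = [range(len(matrix))]  # a single "group" containing all rows
    else:
        selected = sorted(set(labels)) if groups is None else groups
        row_sets = []
        for group in selected:
            rows = [i for i, label in enumerate(labels) if label == group]
            if not rows:
                raise ValueError(f"group {group!r} has no observations")
            row_sets.append(rows)
    n_features = len(matrix[0])
    return [
        all(
            missing_fraction(matrix[i][j] for i in rows) <= max_missing
            for rows in row_sets
        )
        for j in range(n_features)
    ]


nan = float("nan")
# 4 samples in group "A", 2 in group "B"; columns are features 0 and 1
X = [
    [1.0, 2.0],
    [1.1, nan],
    [0.9, 2.2],
    [1.2, 2.1],
    [nan, 3.0],
    [nan, 3.1],
]
labels = ["A", "A", "A", "A", "B", "B"]

global_mask = completeness_mask(X, 0.5)              # ignores groups
per_group_mask = completeness_mask(X, 0.5, labels)   # every group must pass
only_a_mask = completeness_mask(X, 0.5, labels, ["A"])

# Row (sample) filtering via transposition: samples become columns
X_t = [list(column) for column in zip(*X)]
complete_samples = completeness_mask(X_t, 0.0)
```

In this toy example a global threshold of 0.5 keeps feature 0 even though it is missing in every sample of the small group B, whereas the per-group check removes it.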
- Parameters:
adata (AnnData) – AnnData object whose features (columns) are filtered.
max_missing (float) – Maximum fraction of missing values allowed. A feature is removed only if its fraction of missing values is strictly greater than max_missing, i.e. if max_missing is 0.6 and the fraction of missing values is 0.6, the feature is kept. The “greater than” comparison is used because the missing fraction may be 0.0: with max_missing=0.0, features without any missing values are kept instead of everything being removed.
group_column (str, optional) – Column in adata.obs whose levels define the groups used for filtering.
groups (list[str], optional) – Levels of group_column to consider when filtering. E.g. if the column has the levels [‘A’, ‘B’, ‘C’] and groups = [‘A’, ‘B’], only the missingness of features in samples belonging to ‘A’ and ‘B’ is considered. If None, all groups are considered.
action (str, optional) – Action to perform. Can be ‘flag’ (default) or ‘drop’. If ‘flag’, a boolean column is added to adata.var to indicate whether each feature passed the missingness threshold. If ‘drop’, features that do not pass the threshold are removed from the AnnData object.
var_colname (str, optional) – Name of the boolean adata.var column added if action is ‘flag’. Default is ‘passed_threshold_missing_values’.
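A second hypothetical sketch shows the strict “greater than” comparison of max_missing and the difference between the two actions. Plain dicts stand in for the AnnData object (feature name mapped to its values) and for adata.var; the function name apply_completeness_filter is made up for this example and does not reflect the library's internals.

```python
import math


def apply_completeness_filter(columns, max_missing, action="flag",
                              var_colname="passed_threshold_missing_values"):
    """Hypothetical sketch: columns maps feature name -> non-empty list of floats.

    Returns (columns, var), where var maps feature name -> annotation dict,
    a plain-dict stand-in for adata.var.
    """
    passed = {}
    for name, values in columns.items():
        fraction = sum(math.isnan(v) for v in values) / len(values)
        # A feature fails only if fraction > max_missing, so fraction == max_missing
        # passes, and max_missing=0.0 keeps exactly the complete features.
        passed[name] = not (fraction > max_missing)
    if action == "flag":
        # Keep every feature and record the outcome as a boolean annotation
        var = {name: {var_colname: ok} for name, ok in passed.items()}
        return columns, var
    if action == "drop":
        # Keep only the features that passed; no annotation is added
        kept = {name: values for name, values in columns.items() if passed[name]}
        return kept, {name: {} for name in kept}
    raise ValueError("action must be 'flag' or 'drop'")


nan = float("nan")
data = {
    "P1": [1.0, 2.0, 3.0, 4.0, 5.0],  # 0 of 5 missing -> 0.0
    "P2": [1.0, nan, nan, nan, 5.0],  # 3 of 5 missing -> 0.6
    "P3": [nan, nan, nan, nan, 5.0],  # 4 of 5 missing -> 0.8
}

flagged, var = apply_completeness_filter(data, 0.6)
dropped, _ = apply_completeness_filter(data, 0.6, action="drop")
strict, _ = apply_completeness_filter(data, 0.0, action="drop")
```

With max_missing=0.6, P2 (exactly 60 % missing) is kept and P3 (80 % missing) fails; with max_missing=0.0 only the complete feature P1 survives.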
- Return type:
AnnData
- Returns:
AnnData object with either a new adata.var column added (if action is ‘flag’) or the failing features removed (if action is ‘drop’).