Filtering by metadata

AnnData objects can be filtered with familiar pandas/NumPy slicing, but we found that combining several filters, especially filters over ranges of values, quickly becomes cumbersome. The following example illustrates this.

import pandas as pd
import numpy as np
import anndata as ad
import alphapepttools as at

# Toy data matrix: 6 cells x 5 genes of random values
X = pd.DataFrame(
    {f"gene_{i}": np.random.randn(6) for i in range(5)},
    index=[f"cell_{i}" for i in range(6)],
)

sample_metadata = pd.DataFrame(
    {
        "column1": ["A", "B", "C", "D", "E", "F"],
        "column2": [50, 200, 50, 200, 50, 200],
    },
    index=[f"cell_{i}" for i in range(6)],
)

test_adata = ad.AnnData(X, obs=sample_metadata)

AnnData objects can be sliced natively, much like a pandas.DataFrame, by passing a boolean mask built from obs:

adata_filtered_1 = test_adata[
    (test_adata.obs["column1"].isin(["A", "B", "C"]))
    | ((test_adata.obs["column2"] > 20) & (test_adata.obs["column2"] <= 100)),  # NOQA: PLR2004
    :,
]
adata_filtered_1
View of AnnData object with n_obs × n_vars = 4 × 5
    obs: 'column1', 'column2'
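
To make the combined condition above easier to read, the mask can also be assembled in named steps on the metadata table alone. The following is a minimal pandas-only sketch; the variable names `obs`, `in_groups`, `in_range` and `keep` are illustrative and do not appear in the original example.

```python
import pandas as pd

# Same sample metadata as in the example above
obs = pd.DataFrame(
    {
        "column1": ["A", "B", "C", "D", "E", "F"],
        "column2": [50, 200, 50, 200, 50, 200],
    },
    index=[f"cell_{i}" for i in range(6)],
)

# One named mask per condition
in_groups = obs["column1"].isin(["A", "B", "C"])
# inclusive="right" reproduces 20 < column2 <= 100
in_range = obs["column2"].between(20, 100, inclusive="right")

# Combine with a logical OR, as in the slicing expression above
keep = in_groups | in_range
print(obs[keep].index.tolist())
# ['cell_0', 'cell_1', 'cell_2', 'cell_4']
```

A mask built this way should be usable for slicing the AnnData object itself, for example as `test_adata[keep.to_numpy(), :]`. Even so, every additional condition adds another line of mask bookkeeping.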

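alphapepttools simplifies this with its built-in function alphapepttools.pp.filter_by_metadata. In the call below, the filters are passed as a dictionary keyed by metadata column: a list names the values to match, and a tuple gives a numeric range.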

# Keep (action="keep") the observations (axis=0) that match either filter (logic="or")
adata_filtered_2 = at.pp.filter_by_metadata(
    test_adata, {"column1": ["A", "B", "C"], "column2": (20, 100)}, axis=0, logic="or", action="keep"
)
adata_filtered_2
View of AnnData object with n_obs × n_vars = 4 × 5
    obs: 'column1', 'column2'
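
To illustrate how a dictionary of filters like the one above could translate into a boolean mask, here is a minimal pandas-only sketch. `build_mask` is a hypothetical helper written for this illustration; it is not part of alphapepttools and does not claim to mirror how `filter_by_metadata` is implemented. In particular, treating a tuple as a lower-exclusive, upper-inclusive range is an assumption chosen to match the manual slicing expression; the library's actual bound convention should be checked in its documentation.

```python
import operator
from functools import reduce

import pandas as pd


def build_mask(df: pd.DataFrame, filters: dict, logic: str = "and") -> pd.Series:
    """Hypothetical helper (not an alphapepttools function)."""
    if logic not in ("and", "or"):
        raise ValueError("logic must be 'and' or 'or'")

    masks = []
    for column, rule in filters.items():
        if isinstance(rule, tuple):
            # Assumption: tuple means lower < value <= upper
            lower, upper = rule
            masks.append((df[column] > lower) & (df[column] <= upper))
        else:
            # Any other rule is treated as a collection of allowed values
            masks.append(df[column].isin(rule))

    # Fold the per-column masks together with the requested logic
    combine = operator.or_ if logic == "or" else operator.and_
    return reduce(combine, masks)


obs = pd.DataFrame(
    {
        "column1": ["A", "B", "C", "D", "E", "F"],
        "column2": [50, 200, 50, 200, 50, 200],
    },
    index=[f"cell_{i}" for i in range(6)],
)
filters = {"column1": ["A", "B", "C"], "column2": (20, 100)}

mask_or = build_mask(obs, filters, logic="or")
mask_and = build_mask(obs, filters, logic="and")
print(obs[mask_or].index.tolist())   # ['cell_0', 'cell_1', 'cell_2', 'cell_4']
print(obs[mask_and].index.tolist())  # ['cell_0', 'cell_2']
```

The point of the dictionary form is that each extra condition is one more key rather than one more nested boolean expression, and switching between "and" and "or" is a single argument.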

Both approaches select the same observations, which we confirm by comparing the obs tables of the two results:

pd.testing.assert_frame_equal(adata_filtered_1.obs, adata_filtered_2.obs)