alphapepttools.io.AnnDataFactory
- class alphapepttools.io.AnnDataFactory(psm_df, intensity_column, sample_id_column, feature_id_column)
Factory class to convert AlphaBase PSM DataFrames to AnnData format.
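Methods table
| Method | Description |
|---|---|
| create_anndata([var_columns, obs_columns]) | Create AnnData object from PSM DataFrame. |
| from_files(file_paths[, reader_type, level, ...]) | Create AnnDataFactory from PSM files. |
Methods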
- AnnDataFactory.create_anndata(var_columns=None, obs_columns=None)
Create AnnData object from PSM DataFrame.
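- Parameters:
  - var_columns (default: None) – Names of columns from the PSM DataFrame to add as feature annotations in var (e.g., gene names).
  - obs_columns (default: None) – Names of columns from the PSM DataFrame to add as sample annotations in obs (e.g., experimental condition).
- Return type:
  AnnData
- Returns:
  AnnData object where:
  - obs (rows) are samples
  - var (columns) are features (e.g., proteins, peptides, or genes)
  - X contains intensity values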
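The layout above amounts to pivoting the long-format PSM table (one row per sample/feature measurement) into a wide samples × features matrix. The following is a minimal pure-Python sketch of that idea under stated assumptions, not the library's implementation: the helper `pivot_long_to_wide` is hypothetical, rows are plain dicts instead of a DataFrame, missing measurements are assumed to become NaN, and a later row for the same (sample, feature) pair simply overwrites an earlier one.

```python
import math

# Hypothetical long-format PSM rows, mirroring the columns of the example below
rows = [
    {"raw_name": "sample1", "protein_group": "PROT1", "intensity": 100.0},
    {"raw_name": "sample1", "protein_group": "PROT2", "intensity": 200.0},
    {"raw_name": "sample2", "protein_group": "PROT1", "intensity": 120.0},
    {"raw_name": "sample2", "protein_group": "PROT3", "intensity": 160.0},
]


def pivot_long_to_wide(rows, sample_id_column, feature_id_column, intensity_column):
    """Return (obs_names, var_names, X), where X[i][j] is the intensity of
    feature var_names[j] in sample obs_names[i] (NaN if not measured)."""
    # dict.fromkeys drops duplicates while keeping first-seen order
    obs_names = list(dict.fromkeys(r[sample_id_column] for r in rows))
    var_names = list(dict.fromkeys(r[feature_id_column] for r in rows))
    obs_index = {name: i for i, name in enumerate(obs_names)}
    var_index = {name: j for j, name in enumerate(var_names)}
    # Start with an all-NaN samples x features matrix
    X = [[math.nan] * len(var_names) for _ in obs_names]
    for r in rows:
        i = obs_index[r[sample_id_column]]
        j = var_index[r[feature_id_column]]
        X[i][j] = r[intensity_column]  # a later duplicate overwrites an earlier one
    return obs_names, var_names, X


obs_names, var_names, X = pivot_long_to_wide(rows, "raw_name", "protein_group", "intensity")
print(obs_names)  # ['sample1', 'sample2']
print(var_names)  # ['PROT1', 'PROT2', 'PROT3']
print(X)  # [[100.0, 200.0, nan], [120.0, nan, 160.0]]
```

Rows become obs, columns become var, and the cell values form X, which is the orientation described in the Returns field.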
Examples
```python
import pandas as pd

from alphapepttools.io.anndata_factory import AnnDataFactory

# Create sample data with metadata
df = pd.DataFrame(
    {
        "raw_name": ["sample1"] * 3 + ["sample2"] * 3,
        "protein_group": ["PROT1", "PROT2", "PROT3"] * 2,
        "intensity": [100, 200, 150, 120, 210, 160],
        "gene_names": ["GENE1", "GENE2", "GENE3"] * 2,
        "condition": ["control"] * 3 + ["treated"] * 3,
    }
)

factory = AnnDataFactory(
    psm_df=df,
    intensity_column="intensity",
    sample_id_column="raw_name",
    feature_id_column="protein_group",
)

# Create AnnData with metadata
adata = factory.create_anndata(
    var_columns=["gene_names"],  # Add gene names to var
    obs_columns=["condition"],  # Add condition to obs
)

print(adata.shape)  # (2, 3) - 2 samples, 3 proteins
print(adata.var["gene_names"])  # Gene annotations
print(adata.obs["condition"])  # Sample conditions
```
- classmethod AnnDataFactory.from_files(file_paths, reader_type='maxquant', level='proteins', *, intensity_column=None, feature_id_column=None, sample_id_column=None, additional_columns=None, **reader_kwargs)
Create AnnDataFactory from PSM files.
- Parameters:
  - file_paths – Path(s) to the PSM file(s) to read, e.g., "report.tsv".
  - reader_type (str (default: 'maxquant')) – Type of PSM reader to use, e.g., "maxquant" or "diann".
  - level (str (default: 'proteins')) – Level of quantification to read. One of "proteins", "precursors", or "genes".
  - intensity_column (str | None (default: None)) – Name of the column storing intensity data. If None, the default is taken from psm_reader.yaml.
  - feature_id_column (str | None (default: None)) – Name of the column storing feature IDs. If None, the default is taken from psm_reader.yaml.
  - sample_id_column (str | None (default: None)) – Name of the column storing sample IDs. If None, the default is taken from psm_reader.yaml.
  - additional_columns (list[str] | None (default: None)) – Names of additional columns from the PSM table to retain as experiment-specific metadata. These columns can then be added to the resulting AnnData object as annotations. If a column has a higher cardinality than feature_id_column (i.e., multiple values per feature), only the first value encountered is kept.
  - **reader_kwargs – Additional keyword arguments passed to the PSM reader.
- Return type:
  AnnDataFactory
- Returns:
  Initialized AnnDataFactory instance.
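The additional_columns note says that a column with several values per feature is collapsed to the first value encountered, so the result depends on row order. Below is a minimal stdlib sketch of that rule, plus a check that lists which features would lose information. Both helpers (`first_value_per_feature`, `features_with_multiple_values`) are hypothetical illustrations, not part of alphapepttools, and the rows are made-up precursor-level data.

```python
# Hypothetical precursor-level rows: PROT1 has two precursors with different
# charges, so "charge" has a higher cardinality than the feature id.
rows = [
    {"protein_group": "PROT1", "charge": 2},
    {"protein_group": "PROT1", "charge": 3},
    {"protein_group": "PROT2", "charge": 2},
]


def first_value_per_feature(rows, feature_id_column, column):
    """Map each feature id to the first value of `column` seen for it."""
    annotation = {}
    for r in rows:
        # setdefault stores a value only the first time a feature appears
        annotation.setdefault(r[feature_id_column], r[column])
    return annotation


def features_with_multiple_values(rows, feature_id_column, column):
    """Return (sorted) feature ids whose `column` takes more than one distinct value."""
    seen = {}
    for r in rows:
        seen.setdefault(r[feature_id_column], set()).add(r[column])
    return sorted(f for f, values in seen.items() if len(values) > 1)


print(first_value_per_feature(rows, "protein_group", "charge"))  # {'PROT1': 2, 'PROT2': 2}
print(features_with_multiple_values(rows, "protein_group", "charge"))  # ['PROT1']
```

Running a check like the second helper on the raw table before choosing additional_columns makes the lossy collapse visible, which matters most at level="proteins" when precursor-level attributes are requested.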
Examples
```python
from alphapepttools.io.anndata_factory import AnnDataFactory

# Load DIA-NN data at protein level,
# assuming a DIA-NN report called "report.tsv" exists in the current directory
factory = AnnDataFactory.from_files("report.tsv", reader_type="diann", level="proteins")
adata = factory.create_anndata()

# Load with custom column names and additional metadata columns
report_path = "report.tsv"
factory = AnnDataFactory.from_files(
    report_path,
    reader_type="diann",
    intensity_column="Precursor.Quantity",
    additional_columns=["Precursor.Quantity"],  # additional columns need to be specified here
)
# Add charge and stripped sequence to var via their alphabase-standardized column names
adata = factory.create_anndata(var_columns=["charge", "sequence"])
print(adata.var)  # Check that the additional columns are included in var
```