alphapepttools.io.read_psm_table

alphapepttools.io.read_psm_table(file_paths, search_engine, level='proteins', *, intensity_column=None, feature_id_column=None, sample_id_column=None, var_columns=None, obs_columns=None, **reader_kwargs)

Read peptide spectrum match tables to the anndata.AnnData format

Read peptide spectrum match (PSM) tables from proteomics search engines into the anndata.AnnData format (observations x features). By default, raw protein intensities are returned. Additional columns can be selected to be retained in the resulting AnnData object.

Note: The underlying pivoting function aggregates metadata in a "first" manner: for each feature, only the first value encountered is kept. If the metadata is finer grained than the feature level, information will be lost. For example, setting feature_id_column="protein_ids" and including peptide sequences in var_columns produces a protein-level AnnData object with only one peptide sequence per protein, which is likely not desired. Therefore, ensure that the metadata you want to retain actually applies at the feature level.
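The effect of "first" aggregation can be sketched in plain Python. This is an illustration of the behavior described in the note, not the library's implementation; the helper aggregate_first, the column names, and the values are all made up for the example.

```python
# Illustrative only: mimics "first" aggregation on a toy long-format table.
# The helper, column names, and values are hypothetical, not alphapepttools code.
rows = [
    {"protein_ids": "P1", "sequence": "PEPTIDEA", "gene": "G1"},
    {"protein_ids": "P1", "sequence": "PEPTIDEB", "gene": "G1"},
    {"protein_ids": "P2", "sequence": "PEPTIDEC", "gene": "G2"},
]


def aggregate_first(rows, feature_key, metadata_keys):
    """Keep only the first metadata values seen for each feature."""
    result = {}
    for row in rows:
        feature = row[feature_key]
        if feature not in result:
            result[feature] = {key: row[key] for key in metadata_keys}
    return result


var = aggregate_first(rows, "protein_ids", ["sequence", "gene"])
print(var)
```

Here the gene annotation survives intact because it is constant per protein, while the second peptide sequence of P1 is silently dropped because it is finer grained than the protein level.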

Supported formats include

  • AlphaDIA (alphadia)

  • AlphaPept (alphapept)

  • DIANN (diann)

  • MaxQuant (maxquant)

  • Spectronaut (spectronaut, parquet + tsv)

Parameters:
  • file_paths (str | list[str]) – Path to peptide spectrum match reports. If a list of reports is passed, all must be from the same search engine.

  • search_engine (str) – Name of the search engine that generated the output. Pass the method name of the corresponding reader.

  • level (str (default: 'proteins')) – Level of quantification to read. One of "proteins", "precursors", or "genes".

  • intensity_column (Optional[str] (default: None)) – Column that holds the quantified intensities in the PSM table. Defaults to the pre-configured protein intensities value in alphabase.

  • feature_id_column (Optional[str] (default: None)) – Column that holds the feature identifier in the PSM table. Defaults to the pre-configured proteins column in alphabase.

  • sample_id_column (Optional[str] (default: None)) – Column that holds the sample identifier in the PSM table. Defaults to the pre-configured value in alphabase.

  • var_columns (Union[str, list[str], None] (default: None)) – Additional columns to annotate features in the adata.var table. Can be a single column name or a list of column names. Defaults to None.

  • obs_columns (Union[str, list[str], None] (default: None)) – Additional columns to annotate observations in the adata.obs table. Can be a single column name or a list of column names. Defaults to None.

  • **reader_kwargs – Keyword arguments passed to alphabase.psm_reader.psm_reader_provider.get_reader()

Return type:

AnnData

Returns:

anndata.AnnData – AnnData object that can be further processed with scverse packages.

  • adata.X

    Stores the values of the intensity column from the report, with shape observations x features.

  • adata.obs

    Stores observations, with the sample names from the protein group matrix in the sample_id column.

  • adata.var

    Stores features and feature metadata with standardized alphabase names.

Example

import alphapepttools as at

alphadia_path = ...
adata = at.io.read_psm_table(alphadia_path, search_engine="alphadia")

See also

alphabase.psm_reader