alphapepttools.io.read_pg_table

alphapepttools.io.read_pg_table(path, search_engine, *, column_mapping=None, measurement_regex=None, **reader_provider_kwargs)

Read a protein group table into the anndata.AnnData format.

Read a (features x observations) protein group matrix produced by a proteomics search engine into the anndata.AnnData format (observations x features). By default, raw intensities are returned; depending on the search engine, other measurement types (for example LFQ intensities) can be selected with measurement_regex. If a single unique feature index can be derived from the input, it is assigned as the var index. Otherwise, an ascending integer var index is used.
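The orientation change and the var-index rule described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: the helper to_obs_by_features and its input layout (one list of intensities per protein, plus the sample names) are hypothetical.

```python
from typing import Optional


def to_obs_by_features(
    intensities: list[list[float]],
    sample_names: list[str],
    feature_ids: Optional[list[str]] = None,
) -> tuple[list[list[float]], list[str], list[str]]:
    """Hypothetical helper: turn a (features x observations) matrix into
    (X, obs_names, var_names) with X shaped (observations x features)."""
    # Transpose: input row i is feature i, input column j is sample j.
    X = [list(column) for column in zip(*intensities)]

    # Use the feature identifiers as var index only if they are unique;
    # otherwise fall back to an ascending integer index (stored as string
    # labels here, since AnnData index labels are strings).
    if feature_ids is not None and len(set(feature_ids)) == len(feature_ids):
        var_names = list(feature_ids)
    else:
        var_names = [str(i) for i in range(len(intensities))]

    return X, list(sample_names), var_names


# 2 proteins x 3 samples in, 3 samples x 2 proteins out.
X, obs_names, var_names = to_obs_by_features(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], ["s1", "s2", "s3"], ["P1", "P2"]
)
print(X)  # [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
```

Passing duplicated identifiers such as ["P1", "P1"] makes the sketch fall back to the index ["0", "1"].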

Supported formats include:

  • AlphaDIA (alphadia)

  • AlphaPept (alphapept; csv and hdf)

  • DIA-NN (diann)

  • MaxQuant (maxquant)

  • Spectronaut (spectronaut; parquet and tsv)

See the alphabase.pg_reader module for more information.

Parameters:
  • path (str) – Path to the protein group matrix file.

  • search_engine (str) – Name of the search engine that produced the table. Pass the name of the corresponding reader, as listed in parentheses under the supported formats (for example "alphadia" or "maxquant").

  • column_mapping (Optional[dict[str, Any]] (default: None)) – Passed to alphabase.pg_reader.pg_reader_provider.get_reader(). A dictionary mapping alphabase column names (keys) to the corresponding column names in the search engine output (values). If None, the mapping is loaded from the column_mapping key of the respective search engine in pg_reader.yaml.

  • measurement_regex (Optional[str] (default: None)) – Passed to alphabase.pg_reader.pg_reader_provider.get_reader(). Regular expression that selects the measurement type to read. Only relevant if the PG matrix contains multiple measurement types; for example, alphapept reports the raw protein intensity of sample A in column A and the LFQ-corrected value in column A_LFQ. If None, raw intensities are loaded. As in the MaxQuant example below, the name of a preconfigured expression such as "lfq" can be passed instead.

  • reader_provider_kwargs – Additional keyword arguments passed to alphabase.pg_reader.pg_reader_provider.get_reader().
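The direction of column_mapping (alphabase names as keys, search engine names as values) is easy to get backwards. The sketch below inverts such a mapping to rename a table header to alphabase names. It assumes each value is a single column name (the documented type is dict[str, Any], so real mappings may be richer); the helper rename_to_alphabase and all column names are hypothetical placeholders, not entries of pg_reader.yaml.

```python
def rename_to_alphabase(header: list[str], column_mapping: dict[str, str]) -> list[str]:
    """Hypothetical helper: rename search engine columns to alphabase names.

    column_mapping follows the documented direction:
    {alphabase_column: search_engine_column}.
    """
    # Invert the mapping so it can be looked up by search engine column name.
    engine_to_alphabase = {engine: alpha for alpha, engine in column_mapping.items()}
    # Columns without a mapping keep their original name.
    return [engine_to_alphabase.get(name, name) for name in header]


# Placeholder names for illustration only.
mapping = {"proteins": "Protein IDs", "genes": "Gene names"}
print(rename_to_alphabase(["Protein IDs", "Gene names", "Intensity A"], mapping))
# ['proteins', 'genes', 'Intensity A']
```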

Return type:

AnnData

Returns:

anndata.AnnData – AnnData object that can be further processed with scverse packages.

  • adata.X

    Intensity values from the intensity columns of the table, with shape (observations x features).

  • adata.obs

    Observation (sample) annotations; the sample names of the protein group matrix are stored in the sample_id column.

  • adata.var

    Stores features and feature metadata.

Example

from alphapepttools.io import read_pg_table

alphadia_path = ...
adata = read_pg_table(alphadia_path, search_engine="alphadia")

maxquant_path = ...
# Read LFQ values from MaxQuant report
adata = read_pg_table(maxquant_path, search_engine="maxquant", measurement_regex="lfq")

Get the preconfigured regular expressions available for a reader:

from alphabase.pg_reader import pg_reader_provider

alphapept_reader = pg_reader_provider.get_reader("alphapept")
alphapept_reader.get_preconfigured_regex()
> {'raw': '^.*(?<!_LFQ)$', 'lfq': '_LFQ$'}
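The preconfigured expressions shown above can be checked against column names with Python's re module. The sketch assumes that measurement_regex may be either one of these keys (as in the "lfq" MaxQuant example) or a regular expression, and that None falls back to "raw"; the helper select_measurement_columns and the column names are hypothetical.

```python
import re
from typing import Optional

# Preconfigured expressions as returned for the alphapept reader above.
PRECONFIGURED = {"raw": r"^.*(?<!_LFQ)$", "lfq": r"_LFQ$"}


def select_measurement_columns(
    sample_columns: list[str],
    measurement_regex: Optional[str] = None,
    preconfigured: dict[str, str] = PRECONFIGURED,
) -> list[str]:
    """Hypothetical helper: keep the sample columns of one measurement type."""
    # Assumption: None means raw intensities; a known key is resolved to its
    # preconfigured expression; anything else is used as a regex directly.
    if measurement_regex is None:
        measurement_regex = "raw"
    pattern = preconfigured.get(measurement_regex, measurement_regex)
    return [col for col in sample_columns if re.search(pattern, col)]


columns = ["A", "A_LFQ", "B", "B_LFQ"]
print(select_measurement_columns(columns))         # ['A', 'B']
print(select_measurement_columns(columns, "lfq"))  # ['A_LFQ', 'B_LFQ']
```

The negative lookbehind (?<!_LFQ) in the "raw" expression is what excludes the LFQ columns, so the two expressions split the sample columns into disjoint sets.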

See also

alphabase.pg_reader