alphapepttools.io.read_pg_table


alphapepttools.io.read_pg_table(path, search_engine, *, additional_column_mapping=None, **reader_provider_kwargs)

Read protein group table to the anndata.AnnData format

Read (features x observations) protein group matrices from proteomics search engines into the anndata.AnnData format (observations x features). By default, raw intensities are returned; this can be changed depending on the search engine. If a single unique feature index can be derived from the input, the function assigns it as the var index. Otherwise, an ascending integer var index is used.
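The orientation change and the index fallback described above can be illustrated with a small standalone sketch. It uses plain Python rather than alphapepttools or anndata, and the helpers transpose_matrix and choose_var_index are hypothetical names, not part of either library. It also assumes that "unique" simply means no repeated feature IDs; the actual reader may derive the index differently.

```python
def transpose_matrix(rows):
    """Turn a (features x observations) matrix into (observations x features)."""
    return [list(column) for column in zip(*rows)]


def choose_var_index(feature_ids):
    """Use the feature IDs as index if they are unique, else ascending integers."""
    if len(set(feature_ids)) == len(feature_ids):
        return list(feature_ids)
    return list(range(len(feature_ids)))


# Hypothetical protein group table: one row per protein group, one column per sample.
protein_groups = ["P12345", "Q67890", "P11111"]
intensities = [
    [100.0, 110.0],  # P12345 in sample_1, sample_2
    [200.0, 190.0],  # Q67890
    [50.0, 55.0],    # P11111
]

X = transpose_matrix(intensities)  # 2 observations x 3 features
var_index = choose_var_index(protein_groups)
```

In this sketch, X has one row per sample, and var_index keeps the protein group IDs because none of them repeats; a list with duplicates would fall back to 0, 1, 2, ....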

Supported formats include

  • AlphaDIA (alphadia)

  • AlphaPept (alphapept, csv+hdf)

  • DIANN (diann)

  • MaxQuant (maxquant)

  • Spectronaut (spectronaut, parquet + tsv)

See alphabase.pg_reader module for more information

Parameters:
  • path (str) – Path to protein group matrix

  • search_engine (str) – Name of the search engine that produced the output. Pass the method name of the corresponding reader.

  • additional_column_mapping (dict[str, Any] | None (default: None)) – Extend the default mapping of protein group table columns to standardized alphabase columns with custom columns. Passed as a dictionary that maps each new column name (key) to the corresponding column in the search engine protein group table (value).

  • reader_provider_kwargs

    Passed to alphabase.pg_reader.pg_reader_provider.get_reader(), especially:

    • column_mapping

      A dictionary mapping alphabase columns (keys) to the corresponding columns in the search engine output (values). If None, the mapping is loaded from the column_mapping key of the respective search engine in pg_reader.yaml.

    • measurement_regex

      Regular expression that identifies the correct measurement type. Only relevant if the PG matrix contains multiple measurement types. For example, alphapept returns the raw protein intensity per sample in column A and the LFQ-corrected value in A_LFQ. If None, raw intensities are loaded.
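To make the effect of measurement_regex concrete, the following minimal sketch applies the two preconfigured alphapept patterns listed in the example section of this page to a set of column names. The column names and the select_columns helper are hypothetical, and matching with re.search is an assumption; the reader may apply the pattern differently internally.

```python
import re

# Patterns copied from the preconfigured alphapept regexes shown on this page.
patterns = {"raw": r"^.*(?<!_LFQ)$", "lfq": r"_LFQ$"}

# Hypothetical column names of an alphapept protein group matrix.
columns = ["A", "A_LFQ", "B", "B_LFQ"]


def select_columns(columns, pattern):
    """Return the columns whose name matches the regular expression."""
    compiled = re.compile(pattern)
    return [name for name in columns if compiled.search(name)]


raw_columns = select_columns(columns, patterns["raw"])
lfq_columns = select_columns(columns, patterns["lfq"])
```

The negative lookbehind in the raw pattern rejects any name ending in _LFQ, so the two patterns split the sample columns into disjoint sets.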

Return type:

AnnData

Returns:

anndata.AnnData AnnData object that can be further processed with scVerse packages.

  • adata.X

    Stores values of the intensity columns in the report of shape observations x features

  • adata.obs

    Stores observations with protein group matrix sample names as sample_id column.

  • adata.var

    Stores features and feature metadata.

Example

import alphapepttools as apt

alphadia_path = ...
adata = apt.io.read_pg_table(alphadia_path, search_engine="alphadia")

maxquant_path = ...
# Read LFQ values from MaxQuant report
adata = apt.io.read_pg_table(maxquant_path, search_engine="maxquant", measurement_regex="lfq")

If a specific column is missing in the output, you can add it via the additional_column_mapping argument:

# Spectronaut reports can contain custom columns that might be missing from the alphabase default mapping
spectronaut_path = ...
apt.io.read_pg_table(
    spectronaut_path, search_engine="spectronaut", additional_column_mapping={"new_name": "name_in_pg_table"}
)

Get the available preconfigured regular expressions:

from alphabase.pg_reader import pg_reader_provider

alphapept_reader = pg_reader_provider.get_reader("alphapept")
alphapept_reader.get_preconfigured_regex()
> {'raw': '^.*(?<!_LFQ)$', 'lfq': '_LFQ$'}

See also

alphabase.pg_reader