Interface

General utilities

The generic utility functions include:

Callback function to track progress
Logging function
Version/hardware/settings checks

tqdm_wrapper

 tqdm_wrapper (pbar, update:float)

Update a tqdm progress bar.

Args:
    pbar (type): A tqdm.tqdm object.
    update (float): The new value for the progress bar.
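tqdm's update() method takes an increment, whereas the update argument described above is the new absolute value. The following is a minimal sketch of how such a wrapper could bridge the two; it is not taken from the library source. FakeProgressBar is a hypothetical stand-in that mimics only the n attribute and update() method of a tqdm bar, so the example runs without tqdm installed.

```python
class FakeProgressBar:
    """Hypothetical stand-in exposing the two tqdm members used below."""

    def __init__(self, total: float = 1.0):
        self.total = total
        self.n = 0.0  # current position, named as in tqdm

    def update(self, increment: float) -> None:
        # Like tqdm.update, this advances the bar by a relative amount.
        self.n += increment


def tqdm_wrapper_sketch(pbar, update: float) -> None:
    """Move the bar to the absolute position `update` via a relative step."""
    pbar.update(update - pbar.n)


bar = FakeProgressBar(total=1.0)
tqdm_wrapper_sketch(bar, 0.25)
tqdm_wrapper_sketch(bar, 0.75)
print(bar.n)
```

With tqdm installed, a real bar created with tqdm.tqdm(total=1) could be passed in place of FakeProgressBar.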

check_version_and_hardware

 check_version_and_hardware (settings:dict)

Show platform and python information and parse settings.

Args: settings (dict): A dictionary with settings describing how to process the data.

Returns: dict: The parsed settings.

wrapped_partial

 wrapped_partial (func:callable, *args, **kwargs)

Wrap a function with partial args and kwargs.

Args:
    func (callable): The function to be wrapped.
    *args (type): Args to be wrapped.
    **kwargs (type): Kwargs to be wrapped.

Returns: callable: The wrapped function.
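A plain functools.partial object does not carry the name or docstring of the function it wraps. The sketch below shows one common way to get a partial that keeps this metadata, using functools.update_wrapper. It is an illustration of the idea, not necessarily the library's implementation; wrapped_partial_sketch and scale are hypothetical names.

```python
import functools


def wrapped_partial_sketch(func, *args, **kwargs):
    """Bind args/kwargs to func and copy func's metadata onto the result."""
    partial_func = functools.partial(func, *args, **kwargs)
    # Copies __name__, __doc__, __module__ etc. and sets __wrapped__.
    functools.update_wrapper(partial_func, func)
    return partial_func


def scale(value, factor=1):
    """Multiply value by factor."""
    return value * factor


double = wrapped_partial_sketch(scale, factor=2)
print(double.__name__, double(21))
```

Keeping the original name is useful when the wrapped function is later reported in logs or progress messages.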

Functions

The implemented functions are as follows:

Create database
Import raw data
Perform feature finding
Search data with fasta
Recalibrate
Score data with fasta
Perform LFQ
Export results
Run whole workflow

The last function runs the whole pipeline at once.

create_database

 create_database (settings:dict, logger_set:bool=False,
                  settings_parsed:bool=False, callback:callable=None)

Create the search database.

Args:
    settings (dict): A dictionary with settings describing how to process the data.
    logger_set (bool): If False, reset the default logger. Defaults to False.
    settings_parsed (bool): If True, reparse the settings. Defaults to False.
    callback (callable): A function that accepts a float between 0 and 1 as progress. Defaults to None.

Returns: dict: the parsed settings.

Raises: FileNotFoundError: If the FASTA file is not found.
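The callback contract (a float between 0 and 1) and the FileNotFoundError behaviour can be illustrated with a small standalone sketch. The function name check_fasta_paths_sketch and its fasta_paths argument are hypothetical; how the real function locates FASTA files inside the settings dictionary is not described here.

```python
import os


def check_fasta_paths_sketch(fasta_paths, callback=None):
    """Raise FileNotFoundError for a missing file; report progress in [0, 1]."""
    n_paths = len(fasta_paths)
    for index, path in enumerate(fasta_paths):
        if not os.path.isfile(path):
            raise FileNotFoundError(f"FASTA file not found: {path}")
        if callback is not None:
            # Progress is the fraction of files checked so far.
            callback((index + 1) / n_paths)


def print_progress(fraction: float) -> None:
    """Example callback: any callable taking a single float works."""
    print(f"progress: {fraction:.0%}")
```

Any callable that takes one float can serve as the callback, for example a GUI progress-bar setter or the append method of a list.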

import_raw_data

 import_raw_data (settings:dict, logger_set:bool=False,
                  settings_parsed:bool=False, callback:callable=None)

Import raw data.

Returns: dict: the parsed settings.

feature_finding

 feature_finding (settings:dict, logger_set:bool=False,
                  settings_parsed:bool=False, callback:callable=None)

Find features.

Returns: dict: the parsed settings.

search_data

 search_data (settings:dict, first_search:bool=True,
              logger_set:bool=False, settings_parsed:bool=False,
              callback:callable=None)

Search the data against the FASTA database.

Args:
    settings (dict): A dictionary with settings describing how to process the data.
    first_search (bool): If True, save the intermediary results as first search. Otherwise, calibrated mz_values are used and results are saved as second search. Defaults to True.
    logger_set (bool): If False, reset the default logger. Defaults to False.
    settings_parsed (bool): If True, reparse the settings. Defaults to False.
    callback (callable): A function that accepts a float between 0 and 1 as progress. Defaults to None.

Returns: dict: the parsed settings.

Raises: FileNotFoundError: If the FASTA file is not found.

recalibrate_data

 recalibrate_data (settings:dict, logger_set:bool=False,
                   settings_parsed:bool=False, callback:callable=None)

Recalibrate mz values.

Returns: dict: the parsed settings.

score

 score (settings:dict, pept_dict:dict=None, fasta_dict:dict=None,
        logger_set:bool=False, settings_parsed:bool=False,
        callback:callable=None)

Score PSMs and calculate FDR.

Args:
    settings (dict): A dictionary with settings describing how to process the data.
    pept_dict (dict): A dictionary with peptides. Defaults to None.
    fasta_dict (dict): A dictionary with fasta sequences. Defaults to None.
    logger_set (bool): If False, reset the default logger. Defaults to False.
    settings_parsed (bool): If True, reparse the settings. Defaults to False.
    callback (callable): A function that accepts a float between 0 and 1 as progress. Defaults to None.

Returns: dict: the parsed settings.

isobaric_labeling

 isobaric_labeling (settings:dict, logger_set:bool=False,
                    settings_parsed:bool=False, callback:callable=None)

Search for isobaric labels.

Returns: dict: the parsed settings.

protein_grouping

 protein_grouping (settings:dict, pept_dict:dict=None,
                   fasta_dict:dict=None, logger_set:bool=False,
                   settings_parsed:bool=False, callback:callable=None)

Group peptides into proteins.

Returns: dict: the parsed settings.

match

 match (settings:dict, logger_set:bool=False, settings_parsed:bool=False,
        callback:callable=None)

Match datasets.

Returns: dict: the parsed settings.

align

 align (settings:dict, logger_set:bool=False, settings_parsed:bool=False,
        callback:callable=None)

Align multiple samples.

Returns: dict: the parsed settings.

read_label_intensity

 read_label_intensity (df:pandas.core.frame.DataFrame,
                       label:NamedTuple)

Reads the label intensities from peptides and sums them by protein group.

Args: df (pd.DataFrame): Table with peptide information. label (NamedTuple): Label used for the experiment.

Returns: pd.DataFrame: Summary protein table containing proteins and their intensity for each channel.

quantification

 quantification (settings:dict, logger_set:bool=False,
                 settings_parsed:bool=False, callback:callable=None)

Normalize and quantify datasets.

Returns: dict: the parsed settings.

export

 export (settings:dict, logger_set:bool=False, settings_parsed:bool=False,
         callback:callable=None)

Export settings.

Returns: dict: the parsed settings.

run_complete_workflow

 run_complete_workflow (settings:dict, progress:bool=False,
                        logger_set:bool=False, settings_parsed:bool=False,
                        callback:callable=None,
                        callback_overall:callable=None,
                        callback_task:callable=None, logfile:str=None)

Run all AlphaPept steps from a settings dict.

Args:
    settings (dict): A dictionary with settings describing how to process the data.
    progress (bool): Track progress. Defaults to False.
    logger_set (bool): If False, reset the default logger. Defaults to False.
    settings_parsed (bool): If True, reparse the settings. Defaults to False.
    callback (callable): A function that accepts a float between 0 and 1 as progress. Defaults to None.
    callback_overall (callable): Same as callback, but for the overall progress. Defaults to None.
    callback_task (callable): Same as callback, but for the task progress. Defaults to None.
    logfile (str): The name of a logfile. Defaults to None.

Returns: dict: the parsed settings.
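The difference between the overall and the per-task callbacks can be sketched with a generic step runner. This is an illustrative assumption about how such a loop might be organized, not the actual workflow code. run_steps_sketch, step_import and step_search are hypothetical names; the only convention borrowed from the documentation above is that each step takes a settings dict plus a callback and returns the settings.

```python
def run_steps_sketch(settings, steps, callback_overall=None, callback_task=None):
    """Run steps in order, threading the settings dict through each one."""
    n_steps = len(steps)
    for index, step in enumerate(steps):
        # Each step reports its own progress through the task callback.
        settings = step(settings, callback=callback_task)
        if callback_overall is not None:
            # Overall progress is the fraction of completed steps.
            callback_overall((index + 1) / n_steps)
    return settings


def step_import(settings, callback=None):
    """Hypothetical step: mark the import as done."""
    if callback is not None:
        callback(1.0)
    return dict(settings, imported=True)


def step_search(settings, callback=None):
    """Hypothetical step: mark the search as done."""
    if callback is not None:
        callback(1.0)
    return dict(settings, searched=True)


overall_progress, task_progress = [], []
final_settings = run_steps_sketch(
    {}, [step_import, step_search],
    callback_overall=overall_progress.append,
    callback_task=task_progress.append,
)
print(final_settings, overall_progress)
```

Separating the two callbacks lets a front end show one bar for the whole run and a second bar that restarts for every step.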

parallel_execute

 parallel_execute (settings:dict, step:callable,
                   callback:callable=None)

Generic function to execute workflow steps in parallel on a per-file basis.

Args:
    settings (dict): The settings for processing the step function.
    step (callable): A function that accepts settings as input parameter.
    callback (callable): A function that accepts a float between 0 and 1 as progress. Defaults to None.

Returns: dict: The settings after processing.

Raises: NotImplementedError: When the step is feature finding on files other than Thermo or Bruker.
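Per-file parallel execution with progress reporting can be sketched as follows. Note the assumptions: this sketch uses a thread pool from concurrent.futures for simplicity, which may differ from the mechanism the library uses, and it passes an explicit file_paths list and a two-argument step, neither of which is part of the documented signature.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def parallel_execute_sketch(settings, step, file_paths, callback=None, max_workers=2):
    """Apply step(settings, path) to every file and collect results by path."""
    results = {}
    n_files = len(file_paths)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(step, settings, path): path for path in file_paths}
        for n_done, future in enumerate(as_completed(futures), start=1):
            # future.result() re-raises any exception from the worker.
            results[futures[future]] = future.result()
            if callback is not None:
                callback(n_done / n_files)
    return results


def describe_file(settings, path):
    """Hypothetical step: combine a setting with the file name."""
    return f"{settings['prefix']}:{path}"


progress = []
output = parallel_execute_sketch(
    {"prefix": "done"}, describe_file, ["a.raw", "b.d"], callback=progress.append
)
print(output, progress)
```

For CPU-bound steps, a process pool would usually be the better fit; threads are used here only to keep the example short and portable.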

get_summary

 get_summary (settings:dict, summary:dict)

Append file summary statistics to a summary dictionary.

Args:
    settings (dict): A dictionary with settings describing how to process the data.
    summary (dict): A dictionary with summary statistics of the experiment.

Returns: dict: The summary to which the file summary statistics are appended.

get_file_summary

 get_file_summary (ms_data:alphapept.io.MS_Data_File, fields:list)

Get summary statistics from an MS_Data file.

Args:
    ms_data (alphapept.io.MS_Data_File): An MS_Data file which has been fully identified and quantified.
    fields (list): A list with column names to calculate summary statistics.

Returns: dict: A dictionary with summary statistics.

extract_median_unique

 extract_median_unique (settings:dict, fields:list,
                        summary_type='filename')

Extract the median protein FDR and number of unique proteins.

Args:
    settings (dict): A dictionary with settings describing how to process the data.
    fields (list): A list with column names to calculate summary statistics.
    summary_type (str): The column name used for summarizing ('filename' or 'sample_group').

Returns: tuple: Two arrays with the median protein FDR per file/sample_group and the unique number of protein hits
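The grouping logic described above (median FDR and unique protein count per file or sample group) can be illustrated in plain Python. This is a sketch under stated assumptions: the input is a list of row dictionaries rather than the result tables the real function reads, and the keys protein_fdr and protein are hypothetical column names.

```python
from collections import defaultdict
from statistics import median


def median_fdr_and_unique_sketch(rows, summary_type="filename"):
    """Return per-group median FDR and unique protein counts, sorted by group."""
    fdr_by_group = defaultdict(list)
    proteins_by_group = defaultdict(set)
    for row in rows:
        group = row[summary_type]
        fdr_by_group[group].append(row["protein_fdr"])
        proteins_by_group[group].add(row["protein"])
    groups = sorted(fdr_by_group)
    medians = [median(fdr_by_group[group]) for group in groups]
    n_unique = [len(proteins_by_group[group]) for group in groups]
    return medians, n_unique


rows = [
    {"filename": "a.raw", "protein": "P1", "protein_fdr": 0.01},
    {"filename": "a.raw", "protein": "P2", "protein_fdr": 0.03},
    {"filename": "a.raw", "protein": "P1", "protein_fdr": 0.02},
    {"filename": "b.raw", "protein": "P3", "protein_fdr": 0.05},
]
medians, n_unique = median_fdr_and_unique_sketch(rows)
print(medians, n_unique)
```

Passing summary_type="sample_group" would group by that key instead, provided each row carries it.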

CLI

All workflow functions can be called with the command line interface (CLI). To implement this CLI, we use the click package.

In brief, click makes it possible to create a CLI with minimal effort by simply adding decorators to already defined functions. These decorators create a help text for each function and describe all of its parameters. Functions decorated with click can be added to a central run_cli function so that they are incorporated into the CLI automatically.

While AlphaPept allows modular execution of individual steps to process MS data, these steps are commonly combined and reuse multiple parameters. We therefore use a single YAML settings file, containing all parameters in dictionary format, as the sole parameter instead of providing all parameters individually to each function.
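The decorator pattern described above can be sketched with a small click group. This is an illustration of the general approach, not the package's actual CLI code: the function names and help texts are hypothetical, and the command only echoes the settings path instead of parsing the YAML file and running a workflow.

```python
import click


@click.group(name="cli-overview")
def cli_overview():
    """Hypothetical command group acting as the CLI entry point."""


@click.command(name="workflow", help="Run the complete workflow (sketch).")
@click.argument("settings_file", type=click.Path(exists=True, dir_okay=False))
def cmd_workflow(settings_file):
    # A real command would load the YAML file into a settings dict here.
    click.echo(f"Would run the workflow with settings from {settings_file}")


# Registering the command makes it available as a subcommand of the group.
cli_overview.add_command(cmd_workflow)


def run_cli_sketch():
    """Start the command line interface."""
    cli_overview()
```

Calling run_cli_sketch() from a script's entry point would expose the command as a workflow subcommand that takes the settings file path as its only argument, and click generates the --help output from the decorators.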

The CLI consists of the group cli-overview and the following commands:

gui
workflow
export
quantify
match
align
score
recalibrate
search
features
import
database

run_cli

 run_cli ()

Run the command line interface.

is_port_in_use

 is_port_in_use (port:int)
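No description is given for this function. Judging by its name, it presumably reports whether a TCP port is already in use on the local machine. One common way to perform such a check with the standard library is sketched below; the actual implementation may differ, and is_port_in_use_sketch is a hypothetical name.

```python
import socket


def is_port_in_use_sketch(port: int) -> bool:
    """Return True if something accepts TCP connections on 127.0.0.1:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex(("127.0.0.1", port)) == 0
```

A check like this is typically used before starting a local server, for example to pick a free port for a GUI.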

bcolors

 bcolors ()

Initialize self. See help(type(self)) for accurate signature.