Interface
General utilities
This module includes several generic utility functions:
- Callback function to track progress
- Logging function
- Version/hardware/settings checks
tqdm_wrapper
tqdm_wrapper (pbar, update:float)
Update a tqdm progress bar.
Args: pbar (type): A tqdm.tqdm object. update (float): The new value for the progress bar.
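A minimal sketch of how such a wrapper could work, assuming the progress bar exposes tqdm's `n` counter and its incremental `update(delta)` method. `FakeBar` is a hypothetical stand-in so the example runs without tqdm installed, and `set_progress` is an illustrative name, not necessarily how `tqdm_wrapper` is implemented.

```python
class FakeBar:
    """Hypothetical stand-in mimicking tqdm's `n` counter and `update` method."""

    def __init__(self, total: float = 1.0):
        self.total = total
        self.n = 0.0

    def update(self, delta: float) -> None:
        # Like tqdm, `update` takes an increment, not an absolute value.
        self.n += delta


def set_progress(pbar, value: float) -> None:
    """Move the bar to the absolute position `value`."""
    # The bar only supports increments, so convert the new value to a delta.
    pbar.update(value - pbar.n)


bar = FakeBar(total=1.0)
set_progress(bar, 0.25)
set_progress(bar, 0.75)
print(bar.n)
```

Converting the absolute value to a delta lets a callback that reports "fraction done" drive a bar that only knows how to advance.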
check_version_and_hardware
check_version_and_hardware (settings:dict)
Show platform and python information and parse settings.
Args: settings (dict): A dictionary with settings how to process the data.
Returns: dict: The parsed settings.
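As an illustration of the kind of platform and Python information such a check can report, the sketch below uses only the standard library. The function name `describe_platform` and the chosen fields are assumptions made for this example; it does not parse any settings.

```python
import logging
import os
import platform


def describe_platform() -> dict:
    """Collect basic platform and Python information (illustrative fields)."""
    return {
        "system": platform.system(),
        "release": platform.release(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        "cpu_count": os.cpu_count(),
    }


logging.basicConfig(level=logging.INFO)
for key, value in describe_platform().items():
    # Log one line per field so the run environment is documented in the logfile.
    logging.info("%s: %s", key, value)
```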
wrapped_partial
wrapped_partial (func:<built-in function callable>, *args, **kwargs)
Wrap a function with partial args and kwargs.
Args: func (callable): The function to be wrapped. *args (type): Args to be wrapped. **kwargs (type): Kwargs to be wrapped.
Returns: callable: The wrapped function.
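One common way to wrap a function with partial arguments while keeping its name and docstring is to combine `functools.partial` with `functools.update_wrapper`. The sketch below shows that idiom under the assumption that preserving metadata is the goal; `wrap_partial` and `scale` are hypothetical names used only for illustration.

```python
import functools


def wrap_partial(func, *args, **kwargs):
    """Bind args/kwargs to `func` while keeping its name and docstring."""
    partial_func = functools.partial(func, *args, **kwargs)
    # A bare partial object has no __name__; copy the metadata from `func`.
    functools.update_wrapper(partial_func, func)
    return partial_func


def scale(value: float, factor: float = 1.0) -> float:
    """Multiply `value` by `factor`."""
    return value * factor


double = wrap_partial(scale, factor=2.0)
print(double.__name__, double(3.0))
```

Keeping `__name__` intact matters when the wrapped function is later logged or dispatched on by name.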
Functions
The implemented functions are as follows:
- Create database
- Import raw data
- Perform feature finding
- Search data with fasta
- Recalibrate
- Score data with fasta
- Perform LFQ
- Export results
- Run whole workflow
The last function runs the whole pipeline at once.
create_database
create_database (settings:dict, logger_set:bool=False, settings_parsed:bool=False, callback:<built-in function callable>=None)
Create the search database.
Args: settings (dict): A dictionary with settings how to process the data. logger_set (bool): If False, reset the default logger. Defaults to False. settings_parsed (bool): If True, reparse the settings. Defaults to False. callback (callable): A function that accepts a float between 0 and 1 as progress. Defaults to None.
Returns: dict: the parsed settings.
Raises: FileNotFoundError: If the FASTA file is not found.
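The `callback` argument described above is any callable that accepts a float between 0 and 1. The following sketch shows the convention with a hypothetical step function, `process_items`, that is not part of the documented API; it only illustrates how a step could report progress to such a callback.

```python
from typing import Callable, List, Optional


def process_items(items: List[str],
                  callback: Optional[Callable[[float], None]] = None) -> None:
    """Hypothetical step that reports the fraction of items completed."""
    for index, _item in enumerate(items, start=1):
        # ... real work on `_item` would happen here ...
        if callback is not None:
            callback(index / len(items))


progress_log: List[float] = []
process_items(["a", "b", "c", "d"], callback=progress_log.append)
print(progress_log)
```

Any callable with this shape works, for example a function that updates a progress bar or a GUI widget.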
import_raw_data
import_raw_data (settings:dict, logger_set:bool=False, settings_parsed:bool=False, callback:<built-in function callable>=None)
Import raw data.
Args: settings (dict): A dictionary with settings how to process the data. logger_set (bool): If False, reset the default logger. Defaults to False. settings_parsed (bool): If True, reparse the settings. Defaults to False. callback (callable): A function that accepts a float between 0 and 1 as progress. Defaults to None.
Returns: dict: the parsed settings.
feature_finding
feature_finding (settings:dict, logger_set:bool=False, settings_parsed:bool=False, callback:<built-in function callable>=None)
Find features.
Args: settings (dict): A dictionary with settings how to process the data. logger_set (bool): If False, reset the default logger. Defaults to False. settings_parsed (bool): If True, reparse the settings. Defaults to False. callback (callable): A function that accepts a float between 0 and 1 as progress. Defaults to None.
Returns: dict: the parsed settings.
search_data
search_data (settings:dict, first_search:bool=True, logger_set:bool=False, settings_parsed:bool=False, callback:<built-in function callable>=None)
Search the data with the FASTA database.
Args: settings (dict): A dictionary with settings how to process the data. first_search (bool): If True, save the intermediary results as first search. Otherwise, calibrated mz_values are used and results are saved as second search. Defaults to True. logger_set (bool): If False, reset the default logger. Defaults to False. settings_parsed (bool): If True, reparse the settings. Defaults to False. callback (callable): A function that accepts a float between 0 and 1 as progress. Defaults to None.
Returns: dict: the parsed settings.
Raises: FileNotFoundError: If the FASTA file is not found.
recalibrate_data
recalibrate_data (settings:dict, logger_set:bool=False, settings_parsed:bool=False, callback:<built-in function callable>=None)
Recalibrate mz values.
Args: settings (dict): A dictionary with settings how to process the data. logger_set (bool): If False, reset the default logger. Defaults to False. settings_parsed (bool): If True, reparse the settings. Defaults to False. callback (callable): A function that accepts a float between 0 and 1 as progress. Defaults to None.
Returns: dict: the parsed settings.
score
score (settings:dict, pept_dict:dict=None, fasta_dict:dict=None, logger_set:bool=False, settings_parsed:bool=False, callback:<built-in function callable>=None)
Score PSMs and calculate FDR.
Args: settings (dict): A dictionary with settings how to process the data. pept_dict (dict): A dictionary with peptides. Defaults to None. fasta_dict (dict): A dictionary with fasta sequences. Defaults to None. logger_set (bool): If False, reset the default logger. Defaults to False. settings_parsed (bool): If True, reparse the settings. Defaults to False. callback (callable): A function that accepts a float between 0 and 1 as progress. Defaults to None.
Returns: dict: the parsed settings.
isobaric_labeling
isobaric_labeling (settings:dict, logger_set:bool=False, settings_parsed:bool=False, callback:<built-in function callable>=None)
Search for isobaric labels.
Args: settings (dict): A dictionary with settings how to process the data. logger_set (bool): If False, reset the default logger. Defaults to False. settings_parsed (bool): If True, reparse the settings. Defaults to False. callback (callable): A function that accepts a float between 0 and 1 as progress. Defaults to None.
Returns: dict: the parsed settings.
protein_grouping
protein_grouping (settings:dict, pept_dict:dict=None, fasta_dict:dict=None, logger_set:bool=False, settings_parsed:bool=False, callback:<built-in function callable>=None)
Group peptides into proteins.
Args: settings (dict): A dictionary with settings how to process the data. pept_dict (dict): A dictionary with peptides. Defaults to None. fasta_dict (dict): A dictionary with fasta sequences. Defaults to None. logger_set (bool): If False, reset the default logger. Defaults to False. settings_parsed (bool): If True, reparse the settings. Defaults to False. callback (callable): A function that accepts a float between 0 and 1 as progress. Defaults to None.
Returns: dict: the parsed settings.
match
match (settings:dict, logger_set:bool=False, settings_parsed:bool=False, callback:<built-in function callable>=None)
Match datasets.
Args: settings (dict): A dictionary with settings how to process the data. logger_set (bool): If False, reset the default logger. Defaults to False. settings_parsed (bool): If True, reparse the settings. Defaults to False. callback (callable): A function that accepts a float between 0 and 1 as progress. Defaults to None.
Returns: dict: the parsed settings.
align
align (settings:dict, logger_set:bool=False, settings_parsed:bool=False, callback:<built-in function callable>=None)
Align multiple samples.
Args: settings (dict): A dictionary with settings how to process the data. logger_set (bool): If False, reset the default logger. Defaults to False. settings_parsed (bool): If True, reparse the settings. Defaults to False. callback (callable): A function that accepts a float between 0 and 1 as progress. Defaults to None.
Returns: dict: the parsed settings.
read_label_intensity
read_label_intensity (df:pandas.core.frame.DataFrame, label:<class 'NamedTuple'>)
Reads the label intensities from peptides and sums them by protein group.
Args: df (pd.DataFrame): Table with peptide information. label (NamedTuple): Label used for the experiment.
Returns: pd.DataFrame: Summary protein table containing proteins and their intensity for each channel.
quantification
quantification (settings:dict, logger_set:bool=False, settings_parsed:bool=False, callback:<built-in function callable>=None)
Normalize and quantify datasets.
Args: settings (dict): A dictionary with settings how to process the data. logger_set (bool): If False, reset the default logger. Defaults to False. settings_parsed (bool): If True, reparse the settings. Defaults to False. callback (callable): A function that accepts a float between 0 and 1 as progress. Defaults to None.
Returns: dict: the parsed settings.
export
export (settings:dict, logger_set:bool=False, settings_parsed:bool=False, callback:<built-in function callable>=None)
Export settings.
Args: settings (dict): A dictionary with settings how to process the data. logger_set (bool): If False, reset the default logger. Defaults to False. settings_parsed (bool): If True, reparse the settings. Defaults to False. callback (callable): A function that accepts a float between 0 and 1 as progress. Defaults to None.
Returns: dict: the parsed settings.
run_complete_workflow
run_complete_workflow (settings:dict, progress:bool=False, logger_set:bool=False, settings_parsed:bool=False, callback:<built-in function callable>=None, callback_overall:<built-in function callable>=None, callback_task:<built-in function callable>=None, logfile:str=None)
Run all AlphaPept steps from a settings dict.
Args: settings (dict): A dictionary with settings how to process the data. progress (bool): Track progress. Defaults to False. logger_set (bool): If False, reset the default logger. Defaults to False. settings_parsed (bool): If True, reparse the settings. Defaults to False. callback (callable): A function that accepts a float between 0 and 1 as progress. Defaults to None. callback_overall (callable): Same as callback, but for the overall progress. Defaults to None. callback_task (callable): Same as callback, but for the task progress. Defaults to None. logfile (str): The name of a logfile. Defaults to None.
Returns: dict: the parsed settings.
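Because every step takes a settings dict and returns the parsed settings, the steps can be chained. The sketch below illustrates that pattern together with an overall progress callback. It is a simplified illustration, not the actual workflow code: `run_steps`, `step_a`, and `step_b` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional

Step = Callable[[Dict], Dict]


def run_steps(settings: Dict, steps: List[Step],
              callback_overall: Optional[Callable[[float], None]] = None) -> Dict:
    """Run each step in order, threading the settings dict through all of them."""
    for index, step in enumerate(steps, start=1):
        settings = step(settings)
        if callback_overall is not None:
            # Report the fraction of steps that have finished so far.
            callback_overall(index / len(steps))
    return settings


def step_a(settings: Dict) -> Dict:
    return {**settings, "a_done": True}


def step_b(settings: Dict) -> Dict:
    return {**settings, "b_done": True}


overall: List[float] = []
result = run_steps({"name": "demo"}, [step_a, step_b], callback_overall=overall.append)
print(result, overall)
```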
parallel_execute
parallel_execute (settings:dict, step:<built-in function callable>, callback:<built-in function callable>=None)
Generic function to execute workflow steps in parallel on a per-file basis.
Args: settings (dict): The settings for processing the step function. step (callable): A function that accepts settings as input parameter. callback (callable): A function that accepts a float between 0 and 1 as progress. Defaults to None.
Returns: dict: The settings after processing.
Raises: NotImplementedError: When the step is feature finding on files other than Thermo or Bruker.
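A rough sketch of per-file parallel execution with progress reporting, written with `concurrent.futures` from the standard library. The settings key `"files"`, the helper `fake_step`, and the use of threads instead of processes are all assumptions made to keep the example small; the documented function may organize its settings and workers differently.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Optional


def execute_per_file(settings: Dict, step: Callable[[Dict], str],
                     callback: Optional[Callable[[float], None]] = None) -> List[str]:
    """Run `step` once per file and report the fraction of finished files."""
    files = settings["files"]  # hypothetical settings key
    # Each worker gets its own settings copy that names a single file.
    per_file = [{**settings, "files": [name]} for name in files]
    results = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(step, file_settings) for file_settings in per_file]
        for done, future in enumerate(as_completed(futures), start=1):
            results.append(future.result())
            if callback is not None:
                callback(done / len(files))
    # Completion order is not deterministic, so sort for a stable result.
    return sorted(results)


def fake_step(file_settings: Dict) -> str:
    """Hypothetical step: pretend to process the single file in the settings."""
    return file_settings["files"][0].upper()


progress: List[float] = []
out = execute_per_file({"files": ["a.raw", "b.raw"]}, fake_step, progress.append)
print(out, progress)
```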
get_summary
get_summary (settings:dict, summary:dict)
Append file summary statistics to a summary dictionary.
Args: settings (dict): A dictionary with settings how to process the data. summary (dict): A dictionary with summary statistics of the experiment.
Returns: dict: The summary to which the file summary statistics are appended.
get_file_summary
get_file_summary (ms_data:alphapept.io.MS_Data_File, fields:list)
Get summary statistics from an MS_Data file.
Args: ms_data (alphapept.io.MS_Data_File): An MS_Data file which has been fully identified and quantified. fields (list): A list with column names to calculate summary statistics.
Returns: dict: A dictionary with summary statistics.
extract_median_unique
extract_median_unique (settings:dict, fields:list, summary_type='filename')
Extract the median protein FDR and number of unique proteins.
Args: settings (dict): A dictionary with settings how to process the data. fields (list): A list with column names to calculate summary statistics. summary_type (str): The column name used for summarizing ('filename' or 'sample_group').
Returns: tuple: Two arrays with the median protein FDR per file/sample_group and the unique number of protein hits.
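To make the two summary quantities concrete, the sketch below computes a median FDR and a unique protein count per group using only the standard library. The rows, the tuple layout, and the name `median_and_unique` are invented for illustration, and the result is returned as a dict rather than the two arrays described above.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple

# Hypothetical rows: (filename, protein, protein FDR).
rows: List[Tuple[str, str, float]] = [
    ("run1", "P1", 0.01),
    ("run1", "P2", 0.03),
    ("run1", "P2", 0.05),
    ("run2", "P1", 0.02),
]


def median_and_unique(rows: List[Tuple[str, str, float]]) -> Dict[str, Tuple[float, int]]:
    """Return (median FDR, number of unique proteins) for each group."""
    fdrs = defaultdict(list)
    proteins = defaultdict(set)
    for group, protein, fdr in rows:
        fdrs[group].append(fdr)
        proteins[group].add(protein)
    return {group: (median(fdrs[group]), len(proteins[group])) for group in fdrs}


summary = median_and_unique(rows)
print(summary)
```

Grouping by a different field (for example a sample group instead of the filename) only changes which value is used as the dictionary key.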
CLI
All workflow functions can be called with the command line interface (CLI). To implement this CLI, we use the click package.
In brief, click makes it possible to create a CLI with minimal effort by simply adding decorators to already defined functions. These decorators create a help text for each function and describe all its parameters. Functions that are decorated by click can be added to a central run_cli function to be incorporated in the CLI automatically.
While AlphaTims allows modular execution of individual steps to process MS data, it is common for these steps to be combined and to reuse multiple parameters. We therefore opt to use a single YAML settings file containing all parameters in dictionary format as a single parameter instead of providing all parameters individually to each function.
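The decorator pattern described above can be sketched as follows. This is a minimal illustration that assumes the click package is installed; the group `cli`, the function `workflow_command`, and the echoed message are hypothetical, and parsing the YAML file into a dictionary is left out.

```python
import click
from click.testing import CliRunner


@click.group()
def cli():
    """Hypothetical command group collecting all subcommands."""


@cli.command("workflow")
@click.argument("settings_file", type=click.Path())
def workflow_command(settings_file):
    """Run the whole workflow from a single settings file."""
    # A real command would parse the YAML file into a dict here
    # and pass that dict to the workflow function.
    click.echo(f"Running workflow with settings from {settings_file}")


# Invoke the command in-process, as if it had been typed on the command line.
result = CliRunner().invoke(cli, ["workflow", "settings.yaml"])
print(result.output)
```

Passing one settings file keeps each command's signature small, since every command receives the same single argument regardless of how many parameters the underlying step needs.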
<Command gui> (*args:Any, **kwargs:Any)
<Command workflow> (*args:Any, **kwargs:Any)
<Command export> (*args:Any, **kwargs:Any)
<Command quantify> (*args:Any, **kwargs:Any)
<Command match> (*args:Any, **kwargs:Any)
<Command align> (*args:Any, **kwargs:Any)
<Command score> (*args:Any, **kwargs:Any)
<Command recalibrate> (*args:Any, **kwargs:Any)
<Command search> (*args:Any, **kwargs:Any)
<Command features> (*args:Any, **kwargs:Any)
<Command import> (*args:Any, **kwargs:Any)
<Command database> (*args:Any, **kwargs:Any)
<Group cli-overview> (*args:Any, **kwargs:Any)
run_cli
run_cli ()
Run the command line interface.
is_port_in_use
is_port_in_use (port:int)
bcolors
bcolors ()
Initialize self. See help(type(self)) for accurate signature.