This notebook provides functions to import peptide-level data from Spectronaut, MaxQuant, AlphaPept, DIA-NN and FragPipe.

The preprocessed data is stored in a pandas dataframe with the following columns:

  • all_protein_ids: all UniProt IDs the peptide maps to, separated by ';'
  • modified_sequence: the peptide sequence with all modifications included in square brackets
  • naked_sequence: the naked peptide sequence

It is also possible to select one or more specific samples for import. A single sample can be provided as a character string; multiple samples can be provided as a list of character strings. The raw MS file name should match the corresponding entries in the "R.FileName", "Raw file", "shortname" or "Run" column of the Spectronaut, MaxQuant, AlphaPept or DIA-NN output, respectively. For FragPipe, all 'Spectral Count' columns of the "combined_peptide.tsv" file are used to extract information about individual experiments.
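The sample selection described above can be sketched in pandas as follows. The helper `filter_by_sample` and the toy table are hypothetical and not part of this notebook's API; the sketch only assumes that the raw file names live in a single column, and that a single string is treated like a one-element list.

```python
from typing import List, Optional, Union

import pandas as pd


def filter_by_sample(
    df: pd.DataFrame,
    raw_file_column: str,
    sample: Optional[Union[str, List[str]]] = None,
) -> pd.DataFrame:
    """Hypothetical helper: keep rows whose raw file name is in `sample`."""
    if sample is None:
        # No selection: data for all raw files is kept
        return df
    if isinstance(sample, str):
        # A single sample given as a string is treated as a one-element list
        sample = [sample]
    return df[df[raw_file_column].isin(sample)]


# Toy MaxQuant-like table; "Raw file" is the MaxQuant raw file column
evidence = pd.DataFrame({
    "Raw file": ["run_01", "run_02", "run_03"],
    "Sequence": ["PEPTIDEK", "AAMK", "LLSK"],
})
single = filter_by_sample(evidence, "Raw file", "run_02")
several = filter_by_sample(evidence, "Raw file", ["run_01", "run_03"])
print(single["Sequence"].tolist())   # ['AAMK']
print(several["Sequence"].tolist())  # ['PEPTIDEK', 'LLSK']
```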

read_file

read_file(file:str, column_names:list)

Load the specified columns of a file as a pandas dataframe.

Args: file (str): The name of a file. column_names (list): The list of three columns that should be extracted from the file.

Raises: NotImplementedError: if the specified file does not have a .csv, .txt or .tsv extension. ValueError: if any of the specified columns is not in the file.

Returns: pd.DataFrame: A pandas dataframe with all the data stored in the specified columns.
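A minimal sketch of the behaviour documented for `read_file`: choose the separator from the file extension, validate the requested columns, then load only those columns. The name `read_file_sketch` is hypothetical, and treating .txt and .tsv files as tab-separated is an assumption (MaxQuant .txt tables are tab-separated), not something stated in this notebook.

```python
import os
import tempfile
from typing import List

import pandas as pd


def read_file_sketch(file: str, column_names: List[str]) -> pd.DataFrame:
    """Sketch of extension-based loading with column validation."""
    extension = os.path.splitext(file)[1].lower()
    if extension == ".csv":
        sep = ","
    elif extension in (".txt", ".tsv"):
        sep = "\t"  # assumption: .txt and .tsv exports are tab-separated
    else:
        raise NotImplementedError(f"Unsupported file extension: {extension!r}")
    # Read only the header first so missing columns are reported clearly
    header = pd.read_csv(file, sep=sep, nrows=0).columns
    missing = [name for name in column_names if name not in header]
    if missing:
        raise ValueError(f"Columns not found in {file}: {missing}")
    # usecols keeps the file's column order, so reorder to the requested order
    return pd.read_csv(file, sep=sep, usecols=column_names)[column_names]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "evidence.txt")
    pd.DataFrame({
        "Proteins": ["P12345;P67890"],
        "Sequence": ["AAMK"],
        "Raw file": ["run_01"],
        "Score": [95.2],
    }).to_csv(path, sep="\t", index=False)
    df = read_file_sketch(path, ["Sequence", "Proteins", "Raw file"])

print(df.columns.tolist())  # ['Sequence', 'Proteins', 'Raw file']
```

Reading the header with `nrows=0` before the full load keeps the `ValueError` message explicit instead of relying on the error pandas raises for an unknown `usecols` entry.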

extract_rawfile_unique_values

extract_rawfile_unique_values(file:str)

Extract the unique raw file names from the "R.FileName" (Spectronaut), "Raw file" (MaxQuant), "shortname" (AlphaPept) or "Run" (DIA-NN) column or, for FragPipe, from the "Spectral Count" columns of the combined_peptide.tsv file (without modifications).

Args: file (str): The name of a file.

Raises: ValueError: if a column with the unique raw file names is not in the file.

Returns: list: A sorted list of unique raw file names from the file.
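The lookup order described above can be sketched on an already loaded dataframe. The function name is hypothetical, and the FragPipe header pattern `"<sample> Spectral Count"` is an assumption about the combined_peptide.tsv layout; the real function takes a file name rather than a dataframe.

```python
from typing import List

import pandas as pd

# Raw file columns named in the text above, checked in this order
RAW_FILE_COLUMNS = ["R.FileName", "Raw file", "shortname", "Run"]
# Assumption: FragPipe per-sample columns are named "<sample> Spectral Count"
SPECTRAL_COUNT_SUFFIX = " Spectral Count"


def extract_rawfile_names_sketch(df: pd.DataFrame) -> List[str]:
    for column in RAW_FILE_COLUMNS:
        if column in df.columns:
            return sorted(df[column].dropna().unique().tolist())
    # FragPipe: derive sample names from the per-sample column headers
    samples = {
        str(column)[: -len(SPECTRAL_COUNT_SUFFIX)]
        for column in df.columns
        if str(column).endswith(SPECTRAL_COUNT_SUFFIX)
    }
    if samples:
        return sorted(samples)
    raise ValueError("No column with raw file names found.")


diann = pd.DataFrame({"Run": ["b_run", "a_run", "b_run"]})
fragpipe = pd.DataFrame(
    columns=["Peptide Sequence", "exp2 Spectral Count", "exp1 Spectral Count"]
)
print(extract_rawfile_names_sketch(diann))     # ['a_run', 'b_run']
print(extract_rawfile_names_sketch(fragpipe))  # ['exp1', 'exp2']
```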

Import Spectronaut data

import_spectronaut_data

import_spectronaut_data(file:str, sample:Union[str, list, NoneType]=None)

Import peptide level data from Spectronaut.

Args: file (str): The name of a file. sample (Union[str, list, None]): The unique raw file name(s) used to filter the original file. Defaults to None, in which case data for all raw files are extracted.

Returns: pd.DataFrame: A pandas dataframe containing information about: all_protein_ids (str), modified_sequence (str), naked_sequence (str)
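Every tool-specific import function returns the same three-column table. As a hedged sketch of that last step for a Spectronaut-style report, the code below selects, renames and de-duplicates columns. The Spectronaut column names in `SPECTRONAUT_COLUMNS`, the helper `to_peptide_table` and the de-duplication step are assumptions for illustration, not taken from this notebook.

```python
import pandas as pd

# Assumed Spectronaut report columns; check them against your own export
SPECTRONAUT_COLUMNS = {
    "PG.ProteinAccessions": "all_protein_ids",
    "EG.ModifiedSequence": "modified_sequence",
    "PEP.StrippedSequence": "naked_sequence",
}


def to_peptide_table(report: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: select, rename and de-duplicate peptide columns."""
    table = report[list(SPECTRONAUT_COLUMNS)].rename(columns=SPECTRONAUT_COLUMNS)
    # The same peptide can occur in several runs; keep one row per peptide form
    return table.drop_duplicates().reset_index(drop=True)


report = pd.DataFrame({
    "R.FileName": ["run_01", "run_02"],
    "PG.ProteinAccessions": ["P12345", "P12345"],
    "EG.ModifiedSequence": ["AAM[Oxidation (M)]K", "AAM[Oxidation (M)]K"],
    "PEP.StrippedSequence": ["AAMK", "AAMK"],
})
peptides = to_peptide_table(report)
print(peptides.columns.tolist())  # ['all_protein_ids', 'modified_sequence', 'naked_sequence']
print(len(peptides))              # 1
```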

Import MaxQuant data

import_maxquant_data

import_maxquant_data(file:str, sample:Union[str, list, NoneType]=None)

Import peptide level data from MaxQuant.

Args: file (str): The name of a file. sample (Union[str, list, None]): The unique raw file name(s) used to filter the original file. Defaults to None, in which case data for all raw files are extracted.

Returns: pd.DataFrame: A pandas dataframe containing information about: all_protein_ids (str), modified_sequence (str), naked_sequence (str)

Import AlphaPept data

convert_ap_mq_mod

convert_ap_mq_mod(sequence:str)

Convert AlphaPept style modifications into MaxQuant style modifications.

Args: sequence (str): The peptide sequence with modifications in AlphaPept style.

Returns: str: The peptide sequence with modifications in a MaxQuant-like style.
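A minimal sketch of such a conversion, assuming AlphaPept's notation of lowercase modification prefixes in front of the modified residue (for example `oxM` or `cC`). The prefix-to-name mapping `AP_TO_MQ` and both function names are hypothetical; the notebook's function may produce different bracket labels.

```python
import re

# Hypothetical mapping from AlphaPept-style prefixes to MaxQuant-like names
AP_TO_MQ = {
    "ox": "Oxidation (M)",
    "c": "Carbamidomethyl (C)",
    "p": "Phospho (STY)",
}


def convert_ap_mq_mod_sketch(sequence: str) -> str:
    """Move each lowercase prefix behind its residue, in square brackets."""
    def to_bracket(match) -> str:
        prefix, residue = match.group(1), match.group(2)
        # Unknown prefixes are kept verbatim inside the brackets
        return f"{residue}[{AP_TO_MQ.get(prefix, prefix)}]"

    return re.sub(r"([a-z]+)([A-Z])", to_bracket, sequence)


def strip_ap_mods(sequence: str) -> str:
    """Drop all lowercase modification prefixes to get the naked sequence."""
    return re.sub(r"[a-z]+", "", sequence)


print(convert_ap_mq_mod_sketch("AoxMKcC"))  # AM[Oxidation (M)]KC[Carbamidomethyl (C)]
print(strip_ap_mods("AoxMKcC"))             # AMKC
```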

import_alphapept_data

import_alphapept_data(file:str, sample:Union[str, list, NoneType]=None)

Import peptide level data from AlphaPept.

Args: file (str): The name of a file. sample (Union[str, list, None]): The unique raw file name(s) used to filter the original file. Defaults to None, in which case data for all raw files are extracted.

Returns: pd.DataFrame: A pandas dataframe containing information about: all_protein_ids (str), modified_sequence (str), naked_sequence (str)

Import DIA-NN data

convert_diann_mq_mod

convert_diann_mq_mod(sequence:str)

Convert DIA-NN style modifications into MaxQuant style modifications.

Args: sequence (str): The peptide sequence with modifications in DIA-NN style.

Returns: str: The peptide sequence with modifications in a MaxQuant-like style.
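DIA-NN annotates modifications with UniMod accessions in parentheses after the residue, as in `AAM(UniMod:35)K`. The sketch below rewrites those tags into square brackets. The name table `UNIMOD_TO_MQ` and the function names are hypothetical, and unknown accessions are simply kept as `[UniMod:N]`.

```python
import re

# Hypothetical mapping from UniMod accession numbers to MaxQuant-like names
UNIMOD_TO_MQ = {
    "1": "Acetyl (Protein N-term)",
    "4": "Carbamidomethyl (C)",
    "21": "Phospho (STY)",
    "35": "Oxidation (M)",
}

UNIMOD_PATTERN = re.compile(r"\(UniMod:(\d+)\)")


def convert_diann_mq_mod_sketch(sequence: str) -> str:
    """Replace each "(UniMod:N)" tag with a square-bracketed name."""
    return UNIMOD_PATTERN.sub(
        lambda m: f"[{UNIMOD_TO_MQ.get(m.group(1), 'UniMod:' + m.group(1))}]",
        sequence,
    )


def strip_diann_mods(sequence: str) -> str:
    """Remove all UniMod tags to get the naked sequence."""
    return UNIMOD_PATTERN.sub("", sequence)


print(convert_diann_mq_mod_sketch("AAM(UniMod:35)KC(UniMod:4)"))
# AAM[Oxidation (M)]KC[Carbamidomethyl (C)]
print(strip_diann_mods("AAM(UniMod:35)KC(UniMod:4)"))  # AAMKC
```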

import_diann_data

import_diann_data(file:str, sample:Union[str, list, NoneType]=None)

Import peptide level data from DIA-NN.

Args: file (str): The name of a file. sample (Union[str, list, None]): The unique raw file name(s) used to filter the original file. Defaults to None, in which case data for all raw files are extracted.

Returns: pd.DataFrame: A pandas dataframe containing information about: all_protein_ids (str), modified_sequence (str), naked_sequence (str)

Import FragPipe/MSFragger data

convert_fragpipe_mq_mod

convert_fragpipe_mq_mod(sequence:str, assigned_modifications:str)

Convert FragPipe style modifications into MaxQuant style modifications.

Args: sequence (str): The peptide sequence with modifications. assigned_modifications (str): The string of assigned modifications, separated by commas.

Returns: str: The peptide sequence with modifications in a MaxQuant-like style.
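FragPipe stores modification sites separately from the peptide sequence. Assuming 'Assigned Modifications' entries such as `3M(15.9949)` (1-based position, residue, mass shift) and `N-term(42.0106)`, a conversion can splice each mass shift into the sequence in square brackets. The function name and the bracketed-mass output are assumptions for illustration; the notebook's function may use different labels.

```python
import re

SITE_PATTERN = re.compile(r"(\d+)([A-Z])\(([-\d.]+)\)")
N_TERM_PATTERN = re.compile(r"N-term\(([-\d.]+)\)")


def convert_fragpipe_mq_mod_sketch(sequence: str, assigned_modifications: str) -> str:
    """Insert each assigned mass shift behind its residue in square brackets."""
    residues = list(sequence)
    n_term = ""
    # Missing values (e.g. NaN from pandas) mean "no modifications"
    if isinstance(assigned_modifications, str):
        for entry in assigned_modifications.split(","):
            entry = entry.strip()
            n_match = N_TERM_PATTERN.fullmatch(entry)
            if n_match:
                n_term = f"[{n_match.group(1)}]"
                continue
            site = SITE_PATTERN.fullmatch(entry)
            if site:
                position = int(site.group(1)) - 1  # positions are 1-based
                residues[position] += f"[{site.group(3)}]"
            # Other entries (e.g. C-term) are ignored in this sketch
    return n_term + "".join(residues)


print(convert_fragpipe_mq_mod_sketch("AAMKC", "3M(15.9949), 5C(57.0215)"))
# AAM[15.9949]KC[57.0215]
print(convert_fragpipe_mq_mod_sketch("PEPK", "N-term(42.0106)"))  # [42.0106]PEPK
```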

import_fragpipe_data

import_fragpipe_data(file:str, sample:Union[str, list, NoneType]=None)

Import peptide level data from FragPipe/MSFragger.

Args: file (str): The name of a file. sample (Union[str, list, None]): The unique raw file name(s) used to filter the original file. Defaults to None, in which case data for all raw files are extracted.

Returns: pd.DataFrame: A pandas dataframe containing information about: all_protein_ids (str), modified_sequence (str), naked_sequence (str)

Aggregated import function

import_data

import_data(file:str, sample:Union[str, list, NoneType]=None, verbose:bool=True, dashboard:bool=False)

Import peptide-level data. Depending on the columns available in the provided file, the function calls the corresponding tool-specific import function.

Args: file (str): The name of a file. sample (Union[str, list, None]): The unique raw file name(s) used to filter the original file. Defaults to None, in which case data for all raw files are extracted. verbose (bool): If True, print the type of input that is used. Defaults to True. dashboard (bool): If True, the function is used for the dashboard output (StringIO object). Defaults to False.

Raises: TypeError: If the input data format is unknown.

Returns: pd.DataFrame: A pandas dataframe containing information about: all_protein_ids (str), modified_sequence (str), naked_sequence (str)
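The column-based dispatch can be sketched by inspecting only the header. The function `detect_tool` and its return labels are hypothetical; the marker columns are the ones named earlier in this notebook, while the check order and the `"Spectral Count"` suffix test are assumptions. Reading from `io.StringIO` mimics the in-memory input mentioned for the dashboard.

```python
import io

import pandas as pd


def detect_tool(columns) -> str:
    """Guess the search engine from characteristic column names."""
    names = [str(column) for column in columns]
    if "R.FileName" in names:
        return "spectronaut"
    if "Raw file" in names:
        return "maxquant"
    if "shortname" in names:
        return "alphapept"
    if "Run" in names:
        return "diann"
    if any(name.endswith("Spectral Count") for name in names):
        return "fragpipe"
    raise TypeError("Input data format is unknown.")


# Only the header is needed to decide which import function to call
header_only = pd.read_csv(io.StringIO("Raw file\tSequence\tProteins\n"), sep="\t", nrows=0)
print(detect_tool(header_only.columns))  # maxquant
```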