Settings

A template for settings

AlphaPept stores all settings in *.yaml files. This notebook contains functions to load, save, and print settings. It also defines a settings template that specifies, for each parameter, its default value, its allowed range, and its type (e.g., float value, list). These definitions make it possible to create graphical user interfaces for the settings automatically.

Settings

Saving and Loading

Settings are saved as *.yaml files by default. These files can easily be modified in a text editor.

save_settings

 save_settings (settings:dict, path:str)

Save settings file to path.

Args:
    settings (dict): A yaml dictionary.
    path (str): Path to the settings file.

load_settings_as_template

 load_settings_as_template (path:str)

Loads settings but removes fields that contain summary information.

Args: path (str): Path to the settings file.

load_settings

 load_settings (path:str)

Load a yaml settings file.

Args: path (str): Path to the settings file.

print_settings

 print_settings (settings:dict)

Print a settings dictionary in yaml format.

Args: settings (dict): A yaml dictionary.

settings = {'field1': 0,'summary':123}
dummy_path = 'to_delete.yaml'

print('--- print_settings ---')
print_settings(settings)

save_settings(settings, dummy_path)

print('--- load_settings ---')

print_settings(load_settings(dummy_path))

print('--- load_settings_as_template ---')

print_settings(load_settings_as_template(dummy_path))

--- print_settings ---
field1: 0
summary: 123

--- load_settings ---
field1: 0
summary: 123

--- load_settings_as_template ---
field1: 0
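
The round trip shown above can be approximated with PyYAML. The following is a minimal sketch, not AlphaPept's actual implementation: it assumes that summary information lives under a top-level key named summary (as in the example), and the function names carry a _sketch suffix to mark them as hypothetical.

```python
import os
import tempfile

import yaml  # PyYAML


def save_settings_sketch(settings: dict, path: str) -> None:
    """Write a settings dictionary to a YAML file."""
    with open(path, "w") as handle:
        yaml.safe_dump(settings, handle)


def load_settings_sketch(path: str) -> dict:
    """Read a YAML settings file into a dictionary."""
    with open(path, "r") as handle:
        return yaml.safe_load(handle)


def load_settings_as_template_sketch(path: str, summary_keys=("summary",)) -> dict:
    """Load settings and drop top-level keys assumed to hold summary information."""
    settings = load_settings_sketch(path)
    return {key: value for key, value in settings.items() if key not in summary_keys}


# Round trip through a temporary file so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp_dir:
    tmp_path = os.path.join(tmp_dir, "settings.yaml")
    save_settings_sketch({"field1": 0, "summary": 123}, tmp_path)
    loaded = load_settings_sketch(tmp_path)
    as_template = load_settings_as_template_sketch(tmp_path)

print(loaded)
print(as_template)
```

Using safe_dump and safe_load restricts the file to plain YAML types, which is usually sufficient for a settings dictionary of numbers, strings, lists, and booleans.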

Settings Template

The settings template defines the individual settings so that a graphical user interface can be generated automatically. Each entry has a type and a default value, and most entries have a description. The list below shows what each type would map to when using streamlit; the mapping could be adapted to any GUI library.

spinbox -> st.slider, range with minimum and maximum values (int)
doublespinbox -> st.slider, range with minimum and maximum values (float)
path -> st.button, clickable button to select a path to save / load files.
combobox -> st.selectbox, dropdown menu with values to choose from
checkbox -> st.checkbox, checkbox that can be selected
checkgroup -> st.multiselect, creates a list of options that can be selected
string -> st.text_input, generic string input
list -> Creates a list that is displayed
placeholder -> This just prints the parameter and cannot be changed
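
Because each entry carries its type and constraints, the same template can also be used to check user input before it is written to a settings file. The sketch below is an illustration under assumptions, not part of AlphaPept: validate_entry is a hypothetical helper, and the three example entries are copied from the template sections shown on this page.

```python
def validate_entry(entry: dict, value) -> bool:
    """Return True if `value` is acceptable for a template entry (hypothetical helper)."""
    kind = entry["type"]
    if kind in ("spinbox", "doublespinbox"):
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool):
            return False
        allowed = int if kind == "spinbox" else (int, float)
        return isinstance(value, allowed) and entry["min"] <= value <= entry["max"]
    if kind == "combobox":
        return value in entry["value"]
    if kind == "checkbox":
        return isinstance(value, bool)
    if kind == "checkgroup":
        return isinstance(value, list) and all(item in entry["value"] for item in value)
    if kind == "list":
        return isinstance(value, list)
    if kind in ("string", "path"):
        return value is None or isinstance(value, str)
    # Placeholders (and unknown types) are display-only in this sketch.
    return True


# Entries copied from the 'general', 'score', and 'workflow' sections.
n_processes = {"type": "spinbox", "min": 1, "max": 60, "default": 60}
method = {
    "type": "combobox",
    "value": ["x_tandem", "random_forest", "generic_score", "morpheus"],
    "default": "random_forest",
}
align = {"type": "checkbox", "default": False}

print(validate_entry(n_processes, 8))
print(validate_entry(n_processes, 0))
print(validate_entry(method, "morpheus"))
```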

Workflow settings

Workflow settings define which algorithmic steps should be performed.

print(yaml.dump(SETTINGS_TEMPLATE['workflow']))

align:
  default: false
  description: Flag to align the data.
  type: checkbox
continue_runs:
  default: false
  description: Flag to continue previously computed runs. If False existing ms_data
    will be deleted.
  type: checkbox
create_database:
  default: true
  description: Flag to create a database.
  type: checkbox
find_features:
  default: true
  description: Flag to perform feature finding.
  type: checkbox
import_raw_data:
  default: true
  description: Flag to import the raw data.
  type: checkbox
lfq_quantification:
  default: true
  description: Flag to perform lfq normalization.
  type: checkbox
match:
  default: false
  description: Flag to perform match-between-runs.
  type: checkbox
recalibrate_data:
  default: true
  description: Flag to perform recalibration.
  type: checkbox
search_data:
  default: true
  description: Flag to perform search.
  type: checkbox

print(yaml.dump(SETTINGS_TEMPLATE['general']))

n_processes:
  default: 60
  description: Maximum number of processes for multiprocessing. If larger than number
    of processors it will be capped.
  max: 60
  min: 1
  type: spinbox
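
The description of n_processes says that a value larger than the number of processors is capped. A minimal sketch of such a cap is shown below; cap_processes is a hypothetical name, and using os.cpu_count() as the processor count is an assumption about how the limit is determined.

```python
import os


def cap_processes(requested: int, available=None) -> int:
    """Clamp the requested process count to the range [1, available]."""
    if available is None:
        # os.cpu_count() may return None, so fall back to a single process.
        available = os.cpu_count() or 1
    return max(1, min(requested, available))


print(cap_processes(60, available=8))  # request exceeds the machine: capped
print(cap_processes(4, available=8))   # request fits: unchanged
```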

Experimental Settings

Core definitions of the experiment, such as the file paths.

print(yaml.dump(SETTINGS_TEMPLATE['experiment']))

database_path:
  default: null
  description: Path to library file (.hdf).
  filetype:
  - hdf
  folder: false
  type: path
fasta_paths:
  default: []
  description: List of paths for FASTA files.
  type: list
file_paths:
  default: []
  description: Filepaths of the experiments.
  type: list
fraction:
  default: []
  description: List of fraction numbers for fractionated samples.
  type: list
matching_group:
  default: []
  description: List of matching groups for the raw files. This only allows match-between-runs
    of files within the same groups.
  type: list
results_path:
  default: null
  description: Path where the results should be stored.
  filetype:
  - hdf
  folder: false
  type: path
sample_group:
  default: []
  description: Sample group, for raw files that should be quanted together.
  type: list
shortnames:
  default: []
  description: List of shortnames for the raw files.
  type: list

Raw file handling

print(yaml.dump(SETTINGS_TEMPLATE['raw']))

n_most_abundant:
  default: 400
  description: Number of most abundant peaks to be isolated from raw spectra.
  max: 1000
  min: -1
  type: spinbox
use_profile_ms1:
  default: false
  description: Use profile data for MS1 and perform own centroiding.
  type: checkbox

FASTA settings

print(yaml.dump(SETTINGS_TEMPLATE['fasta']))

AL_swap:
  default: false
  description: Swap A and L for decoy generation.
  type: checkbox
KR_swap:
  default: false
  description: Swap K and R (only if terminal) for decoy generation.
  type: checkbox
fasta_block:
  default: 1000
  description: Number of fasta entries to be processed in one block.
  max: 10000
  min: 100
  type: spinbox
fasta_size_max:
  default: 100
  description: Maximum size of FASTA (MB) when switching on-the-fly.
  max: 1000000
  min: 1
  type: spinbox
isoforms_max:
  default: 1024
  description: Maximum number of isoforms per peptide.
  max: 4096
  min: 1
  type: spinbox
mods_fixed:
  default:
  - cC
  description: Fixed modifications.
  type: checkgroup
  value:
    aK: acetylation of lysine
    cC: carbamidomethylation of C
    deamN: deamidation of N
    deamQ: deamidation of Q
    eK: EASItag 6-plex on K
    itraq4K: iTRAQ 4-plex on K
    itraq4Y: iTRAQ 4-plex on Y
    itraq8K: iTRAQ 8-plex on K
    itraq8Y: iTRAQ 8-plex on Y
    oxM: oxidation of M
    pS: phosphorylation of S
    pT: phosphorylation of T
    pY: phosphorylation of Y
    tmt0K: TMT zero on K
    tmt0Y: TMT zero on Y
    tmt2K: TMT duplex on K
    tmt2Y: TMT duplex on Y
    tmt6K: TMT sixplex/tenplex on K
    tmt6Y: TMT sixplex/tenplex on Y
mods_fixed_terminal:
  default: []
  description: Fixed terminal modifications.
  type: checkgroup
  value:
    arg10>R: Arg 10 on peptide C-terminus
    arg6>R: Arg 6 on peptide C-terminus
    cm<C: pyro-cmC
    e<^: EASItag 6-plex on peptide N-terminus
    itraq4K<^: iTRAQ 4-plex on peptide N-terminus
    itraq8K<^: iTRAQ 8-plex on peptide N-terminus
    lys8>K: Lys 8 on peptide C-terminus
    pg<E: pyro-E
    pg<Q: pyro-Q
    tmt0<^: TMT zero on peptide N-terminus
    tmt2<^: TMT duplex on peptide N-terminus
    tmt6<^: TMT sixplex/tenplex on peptide N-terminus
mods_fixed_terminal_prot:
  default: []
  description: Fixed terminal modifications on proteins.
  type: checkgroup
  value:
    a<^: acetylation of protein N-terminus
    am>^: amidation of protein C-terminus
mods_variable:
  default:
  - oxM
  description: Variable modifications.
  type: checkgroup
  value:
    aK: acetylation of lysine
    cC: carbamidomethylation of C
    deamN: deamidation of N
    deamQ: deamidation of Q
    eK: EASItag 6-plex on K
    itraq4K: iTRAQ 4-plex on K
    itraq4Y: iTRAQ 4-plex on Y
    itraq8K: iTRAQ 8-plex on K
    itraq8Y: iTRAQ 8-plex on Y
    oxM: oxidation of M
    pS: phosphorylation of S
    pT: phosphorylation of T
    pY: phosphorylation of Y
    tmt0K: TMT zero on K
    tmt0Y: TMT zero on Y
    tmt2K: TMT duplex on K
    tmt2Y: TMT duplex on Y
    tmt6K: TMT sixplex/tenplex on K
    tmt6Y: TMT sixplex/tenplex on Y
mods_variable_terminal:
  default: []
  description: Variable terminal modifications.
  type: checkgroup
  value:
    arg10>R: Arg 10 on peptide C-terminus
    arg6>R: Arg 6 on peptide C-terminus
    cm<C: pyro-cmC
    e<^: EASItag 6-plex on peptide N-terminus
    itraq4K<^: iTRAQ 4-plex on peptide N-terminus
    itraq8K<^: iTRAQ 8-plex on peptide N-terminus
    lys8>K: Lys 8 on peptide C-terminus
    pg<E: pyro-E
    pg<Q: pyro-Q
    tmt0<^: TMT zero on peptide N-terminus
    tmt2<^: TMT duplex on peptide N-terminus
    tmt6<^: TMT sixplex/tenplex on peptide N-terminus
mods_variable_terminal_prot:
  default:
  - a<^
  description: Variable terminal modifications on proteins.
  type: checkgroup
  value:
    a<^: acetylation of protein N-terminus
    am>^: amidation of protein C-terminus
n_missed_cleavages:
  default: 2
  description: Number of missed cleavages.
  max: 99
  min: 0
  type: spinbox
n_modifications_max:
  default: 3
  description: Limit the number of modifications per peptide.
  max: 10
  min: 1
  type: spinbox
pep_length_max:
  default: 27
  description: Maximum peptide length.
  max: 99
  min: 7
  type: spinbox
pep_length_min:
  default: 7
  description: Minimum peptide length.
  max: 99
  min: 7
  type: spinbox
protease:
  default: trypsin
  description: Protease for digestions.
  type: combobox
  value:
  - arg-c
  - asp-n
  - bnps-skatole
  - caspase 1
  - caspase 2
  - caspase 3
  - caspase 4
  - caspase 5
  - caspase 6
  - caspase 7
  - caspase 8
  - caspase 9
  - caspase 10
  - chymotrypsin high specificity
  - chymotrypsin low specificity
  - clostripain
  - cnbr
  - enterokinase
  - factor xa
  - formic acid
  - glutamyl endopeptidase
  - granzyme b
  - hydroxylamine
  - iodosobenzoic acid
  - lys_c
  - lys_c/p
  - lys_n
  - ntcb
  - pepsin ph1.3
  - pepsin ph2.0
  - proline endopeptidase
  - proteinase k
  - staphylococcal peptidase i
  - thermolysin
  - thrombin
  - trypsin_full
  - trypsin_exception
  - non-specific
  - trypsin
pseudo_reverse:
  default: true
  description: Use pseudo-reverse strategy instead of reverse.
  type: checkbox
save_db:
  default: true
  description: Save DB or create on the fly.
  type: checkbox
spectra_block:
  default: 100000
  description: Maximum number of sequences to be collected before theoretical spectra
    are generated.
  max: 1000000
  min: 1000
  type: spinbox
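
The pseudo_reverse setting switches between two decoy strategies. The sketch below illustrates the commonly used definitions and is not taken from AlphaPept's code: plain reversal flips the whole peptide, while pseudo-reversal is assumed here to keep the C-terminal residue in place and reverse the rest. Both function names are hypothetical.

```python
def reverse_decoy(peptide: str) -> str:
    """Reverse the complete peptide sequence."""
    return peptide[::-1]


def pseudo_reverse_decoy(peptide: str) -> str:
    """Reverse all residues except the C-terminal one, which stays in place."""
    if len(peptide) < 2:
        return peptide
    return peptide[:-1][::-1] + peptide[-1]


print(reverse_decoy("PEPTIDEK"))
print(pseudo_reverse_decoy("PEPTIDEK"))
```

Under this definition the decoy has the same amino acid composition, and therefore the same precursor mass, as the target, and a tryptic peptide keeps its C-terminal K or R.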

Feature Finding

print(yaml.dump(SETTINGS_TEMPLATE['features']))

centroid_tol:
  default: 8
  max: 25
  min: 1
  type: spinbox
hill_check_large:
  default: 40
  max: 100
  min: 1
  type: spinbox
hill_length_min:
  default: 3
  max: 10
  min: 1
  type: spinbox
hill_nboot:
  default: 150
  max: 500
  min: 1
  type: spinbox
hill_nboot_max:
  default: 300
  max: 500
  min: 1
  type: spinbox
hill_smoothing:
  default: 1
  max: 10
  min: 1
  type: spinbox
hill_split_level:
  default: 1.3
  max: 10.0
  min: 0.1
  type: doublespinbox
iso_charge_max:
  default: 6
  max: 6
  min: 1
  type: spinbox
iso_charge_min:
  default: 1
  max: 6
  min: 1
  type: spinbox
iso_corr_min:
  default: 0.6
  max: 1
  min: 0.1
  type: doublespinbox
iso_mass_range:
  default: 5
  max: 10
  min: 1
  type: spinbox
iso_n_seeds:
  default: 100
  max: 500
  min: 1
  type: spinbox
iso_split_level:
  default: 1.3
  max: 10.0
  min: 0.1
  type: doublespinbox
map_mob_range:
  default: 0.3
  max: 1
  min: 0.1
  type: doublespinbox
map_mz_range:
  default: 1.5
  max: 2
  min: 0.1
  type: doublespinbox
map_n_neighbors:
  default: 5
  max: 10
  min: 1
  type: spinbox
map_rt_range:
  default: 0.5
  max: 1
  min: 0.1
  type: doublespinbox
max_gap:
  default: 2
  max: 10
  min: 1
  type: spinbox
search_unidentified:
  default: false
  description: Search MSMS w/o feature.
  type: checkbox

Search

print(yaml.dump(SETTINGS_TEMPLATE['search']))

calibrate:
  default: true
  description: Recalibrate masses.
  type: checkbox
calibration_std_frag:
  default: 5
  description: Std range for fragment tolerance after calibration.
  max: 10
  min: 1
  type: spinbox
calibration_std_prec:
  default: 5
  description: Std range for precursor tolerance after calibration.
  max: 10
  min: 1
  type: spinbox
frag_tol:
  default: 50
  description: Maximum fragment mass tolerance.
  max: 500
  min: 1
  type: spinbox
min_frag_hits:
  default: 7
  description: Minimum number of fragment hits.
  max: 99
  min: 1
  type: spinbox
parallel:
  default: true
  description: Use parallel processing.
  type: checkbox
peptide_fdr:
  default: 0.01
  description: FDR level for peptides.
  max: 1.0
  min: 0.0
  type: doublespinbox
ppm:
  default: true
  description: Use ppm instead of Dalton.
  type: checkbox
prec_tol:
  default: 20
  description: Maximum allowed precursor mass offset.
  max: 500
  min: 1
  type: spinbox
protein_fdr:
  default: 0.01
  description: FDR level for proteins.
  max: 1.0
  min: 0.0
  type: doublespinbox
recalibration_min:
  default: 100
  description: Minimum number of datapoints to perform calibration.
  max: 10000
  min: 100
  type: spinbox
top_n:
  default: 1
  description: Top n selection of peptides for search.
  max: 50
  min: 1
  type: spinbox
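
The ppm flag determines how prec_tol and frag_tol are interpreted. As a minimal sketch, a tolerance given in ppm can be converted to an absolute window in Dalton by scaling with the mass. The helper names are hypothetical, and taking the theoretical mass as the reference is an assumption; the source does not state which mass AlphaPept uses.

```python
def tolerance_in_dalton(mass: float, tol: float, ppm: bool = True) -> float:
    """Convert a tolerance to Dalton; with ppm=False it is already absolute."""
    return mass * tol / 1e6 if ppm else tol


def within_tolerance(observed: float, theoretical: float, tol: float, ppm: bool = True) -> bool:
    """Check whether an observed mass lies inside the tolerance window."""
    return abs(observed - theoretical) <= tolerance_in_dalton(theoretical, tol, ppm)


# With the default prec_tol of 20 ppm, a 1000 Da precursor has a window of about 0.02 Da.
print(tolerance_in_dalton(1000.0, 20))
print(within_tolerance(1000.015, 1000.0, 20))
print(within_tolerance(1000.03, 1000.0, 20))
```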

Score

print(yaml.dump(SETTINGS_TEMPLATE['score']))

method:
  default: random_forest
  description: Scoring method.
  type: combobox
  value:
  - x_tandem
  - random_forest
  - generic_score
  - morpheus
ml_ini_score:
  default: hits
  description: Initial score for ML. Hits is equivalent to Morpheus score.
  type: combobox
  value:
  - x_tandem
  - hits
  - generic_score

Calibration

print(yaml.dump(SETTINGS_TEMPLATE['calibration']))

calib_mob_range:
  default: 0.3
  description: Scaling factor for mobility axis.
  max: 1.0
  min: 0.0
  type: doublespinbox
calib_mz_range:
  default: 2000
  description: Scaling factor for mz axis in ppm.
  max: 10000
  min: 1
  type: spinbox
calib_n_neighbors:
  default: 100
  description: Number of neighbors that are used for offset interpolation.
  max: 1000
  min: 1
  type: spinbox
calib_rt_range:
  default: 0.5
  description: Scaling factor for rt axis.
  max: 10
  min: 0.0
  type: doublespinbox
outlier_std:
  default: 3
  description: Number of std. deviations to filter outliers in psms.
  max: 5
  min: 1
  type: spinbox

Matching

print(yaml.dump(SETTINGS_TEMPLATE['matching']))

match_d_min:
  default: 3
  description: Minimum distance cutoff for matching.
  max: 10.0
  min: 0.001
  type: doublespinbox
match_group_tol:
  default: 0
  description: When having matching groups, match neighboring groups.
  max: 100
  min: 0
  type: spinbox
match_p_min:
  default: 0.05
  description: Minimum probability cutoff for matching.
  max: 1.0
  min: 0.001
  type: doublespinbox

Isobaric Labeling

isobaric_label = {}

isobaric_label["label"] = {'type':'combobox', 'value':['None','TMT10plex'], 'default':'None', 'description':"Type of isobaric label present."}
isobaric_label["reporter_frag_tolerance"] = {'type':'spinbox', 'min':1, 'max':500, 'default':15, 'description':"Maximum fragment mass tolerance for a reporter."}
isobaric_label["reporter_frag_tolerance_ppm"] = {'type':'checkbox', 'default':True, 'description':"Use ppm instead of Dalton."}

SETTINGS_TEMPLATE["isobaric_label"] = isobaric_label

Quantification

print(yaml.dump(SETTINGS_TEMPLATE['quantification']))

lfq_ratio_min:
  default: 1
  description: Minimum number of ratios for LFQ.
  max: 10
  min: 1
  type: spinbox
max_lfq:
  default: true
  description: Perform max lfq type quantification.
  type: checkbox
mode:
  default: ms1_int_sum_apex
  description: Column to perform quantification on.
  type: combobox
  value:
  - ms1_int_sum_apex

hash_file

 hash_file (path)

Helper function to hash a file. Taken from https://stackoverflow.com/questions/22058048/hashing-a-file-in-python
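
A typical way to hash a file without loading it into memory at once is to feed it to the hash object in chunks. The sketch below shows that pattern with the standard library. It is not the body of hash_file: the choice of SHA-256, the chunk size, and the name hash_file_sketch are assumptions made for illustration.

```python
import hashlib
import os
import tempfile


def hash_file_sketch(path: str, chunk_size: int = 65536) -> str:
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hash a temporary file that is larger than one chunk.
with tempfile.TemporaryDirectory() as tmp_dir:
    tmp_path = os.path.join(tmp_dir, "example.bin")
    with open(tmp_path, "wb") as handle:
        handle.write(b"abc" * 100000)
    file_hash = hash_file_sketch(tmp_path)

print(file_hash)
```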

create_default_settings

 create_default_settings ()
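
The source gives no description for create_default_settings. Its name suggests that it builds a settings dictionary from the default values in the template. The sketch below shows one way this could work, under that assumption; defaults_from_template and TEMPLATE_EXCERPT are hypothetical names, and the excerpt copies three entries from the sections above.

```python
import copy

# Small excerpt of the template, copied from the sections above.
TEMPLATE_EXCERPT = {
    "workflow": {
        "align": {"type": "checkbox", "default": False,
                  "description": "Flag to align the data."},
    },
    "general": {
        "n_processes": {"type": "spinbox", "min": 1, "max": 60, "default": 60},
    },
    "experiment": {
        "fasta_paths": {"type": "list", "default": [],
                        "description": "List of paths for FASTA files."},
    },
}


def defaults_from_template(template: dict) -> dict:
    """Collect the 'default' value of every entry, grouped by section."""
    return {
        group: {name: copy.deepcopy(entry["default"]) for name, entry in entries.items()}
        for group, entries in template.items()
    }


defaults = defaults_from_template(TEMPLATE_EXCERPT)
print(defaults)
```

The deep copy keeps mutable defaults such as empty lists independent of the template, so appending to a setting does not change the template itself.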