AlphaPept stores all settings in *.yaml files. This notebook contains functions to load, save, and print settings. Additionally, it defines a settings template that specifies, for each parameter, its default value, its allowed range, and its kind (e.g., float value, list, etc.). These definitions are meant to allow graphical user interfaces for the settings to be created automatically.
Settings
Saving and Loading
By default, settings are saved as *.yaml files. These are plain-text files, so they can easily be modified in any text editor.
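A minimal sketch of such a save/load round trip with PyYAML is shown below. The functions save_settings and load_settings here are illustrative re-implementations that assume the settings are held in a nested Python dict; AlphaPept's own functions may differ in signature and behavior.

```python
import os
import tempfile

import yaml  # PyYAML


# Illustrative sketch, not AlphaPept's implementation.
def save_settings(settings: dict, path: str) -> None:
    """Write a nested settings dict to a human-editable YAML file."""
    with open(path, "w") as f:
        # block style: one key per line, easy to edit by hand
        yaml.safe_dump(settings, f, default_flow_style=False)


def load_settings(path: str) -> dict:
    """Read a YAML settings file back into a dict."""
    with open(path) as f:
        # safe_load only constructs plain Python objects (dict, list, str, ...)
        return yaml.safe_load(f)


settings = {
    "workflow": {"align": False, "search_data": True},
    "general": {"n_processes": 8},
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "settings.yaml")
    save_settings(settings, path)
    loaded = load_settings(path)

print(loaded == settings)  # True
```

Using safe_dump/safe_load keeps the files free of Python-specific tags, so they stay readable and safe to load from untrusted sources.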
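The settings template defines the individual settings. Its purpose is to provide enough information to generate a graphical user interface automatically. The list below maps each setting type to the widget it would correspond to in Streamlit; the same mapping could be adapted to any other GUI library.
Each entry has a type, a default value, and a description. Numeric types additionally define a minimum and a maximum, and option types (combobox, checkgroup) list their allowed values under the value key.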
spinbox -> st.range, range with minimum and maximum values (int)
doublespinbox -> st.range, range with minimum and maximum values (float)
path -> st.button, clickable button to select a path to save / load files.
combobox -> st.selectbox, dropdown menu with values to choose from
checkbox -> st.checkbox, checkbox that can be selected
checkgroup -> st.multiselect, creates a list of options that can be selected
string -> st.text_input, generic string input
list -> Creates a list that is displayed
placeholder -> This just prints the parameter and cannot be changed
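Because each entry declares its type, code can dispatch on that type, whether to choose a widget or to check user input. The sketch below takes the second route with a hypothetical validate function. The per-type rules (e.g., a checkgroup value being any subset of the option keys) are assumptions inferred from the template entries, not AlphaPept code.

```python
from typing import Any


# Hypothetical validator; the per-type rules are assumptions, not AlphaPept code.
def validate(entry: dict, value: Any) -> bool:
    """Check a value against one template entry by dispatching on entry["type"]."""
    kind = entry["type"]
    # bool is a subclass of int in Python, so exclude it explicitly
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if kind == "spinbox":  # integer within [min, max]
        return is_number and isinstance(value, int) and entry["min"] <= value <= entry["max"]
    if kind == "doublespinbox":  # number within [min, max]
        return is_number and entry["min"] <= value <= entry["max"]
    if kind == "checkbox":
        return isinstance(value, bool)
    if kind == "combobox":  # exactly one of the listed options
        return value in entry["value"]
    if kind == "checkgroup":  # any subset of the option keys
        return isinstance(value, list) and all(v in entry["value"] for v in value)
    if kind == "path":  # unset (None) or a path string
        return value is None or isinstance(value, str)
    if kind == "string":
        return isinstance(value, str)
    if kind == "list":
        return isinstance(value, list)
    # remaining types (e.g. "placeholder") are display-only: accept any value
    return True


n_processes = {"type": "spinbox", "default": 60, "min": 1, "max": 60}
mods_fixed = {
    "type": "checkgroup",
    "default": ["cC"],
    "value": {"cC": "carbamidomethylation of C", "oxM": "oxidation of M"},
}

print(validate(n_processes, 8))             # True
print(validate(n_processes, 0))             # False
print(validate(mods_fixed, ["cC", "oxM"]))  # True
print(validate(mods_fixed, ["xX"]))         # False
```

A GUI generator could use the same dispatch structure, returning a widget instead of a boolean for each type.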
Workflow settings
Settings that control the workflow, i.e., which algorithmic steps are performed.
print(yaml.dump(SETTINGS_TEMPLATE['workflow']))
align:
  default: false
  description: Flag to align the data.
  type: checkbox
continue_runs:
  default: false
  description: Flag to continue previously computed runs. If False, existing ms_data
    will be deleted.
  type: checkbox
create_database:
  default: true
  description: Flag to create a database.
  type: checkbox
find_features:
  default: true
  description: Flag to perform feature finding.
  type: checkbox
import_raw_data:
  default: true
  description: Flag to import the raw data.
  type: checkbox
lfq_quantification:
  default: true
  description: Flag to perform LFQ normalization.
  type: checkbox
match:
  default: false
  description: Flag to perform match-between-runs.
  type: checkbox
recalibrate_data:
  default: true
  description: Flag to perform recalibration.
  type: checkbox
search_data:
  default: true
  description: Flag to perform search.
  type: checkbox
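As an illustration of how these flags might be consumed, the sketch below collects the enabled steps. STEP_ORDER and enabled_steps are hypothetical names, and the step order is an assumption rather than AlphaPept's actual pipeline order. continue_runs is left out because it changes how a run behaves rather than adding a step.

```python
# Hypothetical step order; AlphaPept's actual pipeline order may differ.
STEP_ORDER = [
    "create_database",
    "import_raw_data",
    "find_features",
    "search_data",
    "recalibrate_data",
    "align",
    "match",
    "lfq_quantification",
]


def enabled_steps(workflow: dict) -> list:
    """Return the steps whose flag is True, in STEP_ORDER (missing flags count as False)."""
    return [step for step in STEP_ORDER if workflow.get(step, False)]


# Default values from the workflow template
workflow = {
    "align": False, "continue_runs": False, "create_database": True,
    "find_features": True, "import_raw_data": True, "lfq_quantification": True,
    "match": False, "recalibrate_data": True, "search_data": True,
}
print(enabled_steps(workflow))
# ['create_database', 'import_raw_data', 'find_features', 'search_data',
#  'recalibrate_data', 'lfq_quantification']
```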
General settings
print(yaml.dump(SETTINGS_TEMPLATE['general']))
n_processes:
  default: 60
  description: Maximum number of processes for multiprocessing. If larger than the
    number of processors, it will be capped.
  max: 60
  min: 1
  type: spinbox
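The capping described for n_processes fits in a few lines. This is a sketch assuming that "number of processors" means the value reported by os.cpu_count(); effective_processes is a hypothetical helper.

```python
import os


# Hypothetical helper; assumes the CPU count from os.cpu_count() is the cap.
def effective_processes(n_processes: int) -> int:
    """Cap the requested number of processes at the number of CPUs."""
    available = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, min(n_processes, available))


print(effective_processes(60))  # never more than the CPUs on this machine
```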
Experimental settings
Core definitions of the experiment, mainly the file paths and the per-file annotations.
print(yaml.dump(SETTINGS_TEMPLATE['experiment']))
database_path:
  default: null
  description: Path to library file (.hdf).
  filetype:
  - hdf
  folder: false
  type: path
fasta_paths:
  default: []
  description: List of paths for FASTA files.
  type: list
file_paths:
  default: []
  description: Filepaths of the experiments.
  type: list
fraction:
  default: []
  description: List of fraction numbers for fractionated samples.
  type: list
matching_group:
  default: []
  description: List of matching groups for the raw files. Match-between-runs is only
    performed between files of the same group.
  type: list
results_path:
  default: null
  description: Path where the results should be stored.
  filetype:
  - hdf
  folder: false
  type: path
sample_group:
  default: []
  description: Sample group, for raw files that should be quantified together.
  type: list
shortnames:
  default: []
  description: List of shortnames for the raw files.
  type: list
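The per-file lists (file_paths, shortnames, fraction, sample_group, matching_group) are presumably aligned by position. Under that assumption, the sketch below derives which file pairs are eligible for match-between-runs from matching_group. mbr_pairs is a hypothetical helper, and treating an empty matching_group as "all files in one group" is also an assumption.

```python
from collections import defaultdict
from itertools import combinations


# Hypothetical helper; assumes matching_group is aligned with file_paths by position.
def mbr_pairs(file_paths, matching_group):
    """Return the file pairs that may be matched (same matching group only)."""
    if not matching_group:
        # assumption: no groups given means every pair of files is allowed
        return list(combinations(file_paths, 2))
    if len(matching_group) != len(file_paths):
        raise ValueError("matching_group needs one entry per file")
    groups = defaultdict(list)
    for path, group in zip(file_paths, matching_group):
        groups[group].append(path)
    pairs = []
    for members in groups.values():  # groups in order of first appearance
        pairs.extend(combinations(members, 2))
    return pairs


files = ["a.raw", "b.raw", "c.raw", "d.raw"]
print(mbr_pairs(files, [1, 1, 2, 2]))
# [('a.raw', 'b.raw'), ('c.raw', 'd.raw')]
```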
Raw file handling
print(yaml.dump(SETTINGS_TEMPLATE['raw']))
n_most_abundant:
  default: 400
  description: Number of most abundant peaks to be isolated from raw spectra.
  max: 1000
  min: -1
  type: spinbox
use_profile_ms1:
  default: false
  description: Use profile data for MS1 and perform own centroiding.
  type: checkbox
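A sketch of what n_most_abundant does to a single spectrum, written in plain Python. keep_most_abundant is a hypothetical helper, and reading the minimum of -1 as "no filtering" is an assumption; the template only states the allowed range.

```python
# Hypothetical helper; treating n_most_abundant <= 0 as "keep all" is an assumption.
def keep_most_abundant(mz, intensity, n_most_abundant):
    """Keep the n most intense peaks of a spectrum, preserving input (m/z) order."""
    if n_most_abundant <= 0 or len(intensity) <= n_most_abundant:
        return list(mz), list(intensity)
    # rank peak indices by intensity, highest first
    ranked = sorted(range(len(intensity)), key=lambda i: intensity[i], reverse=True)
    top = sorted(ranked[:n_most_abundant])  # back into input order
    return [mz[i] for i in top], [intensity[i] for i in top]


mz = [100.0, 200.0, 300.0, 400.0]
intensity = [5.0, 50.0, 1.0, 20.0]
print(keep_most_abundant(mz, intensity, 2))  # ([200.0, 400.0], [50.0, 20.0])
```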
FASTA settings
print(yaml.dump(SETTINGS_TEMPLATE['fasta']))
AL_swap:
  default: false
  description: Swap A and L for decoy generation.
  type: checkbox
KR_swap:
  default: false
  description: Swap K and R (only if terminal) for decoy generation.
  type: checkbox
fasta_block:
  default: 1000
  description: Number of fasta entries to be processed in one block.
  max: 10000
  min: 100
  type: spinbox
fasta_size_max:
  default: 100
  description: Maximum size of FASTA (MB) when switching on-the-fly.
  max: 1000000
  min: 1
  type: spinbox
isoforms_max:
  default: 1024
  description: Maximum number of isoforms per peptide.
  max: 4096
  min: 1
  type: spinbox
mods_fixed:
  default:
  - cC
  description: Fixed modifications.
  type: checkgroup
  value:
    aK: acetylation of lysine
    cC: carbamidomethylation of C
    deamN: deamidation of N
    deamQ: deamidation of Q
    eK: EASItag 6-plex on K
    itraq4K: iTRAQ 4-plex on K
    itraq4Y: iTRAQ 4-plex on Y
    itraq8K: iTRAQ 8-plex on K
    itraq8Y: iTRAQ 8-plex on Y
    oxM: oxidation of M
    pS: phosphorylation of S
    pT: phosphorylation of T
    pY: phosphorylation of Y
    tmt0K: TMT zero on K
    tmt0Y: TMT zero on Y
    tmt2K: TMT duplex on K
    tmt2Y: TMT duplex on Y
    tmt6K: TMT sixplex/tenplex on K
    tmt6Y: TMT sixplex/tenplex on Y
mods_fixed_terminal:
  default: []
  description: Fixed terminal modifications.
  type: checkgroup
  value:
    arg10>R: Arg 10 on peptide C-terminus
    arg6>R: Arg 6 on peptide C-terminus
    cm<C: pyro-cmC
    e<^: EASItag 6-plex on peptide N-terminus
    itraq4K<^: iTRAQ 4-plex on peptide N-terminus
    itraq8K<^: iTRAQ 8-plex on peptide N-terminus
    lys8>K: Lys 8 on peptide C-terminus
    pg<E: pyro-E
    pg<Q: pyro-Q
    tmt0<^: TMT zero on peptide N-terminus
    tmt2<^: TMT duplex on peptide N-terminus
    tmt6<^: TMT sixplex/tenplex on peptide N-terminus
mods_fixed_terminal_prot:
  default: []
  description: Fixed terminal modifications on proteins.
  type: checkgroup
  value:
    a<^: acetylation of protein N-terminus
    am>^: amidation of protein C-terminus
mods_variable:
  default:
  - oxM
  description: Variable modifications.
  type: checkgroup
  value:
    aK: acetylation of lysine
    cC: carbamidomethylation of C
    deamN: deamidation of N
    deamQ: deamidation of Q
    eK: EASItag 6-plex on K
    itraq4K: iTRAQ 4-plex on K
    itraq4Y: iTRAQ 4-plex on Y
    itraq8K: iTRAQ 8-plex on K
    itraq8Y: iTRAQ 8-plex on Y
    oxM: oxidation of M
    pS: phosphorylation of S
    pT: phosphorylation of T
    pY: phosphorylation of Y
    tmt0K: TMT zero on K
    tmt0Y: TMT zero on Y
    tmt2K: TMT duplex on K
    tmt2Y: TMT duplex on Y
    tmt6K: TMT sixplex/tenplex on K
    tmt6Y: TMT sixplex/tenplex on Y
mods_variable_terminal:
  default: []
  description: Variable terminal modifications.
  type: checkgroup
  value:
    arg10>R: Arg 10 on peptide C-terminus
    arg6>R: Arg 6 on peptide C-terminus
    cm<C: pyro-cmC
    e<^: EASItag 6-plex on peptide N-terminus
    itraq4K<^: iTRAQ 4-plex on peptide N-terminus
    itraq8K<^: iTRAQ 8-plex on peptide N-terminus
    lys8>K: Lys 8 on peptide C-terminus
    pg<E: pyro-E
    pg<Q: pyro-Q
    tmt0<^: TMT zero on peptide N-terminus
    tmt2<^: TMT duplex on peptide N-terminus
    tmt6<^: TMT sixplex/tenplex on peptide N-terminus
mods_variable_terminal_prot:
  default:
  - a<^
  description: Variable terminal modifications on proteins.
  type: checkgroup
  value:
    a<^: acetylation of protein N-terminus
    am>^: amidation of protein C-terminus
n_missed_cleavages:
  default: 2
  description: Number of missed cleavages.
  max: 99
  min: 0
  type: spinbox
n_modifications_max:
  default: 3
  description: Limit the number of modifications per peptide.
  max: 10
  min: 1
  type: spinbox
pep_length_max:
  default: 27
  description: Maximum peptide length.
  max: 99
  min: 7
  type: spinbox
pep_length_min:
  default: 7
  description: Minimum peptide length.
  max: 99
  min: 7
  type: spinbox
protease:
  default: trypsin
  description: Protease for digestion.
  type: combobox
  value:
  - arg-c
  - asp-n
  - bnps-skatole
  - caspase 1
  - caspase 2
  - caspase 3
  - caspase 4
  - caspase 5
  - caspase 6
  - caspase 7
  - caspase 8
  - caspase 9
  - caspase 10
  - chymotrypsin high specificity
  - chymotrypsin low specificity
  - clostripain
  - cnbr
  - enterokinase
  - factor xa
  - formic acid
  - glutamyl endopeptidase
  - granzyme b
  - hydroxylamine
  - iodosobenzoic acid
  - lys_c
  - lys_c/p
  - lys_n
  - ntcb
  - pepsin ph1.3
  - pepsin ph2.0
  - proline endopeptidase
  - proteinase k
  - staphylococcal peptidase i
  - thermolysin
  - thrombin
  - trypsin_full
  - trypsin_exception
  - non-specific
  - trypsin
pseudo_reverse:
  default: true
  description: Use pseudo-reverse strategy instead of reverse.
  type: checkbox
save_db:
  default: true
  description: Save DB or create on the fly.
  type: checkbox
spectra_block:
  default: 100000
  description: Maximum number of sequences to be collected before theoretical spectra
    are generated.
  max: 1000000
  min: 1000
  type: spinbox
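To illustrate how protease, n_missed_cleavages, pep_length_min, and pep_length_max interact, here is a sketch of an in-silico digest. It hard-codes the classic trypsin rule (cleave after K or R unless followed by P), which may not match AlphaPept's exact definition of the trypsin entry; digest is a hypothetical helper.

```python
import re


# Hypothetical helper; assumes the classic trypsin rule (after K/R, not before P).
def digest(sequence, n_missed_cleavages=2, pep_length_min=7, pep_length_max=27):
    """Return the sorted unique peptides of an in-silico tryptic digest."""
    # zero-width split after every K/R that is not followed by P
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = set()
    for start in range(len(fragments)):
        # join the fragment with up to n_missed_cleavages following fragments
        for end in range(start, min(start + n_missed_cleavages + 1, len(fragments))):
            pep = "".join(fragments[start:end + 1])
            if pep_length_min <= len(pep) <= pep_length_max:
                peptides.add(pep)
    return sorted(peptides)


print(digest("PEPTIDEKSAMPLERPLASMAK"))
# ['PEPTIDEK', 'PEPTIDEKSAMPLERPLASMAK', 'SAMPLERPLASMAK']
```

Note that the R in SAMPLERP is not a cleavage site because it is followed by P.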
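The decoy options can be sketched as follows, under assumed interpretations: pseudo-reverse reverses the peptide but keeps its C-terminal residue in place, and KR_swap exchanges a C-terminal K for R and vice versa. AL_swap is not modeled. decoy is a hypothetical helper, not AlphaPept's implementation.

```python
# Hypothetical helper; the meaning of pseudo_reverse and KR_swap is assumed.
def decoy(peptide, pseudo_reverse=True, KR_swap=False):
    """Build a decoy sequence from a target peptide."""
    if pseudo_reverse:
        # reverse everything except the C-terminal residue
        rev = peptide[:-1][::-1] + peptide[-1]
    else:
        rev = peptide[::-1]
    if KR_swap and rev[-1] in "KR":
        # swap a terminal K <-> R so decoy and target termini differ
        rev = rev[:-1] + ("R" if rev[-1] == "K" else "K")
    return rev


print(decoy("PEPTIDEK"))                        # EDITPEPK
print(decoy("PEPTIDEK", KR_swap=True))          # EDITPEPR
print(decoy("PEPTIDEK", pseudo_reverse=False))  # KEDITPEP
```

Keeping the C-terminal K or R in place preserves the tryptic character of the decoy, which is the usual motivation for a pseudo-reverse strategy.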
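n_modifications_max and isoforms_max bound the combinatorial growth caused by variable modifications. The sketch below enumerates isoforms for non-terminal variable modifications. It assumes the template's naming scheme, in which the last character of a name such as oxM is the target residue, and assumes a modified residue is written by substituting that name into the sequence. isoforms is a hypothetical helper.

```python
from itertools import combinations


# Hypothetical helper; naming and notation of modified residues are assumptions.
def isoforms(peptide, mods_variable, n_modifications_max=3, isoforms_max=1024):
    """Enumerate variable-modification isoforms of a peptide."""
    # every (position, mod name) pair whose target residue matches the sequence
    sites = [(i, mod) for i, aa in enumerate(peptide)
             for mod in mods_variable if mod[-1] == aa]
    results = []
    for k in range(n_modifications_max + 1):
        for chosen in combinations(sites, k):
            positions = [i for i, _ in chosen]
            if len(set(positions)) < k:  # at most one mod per residue
                continue
            seq = list(peptide)
            for i, mod in chosen:
                seq[i] = mod  # e.g. "M" -> "oxM"
            results.append("".join(seq))
            if len(results) >= isoforms_max:  # hard cap on isoforms per peptide
                return results
    return results


print(isoforms("PEMTIDEMK", ["oxM"]))
# ['PEMTIDEMK', 'PEoxMTIDEMK', 'PEMTIDEoxMK', 'PEoxMTIDEoxMK']
```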