Feature Finding

Functions related to feature finding

This part describes the implementation of the feature-finding algorithm. The core of the algorithm is described in the MaxQuant paper. Its supplementary material explains the underlying methodology in great detail and is the foundation of the theoretical background described here. A refined version of the algorithm was presented with Dinosaur, which was also used as a reference for the Python implementation.

For the algorithm, we need several modules:

Connecting Centroids to Hills
Refinement of Hills
Calculating Hill Statistics
Combining Hills to Isotope Patterns
Deconvolution of Isotope Patterns

Loading Data

From the IO library, we already have an *.ms_data.hdf container that contains centroided data. To use it in feature finding, we directly load the data.

Connecting Centroids to Hills

Note

Feature finding relies heavily on the performance function decorator from the performance notebook: @alphapept.performance.performance_function. As a consequence, the decorated functions have no return values, which keeps them GPU compatible. Please check out this notebook for further information.
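To illustrate the "no return values" convention, here is a minimal sketch in plain NumPy. It does not use the actual decorator; the function name square_into and the idea of passing an index as the first argument are assumptions made for this example only. The point is that results are written into a preallocated output array instead of being returned.

```python
import numpy as np


def square_into(idx, values, out):
    """Process one element and write the result into `out`.

    Hypothetical helper: it returns nothing, and the caller owns
    the preallocated output array.
    """
    out[idx] = values[idx] ** 2


values = np.array([1.0, 2.0, 3.0])
out = np.zeros_like(values)  # preallocated result buffer

# Each index is processed independently, so the loop could be parallelized.
for idx in range(len(values)):
    square_into(idx, values, out)
```

Because every call touches only its own slot in the output array, the same function body can in principle be dispatched per index on a CPU or a GPU.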

Connecting centroids

Feature finding starts with connecting centroids. For this, we look at subsequent scans and compare peaks that are within a defined mass tolerance (centroid_tol).

Imagine you have three scans with the following centroids:

Scan 0: 10, 20, 30
Scan 1: 10.2, 40.1
Scan 2: 40, 50, 60

When comparing consecutive scans with a maximum delta mass of 0.5, we find the following connections, written as (Scan No, Centroid No) -> (Scan No, Centroid No). As we cannot easily store tuples in the matrix, we convert the tuple containing the position of the connected centroid to an integer.

* (0,0) -> (1,0) -> (3): 10 & 10.2 -> delta = 0.2
* (1,1) -> (2,0) -> (6): 40.1 & 40 -> delta = 0.1
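The integers 3 and 6 in the example are consistent with a row-major flattening of the (Scan No, Centroid No) position, where the row width is the maximum number of centroids per scan (3 here). The following sketch assumes this convention; the helper names encode and decode are hypothetical and not part of the library.

```python
def encode(scan_no, centroid_no, max_centroids):
    """Flatten a (scan, centroid) position into a single integer."""
    return scan_no * max_centroids + centroid_no


def decode(index, max_centroids):
    """Recover the (scan, centroid) position from the flattened integer."""
    return divmod(index, max_centroids)


max_centroids = 3  # widest scan in the example has three centroids

first_target = encode(1, 0, max_centroids)   # position (1, 0)
second_target = encode(2, 0, max_centroids)  # position (2, 0)
```

Decoding is simply the inverse operation, so a stored integer can always be mapped back to the scan and centroid it refers to.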

Finally, we store this in the results matrix:

\(\begin{bmatrix} 3 & -1 & -1 \\ -1 & 6 & -1\\ -1 & -1 & -1 \end{bmatrix}\)

The corresponding scores matrix will look as follows:

\(\begin{bmatrix} 0.2 & -1 & -1 \\ -1 & 0.1 & -1\\ -1 & -1 & -1 \end{bmatrix}\)

This allows us to not only easily store connections between centroids but also perform a quick lookup for the delta of an existing connection. Note that it also only stores the best connection for each centroid. To extract the connected centroids, we can use np.where(results >= 0). This implementation allows getting millions of connections within seconds.
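The worked example above can be reproduced with a few lines of NumPy. This is a simplified sketch, not the library implementation: it uses an absolute delta as in the example, keeps the closest candidate for each source centroid, and does not resolve the case where two source centroids claim the same target. The variable names are chosen for illustration.

```python
import numpy as np

scans = [[10.0, 20.0, 30.0], [10.2, 40.1], [40.0, 50.0, 60.0]]
centroid_tol = 0.5  # maximum absolute delta mass, as in the example

n_scans = len(scans)
max_centroids = max(len(scan) for scan in scans)

# -1 marks "no connection" in both matrices.
results = np.full((n_scans, max_centroids), -1, dtype=np.int64)
scores = np.full((n_scans, max_centroids), -1.0)

for scan_no in range(n_scans - 1):
    next_scan = np.array(scans[scan_no + 1])
    for centroid_no, mz in enumerate(scans[scan_no]):
        deltas = np.abs(next_scan - mz)
        best = int(np.argmin(deltas))  # closest centroid in the next scan
        if deltas[best] <= centroid_tol:
            # Store the flattened position of the connected centroid.
            results[scan_no, centroid_no] = (scan_no + 1) * max_centroids + best
            scores[scan_no, centroid_no] = deltas[best]

# Row and column indices of all centroids that have a connection.
from_scan, from_centroid = np.where(results >= 0)
```

The two index arrays returned by np.where give the source positions; the matching entries in results give the flattened target positions.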

We also allow gaps, meaning that a connection can exist, for example, between Scan 0 and Scan 2. To support this, we make the aforementioned matrix multidimensional, so that, e.g., the first matrix stores the connections with no gap and the second matrix stores the connections with a gap of 1.
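A gap-aware variant can be sketched by adding a leading axis for the gap size. The layout (gap, scan, centroid), the function name connect_with_gaps, and the small input data below are assumptions for illustration; the same simplifications as in the single-gap case apply (absolute delta, closest candidate per source centroid).

```python
import numpy as np


def connect_with_gaps(scans, centroid_tol, max_gap):
    """Connect centroids across scans, one results matrix per gap size."""
    n_scans = len(scans)
    max_centroids = max(len(scan) for scan in scans)
    shape = (max_gap + 1, n_scans, max_centroids)
    results = np.full(shape, -1, dtype=np.int64)
    scores = np.full(shape, -1.0)

    for gap in range(max_gap + 1):
        step = gap + 1  # gap 0 compares neighboring scans
        for scan_no in range(n_scans - step):
            target = np.array(scans[scan_no + step])
            for centroid_no, mz in enumerate(scans[scan_no]):
                deltas = np.abs(target - mz)
                best = int(np.argmin(deltas))
                if deltas[best] <= centroid_tol:
                    flat = (scan_no + step) * max_centroids + best
                    results[gap, scan_no, centroid_no] = flat
                    scores[gap, scan_no, centroid_no] = deltas[best]
    return results, scores


# Hypothetical data: the centroid at 20 is missing in Scan 1.
scans = [[10.0, 20.0], [40.1], [20.2, 40.0]]
results, scores = connect_with_gaps(scans, centroid_tol=0.5, max_gap=1)
```

Here results[0] holds the connections between neighboring scans, while results[1] holds the connection from Scan 0 to Scan 2 that skips the scan in between.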

The functionality for this step is implemented in connect_centroids_unidirection and the wrapper find_centroid_connections.

Loading Data

Connecting Centroids to Hills

Connecting centroids

find_centroid_connections

connect_centroids_unidirection

connect_centroids

eliminate_overarching_vertex

convert_connections_to_array

Extracting Hills

remove_duplicate_hills

extract_hills

get_hills

fill_path_matrix

find_path_length

find_path_start

path_finder

Hill Splitting

fast_minima

split_hills

split

Filter Hills

filter_hills

check_large_hills

Calculating Hill Statistics

get_hill_data

remove_duplicates

hill_stats

Combining Hills to Isotope Patterns

check_isotope_pattern

Cosine Correlation of two hills

correlate

Extracting pre-Isotope Patterns

edge_correlation

extract_edge

get_pre_isotope_patterns

Extracting Isotope Patterns

get_trails

grow_trail

grow

check_isotope_pattern_directed

plot_pattern

truncate

is_local_minima

get_local_minima

get_minpos

Isolating Isotope Patterns

mz_to_mass

int_list_to_array

cosine_averagine

pattern_to_mz

check_averagine

Isotope Patterns

isolate_isotope_pattern

get_isotope_patterns

report_

feature_finder_report

Data Output

Plotting

External Feature Finder

map_bruker

convert_bruker

extract_bruker

Isotope Export

get_stats

Wrapper

find_features

Mapping

map_ms2

replace_infs