Title: | Spectral Entropy for Mass Spectrometry Data |
---|---|
Description: | Clean the MS/MS spectrum, calculate spectral entropy, unweighted entropy similarity, and entropy similarity for mass spectrometry data. The entropy similarity is a novel similarity measure for MS/MS spectra which outperforms the widely used dot product similarity in compound identification. For more details, please refer to the paper: Yuanyue Li et al. (2021) "Spectral entropy outperforms MS/MS dot product similarity for small-molecule compound identification" <doi:10.1038/s41592-021-01331-z>. |
Authors: | Yuanyue Li [aut, cre] |
Maintainer: | Yuanyue Li <[email protected]> |
License: | Apache License (== 2.0) |
Version: | 0.1.4 |
Built: | 2025-02-21 04:14:35 UTC |
Source: | https://github.com/cran/msentropy |
Calculate the entropy similarity between two spectra
    calculate_entropy_similarity(peaks_a, peaks_b, ms2_tolerance_in_da, ms2_tolerance_in_ppm,
                                 clean_spectra, min_mz, max_mz, noise_threshold, max_peak_num)
Argument | Description |
---|---|
peaks_a | A matrix of spectral peaks with two columns: mz and intensity |
peaks_b | A matrix of spectral peaks with two columns: mz and intensity |
ms2_tolerance_in_da | The MS2 tolerance in Da; set to -1 to disable |
ms2_tolerance_in_ppm | The MS2 tolerance in ppm; set to -1 to disable |
clean_spectra | Whether to clean the spectra before calculating the entropy similarity, see |
min_mz | The minimum mz value to keep; set to -1 to disable |
max_mz | The maximum mz value to keep; set to -1 to disable |
noise_threshold | The noise threshold; set to -1 to disable. Peaks with intensity < noise_threshold * max_intensity are removed |
max_peak_num | The maximum number of peaks to keep; set to -1 to disable |
The entropy similarity
    mz_a <- c(169.071, 186.066, 186.0769)
    intensity_a <- c(7.917962, 1.021589, 100.0)
    mz_b <- c(120.212, 169.071, 186.066)
    intensity_b <- c(37.16, 66.83, 999.0)
    peaks_a <- matrix(c(mz_a, intensity_a), ncol = 2, byrow = FALSE)
    peaks_b <- matrix(c(mz_b, intensity_b), ncol = 2, byrow = FALSE)
    calculate_entropy_similarity(peaks_a, peaks_b,
                                 ms2_tolerance_in_da = 0.02, ms2_tolerance_in_ppm = -1,
                                 clean_spectra = TRUE, min_mz = 0, max_mz = 1000,
                                 noise_threshold = 0.01, max_peak_num = 100)
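The entropy similarity differs from the unweighted variant in that intensities are re-weighted before comparison. The base-R sketch below illustrates the weighting as described in Li et al. (2021): when the spectral entropy S of a spectrum is below 3, each normalized intensity is raised to the power 0.25 + 0.25 * S and the result is renormalized. The helper names `spectral_entropy()` and `weight_intensity()` are hypothetical and not part of the package, and the package's internal implementation may differ in detail.

```r
# Shannon entropy (natural log) of a vector of intensities.
# Hypothetical helper, not a package function.
spectral_entropy <- function(intensity) {
  positive <- intensity[intensity > 0]
  p <- positive / sum(positive)
  -sum(p * log(p))
}

# Hypothetical helper: re-weight intensities following Li et al. (2021).
# Assumption: the threshold 3 and the exponent 0.25 + 0.25 * S come from
# the paper, not from this manual.
weight_intensity <- function(intensity) {
  p <- intensity / sum(intensity)
  s <- spectral_entropy(p)
  if (s < 3) {
    p <- p^(0.25 + 0.25 * s)
    p <- p / sum(p)  # renormalize so the weighted intensities sum to 1
  }
  p
}

intensity_a <- c(7.917962, 1.021589, 100.0)
weight_intensity(intensity_a)
```

For a low-entropy spectrum such as this one, the weighting raises the relative contribution of the minor peaks, so a single dominant peak has less influence on the similarity.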
Calculate spectral entropy of a spectrum
    calculate_spectral_entropy(peaks)
Argument | Description |
---|---|
peaks | A matrix of peaks with two columns: m/z and intensity. |
A double value of spectral entropy.
    mz <- c(100.212, 300.321, 535.325)
    intensity <- c(37.16, 66.83, 999.0)
    peaks <- matrix(c(mz, intensity), ncol = 2, byrow = FALSE)
    calculate_spectral_entropy(peaks)
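To make the quantity concrete, here is a minimal base-R sketch of spectral entropy as the Shannon entropy of the normalized intensities, S = -sum(p * log(p)). It assumes the natural logarithm, following Li et al. (2021), and does no cleaning or centroiding. The function name `spectral_entropy_sketch()` is hypothetical and is not the package's implementation.

```r
# Hypothetical re-implementation for illustration only.
spectral_entropy_sketch <- function(peaks) {
  intensity <- peaks[, 2]
  intensity <- intensity[intensity > 0]   # ignore empty peaks
  if (length(intensity) == 0) return(0)
  p <- intensity / sum(intensity)         # normalize to sum to 1
  -sum(p * log(p))                        # Shannon entropy, natural log
}

mz <- c(100.212, 300.321, 535.325)
intensity <- c(37.16, 66.83, 999.0)
peaks <- matrix(c(mz, intensity), ncol = 2, byrow = FALSE)
spectral_entropy_sketch(peaks)
```

A spectrum with a single peak has entropy 0, and a spectrum with n equally intense peaks has entropy log(n), which is the maximum for n peaks.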
Calculate the unweighted entropy similarity between two spectra
    calculate_unweighted_entropy_similarity(peaks_a, peaks_b, ms2_tolerance_in_da, ms2_tolerance_in_ppm,
                                            clean_spectra, min_mz, max_mz, noise_threshold, max_peak_num)
Argument | Description |
---|---|
peaks_a | A matrix of spectral peaks with two columns: mz and intensity |
peaks_b | A matrix of spectral peaks with two columns: mz and intensity |
ms2_tolerance_in_da | The MS2 tolerance in Da; set to -1 to disable |
ms2_tolerance_in_ppm | The MS2 tolerance in ppm; set to -1 to disable |
clean_spectra | Whether to clean the spectra before calculating the entropy similarity, see |
min_mz | The minimum mz value to keep; set to -1 to disable |
max_mz | The maximum mz value to keep; set to -1 to disable |
noise_threshold | The noise threshold; set to -1 to disable. Peaks with intensity < noise_threshold * max_intensity are removed |
max_peak_num | The maximum number of peaks to keep; set to -1 to disable |
The unweighted entropy similarity
    mz_a <- c(169.071, 186.066, 186.0769)
    intensity_a <- c(7.917962, 1.021589, 100.0)
    mz_b <- c(120.212, 169.071, 186.066)
    intensity_b <- c(37.16, 66.83, 999.0)
    peaks_a <- matrix(c(mz_a, intensity_a), ncol = 2, byrow = FALSE)
    peaks_b <- matrix(c(mz_b, intensity_b), ncol = 2, byrow = FALSE)
    calculate_unweighted_entropy_similarity(peaks_a, peaks_b,
                                            ms2_tolerance_in_da = 0.02, ms2_tolerance_in_ppm = -1,
                                            clean_spectra = TRUE, min_mz = 0, max_mz = 1000,
                                            noise_threshold = 0.01, max_peak_num = 100)
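The idea behind the unweighted entropy similarity can be sketched in base R using the definition from Li et al. (2021): similarity = 1 - (2 * S_AB - S_A - S_B) / log(4), where S_AB is the entropy of the 1:1 mixture of the two normalized spectra. The sketch below is an illustration under simplifying assumptions: it uses a greedy one-to-one peak matching within an absolute Da tolerance and performs no cleaning or centroiding, so its numbers will generally differ from the package's output. The names `entropy()` and `unweighted_similarity_sketch()` are hypothetical.

```r
# Shannon entropy (natural log) of a probability vector.
entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Hypothetical sketch; NOT the package's algorithm.
unweighted_similarity_sketch <- function(peaks_a, peaks_b, tol_da = 0.02) {
  a <- peaks_a[order(peaks_a[, 1]), , drop = FALSE]
  b <- peaks_b[order(peaks_b[, 1]), , drop = FALSE]
  pa <- a[, 2] / sum(a[, 2])
  pb <- b[, 2] / sum(b[, 2])

  used_b <- rep(FALSE, nrow(b))
  mixed <- numeric(0)
  for (i in seq_len(nrow(a))) {
    d <- abs(b[, 1] - a[i, 1])
    d[used_b] <- Inf                    # each peak in b is matched at most once
    j <- which.min(d)
    if (length(j) == 1 && d[j] <= tol_da) {
      used_b[j] <- TRUE
      mixed <- c(mixed, (pa[i] + pb[j]) / 2)  # matched peaks are merged
    } else {
      mixed <- c(mixed, pa[i] / 2)            # peak only present in a
    }
  }
  mixed <- c(mixed, pb[!used_b] / 2)          # peaks only present in b

  1 - (2 * entropy(mixed) - entropy(pa) - entropy(pb)) / log(4)
}

peaks_a <- cbind(c(169.071, 186.066, 186.0769), c(7.917962, 1.021589, 100.0))
peaks_b <- cbind(c(120.212, 169.071, 186.066), c(37.16, 66.83, 999.0))
unweighted_similarity_sketch(peaks_a, peaks_b, tol_da = 0.02)
```

Under this definition, two identical spectra give a similarity of 1 and two spectra without any matching peaks give 0.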
Clean a spectrum
This function cleans the peaks in the following steps:
1. Remove empty peaks (mz <= 0 or intensity <= 0).
2. Remove peaks with mz >= max_mz or mz < min_mz.
3. Centroid the spectrum by merging peaks within min_ms2_difference_in_da or min_ms2_difference_in_ppm.
4. Remove peaks with intensity < noise_threshold * max_intensity.
5. Keep only the top max_peak_num peaks.
6. Normalize the intensity to sum to 1.
Note: Only one of min_ms2_difference_in_da and min_ms2_difference_in_ppm should be positive.
    clean_spectrum(peaks, min_mz, max_mz, noise_threshold, min_ms2_difference_in_da,
                   min_ms2_difference_in_ppm, max_peak_num, normalize_intensity)
Argument | Description |
---|---|
peaks | A matrix of spectral peaks with two columns: mz and intensity |
min_mz | The minimum mz value to keep; set to -1 to disable |
max_mz | The maximum mz value to keep; set to -1 to disable |
noise_threshold | The noise threshold; set to -1 to disable. Peaks with intensity < noise_threshold * max_intensity are removed |
min_ms2_difference_in_da | The minimum mz difference in Da to merge peaks; set to -1 to disable. Any two peaks with mz difference < min_ms2_difference_in_da are merged |
min_ms2_difference_in_ppm | The minimum mz difference in ppm to merge peaks; set to -1 to disable. Any two peaks with mz difference < min_ms2_difference_in_ppm are merged |
max_peak_num | The maximum number of peaks to keep; set to -1 to disable |
normalize_intensity | Whether to normalize the intensity to sum to 1 |
A matrix of spectral peaks, with two columns: mz and intensity
    mz <- c(100.212, 169.071, 169.078, 300.321)
    intensity <- c(0.3716, 7.917962, 100., 66.83)
    peaks <- matrix(c(mz, intensity), ncol = 2, byrow = FALSE)
    clean_spectrum(peaks, min_mz = 0, max_mz = 1000, noise_threshold = 0.01,
                   min_ms2_difference_in_da = 0.02, min_ms2_difference_in_ppm = -1,
                   max_peak_num = 100, normalize_intensity = TRUE)
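The filtering and normalization steps of the cleaning procedure (steps 1, 2, 4, 5 and 6) can be sketched in base R as follows. This is an illustration only: the function `clean_spectrum_sketch()` is hypothetical, it deliberately omits the centroiding step (step 3), and it assumes that any negative value disables the corresponding filter. Its output will therefore differ from `clean_spectrum()` whenever peaks are close enough to be merged.

```r
# Hypothetical sketch of the filtering steps; centroiding is NOT implemented.
clean_spectrum_sketch <- function(peaks, min_mz = -1, max_mz = -1,
                                  noise_threshold = -1, max_peak_num = -1,
                                  normalize_intensity = TRUE) {
  # Step 1: remove empty peaks.
  keep <- peaks[, 1] > 0 & peaks[, 2] > 0
  # Step 2: keep peaks with min_mz <= mz < max_mz (negative value = disabled).
  if (min_mz >= 0) keep <- keep & peaks[, 1] >= min_mz
  if (max_mz >= 0) keep <- keep & peaks[, 1] < max_mz
  peaks <- peaks[keep, , drop = FALSE]

  # Step 3 (centroiding) is intentionally left out of this sketch.

  # Step 4: remove peaks below noise_threshold * max_intensity.
  if (noise_threshold >= 0 && nrow(peaks) > 0) {
    above <- peaks[, 2] >= noise_threshold * max(peaks[, 2])
    peaks <- peaks[above, , drop = FALSE]
  }
  # Step 5: keep only the most intense max_peak_num peaks, in original order.
  if (max_peak_num >= 0 && nrow(peaks) > max_peak_num) {
    top <- order(peaks[, 2], decreasing = TRUE)[seq_len(max_peak_num)]
    peaks <- peaks[sort(top), , drop = FALSE]
  }
  # Step 6: normalize intensities to sum to 1.
  if (normalize_intensity && nrow(peaks) > 0) {
    peaks[, 2] <- peaks[, 2] / sum(peaks[, 2])
  }
  peaks
}

mz <- c(100.212, 169.071, 169.078, 300.321)
intensity <- c(0.3716, 7.917962, 100., 66.83)
peaks <- matrix(c(mz, intensity), ncol = 2, byrow = FALSE)
clean_spectrum_sketch(peaks, min_mz = 0, max_mz = 1000,
                      noise_threshold = 0.01, max_peak_num = 100)
```

In this example the peak at m/z 100.212 falls below 1% of the base peak intensity and is dropped. Because the sketch does not centroid, the peaks at 169.071 and 169.078 stay separate, whereas a 0.02 Da merging window would combine them.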
msentropy_similarity() calculates the entropy similarity between two spectra (Li et al. 2021). It is a wrapper function that defines defaults for the parameters and calls calculate_entropy_similarity() or calculate_unweighted_entropy_similarity() to perform the calculation.
    msentropy_similarity(peaks_a, peaks_b, ms2_tolerance_in_da = 0.02, ms2_tolerance_in_ppm = -1,
                         clean_spectra = TRUE, min_mz = 0, max_mz = 1000, noise_threshold = 0.01,
                         max_peak_num = 100, weighted = TRUE, ...)
Argument | Description |
---|---|
peaks_a | A two-column numeric matrix with the m/z and intensity values for peaks of one spectrum. |
peaks_b | A two-column numeric matrix with the m/z and intensity values for peaks of one spectrum. |
ms2_tolerance_in_da | The MS2 tolerance in Da; set to -1 to disable. Defaults to 0.02. |
ms2_tolerance_in_ppm | The MS2 tolerance in ppm; set to -1 to disable. Defaults to -1. |
clean_spectra | Whether to clean the spectra before calculating the entropy similarity, see |
min_mz | The minimum mz value to keep; set to -1 to disable. Defaults to 0. |
max_mz | The maximum mz value to keep; set to -1 to disable. Defaults to 1000. |
noise_threshold | The noise threshold; set to -1 to disable. Peaks with intensity < noise_threshold * max_intensity are removed. Defaults to 0.01. |
max_peak_num | The maximum number of peaks to keep; set to -1 to disable. Defaults to 100. |
weighted | |
... | Optional additional parameters (currently ignored) |
The entropy similarity
Li, Y., Kind, T., Folz, J. et al. (2021) Spectral entropy outperforms MS/MS dot product similarity for small-molecule compound identification. Nat Methods 18, 1524-1531. doi:10.1038/s41592-021-01331-z.
    peaks_a <- cbind(mz = c(169.071, 186.066, 186.0769),
                     intensity = c(7.917962, 1.021589, 100.0))
    peaks_b <- cbind(mz = c(120.212, 169.071, 186.066),
                     intensity = c(37.16, 66.83, 999.0))
    msentropy_similarity(peaks_a, peaks_b, ms2_tolerance_in_da = 0.02)
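When choosing between ms2_tolerance_in_da and ms2_tolerance_in_ppm, it can help to see how a relative (ppm) tolerance translates into an absolute one at a given m/z. The short base-R sketch below uses the standard conversion, tolerance in Da = mz * ppm * 1e-6. The helper `ppm_to_da()` is hypothetical and not part of the package.

```r
# Hypothetical helper: absolute tolerance (Da) implied by a ppm tolerance.
ppm_to_da <- function(mz, ppm) {
  mz * ppm * 1e-6
}

mz <- c(120.212, 169.071, 186.066)
ppm_to_da(mz, ppm = 10)   # tolerance grows linearly with m/z
ppm_to_da(1000, ppm = 20) # 20 ppm at m/z 1000 equals 0.02 Da
```

An absolute tolerance such as the default 0.02 Da is the same for every peak, while a ppm tolerance is tighter for low m/z fragments and wider for high m/z fragments.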