Confidence in compound annotation by untargeted LC-MS/MS
At the UC Davis West Coast Metabolomics Center we acquire data for more than 30,000 samples per year. For years, we have relied on matching experimental accurate precursor masses and MS/MS spectra to library spectra, followed by manual curation of annotations. However, false discovery rates (FDRs) for this classic approach remain poorly studied. Which adduct species should we expect? At which (relative) retention time should a compound elute? How many alternative isomers and isobaric molecules should we consider? Ideally, FDR estimates for MS/MS library matching would also draw on ion mobility (collision cross section, CCS) information, genetic and biological literature data, and atlases of compounds confirmed to be present in a wide range of biological matrices. Current software does not facilitate incorporating these types of information. We develop integrated workflows that use 19 published databases and software packages to assist in structural characterization, including MassBank.us with more than 650,000 public MS/MS spectra. Here, we show how these packages are used routinely to identify more than 1,000 metabolites in typical specimens such as blood, urine, gastrointestinal (GI) tract microbiome, or brain samples.
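The first step of the classic workflow, matching an observed accurate precursor m/z against candidate neutral masses under several possible adducts, can be sketched in Python as below. This is an illustrative sketch, not the Center's pipeline: the adduct table lists only a few common singly charged adducts, the 5 ppm tolerance is an assumed value, and candidate masses are supplied by hand. The example shows why mass alone cannot separate isomers: glucose and fructose (both C6H12O6) match the same [M+Na]+ ion.

```python
# Mass offsets (Da) added to the neutral monoisotopic mass M for common singly charged adducts.
POSITIVE_ADDUCTS = {"[M+H]+": 1.007276, "[M+NH4]+": 18.033823, "[M+Na]+": 22.989218}
NEGATIVE_ADDUCTS = {"[M-H]-": -1.007276}


def ppm_error(observed_mz, theoretical_mz):
    """Mass error of an observed m/z relative to a theoretical m/z, in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def match_precursor(observed_mz, candidates, adducts, tol_ppm=5.0):
    """Return (candidate, adduct, ppm error) for every candidate/adduct pair within tol_ppm.

    candidates maps a compound name to its neutral monoisotopic mass (Da).
    """
    hits = []
    for name, neutral_mass in candidates.items():
        for adduct, offset in adducts.items():
            err = ppm_error(observed_mz, neutral_mass + offset)
            if abs(err) <= tol_ppm:
                hits.append((name, adduct, round(err, 2)))
    return hits


# Glucose and fructose share the formula C6H12O6 (monoisotopic mass 180.063388 Da).
candidates = {"glucose": 180.063388, "fructose": 180.063388}
print(match_precursor(203.0526, candidates, POSITIVE_ADDUCTS))
```

Both isomers are returned with identical mass errors, which is why retention time, MS/MS and prior knowledge are needed downstream.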
(1) After classic MS/MS library matching, NIST hybrid search provides chemical class information for all compounds that had no direct hits in experimental or in-silico MS/MS libraries.
(2) We have developed "Entropy Similarity" as a measure for MS/MS matching that lowers the FDR compared with classic dot-product similarity matching and outperforms 40 other MS/MS similarity algorithms.
(3) We have published retention time libraries for both HILIC and reversed-phase (RP) liquid chromatography methods covering more than 4,000 metabolite standards. We used these data to develop machine-learning algorithms for retention time prediction and deployed retip.app for use by the community. For plasma metabolomics, we found a 26% reduction in the number of false-positive compound annotations, along with a 21% improvement in the true-positive identification rate.
(4) To provide a priori knowledge of which metabolites to expect in which matrix, we published the BloodExposome.org database with 42,000 literature-based metabolites and an atlas of the aging mouse brain (https://mouse.atlas.metabolomics.us/) covering more than 1,500 identified compounds in 10 mouse brain sections.
(5) For remaining unknowns of biological interest, we used hydrogen/deuterium exchange data to constrain the chemical search space by the number of acidic (exchangeable) protons observed in MS/MS spectra.
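Item (1) relies on NIST hybrid search. The core idea, as we understand it, is that a query fragment may match a library fragment either at the same m/z or shifted by the precursor mass difference (DeltaMass) between query and library compound, so a structural analog carrying an unknown modification can still score well. The Python sketch below is a simplified, hypothetical illustration using greedy peak assignment and a plain cosine score; it is not NIST's implementation or scoring function, and the 0.01 Da tolerance is an assumption.

```python
import math


def hybrid_cosine(query, library, delta_mass, tol=0.01):
    """Simplified hybrid match between two spectra given as lists of (m/z, intensity).

    A query peak matches a library peak at the same m/z (unmodified fragment) or at
    library m/z + delta_mass (fragment carrying the modification). Each library peak is
    used at most once. With delta_mass=0 this reduces to an ordinary cosine score.
    """
    used = set()
    dot = 0.0
    for mz_q, int_q in query:
        best_j, best_int = None, 0.0
        for j, (mz_l, int_l) in enumerate(library):
            if j in used:
                continue
            direct = abs(mz_q - mz_l) <= tol
            shifted = abs(mz_q - (mz_l + delta_mass)) <= tol
            # Among eligible library peaks, pair with the most intense one.
            if (direct or shifted) and int_l > best_int:
                best_j, best_int = j, int_l
        if best_j is not None:
            used.add(best_j)
            dot += int_q * best_int
    norm_q = math.sqrt(sum(i * i for _, i in query))
    norm_l = math.sqrt(sum(i * i for _, i in library))
    return dot / (norm_q * norm_l)


# Hypothetical library compound (precursor 300.0) and a methylated analog (+14.01565 Da, CH2)
# whose fragments at m/z 150 and 250 retain the modification site.
library = [(100.0, 50.0), (150.0, 100.0), (250.0, 80.0)]
query = [(100.0, 50.0), (164.01565, 100.0), (264.01565, 80.0)]
print(hybrid_cosine(query, library, delta_mass=14.01565))  # hybrid score
print(hybrid_cosine(query, library, delta_mass=0.0))       # plain cosine
```

The plain cosine sees only the unmodified fragment, while the shifted matches let the hybrid score recognize the analog, which is what makes class-level information available without a direct library hit.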
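Item (2) refers to Entropy Similarity. The Python sketch below assumes the unweighted form, in which similarity is one minus the Jensen-Shannon divergence between the two intensity-normalized spectra, scaled to [0, 1] by ln 2; written with spectral entropies S, this is 1 - (2 S_AB - S_A - S_B) / ln 4, where S_AB is the entropy of the merged spectrum renormalized to sum 1. The greedy peak alignment and the 0.01 Da tolerance are assumptions, and any intensity weighting or spectrum cleaning a production implementation would apply is omitted. A cosine (dot-product) score on the same alignment is included for comparison.

```python
import math


def normalize(peaks):
    """Scale intensities so they sum to 1."""
    total = sum(i for _, i in peaks)
    return [(mz, i / total) for mz, i in peaks]


def align(spec_a, spec_b, tol=0.01):
    """Greedily pair peaks within tol Da; unmatched peaks are paired with intensity 0."""
    used = set()
    pairs = []
    for mz_a, int_a in spec_a:
        best_j, best_d = None, tol
        for j, (mz_b, _) in enumerate(spec_b):
            d = abs(mz_a - mz_b)
            if j not in used and d <= best_d:
                best_j, best_d = j, d
        if best_j is None:
            pairs.append((int_a, 0.0))
        else:
            used.add(best_j)
            pairs.append((int_a, spec_b[best_j][1]))
    pairs.extend((0.0, int_b) for j, (_, int_b) in enumerate(spec_b) if j not in used)
    return pairs


def entropy(intensities):
    """Shannon entropy (natural log) of a normalized intensity vector."""
    return -sum(p * math.log(p) for p in intensities if p > 0)


def entropy_similarity(spec_a, spec_b, tol=0.01):
    """Unweighted entropy similarity: 1 - (2*S_AB - S_A - S_B) / ln(4)."""
    pairs = align(normalize(spec_a), normalize(spec_b), tol)
    s_a = entropy([a for a, _ in pairs])
    s_b = entropy([b for _, b in pairs])
    s_ab = entropy([(a + b) / 2 for a, b in pairs])  # merged spectrum, renormalized to sum 1
    return 1 - (2 * s_ab - s_a - s_b) / math.log(4)


def cosine(spec_a, spec_b, tol=0.01):
    """Classic dot-product (cosine) similarity on the same peak alignment."""
    pairs = align(spec_a, spec_b, tol)
    dot = sum(a * b for a, b in pairs)
    norm_a = math.sqrt(sum(a * a for a, _ in pairs))
    norm_b = math.sqrt(sum(b * b for _, b in pairs))
    return dot / (norm_a * norm_b)


query = [(100.0, 50.0), (150.0, 100.0), (200.0, 20.0)]
reference = [(100.002, 40.0), (150.001, 100.0), (250.0, 10.0)]
print(entropy_similarity(query, reference), cosine(query, reference))
```

Both scores equal 1 for identical spectra; the entropy score is 0 for spectra with no shared peaks, since the merged spectrum then has entropy (S_A + S_B)/2 + ln 2.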
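Item (3) uses predicted retention times to reject candidate annotations. The Python snippet below is a minimal sketch of that filtering step only, assuming predicted retention times (in minutes) are already available for each candidate, for example from a model such as Retip; it does not call Retip itself. The candidate names, retention times and the 0.5 min window are hypothetical.

```python
def filter_by_retention_time(observed_rt, predicted_rts, window=0.5):
    """Keep candidates whose predicted RT lies within +/- window (min) of the observed RT.

    predicted_rts maps candidate name -> predicted retention time (min).
    Returns (name, RT error) tuples sorted by absolute RT error, closest first.
    """
    kept = [
        (name, round(observed_rt - rt, 3))
        for name, rt in predicted_rts.items()
        if abs(observed_rt - rt) <= window
    ]
    return sorted(kept, key=lambda item: abs(item[1]))


# Hypothetical isobaric candidates sharing the same precursor m/z and MS/MS library score.
predicted = {"isomer_A": 5.0, "isomer_B": 8.3, "isomer_C": 5.6}
print(filter_by_retention_time(5.2, predicted))
```

Candidates far from the observed retention time are dropped before manual curation, which is the mechanism by which retention time prediction can reduce false-positive annotations.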
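Item (5) counts acidic protons by hydrogen/deuterium exchange. Each exchangeable hydrogen (for example on OH, NH, SH or COOH groups) replaced by deuterium shifts the ion mass by the 2H minus 1H mass difference, about 1.00628 Da. The Python sketch below assumes complete exchange and singly charged ions, converts the precursor m/z shift between H2O and D2O runs into a count of exchangeable protons, and uses that count to filter isomers. In positive mode the charging proton is itself a deuteron in D2O ([M+D]+), and in negative mode one exchangeable site loses its deuteron on deprotonation, so the raw shift count is corrected by one in each case. Candidate counts are entered by hand for illustration, and the D2O m/z is simulated rather than measured.

```python
H_MASS = 1.00782503  # 1H atomic mass (Da)
D_MASS = 2.01410178  # 2H atomic mass (Da)
DELTA = D_MASS - H_MASS  # mass shift per H -> D exchange


def exchangeable_protons(mz_h2o, mz_d2o, ion="[M+H]+"):
    """Number of exchangeable hydrogens in the neutral molecule, assuming full exchange."""
    shifts = round((mz_d2o - mz_h2o) / DELTA)
    if ion == "[M+H]+":
        return shifts - 1  # the added charge carrier is D+ in D2O
    if ion == "[M-H]-":
        return shifts + 1  # one exchangeable site loses its D on deprotonation
    raise ValueError(f"unsupported ion type: {ion}")


def filter_by_exchange(n_observed, candidates):
    """Keep candidates whose structure has the observed number of exchangeable H."""
    return [name for name, n in candidates.items() if n == n_observed]


# C6H12O6 isomers: glucose and fructose (cyclic forms) have 5 OH groups, myo-inositol has 6.
candidates = {"glucose": 5, "fructose": 5, "myo-inositol": 6}
mz_h = 181.070664          # [M+H]+ of C6H12O6 measured in H2O
mz_d = mz_h + 6 * DELTA    # simulated [M+D]+ in D2O: 5 exchanged OH groups + D+ charge carrier
n = exchangeable_protons(mz_h, mz_d)
print(n, filter_by_exchange(n, candidates))
```

Isomers that are indistinguishable by accurate mass can thus be separated by their count of acidic protons; incomplete exchange or back-exchange would require a tolerance on the count.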
Join Zoom Meeting
Meeting ID: 995 6019 7918
One tap mobile
+13017158592,,99560197918# US (Washington DC)
+13126266799,,99560197918# US (Chicago)
Dial by your location
+1 301 715 8592 US (Washington DC)
+1 312 626 6799 US (Chicago)
+1 929 436 2866 US (New York)
+1 669 900 6833 US (San Jose)
+1 253 215 8782 US (Tacoma)
+1 346 248 7799 US (Houston)