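While the development of the chemical industry has made daily life more convenient, it has also left large amounts of chemical residues in environmental media [1]. The Chemical Abstracts Service (CAS) database is reported to contain 125 million chemicals [2]. Chemicals released into the environment not only threaten ecosystems but also enter the human body through various routes and harm human health. For example, Stehle and Schulz [3] reported that the widespread use of insecticides has reduced macroinvertebrate numbers by 30%, and diclofenac, accumulated through the food chain, has caused declines in vulture populations on the Indian subcontinent [4]. Controlling the levels of chemicals in environmental media is therefore essential for reducing their harm to ecosystems and to humans.
Identifying the chemicals present in environmental media is a prerequisite for reducing them, and liquid chromatography coupled to high-resolution mass spectrometry (LC/MS) is the most commonly used technique for detecting and identifying organic chemicals [2, 5]. Because environmental matrices are complex, the mass spectrometry data acquired from environmental samples are highly complex as well, and interpreting them effectively requires a suitable strategy [6]. Untargeted screening is now widely applied to LC/MS data from environmental samples [7-8]. It refers to the workflows and methods that identify unknown compounds in a sample from mass spectrometry data alone, without reference standards or prior information about the sample [9]. The workflow comprises peak extraction, de-redundancy, feature prioritization, and structure annotation and identification [10]. Cheminformatics plays an important role in the data processing at every step: different steps involve different algorithms [11], different software tools use different algorithms for the same step [12], and different algorithms applied to the same step yield somewhat different results [13]. A description of the cheminformatics behind untargeted screening therefore helps researchers make better use of the available data processing software and obtain more reliable results.
Accordingly, this review covers the untargeted screening workflow for LC/MS data shown in Figure 1 and the cheminformatics involved in each data processing step. It summarizes and compares the algorithms used by different untargeted data processing software at each step, together with how those algorithms have developed and their respective strengths and weaknesses. On this basis, it offers an outlook on future directions for cheminformatics in untargeted data processing.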
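Application of cheminformatics in untargeted screening of chemicals by liquid chromatography coupled to high-resolution mass spectrometry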
Cheminformatics in untargeted screening of liquid chromatography coupled to mass spectrometry data
-
Abstract: The development of the chemical industry has introduced a huge number of unknown compounds into environmental matrices, and identifying them is key to understanding their environmental risks and developing strategies to reduce them. Liquid chromatography coupled to high-resolution mass spectrometry (LC/MS) is a common technique for compound identification. However, the data it produces are generally complex and require appropriate analysis to reveal the compounds present in complex environmental samples. Advances in cheminformatics for untargeted screening of high-resolution mass spectrometry data have made structure elucidation of these compounds possible. This paper reviews the application of cheminformatics in untargeted screening. Following the steps of the workflow (peak extraction, de-redundancy, feature prioritization, annotation, and structure determination), it covers the algorithms, software, compound databases, and spectral databases involved. It then discusses parameter optimization and data processing consistency of the algorithms and software tools. This review provides support for better untargeted processing of high-resolution mass spectrometry data.
-
Table 1. Common adduct ions in positive and negative ion modes
ESI (+)                            ESI (−)
M+H, M+Na, M+K                     M-H, M+Na-2H, M+K-2H
M+NH4, M+ACN+H                     M+Cl, M+Br, M+FA-H
M+2ACN+H, M+ACN+Na, M+CH3OH+H      M+HAc-H, M+TFA-H, M-H2O-H
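Each adduct in Table 1 corresponds to a fixed mass shift relative to the neutral molecule M, which is what makes adducts recognizable during de-redundancy. The following Python sketch illustrates the arithmetic for a few singly charged adducts. The mass shifts are commonly tabulated approximate values and are an assumption here, not taken from the review; the function names (`adduct_mz`, `ppm_error`, `same_neutral`) and the 5 ppm tolerance are hypothetical and do not reproduce the algorithm of any particular software tool.

```python
# Mass shifts (Da) for a few singly charged adducts from Table 1.
# Assumption: commonly tabulated approximate values; verify against your
# own reference table before use.
ADDUCT_SHIFTS = {
    "pos": {
        "M+H": 1.007276,
        "M+NH4": 18.033823,
        "M+Na": 22.989218,
        "M+K": 38.963158,
        "M+ACN+H": 42.033823,
    },
    "neg": {
        "M-H": -1.007276,
        "M+Cl": 34.969402,
        "M+FA-H": 44.998201,
        "M+HAc-H": 59.013851,
    },
}


def adduct_mz(neutral_mass: float, mode: str, adduct: str) -> float:
    """Expected m/z of a singly charged adduct of a neutral molecule."""
    return neutral_mass + ADDUCT_SHIFTS[mode][adduct]


def ppm_error(measured: float, expected: float) -> float:
    """Relative deviation of measured from expected, in parts per million."""
    return (measured - expected) / expected * 1e6


def same_neutral(mz_a: float, adduct_a: str, mz_b: float, adduct_b: str,
                 mode: str, tol_ppm: float = 5.0) -> bool:
    """Hypothetical redundancy check: do two features, interpreted as the
    given adducts, point back to the same neutral mass within tol_ppm?"""
    shifts = ADDUCT_SHIFTS[mode]
    mass_a = mz_a - shifts[adduct_a]
    mass_b = mz_b - shifts[adduct_b]
    return abs(ppm_error(mass_a, mass_b)) <= tol_ppm


# Example: caffeine, C8H10N4O2, neutral monoisotopic mass in Da.
CAFFEINE = 194.080376
mz_h = adduct_mz(CAFFEINE, "pos", "M+H")
mz_na = adduct_mz(CAFFEINE, "pos", "M+Na")
print(f"[M+H]+  = {mz_h:.6f}")
print(f"[M+Na]+ = {mz_na:.6f}")
print("same neutral:", same_neutral(mz_h, "M+H", mz_na, "M+Na", "pos"))
```

In practice a check of this kind would be combined with retention time and peak-shape agreement, since two unrelated compounds can differ by an adduct mass shift by coincidence.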
[1] FANG W D, PENG Y, MUIR D, et al. A critical review of synthetic chemicals in surface waters of the US, the EU and China [J]. Environment International, 2019, 131: 104994. doi: 10.1016/j.envint.2019.104994
[2] HERNÁNDEZ F, BAKKER J, BIJLSMA L, et al. The role of analytical chemistry in exposure science: Focus on the aquatic environment [J]. Chemosphere, 2019, 222: 564-583. doi: 10.1016/j.chemosphere.2019.01.118
[3] STEHLE S, SCHULZ R. Agricultural insecticides threaten surface waters at the global scale [J]. PNAS, 2015, 112(18): 5750-5755. doi: 10.1073/pnas.1500232112
[4] OAKS J L, GILBERT M, VIRANI M Z, et al. Diclofenac residues as the cause of vulture population decline in Pakistan [J]. Nature, 2004, 427(6975): 630-633. doi: 10.1038/nature02317
[5] TANG Y, CRAVEN C B, WAWRYK N J P, et al. Advances in mass spectrometry-based omics analysis of trace organics in water [J]. Trends in Analytical Chemistry, 2020, 128: 115918.
[6] ESCHER B I, STAPLETON H M, SCHYMANSKI E L. Tracking complex mixtures of chemicals in our changing environment [J]. Science, 2020, 367(6476): 388-392. doi: 10.1126/science.aay6636
[7] SCHYMANSKI E L, JEON J, GULDE R, et al. Identifying small molecules via high resolution mass spectrometry: Communicating confidence [J]. Environmental Science & Technology, 2014, 48(4): 2097-2098.
[8] ALYGIZAKIS N A, GAGO-FERRERO P, HOLLENDER J, et al. Untargeted time-pattern analysis of LC-HRMS data to detect spills and compounds with high fluctuation in influent wastewater [J]. Journal of Hazardous Materials, 2019, 361: 19-29. doi: 10.1016/j.jhazmat.2018.08.073
[9] WANG X B, YU N Y, YANG J P, et al. Suspect and non-target screening of pesticides and pharmaceuticals transformation products in wastewater using QTOF-MS [J]. Environment International, 2020, 137: 105599. doi: 10.1016/j.envint.2020.105599
[10] HELMUS R, TER LAAK T L, van WEZEL A P, et al. patRoon: Open source software platform for environmental mass spectrometry based non-target screening [J]. Journal of Cheminformatics, 2021, 13(1): 1.
[11] SMITH C A, WANT E J, O'MAILLE G, et al. XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification [J]. Analytical Chemistry, 2006, 78(3): 779-787. doi: 10.1021/ac051437y
[12] HOHRENK L L, ITZEL F, BAETZ N, et al. Comparison of software tools for liquid chromatography–high-resolution mass spectrometry data processing in nontarget screening of environmental samples [J]. Analytical Chemistry, 2020, 92(2): 1898-1907. doi: 10.1021/acs.analchem.9b04095
[13] MYERS O D, SUMNER S J, LI S Z, et al. One step forward for reducing false positive and false negative compound identifications from mass spectrometry metabolomics data: New algorithms for constructing extracted ion chromatograms and detecting chromatographic peaks [J]. Analytical Chemistry, 2017, 89(17): 8696-8703. doi: 10.1021/acs.analchem.7b00947
[14] TAUTENHAHN R, BÖTTCHER C, NEUMANN S. Highly sensitive feature detection for high resolution LC/MS [J]. BMC Bioinformatics, 2008, 9(1): 1-16. doi: 10.1186/1471-2105-9-1
[15] KATAJAMAA M, ORESIC M. Processing methods for differential analysis of LC/MS profile data [J]. BMC Bioinformatics, 2005, 6: 179.
[16] PLUSKAL T, CASTILLO S, VILLAR-BRIONES A, et al. MZmine 2: Modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data [J]. BMC Bioinformatics, 2010, 11(1): 1-11. doi: 10.1186/1471-2105-11-1
[17] TAUTENHAHN R, PATTI G J, RINEHART D, et al. XCMS online: A web-based platform to process untargeted metabolomic data [J]. Analytical Chemistry, 2012, 84(11): 5035-5039. doi: 10.1021/ac300698c
[18] STOLT R, TORGRIP R J O, LINDBERG J, et al. Second-order peak detection for multicomponent high-resolution LC/MS data [J]. Analytical Chemistry, 2006, 78(4): 975-983. doi: 10.1021/ac050980b
[19] ÅBERG K M, TORGRIP R J O, KOLMERT J, et al. Feature detection and alignment of hyphenated chromatographic-mass spectrometric data: Extraction of pure ion chromatograms using Kalman tracking [J]. Journal of Chromatography A, 2008, 1192(1): 139-146. doi: 10.1016/j.chroma.2008.03.033
[20] NI Y, SU M M, QIU Y P, et al. ADAP-GC 3.0: Improved peak detection and deconvolution of co-eluting metabolites from GC/TOF-MS data for metabolomics studies [J]. Analytical Chemistry, 2016, 88(17): 8802-8811. doi: 10.1021/acs.analchem.6b02222
[21] DU X, SMIRNOV A, PLUSKAL T, et al. Metabolomics data preprocessing using ADAP and MZmine 2 [J]. Methods in Molecular Biology, 2020, 2104: 25-48.
[22] MYERS O D, SUMNER S J, LI S Z, et al. Detailed investigation and comparison of the XCMS and MZmine 2 chromatogram construction and chromatographic peak detection methods for preprocessing mass spectrometry metabolomics data [J]. Analytical Chemistry, 2017, 89(17): 8689-8695. doi: 10.1021/acs.analchem.7b01069
[23] HU Y X, CAI B, HUAN T. Enhancing metabolome coverage in data-dependent LC-MS/MS analysis through an integrated feature extraction strategy [J]. Analytical Chemistry, 2019, 91(22): 14433-14441. doi: 10.1021/acs.analchem.9b02980
[24] JU R, LIU X Y, ZHENG F J, et al. A graph density-based strategy for features fusion from different peak extract software to achieve more metabolites in metabolic profiling from high-resolution mass spectrometry [J]. Analytica Chimica Acta, 2020, 1139: 8-14. doi: 10.1016/j.aca.2020.09.029
[25] BAKER E S, PATTI G J. Perspectives on data analysis in metabolomics: Points of agreement and disagreement from the 2018 ASMS fall workshop [J]. Journal of the American Society for Mass Spectrometry, 2019, 30(10): 2031-2036. doi: 10.1007/s13361-019-02295-3
[26] KUHL C, TAUTENHAHN R, BÖTTCHER C, et al. CAMERA: an integrated strategy for compound spectra extraction and annotation of liquid chromatography/mass spectrometry data sets [J]. Analytical Chemistry, 2012, 84(1): 283-289. doi: 10.1021/ac202450g
[27] SINDELAR M, PATTI G J. Chemical discovery in the era of metabolomics [J]. Journal of the American Chemical Society, 2020, 142(20): 9097-9105. doi: 10.1021/jacs.9b13198
[28] ZENG Z D, LIU X Y, DAI W D, et al. Ion fusion of high-resolution LC-MS-based metabolomics data to discover more reliable biomarkers [J]. Analytical Chemistry, 2014, 86(8): 3793-3800. doi: 10.1021/ac500878x
[29] DEFELICE B C, MEHTA S S, SAMRA S, et al. Mass spectral feature list optimizer (MS-FLO): A tool to minimize false positive peak reports in untargeted liquid chromatography-mass spectroscopy (LC-MS) data processing [J]. Analytical Chemistry, 2017, 89(6): 3250-3255. doi: 10.1021/acs.analchem.6b04372
[30] SENAN O, AGUILAR-MOGAS A, NAVARRO M, et al. CliqueMS: a computational tool for annotating in-source metabolite ions from LC-MS untargeted metabolomics data based on a coelution similarity network [J]. Bioinformatics, 2019, 35(20): 4089-4097. doi: 10.1093/bioinformatics/btz207
[31] KÖPPE T, JEWELL K S, DIETRICH C, et al. Application of a non-target workflow for the identification of specific contaminants using the example of the Nidda river basin [J]. Water Research, 2020, 178: 115703. doi: 10.1016/j.watres.2020.115703
[32] BROECKLING C D, AFSAR F A, NEUMANN S, et al. RAMClust: a novel feature clustering method enables spectral-matching-based annotation for metabolomics data [J]. Analytical Chemistry, 2014, 86(14): 6812-6817. doi: 10.1021/ac501530d
[33] JU R, LIU X Y, ZHENG F J, et al. Removal of false positive features to generate authentic peak table for high-resolution mass spectrometry-based metabolomics study [J]. Analytica Chimica Acta, 2019, 1067: 79-87. doi: 10.1016/j.aca.2019.04.011
[34] DALY R, ROGERS S, WANDY J, et al. MetAssign: probabilistic annotation of metabolites from LC-MS data using a Bayesian clustering approach [J]. Bioinformatics, 2014, 30(19): 2764-2771. doi: 10.1093/bioinformatics/btu370
[35] KOUŘIL Š, de SOUSA J, VÁCLAVÍK J, et al. CROP: correlation-based reduction of feature multiplicities in untargeted metabolomic data [J]. Bioinformatics, 2020, 36(9): 2941-2942. doi: 10.1093/bioinformatics/btaa012
[36] FRAISIER-VANNIER O, CHERVIN J, CABANAC G, et al. MS-CleanR: A feature-filtering workflow for untargeted LC-MS based metabolomics [J]. Analytical Chemistry, 2020, 92(14): 9971-9981. doi: 10.1021/acs.analchem.0c01594
[37] LJONCHEVA M, STEPIŠNIK T, DŽEROSKI S, et al. Cheminformatics in MS-based environmental exposomics: Current achievements and future directions [J]. Trends in Environmental Analytical Chemistry, 2020, 28: e00099. doi: 10.1016/j.teac.2020.e00099
[38] GORNIK T, KOVACIC A, HEATH E, et al. Biotransformation study of antidepressant sertraline and its removal during biological wastewater treatment [J]. Water Research, 2020, 181: 115864. doi: 10.1016/j.watres.2020.115864
[39] WEIZEL A, SCHLÜSENER M P, DIERKES G, et al. Analysis of the aerobic biodegradation of glucocorticoids: Elucidation of the kinetics and transformation reactions [J]. Water Research, 2020, 174: 115561. doi: 10.1016/j.watres.2020.115561
[40] PURSCHKE K, VOSOUGH M, LEONHARDT J, et al. Evaluation of nontarget long-term LC–HRMS time series data using multivariate statistical approaches [J]. Analytical Chemistry, 2020, 92(18): 12273-12281. doi: 10.1021/acs.analchem.0c01897
[41] HOHRENK L L, VOSOUGH M, SCHMIDT T C. Implementation of chemometric tools to improve data mining and prioritization in LC-HRMS for nontarget screening of organic micropollutants in complex water matrixes [J]. Analytical Chemistry, 2019, 91(14): 9213-9220. doi: 10.1021/acs.analchem.9b01984
[42] WANG X B, YU N Y, QIAN Y L, et al. Non-target and suspect screening of per- and polyfluoroalkyl substances in Chinese municipal wastewater treatment plants [J]. Water Research, 2020, 183: 115989. doi: 10.1016/j.watres.2020.115989
[43] LI Y Q, YU N Y, DU L T, et al. Transplacental transfer of per- and polyfluoroalkyl substances identified in paired maternal and cord sera using suspect and nontarget screening [J]. Environmental Science & Technology, 2020, 54(6): 3407-3416.
[44] WANG Y, YU N Y, ZHU X B, et al. Suspect and nontarget screening of per- and polyfluoroalkyl substances in wastewater from a fluorochemical manufacturing park [J]. Environmental Science & Technology, 2018, 52(19): 11007-11016.
[45] KOELMEL J P, PAIGE M K, ARISTIZABAL-HENAO J J, et al. Toward comprehensive per- and polyfluoroalkyl substances annotation using FluoroMatch software and intelligent high-resolution tandem mass spectrometry acquisition [J]. Analytical Chemistry, 2020, 92(16): 11186-11194. doi: 10.1021/acs.analchem.0c01591
[46] FU Y Q, ZHANG Y H, ZHOU Z H, et al. Screening and determination of potential risk substances based on liquid chromatography–high-resolution mass spectrometry [J]. Analytical Chemistry, 2018, 90(14): 8454-8461. doi: 10.1021/acs.analchem.8b01153
[47] ZHANG M J, LIU Y L, CHEN J, et al. Sensitive untargeted screening of nerve agents and their degradation products using liquid chromatography-high resolution mass spectrometry [J]. Analytical Chemistry, 2020, 92(15): 10578-10587. doi: 10.1021/acs.analchem.0c01508
[48] ESPOSITO G, TETA R, MARRONE R, et al. A fast detection strategy for cyanobacterial blooms and associated cyanotoxins (FDSCC) reveals the occurrence of lyngbyatoxin A in Campania (South Italy) [J]. Chemosphere, 2019, 225: 342-351. doi: 10.1016/j.chemosphere.2019.02.201
[49] TETA R, DELLA SALA G, GLUKHOV E, et al. Combined LC–MS/MS and molecular networking approach reveals new cyanotoxins from the 2014 cyanobacterial bloom in Green Lake, Seattle [J]. Environmental Science & Technology, 2015, 49(24): 14301-14310.
[50] le DARÉ B, FERRON P J, ALLARD P M, et al. New insights into quetiapine metabolism using molecular networking [J]. Scientific Reports, 2020, 10(1): 19921. doi: 10.1038/s41598-020-77106-x
[51] HOLLENDER J, SCHYMANSKI E L, SINGER H P, et al. Nontarget screening with high resolution mass spectrometry in the environment: Ready to go? [J]. Environmental Science & Technology, 2017, 51(20): 11505-11512.
[52] POCHIRAJU S S, LINDEN K, GU A Z, et al. Development of a separation framework for effects-based targeted and non-targeted toxicological screening of water and wastewater [J]. Water Research, 2020, 170: 115289. doi: 10.1016/j.watres.2019.115289
[53] SCHEUBERT K, HUFSKY F, PETRAS D, et al. Significance estimation for large scale metabolomics annotations by spectral matching [J]. Nature Communications, 2017, 8: 1494. doi: 10.1038/s41467-017-01318-5
[54] STEIN S E, SCOTT D R. Optimization and testing of mass spectral library search algorithms for compound identification [J]. Journal of the American Society for Mass Spectrometry, 1994, 5(9): 859-866. doi: 10.1016/1044-0305(94)87009-8
[55] VINAIXA M, SCHYMANSKI E L, NEUMANN S, et al. Mass spectral databases for LC/MS- and GC/MS-based metabolomics: State of the field and future prospects [J]. TrAC Trends in Analytical Chemistry, 2016, 78: 23-35.
[56] HUFSKY F, SCHEUBERT K, BÖCKER S. New kids on the block: Novel informatics methods for natural product discovery [J]. Natural Product Reports, 2014, 31(6): 807. doi: 10.1039/c3np70101h
[57] XUE J C, GUIJAS C, BENTON H P, et al. METLIN MS2 molecular standards database: A broad chemical and biological resource [J]. Nature Methods, 2020, 17(10): 953-954. doi: 10.1038/s41592-020-0942-5
[58] TSUGAWA H, CAJKA T, KIND T, et al. MS-DIAL: Data-independent MS/MS deconvolution for comprehensive metabolome analysis [J]. Nature Methods, 2015, 12(6): 523-526.
[59] DÜHRKOP K, FLEISCHAUER M, LUDWIG M, et al. SIRIUS 4: A rapid tool for turning tandem mass spectra into metabolite structure information [J]. Nature Methods, 2019, 16(4): 299-302.
[60] RÖST H L, SACHSENBERG T, AICHE S, et al. OpenMS: A flexible open-source software platform for mass spectrometry data analysis [J]. Nature Methods, 2016, 13(9): 741-748. doi: 10.1038/nmeth.3959
[61] WANG M X, CARVER J J, PHELAN V V, et al. Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking [J]. Nature Biotechnology, 2016, 34(8): 828-837.
[62] TSUGAWA H, IKEDA K, TAKAHASHI M, et al. A lipidome atlas in MS-DIAL 4 [J]. Nature Biotechnology, 2020, 38(10): 1159-1163. doi: 10.1038/s41587-020-0531-2
[63] TSUGAWA H, NAKABAYASHI R, MORI T, et al. A cheminformatics approach to characterize metabolomes in stable-isotope-labeled organisms [J]. Nature Methods, 2019, 16(4): 295-298. doi: 10.1038/s41592-019-0358-2
[64] QIAN Y L, WANG X B, WU G, et al. Screening priority indicator pollutants in full-scale wastewater treatment plants by non-target analysis [J]. Journal of Hazardous Materials, 2021, 414: 125490. doi: 10.1016/j.jhazmat.2021.125490
[65] ALLARD P M, GENTA-JOUVE G, WOLFENDER J L. Deep metabolome annotation in natural products research: Towards a virtuous cycle in metabolite identification [J]. Current Opinion in Chemical Biology, 2017, 36: 40-49. doi: 10.1016/j.cbpa.2016.12.022
[66] RUTTKIES C, NEUMANN S, POSCH S. Improving MetFrag with statistical learning of fragment annotations [J]. BMC Bioinformatics, 2019, 20(1): 1-14. doi: 10.1186/s12859-018-2565-8
[67] SCHYMANSKI E L, GALLAMPOIS C M J, KRAUSS M, et al. Consensus structure elucidation combining GC/EI-MS, structure generation, and calculated properties [J]. Analytical Chemistry, 2012, 84(7): 3287-3295. doi: 10.1021/ac203471y
[68] GETZINGER G J, HIGGINS C P, FERGUSON P L. Structure database and in silico spectral library for comprehensive suspect screening of per- and polyfluoroalkyl substances (PFASs) in environmental media by high-resolution mass spectrometry [J]. Analytical Chemistry, 2021, 93(5): 2820-2827. doi: 10.1021/acs.analchem.0c04109
[69] DJOUMBOU-FEUNANG Y, PON A, KARU N, et al. CFM-ID 3.0: Significantly improved ESI-MS/MS prediction and compound identification [J]. Metabolites, 2019, 9(4): 72. doi: 10.3390/metabo9040072
[70] LI Z C, LU Y, GUO Y F, et al. Comprehensive evaluation of untargeted metabolomics data processing software in feature detection, quantification and discriminating marker selection [J]. Analytica Chimica Acta, 2018, 1029: 50-57. doi: 10.1016/j.aca.2018.05.001
[71] COBLE J B, FRAGA C G. Comparative evaluation of preprocessing freeware on chromatography/mass spectrometry data for signature discovery [J]. Journal of Chromatography A, 2014, 1358: 155-164. doi: 10.1016/j.chroma.2014.06.100
[72] LIBISELLER G, DVORZAK M, KLEB U, et al. IPO: a tool for automated optimization of XCMS parameters [J]. BMC Bioinformatics, 2015, 16: 118. doi: 10.1186/s12859-015-0562-8
[73] ELIASSON M, RÄNNAR S, MADSEN R, et al. Strategy for optimizing LC-MS data processing in metabolomics: A design of experiments approach [J]. Analytical Chemistry, 2012, 84(15): 6869-6876. doi: 10.1021/ac301482k
[74] ZHENG H, CLAUSEN M R, DALSGAARD T K, et al. Time-saving design of experiment protocol for optimization of LC-MS data processing in metabolomic approaches [J]. Analytical Chemistry, 2013, 85(15): 7109-7116. doi: 10.1021/ac4020325
[75] MCLEAN C, KUJAWINSKI E B. AutoTuner: High fidelity and robust parameter selection for metabolomics data processing [J]. Analytical Chemistry, 2020, 92(8): 5724-5732. doi: 10.1021/acs.analchem.9b04804