De Novo Sequencing of Proteins and Peptides: Algorithms, Applications, Perspectives

K.V. Vyatkina

Abstract

Determination of the primary structure of proteins and peptides is an important step in studying their properties. Currently, mass spectrometry is commonly applied to this end. The results of mass spectrometric measurements can be interpreted by means of either database search or de novo sequencing methods. The appeal of the latter lies in their applicability to unknown proteins, as well as to those that cannot be analyzed with genomics or transcriptomics methods. In this paper we briefly review the existing approaches to de novo sequencing of proteins and peptides, along with the problems they can be used to solve, and indicate directions and perspectives for their further development.
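
The basic idea behind de novo interpretation can be illustrated with a toy example. Consecutive peaks of a fragment-ion ladder differ by the mass of one amino acid residue, so a sequence can in principle be read off from the mass gaps without consulting any database. The Python sketch below assumes an idealized spectrum of singly charged b-ions with no noise and no missing peaks. The function name `read_ladder`, the tolerance `tol`, and the constant tables are hypothetical, illustrative choices and are not taken from any specific de novo sequencing tool. Real spectra require much more robust scoring.

```python
"""Toy illustration: reading a peptide sequence from a b-ion ladder.

Assumptions (hypothetical, for illustration only): peaks are singly charged
b-ion m/z values, noise-free, with no missing ladder steps.
"""

PROTON = 1.00728  # mass of a proton, Da

# Monoisotopic masses of unmodified amino acid residues, Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(peaks, tol=0.01):
    """Return one residue call per mass gap between consecutive ladder peaks.

    Ambiguous gaps are reported as e.g. "L/I"; unexplained gaps as "[mass]".
    """
    # A singly charged b-ion carries an extra proton, so the ladder starts at PROTON.
    ladder = [PROTON] + sorted(peaks)
    calls = []
    for lighter, heavier in zip(ladder, ladder[1:]):
        gap = heavier - lighter
        # Collect every residue whose mass lies within the tolerance of the gap.
        matches = [aa for aa, mass in RESIDUE_MASS.items() if abs(mass - gap) <= tol]
        calls.append("/".join(matches) if matches else f"[{gap:.3f}]")
    return calls


if __name__ == "__main__":
    # Simulate an idealized ladder of all prefixes of PEPTIDE.
    peaks, total = [], PROTON
    for aa in "PEPTIDE":
        total += RESIDUE_MASS[aa]
        peaks.append(total)
    print(read_ladder(peaks))  # ['P', 'E', 'P', 'T', 'L/I', 'D', 'E']
```

Leucine and isoleucine have identical masses, so the sketch reports them together as `L/I`. With a looser tolerance, glutamine and lysine (about 0.036 Da apart) would also be reported together. For this reason, ambiguities are returned explicitly rather than resolved by guessing.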

How to Cite
Vyatkina, K. (2018). De Novo Sequencing of Proteins and Peptides: Algorithms, Applications, Perspectives. Biomedical Chemistry: Research and Methods, 1(1), e00005. https://doi.org/10.18097/BMCRM00005
Section
REVIEWS
