PMID: PMC4739677-1-6

 


Title : Identification of the Peptide Moiety

Abstract :
  1. To complement the deduced glycan composition with peptide sequence information, CID-MS3 experiments were conducted on putative peptide masses, which were derived from CID-MS2 spectra (Fig. 4A)
  2. In separate LC-MS runs the selected peptide precursor masses (predominantly singly charged) were used to trigger manual CID-MS3 fragmentation
  3. In rare cases peptide+HexNAc was selected for fragmentation, because of low signal intensity of the peptide species in MS2
  4. CID-MS3 spectra were searched against the human subset of the highly curated and nonredundant protein database UniProtKB/Swiss-Prot
  5. Notably, in some CID-MS2 spectra, b- and y-ions derived from peptide backbone cleavages were also detected, which enabled peptide identification (e.g. supplemental Fig
  6. S5: α-2-HS-glycoprotein, m/z 623.23, 3+)
  7. For 88 detected glycopeptides, 60 corresponding peptides could be identified unambiguously (Table I, Table II)
  8. These 60 peptides could be linked to 22 different proteins, most of them being acute phase proteins
  9. As the protein identification is based on a single peptide, validation of the potential peptide hits is of utmost importance
  10. Here, in particular, the protein inference problem (53), which is intrinsic to bottom-up proteomic approaches, had to be considered
  11. To cope with this, peptide spectra were manually revised and only peptide hits with a MASCOT ion score of greater than 20 were considered; only in rare cases, and when supported by other evidence, were lower-scoring peptides also accepted
  12. Furthermore, peptide hits needed to exhibit at least one potential O-glycosylation site (Ser/Thr)
  13. If available, knowledge derived from public databases (UniProtKB and UniCarbKB) on already described O-glycosylation sites within the putative peptides or within the entire protein was used to support a potential hit
  14. The peptide identification was further corroborated by redundant identifications, that is, the multiple occurrence of: (1) the same glycopeptide in different HILIC fractions, (2) the same peptide but with a different glycan moiety, or (3) a peptide harboring the same glycosylation site, but differing in peptide length; the latter being attributed to the broad-specific proteolysis (e.g. alpha-2-HS-glycoprotein, 341TVVQP[HexNAc1Hex1NeuAc1]VG348 derived from HILIC fraction #13 and 342VVQP[HexNAc1Hex1NeuAc1]VG348 from fraction #14)
  15. In some cases, though, peptide identification was hampered or inconclusive
  16. One of the main obstacles here was the frequent occurrence of prolines within the (glyco)peptide sequence, which has also been described in the literature
  17. The cyclic structure of proline gives rise to a high signal of the preceding y-ion but precludes in most cases the generation of a subsequent b-ion, thus introducing a sequence gap (54)
  18. This in turn leads to incomplete peptide fragment ion series and the occurrence of dipeptide fragment ions (e.g. PS and SP), which may result in ambiguity in peptide identification
  19. This effect is particularly critical for short peptide sequences, as usually obtained by a broad- or nonspecific digest
  20. The average peptide length of glycopeptides identified in this study is 10 amino acids (aa)
  21. This is significantly shorter than the average length of tryptic peptides (14 aa, based on an in-silico digestion of the human UniProtKB database (55), supplemental Fig
  22. S2)
  23. All this—in conjunction with a nonspecific peptide search—makes a reliable peptide identification challenging
  24. To complement the identified O-glycopeptides with nonglycosylated peptides that are also present in blood plasma, CID and ETD fragment spectra of the corresponding HILIC fractions (#1–17) were searched against the human subset of the UniProtKB/Swiss-Prot protein database
  25. In total 111 proteins were identified
  26. CID and ETD spectra provided complementary results; 54 and 45 proteins were identified, respectively, and only 12 proteins were identified with both modes
  27. Compared with ETD, however, significantly more peptides were identified with CID (321 versus 150)
  28. The majority of peptides were derived from immunoglobulins, serotransferrin, haptoglobin and serum albumin (supplemental Table S1)
  29. Notably, nonglycosylated peptides corresponding to previously identified O-glycopeptides, e.g. of plasminogen and hemopexin, were also identified (Table I)
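
Note on sentences 11 and 12: the two automatic acceptance criteria stated there (MASCOT ion score greater than 20, and at least one Ser/Thr residue as a potential O-glycosylation site) can be sketched as a simple filter. This is a minimal illustration only, not the authors' pipeline: the PeptideHit record, its field names, and the example sequences are hypothetical, and the manual spectrum revision and the rare, evidence-supported exceptions for lower-scoring hits are not modelled.

```python
from dataclasses import dataclass
from typing import List

# Threshold taken from the text: ion score must be greater than 20.
MIN_ION_SCORE = 20.0


@dataclass(frozen=True)
class PeptideHit:
    """Hypothetical record for one search hit (field names are assumptions)."""
    sequence: str     # one-letter amino acid sequence
    ion_score: float  # MASCOT ion score reported for the hit


def has_potential_o_site(sequence: str) -> bool:
    """Return True if the peptide contains at least one Ser (S) or Thr (T)."""
    return any(residue in "ST" for residue in sequence.upper())


def passes_filter(hit: PeptideHit, min_score: float = MIN_ION_SCORE) -> bool:
    """Apply both criteria: score strictly above the threshold and a Ser/Thr residue."""
    return hit.ion_score > min_score and has_potential_o_site(hit.sequence)


def filter_hits(hits: List[PeptideHit]) -> List[PeptideHit]:
    """Keep only the hits that satisfy both criteria, preserving input order."""
    return [hit for hit in hits if passes_filter(hit)]


# Made-up example hits, for illustration only (not data from the study).
EXAMPLE_HITS = [
    PeptideHit("AVSPGK", 35.0),  # passes: score above 20, contains S
    PeptideHit("AVSPGK", 20.0),  # fails: score is not greater than 20
    PeptideHit("GAVLPK", 50.0),  # fails: no Ser/Thr residue
    PeptideHit("tpgl", 21.5),    # passes: lowercase input is normalised
]

if __name__ == "__main__":
    for hit in filter_hits(EXAMPLE_HITS):
        print(hit.sequence, hit.ion_score)
```

The comparison is strict (greater than, not greater than or equal to), following the wording of sentence 11; hits rejected by this filter would still need the manual review described there before being discarded or accepted.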
Output (sent_index, trigger, protein, sugar, site):
  • 12. O-glycosylation, , -, -, Ser/Thr
  • 12. O-glycosylation, , -, -, site
  • 13. O-glycosylation, , -, -, sites
  • 14. alpha-2-HS-glycoprotein, , alpha-2-HS-glycoprotein, -, -
  • 14. glycopeptide, , -, -, glycopeptide
  • 14. glycopeptide, , -, -, peptide
  • 14. glycosylation, , -, -, site
  • 20. glycopeptides, , -, -, glycopeptides
  • 24. O-glycopeptides, , -, -, O-glycopeptides
  • 24. nonglycosylated, , -, -, peptides
  • 29. O-glycopeptides, , hemopexin, -, O-glycopeptides
  • 29. O-glycopeptides, , plasminogen, -, O-glycopeptides
  • 29. nonglycosylated, , -, -, peptides
  • 6. α-2-HS-glycoprotein, , α-2-HS-glycoprotein, -, -
  • 7. glycopeptides, , -, -, glycopeptides
Output(Part-Of) (sent_index, protein, site):
  • 29. hemopexin, O-glycopeptides
  • 29. plasminogen, O-glycopeptides
*Output_Site_Fusion* (sent_index, protein, sugar, site):

 

 
