Title : Identification of the Peptide Moiety
Abstract :
- To complement the deduced glycan composition with peptide sequence information, CID-MS3 experiments were conducted on putative peptide masses, which were derived from CID-MS2 spectra (Fig. 4A)
- In separate LC-MS runs the selected peptide precursor masses (predominantly singly charged) were used to trigger manual CID-MS3 fragmentation
- In rare cases peptide+HexNAc was selected for fragmentation, because of low signal intensity of the peptide species in MS2
- CID-MS3 spectra were searched against the human subset of the highly curated and nonredundant protein database UniProtKB/Swiss-Prot
- Notably, also in some CID-MS2 spectra b- and y-ions derived from peptide backbone cleavages were detected, which enabled peptide identification (e.g. supplemental Fig
- S5: α-2-HS-glycoprotein m/z 623.23, 3+)
- For 88 detected glycopeptides, 60 corresponding peptides could be identified unambiguously (Table I, Table II)
- These 60 peptides could be linked to 22 different proteins, most of them being acute phase proteins
- As the protein identification is based on a single peptide, validation of the potential peptide hits is of utmost importance
- Here, in particular, the protein inference problem (53), which is intrinsic to bottom-up proteomic approaches, had to be considered
- To cope with this, peptide spectra were manually reviewed and only peptide hits with a MASCOT ion score greater than 20 were considered; only in rare cases, and when supported by other evidence, were lower-scoring peptides also accepted
- Furthermore, peptide hits needed to exhibit at least one potential O-glycosylation site (Ser/Thr)
- If available, knowledge derived from public databases (UniProtKB and UniCarbKB) on already described O-glycosylation sites within the putative peptides or within the entire protein was used to support a potential hit
- The peptide identification was further corroborated by redundant identifications, that is, the multiple occurrence of (1) the same glycopeptide in different HILIC fractions, (2) the same peptide with a different glycan moiety, or (3) a peptide harboring the same glycosylation site but differing in peptide length, the latter being attributed to the broad-specificity proteolysis (e.g. alpha-2-HS-glycoprotein, 341TVVQP[HexNAc1Hex1NeuAc1]VG348 derived from HILIC fraction #13 and 342VVQP[HexNAc1Hex1NeuAc1]VG348 from fraction #14)
- In some cases, though, peptide identification was hampered or inconclusive
- One of the main obstacles here was the frequent occurrence of prolines within the (glyco)peptide sequence, which has also been described in the literature
- The cyclic structure of proline gives rise to a high signal of the preceding y-ion but in most cases precludes the generation of a subsequent b-ion, thus introducing a sequence gap (54)
- This in turn leads to incomplete peptide fragment ion series and the occurrence of dipeptide fragment ions (e.g. PS and SP), which may result in ambiguity in peptide identification
- This effect is particularly critical for short peptide sequences, as usually obtained by a broad- or nonspecific digest
- The average peptide length of glycopeptides identified in this study is 10 amino acids (aa)
- This is significantly shorter than the average length of tryptic peptides (14 aa, based on an in-silico digestion of the human UniProtKB database (55), supplemental Fig
- S2)
- All this, in conjunction with a nonspecific peptide search, makes reliable peptide identification challenging
- To complement the identified O-glycopeptides with nonglycosylated peptides that are also present in blood plasma, CID and ETD fragment spectra of the corresponding HILIC fractions (#1–17) were searched against the human subset of the UniProtKB/Swiss-Prot protein database
- In total 111 proteins were identified
- CID and ETD spectra provided complementary results: 54 proteins were identified only with CID, 45 only with ETD, and just 12 with both modes
- However, significantly more peptides were identified with CID than with ETD (321 versus 150)
- The majority of peptides were derived from immunoglobulins, serotransferrin, haptoglobin and serum albumin (supplemental Table S1)
- Notably, also nonglycosylated peptides corresponding to previously identified O-glycopeptides, e.g. of plasminogen and hemopexin, were identified (Table I)
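
The acceptance criteria stated above (a MASCOT ion score greater than 20 and at least one Ser/Thr residue as a potential O-glycosylation site) can be sketched as a simple filter. This is a minimal illustration only, not the authors' pipeline: the `PeptideHit` record, the function names, and the toy sequences and scores are hypothetical, and the manual spectrum review and database support described in the text are not modelled.

```python
from dataclasses import dataclass

# Threshold taken from the text: hits must score strictly above 20.
MIN_ION_SCORE = 20.0


@dataclass(frozen=True)
class PeptideHit:
    """Hypothetical record for one search-engine peptide hit."""
    sequence: str     # one-letter amino acid code
    ion_score: float  # MASCOT ion score reported for the hit


def has_potential_o_site(sequence: str) -> bool:
    """True if the peptide contains at least one Ser (S) or Thr (T)."""
    return any(residue in "ST" for residue in sequence.upper())


def accept_hit(hit: PeptideHit, min_score: float = MIN_ION_SCORE) -> bool:
    """Apply the two automatic criteria: score above threshold and a Ser/Thr."""
    return hit.ion_score > min_score and has_potential_o_site(hit.sequence)


# Toy input (invented sequences and scores, for illustration only).
hits = [
    PeptideHit("AAPSGK", 35.0),  # passes both criteria
    PeptideHit("AAPTGK", 18.0),  # rejected: score not above 20
    PeptideHit("AAPGGK", 40.0),  # rejected: no Ser/Thr residue
]
accepted = [hit.sequence for hit in hits if accept_hit(hit)]
print(accepted)
```

In the study, lower-scoring hits were occasionally accepted when other evidence supported them; such exceptions would have to be handled outside a fixed threshold like the one used in this sketch.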
Output (sent_index, trigger, protein, sugar, site):
- 12. O-glycosylation, , -, -, Ser/Thr
- 12. O-glycosylation, , -, -, site
- 13. O-glycosylation, , -, -, sites
- 14. alpha-2-HS-glycoprotein, , alpha-2-HS-glycoprotein, -, -
- 14. glycopeptide, , -, -, glycopeptide
- 14. glycopeptide, , -, -, peptide
- 14. glycosylation, , -, -, site
- 20. glycopeptides, , -, -, glycopeptides
- 24. O-glycopeptides, , -, -, O-glycopeptides
- 24. nonglycosylated, , -, -, peptides
- 29. O-glycopeptides, , hemopexin, -, O-glycopeptides
- 29. O-glycopeptides, , plasminogen, -, O-glycopeptides
- 29. nonglycosylated, , -, -, peptides
- 6. α-2-HS-glycoprotein, , α-2-HS-glycoprotein, -, -
- 7. glycopeptides, , -, -, glycopeptides
Output(Part-Of) (sent_index, protein, site):
- 29. hemopexin, O-glycopeptides
- 29. plasminogen, O-glycopeptides
*Output_Site_Fusion* (sent_index, protein, sugar, site):