PMID: PMC5795011-1-2

 

    Legend: Gene, Sites

Title : Linearized glycopeptide sequences and custom glycoprotein databases

Abstract :
  1. After manual annotation of various glycopeptide MS2 spectra, linear glycan sequences were defined based on the criteria that they should i) cover the maximum possible intense peaks in the MS2 spectrum and ii) provide close to complete information about the glycopeptide sequence
  2. For the di-sialylated bi-antennary glycopeptide (Fig. 1A), the linear sequence OJUUJOJJJOO-peptide fulfills the criteria mentioned above (Fig. 1B)
  3. By attaching the glycan sequence at the peptide N-terminus, the Yn type glycosidic cleavage ions now become the peptide cleavage type y ions
  4. The last three residues at the N-terminus (OJU) cover the three most intense peaks of oxonium ions at 204.086 (HexNAc), 366.138 (HexNAc-Hex) and 657.233 (HexNAc-Hex-Neu5Ac) as b1, b2 and b3 ions
  5. The remaining linear sequence (UJOJJJOO-peptide) can be annotated to the intense peaks as yn to yn+7 ions (Fig. 1B)
  6. The spectrum now contains a series of eight y type and three b type intense ions
  7. All major glycan structures were converted to linear sequences following the same principles (Supplementary Table 1)
  8. The next step was to create a customized database, where both the protein and glycan sequences co-exist
  9. An in-house Python script was written for this purpose (Supplementary File)
  10. Briefly, following an in-silico digestion, the tryptic peptides containing NxT/S/C motifs (N-linked glycosylation) or serine/threonine residues (O-linked glycosylation) and the linear glycan sequences were combined (Supplementary Fig. 2)
  11. The custom database used in the manuscript, unless otherwise described, consists of a total of 406 potential glycoproteins known to be glycosylated in serum (PeptideAtlas N-Glyco build 2010)
  12. After adding 21 unique linear sialylated glycan sequences (Supplementary Table 1), the database contained 41,727 potential glycopeptide sequences and a total of 1,195,485 residues
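
Illustrative sketch for sentences 3, 4 and 10 (not the authors' script, which is in the Supplementary File). It assumes the letter mapping O = HexNAc, J = Hex, U = Neu5Ac, which is inferred from the b1 to b3 assignments in sentence 4 rather than stated outright; standard monoisotopic residue masses; trypsin cleaving after K/R except before P; and an N-x-T/S/C sequon with x not proline. All function names and the example protein are hypothetical, and O-linked sites are not handled.

```python
import re

# Monoisotopic residue masses in Da. The letter mapping is an assumption
# inferred from sentence 4 (OJU -> HexNAc, HexNAc-Hex, HexNAc-Hex-Neu5Ac).
RESIDUE_MASS = {"O": 203.0794, "J": 162.0528, "U": 291.0954}
PROTON = 1.00728

# N-linked sequon N-x-T/S/C; excluding proline at x is an assumption.
SEQUON = re.compile(r"N[^P][STC]")


def b_ion_mz(linear_glycan, n):
    """Singly charged m/z values of the first n b-type ions."""
    total = PROTON
    ions = []
    for letter in linear_glycan[:n]:
        total += RESIDUE_MASS[letter]
        ions.append(total)
    return ions


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def build_entries(protein, glycans):
    """Prepend each linear glycan to every sequon-containing tryptic peptide.

    Simplification: a sequon split across a cleavage site is not detected.
    """
    return [
        glycan + peptide
        for peptide in tryptic_digest(protein)
        if SEQUON.search(peptide)
        for glycan in glycans
    ]


if __name__ == "__main__":
    for mz in b_ion_mz("OJUUJOJJJOO", 3):
        print(f"{mz:.3f}")
    # Hypothetical toy protein, not taken from the paper's database.
    print(build_entries("MKANGTLRPNVSKAAK", ["OJUUJOJJJOO"]))
```

Under these assumptions the first three b ions of OJUUJOJJJOO fall within 0.01 Da of the oxonium peaks quoted in sentence 4, and each database entry is an ordinary linear string that a peptide search engine can fragment into b and y ions.
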
Output (sent_index, trigger, protein, sugar, site):
  • 0. glycopeptide, , -, -, glycopeptide sequences
  • 0. glycoprotein, , glycoprotein, -, -
  • 1. glycopeptide, , -, -, glycopeptide sequence
  • 1. glycopeptide, , -, -, glycopeptide
  • 11. glycoproteins, , glycoproteins, -, -
  • 11. glycosylated, , glycoproteins, -, -
  • 12. glycopeptide, , -, -, glycopeptide sequences
  • 12. sialylated, , -, -, sequences
  • 2. di-sialylated, , -, -, glycopeptide
  • 2. glycopeptide, , -, -, glycopeptide
Output(Part-Of) (sent_index, protein, site):
*Output_Site_Fusion* (sent_index, protein, sugar, site):

 

 

Protein NCBI ID SENTENCE INDEX