Glyco

Title : Linearized glycopeptide sequences and custom glycoprotein databases

Abstract :

After manual annotation of various glycopeptide MS2 spectra, linear glycan sequences were defined according to two criteria: they should i) cover as many of the intense peaks in the MS2 spectrum as possible and ii) provide close to complete information about the glycopeptide sequence
For the di-sialylated bi-antennary glycopeptide (Fig. 1A), the linear sequence OJUUJOJJJOO-peptide fulfills both criteria (Fig. 1B)
Because the glycan sequence is attached at the peptide N-terminus, the Yn-type glycosidic cleavage ions become peptide-type y ions
The three N-terminal residues (OJU) cover the three most intense oxonium ion peaks at 204.086 (HexNAc), 366.138 (HexNAc-Hex) and 657.233 (HexNAc-Hex-Neu5Ac) as b1, b2 and b3 ions
The remaining linear sequence (UJOJJJOO-peptide) can be assigned to the intense peaks as yn to yn+7 ions (Fig. 1B)
The annotated spectrum now contains a series of eight intense y-type ions and three intense b-type ions
All major glycan structures were converted to linear sequences following the same principles (Supplementary Table 1)
The next step was to create a customized database in which both the protein and glycan sequences co-exist
An in-house Python script was written for this purpose (Supplementary File)
Briefly, following an in-silico digestion, the tryptic peptides containing NxT/S/C motifs (N-linked glycosylation) or serine/threonine residues (O-linked glycosylation) and the linear glycan sequences were combined (Supplementary Fig. 2)
Unless otherwise stated, the custom database used in the manuscript comprises 406 potential glycoproteins known to be glycosylated in serum (PeptideAtlas N-Glyco build 2010)
After adding 21 unique linear sialylated glycan sequences (Supplementary Table 1), the database contained 41,727 potential glycopeptide sequences and a total of 1,195,485 residues
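
The database construction described above (in-silico digestion, selection of sequon-containing peptides, attachment of the linear glycan sequence at the peptide N-terminus) might be sketched roughly as follows. This is not the authors' script from the Supplementary File; it is a minimal illustration under stated assumptions. The function names are hypothetical, the trypsin rule (cleave after K/R unless followed by P, no missed cleavages) and the exclusion of proline at the x position of the NxT/S/C motif are assumptions not stated in the text, and O-linked sites are not handled.

```python
import re

# Linear di-sialylated bi-antennary glycan sequence quoted in the text.
# From the b1-b3 assignments, O, J and U appear to stand for HexNAc, Hex
# and Neu5Ac respectively (an inference, not stated explicitly).
GLYCAN = "OJUUJOJJJOO"

# N-x-T/S/C motif; excluding proline at position x is an assumption.
SEQUON = re.compile(r"N[^P][STC]")


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P (simplified rule,
    no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without K/R
    return peptides


def glycopeptide_entries(protein, glycans):
    """Prepend each linear glycan sequence to every peptide with a sequon."""
    entries = []
    for peptide in tryptic_digest(protein):
        if SEQUON.search(peptide):
            entries.extend(glycan + peptide for glycan in glycans)
    return entries


if __name__ == "__main__":
    # Made-up toy sequence, used only to exercise the functions.
    toy_protein = "MKNGTAKPLRSSNPTKAANVSR"
    for entry in glycopeptide_entries(toy_protein, [GLYCAN]):
        print(entry)
```

In this sketch a motif that spans a cleavage site is missed because the search runs on each peptide separately; a fuller implementation would probably locate motifs on the intact protein sequence first.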

Output (sent_index, trigger, protein, sugar, site):

0. glycopeptide, , -, -, glycopeptide sequences
0. glycoprotein, , glycoprotein, -, -
1. glycopeptide, , -, -, glycopeptide sequence
1. glycopeptide, , -, -, glycopeptide
11. glycoproteins, , glycoproteins, -, -
11. glycosylated, , glycoproteins, -, -
12. glycopeptide, , -, -, glycopeptide sequences
12. sialylated, , -, -, sequences
2. di-sialylated, , -, -, glycopeptide
2. glycopeptide, , -, -, glycopeptide

Output(Part-Of) (sent_index, protein, site):

*Output_Site_Fusion* (sent_index, protein, sugar, site):