Title : Large-scale glycoproteome characterization enabled by AI-ETD
Abstract :
- Given that the AI-ETD method is fast and easily automated, we reasoned that the technique could enable analysis of the glycoproteome at a large scale
- To test this hypothesis, we extracted proteins from mouse brain lysate, digested them with trypsin, enriched for glycosylated peptides, and performed high-throughput LC–MS/MS analysis using AI-ETD scans triggered by the presence of oxonium ions in HCD scans
- In total, we identified 5662 unique N-glycopeptides (24,099 glycopeptide spectral matches) mapping to 1545 unique N-glycosites on 771 glycoproteins with 117 different glycan compositions, which were included in a database compiled from literature on previous mouse and rat brain glycosylation studies
- These data are the result of several steps of post-Byonic search filtering, which were performed because caveats still exist in automated glycopeptide identification, as evidenced by the current HUPO glycoproteomics initiative (https://hupo.org/HPP-News/6272119)
- Note that we do not offer any fundamentally new approach to address such challenges here; rather, we present AI-ETD data for large-scale glycoproteomics using the tools that are currently available in the field
- See the Methods for discussion of the six post-Byonic search filtering steps we performed
- Following post-search filtering, no decoy peptides remained in the dataset
- All the data reported here comprise tryptic N-glycopeptides carrying only one glycan modification and have a DeltaMod score that indicates the correct glycosite has been identified within the confidence range suggested by Byonic
- With this extensive dataset in hand, we next characterized several figures of merit
- First, we examined the percentage of cleaved bonds observed relative to the total possible backbone bonds (for both peptide and glycan backbones, Fig. 1b)
- Here we achieve 89% median peptide backbone sequence coverage and 78% median glycan sequence coverage with AI-ETD, which significantly outperforms HCD (Supplementary Fig. 5a)
- Figure 1c presents the average distribution of explainable signal amongst fragment ion types in AI-ETD spectra
- On average, AI-ETD produces roughly equal proportions of signal in Y-type and peptide backbone fragments (41% and 45%, respectively), whereas HCD has less signal in peptide backbone fragments and more in Y-type and oxonium ions (Supplementary Fig. 5b)
- This is congruent with observations presented in Supplementary Fig. 2 (discussed above)
- Approximately 29% and 46% of total ion current could be explained on average in AI-ETD and HCD spectra, respectively, but we only considered the following fragment ion classes: (1) unmodified peptide backbone fragments (i.e., b/y/c/z-type), (2) peptide backbone fragments with intact glycan still attached, (3) peptide backbone fragments with only a HexNAc moiety still attached, (4) Y-type ions (intact peptide plus glycan fragments), and (5) oxonium ions/glycan B-type ions
- It is possible that photoactivation generated some degree of glycan fragmentation on peptide backbone fragments, which could provide more explainable signal in AI-ETD spectra, and this will be the subject of future work
- Even so, 71% of AI-ETD spectra (compared to <4% of HCD spectra) contained fragments with the intact glycan species retained on peptide backbone fragments
- This percentage would likely increase (especially for AI-ETD) by extending the m/z range of MS/MS scans above 2000 Th
- Note that intact glycopeptides have considerably larger precursor m/z distributions (Supplementary Fig. 6) than unmodified peptides, and these low-charge-density precursors (z ≤ 3) can be a challenge to dissociate
- Even so, AI-ETD provided robust fragmentation that often facilitated identification of these challenging glycopeptides (Fig. 1d)
- AI-ETD spectra provided evidence for 4680 of the unique N-glycopeptides (83%) and 1361 of the glycosites (88%) reported in this study, with the remaining identifications/glycosites supported only by HCD spectral evidence
- In all, these data represent one of the most in-depth N-glycoproteome characterizations to date that rely on intact glycopeptide identifications (Fig. 1e)
- Importantly, this method is amenable to in vivo sources, as demonstrated here, which makes it applicable to practically any mammalian system (workflow in Supplementary Fig. 7)
- The ability to profile glycosites with intact glycan modifications at this scale provides opportunities to investigate system-wide glycosylation patterns
- Despite differences in enrichment strategies and fragmentation methods, the overlap in identified glycosites is relatively high between this study and two other recent intact glycopeptide studies in mouse brain (Fig. 2a), as well as between this study and deglycoproteomic datasets (Supplementary Fig. 8)
- Supplementary Figure 9 compares one example of overlapping glycosites between this study and the Liu et al. dataset, demonstrating that similar glycosites and glycan heterogeneity were identified on integrin alpha-1 in mouse brain and that AI-ETD methods further add to the number of glycosites and glycosite-glycan combinations observed
- Figure 2b demonstrates that ~69% of the glycosites identified in this study are annotated as glycosites in the UniProt database
- Of the 1065 UniProt-annotated glycosites, the majority were assigned via ‘sequence analysis,’ which indicates a prediction based on presence of the N-X-S/T sequon rather than by experimental observation
- In total, we provide experimental evidence for over 850 UniProt-predicted glycosites, in addition to identifying nearly 200 previously observed glycosites
- Expected N-X-S and N-X-T sequons were observed in our identified glycosites (Fig. 2c), with ~59% of the glycosites having the N-X-T sequon
- Figure 2d displays the percentage of glycosites containing high mannose glycans, fucosylated glycans, or sialylated glycans, which resembles previous studies (although Liu et al. observed significantly higher proportions of fucosylated glycopeptides)
- Note that this calculation did not assign each glycosite to a single glycan type exclusively, so one site can count toward multiple types if multiple glycans were identified at that site
- Gene ontology (GO) enrichments of functional category terms from identified glycoproteins are available in Supplementary Fig. 10, which shows expected enriched terms such as glycoprotein, several membrane terms, and extracellular/secreted protein related terms
- See the Discussion section for more about differences in glycosylation profiles between this study and other published datasets, where we also discuss the implications of our lectin enrichment strategy compared to other strategies
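
Two of the calculations described in the abstract, the N-X-S/T sequon classification and the percentage of backbone bonds cleaved, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `find_sequons` and `backbone_coverage`, the example sequence, and the representation of cleaved bonds as integer indices are all assumptions made for illustration. The exclusion of proline at the X position is the standard sequon convention and is not stated in the text above.

```python
import re

# N-glycosylation sequon: Asn, then any residue except Pro, then Ser or Thr.
# The X != Pro rule is the usual convention (an assumption here; the text only
# says "N-X-S/T"). The lookahead lets overlapping sequons both be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence):
    """Return (1-based Asn position, 'N-X-S' or 'N-X-T') for every sequon."""
    hits = []
    for match in SEQUON.finditer(sequence):
        i = match.start()
        hits.append((i + 1, "N-X-" + sequence[i + 2]))
    return hits


def backbone_coverage(n_residues, cleaved_bonds):
    """Percentage of backbone bonds explained by at least one fragment.

    n_residues    -- number of residues (or monosaccharides) in the chain
    cleaved_bonds -- hypothetical input: indices (1 .. n_residues - 1) of bonds
                     for which a matching fragment ion was observed
    """
    total = n_residues - 1
    if total <= 0:
        raise ValueError("need at least two residues to have a backbone bond")
    # Duplicates and out-of-range indices are ignored.
    observed = {b for b in cleaved_bonds if 1 <= b <= total}
    return 100.0 * len(observed) / total


if __name__ == "__main__":
    # Made-up sequence, not taken from the dataset.
    print(find_sequons("MNGTANPSNNTS"))
    print(backbone_coverage(10, [1, 2, 3, 3, 9, 12]))
```

The same kind of arithmetic reproduces the rounded percentages quoted above from the reported counts: 4680 of 5662 N-glycopeptides is about 83%, 1361 of 1545 glycosites is about 88%, and 1065 of 1545 glycosites is about 69%.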
Output (sent_index, trigger, protein, sugar, site):
- 15. ions, , -, -, fragments
- 15. ions, , -, -, peptide
- 19. glycopeptides, , -, -, glycopeptides
- 2. glycosylated, , -, -, peptides
- 20. glycopeptides, , -, -, glycopeptides
- 21. N-glycopeptides, , -, -, N-glycopeptides
- 21. N-glycopeptides, , -, -, glycosites
- 21. glycosites, , -, -, glycosites
- 21. identifications/glycosites, , -, -, identifications/glycosites
- 22. glycopeptide, , -, -, glycopeptide
- 24. glycosites, , -, -, glycosites
- 25. glycopeptide, , -, -, glycopeptide
- 25. glycosites, , -, -, glycosites
- 26. glycosites, , -, -, glycosites
- 27. glycosites, , -, -, glycosites
- 28. glycosites, , -, -, glycosites
- 29. glycosites, , -, -, glycosites
- 3. N-glycopeptides, , -, -, N-glycopeptides
- 3. N-glycosites, , -, -, N-glycosites
- 3. glycopeptide, , -, -, glycopeptide
- 3. glycoproteins, , glycoproteins, -, -
- 30. glycosites, , -, -, glycosites
- 31. fucosylated, , -, -, glycopeptides
- 31. glycopeptides, , -, -, glycopeptides
- 31. glycosites, , -, -, glycosites
- 32. glycosites, , -, -, glycosites
- 33. glycoprotein, , glycoprotein, -, -
- 33. glycoproteins, , glycoproteins, -, -
- 4. glycopeptide, , -, -, glycopeptide
- 8. N-glycopeptides, , -, -, N-glycopeptides
- 8. glycosite, , -, -, glycosite
Output(Part-Of) (sent_index, protein, site):
- 27. database, glycosites
- 3. glycoproteins, N-glycosites
- 3. glycoproteins, positions,
*Output_Site_Fusion* (sent_index, protein, sugar, site):