PMID: PMC6428843-1-3

 

    Legend: Gene, Sites

Title : Visualizing glycoproteome heterogeneity

Abstract :
  1. Intact glycopeptide analysis uniquely enables characterization of site-specific microheterogeneity, and our large-scale dataset can provide an initial glimpse at this fascinating facet of glycosylation
  2. Trinidad et al. explored heterogeneity to some degree but ultimately provided a limited overview from a global perspective
  3. Others have explored several facets discussed herein to some degree, including subcellular glycosylation profiles and glycosylation based on glycosite accessibility/structural motifs
  4. Even so, we sought to approach these questions from a systems level using our large pool of intact glycopeptide identifications, and we developed several ways to visualize such data
  5. Figure 2e captures the prevalence of both singly- and multiply-glycosylated proteins (right) and the degree of glycan microheterogeneity for each of the 1545 characterized glycosites (left)
  6. More than half of the 771 identified glycoproteins were observed with only one glycosite, but nearly 60% of glycosites are modified by more than one glycan
  7. A glycoprotein-glycan network diagram in Fig. 2f maps which glycans (outer nodes) were observed on identified glycoproteins (inner column, organized by number of glycosites)
  8. Several discernable patterns appear, perhaps most notably the prevalence of high mannose glycosylation
  9. The network diagram also indicates that the majority of fucosylated, paucimannose, and sialylated glycans occur on proteins with multiple glycosylation sites, and it indicates which glycans contribute more to heterogeneity
  10. Supplementary Figure 11 provides a larger version of this network diagram with glycan identities in Supplementary Table 1
  11. To further investigate site-specific microheterogeneity, we calculated how many times glycan pairs co-occurred at the same site, as shown in the glycan co-occurrence heat map in Fig. 2g (larger version in Supplementary Fig. 12, glycan identities in Supplementary Table 2)
  12. This data shows glycan pair combinations, i.e., glycans that appeared together at the same glycosite, and the darker color indicates more incidences of co-occurrence
  13. High mannose glycans appear to co-occur together with high frequency, and they also co-occur with several groups of complex/hybrid, fucosylated, and sialylated glycans
  14. Furthermore, numerous other co-occurrence patterns are observed, including co-occurrence of certain complex/hybrid and fucosylated glycans, different fucosylated glycans, and some specific fucosylated and sialylated glycans
  15. We also generated glycan co-occurrence networks to display the frequency of co-occurrence of specific glycans with all other glycans across glycosites
  16. Figure 3 shows an example of a co-occurrence network for a biantennary sialylated complex glycan, and Supplementary Fig. 13 displays how co-occurrence networks facilitate visualization of co-occurrence for both a specific glycan, HexNAc(2)Hex(9), and an entire class of glycans, such as all high mannose glycans
  17. Glycan identities are given in Supplementary Table 3
  18. In yet another approach, arc plots in Supplementary Figs. 14–17 visualize glycan microheterogeneity delineated by the number of glycans per glycosite, showing increases in co-occurrence complexity as the number of glycans per site rises
  19. We note that calculating mass differences between co-occurring glycans is straightforward and can provide some information about glycan microheterogeneity (Supplementary Fig. 18), but the limited resolution in information about glycan differences and lack of dimensionality in this analysis inspired us to generate the other analyses and visualization discussed herein
  20. Glycoproteins with a high degree of glycan heterogeneity are readily observed by plotting the number of unique glycans versus total number of glycosites for a protein (Fig. 4)
  21. Several interesting cases where the number of glycans is significantly higher than the total number of glycosites are highlighted
  22. The distribution of glycan types is provided, showing that the types of glycans that contribute to this heterogeneity can vary based on glycoprotein
  23. Investigations into specific glycoprotein examples also indicate that glycan microheterogeneity can manifest in several different forms (Fig. 5 and Supplementary Figs. 19–21)
  24. Some proteins can have several glycosites but relatively little glycan heterogeneity overall (e.g., protein sidekick-2, Fig. 5a), while others can have one glycosite with a multitude of glycans modifying it (e.g., SPARC, Fig. 5b)
  25. Or, in the case of sodium/potassium ATPase β2 subunit (Atp1b2), some glycosites on a protein can show notably little heterogeneity while others have 15–20 different glycans modifying them (Fig. 5c)
  26. Interestingly, Atp1b2 glycosites with lower glycan heterogeneity (N96, N156, N193, N197, and N250) are on one face of the protein while the sites with relatively high (>10 glycans) heterogeneity (N118, N153, and N238) are on the opposite face, where the protein interacts with alpha subunits
  27. Moreover, sites N118 and N238 have been shown to be important in mammalian systems for folding and localization of the Na/K-ATPase complex to the plasma membrane, where it creates concentration gradients important for a variety of cell physiological functions
  28. The tilt of the transmembrane helix of the β2 subunit, which is close to N118, N153, and N238 glycosites, mediates functional differences in the Na/K-ATPase complex, suggesting that various glycans at these sites also have the potential to alter function via conformational changes
  29. The β1 subunit of Na/K-ATPase is also glycosylated (all three known sites are also characterized in this dataset), but the importance of specific glycosites is less pronounced in β1 versus β2 subunits, highlighting the differential roles glycosylation heterogeneity can play even within isoforms of the same complex
  30. Next we examined glycosylation profiles of glycoproteins in different cellular components (CC) (Fig. 6)
  31. Each of the twelve different subcellular groups in Fig. 6a has edges connecting to glycan nodes that are arranged in a circle based on glycan type
  32. A striking feature of this analysis is the increased level of glycan diversity at glycosites in the plasma membrane, other membranes, and extracellular proteins, where glycosites have noticeably more sialylated glycans
  33. Other interesting trends arise, such as the presence of a relatively high occurrence of mannose-6-phosphate (M6P) in lysosomal proteins
  34. Note, this is expected because of the role of M6P in trafficking proteins to the lysosome for degradation, and this data serves as an internal control to support our approach of analyzing glycosylation profiles
  35. Some trends match those reported by Medzihradszky et al. for cellular compartments in their glycoproteomic comparison of mouse brain and liver glycosites, including high mannose glycans in secreted and ER glycoproteins
  36. To compare glycosylation profiles we calculated a Euclidean distance between each subcellular group (Fig. 6b), where darker color indicates more similar (i.e., closer) glycosylation profiles
  37. The most similar subcellular groups were plasma membrane/membrane, and also synapse/plasma membrane/membrane groups
  38. Other closely related groups included vesicle glycoproteins and other cell surface-related subcellular locations (groups 1–7), while lysosomal glycoproteins were most closely related to vesicles but not many other groups
  39. Among other patterns, secreted proteins were most similar to Golgi and ER groups, and ER glycoproteins had the shortest Euclidean distance to Golgi glycoproteins
  40. That said, the Golgi was related to more groups beyond just ER glycoproteins, and the "none listed" group had the most similarity to Golgi glycoproteins
  41. This is perhaps unsurprising, as the Golgi is central to most glycosylation processing for proteins trafficked to the cell surface while ER glycosylation pathways are often followed by further processing in the Golgi
  42. One shortcoming of this approach of analyzing subcellular localization is the presence of several GO CC terms for a single UniProt entry
  43. Figure 6c displays how many proteins mapped to a given number of subcellular groups based on their GO CC terms, and Supplementary Fig. 22 shows the proportion of proteins in each subcellular group that mapped to other groups
  44. A more robust analysis would require subcellular fractionation during sample preparation and/or the use of proximity labeling strategies to investigate the glycoproteomes of each cellular component individually
  45. These strategies present a challenge because of low amounts of starting material for subsequent steps, but coupling intact glycopeptide characterization to these subcellular location methods will be a worthwhile endeavor in future experiments to gain a more refined understanding of glycoproteome organization
  46. Note, Thaysen-Andersen and co-workers have performed such subcellular fractionation analyses with some success using a combination of glycomic and proteomic approaches
  47. Finally, we conducted analysis of protein domains and their characteristic glycosylation profiles (Fig. 7)
  48. Glycosites were mapped to protein domains to which they belong using information available in UniProt
  49. In total 745 of the 1545 glycosites could be mapped to domains
  50. The top bar graph (dark blue) shows the number of glycosites mapping to a given domain type, with the heat map above it (orange) showing the percent of a given domain that was seen as glycosylated relative to the total number of domains present in the mouse proteome
  51. The gray bar graph compares glycan heterogeneity ratios for domains compared to the ratio for all 1545 glycosites
  52. The ratio is the number of glycan-glycosite combinations (i.e., a glycosite site with three different glycans would count as a three) compared to the number of glycosites
  53. Thus, a higher ratio indicates a larger amount of glycan heterogeneity for glycosites in that domain
  54. The heat map on the bottom indicates differences in glycan types observed at sites within a domain type compared to the distribution of glycan types seen in all 1545 sites
  55. Here, a difference of zero shows that the proportion of glycosites containing a given glycan type is equivalent to the overall proportion observed for all glycosites, whereas positive or negative values indicate a higher or lower proportional contribution, respectively, of a given glycan type at the glycosites mapped to that domain
  56. Note, only domains with seven or more glycosites are delineated here, with all other domains grouped into the other domains category, which shows little difference in glycan type expression from the total number of glycosites
  57. Of the 745 sites that could be mapped to a domain, 197 of them existed within an immunoglobulin (Ig)-like domain, where glycosites had a slightly higher proportion of fucosylated and sialylated glycans relative to all glycosites
  58. Peptidase, EGF-like, and Ig-like domains tended to have glycosites that contained more diverse glycan types, while glycosites observed in Sushi, CUB, Laminin, Cadherin, and Sema domains had lower glycan heterogeneity and contained a high proportion of high mannose glycans
  59. Glycosites in fibronectin and EGF-domains harbored proportionally high amounts of fucosylated glycans whereas the majority of other domain types did not
  60. Interestingly, glycosites in peptidase domains have a relatively higher contribution from M6P, which correlates with the known lysosomal targeting role of that glycan
  61. Lee et al. suggested previously that differences in glycosylation profiles can be explained by differential solvent accessibility of glycosites (which they link to differences in subcellular glycosylation profiles)
  62. This presents an intriguing future avenue to explore for domain-specific glycotypes, although the integration of proteomic and glycomic data (as Lee et al. performed) with intact glycopeptide analysis (as is provided here) is likely needed for such an investigation
  63. For the majority of domains, only a fraction of the total known number of domains across the entire mouse proteome were detected to contain glycosites in this study (~6% or less of any given domain), except for Sema domains where 25 glycosites were observed from only 99 known Sema domains
  64. The Sema domain exists in a large family of secreted and transmembrane proteins called semaphorins which can function in axon guidance, and the majority of glycosites in this dataset that are in Sema domains contained high mannose glycans
  65. Investigating glycosites in various protein domains in this study is limited by the information available in UniProt, but it represents an intriguing perspective for future large-scale glycoproteomic studies
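
Illustrative note (not part of the source record): three calculations described in the abstract can be sketched in a few lines. These are glycan pair co-occurrence counts (sentence 11), the glycan heterogeneity ratio (sentences 52 and 53), and a Euclidean distance between glycosylation profiles (sentence 36). This is a minimal sketch under stated assumptions, not the authors' implementation. The input layout (a mapping from glycosite identifier to the set of glycans observed there), the toy site names, and all function names are hypothetical. The abstract does not say how the profile vectors behind the Euclidean distance were built, so the profiles below are simply assumed to be dictionaries of glycan type to a numeric value.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical toy input: glycosite identifier -> glycans observed at that site.
site_glycans = {
    "P1_N10": {"HexNAc(2)Hex(9)", "HexNAc(2)Hex(8)", "HexNAc(4)Hex(5)NeuAc(1)"},
    "P1_N55": {"HexNAc(2)Hex(9)", "HexNAc(2)Hex(8)"},
    "P2_N7": {"HexNAc(2)Hex(9)"},
}


def glycan_pair_cooccurrence(site_glycans):
    """Count, for each unordered glycan pair, how many sites carry both glycans."""
    counts = Counter()
    for glycans in site_glycans.values():
        # Sorting gives each pair one canonical key regardless of set order.
        for pair in combinations(sorted(glycans), 2):
            counts[pair] += 1
    return counts


def heterogeneity_ratio(site_glycans):
    """Glycan-glycosite combinations divided by the number of glycosites."""
    if not site_glycans:
        raise ValueError("need at least one glycosite")
    combos = sum(len(glycans) for glycans in site_glycans.values())
    return combos / len(site_glycans)


def euclidean_distance(profile_a, profile_b):
    """Euclidean distance between two profiles (assumed: glycan type -> value)."""
    keys = set(profile_a) | set(profile_b)
    return sqrt(sum((profile_a.get(k, 0.0) - profile_b.get(k, 0.0)) ** 2
                    for k in keys))
```

With the toy input, a site carrying three glycans contributes three glycan-glycosite combinations and three glycan pairs, so a higher ratio corresponds to more heterogeneity per site, as sentence 53 states.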
Output (sent_index, trigger, protein, sugar, site):
  • 1. glycopeptide, , -, -, glycopeptide
  • 12. glycosite, , -, -, glycosite
  • 15. glycosites, , -, -, glycosites
  • 18. glycosite, , -, -, glycosite
  • 20. glycosites, , -, -, glycosites
  • 21. glycosites, , -, -, glycosites
  • 22. glycoprotein, , glycoprotein, -, -
  • 23. glycoprotein, , glycoprotein, -, -
  • 24. glycosite, , -, -, glycosite
  • 24. glycosites, , -, -, glycosites
  • 25. glycosites, , -, -, glycosites
  • 26. glycosites, , -, -, glycosites
  • 26. heterogeneity, , -, -, N118, N153, and N238
  • 26. heterogeneity, , -, -, N96, N156, N193, N197, and N250
  • 28. glycosites, , -, -, N118, N153, and N238 glycosites
  • 29. glycosites, , -, -, glycosites
  • 29. glycosylated, , β1, -, -
  • 3. glycosite, , -, -, glycosite
  • 30. glycoproteins, , glycoproteins, -, -
  • 30. glycosylation, , glycoproteins, -, -
  • 32. glycosites, , -, -, glycosites
  • 35. glycoproteins, , glycoproteins, -, -
  • 35. glycosites, , -, -, glycosites
  • 38. glycoproteins, , glycoproteins, -, -
  • 39. glycoproteins, , glycoproteins, -, -
  • 4. glycopeptide, , -, -, glycopeptide
  • 40. glycoproteins, , glycoproteins, -, -
  • 45. glycopeptide, , -, -, glycopeptide
  • 49. glycosites, , -, -, glycosites
  • 5. glycosites, , -, -, glycosites
  • 5. multiply-glycosylated, , proteins, -, -
  • 50. glycosites, , -, -, glycosites
  • 51. glycosites, , -, -, glycosites
  • 52. glycan-glycosite, , -, -, glycan-glycosite
  • 52. glycosite, , -, -, glycosite site
  • 52. glycosites, , -, -, glycosites
  • 52. number, , -, -, glycosite site
  • 53. glycosites, , -, -, glycosites
  • 55. glycosites, , -, -, glycosites
  • 56. glycosites, , -, -, glycosites
  • 57. glycosites, , -, -, glycosites
  • 58. glycosites, , -, -, glycosites
  • 6. glycoproteins, , glycoproteins, -, -
  • 6. glycosite, , -, -, glycosite
  • 6. glycosites, , -, -, glycosites
  • 60. glycosites, , -, -, glycosites
  • 61. glycosites, , -, -, glycosites
  • 62. glycopeptide, , -, -, glycopeptide
  • 63. contain, , -, -, domain
  • 63. glycosites, , -, -, glycosites
  • 64. glycosites, , -, -, glycosites
  • 65. glycosites, , -, -, glycosites
  • 7. glycoproteins, , glycoproteins, -, -
  • 7. glycosites, , -, -, glycosites
  • 9. glycosylation, , -, -, sites
Output(Part-Of) (sent_index, protein, site):
  • 25. protein can, glycosites
  • 26. Interestingly, Atp1b2, glycosites
  • 28. β2, N118, N153, and N238 glycosites
  • 32. proteins, glycosites
  • 56. -, glycosites
  • 58. CUB, glycosites
  • 58. Cadherin, glycosites
  • 58. Laminin, glycosites
  • 9. proteins, sites
*Output_Site_Fusion* (sent_index, protein, sugar, site):
  • 26. -, -, N118, N153, and N238
  • 26. -, -, N96, N156, N193, N197, and N250
  • 28. β2, -, N118, N153, and N238 glycosites

 

 

Protein NCBI ID SENTENCE INDEX