PMCID: PMC6243375

 

    Legend: Gene, Sites, Sugar

Section : Extraction of site‐specific O‐linked glycopeptides

Content :
  1. In EXoO, proteins are first digested to generate peptides, which are then conjugated to a solid support
  2. After washing, the O‐linked glycopeptides are enzymatically released from the support by the endoprotease OpeRATOR, which requires the presence of O‐linked glycans and cleaves specifically on the N‐terminal side of O‐linked glycan‐occupied Ser or Thr (Fig 1A)
  3. To demonstrate proof of principle, bovine fetuin was analyzed and the six known O‐linked glycosylation sites documented in the UniProt database were pinpointed at Ser‐271, Thr‐280, Ser‐282, Ser‐296, Thr‐334, and Ser‐341 (Dataset EV1)
  4. In addition, a new O‐linked glycosylation site at Ser‐290 was also identified (Dataset EV1 and Appendix Fig S1)
  5. Of note, O‐linked glycans were still attached to the site‐specific O‐linked glycopeptides as confirmed by the detection of oxonium, peptide (Y0), and less commonly identified peptide + HexNAc (Y1) ions in the MS/MS spectrum (Fig 1B)
  6. The detection of oxonium ions in the MS/MS spectrum is particularly useful for obtaining the correct identification of O‐linked glycopeptides
  7. In addition, the chemical conjugation of peptides to a solid support allows efficient washing and specific enzymatic release of intact O‐linked glycopeptides
  8. As a result, 193 of the 270 assigned peptide spectrum matches (PSMs) were matched to fetuin site‐specific O‐linked glycopeptides with Ser or Thr at the peptide N‐terminus, a glycan modification, and oxonium ions in the MS/MS spectrum, indicating a specificity of approximately 71.5% for O‐linked glycopeptide enrichment using EXoO (Dataset EV1)
  9. The analysis of fetuin demonstrated the ability of EXoO to enrich and identify O‐linked glycopeptides at specific O‐linked glycosylation sites and their corresponding O‐linked glycans
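The selection criteria in sentence 8 and the localization logic in sentence 2 can be illustrated with a minimal Python sketch. It assumes each PSM has already been reduced to a stripped peptide sequence, a glycan composition string (or None), and a flag for oxonium ion detection; the `PSM` class and all function names are hypothetical and are not part of any EXoO software. The site is inferred as the N‐terminal residue of the released peptide because OpeRATOR cleaves on the N‐terminal side of the glycan‐occupied Ser or Thr.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class PSM:
    peptide: str           # stripped peptide sequence, N-terminus first
    glycan: Optional[str]  # glycan composition, e.g. "Hex(1)HexNAc(1)"; None if unmodified
    has_oxonium: bool      # True if glycan oxonium ions were seen in the MS/MS spectrum


def is_site_specific_glycopeptide(psm: PSM) -> bool:
    """Apply the three criteria from the text: N-terminal Ser/Thr, a glycan, oxonium ions."""
    return psm.peptide[:1] in ("S", "T") and psm.glycan is not None and psm.has_oxonium


def enrichment_specificity(psms: Iterable[PSM]) -> float:
    """Percentage of assigned PSMs that meet all three criteria."""
    psms = list(psms)
    if not psms:
        raise ValueError("no PSMs supplied")
    hits = sum(is_site_specific_glycopeptide(p) for p in psms)
    return 100.0 * hits / len(psms)


def locate_site(protein_seq: str, peptide: str) -> Tuple[str, int]:
    """Return (residue, 1-based position) of the peptide's N-terminal residue.

    Because OpeRATOR cleaves N-terminal to the glycan-occupied Ser/Thr, this
    residue is taken as the O-linked site. Only the first match is used, so
    peptides from repeated sequences map ambiguously.
    """
    start = protein_seq.find(peptide)
    if start < 0:
        raise ValueError("peptide not found in protein sequence")
    return protein_seq[start], start + 1
```

With 193 qualifying PSMs out of 270, `enrichment_specificity` returns about 71.48, which rounds to the 71.5% reported above. Because `str.find` returns only the first match, peptides from tandem repeats would need extra handling.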
*Output_Site_Fusion* (sent_index, protein, sugar, site):
Section : Large‐scale and precision mapping of the O‐linked glycoproteome in human kidney tissues, T cells, and serum sample

Content :
  1. EXoO was benchmarked using human kidney tissue, T cells, and serum to determine the performance of the method in samples of differing protein complexity
  2. To do this, O‐linked glycopeptides were extracted using EXoO and fractionated into 24 fractions and then subjected to LC‐MS/MS analysis (Fig 2A)
  3. To study kidney tissue, paired tumor and normal tissues were collected from three patients
  4. The proteins extracted from these tissues were pooled separately to generate two samples, tumor and normal
  5. After analysis with 1% false discovery rate (FDR) at PSM level, 35,848 PSMs were assigned to 2,804 O‐linked glycopeptides containing 1,781 O‐linked glycosylation sites from 592 glycoproteins (Dataset EV2)
  6. In total, 112 spectra with different sequences, charge states, peptide lengths, scores, and glycan compositions were annotated (Appendix Fig S2)
  7. When the EXoO approach was applied to the analysis of T cells, 4,623 PSMs were assigned to 1,982 O‐linked glycopeptides containing 1,295 O‐linked glycosylation sites from 590 glycoproteins (Dataset EV3)
  8. Finally, we studied human serum, which contains a number of highly glycosylated proteins and has previously been subjected to detailed mapping of N‐linked glycosylation sites and N‐linked glycans, but for which there has been little success in mapping O‐linked glycosylation sites and O‐linked glycans (Zhang et al, 2005; Stumpo & Reinhold, 2010; Yabu et al, 2014; Darula et al, 2016; Hoffmann et al, 2016)
  9. With 1% FDR, 6,157 PSMs were assigned to 1,060 O‐linked glycopeptides containing 732 O‐linked glycosylation sites from 306 glycoproteins (Dataset EV4)
  10. This analysis of human tissue, T cells, and serum demonstrated that EXoO is a highly effective tool for accessing the O‐linked glycoproteome in different types of samples
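Each sample above is summarized as counts of PSMs, glycopeptides, sites, and glycoproteins. The sketch below shows one way such counts could be collapsed from FDR‐filtered PSMs; it assumes that a glycopeptide is a unique combination of peptide sequence and glycan composition, which may not match the exact definition used to produce Datasets EV2 to EV4. The `SitePSM` record and `summarize` function are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple


class SitePSM(NamedTuple):
    protein: str  # protein accession or gene name
    peptide: str  # stripped peptide sequence (N-terminal residue is the site)
    glycan: str   # glycan composition, e.g. "Hex(1)HexNAc(1)"
    site: int     # 1-based position of the glycosylated N-terminal residue


def summarize(psms: Iterable[SitePSM]) -> Dict[str, int]:
    """Collapse FDR-filtered PSMs into the four counts reported per sample."""
    psms = list(psms)
    # Assumption: a glycopeptide is a unique (peptide, glycan composition) pair.
    glycopeptides = {(p.peptide, p.glycan) for p in psms}
    # A site is a unique (protein, position) pair, regardless of glycan.
    sites = {(p.protein, p.site) for p in psms}
    proteins = {p.protein for p in psms}
    return {
        "PSMs": len(psms),
        "glycopeptides": len(glycopeptides),
        "sites": len(sites),
        "glycoproteins": len(proteins),
    }
```

Because several PSMs can support one glycopeptide and several glycoforms can share one site, the counts shrink at each level, as in the 35,848 → 2,804 → 1,781 → 592 progression for kidney tissue.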
*Output_Site_Fusion* (sent_index, protein, sugar, site):
Section : Specificities of OpeRATOR for peptides and O‐linked glycans

Content :
  1. To define the peptide and O‐linked glycan cleavage specificities of OpeRATOR in complex samples, sequential ETD/HCD‐MS2 analysis was performed on serum O‐linked glycopeptides generated using the EXoO method
  2. With 1% FDR, HCD‐MS2 and ETD‐MS2 identified 85 and 40 unique intact glycopeptides, respectively, with 27 glycopeptides identified in both modes (Dataset EV5)
  3. Next, the confidence of localizing the O‐linked glycosylation site to the first amino acid position of each glycopeptide was assessed in the PSMs generated by ETD‐MS2
  4. Among the 113 PSMs from ETD‐MS2 analysis, 105 PSMs were assigned core 1 O‐linked glycans attached at glycosylation sites localized to the first amino acid position of the glycopeptide, with ptmRS site probabilities above 99% (Dataset EV5)
  5. In seven PSMs, the O‐linked glycosylation sites carrying core 1 O‐linked glycans could not be localized from the ETD spectra (Dataset EV5)
  6. In one PSM, the O‐linked glycosylation site was assigned to a serine at the sixth amino acid position (Dataset EV5)
  7. However, a second PSM of the same precursor was assigned to the first threonine residue (Dataset EV5)
  8. The observation of two PSMs of the same precursor with different site localizations suggested that site assignment for this glycopeptide might not be confident
  9. Therefore, the ETD‐MS2 data indicated that OpeRATOR cleaves on the N‐terminal side of O‐linked glycosylation sites carrying core 1 glycans
  10. HCD‐MS2 appeared to identify more unique glycopeptides than ETD‐MS2, possibly because trypsin and OpeRATOR digestion generated shorter glycopeptides with low charge states, whereas glycopeptides identified by ETD‐MS2 all carried a charge of +3 or higher (Dataset EV5)
  11. Therefore, one advantage of the EXoO method for O‐linked glycoproteomic analysis is efficient localization of O‐linked glycosylation sites, enabled by the high cleavage specificity of OpeRATOR as confirmed by ETD‐MS2
  12. The precise specificity of OpeRATOR for different O‐linked glycans remains unclear
  13. Analysis of our data from tissue, serum, and cells revealed that approximately 69% of total PSMs contained the glycan composition Hex(1)HexNAc(1), which is most likely the core 1 mucin‐type glycan Gal‐GalNAc
  14. Therefore, it is possible to define these O‐linked glycopeptides as carrying Hex(1)HexNAc(1), most likely Gal‐GalNAc with or without sialic acid, at the site of O‐linked glycosylation
  15. These data could also be explained by the core 1 structure Hex(1)HexNAc(1) being the major glycan composition of site‐specific O‐linked glycopeptides, as it is prevalent in a wide range of glycoproteins from different cell types, whereas other core structures are relatively restricted to specific tissue and cell types (Brockhausen & Stanley, 2015)
  16. However, the fact that other glycoforms accounted for approximately 31% of total identified glycan compositions argues that further investigation is needed to definitively establish the glycoform specificity of OpeRATOR
  17. In addition, the possibility of multiple glycans on a glycopeptide demands caution when interpreting the data to define site‐specific glycan compositions
  18. For example, two Hex(1)HexNAc(1) glycans on a glycopeptide would yield a glycan composition of Hex(2)HexNAc(2) in the result
  19. EXoO may miss O‐linked glycosylation sites located on peptides whose length is unsuitable for identification
  20. It can be anticipated that using enzymes other than trypsin for generating peptides will increase the identification of O‐linked glycosylation sites (Choudhary et al, 2003)
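Sentences 17 and 18 note that several glycans on one glycopeptide are reported as a single summed composition. The following sketch, assuming compositions are written in the `Name(count)` form used above, parses and combines compositions and shows that two Hex(1)HexNAc(1) glycans and one Hex(2)HexNAc(2) glycan have the same total residue mass, which is why precursor mass alone cannot tell them apart. The residue masses are standard monoisotopic values; the function names are illustrative.

```python
import re
from collections import Counter
from typing import Iterable

# Standard monoisotopic residue masses (Da) of common monosaccharides.
RESIDUE_MASS = {"Hex": 162.05282, "HexNAc": 203.07937, "NeuAc": 291.09542, "Fuc": 146.05791}
TOKEN = re.compile(r"([A-Za-z]+)\((\d+)\)")


def parse_composition(text: str) -> Counter:
    """Parse a composition such as "Hex(1)HexNAc(1)" into residue counts."""
    counts: Counter = Counter()
    for name, n in TOKEN.findall(text):
        counts[name] += int(n)
    return counts


def format_composition(counts: Counter) -> str:
    """Inverse of parse_composition; residues are written in alphabetical order."""
    return "".join(f"{name}({counts[name]})" for name in sorted(counts) if counts[name])


def combine(compositions: Iterable[str]) -> str:
    """Composition reported when several glycans sit on one glycopeptide."""
    total: Counter = Counter()
    for text in compositions:
        total.update(parse_composition(text))
    return format_composition(total)


def glycan_mass(text: str) -> float:
    """Summed residue mass of a composition; different splits give the same value."""
    return sum(RESIDUE_MASS[name] * n for name, n in parse_composition(text).items())
```

Distinguishing the two cases therefore has to rely on fragment ions or on site‐localizing spectra rather than on the composition itself.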
*Output_Site_Fusion* (sent_index, protein, sugar, site):
Section : Characterizing the O‐linked glycoproteome

Content :
  1. Our large‐scale analysis mapped 3,055 O‐linked glycosylation sites from 1,060 glycoproteins in kidney tissues, T cells, and serum (Dataset EV6)
  2. To compare the EXoO‐identified sites with those reported previously, 2,746 reported O‐GalNAc sites were collected from the O‐GalNAc human SimpleCell glycoproteome DB (Steentoft et al, 2011, 2013), PhosphoSitePlus (Hornbeck et al, 2015), and the UniProt database (UniProt Consortium, 2018)
  3. Remarkably, EXoO identified 2,580 novel O‐linked glycosylation sites, an increase of approximately 94% over the known sites, which, however, were mapped primarily using engineered cell lines
  4. To determine the sample‐specific O‐linked glycoproteome, the distribution of EXoO‐identified peptides across samples was determined
  5. Kidney tissue and T cells had larger numbers of unique peptides than serum, and more than half of the peptides detected in serum were also identified in the tissue samples, possibly because of the presence of serum in tissue (Fig 2B)
  6. To visualize the relative abundance of peptides in different samples, the PSM numbers of peptides, which are suggestive of relative abundance, were clustered by unsupervised hierarchical clustering (Fig 2C)
  7. This showed not only that the peptides differed between samples but also that their relative abundances diverged markedly between samples (Fig 2C)
  8. Interestingly, immunoglobulin heavy constant alpha 1 (IGHA1) had the highest PSM number in the normal tissue and serum but the second highest in the tumor tissue, where versican core protein (VCAN) had the highest PSM number, suggesting the relatively high abundance of both proteins for detection and aberrant O‐linked glycosylation of VCAN in tumor tissue
  9. In the case of IGHA1, four of the five known sites on Ser residues and two new sites on Thr residues were mapped, supporting EXoO's capacity both to localize known O‐linked glycosylation sites and to discover new ones
  10. Overall, these data suggest that protein O‐linked glycosylation is highly dynamic and may exhibit a disease‐specific signature
  11. To identify possible O‐linked glycosylation motifs, the amino acids at and surrounding (±7 amino acids) 3,042 of the sites mapped in this study were analyzed
  12. O‐linked glycan addition at Thr and Ser accounted for 67.6 and 22.4% of the sites, respectively (Fig 2D)
  13. Analysis of the surrounding sequence motifs revealed that Pro was overrepresented at the +3 and −1 positions, irrespective of which residue (Thr or Ser) was glycosylated and of sample type (Fig 2D and Appendix Fig S2)
  14. Overall enrichment of Pro was observed in the amino acids surrounding O‐linked glycosylation sites (Appendix Fig S2)
  15. Thirteen O‐linked glycosylation sites were excluded from the motif analysis because they were located close to the termini of the proteins concerned and consequently lacked enough surrounding amino acids for full motif analysis
  16. Gene ontology (GO) analysis of EXoO‐identified glycoproteins showed that the extracellular space, the cell surface, the ER lumen, and the Golgi membrane were the major cellular components for O‐linked glycoproteins (Fig 2E)
  17. Analysis of biological process and molecular function suggested diverse activities associated with O‐linked glycoproteins, consistent with their important roles in different aspects of biology (Appendix Fig S3)
  18. Specifically, extracellular matrix organization, cell adhesion, and platelet degranulation were the biological processes most represented in the glycoproteins identified (Appendix Fig S3), whereas heparin binding, calcium ion binding, and integrin binding were the top molecular functions identified (Appendix Fig S3)
  19. To survey the positional distribution of the identified O‐linked glycosylation sites, the relative position of each site in its protein was determined with respect to the N‐terminus of the glycoprotein (Fig 2F lower panel)
  20. In addition, the frequency of sites at each relative position in proteins was calculated (Fig 2F upper panel)
  21. The sites were distributed relatively evenly across protein sequences but were less frequent at protein termini (Fig 2F upper panel)
  22. Strikingly, 20 proteins contained more than 20 sites
  23. The five proteins with the highest number of sites are enlarged for clear visualization in Fig 2F middle panel
  24. Some of these heavily glycosylated proteins, such as VCAN, mucin‐1 (MUC1), and aggrecan core protein (ACAN), showed continuous clusters of many vicinal sites covering nearly the whole protein
  25. In others, such as apolipoprotein(a) (LPA) and tenascin‐X (TNXB), the clusters of sites were relatively short but evenly distributed
  26. Among these heavily O‐linked glycoproteins, VCAN contained the highest number of sites, 165, with distinct peptide sequences surrounding the sites, whereas MUC1 contained 161 sites, the second highest, but derived from only six distinct sequence repeats
  27. ACAN, LPA, and TNXB were heavily O‐linked glycosylated, with 82, 73, and 44 sites, respectively
  28. Analysis of the site distribution on glycoproteins demonstrated the advantage of EXoO for studying heavily O‐linked glycoproteins, which are difficult to analyze with current analytical approaches owing to their structural complexity and resistance to enzymatic digestion
  29. To determine the localization of the sites relative to protein structures, protein topological and structural annotations were retrieved from the UniProt database and mapped to the EXoO‐identified sites
  30. Approximately 28.3 and 10.3% of the sites were predicted to localize in extracellular and luminal regions, respectively (Appendix Fig S4)
  31. In contrast, only approximately 1.6% of the sites were predicted to lie in the cytoplasmic compartment (Appendix Fig S4)
  32. Approximately 5% of the sites were associated with Ser/Thr/Pro‐rich regions, whereas weaker correlations were seen with other protein structures, including repeats, coiled‐coils, beta strands, helices, turns, and signal peptides (Appendix Fig S4)
  33. Almost no correlation of the sites with intramembrane and transmembrane regions of proteins was observed
  34. The structural correlation of the sites with extracellular, luminal, and Ser/Thr/Pro‐rich regions is consistent with the presence of O‐linked glycoproteins in the extracellular space, at the cell surface, and in the ER and Golgi lumen, where they serve various functions
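The motif analysis described in sentences 11 to 15 (±7 residues around each site, with sites too close to a protein terminus excluded) can be sketched as follows. The functions are hypothetical; they report the raw frequency of a residue at each offset, whereas a claim of overrepresentation would additionally require comparison against a background set of windows, which is not specified here.

```python
from typing import Dict, List, Optional

FLANK = 7  # ±7 residues around each site, as described in the text


def site_window(protein_seq: str, site: int, flank: int = FLANK) -> Optional[str]:
    """Return the (2*flank + 1)-residue window centred on a 1-based site.

    Returns None when the site lies too close to either terminus, mirroring
    the exclusion of such sites from the motif analysis.
    """
    i = site - 1
    if i - flank < 0 or i + flank >= len(protein_seq):
        return None
    return protein_seq[i - flank : i + flank + 1]


def residue_frequency(windows: List[str], residue: str = "P", flank: int = FLANK) -> Dict[int, float]:
    """Fraction of windows carrying `residue` at each offset from -flank to +flank."""
    if not windows:
        raise ValueError("no windows supplied")
    return {
        offset: sum(w[offset + flank] == residue for w in windows) / len(windows)
        for offset in range(-flank, flank + 1)
    }
```

Offset 0 is the glycosylated Ser or Thr, so a Pro enrichment at the −1 and +3 positions would show up as high values at keys `-1` and `3` relative to a background frequency.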
*Output_Site_Fusion* (sent_index, protein, sugar, site):
Section : Mapping aberrant O‐linked glycoproteome associated with human kidney tumor

Content :
  1. To identify changes in the O‐linked glycoproteome between normal and tumor kidney tissue, spectral counting label‐free quantification of the EXoO‐identified peptides was used (Fig 3A)
  2. This identified 56 O‐linked glycoproteins as exhibiting significant change using the scoring criteria of at least a twofold change together with a difference of at least 10 PSMs between normal and tumor samples (Fig 3B and Appendix Table S1)
  3. The most striking change observed was the dramatic increase in O‐linked glycans, primarily the core 1 structure Hex(1)HexNAc(1), across the 163 and 82 sites mapped in VCAN and ACAN, respectively (Fig 3C)
  4. For example, 35 PSMs were found at Thr‐2983 of VCAN from tumor tissue but none in normal tissue samples (Fig 3C red asterisk in upper panel)
  5. Similarly, 109 PSMs were detected at Thr‐374 of ACAN from tumor tissue and only five in normal tissue (Fig 3C red asterisk in lower panel)
  6. Owing to the unclear substrate specificity of OpeRATOR for O‐linked glycans, site‐specific O‐linked glycosylation by glycans other than core 1 glycans merits future investigation
  7. VCAN and ACAN are known proteoglycans that carry long sugar chains under normal conditions (Binder et al, 2017)
  8. The extensive addition of short core 1 O‐linked glycans to these two proteins would be expected to enhance their mucin‐type properties, such as resistance to enzymatic digestion and increased stiffness, which in turn may alter their biological and biomechanical properties and contribute to remodeling of the tumor microenvironment (Kufe, 2009)
  9. In addition to VCAN and ACAN, an average 4.3‐fold increase was detected across 14 sites in fibulin‐2 (FBLN2), a glycoprotein known to be involved in stabilizing the VCAN and ACAN network for tumor growth and metastasis (Olin et al, 2001; Baird et al, 2013; Fig 3D and Appendix Table S1)
  10. Remodeling of the extracellular matrix (ECM) in tumor tissue might be further underpinned by the significant changes detected in tumor tissue in other ECM‐related proteins, such as ELN, LTBP1, LTBP2, LTBP3, EMILIN2, FN1, CDH16, and EMCN; collagens COL8A1, COL12A1, COL28A1, and COL26A1; and enzymes including ITIH2, MMP14, ADAMTS7, SERPINE1, ANPEP, PAPLN, ADAMTSL5, GGT5, and CPXM1 (Fig 3D and Appendix Table S1)
  11. This type of fine‐tuning of the ECM network might be critical to supporting tumorigenesis and tumor progression (Lu et al, 2012)
  12. In addition, carbonic anhydrase 9 (CA9) and angiopoietin‐related protein 4 (ANGPTL4), proteins known to respond to tumor hypoxia (Sedlakova et al, 2014; Carbone et al, 2018), showed 13‐ and 3.6‐fold increases, respectively, in tumor tissue (Fig 3D and Appendix Table S1)
  13. Finally, EGF‐containing fibulin‐like extracellular matrix protein 1 (EFEMP1), which binds to epidermal growth factor receptor (EGFR) to promote tumor growth, invasion, and metastasis (Yin et al, 2016), showed a threefold increase across seven O‐linked glycosylation sites in tumor tissue (Fig 3D and Appendix Table S1)
  14. By contrast, both IGHA1 and MUC1, known O‐linked glycoproteins, showed no detectable change between normal and tumor tissue, indicating that the changes in O‐linked glycoproteins observed in tumor tissue are highly selective
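The significance criteria in sentence 2 combine a fold‐change threshold with a minimum absolute difference in PSM counts. The sketch below assumes raw PSM counts per glycoprotein from the pooled tumor and normal samples. How the original analysis treated proteins detected in only one tissue is not stated, so the sketch assumes such cases pass the fold‐change test whenever the PSM difference is large enough. The function name is hypothetical.

```python
def is_significant(tumor_psms: int, normal_psms: int,
                   min_fold: float = 2.0, min_diff: int = 10) -> bool:
    """Apply both criteria from the text: >= min_fold change and >= min_diff PSMs."""
    if abs(tumor_psms - normal_psms) < min_diff:
        return False
    low, high = sorted((tumor_psms, normal_psms))
    if low == 0:
        # Assumption: detection in only one condition counts as exceeding min_fold.
        return True
    return high / low >= min_fold
```

Requiring both conditions filters out large fold changes driven by a handful of PSMs (for example, 4 versus 1) as well as large absolute differences that represent only a modest relative change.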
*Output_Site_Fusion* (sent_index, protein, sugar, site):
Section : EXoO procedure for mapping the site‐specific O‐linked glycoproteome. Schematic of the EXoO process for precision mapping of O‐linked glycosylation sites and site‐specific glycans

Content :
  1. MS/MS spectrum of the site‐specific O‐linked glycopeptide at Ser‐296 in bovine fetuin
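The fetuin spectrum is interpreted through glycan oxonium ions and the peptide (Y0) and peptide + HexNAc (Y1) ions. A small sketch of how these m/z values relate, assuming standard monoisotopic residue masses and a proton mass of 1.007276 Da; the 20 ppm matching tolerance and the function names are illustrative assumptions, not settings taken from this study.

```python
from typing import Iterable

PROTON = 1.007276    # proton mass (Da)
HEX = 162.05282      # monoisotopic residue mass of a hexose (Da)
HEXNAC = 203.07937   # monoisotopic residue mass of N-acetylhexosamine (Da)


def oxonium_mz(*residues: float) -> float:
    """Singly charged oxonium ion m/z for a glycan fragment built from the given residues."""
    return sum(residues) + PROTON


def y1_from_y0(y0_mz: float, charge: int) -> float:
    """m/z of peptide + HexNAc (Y1) given the bare-peptide Y0 ion at the same charge."""
    return y0_mz + HEXNAC / charge


def has_oxonium(peaks: Iterable[float], targets: Iterable[float], ppm: float = 20.0) -> bool:
    """True if any observed peak lies within `ppm` of any target oxonium m/z."""
    targets = list(targets)
    return any(abs(mz - t) <= t * ppm * 1e-6 for mz in peaks for t in targets)
```

`oxonium_mz(HEXNAC)` gives the familiar HexNAc oxonium ion near m/z 204.087, and `oxonium_mz(HEX, HEXNAC)` the Hex‐HexNAc ion near m/z 366.140, both diagnostic of the core 1 glycan discussed in the text.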
*Output_Site_Fusion* (sent_index, protein, sugar, site):
Section : Mapping of O‐linked glycoproteome. Workflow to map the site‐specific O‐linked glycoproteome in human samples

Content :
  1. Distribution of EXoO identified peptides in different samples
  2. Unsupervised hierarchical clustering of samples and peptide PSM numbers, showing that EXoO‐identified peptides differed in distribution and relative abundance between samples
  3. Euclidean distance and city block distance were used for clustering of samples and peptides, respectively
  4. Analysis of amino acid sequence surrounding O‐linked glycosylation sites
  5. Cellular component analysis of O‐linked glycoproteins
  6. Landscape distribution of the O‐linked glycosylation sites and their frequency in proteins
  7. Relative position was calculated by dividing the amino acid position of each site by the total number of amino acids in the protein and multiplying by 100 to express it as a percentage
  8. Lower panel: Relative position of the sites in proteins
  9. Proteins with sites close to the protein N‐terminus were ranked at the top
  10. Middle panel: Relative position of sites in the five proteins with the highest number of sites
  11. Proteins with more sites were ranked at the top
  12. Upper panel: Frequency of the sites in proteins
  13. The y‐axis of upper panel shows the number of sites at the relative position of proteins
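The relative‐position formula in legend item 7 and the frequency plot in the upper panel can be sketched as follows. The 10‐bin histogram and the function names are assumptions chosen for illustration; the actual bin width used for Fig 2F is not given here.

```python
from typing import Dict, List


def relative_position(site: int, protein_length: int) -> float:
    """Site position as a percentage of protein length (legend formula)."""
    return 100.0 * site / protein_length


def position_histogram(sites_by_protein: Dict[str, List[int]],
                       lengths: Dict[str, int], n_bins: int = 10) -> List[int]:
    """Count sites per relative-position bin across all proteins."""
    counts = [0] * n_bins
    width = 100.0 / n_bins
    for protein, sites in sites_by_protein.items():
        for site in sites:
            rel = relative_position(site, lengths[protein])
            # A site at the last residue (100%) falls into the final bin.
            counts[min(int(rel / width), n_bins - 1)] += 1
    return counts
```

Normalizing by protein length lets sites from proteins of very different sizes share one axis, which is what allows the terminal depletion described in the text to be seen across the whole dataset.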
*Output_Site_Fusion* (sent_index, protein, sugar, site):
Section : Comparative analysis of the O‐linked glycoproteome of normal and tumor‐derived kidney tissue. Data processing steps in label‐free quantification to identify differential glycoproteins between tumor and normal kidney tissues

Content :
  1. Volcano plot of differentially expressed O‐linked glycoproteins between tumor and normal tissues
  2. A total of 592 glycoproteins are plotted in the volcano plot according to their fold change (log2) and the number of differential PSMs between cancer and normal (log10)
  3. Red and green dots indicate significantly upregulated and downregulated proteins, respectively
  4. Extensive addition of O‐linked glycans at sites covering versican core protein (VCAN) (upper panel) and aggrecan core protein (ACAN) (lower panel)
  5. Fold change for each site between tumor and normal is shown as connected red dots
  6. Sites detected only in tumor or only in normal tissue were assigned a fold change of 2 or 0.5, respectively
  7. Red asterisks indicate sites with the highest PSM number divergent between tumor and normal tissues
  8. The 56 O‐linked glycoproteins with significant change between tumor and normal
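Legend item 6 describes a capping convention for sites seen in only one tissue. A minimal sketch, assuming per‐site PSM counts as input; for sites detected in both tissues it simply returns the tumor/normal ratio, which is an assumption about how the plotted fold change was computed. The function name is hypothetical.

```python
def site_fold_change(tumor_psms: int, normal_psms: int) -> float:
    """Tumor/normal PSM ratio for one site, with the legend's capping convention."""
    if tumor_psms == 0 and normal_psms == 0:
        raise ValueError("site not detected in either tissue")
    if normal_psms == 0:
        return 2.0   # detected only in tumor
    if tumor_psms == 0:
        return 0.5   # detected only in normal
    return tumor_psms / normal_psms
```

Under this convention a site seen only in tumor tissue (for example, 35 PSMs versus none at Thr‐2983 of VCAN) plots at 2, while Thr‐374 of ACAN (109 versus 5 PSMs) yields a ratio of 21.8.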
*Output_Site_Fusion* (sent_index, protein, sugar, site):
Section : A novel tool (EXoO) has been developed for the combined mapping of O‐linked glycosylation sites in proteins and the definition of the O‐linked glycans at those sites

Content :
  1. The main advantages of EXoO are (i) applicability to clinical samples including tissue, body fluid, and primary cells; (ii) precise localization of O‐linked glycosylation sites; (iii) simultaneous definition of O‐linked glycans at the glycosylation sites; and (iv) no requirement for ETD mass spectrometry for site localization
  2. The effectiveness of the method derives from the specific enrichment of O‐linked glycopeptides at specific glycosylation sites through the tandem action of a solid support and the O‐linked glycan‐specific OpeRATOR enzyme
  3. The solid support specifically binds peptides and maximizes the removal of unbound molecules, while the OpeRATOR enzyme specifically cleaves on the N‐terminal side of glycan‐occupied Ser and Thr residues of the bound peptides, releasing peptides that carry the O‐linked glycosylation site at their N‐terminus and thereby enabling site localization
  4. The O‐linked glycan remains attached to the released O‐linked glycopeptides and provides oxonium ions in the MS/MS spectrum to facilitate confident identification
  5. As stated by the manufacturer, sialidase facilitates efficient cleavage by OpeRATOR, and it was therefore included in the procedure to improve mapping of O‐linked glycosylation sites
  6. The sialidase step could be omitted if the focus of a study is to define site‐specific glycan structures that include sialic acid
  7. Analysis of the more than 3,000 O‐linked glycosylation sites identified by EXoO revealed many glycoproteins not previously known to be modified by O‐linked glycosylation
  8. Many of these were mucin‐type glycoproteins, whose mucin domains contain clusters of dense O‐linked glycans that protect the underlying peptide backbone from normal proteolytic digestion; consequently, typical proteomic analyses would not provide detailed information on many of these domains
  9. By contrast, OpeRATOR is naturally suited to dissecting such mucin‐type O‐glycan‐rich regions
  10. Therefore, EXoO allowed detailed mapping of more than one hundred sites each on VCAN and MUC1
  11. MUC1 has been reported to be an important molecule in many research areas including different cancers, immunity, and immunotherapy (Hanisch, 2005; Tarp & Clausen, 2008; Beatson et al, 2016; Hanson & Hollingsworth, 2016)
  12. EXoO is therefore advantageous for revealing new biological insights into mucin‐type glycoproteins
  13. Motif analysis of the amino acid sequences surrounding these O‐linked glycosylation sites revealed that Pro was favored at the +3 and −1 positions
  14. This is consistent with previous reports, which lends some validation to the O‐linked glycosylation sites identified using EXoO (Christlet & Veluraja, 2001; Julenius et al, 2005)
  15. However, achieving a better understanding of the structural and functional roles of the O‐linked glycans in these proteins certainly merits future investigation
  16. Compared with other O‐linked glycoproteomic methods (Nilsson et al, 2009; Steentoft et al, 2011; Woo et al, 2015; Darula et al, 2016; Hoffmann et al, 2016; King et al, 2017; Qin et al, 2017), EXoO identified a large number of O‐linked glycosylation sites and glycoproteins, including 2,580 novel O‐linked glycosylation sites that are not reported in three major databases: the O‐GalNAc human SimpleCell glycoproteome DB (Steentoft et al, 2011, 2013), PhosphoSitePlus (Hornbeck et al, 2015), and the UniProt database (UniProt Consortium, 2018)
  17. EXoO also identified aberrant expression of O‐linked glycoproteins in kidney tumor tissue compared with normal tissue, pointing to its utility in clinical investigations
  18. Given these advantages of EXoO, it is anticipated that it will be widely applied in studies to analyze O‐linked glycosylation of proteins
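The novelty count in sentence 16 amounts to a set difference between identified and previously reported sites. The sketch assumes each site is keyed by protein accession and residue position on the same reference sequence numbering; in practice, isoform and accession mapping between databases would need to be reconciled first. Function names are hypothetical.

```python
from typing import Iterable, Set, Tuple

Site = Tuple[str, int]  # (protein accession, 1-based residue position)


def novel_sites(identified: Iterable[Site], reported: Iterable[Site]) -> Set[Site]:
    """Sites identified here that are absent from the pooled reference set."""
    return set(identified) - set(reported)


def percent_increase(n_new: int, n_known: int) -> float:
    """Novel sites expressed as a percentage of previously reported sites."""
    return 100.0 * n_new / n_known
```

`percent_increase(2580, 2746)` gives about 93.95, matching the roughly 94% increase over the 2,746 reported sites stated in the characterization section.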
*Output_Site_Fusion* (sent_index, protein, sugar, site):

 

 

Protein NCBI ID SENTENCE INDEX