Glycosylation of the viral attachment protein of avian coronavirus is essential for host cell and receptor binding

Avian coronaviruses, including infectious bronchitis virus (IBV), are important respiratory pathogens of poultry. The heavily glycosylated IBV spike protein is responsible for binding to host tissues. Glycosylation sites in the spike protein are highly conserved across viral genotypes, suggesting an important role for this modification in the virus life cycle. Here, we analyzed the N-glycosylation of the receptor-binding domain (RBD) of IBV strain M41 spike protein and assessed the role of this modification in host receptor binding. Ten single Asn–to–Ala substitutions at the predicted N-glycosylation sites of the M41–RBD were evaluated along with two control Val–to–Ala substitutions. CD analysis revealed that the secondary structure of all variants was retained compared with the unmodified M41–RBD construct. Six of the 10 glycosylation variants lost binding to chicken trachea tissue and an ELISA-presented α2,3-linked sialic acid oligosaccharide ligand. LC/MSE glycomics analysis revealed that glycosylation sites have specific proportions of N-glycan subtypes. Overall, the glycosylation patterns of most variant RBDs were highly similar to those of the unmodified M41–RBD construct. In silico docking experiments with the recently published cryo-EM structure of the M41 IBV spike protein and our glycosylation results revealed a potential ligand receptor site that is ringed by four glycosylation sites that dramatically impact ligand binding. Combined with the results of previous array studies, the glycosylation and mutational analyses presented here suggest a unique glycosylation-dependent binding modality for the M41 spike protein.

Avian coronaviruses of poultry cause significant disease, with subsequent economic losses, in several commercially farmed bird species. Avian infectious bronchitis virus (IBV) is a gammacoronavirus that predominantly affects domestic fowl, primarily chickens (Gallus gallus). The virus initially infects the epithelium of the upper airways, and, depending on the IBV strain, disease outcomes range from mild respiratory disease to kidney failure and death (1).
The viral envelope of IBV contains the highly glycosylated spike (S) protein, which is post-translationally cleaved into two domains, S1 and S2. This S glycoprotein is the major adhesion molecule of the virus. It is a class I viral fusion protein, in which the variable S1 domain is involved in host cell receptor binding and the more conserved S2 domain mediates fusion of the virion with the cellular membrane (2,3). The role of the spike in host cell attachment and in the induction of protective immunity has been reviewed (4). The spike protein monomer is a transmembrane glycoprotein with a molecular mass of 128 kDa before glycosylation (3). A cleavable N-terminal signal peptide (5) directs the S protein to the endoplasmic reticulum (ER), where it is extensively modified with N-linked glycans (6,7). After glycosylation in the ER, the monomers oligomerize to form trimers (6-9).
The N-terminal 253 amino acids of S1 were shown to encompass the receptor-binding domain (RBD) of IBV strain M41 (10), which interacts with α2,3-linked sialylated glycans present on the host cell surface (11,12). Ten N-linked glycosylation sites are predicted to exist on the M41-RBD (5), most of which are highly conserved (Fig. S1). Notably, 8 of the 10 sites are 95-100% conserved. Sites Asn-33 and Asn-59 were less conserved, at 80 and 25%, respectively. However, each had a nearby alternative site, and together each pair was highly conserved. Alternative site Asn-36 was conserved 50% of the time, and at least one of Asn-33 and Asn-36 was present in 94% of the sequences. Alternative site Asn-57 was conserved at 73%. In 97% of the sequences, either Asn-59 or Asn-57 was present, but never both. Therefore, all 10 sites, including the alternatives, likely serve important functions.
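The conservation figures above are derived from an alignment of S1 sequences (Fig. S1). As a minimal sketch, assuming a gapped multiple alignment supplied as equal-length strings, a sequon at a given alignment column can be detected with the standard N-X-S/T rule (X not Pro), and the "at least one of" and "both" patterns for a site and its alternative can be tallied the same way. The helper names (`has_sequon`, `percent`) and the toy alignment are hypothetical; this is an illustration, not the authors' pipeline.

```python
import re

# N-glycosylation sequon: Asn, then any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons (e.g. "NNST") all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(seq):
    """1-based positions of Asn residues that start an N-X-S/T sequon (X != Pro)."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


def has_sequon(row, col):
    """True if aligned sequence `row` has a sequon whose Asn sits at 1-based column `col`.

    Gap characters ('-') after the Asn are skipped, so a sequon interrupted
    only by alignment gaps is still detected.
    """
    if row[col - 1].upper() != "N":
        return False
    following = row[col:].replace("-", "").upper()[:2]  # the X and S/T residues
    return len(following) == 2 and following[0] != "P" and following[1] in "ST"


def percent(rows, predicate):
    """Percent of aligned sequences for which `predicate` is true."""
    return 100.0 * sum(1 for r in rows if predicate(r)) / len(rows)


# Toy alignment (hypothetical): a site at column 2 and an alternative at column 7.
aln = ["MNGTKANVS", "MNGTKAQVS", "MQGTKANVT", "MNPTKANVS"]
site_a = lambda r: has_sequon(r, 2)
site_b = lambda r: has_sequon(r, 7)

print(sequon_positions(aln[0]))                          # [2, 7]
print(percent(aln, site_a), percent(aln, site_b))        # 50.0 75.0
print(percent(aln, lambda r: site_a(r) or site_b(r)))    # 100.0
print(percent(aln, lambda r: site_a(r) and site_b(r)))   # 25.0
```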
The N-glycosylation of viral glycoproteins is known to modulate the ability of viruses to infect host cells and to be recognized by the host's immune system (13). Recently, Zheng et al. (14) studied extracted spike proteins and mutant viruses carrying Asn-to-Asp (asparagine to aspartate) and Asn-to-Gln (asparagine to glutamine) mutations at 13 predicted glycosylation sites in the S protein of the Beaudette IBV strain. Their results indicate that glycosylation at some sites on the Beaudette S1-RBD was important for viral fusion and infectivity, which may include host recognition. However, the Beaudette strain is a cell culture-adapted strain that is nonvirulent in chickens (15) and does not bind chicken tissues known to be important for infectivity (11), making it difficult to extrapolate these results to clinically relevant IBVs.
To characterize the role that glycosylation plays when the RBD of the pathogenic IBV strain M41 interacts with host tissues, we used a combination of molecular and analytical techniques, including histochemistry, ELISA, circular dichroism (CD), MS, and docking analyses, as listed in Table 1. Systematic deletion of each glycosylation site and histochemical analysis of each variant revealed which of the 10 glycosylation sites affect the binding of the IBV S protein to host epithelial tissue. Site occupancy analysis by LC/MSE indicated that at least 9 of the 10 predicted N-glycosylation sites in the M41-RBD are glycosylated. Analysis of site occupancy and signature N-glycan patterns at each site, in combination with single glycosylation site deletions, provided insight into the biological relevance of each site for binding to host tissue receptors. Overall, our data confirm that N-glycosylation plays a critical and likely unique role in binding of the IBV spike domain to its host tissue receptors.

Gel electrophoresis and CD analysis indicate that M41-RBD and glycosylation variants are similarly expressed, folded, and stable
To analyze the role of M41-RBD glycosylation in receptor binding, missense mutants (Asn-to-Ala) were generated at each of the predicted N-glycosylation sites, one site at a time. Recombinantly produced glycovariant RBD proteins migrated with the same electrophoretic mobility as unmodified M41-RBD (Fig. 1). The RBD proteins were evaluated by CD spectroscopy to assess similarity to the WT secondary structure. WT M41-RBD, all 10 glycosylation-site variants, and two nonglycosylation variants, V57A and V58A, were analyzed for secondary structure differences at 25°C. Thermal melts were performed on each construct from 25 to 95°C, followed by full scans collected at 95°C and again at 25°C after the melt. Overlays of all the CD spectra can be found in Fig. S2. Visually, all spectra at all temperatures follow the same curve. The N85A spectra were generated at a higher protein concentration but aligned well with the CD spectra of all other variants when normalized to the percent of maximum signal. Likewise, all the proteins had similar broad melting curves, suggesting that they were similarly stable. Protein folding was reversible for all proteins, with comparable recovery (see CD-25°C-aftermelt-normalized in Fig. S2). Dichroweb (16) was used to calculate the percentages of α-helix, β-strand, turn, and unordered structure in the initial 25°C spectra to estimate secondary structure differences between the proteins (Fig. 2). The α-helix content varied, with the extremes being the unmodified RBD and N145A: N145A exhibited 19.5 ± 0.3% α-helix character, compared with 31.6 ± 2.4% for WT. Interestingly, N145A gave a very strong signal in the histochemical assay (Fig. 3A) and had the most distinct released-glycan signature of all the constructs.
We conclude that all proteins maintained a very similar structure, which suggests that no single N-glycosylation site is, by itself, indispensable for protein folding or stability.
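Normalizing spectra to percent of maximum signal, as was done for N85A, removes the dependence on protein concentration so that spectral shapes can be compared directly. A minimal sketch, assuming that "maximum signal" means the largest absolute ellipticity in the scan and using root-mean-square deviation as a hypothetical similarity score; the spectra are toy values, not data from this study.

```python
import math


def normalize_to_percent_max(signal):
    """Scale a CD spectrum so its largest absolute value is 100 (sign preserved)."""
    peak = max(abs(v) for v in signal)
    if peak == 0:
        raise ValueError("spectrum has no signal")
    return [100.0 * v / peak for v in signal]


def rmsd(a, b):
    """Root-mean-square deviation between two spectra sampled at the same wavelengths."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Toy spectra (mdeg); the second mimics the same protein at 3x concentration.
wt = [-10.0, -12.0, -8.0, 5.0, 20.0]
concentrated = [3 * v for v in wt]
print(normalize_to_percent_max(wt))  # [-50.0, -60.0, -40.0, 25.0, 100.0]
print(rmsd(normalize_to_percent_max(wt), normalize_to_percent_max(concentrated)))  # 0.0
```

Scaling by the absolute peak keeps the sign of each band, so the negative 208/222 nm minima and the positive ~190 nm maximum stay distinguishable after normalization.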

Six glycosylation variants abrogate binding to host tissue and sialic acid
Because we established that all variant M41-RBD proteins are folded, we investigated their abilities to bind tissue receptors. Recombinant proteins were incubated with chicken trachea tissue sections and examined by histochemical analysis. N145A, N219A, N229A, N246A, V57A, and V58A bound ciliated epithelial cells of the chicken trachea with similar staining intensity as the unmodified RBD with the most intense staining associated with the N145A construct (Fig. 3A). In contrast, binding of constructs N33A, N59A, N85A, N126A, N160A, and N194A to trachea tissue was not detectable. Removal of sialic acids by treatment of the trachea tissues with Arthrobacter ureafaciens neuraminidase (AUNA) abrogated binding of all constructs as shown in Fig. S4. These results demonstrate that glycosylation on the RBD affects binding to sialyl ligands on chicken trachea tissue.
The interaction of the variants with Neu5Ac(α2-3)Gal(β1-3)GlcNAc, a previously established ligand for M41 (11), was assayed by ELISA. Like unmodified RBD, the N145A, N219A, N229A, and N246A variants bound the ligand in a concentration-dependent manner (Fig. 3B). Binding of N33A, N59A, N85A, N126A, N160A, and N194A was significantly reduced compared with unmodified RBD and comparable with that of a negative control protein, the S1 of turkey coronavirus, which is specific for nonsialylated diLacNAc glycans (17). Fig. 3C shows the ELISA absorbance at 75 nmol of ligand for each construct. No significant difference from unmodified RBD was observed for variants N145A, N219A, N229A, and N246A (dark gray bars in Fig. 3C). All other variants (light gray bars in Fig. 3C) showed significantly lower binding, consistent with the histochemistry and ligand titration results.

Avian coronavirus glycosylation

Overall glycosylation of nonbinding variants is similar to M41-RBD
Six of the 10 single glycosylation site variants lost the ability to bind ligand. To investigate whether global changes in glycosylation may have affected binding, we analyzed the released glycans from each protein. Matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry (MS) analysis of enzymatically released and permethylated glycans allows semi-quantitative analysis of glycan compositions. The method is particularly useful for samples containing sialylated glycans, because permethylation stabilizes them. The percent abundances of glycans identified in each sample are shown in Fig. 4.
The majority of the Asn-to-Ala variants, as well as the V57A and V58A control variants, had similar MALDI-TOF MS profiles of permethylated glycans (Fig. 4). Over 100 glycan compositions were identified, ranging from high-mannose to large complex glycans. In all samples, nearly half of the glycans contained one to three sialic acid residues. The most intense glycoforms clustered in five groups of increasing complexity, as reflected by the number of N-acetylhexosamine (HexNAc) residues. These include high-mannose, complex, and hybrid forms as follows: I, Hex5-9HexNAc2 (high mannose); II, NeuAc0-1Hex5-6dHex0-1HexNAc3 (complex and hybrid); III, NeuAc0-2Hex5dHex1HexNAc4 (complex); IV, NeuAc0-1Hex6dHex1HexNAc5 (complex); and V, NeuAc2Hex7dHex1HexNAc6 (complex). High-mannose glycans were less abundant in unmodified M41 than in the variant RBDs. The N194A, N219A, and N229A variants contained diminished amounts of the group V high-mass complex glycans. The N145A variant was the most atypical, with less defined clustering in the common regions of the spectrum and higher abundances in regions where compositions had less Hex and more HexNAc overall. For instance, cluster IV was shifted from glycans with 6 hexoses (NeuAc0-1Hex6dHex1HexNAc5) to glycoforms with 3-4 hexoses (NeuAc0-1Hex3-4dHex1HexNAc5), and more abundance was observed in regions containing 6 HexNAc residues (NeuAc0-2Hex3-6dHex1HexNAc6). To better understand the difference between N145A and the other constructs, we calculated the monosaccharide percent mass and average mass for each construct. The average mass percent for glycans across all released glycan pools was Hex (45.8%), HexNAc (42.0%), dHex (5.0%), and NeuAc (7.2%). The N145A construct had the lowest amount of Hex (38.6%) and the highest amounts of HexNAc (46.0%) and NeuAc (9.8%); the former two were 2 S.D. or more from the mean (see Table S2).
This indicates that the N145A construct likely carried, on average, shorter, more branched, and more highly charged glycans than the other constructs. Two other variants had values more than 2 S.D. from the mean. N229A (normal binding) was the most abundant in Hex (53.6%) and the least abundant in HexNAc (37.5%) and dHex (3.8%), probably owing to its higher high-mannose content. N246A (normal binding) had the lowest amount of NeuAc (3.6%). This perhaps reflects the sugars missing from this variant, because in the other variants site Asn-246 was populated with many sialylated glycoforms based on site-specific analysis (Table S1).
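The monosaccharide mass percentages and the 2 S.D. criterion used above can be sketched as follows. This assumes underivatized monoisotopic residue masses (Hex 162.0528, HexNAc 203.0794, dHex 146.0579, NeuAc 291.0954 Da) weighted by relative abundance; the text does not state which masses or weighting the authors used (permethylated masses would give different numbers), and all values in the example, including the construct names `variant1`-`variant9`, are hypothetical.

```python
import statistics

# Monoisotopic residue masses (Da) of underivatized monosaccharides (assumption; see text).
RESIDUE_MASS = {"Hex": 162.0528, "HexNAc": 203.0794, "dHex": 146.0579, "NeuAc": 291.0954}


def mass_percent(pool):
    """Abundance-weighted mass percent of each monosaccharide in a glycan pool.

    `pool` is a list of (composition, relative_abundance) pairs, where a
    composition maps residue names to counts, e.g. {"Hex": 5, "HexNAc": 2}.
    """
    totals = dict.fromkeys(RESIDUE_MASS, 0.0)
    for composition, abundance in pool:
        for residue, count in composition.items():
            totals[residue] += abundance * count * RESIDUE_MASS[residue]
    grand_total = sum(totals.values())
    return {residue: 100.0 * mass / grand_total for residue, mass in totals.items()}


def flag_outliers(values, k=2.0):
    """Names whose value lies k or more sample standard deviations from the mean."""
    data = list(values.values())
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return [name for name, v in values.items() if abs(v - mean) >= k * sd]


# Hypothetical usage: a two-glycan pool, then Hex mass percent across constructs.
pool = [({"Hex": 5, "HexNAc": 2}, 0.6), ({"NeuAc": 1, "Hex": 5, "dHex": 1, "HexNAc": 4}, 0.4)]
print({k: round(v, 1) for k, v in mass_percent(pool).items()})

hex_percent = {f"variant{i}": 46.0 for i in range(1, 10)}
hex_percent["N145A"] = 38.6
print(flag_outliers(hex_percent))  # ['N145A']
```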

Glycosylation and site occupancy were similar between M41-RBD, N59A, and N145A
To assess the differences in glycosylation on a site-to-site basis, glycopeptide LC/MS analysis was carried out on unmodified M41 and two single glycosylation site variants, N59A and N145A, that represented a nonbinder and a binder of trachea tissue, respectively. M41-RBD had 10 predicted glycosylation sites, whereas the variant RBDs had nine each. N145A was also of specific interest due to the unique glycosylation pattern observed in its free glycan profile. As cleavage with trypsin alone resulted in glycopeptides with more than one glycosylation site, we also analyzed glycopeptides after an additional treatment with chymotrypsin, which resulted in one glycosite per peptide, the identification of more glycopeptides, and decreased ambiguity concerning glycosylation site assignment.
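Why a second protease separates the glycosites can be illustrated with a minimal in silico digestion. The sketch assumes simplified cleavage rules (trypsin after Lys/Arg, chymotrypsin after Phe/Tyr/Trp, neither before Pro), ignores missed cleavages and the lower-rate chymotryptic cleavage after Leu and Met, and uses a hypothetical toy sequence rather than the M41-RBD. Sequons are located on the intact protein first, so a cleavage falling inside N-X-S/T still assigns the Asn to one peptide.

```python
import re

SEQUON = re.compile(r"(?=N[^P][ST])")  # Asn, any non-Pro residue, then Ser/Thr


def digest(seq, cleave_after):
    """(start, end) spans of peptides from cleaving after any residue in `cleave_after`,
    except when the next residue is Pro (simplified rule, no missed cleavages)."""
    spans, start = [], 0
    for i in range(len(seq) - 1):  # no cleavage after the last residue
        if seq[i] in cleave_after and seq[i + 1] != "P":
            spans.append((start, i + 1))
            start = i + 1
    spans.append((start, len(seq)))
    return spans


def glycosites_per_peptide(seq, spans):
    """Number of sequon Asn residues (found on the intact protein) in each peptide."""
    sites = [m.start() for m in SEQUON.finditer(seq)]
    return [sum(start <= s < end for s in sites) for start, end in spans]


TRYPSIN = "KR"
CHYMOTRYPSIN = "FYW"  # high-specificity sites only (assumption)

toy = "AKNGTWLLNVSRGG"  # hypothetical sequence with two sequons
print(glycosites_per_peptide(toy, digest(toy, TRYPSIN)))                 # [0, 2, 0]
print(glycosites_per_peptide(toy, digest(toy, TRYPSIN + CHYMOTRYPSIN)))  # [0, 1, 1, 0]
```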
Figure 3. Tissue-binding assay and ELISAs. Histochemical assays of recombinant unmodified M41-RBD and single Asn-to-Ala and Val-to-Ala variants binding to trachea tissue (A) and to ELISA-presented Neu5Acα2-3Galβ1-3GlcNAc (B and C). B, concentration dependence of binding. C, absorbance for each protein at the 75-nmol concentration. Two-way ANOVA showed significantly less binding by variant N33A, N59A, N85A, N126A, N160A, and N194A RBD proteins compared with unmodified RBD (compare light gray bars (variants) with the black bar (unmodified)). No significant (n.s.) difference was observed for variants with dark gray bars. Data points are averaged from three separate assays. ****, p < 0.0001.

Although a protein may contain the sequon NX(S/T), at which N-glycosylation is known to occur, the site may not actually be glycosylated, or it may be glycosylated only part of the time. Potential glycosylation sites, their predicted glycosylation state, and their measured site occupancy are shown in Table 2. Of the 10 glycosites, all but Asn-246 were predicted to be glycosylated (occupied) by NetNGlyc analysis (http://www.cbs.dtu.dk/services/NetNGlyc-1.0/). Percent occupancy was analyzed by LC/MS; however, the signal for the Asn-219 site in M41 and N59A was poor, so occupancies were not calculated for that site. All other sites were estimated to be occupied at 89% or greater in M41 and N59A. The N145A variant exhibited site occupancy at all expected sites, including Asn-219, although signal intensity at that site was low. Two sites had much lower occupancy in N145A than in the other samples: site Asn-126 dropped to 61% occupancy and site Asn-246 to 79%, compared with nearly complete occupancy in the N59A and M41 proteins. Overall, site occupancy was high for all sites. The difficulty in detecting some of the peptides, particularly the Asn-219 glycopeptide, may be due to hydrophobicity. Ionization is partially driven by hydrophobicity, and the Asn-219 glycopeptide had only 20% hydrophobic character after the two digestions, which may in part explain its low detectability. By comparison, glycopeptides containing Asn-85, Asn-145, and Asn-160 were short and between 21 and 33% hydrophobic, whereas glycopeptides containing the other sites had predicted hydrophobicity ranging from 37 to 61% and tended to produce higher-intensity spectra.
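Two quantities in this paragraph, site occupancy and percent hydrophobic character, can be made concrete. The sketch below assumes that occupancy is estimated from PNGase F-deglycosylated peptides, in which a formerly glycosylated Asn appears as Asp, as the ratio of the Asp-form signal to the total signal for that peptide; the text does not give the exact formula, and spontaneous deamidation would inflate this estimate. The residue set defining "hydrophobic" (`HYDROPHOBIC`) is likewise a hypothetical choice, and the numbers are toy values.

```python
HYDROPHOBIC = set("AFILMVW")  # assumed residue set; other scales give other numbers


def percent_occupancy(deglycosylated_signal, unmodified_signal):
    """Site occupancy (%) from the signal of the Asn->Asp (formerly glycosylated) form
    of a peptide and the signal of the same peptide with an unmodified Asn."""
    total = deglycosylated_signal + unmodified_signal
    if total <= 0:
        raise ValueError("no signal for this site")
    return 100.0 * deglycosylated_signal / total


def percent_hydrophobic(peptide):
    """Percent of residues in `peptide` that belong to HYDROPHOBIC."""
    return 100.0 * sum(aa in HYDROPHOBIC for aa in peptide.upper()) / len(peptide)


print(percent_occupancy(880.0, 120.0))  # 88.0
print(percent_hydrophobic("NVSAL"))     # 60.0
```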
Glycoform relative abundances at each site are listed in Table S1. Fig. 5 shows the location of each glycosylation site on the M41 RBD. Overall, compositions at each site were similar in charge and size across the three constructs. A representative glycan, chosen by peak intensity, is shown at each site. As in the MALDI-TOF MS data, the N145A construct carried glycoforms with more HexNAc and fewer Hex than M41 and N59A.
Fewer glycan compositions were detected on glycopeptides by LC/MS than among the free glycans observed by MALDI-TOF MS (63 versus 100 compositions). This is expected, because the instrumentation used and the physicochemical characteristics of permethylated glycans and glycopeptides differ substantially. The compositions detected by the two analyses overlapped.

Docking results are dependent on glycosylation status of the M41-RBD protein
During our investigation, the first structure of the M41 spike protein was solved using electron microscopy (EM) (18). Mapping the glycosylation sites onto the structure did not lead to a clear understanding of how the mutations affect binding. Although EM structural resolution is limited, and the precise coordinates for the attached glycans are not known, an attempt was made to dock a series of potentially sialylated ligands to a glycan-stripped structure of the RBD and a structure that was populated with glycans based on our data. The glycan chosen for each site on the RBD was based on the predominant glycans identified at each site by LC/MS (see Fig. 5).
Seventeen oligosaccharide ligands were chosen based on a previous glycan array study of M41 (11) and on ELISA data (this work). Both strong and weak binders were selected (Fig. 6). Each ligand was docked 20 times against both the sugar-stripped and the in silico glycosylated M41-RBD coordinates. There was no statistically significant difference between the docked binding energies of ligands that did and did not bind on the array. All oligosaccharide ligands except 1, 3, 9, 13, 15, and 17 docked seven or more times to one or more of the four sites on the sugar-stripped M41 structure, with no clear pattern differentiating between them (Fig. 6). In the sugar-stripped structure, all binding occurred at sites A and B. Site A is under the galectin fold near site Asn-194, and site B encompasses Asn-85 and Asn-59. All three of these glycosylation sites are required for binding to trachea tissue. The docking pattern changed dramatically when glycans were modeled onto the structure. The most dramatic change was seen at site D, where eight ligands bound seven or more times, whereas interactions at all other sites decreased. There were no binders at site A, only two at site C (3 and 16), and three at site B (6, 9, and 17). All of the ligand oligosaccharides that docked at site D were sialylated, consistent with the ligands identified by array and ELISA. No control ligand (1 and 2, uncharged; 3 and 4, KDN-charged) bound at site D. The interaction at site D involved both sugar-protein and sugar-sugar contacts, and in some docking runs the interaction was entirely sugar-sugar. Site D is in the center of a ring of glycosylation sites whose mutation altered binding: N59A, N85A, and N160A lost the ability to bind, whereas N145A gave a very strong signal in the histochemical assay. Of note, no ligands docked at the top of the galectin fold, where many structural homologs of M41, such as the bovine coronavirus RBD (19), are thought to bind sugars. For comparison, we docked Neu5Ac(α2-6)Gal(β1-3)GlcNAc(β-OMe) against the crystal structure of the bovine RBD. In 25 of 25 runs, the glycan docked in the proposed binding site at the top of the galectin fold, in the negatively charged area of the bovine RBD near Asn-198 (Fig. 7B).
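The docking tallies above (20 runs per ligand, with a site counted for a ligand when it docks there seven or more times) reduce to a counting step. A minimal sketch, assuming each docking run has already been assigned a site label, or `None`, by some pose-to-site criterion that is not described here; the function names and the toy poses are hypothetical.

```python
from collections import Counter


def frequent_sites(poses_by_ligand, min_hits=7):
    """Sites each ligand reached in at least `min_hits` docking runs.

    `poses_by_ligand` maps a ligand ID to one site label per run (None = no site).
    """
    result = {}
    for ligand, sites in poses_by_ligand.items():
        counts = Counter(s for s in sites if s is not None)
        result[ligand] = sorted(site for site, n in counts.items() if n >= min_hits)
    return result


def ligands_per_site(frequent):
    """Invert the ligand -> sites mapping into site -> ligands."""
    per_site = {}
    for ligand, sites in frequent.items():
        for site in sites:
            per_site.setdefault(site, []).append(ligand)
    return per_site


# Hypothetical 20-run results for three ligands.
runs = {
    5: ["D"] * 9 + ["C"] * 3 + [None] * 8,
    6: ["B"] * 7 + ["D"] * 7 + [None] * 6,
    1: ["A"] * 4 + [None] * 16,
}
hits = frequent_sites(runs)
print(hits)                    # {5: ['D'], 6: ['B', 'D'], 1: []}
print(ligands_per_site(hits))  # {'D': [5, 6], 'B': [6]}
```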

Discussion
Previously, we established that the IBV M41 S1 protein binds sialic acid-substituted glycoconjugate ligands in chicken trachea and lung tissue (11). Intriguingly, the M41 RBD is highly glycosylated, with 10 potential glycosylation sites, and glycosylation appears to be necessary for binding to host tissues because treating the protein with a neuraminidase diminishes binding (11). This study extends our investigation toward determining the role of glycosylation in the function of the RBD, which encompasses the N-terminal region of the native protein. Each of the potential glycosylation sites was individually ablated, and each construct was examined for its ability to bind tissue and an ELISA-presented ligand. In addition, the global glycosylation profile of every construct was surveyed, and glycosylation of three representative constructs was examined on a site-specific basis.
Six of the 10 glycosylation sites in the RBD of IBV M41 were essential for binding to chicken trachea tissue and to an ELISA-presented sialylated oligosaccharide ligand. CD analysis demonstrated that both secondary structure and stability were similar across all the RBD constructs, indicating that the proper fold was likely retained in all of them. Globally, percent abundances of sialylated glycans differed across variants, but the differences were not associated with loss of binding. For example, 51 and 20% of the glycans in the binding variants N145A and N246A, respectively, and 46 and 51% of the glycans in the nonbinders N126A and N160A, respectively, were sialylated (summed from Fig. 4). By comparison, 40% of the glycans in the unmodified RBD construct were sialylated. On a site-specific basis, some glycosylation sites carried more sialylation than others (Table S1). On average, each of glycosites Asn-126, Asn-194, Asn-229, and Asn-246 was sialylated at least 50% of the time. Sites Asn-229 and Asn-246 were in the less-ordered region of the protein, away from the galectin fold with which binding is associated in the docking study. Site Asn-194 is at the bottom of the galectin fold and is required for ligand binding. Site Asn-126 is at the top of the galectin fold and is also required for binding. Although we cannot conclude that sialylation is required at Asn-194 and Asn-126, it is clear that glycosylation at these sites plays a role in ligand binding.

Figure 5. Site-specific glycosylation of M41, N59A, and N145A. S1 N-terminal receptor-binding domain residues 21-268 from PDB entry 6cv0 are represented as gray ribbons. Asparagines at glycosylation sites whose alanine mutants still bound trachea tissue are in cyan, and those whose mutants did not are in dark red. GlcNAc residues from the structure are shown as dark blue balls and sticks. The most predominant glycan for each site across all three constructs is shown on the right. Glycoforms shown on the right are based on our data, and inferred structural detail is based on accepted knowledge of the cell type used for protein production. Monosaccharides are represented as follows: mannose (green circles); galactose (yellow circles); GlcNAc (blue squares); fucose (red triangles); and sialic acid (purple diamonds). Numbering of the sites is based on the mature sequence. The figure was made with CCP4MG (38) and GIMP.

… (11) and referenced in the figure as Array score 1. White columns were against the structure without sugars, and gray columns were against the structure on which the LC/MS-identified glycans were modeled. Bottom, receptor-binding domain of M41 from PDB structure 6cv0. Glycosylation sites are shown as cyan balls. Sites where two or more oligosaccharides docked seven or more times are indicated as colored space-filled amino acids. Colors and labels match the table above. B is A turned 90° toward the user. Structure representations were made in CCP4-MG (38). Sugar symbols were rendered with DrawGlycan-SNFG (www.virtualglycome.org/DrawGlycan/) (39).
The publication of the cryo-EM structure of M41 (18), the first structure of a spike protein from a gammacoronavirus, made it possible to visualize the distribution of the glycosylation sites in the tertiary structure of the protein. The structure corroborates the site occupancy we observed on M41-RBD, because 9 of the 10 glycosylation sites were occupied in the EM structure. Site Asn-246, which is not occupied in the EM structure, lies on a β-strand that forms close contacts with the S1 C-terminal domain in the native protein. The C-terminal domain was not part of our construct; therefore, Asn-246 in the recombinant constructs was likely in an environment very different from that in the full-length protein.
Many human galectins, as well as the bovine β-coronavirus spike protein (19), bind sugars at the top of the β-sandwich, near the position of site Asn-126 in the M41 RBD constructs (see Fig. 5). Site Asn-198 of the bovine RBD aligns closely with site Asn-126 of M41 (see Fig. 7); in the bovine protein, this marks the region of proposed ligand binding. Loss of Asn-126 in the M41 RBD abrogates binding to trachea tissue. However, although ablation of Asn-126 diminishes ligand binding, our docking study gave no evidence that this is the sialyl ligand-binding site in M41. Evaluation of the charge distribution in the proposed binding sites indicates that the bovine site is negatively charged, whereas negative charge in the same region of M41 is sparse (Fig. 7). This difference in charge near Asn-126 may explain the lack of ligand docking in this region (gray β-strands in Fig. 6B) during docking simulations.
The precise ligand-binding region of proteins with a galectin fold varies. Rotavirus protein VP4, for example, binds sialic acid in a groove between the β-sheets of the sandwich (20). The clustering of five of the six required N-glycosylation sites suggests that the ligand-binding site may be on the right side of the galectin fold as shown in Fig. 5. Our docking experiments with 17 possible oligosaccharide ligands of M41 were not conclusive in terms of binding energies but did identify four potential saccharide-binding regions (Fig. 6). Docking also indicated that glycosylation affects binding in silico, because one potential site (site A; see Fig. 6) lost favor, whereas another, site D, dramatically gained favor when the protein was glycosylated. Site D is in the center of three glycosylated asparagines required for binding (Asn-59, Asn-85, and Asn-160) and a fourth (Asn-145) whose loss results in a very strong histochemical signal and has a protein-wide effect on glycosylation, with increased sialylation. In addition, the site D region is negatively charged (see Fig. 7A), like the proposed sialyl ligand-binding site on the bovine protein (Fig. 7B) (19). All the ligands that interacted with site D were sialylated, including the glycan that bound in our ELISA studies. Interestingly, carbohydrate-carbohydrate contacts were detected in the RBD-ligand interactions at site D. This is an intriguing result because carbohydrate-carbohydrate interactions, although not common, have been reported between nonfucosylated antibodies and their receptor, in cell-cell adhesion, between tumor antigens, and between bacterial receptors and mucin (21-25). A literature search did not uncover any reported carbohydrate-carbohydrate interactions between virus and host.
Although our docking study must be evaluated in the context of the higher root-mean-square deviations typical of EM structures and the inexactness of modeled oligosaccharides, the results suggest that a combination of carbohydrate-carbohydrate and carbohydrate-protein interactions should be considered in the binding mechanism.
In conclusion, we have shown that glycosylation of six sites on the M41 IBV RBD is necessary for the interaction of M41 with both trachea tissue and the Neu5Ac(α2-3)Gal(β1-3)GlcNAc ligand in ELISA. Based on occupancy data, at least nine sites were glycosylated in the recombinant M41-RBD. Deletion of individual glycosylation sites had little effect on secondary structure, but it did affect the overall glycosylation profiles of some variants, especially N145A. Some differences are expected because one site, with its specific glycans, is lost from each variant, mildly altering the overall profile. In silico docking suggests that glycosylation may guide ligand binding. Especially intriguing is site D, where glycosylation is required for in silico docking. The interaction of M41 IBV with its sialyl ligand may prove to be a unique interaction involving both carbohydrates and protein. Further investigation is warranted.

Ethics statement
The tissues used for this study were obtained from the tissue archive of the Veterinary Pathologic Diagnostic Center (Department of Pathobiology, Faculty of Veterinary Medicine, Utrecht University, The Netherlands). This archive is composed of paraffin blocks with tissues maintained for diagnostic purposes; no permission from the Committee on the Ethics of Animal Experiment is required.

Plasmid construction
The pCD5 vector containing IBV M41-RBD in-frame with a C-terminal GCN4 trimerization motif and Strep-Tag has been described previously (10). Site-directed mutagenesis using the Q5 technology (New England Biolabs) was performed to mutate the asparagine codons of the N-linked glycosylation sequence motif NX(S/T) into alanine or valine using the primers in Table 3. Sequences of the resulting RBDs were confirmed by Sanger sequencing (Macrogen, The Netherlands).

… A and pink boxes on B. Y162, E182, W184, and H185 in B are involved in binding to sialic acid. The large asterisk in A indicates a possible binding site based on structural comparison between the two proteins. Images were made with CCP4-MG (38). Bovine coordinates are from PDB code 4H14.

Production of recombinant proteins
HEK293T (ATCC CRL-3216) cells were transfected with pCD5 plasmids using polyethyleneimine at a 1:12 ratio. The recombinant proteins were purified using Strep-Tactin Sepharose beads as described previously (11), and their production was confirmed by Western blotting using Strep-Tactin-HRP conjugate (IBA, Germany).

CD
Recombinant M41 and its variants were prepared for CD spectroscopy by buffer exchange and concentration with four centrifugation cycles through 10-kDa MWCO Amicon Ultra 0.5-ml centrifugal filters (UFC 501024) into 10 mM sodium phosphate, pH 7.75. Final concentrations were measured with a Thermo Fisher Scientific NanoDrop 2000 spectrophotometer. CD spectra were collected on a JASCO J-810 spectropolarimeter with a Peltier thermostated fluorescence temperature controller module. Samples were diluted to 0.06 mg/ml, and four scans were accumulated from 285 to 190 nm at a scanning speed of 10 nm/min, with a digital integration time of 1 s, a bandwidth of 1 nm, and standard sensitivity at 25°C. A thermal melt was done from 25 to 95°C with a ramp rate of 1°C/min. Measurements were taken every 2°C at 222, 218, 215, 212, 208, 205, 196, and 194 nm. A full CD scan was collected at 95°C. The temperature was then lowered to 25°C, and after the protein had been allowed to refold for 20 min, a third CD scan was taken at 25°C to measure recovery. A Savitzky-Golay filter was used to smooth CD data at different temperatures for visual comparison (Fig. S2).
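The window length and polynomial order of the Savitzky-Golay filter are not stated. As an illustration only, a 5-point quadratic Savitzky-Golay filter reduces to a fixed convolution with coefficients (-3, 12, 17, 12, -3)/35; the window size is our assumption, and in this sketch the two points at each end are left unsmoothed rather than fitted separately.

```python
# 5-point, quadratic Savitzky-Golay smoothing coefficients (normalization 35).
SG5 = (-3, 12, 17, 12, -3)


def savgol5(y):
    """Smooth a uniformly sampled signal; the two points at each end are returned unchanged."""
    out = [float(v) for v in y]
    for i in range(2, len(y) - 2):
        out[i] = sum(c * y[i + k - 2] for k, c in enumerate(SG5)) / 35
    return out


noisy = [0, 1, 0, 1, 0, 1, 0]  # toy alternating "noise"
print([round(v, 3) for v in savgol5(noisy)])  # [0.0, 1.0, 0.686, 0.314, 0.686, 1.0, 0.0]
```

Because the filter is a local least-squares polynomial fit, it passes quadratic and cubic trends through unchanged while damping point-to-point noise, which is why it suits smoothing CD curves for visual comparison without shifting band positions.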
Secondary structure was estimated from the CD data collected at 25°C before the thermal melt using Dichroweb (16) with the CDSSTR (26), SELCON3 (27), and CONTINLL (28) algorithms and protein reference set 7. Results from the three algorithms were averaged and plotted in Fig. 2.
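Averaging the three Dichroweb estimates per structural class can be sketched as follows. The input dictionaries hold hypothetical values, not Dichroweb output, and the text does not state whether the ± values reported for Fig. 2 are standard deviations across algorithms, so that correspondence is an assumption.

```python
import statistics


def consensus(estimates):
    """Mean and sample SD per secondary-structure class across algorithms.

    `estimates` maps an algorithm name to {class: percent}; all algorithms
    must report the same classes.
    """
    classes = set(next(iter(estimates.values())))
    if any(set(e) != classes for e in estimates.values()):
        raise ValueError("algorithms report different structure classes")
    result = {}
    for c in sorted(classes):
        values = [e[c] for e in estimates.values()]
        result[c] = (statistics.mean(values), statistics.stdev(values))
    return result


# Hypothetical per-algorithm estimates (%) for one construct.
estimates = {
    "CDSSTR": {"helix": 30.0, "strand": 20.0},
    "SELCON3": {"helix": 32.0, "strand": 22.0},
    "CONTINLL": {"helix": 34.0, "strand": 24.0},
}
print(consensus(estimates))  # {'helix': (32.0, 2.0), 'strand': (22.0, 2.0)}
```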

Protein histochemistry
Histochemistry was performed as described previously (11). Briefly, chicken trachea tissues were sectioned at 4 µm and then incubated with RBD proteins at 100 µg/ml. Desialylated tissues were prepared before protein application by pretreatment with 2 milliunits of neuraminidase (sialidase) from Arthrobacter ureafaciens (AUNA, Sigma, Germany). The pretreatment was carried out in 10 mM potassium acetate, 2.5 mg/ml Triton X-100, pH 4.2, at 37°C overnight. Chicken trachea tissues were from a 7-week-old broiler chicken (G. gallus) obtained from the tissue archive of the Veterinary Pathologic Diagnostic Center (Department of Pathobiology, Faculty of Veterinary Medicine, Utrecht University, The Netherlands).

ELISA
The sialylated glycan Neu5Acα2-3Galβ1-3GlcNAc-PAA (3-SiaLc-PAA, GlycoNZ, Russia) was coated (1 µg/well) onto a 96-well MaxiSorp plate (NUNC, Sigma) at 4°C overnight. Plates were then blocked with 3% BSA (Sigma) in PBS containing 0.1% Tween. RBD proteins (100 µg/ml) were preincubated with Strep-Tactin-HRPO (1:200) for 30 min on ice before being applied to the plates for 2 h at room temperature. 3,3′,5,5′-Tetramethylbenzidine was used as the peroxidase substrate to visualize binding, and the reaction was stopped with 2 N H2SO4. Absorbances at 450 nm (A450) were measured in a FLUOstar Omega microplate reader (BMG Labtech) and analyzed with MARS data analysis software. Each recombinant protein was measured at each concentration in triplicate. For statistical analysis, each protein was compared with the unmodified RBD using two-way ANOVA with Dunnett's multiple comparisons test (α = 0.05).
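The sketch below shows how triplicate A450 readings for a variant can be expressed relative to the unmodified RBD. It assumes blank (ligand-free or protein-free) wells are available for background subtraction, which the text does not state. The readings are hypothetical. This is a descriptive summary only and does not reproduce the two-way ANOVA with Dunnett's test.

```python
from statistics import mean, stdev


def blank_corrected(replicates: list[float], blank: list[float]) -> tuple[float, float]:
    """Mean and SD of replicate A450 readings after subtracting the mean blank."""
    offset = mean(blank)
    corrected = [a - offset for a in replicates]
    return mean(corrected), stdev(corrected)


def percent_of_reference(sample: list[float], reference: list[float],
                         blank: list[float]) -> float:
    """Blank-corrected mean A450 of a variant as a percentage of the unmodified RBD."""
    sample_mean, _ = blank_corrected(sample, blank)
    reference_mean, _ = blank_corrected(reference, blank)
    return 100.0 * sample_mean / reference_mean


# Hypothetical triplicates (not data from this study).
blank = [0.05, 0.05, 0.05]
unmodified_rbd = [1.20, 1.25, 1.30]
variant = [0.16, 0.17, 0.18]
print(percent_of_reference(variant, unmodified_rbd, blank))
```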

Glycopeptide preparation, enrichment, and N-glycan release
The workflow is shown in Fig. S3. Aliquots of 200-400 µg of M41, N59A, and N145A and 50 µg of each remaining protein were digested with trypsin following An and Cipollo (29). Approximately 25-100-µg aliquots of the protease-digested proteins were processed for deglycosylated glycopeptide and permethylated glycan analyses. Samples were resuspended in 50 mM ammonium bicarbonate, pH 8.0. Glycans were released by digestion with 10 units/µl PNGase F (glycerol-free, New England Biolabs) for 3 h at 37°C. The samples were then adjusted to pH 5.0 with 2-4 µl of 125 mM HCl. To maximize glycan release, samples were further digested with 0.15 milliunits/µl PNGase A overnight at 37°C. Free glycans and deglycosylated peptides were separated using C18 SPE cartridges (Thermo Fisher Scientific). Intact glycopeptide analyses were performed on 175-300 µg of HILIC-enriched glycopeptides following An and Cipollo (29). Following data collection on the trypsinized glycopeptides, the remainder of the M41, N59A, and N145A sam-

Site occupancy
LC/MSE data were collected on trypsinized peptides deglycosylated with PNGase F as described under N-glycan release. PNGase F converts each glycosylated asparagine to aspartate. This produces a mass gain of 0.984 Da because -NH2 is replaced with -OH. The percent occupancy of each site can therefore be calculated by comparing the intensities of the Asn-containing and Asp-containing forms of each peptide. However, unmodified Asn can also undergo spontaneous deamidation to Asp. To keep spontaneous deamidation from skewing the calculated occupancy, 18O-water was used during deglycosylation. Enzymatically generated Asp then carries an 18O label and a mass shift of 2.988 Da, whereas spontaneous deamidation gives a shift of 0.984 Da. Because this experiment distinguishes spontaneous from enzymatically catalyzed deamidation, accurate estimates of glycosite occupancy can be determined. Percent occupancy was calculated from the intensities of the deglycosylated (DG) and nonglycosylated (NG) peptides as: occupancy (%) = DG/(DG + NG) × 100.
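The occupancy arithmetic can be sketched as follows. The mass shifts come from standard monoisotopic masses. The 18O-labeled form works out to about 2.988 Da: 0.984 Da for deamidation plus 2.004 Da for 18O in place of 16O. The peptide-form bookkeeping is a hypothetical illustration, not the software used in this study. It treats spontaneously deamidated forms as nonglycosylated and ignores overlap between isotope envelopes.

```python
# Monoisotopic mass differences in Da.
DEAMIDATION_16O = 0.984016   # Asn -> Asp: -NH2 replaced by -OH (16O)
O18_MINUS_O16 = 2.004246     # extra mass when the new oxygen is 18O
DEGLYCO_18O = DEAMIDATION_16O + O18_MINUS_O16  # ~2.988 Da, PNGase F in H2(18)O


def classify(delta_mass: float, tol: float = 0.01) -> str:
    """Label a peptide form by its mass shift relative to the Asn-containing peptide."""
    if abs(delta_mass) <= tol:
        return "asn"             # never glycosylated, not deamidated
    if abs(delta_mass - DEAMIDATION_16O) <= tol:
        return "spontaneous"     # Asp from spontaneous deamidation (no 18O)
    if abs(delta_mass - DEGLYCO_18O) <= tol:
        return "deglycosylated"  # Asp carrying 18O, formed by PNGase F
    return "unassigned"


def percent_occupancy(forms: list[tuple[float, float]], tol: float = 0.01) -> float:
    """Occupancy = DG / (DG + NG) * 100 from (mass shift, intensity) pairs.

    Spontaneously deamidated forms count toward NG, because those sites
    carried no glycan.
    """
    dg = ng = 0.0
    for delta, intensity in forms:
        label = classify(delta, tol)
        if label == "deglycosylated":
            dg += intensity
        elif label in ("asn", "spontaneous"):
            ng += intensity
    if dg + ng == 0:
        raise ValueError("no assignable peptide forms")
    return 100.0 * dg / (dg + ng)
```

For example, hypothetical intensities of 300 (Asn), 100 (spontaneous Asp), and 600 (18O-Asp) give 60% occupancy. Counting the spontaneous form as glycosylated would inflate the result to 70%.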

Purification, permethylation, and semi-quantitation of free glycans
PNGase-released N-glycans were applied to C18 SPE cartridges and eluted with 0.1% formic acid, leaving the deglycosylated peptides bound to the C18 column. The glycan eluate fractions were combined, and butanol was added to a final concentration of 1%. The samples were then loaded onto 100-mg porous graphitic carbon columns. These columns had first been conditioned by sequential washes with 1 ml of 100% acetonitrile (ACN), 1 ml of 60% ACN in water, 1 ml of 30% ACN in water, and 1 ml of water, all containing 0.1% trifluoroacetic acid (TFA). The loaded columns were washed three times with 1 ml of 0.1% TFA in water. They were then eluted with 30% ACN and subsequently with 60% ACN, each in water containing 0.1% TFA. The eluents were pooled and dried in glass vials by rotary evaporation. Permethylation was performed following the methods of Ciucanu and Costello (30) and Ciucanu and Kerek (31). MALDI-TOF analysis of permethylated N-glycans was performed on a Bruker Autoflex speed mass spectrometer in positive-ion reflectron mode. 2,5-Dihydroxybenzoic acid was used as the matrix and malto-oligosaccharides as an external calibrant. Data were processed using FlexAnalysis. Each sample was spotted three times. Peaks were picked and assigned, and intensities were averaged across each set of spots using in-house software. Assignments were based on glycans known to be present in HEK293T cells.
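Peak assignment amounts to matching each observed m/z with the sodiated mass of a candidate glycan composition. The sketch below assumes [M+Na]+ ions of non-reduced permethylated glycans and standard monoisotopic residue masses. With these values, permethylated Man5 (Hex5HexNAc2) falls at m/z 1579.78. The tolerance, library, and function names are illustrative and do not describe the in-house software.

```python
from statistics import mean

# Monoisotopic masses (Da) of permethylated monosaccharide residues.
RESIDUE = {"Hex": 204.0998, "HexNAc": 245.1263, "dHex": 174.0892, "NeuAc": 361.1737}
TERMINI = 46.0419   # CH3 + OCH3 (C2H6O) at the two ends of the permethylated chain
SODIUM = 22.9892    # Na+ adduct


def sodiated_mz(composition: dict[str, int]) -> float:
    """[M+Na]+ m/z of a non-reduced, permethylated glycan of the given composition."""
    return sum(RESIDUE[name] * count for name, count in composition.items()) + TERMINI + SODIUM


def assign(peaks: list[tuple[float, float]], library: dict[str, dict[str, int]],
           tol: float = 0.1) -> dict[str, float]:
    """Sum peak intensities that fall within `tol` Da of each library glycan."""
    found: dict[str, float] = {}
    for mz, intensity in peaks:
        for name, comp in library.items():
            if abs(mz - sodiated_mz(comp)) <= tol:
                found[name] = found.get(name, 0.0) + intensity
    return found


def average_spots(spot_results: list[dict[str, float]]) -> dict[str, float]:
    """Average assigned intensities over replicate spots (a glycan absent from a spot counts as 0)."""
    names = set().union(*spot_results)
    return {name: mean(s.get(name, 0.0) for s in spot_results) for name in sorted(names)}
```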

Reverse-phase nanoLC/MS E analysis of glycopeptides and peptides
Each peptide or glycopeptide sample was analyzed three times. A C18 column (BEH nanocolumn, 100 µm inner diameter × 100 mm, 1.7-µm particles; Waters Corp.) was used for nanoLC/MSE analyses. A nanoAcquity UPLC system (Waters Corp.) was used for automatic sample loading and flow control. The loading buffer was 3% ACN, 97% water. Peptides were eluted with a 60-min gradient from 3 to 50% ACN at a flow rate of 0.4 µl/min. All chromatography solutions contained 0.1% formic acid. The eluent flowed to an uncoated PicoTip Emitter with a 20-µm inner diameter (New Objective Inc., Woburn, MA). The mass spectrometer was a SYNAPT G2 HDMS system (Waters Corp.), with an applied source voltage of 3000 V. Data were collected in positive polarity mode using data-independent MSE acquisition. Each cycle consisted of a low-energy scan at 4 V followed by a scan in which the collision energy was ramped from 20 to 50 V over 0.9 s. For internal calibration, a lockmass solution was injected through the lockmass channel every 30 s at a flow rate of 500 nl/min. The solution contained 400 fmol/µl Glu-fibrinopeptide B and 1 pmol/µl leucine enkephalin in 25% acetonitrile, 0.1% formic acid, 74.9% water. Initial calibration of the mass spectrometer was performed in MS2 mode using Glu-fibrinopeptide B. The instrument was tuned for a minimum resolution of 20,000 full-width at half-maximum.
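The idea behind lockmass calibration can be illustrated with a single-point multiplicative correction. The instrument software's actual correction algorithm is not described here, so this scheme is an assumption. The reference values are the standard monoisotopic m/z of Glu-fibrinopeptide B [M+2H]2+ and leucine enkephalin [M+H]+. The last helper shows the peak width implied by a resolution of 20,000 FWHM.

```python
# Theoretical monoisotopic m/z of the lockmass reference ions.
GLU_FIB_2PLUS = 785.8426   # Glu-fibrinopeptide B, [M+2H]2+
LEU_ENK_1PLUS = 556.2771   # leucine enkephalin, [M+H]+


def ppm_error(observed: float, theoretical: float) -> float:
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def lockmass_correct(mz_values: list[float], observed_lock: float,
                     theoretical_lock: float = GLU_FIB_2PLUS) -> list[float]:
    """Rescale m/z values by one factor so the lock mass lands on its theoretical value."""
    factor = theoretical_lock / observed_lock
    return [mz * factor for mz in mz_values]


def fwhm_at(mz: float, resolution: float = 20_000) -> float:
    """Peak width (FWHM, in m/z units) implied by a given resolution at this m/z."""
    return mz / resolution
```

If the lockmass ion is observed 10 ppm high, every m/z in that scan is shifted down by the same relative amount. At m/z 800, a resolution of 20,000 corresponds to a peak width of 0.04.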

Data analysis for peptide and glycopeptide identification
NanoLC/MSE data were processed using BiopharmaLynx 1.3 (Waters Corp.) and GLYMPS (in-house software) (32, 33) to identify specific glycans on each peptide. The search settings included trypsin digestion with up to one missed cleavage, fixed cysteine carbamidomethylation, variable methionine oxidation, and variable N-glycan modifications based on a building-block glycan library. Assignment inclusion criteria were as follows: 1) the presence of a core fragment (peptide, peptide + HexNAc, peptide + HexNAc2, peptide + dHex1HexNAc1, or peptide + Hex1HexNAc2); 2) the presence of three or more peptide fragments; 3) the presence of three or more assigned glycopeptide fragments; 4) assignment in at least 2 of 3 injections; and 5) the existence of the glycan in GlyConnect (https://glyconnect.expasy.org).
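The five inclusion criteria act as a conjunctive filter on each candidate assignment. The sketch below encodes them using a hypothetical record layout. The field names and fragment labels are illustrative and are not GLYMPS or BiopharmaLynx data structures.

```python
from dataclasses import dataclass, field

CORE_FRAGMENTS = frozenset({
    "peptide",
    "peptide+HexNAc",
    "peptide+HexNAc2",
    "peptide+dHex1HexNAc1",
    "peptide+Hex1HexNAc2",
})


@dataclass
class GlycopeptideAssignment:
    """One candidate glycan-on-peptide assignment (hypothetical record layout)."""
    glycan: str
    fragment_labels: set[str] = field(default_factory=set)
    peptide_fragments: int = 0
    glycopeptide_fragments: int = 0
    injections_seen: int = 0      # out of 3 replicate injections
    in_glyconnect: bool = False


def passes_inclusion_criteria(a: GlycopeptideAssignment) -> bool:
    """Apply criteria 1-5 from the text to a single assignment."""
    return (
        bool(a.fragment_labels & CORE_FRAGMENTS)  # 1) a core fragment is present
        and a.peptide_fragments >= 3              # 2) >= 3 peptide fragments
        and a.glycopeptide_fragments >= 3         # 3) >= 3 glycopeptide fragments
        and a.injections_seen >= 2                # 4) seen in >= 2 of 3 injections
        and a.in_glyconnect                       # 5) glycan listed in GlyConnect
    )
```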

Docking
Residues 21-268 were extracted from the published cryo-EM structure of the M41 spike (PDB code 6CV0) (18); they correspond to the M41-RBD used in this paper. GLYCAM-Web's Glycoprotein Builder (34) was used to add the major oligosaccharide found at each glycosylation site to the protein in silico. All glycosites in the M41 EM structure were occupied except Asn-246. Because Asn-246 was occupied in our data, it was also populated, so every glycosite in the new PDB file carried the glycan best supported by our MS data. The coordinates of M41-RBD without glycans, M41-RBD with modeled glycans, and bovine RBD (PDB code 4H14) were used in docking experiments. A virtual library of 17 oligosaccharides representing a variety of binding epitopes was created based on the CFG array version 4.2 (see Fig. 6 for a list). Raw models of the oligosaccharide ligands were built with the AMBER tool tleap (www.ambermd.org) using the GLYCAM06 force field (35) and then energy-minimized using YASARA (36). Dock screening of the library was performed with the YASARA implementation of AutoDock Vina (37) using default parameters. A molecular dynamics (MD) simulation was run on the glycosylated M41-RBD model in explicit water (TIP3P), with the backbone atom coordinates held fixed. This allowed the amino acid side chains to accommodate the added glycans and adopt low-energy conformations. Two snapshots of the glycosylated RBD, taken at 5 and 10 ns of the MD run, were used for dock screening with the virtual library.

Each oligosaccharide ligand was docked against each structure 20 times. Docking results shown in Fig. 6 are for the 10-ns model; results were similar for the 5-ns model.
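Repeated docking runs per ligand are usually summarized by the best and the mean score. The sketch below assumes Vina-style affinities in kcal/mol, where lower is better. Some front ends, YASARA among them, report binding energy with the opposite sign, so the sign convention of the actual output should be checked. Ligand names and scores are hypothetical.

```python
from statistics import mean


def summarize_docking(scores_by_ligand: dict[str, list[float]],
                      higher_is_better: bool = False) -> list[tuple[str, float, float]]:
    """Rank ligands by their best score over repeated docking runs.

    Returns (ligand, best score, mean score) tuples, best ligand first.
    With Vina-style affinities lower is better; set higher_is_better=True
    for output that reports binding energy with the opposite sign.
    """
    pick = max if higher_is_better else min
    rows = [(ligand, pick(scores), mean(scores)) for ligand, scores in scores_by_ligand.items()]
    rows.sort(key=lambda row: row[1], reverse=higher_is_better)
    return rows
```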