Structure of a Hydroxyproline (Hyp)-Arabinogalactan Polysaccharide from Repetitive Ala-Hyp Expressed in Transgenic Nicotiana tabacum

A synthetic gene encoding the fusion protein (Ala-Hyp)51-enhanced green fluorescent protein expressed in Nicotiana tabacum cells produced a fusion glycoprotein with all proline residues hydroxylated and substituted with an arabinogalactan polysaccharide. Alkaline hydrolysis of the fusion glycoprotein yielded a population of hydroxyproline (Hyp)-arabinogalactan polysaccharides ranging in size from 13 to 26 saccharide residues/Hyp, with a median size of 15-17 residues. We isolated a 15-residue Hyp-arabinogalactan for structure determination by sugar analyses and one- and two-dimensional nuclear magnetic resonance techniques that provided the assignment of proton and carbon signals of a small polysaccharide O-linked to the hydroxyl group of Hyp. The polysaccharide consisted of a 1,3-linked β-D-Galp backbone with a single 1,6-linked β-D-Galp "kink." The backbone had two side chains of Galp substituted at position 3 with an arabinose di- or trisaccharide and at position 6 with glucuronic acid or rhamnosyl glucuronic acid. Energy-minimized space-filling molecular models showed hydrogen bonding within polysaccharides attached to repetitive Ala-Hyp and also between polysaccharides and the peptide backbone. Polysaccharides distorted the peptide Ramachandran angles, consistent with the circular dichroic spectra of isolated (Ala-Hyp)51 and its reversion to a polyproline II-like helix after deglycosylation. This first complete structure of a Hyp-arabinogalactan polysaccharide shows that computer-based molecular modeling of Hyp-rich glycoproteins is now feasible and supports the suggestion that small repetitive subunits comprise larger arabinogalactan polysaccharides.

Arabinogalactan proteins (AGPs) expressed at the plant cell surface comprise a multigene family of hydroxyproline-rich glycoproteins (HRGPs) broadly implicated in all aspects of plant growth and development from fertilization to apoptosis (1,2).
Compared with other HRGPs like the extensins that contain highly repetitive motifs based on contiguous Hyp residues, AGPs show a lower peptide periodicity based on clustered noncontiguous Hyp residues. Although short O-linked arabinoside substituents decorate contiguous Hyp residues of both extensins and AGPs, the larger arabinogalactans attached to clustered noncontiguous Hyp define the hyperglycosylated AGPs (90-95% carbohydrate) (3,4). These differences in glycosylation reflect different networks, cellular locations, and biological functions. Thus cross-linked extensins form covalent networks in muro, whereas noncovalently associated AGPs (5,6) occur primarily at the periplasmic interface where a glycosylphosphatidylinositol lipid initially anchors them to the plasma membrane (7-10).
The precise role of AGPs, both soluble and anchored, remains unclear. However, AGPs readily associate (5,11); furthermore, β-glycosyl Yariv compounds interact specifically with AGPs and also inhibit cell expansion (12,13). These observations imply that multiple weak interactions between the polysaccharide components are essential to the role of AGPs (8-10). The arabinogalactan substituents occur in small clusters along the polypeptide backbone forming glycomodules that are highly conserved (14-16) and hence of functional significance, no doubt involving specific interactions. Thus, it is of paramount interest to elucidate the size, structure, arrangement, shape, and binding partners of these glycomodules in more detail.
AGP arabinogalactan polysaccharides typically consist of short side chains containing arabinose and often rhamnose (Rha) and glucuronic acid (GlcUA) attached to a β-(1→3)-linked galactan backbone core (11). Based on 13C NMR spectra of crude gum arabic degradation products, Defaye and Wong (17) deduced a complete linkage analysis of a short pentameric side chain containing Ara, galactose (Gal), Rha, and GlcUA, and they proposed that the side chain is a major component of gum arabic arabinogalactan polysaccharide. Likewise, Akiyama and Kato (18) suggested a similar structure occurred in the AGPs of Nicotiana tabacum suspension-cultured cells. More recently, Gane applied one-dimensional and two-dimensional homonuclear and heteronuclear NMR techniques to characterize a mixture of AGPs from Nicotiana alata and described a structure having a highly branched backbone of 3-, 6-, and 3,6-linked Galp with terminal Galp and Araf side chains (11). However, complete structural elucidation of an entire AGP polysaccharide remained a formidable problem.
Recently, we introduced the use of synthetic genes to express single repetitive glycopeptide motifs of AGPs and extensins, which effectively amplifies glycomodules of interest and simplifies their isolation (14,15,19). AGPs often have short Ala-Hyp repeats. Synthetic gene constructs of these repeats gave an (Ala-Hyp)51 expression product with all the Hyp residues glycosylated by an arabinogalactan polysaccharide (19). This novel AGP was therefore suitable for isolating Hyp-arabinogalactan polysaccharides, first described in sycamore-maple (20) and tomato cells (21). Here we report the first complete structural analysis of an arabinogalactan polysaccharide, including a space-filling model that depicts possible glycan interactions.

EXPERIMENTAL PROCEDURES
Isolation of (Ala-Hyp)51-EGFP and (Ala-Hyp)51-The (Ala-Hyp)51-EGFP fusion glycoprotein was isolated from spent medium of 20-day transformed N. tabacum BY-2 suspension-cultured cells by a combination of hydrophobic interaction and reversed phase chromatography, as described earlier (19). (Ala-Hyp)51 was isolated from tryptic digests of (Ala-Hyp)51-EGFP by gel permeation chromatography, also described earlier (19).
Base Hydrolysis of (Ala-Hyp)51-EGFP-Sixty-six milligrams of (Ala-Hyp)51-EGFP was dissolved in 9 ml of 0.44 N NaOH and hydrolyzed at 105°C for 18 h. The hydrolysate was chilled on ice and neutralized with 1 N HCl to pH 7.6, followed by freeze-drying.
Hyp-Arabinogalactan Glycoamino Acid Purification-The neutralized, dry base hydrolysate of (Ala-Hyp)51-EGFP was dissolved in deionized, distilled H2O and applied to a 75 × 0.6-cm column (H+ form) of Technicon Chromobeads C washed and eluted with deionized, distilled H2O. The water eluate contained the Hyp-polysaccharides, which were pooled and freeze-dried before further purification on an analytical Superdex-peptide gel filtration column (Amersham Biosciences). The Superdex-peptide column was equilibrated and isocratically eluted in 20% acetonitrile (aqueous) at a flow rate of 0.3 ml/min. One hundred microliters of each fraction (fraction size was 0.9 ml) were assayed for Hyp as described earlier (20). Superdex fractions representing the median arabinogalactan size were combined and refractionated on the Superdex-peptide column using the same conditions described above, except the fraction volumes were smaller (450 µl/1.5 min). Fractions containing the Hyp-glycoamino acids were individually freeze-dried. We chose the fraction with the most material, designated AHP-1, for further analyses.
Sugar Composition and Linkage Analyses-The monosaccharide composition of AHP-1 was analyzed as alditol acetate derivatives by gas chromatography (22) or by gas chromatography-mass spectrometry of the per-O-trimethylsilyl derivatives (23,24). Uronic acids were also estimated colorimetrically (25). For linkage analysis, samples were permethylated, depolymerized, reduced, and acetylated resulting in partially methylated alditol acetates that were analyzed by gas chromatography-mass spectrometry as described earlier (24).
Glycan Size Estimation-The glycan size was determined by dividing the moles of monosaccharide in a known weight of AHP-1 or (Ala-Hyp)51-EGFP by the moles of Hyp present. Hyp amount was determined colorimetrically (20) and the monosaccharide content by gas chromatography as described above.
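The size estimate described above is a simple mole ratio. The following minimal sketch shows the arithmetic; the function name and the nanomole values are hypothetical placeholders chosen for illustration, not measurements from this work.

```python
def residues_per_hyp(monosaccharide_nmol, hyp_nmol):
    """Average glycan size: total moles of monosaccharide per mole of Hyp."""
    if hyp_nmol <= 0:
        raise ValueError("Hyp amount must be positive")
    return sum(monosaccharide_nmol.values()) / hyp_nmol


# Hypothetical assay results (nmol) for one sample; not data from the paper.
sample = {"Gal": 70.0, "Ara": 50.0, "GlcUA": 20.0, "Rha": 10.0}
glycan_size = residues_per_hyp(sample, hyp_nmol=10.0)
print(f"{glycan_size:.1f} sugar residues per Hyp")
```

Both amounts must come from the same weighed sample for the ratio to be meaningful.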
Circular Dichroism-CD spectra of standard poly-L-hydroxyproline (5-20 kDa, Sigma), purified (Ala-Hyp)51, and deglycosylated (Ala-Hyp)51 were recorded on a Jasco-715 spectropolarimeter (Jasco Inc., Easton, MD). Spectra were averaged over two scans with a bandwidth of 1 nm and step resolution of 0.5 nm. All spectra were reported in terms of mean residue ellipticity within the 190-250-nm region using a 1-mm path length. Samples were dissolved in water at a concentration of 18 µM.
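For readers who want to reproduce the unit conversion, the sketch below applies the commonly used relation [θ] = θobs / (10 · c · l · n), with θobs in millidegrees, c in mol/liter, l in cm, and n the number of residues, giving deg·cm2·dmol-1. The observed ellipticity and the residue count of 102 are illustrative assumptions; the text does not state the values used.

```python
def mean_residue_ellipticity(theta_mdeg, conc_molar, path_cm, n_residues):
    """Convert observed ellipticity (mdeg) to mean residue ellipticity
    in deg cm^2 dmol^-1 using [theta] = theta / (10 * c * l * n)."""
    return theta_mdeg / (10.0 * conc_molar * path_cm * n_residues)


# Concentration and path length follow the text (18 uM, 1 mm = 0.1 cm).
# theta_mdeg and n_residues are hypothetical values for illustration only.
mre = mean_residue_ellipticity(theta_mdeg=-10.0,
                               conc_molar=18e-6,
                               path_cm=0.1,
                               n_residues=102)
print(f"{mre:.0f} deg cm^2 dmol^-1")
```

Some laboratories divide by the number of peptide bonds (n - 1) rather than residues; for a peptide of this length the difference is about 1%.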
NMR Experiments-All one- and two-dimensional NMR experiments were carried out on AHP-1 (~2.5 mg AHP-1/ml D2O) with a Varian Unity INOVA 600-MHz spectrometer operating at 1H and 13C frequencies of 598.621 and 150.536 MHz, respectively. Analyses were conducted at 60°C, as at this temperature the residual H2O signal overlapped with the fewest resonances and the line widths had narrowed. The only exception was a NOESY experiment that was carried out at 30°C to slow down molecular motion and increase the number of cross-peaks. Quadrature detection in the indirect dimension was achieved by the hypercomplex method (26) for both homonuclear and heteronuclear experiments. HMQC experiments were typically conducted with spectral widths of 8 kHz for 1H and 30 kHz for 13C, a 1JCH set to 150 Hz, a relaxation delay of 1.5 s, the delay for minimizing signals from 1H bound to 12C set to 0.30 s, and WURST decoupling (27) on the 13C channel. Data were collected as an array of 2 × 128K points, which after linear prediction in the t1 dimension and zero filling in both dimensions produced a 2 × 1K data matrix. For HMBC experiments, the acquisition conditions were similar to those used for HMQC spectra but with no decoupling in the 13C channel. A data set consisting of 4 × 1K data points was acquired with the multiple-bond JCH set to 8 Hz. DQCOSY spectra were collected with a spectral width of 8 kHz in both dimensions. A relaxation delay of 2 s was used. The TOCSY mixing time was 90 ms. The NOESY spectra were recorded with a mixing time of 250 ms. Data were processed by using NMRPipe (28) and analyzed by using NMRView (29).
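As a rough aid to interpreting these acquisition parameters, the sketch below converts the stated spectral widths from hertz to ppm and computes the textbook 1/(2J) evolution delay for the stated coupling constants. The 1/(2J) relation is the conventional choice for HMQC-type experiments and is an assumption here; the delays actually programmed are not given in the text.

```python
def hz_to_ppm(width_hz, spectrometer_mhz):
    """Spectral width in ppm: Hz divided by the observe frequency in MHz."""
    return width_hz / spectrometer_mhz


def evolution_delay_s(j_hz):
    """Conventional 1/(2J) delay for evolution under a coupling of J hertz."""
    return 1.0 / (2.0 * j_hz)


# Frequencies, widths, and couplings are taken from the text.
sw_1h_ppm = hz_to_ppm(8000.0, 598.621)     # 1H spectral width
sw_13c_ppm = hz_to_ppm(30000.0, 150.536)   # 13C spectral width
delay_hmqc = evolution_delay_s(150.0)      # one-bond coupling, HMQC
delay_hmbc = evolution_delay_s(8.0)        # multiple-bond coupling, HMBC

print(f"1H width:   {sw_1h_ppm:.2f} ppm")
print(f"13C width:  {sw_13c_ppm:.1f} ppm")
print(f"HMQC delay: {delay_hmqc * 1e3:.2f} ms")
print(f"HMBC delay: {delay_hmbc * 1e3:.1f} ms")
```

The 13C width of roughly 200 ppm is wide enough to cover both the anomeric and carboxyl carbon regions.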
Molecular Modeling Software-We used HyperChem version 7.0 running on a fast dual-processor Pentium machine (plus 1 GB RAM) to build and energy-minimize the (Ala-Pro)6 glycosylated and non-glycosylated peptide via the MM+ force field and steepest descent algorithm. The HyperChem Sugar Builder module allowed the facile construction of small oligosaccharides that were then energy-minimized and ligated to form the small polysaccharide AHP-1, followed by final energy minimization, again using the Amber3 force field. (We corrected a software error in the anomeric configuration of L-sugars by inserting the α- for the β-anomer and vice versa in order to follow IUPAC definitions.) Three AHP-1 polysaccharides were then attached glycosidically to (Ala-Pro)6 to form a tight cluster of three consecutive arabinogalactans on noncontiguous Hyp residues and then subjected to final energy minimization using the Amber3 force field. We found it helpful to define the approximate bond lengths for the Gal-Hyp glycosidic link by using the "set bond length" command when attaching the polysaccharide to the polypeptide.
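To illustrate what a steepest descent minimizer does in principle, the toy sketch below relaxes a single harmonic bond toward its equilibrium length by stepping against the energy gradient. It is not the HyperChem implementation and does not use the MM+ or Amber3 force fields; the force constant, bond lengths, and step size are hypothetical.

```python
def harmonic_energy(r, k, r0):
    """Harmonic bond-stretch energy, E = 0.5 * k * (r - r0)^2."""
    return 0.5 * k * (r - r0) ** 2


def steepest_descent(r, k, r0, step=0.001, tol=1e-6, max_iter=10000):
    """Move r against the energy gradient until the gradient is below tol."""
    for _ in range(max_iter):
        gradient = k * (r - r0)   # dE/dr for the harmonic term
        if abs(gradient) < tol:
            break
        r -= step * gradient      # downhill step proportional to the gradient
    return r


# Hypothetical parameters: a stretched bond of 1.80 relaxing toward 1.43.
k_bond, r_equilibrium = 300.0, 1.43
r_min = steepest_descent(1.80, k_bond, r_equilibrium)
print(f"relaxed length {r_min:.4f}, energy {harmonic_energy(r_min, k_bond, r_equilibrium):.2e}")
```

A real force field sums many such terms (angles, torsions, nonbonded interactions) over all atoms, but the update rule is the same.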

-EGFP
Earlier work showed that (Ala-Hyp)51-EGFP, isolated as described earlier (19), contained 51 O-Hyp-linked arabinogalactan polysaccharides containing Gal, Ara, GlcUA, and Rha. Base hydrolysis of (Ala-Hyp)51-EGFP released Hyp-arabinogalactans that voided a Chromobeads cation exchange column (Fig. 1A). Further fractionation on a Superdex-peptide gel filtration column yielded a major peak containing small Hyp-arabinogalactan polysaccharides ranging in size from about 13 to 26 monosaccharide residues, judging by the molar ratio of monosaccharide to Hyp in each fraction (Fig. 1B). Fractions 11 and 12 from the column contained the most material (2.8 mg), judging by the recovered weights, and therefore were chosen for refractionation on the Superdex column. Refractionation of fractions 11 and 12 produced a single major peak (Fig. 1C), the tip of which we collected for structural analyses. The peak was designated Ala-Hyp polysaccharide-1 (AHP-1). Monosaccharide analyses of isolated AHP-1 indicated it was rich in Gal and Ara (50 and 33 mol %, respectively) with lesser amounts of Rha and GlcUA (4 and 14 mol %, respectively), yielding the following molar ratios: 7 Gal:5 Ara:2 GlcUA:0.5 Rha. This was consistent with the estimated size and composition of the isolated oligosaccharide and with the NMR data below, albeit with an underestimate of Rha. The isolated AHP-1 glycan size was also consistent with that deduced from the sugar composition of intact (Ala-Hyp)51-EGFP, which also showed a ratio of 15.1 sugar residues per Hyp. This indicates that Hyp-polysaccharides were released, but not degraded, by alkaline hydrolysis.

FIG. 1. Fractionation of the (Ala-Hyp)51-EGFP base hydrolysate on a Chromobeads C2 cation exchange column (A) yielded a single peak containing Hyp that eluted in the void volume; this was consistent with Hyp-polysaccharide being the only Hyp-glycoside present in the fusion protein (19). Fractions 11-18 were collected, freeze-dried, and further fractionated on the Superdex peptide gel permeation column, shown in B. Fractions 11 and 12 from the initial Superdex peptide column fractionation step (B) contained the most material judging by weight recovered and therefore were combined and refractionated on the Superdex peptide column (C), although the volume of the fractions shown was half those shown in B. The Hyp-polysaccharides in fraction 23 of C were designated AHP-1.
Consistent with the sugar composition, the linkage analysis of AHP-1 (Table I) showed mainly 3,6-linked Galp with lesser amounts of terminal, 3-, and 6-linked Galp; Ara occurred as terminal Araf and as 3- and 5-linked Araf; all Rha was terminal; and the GlcUA residues were either terminal or 4-linked. However, the linkage analyses (repeated three times) were not strictly quantitative due to incomplete methylation and perhaps partial loss of volatile methylated pentose.
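The relation between the reported mole percentages and the residue count per glycan can be checked with a short calculation. The sketch below scales the mol % values given above to a 15-residue glycan; the scaling approach and function name are illustrative assumptions, not the authors' stated procedure.

```python
def residues_per_glycan(mol_percent, glycan_size):
    """Scale mole percentages so that they sum to the glycan size."""
    total = sum(mol_percent.values())
    return {sugar: glycan_size * pct / total for sugar, pct in mol_percent.items()}


# Mole percentages as reported for AHP-1; glycan size of 15 residues per Hyp.
mol_percent = {"Gal": 50, "Ara": 33, "GlcUA": 14, "Rha": 4}
scaled = residues_per_glycan(mol_percent, glycan_size=15)

for sugar, count in scaled.items():
    print(f"{sugar}: {count:.2f} residues per glycan")
```

Rounded to whole residues, the scaled values agree with the 7 Gal, 5 Ara, and 2 GlcUA deduced from the NMR data, while Rha falls below one residue, in line with the underestimate noted in the text.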

Structural Elucidation of AHP-1 by One- and Two-dimensional NMR Spectroscopy
To confirm and extend the results of the glycosyl residue and glycosyl-linkage composition analyses, we analyzed AHP-1 by one- and two-dimensional homonuclear and heteronuclear NMR spectroscopy. Spectra are shown in Figs. 2-4, and the assignments are reported in Tables II and III. The 1H assignments were obtained from a one-dimensional proton spectrum (Fig. 2), from homonuclear COSY, TOCSY, ROESY (Fig. 4), and NOESY two-dimensional spectra, and from a heteronuclear HMQC two-dimensional spectrum (not shown). The 13C assignments were established from a combination of two-dimensional HMQC (not shown) and HMBC heteronuclear spectra (Fig. 3) and proton-decoupled one-dimensional and DEPT 13C spectra.

AHP-1 Sugar Molar Ratios and Anomeric Configuration
A one-dimensional 1H NMR spectrum determined the number of saccharide anomeric protons with aid from the glycosyl composition analyses. The spectrum (Fig. 2) showed resonances at 5.25, 5.09, 4.79, 4.77, 4.70, 4.57, and ~4.50 ppm in a ratio of 4:1:1:1:4:1:4, determined by integrating areas in the one-dimensional 1H NMR spectrum. We assigned the resonances based on known chemical shifts (11,30-33) and the AHP-1 composition as follows: signals at 5.25 and 5.09 ppm corresponded to H-1 of four and one α-L-Araf residues, respectively. The signal at 4.79 ppm corresponded to H-1 of a single α-L-Rhap residue; although there is no published AGP 1H spectrum indicating the H-1 assignment of α-L-Rhap, identification in the HMQC spectrum (not shown) of the corresponding 13C signal at 101.9 ppm confirmed the Rha H-1 assignment (14,34). We assigned the 1H signal at 4.77 ppm to H-4 of 4-O-glycosylated Hyp, although this assignment had not been reported previously. We identified the Hyp H-4 signal because it was correlated with a Hyp C-4 signal at 79 ppm (35,36) in the HMQC spectrum. The resonances at 4.70 and 4.57 ppm in the one-dimensional 1H NMR spectrum corresponded to H-1 of five β-D-Galp residues (30,35) in a ratio of 4:1, the latter signal at 4.57 ppm corresponding to H-1 of Gal O-linked to Hyp (35). The signals ranging from 4.46 to 4.52 ppm corresponded to the anomeric protons of two β-D-Galp residues (11,30,32) and two β-D-GlcpUA residues, respectively (Fig. 3); however, we only assigned the signal at 4.52 ppm to H-1 of β-D-GlcpUA after identifying the cross-peak corresponding to C-1 (103.8 ppm) (17,18,34) and H-1 of β-D-GlcpUA in the HMQC spectrum. Thus, the one-dimensional 1H NMR spectrum together with the glycosyl molar ratios showed that AHP-1 is a glycoamino acid containing 15 sugar residues (7 Gal, 5 Ara, 2 GlcUA, and 1 Rha) and 1 Hyp residue.
The one-dimensional 1H NMR spectrum also showed that 1 Ara residue was in a unique chemical environment relative to the other 4 Ara residues (the H-1 signal at 5.09 ppm versus those at 5.25 ppm) and that the 7 Gal residues occur in three distinct chemical environments in a molar ratio of 2:1:4 (H-1 signals at ~4.5, 4.57, and ~4.7 ppm).
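The bookkeeping behind this residue count can be written out explicitly. The sketch below tallies the integrated peak areas against the assignments given above; the data layout and variable names are illustrative choices rather than part of the original analysis.

```python
from collections import Counter

# (chemical shift in ppm, integrated area, assignment) as described in the text.
# The peak near 4.50 ppm is split between Gal and GlcUA using the composition data.
peaks = [
    (5.25, 4, {"Ara": 4}),
    (5.09, 1, {"Ara": 1}),
    (4.79, 1, {"Rha": 1}),
    (4.77, 1, {"Hyp H-4": 1}),
    (4.70, 4, {"Gal": 4}),
    (4.57, 1, {"Gal": 1}),
    (4.50, 4, {"Gal": 2, "GlcUA": 2}),
]

tally = Counter()
for shift, area, assignment in peaks:
    # Each assignment must account for the whole integrated area of its peak.
    assert sum(assignment.values()) == area, f"unassigned area at {shift} ppm"
    tally.update(assignment)

sugar_total = sum(n for name, n in tally.items() if name != "Hyp H-4")
print(dict(tally))
print("sugar residues:", sugar_total)
```

The tally reproduces the 7 Gal, 5 Ara, 2 GlcUA, and 1 Rha residues, with the remaining proton belonging to H-4 of Hyp.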

Identification of the β-D-Gal-O-Hyp Linkage
The 13C and 1H resonances corresponding to the Hyp residues were easily distinguished from those of saccharides judging from earlier work (35,36) and by TOCSY and HMQC experiments from which we were able to assign each 1H and 13C NMR resonance to a distinct residue (Tables II and III) [...] spectrum (not shown) corroborated the assignments.

FIG. 2. The one-dimensional 1H NMR spectrum corresponding to the anomeric region of AHP-1. The chemical shifts indicate the type of monosaccharides involved, and the peak areas indicate the number of anomeric protons giving rise to the signals. Peaks A and B correspond to H-1 of four and one α-L-Ara residues, respectively; peak C to H-1 of a single α-L-Rha residue; peak D to H-4 of the AHP-1 Hyp residue; peak E to H-1 of four β-D-Gal residues (the galactan main chain); peak F to H-1 of a single β-D-Gal residue (G0, the Gal linked to Hyp); and peak G to H-1 of two β-D-Gal residues and two β-D-GlcUA residues.
A second set of weak cross-peaks in the HMQC spectrum occurred indicating the Hyp residues were a mixture of L-Hyp and allo-Hyp isomers formed during base hydrolysis of the polypeptide backbone (35,36) (Table III, set 2).

Side Chain Characterization
All Ara Is in the α-Configuration and None Is 2-Linked-The 13C chemical shifts corresponding to C-1 of the Ara residues (Table II) indicated that all five Ara residues were α-L-Araf and not β-L-Araf, which produces a shift at ~102.7 ppm (11,18,32,36). Consistent with earlier spectra of terminal α-L-Araf in AGP polysaccharides (11,17,32,36), the 13C signals corresponding to terminal Araf C-2 through C-5 were present in the HMQC spectrum, as were signals corresponding to Araf having only a 5-linked substituent (C-5 at 68.0 ppm) and Araf residues with substituents at C-3 (83.4 ppm) (Table II) (11,33). None of the α-Araf residues had 2-linked substituents, as no chemical shifts corresponding to the ring carbons of Araf occurred downfield of 85.2 ppm (36), nor was 2-linked Araf detected in the glycosyl linkage composition (Table I). Thus the five Araf residues in AHP-1 were terminal or had 3- or 5-linked substituents. Judging by the glycosyl linkage analysis and one-dimensional 1H spectrum, we estimated AHP-1 had two terminal Araf residues, one 5-linked Araf residue, and two 3-linked Araf residues.

FIG. 3. The AHP-1 HMBC spectrum collected at 60°C. The two-dimensional heteronuclear HMBC spectrum identified most of the AHP-1 monosaccharide sequence with aid from the glycosyl composition and linkage analyses and the one-dimensional NMR spectrum (Fig. 2). Cross-peak A arose from the signals corresponding to H-1 of Ara residue A1 and C-5 of Ara residue A2 and indicated that A1 was linked to C-5 of A2 (see Table II for residue assignments). Cross-peak B arose from H-1 and C-3 of neighboring Ara residues such as A2-(1→3)-A3 and C1-(1→3)-C2. Cross-peaks C and the two labeled D identified the Ara-(1→3)-Gal linkages, as they arose from correlations between Ara H-1 (A3/C2) and Gal C-3 (Ga/Gb) and between Ara C-1 (A3/C2) and Gal H-3 (Ga/Gb), respectively. Cross-peak E indicated the Rha residue, B1, was linked to position 4 of a GlcUA residue (Rha H-1/GlcUA C-4). Cross-peak F indicated the GlcUA residues were linked to position 6 of Gal residues, Ga and Gb (GlcUA H-1/Ga/b C-6). The two cross-peaks G indicated Gal residues Ga and Gb were linked to position 6 of Gal residues G1 and G0 (Ga/b H-1/G1/0 C-6). The two cross-peaks H and three J identified the Gal-(1→3)-Gal residues in the galactan main chain: G1→G0 and G2 [...]
There Are Two Ara Side Chains and They Are 3-Linked to Gal-The HMBC spectrum contained two cross-peaks corresponding to C-1 of Araf (110.3 ppm) and H-3 of Galp (3.70 and 3.68 ppm), thus identifying two α-L-Araf-(1→3)-Gal glycosidic linkages and the likelihood of two Ara side chains (Fig. 3, cross-peak D). This conclusion was supported by the occurrence of two distinct terminal Araf residues, discussed below.
The Araf residue designated A1 (Table II) displayed chemical shifts typical of terminal Araf residues (11,17,36,37). Residue A1 is linked to O-5 of Ara A2 judging by 3JCH correlations between the H-1 signal of A1 (5.09 ppm) and the 13C signal at 68.0 ppm corresponding to C-5 of an adjacent Ara, designated A2 (Table II and Fig. 3). These assignments are consistent with earlier identified NMR peaks arising from terminal α-Araf linked to O-5 of another Araf residue (33) and suggest that the earlier assignment (11) of a resonance at 5.09 ppm to H-1 of α-L-Araf possessing a 2-linked substituent probably corresponded instead to the H-1 chemical shift of an Ara residue 1→5-linked to another Araf. A1 is the only Ara residue that is in a 1→5 linkage to another Araf residue, judging by the one-dimensional 1H spectrum (Fig. 2) and linkage analysis (Table I). The other four Ara residues in AHP-1 (designated A2, A3, C1, and C2 in Table II) were in very similar chemical environments judging by their anomeric proton and carbon chemical shifts at 5.25 and 110.3 ppm, respectively. One of them, designated C1, is a terminal residue, as it also displayed resonances typical of terminal Araf residues (11,17,19,37). Therefore, of the five AHP-1 Araf residues, two are terminal, one is involved in a 1→5-linkage to another Araf residue, and two in 1→3-linkages to Gal. The 3-linked Araf residues evident in the linkage analyses (Table I) remained to be accounted for, and there are two of them judging by the one-dimensional 1H (Fig. 3) and 13C NMR spectra.
The HMBC spectrum showed the Araf H-1 signals at 5.25 ppm had long-range 3JCH correlations with 13C chemical shifts at 78.0, 81.5, 83.4, and 85.3 ppm (Fig. 3 and Table II). The 13C resonance at 78.0 ppm corresponded to unsubstituted C-3 of A2 or C1 of Table II (11,17,30); the shift at 81.5 ppm, labeled C in Fig. 3, corresponded to C-3 of Galp (11,17,30,32) having a substituent at O-3, corroborating the α-L-Ara-(1→3)-Gal linkages identified above (Ga and Gb in Table II); and the signal at 85.3 ppm corresponded to C-4 of Araf (Table II, A2, A3, and C2) (11,17,30). The remaining signal at 83.4 ppm, labeled B in Fig. 3, corresponded to C-3 of Araf having a substituent at O-3 (32,33,36) and indicated the presence of α-L-Ara-(1→3)-α-L-Ara linkages. These results considered together with the one-dimensional 1H NMR spectrum (Fig. 3) and the glycosyl linkage composition of the Ara residues (Table I) [...] (Table II). The corresponding 13C chemical shifts (Table II), assigned with the aid of the one-dimensional 13C and HMQC spectra, were similar to those reported by Defaye and Wong (17) except we did not find a signal at 175.6 ppm corresponding to C-6 of β-D-GlcpUA. Cross-peaks in the HMBC spectra arising from H-1 of Rhap and C-4 of GlcpUA and from C-1 of Rhap and H-4 of GlcpUA indicated that Rhap was linked to GlcpUA through an α-(1→4) linkage. This was consistent with the composition and linkage analyses of AHP-1 (Table II). The one-dimensional 1H NMR spectrum, the HMBC spectrum, the linkage analyses of AHP-1, and the assignments shown in Table II indicated AHP-1 had a terminal β-D-GlcpUA residue and a terminal α-L-Rhap residue 1→4-linked to a second GlcpUA residue. Earlier work showed that a substitution at O-6 of Gal produces a downfield shift in the 13C signal from ~62 ppm (unsubstituted) to ~71 ppm (substituted) (11,17).
Likewise, the signals for H-6 shift from a single signal at ~3.77 ppm (unsubstituted) to a split signal at ~3.90/4.03 ppm (substituted) (11). Similar shifts reported here indicated the presence in AHP-1 of Galp residues having adducts at O-6 (i.e. as in 3,6-linked and 6-linked Gal; Ga, Gb, G0, and G1 in Table II), all of which is consistent with the glycosyl linkage composition (Table I). Furthermore, the signals assigned to H-1 of GlcpUA and C-6 of Galp yielded cross-peaks in the HMBC spectrum (labeled F in Fig. 3); therefore, we concluded the two GlcpUA residues in AHP-1 were linked to O-6 of two Gal residues, which can only be Ga and Gb, judging by their unique C-6 resonances at 70.2 ppm. Other lines of evidence supported this conclusion: the NOESY spectrum conducted at 30°C (not shown) had cross-peaks arising from the H-1 signals of GlcpUA at 4.51 ppm and H-4 of Ga/Gb (4.12 ppm), indicating the residues are spatially near one another; the earlier work of Defaye and Wong (17) identified very similar oligosaccharide side chains in arabinogalactan polysaccharide fragments isolated from gum arabic; and, finally, the structure of the galactan main chain discussed below is consistent with it.
Ga and Gb Are Side Chain Gal Residues-Judging from the chemical shifts corresponding to Ga and Gb, they are 3,6-linked residues (Table II) (11,17) and, judging by the HMBC cross-peaks discussed above, they have Ara side chains at O-3. The ring systems of Ga and Gb were unique and produced C-6 resonances at 70.2 ppm, identifying Ga and Gb as the Gal residues having GlcpUA substituents at O-6. The H-1 signals of Ga and Gb were also correlated with unique Gal C-6 signals at 70.5 and 70.8 ppm that arose from two other Gal residues, designated G0 and G1 in Table II. Thus Ga and Gb, bearing Ara adducts at O-3 and GlcpUA adducts at O-6, are themselves linked to the O-6 of two other Gal residues in AHP-1. Gal residue G0, which is linked to Hyp, has already been identified as a 3,6-linked residue and, judging by the chemical shifts assigned to G1, G1 is a 3,6-linked Gal residue as well (Table II). Thus Ga, Gb, G0, and G1 were identified as 3,6-linked Gal residues and account for 4 of the 7 Gal residues in AHP-1.
In summary, AHP-1 contains two oligosaccharide side chains consisting of Ga and Gb substituted at position 3 with the α-L-Araf units and at position 6 with α-L-Rhap-(1→4)-β-D-GlcpUA or β-D-GlcpUA units.

FIG. 5. [...] Table II, including the anomeric configurations and linkage positions. Hyp is in the lower right-hand corner of the structure. A NOESY spectrum collected at 30°C (not shown) indicated that the Ara side chain containing the 1→5-linked Ara residue was nearest the nonreducing end of the glycan; however, the spectrum did not indicate which side chain contained the Rha residue. Here the Rha, B1, is featured in the top center above, on the side chain nearest the nonreducing end.

The Galactan Backbone
Four Gal residues identified in the one-dimensional 1H NMR spectrum (Fig. 2) had H-1 shifts at ~4.7 ppm (specifically, 4.68 and 4.71 ppm, G1-G4 in Table II) and corresponding C-1 chemical shifts at 105.0 ppm, assigned through the HMQC spectrum. We assigned the shifts at 4.68 and 4.71 ppm to four Gal residues within the galactan backbone, assignments not previously made because these chemical shifts are typically obscured by the water peak in earlier spectra collected at 30°C (11,31) but are evident in spectra such as those reported here, collected at a higher temperature (60°C), in which the water peak shifted upfield.
There are two Gal residues with H-1 signals at 4.68 ppm and two Gal residues with H-1 signals at 4.71 ppm, judging by the presence in the TOCSY spectrum of four ring systems associated with these H-1 signals (Table II). We determined that G1 is linked to O-3 of G0, that G2 is linked to O-3 of G1, that G3 is linked to O-6 of G2, and that G4 is linked to O-3 of G3 based on the following evidence.
The HMBC spectrum (Fig. 3) showed three cross-peaks arising from Gal C-1 signals at 105.0 ppm and Gal H-3 signals at 3.82, 3.86, and 3.88 ppm (Fig. 3, cross-peaks labeled J), indicating that three of the four Gal residues discussed above were involved in β-D-Galp-(1→3)-Gal linkages, as Gal that is unsubstituted at position 3 exhibits H-3 chemical shifts much further upfield (i.e. 3.66 ppm (11)). None of these three Gal residues is Ga or Gb judging by the H-3 chemical shifts of Ga and Gb (Table II). Thus there is a Gal residue, designated G1, in a 1→3-linkage to G0 (G0 has an H-3 signal at 3.86 ppm). The second C-1/H-3 HMBC cross-peak indicated G2 is linked to O-3 of G1 (G1 has an H-3 signal at 3.82 ppm), and the third cross-peak indicated that there is a Gal residue linked to O-3 of another Gal residue possessing an H-3 signal at 3.88 ppm, designated G4 and G3, respectively. The G1→G0 and the G4→G3 linkages were corroborated by another set of cross-peaks in the HMBC spectrum arising from H-1 of G1 and C-3 of G0 and from H-1 of G4 and C-3 of G3 (Fig. 3, cross-peak H). Residue G4 is the terminal Gal in AHP-1 judging by the HMBC spectrum, which showed a correlation between signals at 62.7 ppm (C-6 of G4, Table II) and 3.90 and 3.70 ppm (H-4 and H-5 of G4), signals that are typical of a terminal Gal residue (11,30,33). Thus the HMBC and NOESY spectra suggested the following structural units in AHP-1 in addition to those described above: β-D-Gal-(1→3)-β-D-Gal-(1→3)-β-D-Gal-(1→4)-Hyp (G2, G1, and G0 in Fig. 5) and β-D-Gal-(1→3)-β-D-Gal (G4 and G3 in Fig. 5).
The NOESY spectrum determined at 60°C (not shown) corroborated the results from the HMBC spectrum in that it showed three unique correlations between H-1 of G1 and H-3 of G0, between H-1 of G2 and H-3 of G1, and between H-1 of G4 and H-3 of G3.
We determined that Gal G3 was in a 1→6-linkage to Gal G2 from the chemical shifts assigned to G4, G3, and G2 (Table II) and from the ROESY spectrum (Fig. 4). The chemical shifts of G2 are diagnostic of a Gal bearing a substituent at O-6 (11,30), but not at O-3, and therefore corresponded to the Gal residue bearing an O-6 substituent evident in the glycosyl linkage analysis (Table I). The ROESY spectrum collected at 60°C (Fig. 4) showed a correlation between an H-1 signal at 4.71 ppm (H-1 of G3 or G4) and signals at 3.94 and 4.04 ppm arising from the H-6 protons of a neighboring Gal (Table II). Although G4 like G3 has an H-1 signal at 4.71 ppm, G4 is a terminal residue and linked to O-3 of G3 (discussed above). Thus, it is G3 that is linked to O-6 of G2 (Fig. 5). A NOESY spectrum collected at 30°C (not shown) corroborated that G3 is spatially near G2, as the spectrum showed correlations between the signal for H-1 of G3 (4.71 ppm) and the H-6 signals of another Gal (3.94 and 4.04 ppm) and between the signal for H-1 of G3 (4.71 ppm) and a signal at 3.91 ppm (H-4 of G2).
The final task was to determine the placement of the side chains along the galactan backbone. As discussed above, only G0 and G1 in the main chain galactan had ring systems with resonances characteristic of 3,6-linked Gal (Table II) and therefore are the sites for side chain attachment at O-6 (Fig. 5), but which side chain occurs on G0 and which on G1?
A NOESY spectrum carried out at 30°C (not shown) had a signal at 4.71 ppm (H-1 of G4 or G3) that was correlated with another at 5.09 ppm (H-1 of A1 in Table II and Fig. 5). This suggested that Ara A1 in the Ara side chain, A1-(1→5)-A2-(1→3)-A3-(1→3)-Gal-(1→, was spatially near G4/G3 and nearest the nonreducing end of AHP-1 as shown in Fig. 5. The 30°C NOESY spectrum was more ambiguous regarding the precise location of the side chain α-L-Rhap-(1→4)-β-D-GlcpUA-(1→. The spectrum showed correlations between a signal at 4.68 ppm corresponding to H-1 of Gal G1 or G2 and a signal at 4.79 ppm corresponding to H-1 of Rha; thus we can only conclude that the Rha residue is attached to one of the two side chains but could not identify which one. Given the extensive microheterogeneity of glycoprotein carbohydrate adducts, including HRGPs (3), both species undoubtedly exist in (Ala-Hyp)51-EGFP.
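The linkages deduced above can be cross-checked against the glycosyl linkage composition by encoding the structure as a list of glycosidic bonds and deriving each residue's substitution pattern. In the sketch below, the labels U1 and U2 for the two GlcUA residues are hypothetical, and placing side chain Ga (with the A1-A2-A3 chain and the Rha) on G1 follows the arrangement drawn in Fig. 5, which the text notes is only one of the two possible placements of Rha.

```python
from collections import Counter, defaultdict

# (residue, sugar type, parent residue, parent position carrying the bond)
LINKS = [
    ("G0", "Gal", "Hyp", 4), ("G1", "Gal", "G0", 3), ("G2", "Gal", "G1", 3),
    ("G3", "Gal", "G2", 6), ("G4", "Gal", "G3", 3),
    ("Ga", "Gal", "G1", 6), ("Gb", "Gal", "G0", 6),
    ("A3", "Ara", "Ga", 3), ("A2", "Ara", "A3", 3), ("A1", "Ara", "A2", 5),
    ("C2", "Ara", "Gb", 3), ("C1", "Ara", "C2", 3),
    ("U1", "GlcUA", "Ga", 6), ("U2", "GlcUA", "Gb", 6),  # U1/U2: hypothetical labels
    ("B1", "Rha", "U1", 4),
]

sugar_type = {residue: sugar for residue, sugar, _, _ in LINKS}

# Collect the positions at which each residue carries a substituent.
substituted_at = defaultdict(list)
for residue, _, parent, position in LINKS:
    if parent in sugar_type:  # skip the bond to Hyp, which is not a sugar
        substituted_at[parent].append(position)


def linkage_type(residue):
    """Return 'terminal' or the substituted positions, e.g. '3,6'."""
    positions = sorted(substituted_at[residue])
    return ",".join(str(p) for p in positions) if positions else "terminal"


linkage_counts = Counter((sugar_type[r], linkage_type(r)) for r in sugar_type)
for (sugar, linkage), n in sorted(linkage_counts.items()):
    print(f"{n} x {linkage}-linked {sugar}" if linkage != "terminal" else f"{n} x terminal {sugar}")
```

The derived pattern (four 3,6-linked, one 6-linked, one 3-linked, and one terminal Gal; two terminal, two 3-linked, and one 5-linked Ara; one terminal and one 4-linked GlcUA; one terminal Rha) matches the linkage types described in the text. Moving B1 from U1 to U2 leaves the counts unchanged, which is why linkage analysis alone cannot place the Rha.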

Galactan Backbone Subunits
The AHP-1 structure may help resolve a long-standing question of "kinks" in the arabinogalactan backbone, which presumably arise from periodic occurrences of either (1→5)-α-L-Araf or (1→6)-β-D-Galp linkages. These residues were identified by their sensitivity to periodate oxidation and the subsequent release of apparently small repetitive arabinogalactan subunits (38,39). However, as shown here, 5-linked Araf terminates a side chain. Thus, the single 6-linked Gal residue, if part of a truncated main chain repeat unit, could be the missing kink in the arabinogalactan main chain. Characterization of larger Hyp-arabinogalactans should help resolve the issue.

A Molecular Model of AHP-1
By using the structure of O-Hyp glyco-substituents and their location predicted from the Hyp contiguity hypothesis, it is now feasible to approach the molecular modeling of HRGPs. Figs. 5 and 6A show the proposed AHP-1 structure inferred from NMR and carbohydrate analyses: a 15-residue glycan based on a (1→3)-β-D-Galp main chain with bifurcated tetra- and pentasaccharide side chains containing Ara, Rha, and GlcUA linked through a side chain Gal to the main chain. As 10 residues define the maximum size of an oligosaccharide (40), AHP-1 is a 15-residue borderline arabinogalactan polysaccharide, small enough for molecular modeling to produce an energy-minimized polysaccharide conformation and orientation for comparison and possible corroboration by the NMR structure. It also enabled a search for possible interactions between closely clustered polysaccharides and a test of H-bonding between polysaccharides and the polypeptide backbone hypothesized earlier (41).
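As a bookkeeping check on the 15-residue description, the residues named in the text can be tallied. The following is a minimal sketch and not part of the original analysis: the inventory is one reading of the residue labels used in the text and Fig. 5, the labels `U_a`, `U_b`, and `R` for the two GlcUA residues and the Rha are hypothetical, and drawing Rha on the G a side chain is an assumption that does not affect the totals.

```python
from collections import Counter

# Residue labels as used in the text and Fig. 5.
# "U_a", "U_b", and "R" are hypothetical labels introduced here for the
# two GlcUA residues and the single Rha residue.
# Assumption: Rha is placed on the Ga side chain; the tally is unaffected
# if it sits on the Gb side chain instead.
AHP1 = {
    "main chain": {"G0": "Gal", "G1": "Gal", "G2": "Gal", "G3": "Gal", "G4": "Gal"},
    "Ga side chain": {"Ga": "Gal", "A1": "Ara", "A2": "Ara", "A3": "Ara",
                      "U_a": "GlcUA", "R": "Rha"},
    "Gb side chain": {"Gb": "Gal", "C1": "Ara", "C2": "Ara", "U_b": "GlcUA"},
}

# Count residues by sugar type across the main chain and both side chains.
counts = Counter(sugar for chain in AHP1.values() for sugar in chain.values())
total = sum(counts.values())

print(total, dict(counts))
```

Under this reading the glycan contains 7 Gal, 5 Ara, 2 GlcUA, and 1 Rha, which sums to the 15 residues stated for AHP-1.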
We adopted a strategy of constructing a small molecule containing O-Hyp-linked AHP-1 arabinogalactans, designated (Ala-Pro) 6 (Fig. 6, B and C); the arabinogalactans can be considered as bulky side chains of the polypeptide. After building and energy minimizing the polysaccharides, we glycosidically linked three of them to C-4 of the internal Pro residues in the 12-residue non-glycosylated peptide (Ala-Pro) 6 to form a typical AGP tight cluster consisting of three consecutive arabinogalactan glycomodules in the resulting glycopeptide: Ala-Pro-Ala-Hyp-Ala-Hyp-Ala-Hyp-Ala-Pro-Ala-Pro, in which each Hyp had an AHP-1 substituent. The nonglycosylated (Ala-Pro) 6 backbone, initially in a polyproline II (PPII) conformation with Ramachandran angles of −75° and +145° (42), was re-energy minimized after each successive polysaccharide addition. A single AHP-1 polysaccharide contained three internal H-bonds (Fig. 6A) when energy-minimized in vacuo; energy minimization of AHP-1 in a 51-Å³ periodic box containing 5833 water molecules yielded essentially the same conformation and H-bonding patterns. Polysaccharides attached to the peptide showed similar internal H-bonding but none between adjacent polysaccharides. Most interesting, however, the central polysaccharide of the cluster (polysaccharide B in Fig. 6, B and C) also showed H-bonding with the polypeptide backbone, although the earlier suggestion of extensive interaction between arabinogalactan polysaccharides and polypeptide backbone (41) is too simplistic. Indeed, rather than following the contours of the polypeptide, the polysaccharides tend to splay out in an alternating syndiotactic manner, with dimensions ranging from 3.5 to 5 nm across two polysaccharides orthogonal to the peptide backbone. The peptide φ/ψ angles deviated significantly from the PPII conformation, fluctuating from −43° to −132° and from +98° to +153°, indicating that the polysaccharides impose steric constraints that deform the polypeptide backbone, a conclusion supported by the "random coil" CD spectrum of (Ala-Hyp) 51 and the partial restoration of 3-fold PPII helicity observed after its deglycosylation with anhydrous HF (Fig. 7). The low temperature NOESY spectrum indicated that the side chain Rha residue is close to the main chain Gal residues G 1 or G 2 and therefore favors rhamnose attachment to the G a side chain of G 1; the molecular model is consistent with this, as it shows Rha closer to G 1 than G 2. The model also shows two shallow pockets in AHP-1, the first lined by Gal, GlcUA, and Rha and the second by side chain Gal residue G a and main chain Gal residues G 1-3. The main chain Gal residues G 0, G 1, and G 2 each have a 1,6-linked side chain substituent in close proximity to one another; this results in steric hindrance to their rotation around the 1,6-linkage, further enhanced by the H-bonding, and thus may also stabilize these shallow pockets as putative water-binding sites that may account for the nonfreezing water content identified by differential scanning calorimetry of gum arabic arabinogalactan polysaccharides (43).
FIG. 6. Space-filling CPK models of AHP-1 (A), a side view of glycosylated (Ala-Pro) 6 (B), and an end-on view of glycosylated (Ala-Pro) 6 (C). Nitrogen atoms are shown in dark blue; the oxygen atoms are red; hydrogen atoms are gray; and carbon atoms are turquoise blue. Bar, 1 nm. A, Hyp arabinogalactan AHP-1. This energy-minimized structure is stabilized by three H-bonds. Two stabilize the G b side chain: one between Ara C 2 and Ara C 1 (the C-2 OH donates to the C-6 O) and one between Gal G b and Gal G 0 (the C-2 OH donates to the C-4 O). A third H-bond stabilizes the G a side chain between Gal G 4 and Ara A 2 (the C-6 OH donates to C-2 O). Dimensions of AHP-1 are as follows: x = 3.5 nm; y = 1.6 nm; z = 2.6 nm. Ara and Gal residue labels correspond to those of Fig. 5. B, Glycosylated (Ala-Pro) 6. Side view of a polysaccharide cluster. Three AHP-1 glycans labeled A-C are O-linked to the Hyp residues of the glycosylated (Ala-Pro) 6 model (residues 4, 6, and 8 of the peptide). The protein backbone lies across the figure with the N terminus at the far left (nitrogen atoms are in dark blue). Note the close proximity of polysaccharide B (green) to the polypeptide backbone where the arabinose disaccharide residues C 1 and C 2 on the G b side chain form three H-bonds as follows: the hydroxymethyl (C-5) of Ara residue C 1 to both the carbonyl of Hyp residue 4 and to the peptide N of Ala residue 5; and the C-2 hydroxyl of Ara residue C 2 to the NH of Ala residue 6. In contrast, the arabinose trisaccharide residues at the tip of each polysaccharide form peripheral hook-like projections; these may result in multiple weak interactions ("molecular Velcro") with the Yariv reagent (2), which specifically interacts with the arabinogalactans on AGPs. C, glycopeptide (Ala-Pro) 6. End-on view of a polysaccharide cluster. Reorienting the polypeptide so it is perpendicular to the plane of the paper shows a syndiotactic propeller-like arrangement of the arabinogalactan polysaccharides around the polypeptide, providing surfaces for interactions and interdigitation with other matrix molecules.
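The size of the backbone distortion can be made explicit by comparing the reported ranges of the peptide Ramachandran angles in the glycosylated model with the starting PPII values (−75° and +145°). The snippet below is a minimal sketch of that arithmetic only; the dictionary and function names are hypothetical, and it assumes the first reported range refers to φ and the second to ψ, matching the order of the PPII reference angles.

```python
# Ideal PPII Ramachandran angles used for the starting backbone (degrees).
PPII = {"phi": -75.0, "psi": 145.0}

# Ranges reported for the glycosylated (Ala-Pro)6 model (degrees).
# Assumption: the first range is phi and the second is psi.
MODEL_RANGES = {"phi": (-132.0, -43.0), "psi": (98.0, 153.0)}


def max_deviation(ideal, low, high):
    """Largest absolute departure from the ideal angle within [low, high]."""
    return max(abs(low - ideal), abs(high - ideal))


deviations = {
    name: max_deviation(PPII[name], *MODEL_RANGES[name]) for name in PPII
}

print(deviations)  # {'phi': 57.0, 'psi': 47.0}
```

On these numbers the backbone departs from the ideal PPII values by up to 57° in one angle and 47° in the other.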

Glycomodule Subunits
Judging from the similarities between gum arabic side chains (17) and diverse species ranging from rice (32) to grape berries (37), the AHP-1 arabinogalactan is a small consensus subunit that represents many AGP glycomodules. This confirms the insightful observation of Churms et al. (44) that "any arabinogalactan of this general type, irrespective of its molecular complexity, can be regarded as being composed of uniform subunits" and raises the distinct possibility that AGP biosynthesis involves their en bloc transfer from a lipid intermediate (16,45). "Snapshots" of small structural units in other arabinogalactans are also similar to AHP-1 albeit with some variations, possibly including fucose (6-deoxy-L-galactose) (1,46) as a conservative replacement for Rha (6-deoxy-L-mannose). Other significant variants may include an alternative β-anomeric configuration for the terminal α-L-Araf (32), the presence of xylose (1), and fewer rhamnose and uronic acid residues (47,48). These putative variants need confirmation by their isolation and characterization as unique Hyp-polysaccharides or glycopeptides.
On the other hand, AGPs from diverse species like Acacia senegal (Leguminosae) (17), N. tabacum (18), and Lycopersicon esculentum (Solanaceae) (4) contain repetitive structures similar to AHP-1. This structural conservation points to common functions of cell-surface HRGPs that originally evolved in the green algae (49) where self-assembly of the wall involves modular extensin-like HRGPs to a lesser (50) or greater extent (51,52), with AGPs contributing to mucilage around cells (53). Higher plants have relegated AGP mucilage to the secondary role of wound-induced protective gum exudates (41), whereas primary roles remain less than obvious. However, concurrent work 2 suggests a dual role for AGPs based on the dimensions of the anionic polysaccharide structures described here and their quantitative distribution. Thus, monomeric AGPs may act as plasticizers in muro (54), whereas the classical AGPs at the membrane/wall interface, such as LeAGP-1 (4), may form a periplasmic polymer cushion that stabilizes the plasma membrane, protecting it from the inherently high hydrostatic pressures in turgid plant cells.
FIG. 7. CD spectra of glycosylated and deglycosylated (Ala-Hyp) 51. The polyhydroxyproline standard (black) has a polyproline II conformation featuring a maximum at 225 nm and a minimum at 205 nm. Glycosylated (Ala-Hyp) 51 (green) displays a "random coil" structure compared with deglycosylated (Ala-Hyp) 51 (red), which shows an increase in partial polyproline II conformation, in agreement with the structures predicted by the molecular modeling shown in Fig. 6.