Structure-function analysis of human alpha1,3-fucosyltransferase. Amino acids involved in acceptor substrate specificity.

A series of molecular biology experiments were carried out to identify the catalytic domain of two human α1,3/4-fucosyltransferases (fucosyltransferases (FucTs) III and V), and to identify amino acids that function in acceptor substrate binding. Sixty-one and 75 amino acids could be eliminated from the N terminus of FucTs III and V, respectively, without a significant loss of enzyme activity. In contrast, the truncation of one or more amino acids from the C terminus of FucT V resulted in a dramatic or total loss of enzyme activity. Results from the truncation experiments demonstrate that FucT III (containing amino acids 62–361) and FucT V (containing amino acids 76–374) are active, whereas shorter forms of the enzymes were inactive. The shortest active forms of the enzymes are more than 93% identical at the predicted amino acid level, but have distinct acceptor substrate specificities. Thus, FucT III is an α1,4-fucosyltransferase, whereas FucT V is an α1,3-fucosyltransferase with disaccharide substrates. All but one of the amino acid sequence differences between the two proteins occur near their N terminus. Results obtained from domain swapping experiments demonstrated that the single amino acid sequence difference near the C terminus of these enzymes did not alter the enzyme’s substrate specificity. However, swapping a region near the N terminus of the truncated form of FucT III into a homologous region in FucT V produced a protein with both α1,3- and α1,4-fucosyltransferase activity. This region contains 8 of the amino acid sequence differences that occur between the two proteins.

brane, stem, and catalytic (4,5). The presence of a stem region in these proteins is based on the observation that soluble, catalytically active forms occur naturally and are formed by proteolytic cleavage of the full-length enzyme in what is thought to be an extended protein domain (i.e. the stem region). The catalytic region is proposed to be a globular domain that contains the active site of the enzyme. Recently, we (6) have utilized a PCR-based approach to determine which portions of the N- and C-terminal domains of murine and marmoset α1,3-galactosyltransferases can be eliminated without loss of activity (i.e. to define the catalytic domain). The location of the catalytic domain of other glycosyltransferases is unknown. In this report, the location of the catalytic domain of two members of the human FucTs, FucT III and V, is presented. These two proteins were chosen for analysis because they share a high level of amino acid sequence similarity and yet have distinct acceptor substrate specificities (7,8). As illustrated in Fig. 1, there are only 30 amino acid differences between the full-length sequences of these two proteins. In addition to these sequence variations, FucT V contains two regions not found in FucT III (shown as dashes in the FucT III sequence). A majority of the amino acid sequence differences between FucT III and V occur at the N terminus of these enzymes, with only one amino acid difference occurring over the last 200 amino acids. Therefore, precisely locating the catalytic domain should also identify the number of amino acid residues that account for the distinct acceptor substrate specificities of FucT III and V. The results presented in this study provide a precise location of the catalytic domains of FucT III and V and demonstrate that there are fewer than 25 differences in the amino acid sequences of the catalytic domains of the two enzymes.
Portions of the N-terminal domain from the truncated enzymes have been swapped producing chimeric proteins that were analyzed for acceptor specificity. These analyses demonstrated that the distinct acceptor specificity of the FucTs results from as few as 8 amino acid sequence differences.

Construction of Truncated Forms of Protein A-FucTs Fusion Proteins-The vector pPROTA was designed to direct the synthesis of secretable, protein A fusion proteins in eukaryotic cells (e.g. COS-7 cells) (11). Primers (1-10, 15, and 17) were used for the synthesis of truncated protein A fusion forms of FucT III and V (Table I). The PCR product from each set of primers was digested with EcoRI (Promega, Madison, WI), separated on a 1% Agarose Low EEO (Fisher Scientific) and recovered with the QIAEX Gel Extraction Kit (Qiagen Inc., Chatsworth, CA) according to the manufacturer's protocol. The purified product was ligated into the EcoRI-digested, dephosphorylated plasmid vector, pPROTA, using T4 DNA ligase (Promega). The resulting plasmids were propagated in the JM109 strain of Escherichia coli and transfected into COS-7 cells. The fusion product was secreted into the cell culture medium and purified by IgG-agarose chromatography as described previously (6,9). Western blot analysis was carried out after separation of the proteins by SDS-PAGE and transfer of the proteins to a sheet of nitrocellulose. The protein A-IgG binding domain of the chimeric proteins was detected by incubating the blot with the following reagents: (i) biotinylated rabbit IgG and (ii) goat anti-rabbit-alkaline phosphatase and an alkaline phosphatase substrate reagent.
Expression Studies Using the Cloning Vector pGIR-201 and the Eukaryotic Expression Vector pSVL-To obtain a soluble, secreted form of FucT without a protein A domain, FucT DNA constructs were prepared using primers 11-13 (FucT V containing aa 65-374, aa 70-374, or aa 76-374), and 18 and 19 (FucT III containing aa 44-361 and aa 59-361). The upstream primers contain an XbaI restriction site and the downstream primer 14 contains a SacI restriction site. The PCR products were subcloned into the XbaI and SacI restriction sites (downstream from the dog insulin signal peptide gene) of pGIR-201. The FucT construct, along with the dog insulin signal peptide gene, was isolated by digestion with NheI and SacI and cloned into the XbaI and SacI sites of the mammalian expression vector pSVL. The resulting plasmid (designated Fuc-T-pGIR-201-pSVL) was used to transfect COS-7 cells and the FucT protein product was secreted into the cell culture medium. The medium was used as an enzyme source after being concentrated 50-fold by ultrafiltration (Amicon, Danvers, MA).
Domain Swapping Experiments-Domain swaps were designed based on the presence of conserved restriction enzyme sites in FucTs III and V, and a HindIII site in pPROTA. EcoRV and HindIII digestion of the plasmids described above produces two fragments of approximately 350 and 4150 base pairs. The smaller fragments encode amino acids 62-110 of FucT III (referred to as
DNA Sequencing-Plasmid inserts were purified using the Wizard purification protocol (Promega) according to the manufacturer's instructions, and sequenced in their entirety by the dideoxy chain termination method (12) using double-stranded plasmid DNA template and commercial reagents (Sequenase, U. S. Biochemical Corp.). All of the FucT V constructs had a sequence that corresponded to that predicted from the sequence previously reported for the full-length enzyme (8). In contrast, the FucT III preparations contained a sequence identical to that predicted from the sequence previously reported for the full-length enzyme (7) except at two positions (nucleotide base 550 C to T (Leu to Phe) and nucleotide base 568 A to C (Thr to Ala)).
Fucosyltransferase Assays-Transfer of [3H]fucose to oligosaccharide acceptor substrate was measured by an anion exchange chromatography procedure (9), whereas transfer to an acceptor with a nonpolar aglycone was measured by a Sep-Pak reverse phase cartridge procedure (13). The assay conditions were as described previously (9). Control assays with no added acceptor were run in parallel under the same conditions. Reactions were incubated for time periods that yield linear rates of fucose incorporation into the acceptors.

Truncation of FucTs III and V-The current model of the protein domain structure of glycosyltransferases indicates that a significant portion of the N terminus of these proteins is not involved in the binding of substrates or the catalytic process (4,5). Therefore, our first goal was to determine what portion of the N terminus of FucT III and V could be eliminated without loss of activity. In a previous study we demonstrated that soluble forms of FucT III and V missing their first 51 and 62 amino acids, respectively, retained activity. To more precisely define the number of N-terminal amino acids that are required for enzyme activity, a PCR-based approach was used to create truncated forms of FucT V that lacked larger portions of its N-terminal amino acid sequence. These constructs were prepared as protein A chimeras and were assayed after their isolation by affinity chromatography on IgG-agarose beads as described under "Experimental Procedures." As shown in Table II, several truncated forms of FucT V were prepared by PCR and each assayed for enzyme activity.
The results presented in Table II demonstrate that the truncated forms of FucT V with ≥299 amino acids (aa 76-374) retained activity; shorter forms were inactive. The studies shown in Table II were done with a disaccharide (type 2) acceptor which is known to be a poorer acceptor than the corresponding H-type structure. To rule out the possibility that the inactive proteins had simply lost the ability to utilize a single acceptor, each chimeric protein identified as inactive with the disaccharide acceptor was tested with several acceptors (results not shown). Only one protein (i.e. FucT V with a 76-amino acid deletion at the N terminus, aa 77-374) had activity with any of the acceptors. This enzyme was minimally active (~5% as active as the longer enzyme forms) with an H-type 2 acceptor. Finally, the other active forms of FucT V were analyzed with a range of acceptor substrates and found to have an acceptor substrate specificity similar to the full-length enzyme as previously reported (9) (not shown).
Based on the FucT V results, a more limited analysis of FucT III constructs was done. Thus, a FucT III containing amino acids 62-361 had an activity equivalent to that obtained with forms of the enzyme previously characterized (9), whereas a shorter form (aa 67-361) was inactive. Therefore, a form of FucT III with ≥300 amino acids (aa 62-361) retained activity. Assays with various acceptors demonstrated that these forms of FucT III had a substrate specificity identical to that previously reported (7,9) (not shown).
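The residue arithmetic behind these construct names can be checked with a short sketch. The Python below is illustrative only and is not part of the original study: the class name `Truncation` and its fields are hypothetical, and the residue ranges are those quoted in the text for the shortest active forms of each enzyme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Truncation:
    """An N-terminally truncated construct, named by its first and last residue.

    Hypothetical helper for illustration; numbering follows the full-length enzyme.
    """
    enzyme: str
    first: int  # first retained residue
    last: int   # last retained residue

    @property
    def length(self) -> int:
        # The residue range is inclusive at both ends, hence the +1.
        return self.last - self.first + 1

    @property
    def n_terminal_residues_removed(self) -> int:
        # Everything before the first retained residue has been deleted.
        return self.first - 1


# Shortest active forms reported in the text.
constructs = [
    Truncation("FucT III", 62, 361),
    Truncation("FucT V", 76, 374),
]

for c in constructs:
    print(
        f"{c.enzyme} aa {c.first}-{c.last}: {c.length} residues, "
        f"{c.n_terminal_residues_removed} removed from the N terminus"
    )
```

Under these definitions the two constructs come out at 300 and 299 residues, with 61 and 75 residues removed from the N terminus, which agrees with the figures given for FucT III and FucT V.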
The current model of the protein domain structure of glycosyltransferases indicates that the C-terminal portion of a glycosyltransferase constitutes the catalytic domain, but little is known about the importance of amino acids at the C terminus for catalytic activity. To investigate this issue, FucT V proteins were prepared which lack one or two of the C-terminal amino acids of the full-length enzyme. The results presented in Table II demonstrate that removal of one amino acid from the C terminus of FucT V drastically alters catalytic activity. A protein missing two of the C-terminal amino acids was inactive, even when tested with an H-type 2 acceptor.
It is possible that the lack of detectable enzyme activity for the shorter FucT III and V constructs is due to the fact that the COS cells do not secrete these forms of the proteins, or that these forms are rapidly degraded. To rule out these possibilities, the medium from COS cells transfected with plasmids containing inserts encoding various FucT constructs was mixed with IgG-agarose beads and the bound proteins were analyzed on Western blots. Fig. 2 shows that the inactive constructs (Table II) were produced and secreted into the medium. Furthermore, the relative amounts of inactive chimeric proteins were similar to those of the active proteins. Finally, these proteins appear to have the expected molecular weight and thus, inactivity does not appear to be due to proteolytic degradation.
Protein A-FucTs versus non-Protein A-FucTs-The truncation studies just described were done with chimeric forms of the FucTs which contained a protein A, IgG binding domain at their N terminus. To investigate whether this N-terminal modification (i.e. addition of the protein A, IgG binding domain) altered enzyme activity, we constructed truncated forms of the enzymes without the protein A N terminus. Using the plasmid pGIR-201 (10), which contains the sequence for the dog insulin signal sequence, and the eukaryotic expression vector pSVL, truncated forms of the FucTs without a protein A N terminus were prepared and analyzed for enzyme activity. Two forms of FucT III (44-361 and 59-361) and V (76-374 and 65-374) were constructed and analyzed for activity. The shorter constructs of FucT III and V did not have detectable activity, but the longer forms were active. Therefore, the protein A N-terminal extension may provide some stability to the chimeric enzymes.
To further evaluate the two forms of FucTs, the acceptor specificity of the non-protein A forms of FucT III and V was compared with that of similar protein A constructs (Table III). Both forms of FucT III were most active with acceptors based on a type 1 core structure, whereas the FucT V constructs produced significantly more product with type 2 acceptor substrates. Although some quantitative differences were apparent, the overall acceptor specificity pattern was similar for each enzyme pair.
Domain Swapping between FucT III and V-The truncation studies with the protein A-FucTs demonstrate that proteins of approximately 300 amino acids can catalyze the α1,3/4-FucT III and V reactions. A comparison of the amino acid sequences of the truncated forms of FucT III and V is shown in Fig. 3 and demonstrates that the two truncated proteins differ at only 23 amino acid residues. All but one of these amino acid differences occur near the N terminus of the truncated enzymes. Acceptor substrate specificity studies of the truncated enzymes demonstrated that these proteins have distinct acceptor specificities. This is most evident when the two enzymes are compared based on their activity with LacNAc and lacto-N-biose I (compounds 1 and 4, respectively, in Table III). Taken together, this information demonstrates that the amino acid differences occurring between the truncated forms of FucT III and V account for their acceptor substrate differences.
Domain swapping experiments were designed to obtain a more precise location of amino acids that contribute to the acceptor specificity of FucT III and V. Fig. 4 shows a diagrammatic representation of the four proteins that were constructed by the domain swaps. The first domain swap (FucT III-62-227/FucT V-241-374) was designed to produce a protein that contained all of the amino acids found in FucT III except residue 336 (Asp) which was exchanged for the residue found in FucT V (Ala). This protein was active and had an acceptor substrate specificity similar to FucT III (Table IV). The second domain swap (FucT V-76-240/FucT III-228-361) produced a protein that contained all of the amino acids found in FucT V except residue 349 (Ala) which was exchanged for the residue found in FucT III (Asp). This protein was active and had a substrate specificity similar to FucT V. Thus, the single amino acid difference that occurs at the C terminus of FucT III and V does not contribute to the acceptor specificity of these enzymes.
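The junction coordinates of the first two domain swaps can be cross-checked against the residue numbering of the two parent enzymes. The following Python sketch is illustrative and not part of the original methods. It assumes that, within the truncated region, FucT III and FucT V align without gaps so that a FucT III residue number plus 13 gives the equivalent FucT V residue number; this offset is inferred from the Asp 336/Ala 349 pair named above. All function and variable names are hypothetical.

```python
# Assumed numbering offset between FucT III and FucT V inside the truncated
# region (inferred from residue 336 of FucT III pairing with 349 of FucT V).
OFFSET_III_TO_V = 13


def iii_to_v(residue_iii: int) -> int:
    """Map a FucT III residue number to the aligned FucT V residue number."""
    return residue_iii + OFFSET_III_TO_V


def to_v_numbering(parent: str, residue: int) -> int:
    """Express any residue in FucT V numbering so junctions can be compared."""
    return iii_to_v(residue) if parent == "III" else residue


def chimera_length(segments) -> int:
    """Total residues in a chimera given as (parent, first, last) segments."""
    return sum(last - first + 1 for _, first, last in segments)


def is_seamless(segments) -> bool:
    """True if each segment starts at the residue right after the previous one."""
    for (p1, _, last1), (p2, first2, _) in zip(segments, segments[1:]):
        if to_v_numbering(p2, first2) != to_v_numbering(p1, last1) + 1:
            return False
    return True


# Chimeras exactly as named in the text.
swap_1 = [("III", 62, 227), ("V", 241, 374)]
swap_2 = [("V", 76, 240), ("III", 228, 361)]

for name, swap in (("swap 1", swap_1), ("swap 2", swap_2)):
    print(name, "length:", chimera_length(swap), "seamless:", is_seamless(swap))
print("FucT III residue 336 aligns with FucT V residue", iii_to_v(336))
```

With the assumed offset, both junctions join consecutive aligned residues, the chimeras are 300 and 299 residues long like their N-terminal parents, and residue 336 of FucT III falls on residue 349 of FucT V.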
The third and fourth domain swaps were designed to produce proteins which contain approximately half (i.e. 8 and 12 amino acid sequence differences in regions designated 1 and 2, respectively) of the N-terminal amino acid differences occurring between FucT III and V. The third domain swap (FucT III-62-110/FucT V-124-374) produced a protein containing 8 amino acids representative of FucT III, attached to the remaining sequence of FucT V. This protein catalyzed fucose transfer to both type 1 and type 2 acceptors and therefore had a new and broader acceptor specificity than FucT III and FucT V. The
FIG. 3. A comparison of the predicted amino acid sequences of the N-truncated forms of FucTs III (62-361) and V (76-374). Sequences are based on those previously reported (7,8). Underlined amino acid residues in FucT III are those found to differ (i.e. L to F and T to A) in the constructs described in this report compared to those previously reported. The position of Cys 143 and Cys 156 of FucT III and V, respectively, is shown in outline type.

DISCUSSION
Very little information is currently available about the substrate binding and catalytic sites of glycosyltransferases. The studies that are available in the literature have provided information on amino acid residues that may form part of the nucleotide sugar binding site. For example, Yadav and Brew (14) have used a chemical modification approach to obtain evidence that regions near Lys 341 and Lys 351 of bovine β1-4 galactosyltransferase (β1-4GalT) are involved in UDP-Gal binding. Aoki et al. (15) have obtained evidence that Tyr 309 of β1-4GalT is also involved in UDP-Gal binding. Recently, Wang et al. (16) used a site-directed mutagenesis approach to obtain evidence that Cys 340 is involved in UDP-Gal binding by β1,4-galactosyltransferase. In a study of blood group A and B transferases, Yamamoto and Hakomori (17) have demonstrated the importance of a limited number of amino acid residues for determining the nucleotide sugar specificity (i.e. UDP-Gal versus UDP-GalNAc) of these enzymes. A recent study by Datta and Paulson (18) has provided evidence that some residues within the so-called "sialyl motif" of sialyltransferases can influence the affinity of the α2,6-sialyltransferase for CMP-sialic acid (18). Finally, we (19) have recently completed a study that demonstrates that Cys 143 and Cys 156 of FucT III and V, respectively, are in or near the binding site for GDP-Fuc. Our current efforts are directed at more precisely defining the amino acids that affect acceptor specificity.
The protein domain structure proposed several years ago for glycosyltransferases indicates that the C-terminal portion of these enzymes contains their catalytic domain. The results we have obtained in this study demonstrate that about 20% of the N-terminal amino acids of the FucT III and V sequences are not required for enzyme activity and therefore constitute the other three recognized protein domains (i.e. cytoplasmic, transmembrane, and stem). In contrast, truncation of the C terminus of these enzymes results in their inactivation. We have reported similar results for two forms of α1,3-galactosyltransferase (6). In all of these cases approximately 300 amino acids have been found to be required for enzyme activity. However, this does not appear to be the minimum length required by glycosyltransferases since forms of β1-4 galactosyltransferase missing 127 out of 400 amino acids are active (15,16).
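The proportions discussed in this paragraph follow directly from the residue counts quoted in the text. The Python sketch below is illustrative only: the function name `summarize` is hypothetical, the full lengths of 361 and 374 residues are taken from the C-terminal residue numbers of the constructs, and the 127-of-400 figure for β1-4 galactosyltransferase is the one cited above.

```python
# (enzyme, full-length residues, residues missing from the shortest active form)
cases = [
    ("FucT III", 361, 61),
    ("FucT V", 374, 75),
    ("beta1-4 GalT", 400, 127),
]


def summarize(full_length: int, removed: int):
    """Return (residues remaining, percent of the full length removed)."""
    remaining = full_length - removed
    percent_removed = 100.0 * removed / full_length
    return remaining, percent_removed


for name, full_length, removed in cases:
    remaining, pct = summarize(full_length, removed)
    print(f"{name}: {remaining} residues remain, {pct:.1f}% removed")
```

On these numbers the two FucTs lose roughly 17% and 20% of their length and retain about 300 residues, whereas the active β1-4 galactosyltransferase form retains 273, consistent with the statement that 300 residues is not a general minimum.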
Our truncation studies also demonstrate that the two amino acid segments that occur in FucT V which make it 13 amino acids longer than FucT III are located between the transmembrane domain and the catalytic domain (i.e. in the stem region). Thus, FucT V has a stem region that is 42 amino acids long, whereas FucT III's stem region is approximately 13 amino acids shorter. This result is reminiscent of the observation made by Joziasse et al. (20) that mice produce three forms of α1,3-galactosyltransferase which differ only in the length of their stem regions.
The most important result of the truncation study was that active forms of FucT III and V differ at only 23 out of about 300 amino acid residues. Since these enzymes have distinct acceptor specificities, this allowed us to conclude that some or all of the amino acid differences occurring between the two enzymes must account for their distinct acceptor specificities. This led us to carry out a series of domain swapping experiments. The results of the domain swapping experiments allow us to conclude that: (i) the single amino acid difference at the C terminus of FucT III and V does not affect acceptor substrate specificity, (ii) a protein with either region 1 or 2 of FucT III has a type 1 acceptor specificity, (iii) the combination of region 1 of FucT III with region 2 of FucT V produces an enzyme with both type 1 and type 2 acceptor specificity, and (iv) a protein containing amino acids in region 1 of FucT V does not have a type 2 acceptor specificity, whereas one containing the 12 amino acids of region 2 of FucT V does. Taken together, these results demonstrate that the amino acids in regions 1 and 2 of FucT III and V are critically involved in defining the acceptor substrate specificity of these two enzymes.
FIG. 4. Diagrammatic representation of the four proteins produced by domain swaps. The portions of the diagram that are underlined by a single line represent FucT III sequences, whereas those underlined by the double line represent FucT V sequences. The top two constructs were designed to produce proteins which differ in only a single amino acid residue compared to their parent protein. The bottom two constructs were designed to produce proteins which contain approximately half (i.e. 8 and 12 amino acid sequence differences in regions designated 1 and 2, respectively) of the N-terminal amino acid differences occurring between FucT III and V. The lower case letters in the FucT III sequence represent the amino acid sequence differences between those originally published and those found in the FucT III sequence of the proteins analyzed in this study.
The recognition of complex oligosaccharides by proteins, including lectins, antibodies, and enzymes, is accomplished primarily through interactions with particular hydroxyl groups on the carbohydrate, but van der Waals interactions also occur in most instances, primarily through stacking of the underface of pyranose residues with aromatic amino acids (21). Many of the interactions occur through hydrogen bonds between the sugar hydroxyls and side chains of amino acids. In a recent study we evaluated the ability of FucTs III and V to utilize several modified forms (deoxygenated and containing modified amino groups on the GlcNAc residue) of type 1 and type 2 disaccharides as acceptors. An important result from this analysis was that both enzymes had an absolute requirement for a hydroxyl group at carbon C-6 of galactose. The minimum energy conformations of type 1 and 2 disaccharides have been recognized to have different molecular topographies.
Since a correctly oriented Gal residue appears to be essential for enzyme activity (based on the absolute requirement of its C6-hydroxyl group), the minimum energy conformations of type 1 and 2 structures, in effect, invert the relative orientation of the Gal and GlcNAc residues of the disaccharides by approximately 180°. Therefore, the positions of the NHAc and CH2OH groups of the GlcNAc residue are effectively interchanged. The present study suggests that the active site of FucT III and V forms a pocket capable of discriminating between OH-6 of the Gal residue and either the NHAc or CH2OH group of the GlcNAc residue, and this differentiation is realized by interaction with amino acids in the N-terminal domain of these enzymes. Another interesting result from the domain swapping experiments was that the amino acids (Asp 336 and Ala 349 of FucT III and V, respectively) that differ near the C terminus of FucTs III and V did not alter their activity or acceptor specificity compared to the parent enzymes. Based on a report by Nishihara et al. (22) we had anticipated that the first domain swap shown in Fig. 4 would produce an inactive protein. These workers (22) had reported that the coding region of FucT III of some Lewis negative individuals contained a single nucleotide base change that resulted in an Asp 336 → Ala mutation. This would produce a FucT III that contained a catalytic region similar to the first domain swap protein shown in Fig. 4. Thus, we had predicted that this domain swapped protein would be inactive. Recently, the same research group reported that their original sequencing results were incorrect and that the actual mutation in these Lewis negative individuals is Ile 356 → Lys (23).
DNA sequencing of the FucT III and V constructs revealed that all of the FucT V constructs had a sequence that corresponded to that previously reported for the full-length enzyme, whereas all of the FucT III constructs contained two nucleotide base differences compared to the sequence previously reported for the full-length enzyme. These differences may represent natural variations in the DNA sequence obtained from different sources of DNA. Our original template for cloning FucT III was human placental DNA, whereas the source of template for the original report on FucT III was the human tumor cell line A431 (7). Regardless of the origin, the resulting changes in the amino acid sequence for the FucT III proteins prepared for our studies did not have a major effect on enzyme activity or acceptor substrate specificity. This is in contrast to other single amino acid substitutions detected in Lewis negative individuals (22)(23)(24)(25)(26)(27). Furthermore, the domain swap construct that had an altered acceptor substrate specificity did not contain these amino acids and thus, these amino acids do not seem to be involved in acceptor substrate recognition.
During the review of our manuscript, Legault et al. (28) published a study which also demonstrated that a discrete peptide fragment within α1,3/1,4-fucosyltransferases is responsible for discriminating among different oligosaccharide acceptor substrates. They identified a so-called "hypervariable region" in the fucosyltransferases that contains as few as 11 amino acids and participates in the binding of type I acceptor substrates. This area is very near the region we have found to affect acceptor substrate specificity. In contrast to the work presented here, Legault et al. (28) utilized full-length constructs of the enzymes and relied largely on cell surface staining with antibodies to various type I and II carbohydrate epitopes to analyze the effect of swapping different domains between fucosyltransferases. In spite of the differences between our approach and that of Legault et al. (28), the conclusions drawn are similar. This adds strength to the concept that a small peptide region at the N terminus of the enzymes' sequence-constant C terminus is critical for determining acceptor substrate specificity. Future studies will determine which amino acids in this region are critical for substrate recognition.
The results presented here offer some useful insights into the active site of glycosyltransferases. Identification of the amino acids that control acceptor substrate recognition will refine the domain structure that has defined glycosyltransferases for several years. Since several glycosyltransferases recognize either a type 1 or, more often, a type 2 acceptor it will be interesting to determine if a common set of amino acid residues can be defined among a group of glycosyltransferases that have similar acceptor substrate specificities.