Trefoil Factor Family Domains Represent Highly Efficient Conformational Determinants for N-Linked N,N′-di-N-acetyllactosediamine (LacdiNAc) Synthesis

Background: The absence of microheterogeneity of the LacdiNAc N-glycan on human gastric TFF2 points to a high stringency control mechanism.
Results: Single intact TFF domains of TFF2 control the β4-GalNAc transfer to terminal GlcNAc residues as conformational determinants.
Conclusion: A hydrophobic patch is hypothesized to form the essential part of the determinant.
Significance: The restricted expression of LacdiNAc on extracellular matrix proteins relates to important biological processes.

The disaccharide N,N′-di-N-acetyllactosediamine (LacdiNAc, GalNAcβ1–4GlcNAcβ) is found in a limited number of extracellular matrix glycoproteins and neuropeptide hormones, indicating a protein-specific transfer of GalNAc by the glycosyltransferases β4GalNAc-T3/T4. Whereas previous studies have revealed evidence for peptide determinants as controlling elements in LacdiNAc biosynthesis, we report here on an entirely independent conformational control of GalNAc transfer by single TFF (Trefoil factor) domains as high stringency determinants. Human TFF2 was recombinantly expressed in HEK-293 cells as a wild type full-length probe (TFF2-Fl, containing TFF domains P1 and P2), as single P1 or P2 domain probes, as a series of Cys/Gly mutant forms with aberrant domain structures, and as a double point-mutated probe (W68Q/F59Q) lacking aromatic residues within a hydrophobic patch. The N-glycosylation probes were analyzed by mass spectrometry for their glycoprofiles. In agreement with natural gastric TFF2, the recombinant full-length and single domain probes expressed nearly exclusively fucosylated LacdiNAc on bi-antennary complex-type chains, indicating that a single TFF domain was sufficient to induce transfer of this modification. In contrast, the Cys/Gly mutants showed strongly reduced LacdiNAc levels and instead preponderant LacNAc expression. The probe with point mutations of two highly conserved aromatic residues in loop 3 (W68Q/F59Q) revealed that these are essential determinant components, as the probe lacked LacdiNAc expression. The structural features of the LacdiNAc-inducing determinant on human TFF2 are discussed on the basis of crystal structures of porcine TFF2 and of a series of extracellular matrix-related LacdiNAc-positive glycoproteins detected as novel candidate proteins in the secretome of HEK-293 cells.

The initiation of glycosylation is generally determined by structural features in proximity to the glycosylation site; however, evidence has been provided for the existence of additional control mechanisms on the initiation level. A cis-located peptide stretch together with a specific amino acid pattern near O-glycosylation sites within the mucin domain were shown to be necessary for one of the rare types of protein-specific O-glycosylation, i.e. mammalian O-mannosylation (1). A similar type of double control was shown to be exerted by the peptide core on peripheral modifications with LacdiNAc of N-linked chains (2). Both observations have in common that a peptide stretch located distant from the putative glycosylation site represents a necessary prerequisite for the protein-specific glycosylation. These cis-controlling peptide elements provide another level of protein specificity to the glycosylation event. Other examples of protein sequence- or signal patch-controlled peripheral glycan modifications are found in the modification of N-linked chains with mannose 6-phosphate, the modification of neural cell adhesion molecule (NCAM) with polysialic acid, and the extension of O-fucose with GlcNAc by Fringe (3).
The peripheral modification of N-linked glycans with LacdiNAc represents a unique protein-specific glycosylation found on a restricted number of glycoproteins and appears to play crucial roles in the regulation of circulatory half-lives of hormones (4) and in cell recognition (5). The modification of N-linked glycans was detected in a series of mammalian glycoproteins including the glycoprotein hormone LH (luteinizing hormone) (6), glycodelin (7), prolactin-like proteins (8), proopiomelanocortin (9), SorLA/LR11 (10), sialoadhesin (11), tenascin-R (12), and carbonic anhydrase VI (13). On glycodelin the LacdiNAc modification was claimed to have contraceptive functions. As a peripheral modification of O-linked chains, LacdiNAc modifications were found on murine zona pellucida glycoprotein ZP3 (14), POMC (15), and on the extracellular matrix glycoproteins nidogen-1, extracellular matrix protein 1, AMACO, α-dystroglycan, and neurofascin (16). The LacdiNAc dihexosamine on N- and O-linked chains has been shown previously to be the substrate for further modification by fucosylation, sialylation, and sulfation and, as a specific feature of O-linked chains, by phosphorylation of the subterminal GlcNAc residue (16).
Two isoforms of the β4-GalNAc-transferases, T3 and T4 (β4GalNAc-T3, β4GalNAc-T4), were described to be responsible for protein-specific LacdiNAc synthesis (17,18). The group of Baenziger and co-workers (2) provided evidence that for induction of β4GalNAc-T3 and -T4 activities, a 19-meric peptide sequence within the target protein (LRRFIEQKITKRKKEKYWP) is necessary and sufficient. This determining cis-located peptide is characterized by a high content of basic amino acids and an α-helical structure. It is located at the C terminus of carbonic anhydrase VI and could induce the modification of N-linked chains on a normally unmodified protein (transferrin) after recombinant translocation of the peptide. We recently obtained evidence that the human Trefoil factor 2 (hTFF2 or TFF2) expressed in gastric mucosal secretions carries exclusively fucosylated LacdiNAc antennae on a biantennary N-linked chain located at Asn-38 in loop 1 of the N-terminal P1 TFF domain (19). No obvious microheterogeneity of N-glycosylation became apparent at this site except for a minor under-fucosylation of the LacdiNAc moieties, pointing to a LacdiNAc-inducing determinant with high stringency. Strikingly, the amino acid sequences of TFF2 downstream or upstream of the N-glycosylation site lack any similarities on the level of primary and secondary structures with the previously published LacdiNAc synthesis-inducing determinant.
Based on this observation we hypothesized that a conformationally stabilized determinant, forming a structural part of TFF domains, may have LacdiNAc-inducing capacities. To test this assumption we expressed TFF2 as a recombinant full-length probe in HEK-293 cells. HEK-293 cells represent a well established cellular model in the context of LacdiNAc biosynthesis (20) as they express both enzymes, β4GalNAc-T3 and -T4, involved in LacdiNAc synthesis (17,18). These two enzymes do not compete with other β4GalNAc-Ts, the isoenzymes T1 and T2, which are involved in ganglioside biosynthesis by converting GM2 to GD2 (β4GalNAc-T1) or in the Sda/Cad blood-group synthesis (β4GalNAc-T2). However, a competition with β4Gal-T1 (lactose synthase), the enzyme involved in LacNAc synthesis, is certainly to be expected as this enzyme uses the same substrates.
Besides a full-length wild type probe (hTFF2-Fl), we generated a series of truncation probes in HEK-293 cells corresponding to the N-terminal TFF domain P1 (N-P1, carrying the natural N-glycosylation site) or to the C-terminal TFF domain P2 (C-P2*, carrying a designed new N-glycosylation site). Moreover, we generated and expressed the Cys/Gly mutant variants with defective TFF domains TFF2-Fl-(C52G) and TFF2-Fl-(C42G) and a double point mutant lacking aromatic residues within a putative hydrophobic patch N-P1-(W68Q/F59Q). Based on these glycosylation probes and mass spectrometric analyses of their N/O-glycoprofiles, we could provide evidence that the LacdiNAc-inducing capacity is actually confined to conformationally intact TFF domains and that this capacity is strictly dependent on highly conserved aromatic residues in loop 3 of the TFF domains, which form essential components of a hydrophobic cleft with putative protein binding capacity.

EXPERIMENTAL PROCEDURES
Materials-The applied chemicals were generally purchased from Sigma and were of analytical grade; deviations are indicated. DNA, primers, and proteins were handled at +4°C or on ice, and long-term storage was at −20°C or −80°C.
Cells and Cell Culture-Culture media and cell culture flasks were purchased from Biochrom (Berlin, Germany). The human embryonic kidney cell line HEK-293 (Invitrogen) was cultivated at 37°C and 5% CO2 in a humidified atmosphere in Dulbecco's minimal essential medium (DMEM) supplemented with 5% fetal calf serum (FCS), 100 units/ml penicillin, and 100 μg/ml streptomycin. Lipofectamine 2000 Reagent (Invitrogen) was utilized for all transfection procedures. Selection of transfected cells was established with 5 μg/ml puromycin (Sigma) as an additional culture media supplement. At a confluence of ~80% cells were rinsed three times with PBS and kept in culture for 3 days in medium lacking FCS. Supernatants were collected and stored at −20°C until purification of recombinant protein via affinity tag.
Preparation of Proteins from Cell Culture Supernatants-500 ml of supernatants were centrifuged for 15 min at 10,000 rpm, +4°C. 500 mM NaCl, 20 mM Na2HPO4, 10 mM imidazole, and 25 μl of PMSF (1 M in DMSO) were added. The pH was adjusted to 8.0 at +4°C with HCl. The supernatant was then passed through a cellulose filter and applied to the nickel-nitrilotriacetic acid column twice in a row. For a detailed description of the procedure refer to Breloy et al. (16). Eluted fractions containing the recombinant protein were pooled, then concentrated and desalted via Amicon centrifugation units, molecular mass cutoff 10 kDa (Millipore, Darmstadt, Germany).
Gel Electrophoresis and Western Blotting-Aliquots of the recombinantly expressed fusion proteins were treated before Western blot analysis by thrombin digestion to remove the strep tag. Proteins in 5-20% or 15% SDS gels were either stained with silver or blotted onto nitrocellulose (Protran BA 83, Schleicher & Schüll) in a wet blot transfer cell (Bio-Rad) for antibody detection or onto PVDF membranes in a semidry blot chamber (both from Bio-Rad) for lectin detection (16). Fusion proteins were detected with anti-strepII mouse IgG (IBA, Göttingen, Germany). Modification with LacdiNAc was tested either with the monoclonal mouse antibody 273-372 (kindly provided by Dr. C. H. Hokke, Leiden University Medical Clinic, Leiden, The Netherlands; 1:5 diluted culture supernatant, 2 h at ambient temperature) or with biotinylated Wisteria floribunda lectin (Sigma, 10 μg/ml, 2 h at ambient temperature). Horseradish peroxidase (HRP)-conjugated rabbit anti-mouse IgG (DAKO, Hamburg, Germany) was applied as a secondary antibody or Strep-tactin-HRP (IBA) for staining of lectin binding. Immunolabeled protein was detected by enhanced chemiluminescence (Roche Applied Science). An aliquot of the full-length probe hTFF2-Fl was pretreated with 0.1 M aqueous TFA at 80°C for 1 h to chemically desialylate and partially defucosylate the glycoprotein before W. floribunda (WFA) staining.
Proteomics of Wild Type and Mutant Probes-Intact probes (1 μg of protein) were digested with chymotrypsin in 50 mM ammonium bicarbonate, pH 8.0, at 37°C for 4 h using an enzyme to substrate ratio of 1:50 (w/w). After heat denaturation of the protease (10 min at 90°C), the resulting peptide mixture was further digested with trypsin (both enzymes were sequencing grade from Promega (Mannheim, Germany)) in the same buffer at 37°C overnight using a 1:50 enzyme to substrate ratio (w/w). After reduction of disulfides and carbamidomethylation of cysteine residues, peptides were analyzed by LC-electrospray ionization mass spectrometry as described (22).
Enzymatic Cleavage and Methylation of N-Glycans-Dried protein was dissolved in 100 μl of 50 mM NH4HCO3 and submitted to tryptic digestion at 37°C for ~16 h. Trypsin was inactivated at 95°C for 5 min. The digested sample was dried by vacuum centrifugation and solubilized in 50 μl of 50 mM NH4HCO3. An aliquot of 0.5 μl of peptide N-glycosidase F (New England Biolabs, Frankfurt, Germany) was added, and N-deglycosylation was performed at 37°C for at least 16 h. Liberated N-glycans were separated from peptides by solid-phase extraction on C18 cartridges (Agilent Technologies, Waldbronn, Germany). N-Glycans in the flow-through were dried by vacuum rotation and in a desiccator for 1 h in the presence of P2O5/KOH. Methylation was performed in 100 μl of water-free DMSO containing finely dispersed NaOH (30 min at 22°C) followed by the addition of 50 μl of methyl iodide (30 min at 22°C). After methylation, 300 μl of chloroform was added, and the sample was repeatedly extracted with 200 μl of water. The chloroform phase was dried under N2, and methylated glycans were dissolved in 20 μl of methanol.
β-Elimination of O-Glycans and Permethylation of Glycan Alditols-The glycan chains were released from the protein by reductive β-elimination. For this purpose the glycoproteins were incubated with 1.0 M NaBH4 in 50 mM NaOH for 18 h at 50°C. The reaction was stopped by adding 2 μl of acetic acid. Salt was removed with 100 μl of Dowex 50WX8 aqueous suspension (Bio-Rad) in a batch procedure. Excess borate was co-distilled as methyl ester in a stream of nitrogen by adding several 0.1-ml aliquots of 1% acetic acid in methanol. Permethylation of the glycan chains was performed as described above and in Breloy et al. (16).
Matrix-assisted Laser Desorption Ionization (MALDI)-TOF-TOF Mass Spectrometry-MALDI mass spectrometry was performed on an UltrafleXtreme instrument (Bruker Daltonics). The permethylated glycans (~500 ng) contained in methanol were applied to the stainless steel target by mixing a 0.5-μl aliquot of sample with 1.0 μl of matrix (saturated solution of 2,5-dihydroxybenzoic acid in acetonitrile, 0.1% TFA, 1:2). Alternatively, 0.75 μl of sample was mixed with 0.75 μl of α-cyano-4-hydroxycinnamic acid (Bruker, Bremen, Germany) matrix solution on the MALDI target and subjected to MS analysis as described (16,21). Peptide samples were analyzed after co-crystallization of aqueous solutions in 0.1% TFA, 50% acetonitrile and a saturated matrix solution of α-cyano-4-hydroxycinnamic acid in the same solvent. Analyses were performed by positive ion detection in the reflectron mode. Ionization of co-crystallized analytes was induced with a pulsed Smartbeam laser (accumulation of ~5000 shots), and the ions were accelerated in a field of 20 kV and reflected at 23 kV. MALDI-MS/MS was performed by analysis of post-source decay fragments in the laser-induced dissociation mode. Fragment annotation of glycans was assisted by application of the GlycoWorkbench tool (23).
Analysis of (Glyco)peptides-(Glyco)peptide analysis of the recombinant proteins was performed by LC-MS/MS of tryptic peptides on an electrospray ionization ion trap, the HCT ultra ETDII PTMDiscovery-System (Bruker Daltonics) coupled with an online easy-nano-LC system (Proxeon, Odense, Denmark). The sample was separated on an analytical C18 column (75 μm × 10 cm) using gradient runs from 0 to 35% acetonitrile in 0.1% TFA during 30 min. Ions were scanned with 8100 atomic mass units/s in a range from m/z 300 to 2500 in MS mode and m/z 200 to 3000 in MS/MS mode. MS/MS spectra were generated by collision-induced dissociation fragmentation (16).
Affinity Chromatography on WFA of Glycoproteins in the Secretome of HEK-293 Cells-Culture supernatants of HEK-293 cells transfected with hZP3 (p29-176) as an internal LacdiNAc-positive standard (16) were collected after 3 days. A volume of 200 ml of supernatant was ultracentrifuged for 30 min at 10,000 rpm (4°C) and dialyzed overnight at 4°C against water. After drying by vacuum rotation, the sample was resolubilized in 500 μl of Tris-buffered saline, pH 7.4, with 100 mM glucose (TBS-Glc) and applied onto a 1-ml column of W. floribunda-agarose (Vector Laboratories, Burlingame, CA) equilibrated with at least 10 ml of the same buffer. After overnight column circulation of the sample at 4°C, the column was washed with 10 column volumes of TBS-Glc at 7 ml/h, and bound proteins were eluted consecutively with 10 ml of TBS containing 100 mM lactose or 25 mM GalNAc. Each fraction was dialyzed and concentrated by centrifugation in Amicon Ultra tubes (Millipore) before mass spectrometric identification of proteins.

Generation and Expression of Full-length, Truncation, and Point-mutated hTFF2 Probes-The full-length construct (TFF2-Fl) generated in this study comprises 106 genuine TFF2 amino acid residues, which are extended N-terminally by a 23-meric BM40 signal peptide and C-terminally by an oligo-histidine/strep-tag peptide (Figs. 1 and 2). Derived from this wild type probe, we generated truncated constructs corresponding either to the N-terminal P1 TFF domain (N-P1, p24-78) or to the C-terminal P2 domain (C-P2, p74-129). Mutation of Asp-87 to Asn and of Arg-89 to Thr in C-P2 resulted in an artificial N-glycosylation site in the C-P2 probe (referred to as C-P2*). Mutant probes with aberrant disulfide structures in the P1 domain were generated by C52G mutation (expected to affect loop 3 formation) and by C42G mutation, presumed to result in the formation of a superloop comprising loops 1 and 2. Peptides generated by sequential chymotrypsin and trypsin digestion of recombinant probes (followed by cystine reduction and cysteine alkylation) revealed evidence for at least partial changes of disulfide bridge configurations in the C52G-mutated probe TFF2-Fl-(C52G) compared with the wild type full-length probe TFF2-Fl by distinct proteolytic cleavage patterns (data not shown). Striking differences in the peptide patterns are apparent particularly for those TFF2 peptide stretches that link the two domains P1 and P2.
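The creation of an artificial N-glycosylation site by the Asp-to-Asn and Arg-to-Thr exchanges follows from the general Asn-X-Ser/Thr sequon rule (X being any residue except Pro). The following minimal Python sketch illustrates the rule only; the nine-residue test sequence and the function names are hypothetical and do not represent the TFF2 sequence or the cloning strategy used in this study.

```python
import re

# N-glycosylation sequon: Asn-X-Ser/Thr, X being any residue except Pro.
# The lookahead allows overlapping sequons to be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(seq: str) -> list[int]:
    """Return 1-based positions of Asn residues that start a sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq)]


def mutate(seq: str, changes: dict[int, str]) -> str:
    """Apply point mutations given as {1-based position: new residue}."""
    residues = list(seq)
    for pos, new in changes.items():
        residues[pos - 1] = new
    return "".join(residues)


# Hypothetical stretch (NOT the TFF2 sequence) with Asp at 4 and Arg at 6,
# mimicking the spacing of the Asp-87/Arg-89 pair (positions i and i+2).
toy = "GCFDSRVTP"
toy_mut = mutate(toy, {4: "N", 6: "T"})

print(find_sequons(toy))      # no sequon before mutation
print(find_sequons(toy_mut))  # sequon starting at the new Asn
```

Both exchanges are needed in this toy case: the Asn supplies the acceptor residue, and the Thr at position +2 completes the sequon.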
The expression and purity of the probes was proven after affinity isolation via their oligo-histidine tags by mass spectrometric proteomics (data not shown) and by gel electrophoresis (Fig. 3). The tagged probes revealed shifts in their apparent masses when analyzed before and after thrombin cleavage of the tags (Fig. 3).
Immunochemical Evidence for LacdiNAc Expression on hTFF2 Probes-Natural glycoforms of TFF2 from human stomach had tested negative for LacdiNAc expression when using mouse monoclonal antibody 273-372 and were shown to be weakly stained with the lectin WFA (17). Both findings can be explained by the nearly complete substitution of the N-linked LacdiNAc termini on TFF2 with fucose α3-linked to the subterminal GlcNAc. Similar results were obtained with the recombinant TFF2 probes generated in this study, i.e. no staining with antibody 273-372 and sporadically very weak staining with WFA (not shown). To prove expression of LacdiNAc on hTFF2-Fl, we unmasked the glycotope recognized by WFA by chemical desialylation and partial defucosylation. As shown in Fig. 3, this treatment resulted in considerably increased staining of a defined protein band at 18 kDa (in agreement with the natural glycoform expressed in human stomach; see Ref. 19). Efficiency of desialylation and partial defucosylation was checked by MALDI mass spectrometric analysis of enzymatically released and permethylated N-glycans (see below).
N-Glycoprofiling of TFF2 Truncation Probes: Single TFF Domains P1 and P2 Have the Capacity to Induce LacdiNAc Formation-Natural TFF2 (p24-129) consists of two symmetrically positioned TFF domains, the N-terminal glycosylated P1 and the C-terminal P2 domain. We wanted to address the question of whether a single domain exhibits the structural qualities of a determinant that induces LacdiNAc formation. For this purpose the full-length and two truncation probes were generated corresponding to P1 (p24-78) and P2 (p74-129), all exhibiting an N-glycosylation site in loop 1. Expression of the three constructs in HEK-293 cells and mass spectrometric analysis of the affinity-purified probes revealed N-glycoprofiles similar to the natural gastric protein (19), which was characterized by a core fucosylated bi-antennary N-glycan with monofucosylated LacdiNAc antennae (refer to the dominant signal at m/z 2674.5 in Fig. 4 corresponding to the sodium adduct of the dodecasaccharide F3H3N6; see Table 1). The dominant glycan was accompanied by an under-fucosylated (F2H3N6 at m/z 2500.4) and by a monosialylated species (S1F2H3N6 at m/z 2861.6). After chemical desialylation and partial defucosylation, the dominant signal found at m/z 2294.5 (data not shown) corresponds to the previous base peak at m/z 2674.5 (Fig. 4) that had lost two fucose residues in the antennae (M+Na−32) as revealed in MS2 by formation of B ions at m/z 260 and 505 (data not shown).
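The assignment of compositions such as F3H3N6 to the observed signals can be checked by simple arithmetic. The sketch below sums standard monoisotopic masses of permethylated residues, adds the reducing-end increment of a free (non-reduced) permethylated glycan, and adds a sodium cation. The residue masses are general textbook values; the function name and the one-letter keys are illustrative assumptions and are not taken from the annotation software used in this study.

```python
# Monoisotopic masses (Da) of permethylated glycan residues (standard values).
RESIDUE = {
    "F": 174.0892,  # deoxyhexose (fucose), C8H14O4
    "H": 204.0998,  # hexose, C9H16O5
    "N": 245.1263,  # N-acetylhexosamine, C11H19NO5
    "S": 361.1737,  # N-acetylneuraminic acid, C16H27NO8
}
REDUCING_END = 46.0419  # CH3 + OCH3 termini of a free permethylated glycan
SODIUM = 22.9892        # Na+ (atomic mass minus one electron)


def mz_sodium_adduct(composition: dict[str, int]) -> float:
    """Monoisotopic m/z of the singly charged [M+Na]+ ion."""
    residues = sum(RESIDUE[key] * count for key, count in composition.items())
    return residues + REDUCING_END + SODIUM


# Compositions and experimental m/z values quoted in the text.
observed = {
    "F3H3N6": ({"F": 3, "H": 3, "N": 6}, 2674.5),
    "F2H3N6": ({"F": 2, "H": 3, "N": 6}, 2500.4),
    "S1F2H3N6": ({"S": 1, "F": 2, "H": 3, "N": 6}, 2861.6),
}

for name, (composition, mz_exp) in observed.items():
    mz_calc = mz_sodium_adduct(composition)
    print(f"{name}: calculated {mz_calc:.2f}, observed {mz_exp}")
```

Under these assumptions the calculated values agree with the three reported signals to within about 0.2 mass units, and the 174.09 spacing between the first two corresponds to one permethylated fucose residue.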
The dominant signals in the spectra shown in Fig. 4 were selected as precursor ions for analysis by post-source decay MALDI-MS/MS (Fig. 5) to elucidate the structural features of the antennae. We demonstrate the presence of antennary dihexosamines HexNAc 2 giving rise to the B2 ion at m/z 505 (derived from the precursor at m/z 2500.4, not shown), monofucosylated HexNAc(Fuc)HexNAc giving rise to formation of the B1 ion at m/z 260, and B2 ions at m/z 679 or 701, respectively, derived from the precursor at m/z 2674.5 (Fig. 5A).
Fragments of the precursor at m/z 2861.6 indicated the presence of monosialylated and monofucosylated antennae: NeuAc-HexNAc-HexNAc (B1 ion at m/z 376, B3 at m/z 866 or 888) and HexNAc(Fuc)HexNAc (B1 ion at m/z 260, B2 ion at m/z 701) (Fig. 5B). A further major N-glycan species was detected at m/z 2820.6 and assigned as a biantennary N-glycan with one NeuAc-Hex-HexNAc (sialyl-LacNAc) and one HexNAc(Fuc)HexNAc antenna (Fig. 5C). Structural assignments of fragments with respect to glycan sequences of the most relevant signals are given in Fig. 5.
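The diagnostic B ions quoted above can be reproduced in the same way. The sketch assumes that the lower value of each pair (e.g. m/z 679 or 866) is the oxonium-type B ion and the higher value (m/z 701 or 888) its sodiated counterpart; this reading is an interpretation of the reported numbers, and the function and constant names are hypothetical.

```python
# Monoisotopic masses (Da) of permethylated residues (standard values).
RESIDUE = {"F": 174.0892, "N": 245.1263, "S": 361.1737}
METHYL = 15.0235    # CH3 capping the non-reducing terminus
HYDROGEN = 1.0078
SODIUM = 22.9892    # Na+


def b_ion(residues: str, sodiated: bool = False) -> float:
    """m/z of a B-type fragment built from the given residue letters.

    The composition alone fixes the mass, so branching (e.g. Fuc on the
    subterminal HexNAc) does not change the result.
    """
    mass = sum(RESIDUE[r] for r in residues) + METHYL
    if sodiated:
        # Sodiated B ion: a proton is formally exchanged for Na+.
        mass += SODIUM - HYDROGEN
    return mass


fragments = {
    "HexNAc": "N",
    "HexNAc-HexNAc": "NN",
    "HexNAc(Fuc)HexNAc": "NFN",
    "NeuAc": "S",
    "NeuAc-HexNAc-HexNAc": "SNN",
}

for name, letters in fragments.items():
    print(f"{name}: {b_ion(letters):.2f} / {b_ion(letters, sodiated=True):.2f} (sodiated)")
```

With these assumptions the nominal masses 260, 505, 679/701, 376, and 866/888 given in the text are recovered.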
N-Glycoprofiling of TFF2 Cys/Gly-mutated Probes: TFF Domains Need to Be Conformationally Intact to Induce LacdiNAc Synthesis on Asn-38-To define the structural requirements of the LacdiNAc-inducing determinant, we generated Cys/Gly mutant probes with aberrant disulfide bridge patterns. In the full-length construct TFF2-Fl-(C52G), Cys-52 (the third cysteine in the 1-5, 2-4, 3-6 disulfide bridge configuration of TFF domains) is mutated to Gly, which should prevent loop 3 formation in the P1 TFF domain and result in an aberrant disulfide bridge pattern (Fig. 2B). In TFF2-Fl-(C42G), where the second cysteine in the TFF domain-characteristic disulfide configuration is mutated, the missing cysteine fuses loops 1 and 2 to a superloop and should also provoke aberrant disulfide bridge formation (Fig. 2B). The N-glycoprofiling of these two mutant probes revealed a strikingly different pattern of major glycans that was dominated by core-fucosylated biantennary glycans with LacNAc antennae (Fig. 6B and Table 1). Only a minor portion (~12%) of the total glycans derived from TFF2-Fl-(C52G) contained LacdiNAc antennae, which contrasts with the pattern of the wild type full-length (TFF2-Fl) and the single TFF domain probes (N-P1, C-P2*), where on average ~84% of the total glycans had been found to contain LacdiNAc antennae. The most abundant N-glycan species in these profiles were detected at m/z 2792 and 3602, respectively, corresponding to the biantennary or triantennary N-glycans with NeuAc-Hex-HexNAc antennae and lacking core-fucosylation.
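The consequence of each Cys/Gly exchange for the 1-5, 2-4, 3-6 pairing can be followed with a few lines of bookkeeping. The sketch below only lists which canonical bridge is lost and which cysteine is left without its partner; it makes no prediction about the aberrant bridges that actually form, and the function name is a hypothetical choice.

```python
# Canonical disulfide pairing of a TFF domain, by cysteine ordinal (I-VI).
TFF_BRIDGES = [(1, 5), (2, 4), (3, 6)]

# Ordinals of the mutated cysteines in the P1 domain, as stated in the text:
# Cys-42 is the second and Cys-52 the third cysteine of the domain.
MUTANTS = {"C42G": 2, "C52G": 3}


def effect_of_mutation(ordinal: int) -> tuple[tuple[int, int], int, list[tuple[int, int]]]:
    """Return (lost bridge, unpaired partner, bridges that can still form)."""
    lost = next(bridge for bridge in TFF_BRIDGES if ordinal in bridge)
    partner = lost[0] if lost[1] == ordinal else lost[1]
    intact = [bridge for bridge in TFF_BRIDGES if bridge != lost]
    return lost, partner, intact


for name, ordinal in MUTANTS.items():
    lost, partner, intact = effect_of_mutation(ordinal)
    print(f"{name}: bridge {lost} lost, cysteine {partner} unpaired, "
          f"canonical bridges remaining {intact}")
```

In both mutants a free cysteine remains in the domain, which is consistent with the expectation of aberrant disulfide bridge formation stated above.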

N-Glycoprofiling of N-P1-(W68Q/F59Q)-mutated Probe with an Affected Hydrophobic Patch-To demonstrate the potential effects of highly conserved aromatic residues in loop 3 on the formation of N-linked LacdiNAc, we expressed a double point-mutated probe N-P1-(W68Q/F59Q) in HEK-293 cells and analyzed its N-glycoprofiles (Fig. 6C and Table 1). As hypothesized, the MS1 spectrum did not reveal any molecular ion signals that would correspond to the expected masses of LacdiNAc-positive N-glycans. Moreover, the N-glycoprofile of this probe showed dramatic shifts from complex-type glycans to high mannose type (M5-M9), indicating incomplete processing of the chains. The dominant signals were found at m/z 1526.1 (M5) and 1730.2 (M6), representing M-54 ions that arise by loss of sodium methylate. Compared with the profile obtained for the wild type single domain probe N-P1, the double point-mutated probe N-P1-(W68Q/F59Q) exhibits a totally aberrant N-glycoprofile, which supports the assumption that these highly conserved aromatic residues may play an essential role in N-glycan processing and follow-up glycosylation.
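The M-54 assignment of the high mannose signals can be illustrated numerically. The sketch assumes that "M-54" denotes the sodium adduct of the permethylated glycan minus one NaOCH3 unit (54.01 Da); this reading, like the function names, is an assumption made for illustration. Residue masses are standard monoisotopic values.

```python
HEX = 204.0998             # permethylated hexose residue
HEXNAC = 245.1263          # permethylated N-acetylhexosamine residue
REDUCING_END = 46.0419     # CH3 + OCH3 termini of a free permethylated glycan
SODIUM = 22.9892           # Na+
SODIUM_METHYLATE = 54.0082 # NaOCH3


def high_mannose_mz(n_man: int) -> float:
    """[M+Na]+ of a permethylated high mannose glycan Man(n)GlcNAc2."""
    return n_man * HEX + 2 * HEXNAC + REDUCING_END + SODIUM


def minus_54(n_man: int) -> float:
    """Ion formed by loss of sodium methylate from the sodium adduct (assumed)."""
    return high_mannose_mz(n_man) - SODIUM_METHYLATE


for n in range(5, 10):  # M5 to M9
    print(f"M{n}: [M+Na]+ {high_mannose_mz(n):.2f}, M-54 {minus_54(n):.2f}")
```

Under this assumption the calculated M-54 values for M5 and M6 lie within about 0.4 mass units of the reported signals at m/z 1526.1 and 1730.2, and consecutive members of the series differ by one hexose residue (204.10).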
O-Glycoprofiling of TFF2 Probes-Although no evidence for O-linked glycans on the natural human TFF2 was obtained, we detected considerable O-glycosylation of the recombinant probes TFF2-Fl and N-P1 (Fig. 7). The O-glycoprofile was devoid of any LacdiNAc-containing oligosaccharide, as neither the core 2-based tetrasaccharide (m/z 1024) nor any of its fucosylated (m/z 1198) or sialylated derivatives (m/z 1385) were detectable. According to glycopeptide analyses performed by LC-MS of tryptic peptides, we could localize the O-glycans to the peptide SLVPR (calculated mass 571.4 Da), which is part of the C-terminal-located oligo-His and strep2-tag sequence.

Table 1. Column headers: Exp. rel. mass M+Na (m/z); Glycan; hTFF2 constructs.
a Only two major glycan species were detectable as sample amounts of Fl-(C42G) were significantly lower than those of other TFF2 probes.
OCTOBER 24, 2014 • VOLUME 289 • NUMBER 43

WFA Affinity Chromatographic Enrichment of LacdiNAc-positive Glycoproteins from the Secretome of Human HEK-293 Cells-To identify new candidate proteins forming substrates of the glycosyltransferases β4GalNAc-T3 or -T4, we analyzed affinity-enriched fractions of glycoproteins from the secretome of HEK-293 cells that bound to immobilized WFA and were elutable with GalNAc or lactose (Table 2). In replicate experiments we identified by mass spectrometric proteomics a series of 12 potentially LacdiNAc-positive secreted N- or N,O-glycoproteins mostly related to the extracellular matrix, whereas 8 glycoproteins were found in only one replicate. Among the former were galectin-3-binding protein, peroxidasin, two nidogen isoforms, several laminin isoforms, the metalloproteinase inhibitor TIMP1, fibulin-1, fibrillin-2, clusterin, cochlin, and collagen 6α1. For nidogen-1 we previously reported diHexNAc-positive O-glycopeptides as potentially LacdiNAc-expressing (16). For some of the proteins listed in Table 2 it cannot be ruled out that their binding to the affinity matrix was mediated by protein-protein interactions, as for example in the case of galectin-3-binding protein and its known laminin-binding interactors fibronectin, nidogen-1, and collagen 6.

number of these may be estimated to range at ~40 glycoproteins. Although this is still a limited number, the question arises whether all these proteins may have a common structural element that could serve as a determinant with LacdiNAc-inducing capacity. Most of the known LacdiNAc-positive N,O-glycoproteins have neither a CA6-related cis-controlling peptide element in their primary structures nor do they contain TFF domains. These considerations raise the question of whether particular structural features of TFF domains may underlie the observed phenomenon. We, therefore, aligned the primary structures of human TFF2 with those of other mammalian TFF domain-containing proteins to reveal highly conserved amino acids (Fig. 8).
Beyond the characteristic cysteine patterns we found highly conserved positions in loop 2 (corresponding to the hTFF2 positions Ile-47 and Thr-48) and in loop 3 (corresponding to the hTFF2 positions Phe-108 and Trp-117). In particular the latter two aromatic residues were claimed to stabilize the fold of the domain and to play a role in protein binding (24).

Structural Features of TFF Domains That Could
Sequence alignments of the LacdiNAc-positive neuropeptide hormone β-subunits (FSH, LSH, TSH) with TFF2 revealed considerable similarities, in particular with respect to the cysteine positions relative to the N-glycosylation site (Fig. 9). The cysteines at −7 and +4 were conserved, whereas the cysteine at +20 in TFF2 was shifted to +27 (FSH, LSH) or +29 (TSH). In TFF2 domain P1 cysteines at Asn-38 −7, +4, and +20 correspond to TFF domain cysteines I, II, and V, and the disulfide bridge of cysteines I and V contributes to formation of loop 1. The disulfide bridge pattern in the FSH β-subunit is strikingly different, as six disulfide bridges are formed (Cys-21-Cys-69, Cys-35-Cys-84, Cys-38-Cys-122, Cys-46-Cys-100, Cys-50-Cys-102, Cys-105-Cys-112), and no overlap with the disulfide bridge pattern of TFF2 becomes obvious, as Cys-35 in FSHβ corresponds to cysteine I of the TFF2 domain and Cys-69 (FSHβ) to cysteine V (TFF2), for example. A subset of the disulfide bridge pattern in FSHβ (Cys-21-Cys-69, Cys-46-Cys-100, and Cys-50-Cys-102) constitutes a cysteine knot motif similar to that found in the growth factor superfamily.
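The positional comparison described above can be retraced with a short calculation. In the sketch below the glycosylated Asn of the FSH β-subunit is placed at position 42; this position is not stated in the text and is inferred here solely from Cys-35 occupying the −7 position, so it should be read as an assumption. All names are illustrative.

```python
# TFF2 P1 domain: N-glycosylation site and cysteine offsets from the text.
TFF2_SITE = 38
TFF2_OFFSETS = {"I": -7, "II": +4, "V": +20}

# FSH beta-subunit: disulfide bridges as listed in the text.
FSHB_BRIDGES = [(21, 69), (35, 84), (38, 122), (46, 100), (50, 102), (105, 112)]
FSHB_SITE = 35 + 7  # assumption: inferred from Cys-35 sitting at offset -7
FSHB_OFFSETS = {"I": -7, "II": +4, "V": +27}


def cysteine_positions(site: int, offsets: dict[str, int]) -> dict[str, int]:
    """Absolute residue numbers of the cysteines aligned to TFF cysteines."""
    return {label: site + offset for label, offset in offsets.items()}


def partner(position: int, bridges: list[tuple[int, int]]) -> int:
    """Return the residue disulfide-bonded to the given cysteine."""
    for a, b in bridges:
        if position == a:
            return b
        if position == b:
            return a
    raise ValueError(f"Cys-{position} is not part of a listed bridge")


tff2 = cysteine_positions(TFF2_SITE, TFF2_OFFSETS)
fshb = cysteine_positions(FSHB_SITE, FSHB_OFFSETS)
print("TFF2 P1:", tff2)
print("FSH beta:", fshb)

# In TFF2 cysteines I and V are bridged to each other; in FSH beta the
# aligned cysteines have different partners.
for label, pos in fshb.items():
    print(f"cysteine {label} = Cys-{pos}, bridged to Cys-{partner(pos, FSHB_BRIDGES)}")
```

The inferred positions Cys-35, Cys-46, and Cys-69 all appear in the listed FSHβ bridges, but the residues aligned to TFF cysteines I and V are bridged to Cys-84 and Cys-21 rather than to each other, which illustrates the lack of overlap between the two disulfide patterns.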
TFF Domains as Conformational Determinants-TFF domains were identified in this study as conformational determinants inducing LacdiNAc formation on N-linked glycans of human TFF2. The LacdiNAc-inducing capacity was so strong that even though β4Gal-T1 (lactose synthase) expression in HEK-293 cells is in the same range as β4GalNAc-T4 expression (Model Organism Protein Expression Database), the latter overrides the former by increasing the N-linked LacdiNAc/LacNAc ratio to ~10. The capacity of LacdiNAc induction was not confined to the native two-TFF domain structure of TFF2 but was seen also when monomeric (single) P1 or P2 domains of TFF2 were expressed as N-glycosylation probes N-P1 or C-P2*. Mutation of Cys-52 to Gly resulted in a nearly complete loss of LacdiNAc-inducing capacity, and similar effects were seen in the Cys-42 mutant. The β4-GalNAc transfer-inducing TFF domain needs to be in close proximity to the N-glycosylation site, as there was no compensating effect exerted by an intact P2 domain on LacdiNAc formation in the Cys/Gly-mutated P1 domain. An even more pronounced effect on N-glycosylation and LacdiNAc formation was revealed for the double point-mutated probe, which lacked components of a hydrophobic patch. To understand the strong effects seen with the latter mutant probe, three-dimensional views of the crystal structures of the structurally closely related porcine spasmolytic protein (pSP, pTFF2) were inspected.
The Potential Role of Hydrophobic Patches-In pSP aromatic residues and a series of other hydrophobic residues were found to line a cleft (Fig. 10), which has been suggested to bind mucin glycoproteins, and these aromatic rings were claimed also to play a role in stabilization of the fold of the domain (24). Two of these aromatic residues are highly conserved in most TFF domains throughout mammalian species and even in Xenopus mucins (Fig. 8). These are Phe-59 (corresponding to Phe-36 in the expressed protein) and Trp-68 (or Trp-45) in the P1 domain (see also Phe-108 and Trp-117 in P2), both flanking loop 3. According to the crystal structure of porcine spasmolytic protein (25-27), the aromatic residues corresponding to Phe-59/Trp-68 in the TFF2 P1 domain are located within strand 1 (Phe-59) and strand 2 (Trp-68) of the two-stranded antiparallel β-sheet at the central core region of loop 3 (Fig. 10).
These aromatic residues together with other aromatic/hydrophobic residues clustering around the ␤-turn in loop 2 (Gly-43, Phe-44, Gly-46, Ile-47, Phe-53, Val-63, Gly-65, Val-66) form a hydrophobic binding pocket that might be involved in protein-protein interactions. In accordance with this, the double mutant TFF2-(W68Q/F59Q) was devoid of any LacdiNAc expression and showed a shift from complex to high mannose-type N-glycosylation.
Similar Hydrophobic Patches in Other LacdiNAc-expressing Proteins-The above discussed structural features of TFF2 resemble the cystine-knot modules of the neuropeptide hormones, where a number of hydrophobic residues are exposed in a loop-like structure between the two highly twisted double-stranded β-sheets (28).
TFF domains are also found in hZP1 (Fig. 8) but not in the LacdiNAc-positive hZP3 and most of the other structurally defined or proposed LacdiNAc-positive glycoproteins. According to the above discussed features of a conformationally stabilized hydrophobic patch, it can be assumed that similar patches may be involved in the control of LacdiNAc formation on other glycoproteins. Two highly conserved regions of the N-terminal ZP-N domain on chicken ZP3, for example, expose hydrophobic residues (Tyr-111 and Tyr-141, both conserved in human ZP3) and do not overlap with sites that can be glycosylated (29).
In human nidogen-1, previously reported to express LacdiNAc (16), an 11-stranded β-barrel is characterized by a patch on the barrel surface, which is conserved in all metazoan nidogens. Site-directed mutagenesis shows that the residues in the conserved patch are involved in the binding of perlecan (30). Among these surface-exposed residues two invariant tyrosines (Tyr-431 and Tyr-440) and one valine (Val-433) may contribute to the formation of a hydrophobic patch.
Potential Role of Cis-located Peptide Elements-Inspection of the conformational pSP model further reveals that the only surface-exposed basic residues are in close proximity to the N-glycosylation site within loop 1 (Asn-38 −5, −1, +1). An involvement of these residues in the binding of β4GalNAc-T3 or -T4 is highly unlikely due to steric hindrance by the glycan substrate. Although previous work had suggested that basic residues within a cis-located α-helical peptide may contribute to a LacdiNAc-inducing determinant, we could not obtain any supporting evidence for the existence of such a peptide determinant in the TFF domains of TFF2. Our findings do not rule out the contribution of CA6-related basic peptide determinants to the formation of LacdiNAc on other glycoproteins. Actually, we could confirm, at least for artificial peptides with a high content of basic residues (oligo-His tag), that LacdiNAc synthesis is induced at adjacent O-glycosylation sites in hZP3 probes.³ However, the more distant oligo-histidine tag in hTFF2-Fl or N-P1 did not induce LacdiNAc synthesis on the N-glycosylation site (refer to the LacdiNAc-negative T68Q/F59Q double mutant probe). The observation that O-linked chains on TFF2 do not express LacdiNAc (Fig. 7) appears to be of considerable importance as it demonstrates that the effect exerted by the hydrophobic patch on activation of β4-GalNAc-T3/T4 is dependent on steric aspects.
Biological Relevance of Protein-specific LacdiNAc Expression-The restricted and apparently protein-specific expression of LacdiNAc points to important biological functions. LacdiNAc-positive glycans play vital roles in regulating the circulatory half-lives of pituitary glycoprotein hormones (4-6) and other glycoproteins (31) including tenascin-R produced by oligodendrocytes and small interneurons in the hippocampus and cerebellum (12). Other glycoproteins containing LacdiNAc-terminating glycans include human glycodelin, a human glycoprotein with potent immunosuppressive and contraceptive activities (7), and zona pellucida glycoproteins from murine eggs (14). Although still speculative, LacdiNAc expression on ZP3 has been claimed to form the molecular basis of initial sperm-egg binding (14, 32). Many human pathogens, among them Schistosoma mansoni, synthesize LacdiNAc (LDN) and fucosylated LacdiNAc glycans, such as GalNAcβ1-4-…. Fucosylated LacdiNAc as well as other modified LDN-glycans represent important recognition determinants in adaptive immune responses and are recognized by various carbohydrate-binding proteins within the innate immune system, such as DC-SIGN (33-38). Another role of LacdiNAc expression in innate immune defense can be seen in the entrapment of Helicobacter pylori in the visco-elastic mucin gel overlaying the gastric surface epithelium. This mucin layer is characterized by the preponderant expression of LacdiNAc-positive MUC5AC, which was recently claimed to play a role in H. pylori colonization (39).
There is a growing body of evidence that β4-GalNAc-T3 and LacdiNAc expression are involved in the modulation of tumor cell growth and signaling. β4-GalNAc-T3 was shown to predict a favorable prognosis for neuroblastoma and to suppress the malignant phenotype by decreasing β1-integrin signaling (40). In the same context the enzyme was demonstrated to regulate cancer stemness and the invasive properties of colon cancer cells by modifying EGFR glycosylation and signaling (41).
Recent work has demonstrated the involvement of LacdiNAc in the self-renewal of mouse embryonic stem cells by regulating LIF/STAT3 signaling (42). Undifferentiated mouse embryonic stem cells expressed LacdiNAc at higher levels than differentiated cells, and expression of the glycan on the LIF receptor (gp130) was shown to be required for the stable colocalization of the receptor with lipid raft/caveolar proteins, such as caveolin-1, and for the transduction of a sufficiently strong LIF/STAT3 signal.
The functional implication of LacdiNAc-modified glycans in important biological processes, as outlined above, raises the question of how the specificity of enzymatic sugar transfer is achieved. These considerations have driven our attempts to define the minimal essential elements of a strong determinant as found on TFF domains. Future work will have to show whether the hypothesized role of hydrophobic patches can be generalized to explain protein-specific LacdiNAc formation.